Hello,
I was running Nextstrain without problems earlier today, but now I'm getting error messages and I can't figure out where the error is coming from.
Here is the entire output:
(nextstrain) [lcc88@cbsuahdcvir ncov]$ nextstrain build . --cores 16 --configfile my_profiles/builds.yaml
Your config specifies 'skip_travel_history_adjustment=True'. This is now always the case, and thus this parameter can be removed.
Building DAG of jobs...
Using shell: /home/lcc88/.nextstrain/runtimes/conda/env/bin/bash
Provided cores: 16
Rules claiming more threads will be scaled down.
Conda environments: ignored
Job counts:
count jobs
1 add_branch_labels
1 adjust_metadata_regions
1 all
1 ancestral
1 annotate_metadata_with_index
1 assign_rbd_levels
1 build_align
1 build_description
1 calculate_epiweeks
1 clade_files
1 clades
1 combine_input_metadata
1 combine_samples
1 combine_sequences_for_subsampling
1 diagnostic
1 distances
1 emerging_lineages
1 export
1 filter
1 finalize
1 include_hcov19_prefix
1 index
1 join_metadata_and_nextclade_qc
1 logistic_growth
1 mask
1 mutational_fitness
1 recency
1 refine
1 rename_emerging_lineages
1 sanitize_metadata
1 subsample
1 tip_frequencies
1 traits
1 translate
1 tree
35
[Thu Mar 9 15:31:09 2023]
rule sanitize_metadata:
input: data/CCTL_sequencing/metadata_03-09-23.tsv
output: results/sanitized_metadata_custom_data.tsv.xz
log: logs/sanitize_metadata_custom_data.txt
jobid: 37
benchmark: benchmarks/sanitize_metadata_custom_data.txt
wildcards: origin=custom_data
resources: mem_mb=2000
python3 scripts/sanitize_metadata.py --metadata data/CCTL_sequencing/metadata_03-09-23.tsv --metadata-id-columns strain name 'Virus name' --database-id-columns 'Accession ID' gisaid_epi_isl genbank_accession --parse-location-field Location --rename-fields 'Virus name=strain' Type=type 'Accession ID=gisaid_epi_isl' 'Collection date=date' 'Additional location information=additional_location_information' 'Sequence length=length' Host=host 'Patient age=patient_age' Gender=sex Clade=GISAID_clade 'Pango lineage=pango_lineage' pangolin_lineage=pango_lineage Lineage=pango_lineage 'Pangolin version=pangolin_version' Variant=variant 'AA Substitutions=aaSubstitutions' 'Submission date=date_submitted' 'Is reference?=is_reference' 'Is complete?=is_complete' 'Is high coverage?=is_high_coverage' 'Is low coverage?=is_low_coverage' N-Content=n_content GC-Content=gc_content --strip-prefixes hCoV-19/ SARS-CoV-2/ --output results/sanitized_metadata_custom_data.tsv.xz 2>&1 | tee logs/sanitize_metadata_custom_data.txt
[Thu Mar 9 15:31:09 2023]
rule clade_files:
input: defaults/clades.tsv
output: results/All_CCTL_sequences_03-09-23/clades.tsv
jobid: 25
benchmark: benchmarks/clade_files_All_CCTL_sequences_03-09-23.txt
wildcards: build_name=All_CCTL_sequences_03-09-23
cat defaults/clades.tsv > results/All_CCTL_sequences_03-09-23/clades.tsv
[Thu Mar 9 15:31:09 2023]
Job 32:
Combine and deduplicate aligned FASTAs from multiple origins in preparation for subsampling.
python3 scripts/sanitize_sequences.py --sequences results/aligned_custom_data.fasta.xz results/aligned_references.fasta.xz --strip-prefixes hCoV-19/ SARS-CoV-2/ --output /dev/stdout | xz -c -2 > results/combined_sequences_for_subsampling.fasta.xz
[Thu Mar 9 15:31:09 2023]
Job 19: Templating build description for Auspice
[Thu Mar 9 15:31:09 2023]
Finished job 25.
1 of 35 steps (3%) done
Your config specifies 'skip_travel_history_adjustment=True'. This is now always the case, and thus this parameter can be removed.
Job counts:
count jobs
1 build_description
1
[Thu Mar 9 15:31:10 2023]
Finished job 19.
2 of 35 steps (6%) done
Traceback (most recent call last):
File "/local/workdir/lcc88/Nextstrain/ncov/scripts/sanitize_metadata.py", line 405, in <module>
database_ids_by_strain = get_database_ids_by_strain(
File "/local/workdir/lcc88/Nextstrain/ncov/scripts/sanitize_metadata.py", line 211, in get_database_ids_by_strain
for metadata in metadata_reader:
File "/home/lcc88/.nextstrain/runtimes/conda/env/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1698, in __next__
return self.get_chunk()
File "/home/lcc88/.nextstrain/runtimes/conda/env/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1810, in get_chunk
return self.read(nrows=size)
File "/home/lcc88/.nextstrain/runtimes/conda/env/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1778, in read
) = self._engine.read( # type: ignore[attr-defined]
File "/home/lcc88/.nextstrain/runtimes/conda/env/lib/python3.10/site-packages/pandas/io/parsers/python_parser.py", line 250, in read
content = self._get_lines(rows)
File "/home/lcc88/.nextstrain/runtimes/conda/env/lib/python3.10/site-packages/pandas/io/parsers/python_parser.py", line 1114, in _get_lines
new_rows.append(next(self.data))
File "/home/lcc88/.nextstrain/runtimes/conda/env/lib/python3.10/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 1252: invalid start byte
[Thu Mar 9 15:31:10 2023]
Error in rule sanitize_metadata:
jobid: 37
output: results/sanitized_metadata_custom_data.tsv.xz
log: logs/sanitize_metadata_custom_data.txt (check log file(s) for error message)
shell:
python3 scripts/sanitize_metadata.py --metadata data/CCTL_sequencing/metadata_03-09-23.tsv --metadata-id-columns strain name 'Virus name' --database-id-columns 'Accession ID' gisaid_epi_isl genbank_accession --parse-location-field Location --rename-fields 'Virus name=strain' Type=type 'Accession ID=gisaid_epi_isl' 'Collection date=date' 'Additional location information=additional_location_information' 'Sequence length=length' Host=host 'Patient age=patient_age' Gender=sex Clade=GISAID_clade 'Pango lineage=pango_lineage' pangolin_lineage=pango_lineage Lineage=pango_lineage 'Pangolin version=pangolin_version' Variant=variant 'AA Substitutions=aaSubstitutions' 'Submission date=date_submitted' 'Is reference?=is_reference' 'Is complete?=is_complete' 'Is high coverage?=is_high_coverage' 'Is low coverage?=is_low_coverage' N-Content=n_content GC-Content=gc_content --strip-prefixes hCoV-19/ SARS-CoV-2/ --output results/sanitized_metadata_custom_data.tsv.xz 2>&1 | tee logs/sanitize_metadata_custom_data.txt
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
[Thu Mar 9 15:31:11 2023]
Finished job 32.
3 of 35 steps (9%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /local/workdir/lcc88/Nextstrain/ncov/.snakemake/log/2023-03-09T153109.250600.snakemake.log
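One thing that may be worth checking, based on the UnicodeDecodeError above: byte 0xa0 is a non-breaking space in Latin-1 / Windows-1252, so the metadata TSV may contain a character that was not saved as UTF-8 (this is only a guess from the traceback, not something the log confirms). Below is a minimal sketch for locating such lines. The function name `find_non_utf8_lines` is hypothetical and is not part of Nextstrain or the ncov workflow.

```python
def find_non_utf8_lines(path):
    """Return (line_number, byte_offset_within_line, raw_bytes) for every
    line of the file at `path` that cannot be decoded as UTF-8.

    Hypothetical helper for illustration; not part of the ncov workflow.
    """
    bad_lines = []
    # Read in binary mode so that Python does not try to decode the file itself.
    with open(path, "rb") as handle:
        for line_number, raw in enumerate(handle, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError as error:
                # error.start is the offset of the first undecodable byte in this line.
                bad_lines.append((line_number, error.start, raw))
    return bad_lines


# Example usage (path taken from the log above):
# for line_number, offset, raw in find_non_utf8_lines(
#     "data/CCTL_sequencing/metadata_03-09-23.tsv"
# ):
#     # Decoding as Latin-1 never fails, so it is safe for displaying the line.
#     print(line_number, offset, raw.decode("latin-1").rstrip())
```

If this reports any lines, retyping the flagged characters or re-saving the file as UTF-8 from the program that produced it might be enough for sanitize_metadata.py to read it again.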
Thank you
Leonardo