Hello,
I’m trying to run Nextstrain but I’m receiving the following error message. I wonder if it is something related to the date format. Is someone able to explain how to fix it?
Traceback (most recent call last):
File "scripts/annotate_metadata_with_index.py", line 32, in <module>
metadata.merge(
File "/local/workdir/lcc88/Nextstrain_test/ncov/.snakemake/conda/9f0233e8/lib/python3.8/site-packages/pandas/core/frame.py", line 9345, in merge
return merge(
File "/local/workdir/lcc88/Nextstrain_test/ncov/.snakemake/conda/9f0233e8/lib/python3.8/site-packages/pandas/core/reshape/merge.py", line 107, in merge
op = _MergeOperation(
File "/local/workdir/lcc88/Nextstrain_test/ncov/.snakemake/conda/9f0233e8/lib/python3.8/site-packages/pandas/core/reshape/merge.py", line 704, in __init__
self._maybe_coerce_merge_keys()
File "/local/workdir/lcc88/Nextstrain_test/ncov/.snakemake/conda/9f0233e8/lib/python3.8/site-packages/pandas/core/reshape/merge.py", line 1257, in _maybe_coerce_merge_keys
raise ValueError(msg)
ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat
[Thu May 12 15:30:08 2022]
Error in rule annotate_metadata_with_index:
jobid: 23
output: results/WTD-NY/metadata_with_index.tsv
log: logs/annotate_metadata_with_index_WTD-NY.txt (check log file(s) for error message)
conda-env: /local/workdir/lcc88/Nextstrain_test/ncov/.snakemake/conda/9f0233e8
shell:
python3 scripts/annotate_metadata_with_index.py --metadata results/WTD-NY/metadata_with_nextclade_qc.tsv --sequence-index results/WTD-NY/sequence_index.tsv --output results/WTD-NY/metadata_with_index.tsv
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Hmm. Could you share a line or two of the results/WTD-NY/metadata_with_nextclade_qc.tsv file (please remove any sensitive information there) to help us see what’s happening?
The particular script causing the error merges the metadata and the index on the column strain. From the error message, it seems like you may have values in your strain column that caused it to be interpreted as integers.
This should be an error that we can fix on our end by forcing the dtype of strain to always be ‘string’.
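For anyone hitting the same message, here is a minimal sketch of what is likely going on, using pandas only. The two small tables are made-up stand-ins for the metadata and the sequence index (the column names `date` and `length` and all values are assumptions for illustration, and this is not the actual code in `annotate_metadata_with_index.py`). When every strain name is a number, `read_csv` infers `int64` for that column, and merging it against a string column fails. Reading the column as text avoids the problem.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the metadata and sequence index TSV files.
metadata_tsv = "strain\tdate\n165692\t2021-03-01\n165693\t2021-03-02\n"
index_tsv = "strain\tlength\n165692\t29903\n165693\t29890\n"

# The index is read with strain names as text.
index = pd.read_csv(io.StringIO(index_tsv), sep="\t", dtype={"strain": str})

# With no dtype given, an all-numeric strain column is inferred as int64.
metadata_inferred = pd.read_csv(io.StringIO(metadata_tsv), sep="\t")

try:
    metadata_inferred.merge(index, on="strain", how="left")
    merge_failed = False
except ValueError as error:
    # pandas refuses to merge an integer key with a string key.
    merge_failed = True
    print(f"Merge failed: {error}")

# Forcing the strain column to be read as text makes the key types match.
metadata = pd.read_csv(io.StringIO(metadata_tsv), sep="\t", dtype={"strain": str})
merged = metadata.merge(index, on="strain", how="left")
print(merged)
```

Until a fix like this lands in the workflow, renaming the strains so they are not purely numeric (for example by adding a prefix) should also sidestep the error.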
Indeed, my strain names were composed only of numbers. I changed that, but now I get the following error:
ERROR: All samples have been dropped! Check filter rules and metadata file format.
329 strains were dropped during filtering
165 had no metadata
164 of these were dropped by --exclude-all
164 strains were added back because they were in results/WTD-NY/sample-all.txt
[Sun May 15 15:16:14 2022]
Error in rule combine_samples:
jobid: 30
output: results/WTD-NY/WTD-NY_subsampled_sequences.fasta.xz, results/WTD-NY/WTD-NY_subsampled_metadata.tsv.xz
log: logs/subsample_regions_WTD-NY.txt (check log file(s) for error message)
conda-env: /local/workdir/lcc88/Nextstrain_test/ncov/.snakemake/conda/9f0233e8
shell:
augur filter --sequences results/aligned_WTD-test.fasta.xz --metadata results/sanitized_metadata_WTD-test.tsv.xz --exclude-all --include results/WTD-NY/sample-all.txt --output-sequences results/WTD-NY/WTD-NY_subsampled_sequences.fasta.xz --output-metadata results/WTD-NY/WTD-NY_subsampled_metadata.tsv.xz 2>&1 | tee logs/subsample_regions_WTD-NY.txt
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Removing output files of failed job combine_samples since they might be corrupted:
results/WTD-NY/WTD-NY_subsampled_sequences.fasta.xz, results/WTD-NY/WTD-NY_subsampled_metadata.tsv.xz
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /local/workdir/lcc88/Nextstrain_test/ncov/.snakemake/log/2022-05-15T151607.817823.snakemake.log
And here is the builds section of the builds.yaml file:
builds:
# Focus on New York State (division)
# with a build name that will produce the following URL fragment on Nextstrain/auspice:
# /ncov/north-america/usa/new-york
WTD-NY: # name of the build; this can be anything
subsampling_scheme: custom-county # use a custom subsampling scheme defined below
region: North America
country: USA
# Whatever your finest geographic scale is (here, ‘location’ since we are doing a county in the USA)
# list ‘up’ from here the geographic area that location is in.
It’s strange because I have only 164 samples and metadata rows, but the error reports 329:
ERROR: All samples have been dropped! Check filter rules and metadata file format.
329 strains were dropped during filtering
165 had no metadata
164 of these were dropped by --exclude-all
The format of my strain column is for example:
WDC/165692/2021
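One thing that may be worth checking is whether the names in the FASTA headers are exactly the same as the values in the strain column. The counts in the error (329 strains in total, 165 without metadata) would be consistent with the two files sharing few or no names, though that is only a guess from the numbers. The sketch below compares the two sets of names using the Python standard library. The sample records are invented for illustration, and the helper names `fasta_names` and `metadata_names` are hypothetical, not part of augur.

```python
import csv
import io


def fasta_names(handle):
    """Collect sequence names: the text after '>' up to the first whitespace."""
    names = set()
    for line in handle:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                names.add(parts[0])
    return names


def metadata_names(handle, column="strain"):
    """Collect the values of the strain column from a tab-separated file."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[column].strip() for row in reader}


# Invented sample data: one name matches, one FASTA header was never renamed.
fasta_text = ">WDC/165692/2021\nACGT\n>165693\nACGT\n"
metadata_text = "strain\tdate\nWDC/165692/2021\t2021-03-01\nWDC/165693/2021\t2021-03-02\n"

in_fasta = fasta_names(io.StringIO(fasta_text))
in_metadata = metadata_names(io.StringIO(metadata_text))

shared = in_fasta & in_metadata
only_in_fasta = in_fasta - in_metadata
only_in_metadata = in_metadata - in_fasta

print("In both files:", sorted(shared))
print("Only in FASTA:", sorted(only_in_fasta))
print("Only in metadata:", sorted(only_in_metadata))
```

For the real files, the `io.StringIO` objects can be replaced with open file handles. The `.xz` files named in the log can be opened as text with `lzma.open(path, "rt")` from the standard library.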