Error with augur filter after latest git pull

Good morning,

I update our Nextstrain builds biweekly, and today the run failed after just a few seconds. I got this error for the augur filter job:

Error in rule filter:
jobid: 59
output: results/filtered.fasta
log: logs/filtered.txt (check log file(s) for error message)
shell:

    augur filter             --sequences data/sequences.fasta             --metadata data/metadata.tsv             --include defaults/include.txt             --max-date 2020.7418032786886             --min-date 2019-10-01             --exclude defaults/exclude.txt             --exclude-where division='USA' date='2020' date='2020-01-XX' date='2020-02-XX' date='2020-03-XX' date='2020-04-XX' date='2020-05-XX' date='2020-06-XX' date='2020-07-XX' date='2020-08-XX' date='2020-09-XX' date='2020-10-XX' date='2020-11-XX' date='2020-12-XX' date='2020-01' date='2020-02' date='2020-03' date='2020-04' date='2020-05' date='2020-06' date='2020-07' date='2020-08' date='2020-09' date='2020-10' date='2020-11' date='2020-12'            --min-length 27000             --output results/filtered.fasta 2>&1 | tee logs/filtered.txt

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Error from logs/filtered.txt
usage: augur filter [-h] --sequences SEQUENCES --metadata METADATA
[--min-date MIN_DATE] [--max-date MAX_DATE]
[--min-length MIN_LENGTH] [--non-nucleotide]
[--exclude EXCLUDE] [--include INCLUDE]
[--priority PRIORITY]
[--sequences-per-group SEQUENCES_PER_GROUP]
[--group-by GROUP_BY [GROUP_BY ...]]
[--subsample-seed SUBSAMPLE_SEED]
[--exclude-where EXCLUDE_WHERE [EXCLUDE_WHERE ...]]
[--include-where INCLUDE_WHERE [INCLUDE_WHERE ...]]
--output OUTPUT
augur filter: error: argument --min-date: invalid float value: '2019-10-01'

Is this caused by an error in the GISAID metadata, or by the latest update to the ncov repo?

Thank you,
Sarah Schmedes

Hi @seschmedes!
I’m very sorry this is causing you trouble. This was something we added this morning to help filter out samples that are reported to have been taken before 2020 (we think it’s likely a year typo - 2002 is given!). The addition is that the argument --min-date 2019-10-01 is sent to augur filter.

This should work - but I believe the ability to take --min-date arguments in the YYYY-MM-DD format was something added to augur filter only in the last few months!
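
The failure can be reproduced outside augur. The sketch below is not augur’s source code; it only assumes that the older parser declared `--min-date` with `type=float`, which is what the error message suggests. The names `build_parser` and `try_parse` are made up for illustration.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for an older CLI that only accepts numeric dates.
    parser = argparse.ArgumentParser(prog="filter-demo")
    parser.add_argument("--min-date", type=float)
    return parser


def try_parse(value):
    """Return the parsed float, or None if argparse rejects the value."""
    try:
        return build_parser().parse_args(["--min-date", value]).min_date
    except SystemExit:
        # argparse prints the usage text plus an "invalid float value"
        # message to stderr, then exits with a non-zero status.
        return None


if __name__ == "__main__":
    print(try_parse("2019.74"))      # accepted as a float
    print(try_parse("2019-10-01"))   # rejected by float()
```

Because the parser exits with a non-zero status, Snakemake marks the whole rule as failed, which matches the "Error in rule filter" output above.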

Can you try updating augur on your computer/cluster and re-running? I hope this should solve the problem.

However, I imagine you are not the only one who will run into this - thank you for reporting it! I will change the minimum date into a float so that this won’t impact any other users running slightly older versions of augur!
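
For anyone who wants to compute such a float by hand, here is a minimal sketch of one way to turn a calendar date into a decimal year. The mid-day convention used here is an assumption, and `decimal_year` is a hypothetical helper, not a function from augur or the ncov workflow.

```python
from calendar import isleap
from datetime import date


def decimal_year(d: date) -> float:
    """Convert a date to a decimal year, placing it at the middle of the day."""
    days_in_year = 366 if isleap(d.year) else 365
    day_of_year = d.timetuple().tm_yday  # 1 for January 1st
    return d.year + (day_of_year - 0.5) / days_in_year


if __name__ == "__main__":
    print(decimal_year(date(2019, 10, 1)))
```

Under this convention 2019-10-01 comes out at roughly 2019.75, so any float slightly below that still keeps samples from October 2019 onwards.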

Thanks Emma! I just updated augur and that seems to have done the trick. It’s now running.

Thank you so much for your help and quick reply!

Sarah

No problem! Sorry for the inconvenience. And thanks again for letting us know - I’ll be updating this to be a number instead, so that it doesn’t impact others in the same way!

Hi, this is Francis from the UAE. Could you please help with this error?

Error in rule filter:
jobid: 113
output: results/filtered.fasta
log: logs/filtered.txt (check log file(s) for error message)
shell:

    augur filter             --sequences data/SKMC_UAE_sequences.fasta             --metadata data/SKMC_UAE_metadata.tsv             --include defaults/include.txt             --max-date 2020.782786885246             --min-date 2019.74             --exclude defaults/exclude.txt             --exclude-where division='USA' date='2020' date='2020-01-XX' date='2020-02-XX' date='2020-03-XX' date='2020-04-XX' date='2020-05-XX' date='2020-06-XX' date='2020-07-XX' date='2020-08-XX' date='2020-09-XX' date='2020-10-XX' date='2020-11-XX' date='2020-12-XX' date='2020-01' date='2020-02' date='2020-03' date='2020-04' date='2020-05' date='2020-06' date='2020-07' date='2020-08' date='2020-09' date='2020-10' date='2020-11' date='2020-12'            --min-length 27000             --output results/filtered.fasta 2>&1 | tee logs/filtered.txt
    
    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Hi, I’m new to Nextstrain…

I managed to change build.yaml and config.yaml, but I now see the error below.
region: Asia
country: United Arab Emirates

Error in rule subsample:
jobid: 94
output: results/north-america_usa/sample-country.fasta
shell:

    augur filter             --sequences results/masked.fasta             --metadata data/skmc_uae_metadata.tsv             --include defaults/include.txt             --exclude-where 'country!=USA'                                                    --group-by division year month             --sequences-per-group 200             --output results/north-america_usa/sample-country.fasta 2>&1 | tee 
    
    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

How do I fix it?

Welcome @Francis! The details about what went wrong with each step of the workflow (usually) get stored in the logs/ directory. For example, to find out what went wrong with the first filter command you posted, you can inspect the contents of the file logs/filtered.txt. If the contents of this file do not make sense, though, please share the contents here and we can help try to interpret it.

I noted above that usually we log all output from the workflow, but the subsample rule that you shared in your second post above has been an exception where we were not logging the output. We have just fixed this issue, so if you pull (or copy) the latest contents of the ncov repository and re-run your workflow, you should find information to help you debug the error in logs/subsample_north-america_usa_country.txt. As with the filter error above, please share the contents of the subsample log here, if you would like some help interpreting the output.
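
If there are many log files, a short script can show which ones actually contain output. This is only a sketch: it assumes the logs are `.txt` files in a `logs/` directory, as in the paths above, and `split_logs` and `tail` are hypothetical helper names.

```python
from pathlib import Path


def split_logs(log_dir="logs"):
    """Return (non_empty, empty) lists of .txt log file names in log_dir."""
    non_empty, empty = [], []
    for path in sorted(Path(log_dir).glob("*.txt")):
        target = non_empty if path.stat().st_size > 0 else empty
        target.append(path.name)
    return non_empty, empty


def tail(path, n=20):
    """Return the last n lines of a log file."""
    lines = Path(path).read_text(errors="replace").splitlines()
    return lines[-n:]


if __name__ == "__main__":
    non_empty, empty = split_logs("logs")
    print("Logs with output:", non_empty)
    print("Empty logs:", empty)
    for name in non_empty:
        print(f"--- {name} ---")
        print("\n".join(tail(Path("logs") / name)))
```

An empty log usually just means the step printed nothing, so the non-empty files are the ones worth reading first.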

[quote=“Francis, post:8, topic:137, full:true”]
Hi @jlhudd, thanks for the prompt response.
The run got as far as 20 of 69 steps (29%), and then this error came up:
ERROR: All samples have been dropped! Check filter rules and metadata file format.
[Wed Oct 14 12:16:16 2020]
Error in rule subsample:
jobid: 57
output: results/asia/sample-global.fasta
shell:

    augur filter             --sequences results/masked.fasta             --metadata data/skmc_metadata.tsv             --include defaults/include.txt             --exclude-where 'region=asia'                                       --priority results/asia/proximity_region.tsv             --group-by country year month             --sequences-per-group 4             --output results/asia/sample-global.fasta 2>&1 | tee 
    
    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

-3 sequences were dropped during filtering
0 of these were dropped because of 'country!=unitedarabemirates'
0 of these were dropped because of subsampling criteria

3 sequences were added back because they were in defaults/include.txt

89 sequences have been written out to results/asia_unitedarabemirates/sample-country.fasta
[Wed Oct 14 12:16:16 2020]
Finished job 51.
20 of 69 steps (29%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /Users/francis/ncov/.snakemake/log/2020-10-14T121547.593524.snakemake.log

When I check the logs directory, some of the log files are empty:

  • subsample_regions_global.txt
  • mask
  • diagnostics
  • aggregate_alignments
  • recency_asia
  • recency_global
  • recency_asia_unitedarabemirates

Is this error due to the global data, the subsample rule, or the metadata format?
Your support is highly appreciated.

Best
Francis
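
The "All samples have been dropped" message comes from a step that runs with `--exclude-where 'region=asia'`, so one thing worth checking is whether the metadata file contains any rows from other regions. Below is a rough sketch, assuming a tab-separated metadata file with a `region` column. The helper names are hypothetical, and the case-insensitive comparison is a simplification, not a statement about how augur matches values.

```python
import csv
from collections import Counter


def count_by_column(metadata_path, column="region"):
    """Count metadata rows per (lower-cased) value of the given column."""
    with open(metadata_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return Counter((row.get(column) or "").strip().lower() for row in reader)


def remaining_after_exclude(counts, excluded_value):
    """Number of rows left once every row with excluded_value is removed."""
    excluded = excluded_value.lower()
    return sum(n for value, n in counts.items() if value != excluded)


if __name__ == "__main__":
    counts = count_by_column("data/skmc_metadata.tsv", column="region")
    print(counts)
    print("Rows left after excluding asia:", remaining_after_exclude(counts, "asia"))
```

If the second number is zero, the filter has nothing left to sample from, which would be consistent with the error above.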