Error msg when running a small dataset

Hello,

I installed Nextstrain by following the tutorial (container installation) and was able to run the zika-tutorial successfully. When I ran a small example with 10 sequences downloaded from the NCBI Virus website step by step, I could build the alignment and create the raw tree. However, when I run the refine step (using the same format as the zika-tutorial), it reports the following error:

nextstrain:/nextstrain/build $ augur refine \
  --tree results/tree_raw.nwk \
  --alignment results/aligned.fasta \
  --metadata data/metadata2.tsv \
  --output-tree results/tree.nwk \
  --output-node-data results/branch_lengths.json \
  --timetree \
  --coalescent opt \
  --date-confidence \
  --date-inference marginal \
  --clock-filter-iqd 4

augur refine is using TreeTime version 0.7.4
Traceback (most recent call last):
  File "/usr/bin/augur", line 11, in <module>
    load_entry_point('nextstrain-augur', 'console_scripts', 'augur')()
  File "/nextstrain/augur/augur/__main__.py", line 10, in main
    return augur.run( argv[1:] )
  File "/nextstrain/augur/augur/__init__.py", line 74, in run
    return args.command.run(args)
  File "/nextstrain/augur/augur/refine.py", line 209, in run
    covariance=args.covariance, resolve_polytomies=(not args.keep_polytomies))
  File "/nextstrain/augur/augur/refine.py", line 37, in refine
    verbose=verbosity, gtr='JC69', precision=precision)
  File "/usr/lib/python3.6/site-packages/treetime/treetime.py", line 34, in __init__
    super(TreeTime, self).__init__(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/treetime/clock_tree.py", line 83, in __init__
    self._assign_dates()
  File "/usr/lib/python3.6/site-packages/treetime/clock_tree.py", line 130, in _assign_dates
    raise MissingDataError("ERROR: ALMOST NO VALID DATE CONSTRAINTS")
treetime.MissingDataError: ERROR: ALMOST NO VALID DATE CONSTRAINTS

Can you please help?

Thanks,

Jim

This is likely due to a mismatch between the names of your sequences in the tree and in the metadata. Otherwise, it could be due to malformatted dates (augur expects ISO format, YYYY-MM-DD).

Please check that there are valid date entries for the exact taxon names of your tree in the metadata. Furthermore, these names should be in a column named strain or name in the metadata.
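
Both checks can be done with a few lines of Python before running augur refine. The following is only a rough sketch using the standard library, not anything taken from augur: the helper names (tip_names, is_iso_date, check_metadata) are made up for illustration, the Newick parsing assumes simple unquoted tip labels, and the date check is deliberately strict, so it would also flag partial dates that augur itself may accept.

```python
import csv
import io
import re
from datetime import datetime


def tip_names(newick):
    """Return leaf labels from a simple Newick string (unquoted labels only)."""
    # A leaf label directly follows "(" or ","; internal node labels and
    # branch lengths follow ")" or ":" and are therefore not matched.
    return re.findall(r"[(,]\s*([^\s():,;]+)", newick)


def is_iso_date(value):
    """True only for a complete, valid YYYY-MM-DD date."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value or ""):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def check_metadata(newick, tsv_text):
    """Map each problematic tree tip to a short description of the problem."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    fields = reader.fieldnames or []
    # The names are expected in a column called "strain" or "name".
    id_col = next((c for c in ("strain", "name") if c in fields), None)
    if id_col is None or "date" not in fields:
        raise ValueError(f"need a strain/name column and a date column, got {fields}")
    dates = {row[id_col]: row["date"] for row in reader}
    problems = {}
    for tip in tip_names(newick):
        if tip not in dates:
            problems[tip] = "missing from metadata"
        elif not is_iso_date(dates[tip]):
            problems[tip] = f"bad date {dates[tip]!r}"
    return problems


if __name__ == "__main__":
    # Toy inputs; replace with the contents of your own tree and metadata files.
    tree = "(A:0.1,(B:0.2,C:0.3):0.05);"
    metadata = "strain\tdate\nA\t2020-01-11\nB\t11/01/2020\n"
    for tip, problem in check_metadata(tree, metadata).items():
        print(tip, problem)
```

An empty result means every tip in the tree has a metadata row with a complete ISO date.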

Thanks, I checked and corrected the names and date format. It worked once, but after I added more fields to the metadata and moved on to the next step, "Annotate the Phylogeny", it stopped working. When I went back to the refine step, that no longer worked either. Here are the command and the error message.

augur refine \
  --tree results/tree_raw.nwk \
  --alignment results/aligned.fasta \
  --metadata data/metadata3.txt \
  --output-tree results/tree.nwk \
  --output-node-data results/branch_lengths.json \
  --timetree \
  --coalescent opt \
  --date-confidence \
  --date-inference marginal \
  --clock-filter-iqd 4
augur refine is using TreeTime version 0.7.4
Traceback (most recent call last):
  File "/usr/bin/augur", line 11, in <module>
    load_entry_point('nextstrain-augur', 'console_scripts', 'augur')()
  File "/nextstrain/augur/augur/__main__.py", line 10, in main
    return augur.run( argv[1:] )
  File "/nextstrain/augur/augur/__init__.py", line 74, in run
    return args.command.run(args)
  File "/nextstrain/augur/augur/refine.py", line 194, in run
    min_max_year=args.year_bounds)
  File "/nextstrain/augur/augur/utils.py", line 143, in get_numerical_dates
    v = m[date_col]
KeyError: 'date'

Here is the metadata:

strain virus date country
NC_045512 Severe acute respiratory syndrome-related coronavirus 2019-12-31 China
MT653094 Severe acute respiratory syndrome-related coronavirus 2020-01-11 USA
MT653095 Severe acute respiratory syndrome-related coronavirus 2020-02-12 USA
MT653096 Severe acute respiratory syndrome-related coronavirus 2020-03-11 USA
MT653097 Severe acute respiratory syndrome-related coronavirus 2020-04-11 USA
MT653098 Severe acute respiratory syndrome-related coronavirus 2020-05-11 USA
MT653099 Severe acute respiratory syndrome-related coronavirus 2020-06-12 USA
MT653100 Severe acute respiratory syndrome-related coronavirus 2020-01-12 USA
MT653101 Severe acute respiratory syndrome-related coronavirus 2020-02-12 USA
MT653102 Severe acute respiratory syndrome-related coronavirus 2020-03-12 USA

Here is the raw tree:

(NC_045512:0.00013455,(MT653097:0.00003324,(((MT653095:0.00000100,MT653101:0.00003593):0.00003610,MT653096:0.00018008):0.00000189,(MT653100:0.00003627,MT653102:0.00000201):0.00003527):0.00006826):0.00000296,((MT653094:0.00000100,MT653098:0.00000100):0.00003319,MT653099:0.00016813):0.00006691):0.00000000;

Any possible reasons?

Your metadata should be a tab-delimited file. Your file extension is .txt, which makes me think this is where the problem might be.

Thanks, but it is tab-delimited, even though I use the .txt extension.

Hi @jxl175! Augur currently only reads metadata as tab-delimited if the metadata file has a .tsv extension. Can you try renaming your metadata file to something like metadata.tsv and running the analysis again?
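
To see why the extension matters, here is a small illustration using Python's standard csv module. It is not augur's actual code; the read_rows helper and its rule (tab for .tsv, comma for everything else) are assumptions made to mimic the behavior described above. When a tab-delimited file is parsed as comma-separated, the whole header line becomes a single column name, so looking up 'date' fails, which is consistent with the KeyError: 'date' in the traceback.

```python
import csv
import io


def read_rows(text, filename):
    """Parse metadata text, choosing the delimiter from the file extension."""
    # Assumed rule for illustration only: tab for ".tsv", comma otherwise.
    delimiter = "\t" if filename.endswith(".tsv") else ","
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


# One header line and one record from the metadata in this thread, tab-delimited.
text = (
    "strain\tvirus\tdate\tcountry\n"
    "MT653094\tSevere acute respiratory syndrome-related coronavirus\t2020-01-11\tUSA\n"
)

as_tsv = read_rows(text, "metadata3.tsv")
print(as_tsv[0]["date"])  # the date column is found

as_txt = read_rows(text, "metadata3.txt")
print(list(as_txt[0].keys()))  # a single column holding the whole header line
try:
    as_txt[0]["date"]
except KeyError as err:
    print("KeyError:", err)
```

The file contents are identical in both calls; only the name passed in changes the outcome.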

I’ve created an issue on GitHub for this behavior with some suggestions for how to make the interface more user-friendly. We will discuss possible solutions and hopefully get a fix into an upcoming augur release.

Thanks. That was the reason. I really appreciate your help. Now I can run it on a small example. I noticed the following points that you may want to improve:

  1. In your Zika tutorial (https://nextstrain.org/docs/tutorials/zika), in the “augur ancestral” section, one actually has to specify what kind of output is wanted (e.g., output-node-data). The command in the tutorial, which uses just “output”, will not work.

  2. I cannot find how to specify or construct a user’s own “auspice_config.json”. Can you please help with that? Or do we have to edit the .json file directly?

  3. Neither the tutorial nor the documentation explains how to create the Snakefile. Can you help?

Thanks again.

Thanks for your feedback! We’re working on correcting number 1 right now.

For number 2, yes - you have to modify the .json file directly; there’s no way around it. However, you can use the ones given in the tutorials or other repositories as a great starting point - copy these and edit!

You also need to create the Snakefile yourself - or again, I recommend taking one from the tutorial and editing it. Snakefiles are specific to what each person wants to do in an analysis, so they may differ from project to project or group to group. Some people use different ‘workflow’ programs, like Nextflow. Augur will work with any of these - we just tend to use Snakemake.