My colleagues and I are working on ancient pathogens, and we would like to use Nextstrain to display our ancient genomes alongside their modern counterparts and to show how the pathogen has evolved over time.
So far, it has been fairly easy to set up the pipeline and obtain a tree, but we have run into a problem. Our dataset includes ancient genomes that predate year 0. For now, we have set the date of every genome older than year 0 to 00XX-XX-XX in the metadata, but this is technically incorrect. We have been discussing using the oldest sample as year 0 and applying an offset to all the other genomes. For example, suppose we have 3 samples in our dataset: one from 5000 years ago, one from the year 2000, and a current one from 2021. Our oldest sample (5000 years ago) would then become year 0 and appear in the date column as 00XX-XX-XX, the sample from 2000 would appear as 7000-XX-XX, and the one from this year (2021) as 7021-XX-XX. This workaround is not ideal, however.
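The offset workaround described above can be sketched in a few lines of Python. This is only an illustration: the function name `offset_years` and the sample names are hypothetical, and it reads "5000 years ago" as the signed year -5000, which is what the 7000 and 7021 figures imply.

```python
def offset_years(years):
    """Shift all years so that the oldest sample lands on year 0.

    `years` maps a sample name to a signed integer year (negative = before year 0).
    Returns the offset that was added and the shifted date strings.
    """
    offset = -min(years.values())
    shifted = {name: f"{year + offset:04d}-XX-XX" for name, year in years.items()}
    return offset, shifted


# Hypothetical samples; the oldest one is assumed to be year -5000.
samples = {"ancient": -5000, "modern_2000": 2000, "modern_2021": 2021}
offset, dates = offset_years(samples)
print(offset)  # 5000
print(dates)   # {'ancient': '0000-XX-XX', 'modern_2000': '7000-XX-XX', 'modern_2021': '7021-XX-XX'}
```

The offset has to be subtracted again whenever a date is read off the tree, which is probably the main drawback of this approach.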
I would like to ask whether you have any recommendations on how to deal with this, and whether it would be possible to support negative years written as -XXXX-XX-XX.
Thank you for your quick response. I tried your suggestion of changing the date notation from XXXX-XX-XX to year-only numeric dates, such as 2000 for a modern strain or -2000.5 for an ancient strain that is 4000 years old. (Did I understand the suggestion correctly?) I ran augur refine with the following command: augur refine --tree tree_raw.nwk --alignment my_alignment.fasta --metadata Metadata_newnotation.tsv --timetree --coalescent opt --root outgroup_Y.pseudo --output-tree refined_tree.nwk --output-node-data new_branch_lengths.json
I got the following error:
augur refine is using TreeTime version 0.8.1
WARNING: SAMPLE_A has an invalid data string: 521.0
WARNING: SAMPLE_B has an invalid data string: -1495.5
Traceback (most recent call last):
File "/Users/andrades/opt/miniconda3/envs/NextStrain/bin/augur", line 10, in <module>
sys.exit(main())
File "/Users/andrades/opt/miniconda3/envs/nextstrain/lib/python3.8/site-packages/augur/__main__.py", line 10, in main
return augur.run( argv[1:] )
File "/Users/andrades/opt/miniconda3/envs/nextstrain/lib/python3.8/site-packages/augur/__init__.py", line 75, in run
return args.__command__.run(args)
File "/Users/andrades/opt/miniconda3/envs/nextstrain/lib/python3.8/site-packages/augur/refine.py", line 206, in run
tt = refine(tree=T, aln=aln, ref=ref, dates=dates, confidence=args.date_confidence,
File "/Users/andrades/opt/miniconda3/envs/nextstrain/lib/python3.8/site-packages/augur/refine.py", line 36, in refine
tt = TreeTime(tree=tree, aln=aln, ref=ref, dates=dates,
File "/Users/andrades/opt/miniconda3/envs/nextstrain/lib/python3.8/site-packages/treetime/treetime.py", line 34, in __init__
super(TreeTime, self).__init__(*args, **kwargs)
File "/Users/andrades/opt/miniconda3/envs/nextstrain/lib/python3.8/site-packages/treetime/clock_tree.py", line 83, in __init__
self._assign_dates()
File "/Users/andrades/opt/miniconda3/envs/nextstrain/lib/python3.8/site-packages/treetime/clock_tree.py", line 130, in _assign_dates
raise MissingDataError("ERROR: ALMOST NO VALID DATE CONSTRAINTS")
treetime.MissingDataError: ERROR: ALMOST NO VALID DATE CONSTRAINTS
I have pasted only two of the warnings here, but augur complains about every sample in my metadata file.
I am sorry (both for suggesting this without testing it and for taking so long to respond). Within TreeTime, this would have worked, but augur's date parsing is slightly more tricky. Could you try adding the option --date-format "" to the command? This might trick augur into parsing the dates correctly. Otherwise, I don't see an easy way to use prehistoric dates in augur at the moment.
Unfortunately, that does not seem to work either. Here is the error:
augur refine is using TreeTime version 0.8.1
Traceback (most recent call last):
File "/Users/andrades/opt/miniconda3/envs/NextStrain/bin/augur", line 10, in <module>
sys.exit(main())
File "/Users/andrades/opt/miniconda3/envs/nextstrain/lib/python3.8/site-packages/augur/__main__.py", line 10, in main
return augur.run( argv[1:] )
File "/Users/andrades/opt/miniconda3/envs/nextstrain/lib/python3.8/site-packages/augur/__init__.py", line 75, in run
return args.__command__.run(args)
File "/Users/andrades/opt/miniconda3/envs/nextstrain/lib/python3.8/site-packages/augur/refine.py", line 198, in run
dates = get_numerical_dates(metadata, fmt=args.date_format,
File "/Users/andrades/opt/miniconda3/envs/nextstrain/lib/python3.8/site-packages/augur/utils.py", line 128, in get_numerical_dates
numerical_dates = {k:float(v) for k,v in meta_dict.items()}
File "/Users/andrades/opt/miniconda3/envs/nextstrain/lib/python3.8/site-packages/augur/utils.py", line 128, in <dictcomp>
numerical_dates = {k:float(v) for k,v in meta_dict.items()}
TypeError: float() argument must be a string or a number, not 'dict'
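The last frames of the traceback suggest what goes wrong: with an empty --date-format, the dict comprehension calls float() on each metadata value, and each value appears to be a whole metadata row (a dict) rather than the date field. The standalone sketch below reproduces that failure and shows one way the date field could be picked out first. It does not use augur; `numerical_dates_from_rows`, the `date` column name, and the sample rows are hypothetical and only mimic the shape implied by the error message.

```python
def numerical_dates_from_rows(meta_dict, date_col="date"):
    """Convert the date field of each metadata row to a float."""
    return {name: float(row[date_col]) for name, row in meta_dict.items()}


# Hypothetical metadata: sample name -> row, using the dates from the warnings above.
meta = {
    "SAMPLE_A": {"date": "521.0", "country": "?"},
    "SAMPLE_B": {"date": "-1495.5", "country": "?"},
}

# Same pattern as the failing line in the traceback: float() applied to a whole row.
try:
    {k: float(v) for k, v in meta.items()}
except TypeError as err:
    print("fails as in the traceback:", err)

# Extracting the date column first yields plain numeric dates, negative ones included.
print(numerical_dates_from_rows(meta))
```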
Thank you for opening the issue on GitHub. Please let me know if I can do anything to help with bug fixing or testing.
I tested this on a dataset with only negative dates (by subtracting 2500 from each date in the Zika dataset) and things worked OK, with only a minor error: the root branch wasn't in focus as it should be.
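For anyone who wants to build a similar test dataset, here is a minimal sketch of shifting dates by 2500 years. It is not necessarily how the test above was done: the helper names `to_numeric_date` and `shift_date` are hypothetical, the input is assumed to be complete YYYY-MM-DD dates (no XX placeholders), and the output is a decimal year.

```python
from datetime import date


def to_numeric_date(iso_date):
    """Convert a complete YYYY-MM-DD date to a decimal year."""
    d = date.fromisoformat(iso_date)
    start = date(d.year, 1, 1)
    end = date(d.year + 1, 1, 1)
    # Fraction of the year elapsed, accounting for leap years.
    return d.year + (d - start).days / (end - start).days


def shift_date(iso_date, years=2500):
    """Return the decimal year of `iso_date` moved `years` into the past."""
    return to_numeric_date(iso_date) - years


print(shift_date("2016-01-01"))  # -484.0
```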