Including ancient genomes with dates before 0

Dear all,

My colleagues and I are working on ancient pathogens, and we would like to use Nextstrain to display our ancient genomes alongside their modern counterparts and to show the evolution of the pathogen over time.
So far it was fairly easy to set up the pipeline to obtain a tree, but we have run into a problem. Our dataset includes ancient genomes that predate year 0. For now, we have set the date of every genome older than year 0 to 00XX-XX-XX in the metadata, but this is technically incorrect. We have been discussing using the oldest sample as year 0 and applying an offset to all the other genomes. As an example of this concept: we have 3 samples in our dataset, one from 5000 years ago, another from the year 2000 and a current one from 2021. Our oldest sample (5000 years ago) would then become year 0 and appear in the date column as 00XX-XX-XX, the sample from 2000 would appear as 7000-XX-XX and the one from this year (2021) as 7021-XX-XX. This workaround is, however, not ideal.
I would like to ask whether you have any recommendations on how to deal with this, and whether it would be possible to include negative years as -XXXX-XX-XX.

Thank you in advance,

Aida

I think it should be possible to use decimal dates such as 2014.5 or -333.4 instead of the YYYY-MM-DD notation. Using -XXXX-XX-XX probably won’t work.
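
For reference, turning a calendar date into a decimal year of this kind could be sketched as below. This is only an illustration, not augur's or TreeTime's own conversion: the helper name `to_decimal_year` is hypothetical, and the fraction of the year is approximated on a fixed leap reference year because Python's `datetime` cannot represent years before 1. Negative years are treated as plain numbers on the same axis as CE years.

```python
import datetime

# Hypothetical helper (not part of augur or TreeTime).
REFERENCE_YEAR = 2000   # a leap year, so that 29 February is a valid input
DAYS_IN_REFERENCE = 366


def to_decimal_year(year, month=1, day=1):
    """Return an approximate decimal year; `year` may be zero or negative."""
    start = datetime.date(REFERENCE_YEAR, 1, 1)
    day_of_year = (datetime.date(REFERENCE_YEAR, month, day) - start).days
    # The fraction is computed on the reference year and added to `year`,
    # so the same arithmetic applies before and after year 0.
    return year + day_of_year / DAYS_IN_REFERENCE


modern = to_decimal_year(2014, 7, 2)
ancient = to_decimal_year(-334, 7, 2)
print(modern, ancient)
```

Note that the mid-year point of year -334 comes out as -333.5, not -334.5, because the fraction is always added to the year.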

Thank you for your quick response. I tried your suggestion of changing the date notation from XXXX-XX-XX to numeric years, for example 2000 for a modern sample or -2000.5 for an ancient strain that is about 4000 years old. (Did I understand the suggestion correctly?) I then ran augur refine with the following command:
augur refine --tree tree_raw.nwk --alignment my_alignment.fasta --metadata Metadata_newnotation.tsv --timetree --coalescent opt --root outgroup_Y.pseudo --output-tree refined_tree.nwk --output-node-data new_branch_lengths.json
and got the following error:

augur refine is using TreeTime version 0.8.1
WARNING: SAMPLE_A has an invalid data string: 521.0
WARNING: SAMPLE_B has an invalid data string: -1495.5
Traceback (most recent call last):
  File "/Users/andrades/opt/miniconda3/envs/NextStrain/bin/augur", line 10, in <module>
    sys.exit(main())
  File "/Users/andrades/opt/miniconda3/envs/nextstrain/lib/python3.8/site-packages/augur/__main__.py", line 10, in main
    return augur.run( argv[1:] )
  File "/Users/andrades/opt/miniconda3/envs/nextstrain/lib/python3.8/site-packages/augur/__init__.py", line 75, in run
    return args.__command__.run(args)
  File "/Users/andrades/opt/miniconda3/envs/nextstrain/lib/python3.8/site-packages/augur/refine.py", line 206, in run
    tt = refine(tree=T, aln=aln, ref=ref, dates=dates, confidence=args.date_confidence,
  File "/Users/andrades/opt/miniconda3/envs/nextstrain/lib/python3.8/site-packages/augur/refine.py", line 36, in refine
    tt = TreeTime(tree=tree, aln=aln, ref=ref, dates=dates,
  File "/Users/andrades/opt/miniconda3/envs/nextstrain/lib/python3.8/site-packages/treetime/treetime.py", line 34, in __init__
    super(TreeTime, self).__init__(*args, **kwargs)
  File "/Users/andrades/opt/miniconda3/envs/nextstrain/lib/python3.8/site-packages/treetime/clock_tree.py", line 83, in __init__
    self._assign_dates()
  File "/Users/andrades/opt/miniconda3/envs/nextstrain/lib/python3.8/site-packages/treetime/clock_tree.py", line 130, in _assign_dates
    raise MissingDataError("ERROR: ALMOST NO VALID DATE CONSTRAINTS")
treetime.MissingDataError: ERROR: ALMOST NO VALID DATE CONSTRAINTS

I pasted only two of the warnings here, but it complains about every sample in my metadata file.

I am sorry (both for suggesting without testing and for taking so long to respond). Within TreeTime, this would have worked, but augur's date parsing is slightly more tricky. Could you try adding the option --date-format "" to the command? This might trick augur into parsing the dates correctly. Otherwise, I don’t see an easy way to use prehistoric dates in augur at the moment.

(Either way, we need to fix this. I opened an issue here: Numerical dates · Issue #741 · nextstrain/augur · GitHub)

Hi @rneher, there is also a problem with Auspice. When I changed the dates to make a tree showing the “transition rates” on each branch, the display was buggy.

(The “transition rates” are related to this paper: Mutation signatures inform the natural host of SARS-CoV-2.)

When adding 2000 to all the dates, it worked fine, as shown here: auspice

You can get the buggy JSON here: DL.FREE.FR
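
The offset workaround described here can be sketched roughly as follows. This is only an illustration under assumptions: the helper name `shift_dates`, the `date` column name and the in-memory row format are hypothetical, and the dates are assumed to already be numeric years rather than YYYY-MM-DD strings.

```python
def shift_dates(rows, offset, date_col="date"):
    """Return copies of the metadata rows with `offset` years added to each date."""
    shifted = []
    for row in rows:
        new_row = dict(row)  # copy, so the original metadata stays untouched
        new_row[date_col] = str(float(row[date_col]) + offset)
        shifted.append(new_row)
    return shifted


# Toy metadata using the two sample dates quoted earlier in the thread.
metadata = [
    {"strain": "SAMPLE_A", "date": "521.0"},
    {"strain": "SAMPLE_B", "date": "-1495.5"},
]
shifted = shift_dates(metadata, offset=2000)
print(shifted)
```

Every date on the resulting time axis is then off by the chosen offset, which has to be kept in mind when reading the tree.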
Screenshot of what it does on auspice.us:

Thanks for flagging this. There seem to be some issues with date parsing in Auspice. Tagging @james

Hi @rneher, I also apologise for the late reply. I have now tried to run it as follows:

augur refine --date-format "" --tree results/DatesTrial/tree_raw.nwk --alignment Data/Alignments/Dataset_genotyped_98_pd.fasta --metadata Data/Metadata/NextPlague_Metadata_fields_correctedates_newnotation.tsv --timetree --coalescent opt --root outgroup_Y.pseudo --output-tree results/DatesTrial/refined_tree.nwk --output-node-data results/DatesTrial/new_branch_lengths.json 

However, it does not seem to work. Here is the error:

augur refine is using TreeTime version 0.8.1
Traceback (most recent call last):
  File "/Users/andrades/opt/miniconda3/envs/NextStrain/bin/augur", line 10, in <module>
    sys.exit(main())
  File "/Users/andrades/opt/miniconda3/envs/nextstrain/lib/python3.8/site-packages/augur/__main__.py", line 10, in main
    return augur.run( argv[1:] )
  File "/Users/andrades/opt/miniconda3/envs/nextstrain/lib/python3.8/site-packages/augur/__init__.py", line 75, in run
    return args.__command__.run(args)
  File "/Users/andrades/opt/miniconda3/envs/nextstrain/lib/python3.8/site-packages/augur/refine.py", line 198, in run
    dates = get_numerical_dates(metadata, fmt=args.date_format,
  File "/Users/andrades/opt/miniconda3/envs/nextstrain/lib/python3.8/site-packages/augur/utils.py", line 128, in get_numerical_dates
    numerical_dates = {k:float(v) for k,v in meta_dict.items()}
  File "/Users/andrades/opt/miniconda3/envs/nextstrain/lib/python3.8/site-packages/augur/utils.py", line 128, in <dictcomp>
    numerical_dates = {k:float(v) for k,v in meta_dict.items()}
TypeError: float() argument must be a string or a number, not 'dict'

Thank you for opening the issue on GitHub. Please let me know if I can do anything to help with bug fixing or testing.
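
The `TypeError` above suggests that each value in `meta_dict` is a whole metadata record (a dict of columns) rather than the date itself, so `float(v)` is applied to a dict. A minimal sketch of the kind of lookup that avoids this is shown below. It is not augur's actual code or fix: the function name, the `date` column name and the choice to return `None` for unparseable dates are all assumptions made for illustration.

```python
def numerical_dates_from_metadata(meta_dict, date_col="date"):
    """Map each strain to a float date, or None if the date cannot be parsed."""
    dates = {}
    for strain, record in meta_dict.items():
        try:
            # Take the date field out of the record before converting it.
            dates[strain] = float(record[date_col])
        except (KeyError, TypeError, ValueError):
            dates[strain] = None
    return dates


# Toy metadata: strain name -> record (dict of columns).
meta = {
    "SAMPLE_A": {"date": "521.0", "country": "?"},
    "SAMPLE_B": {"date": "-1495.5", "country": "?"},
    "SAMPLE_C": {"date": "00XX-XX-XX", "country": "?"},
}
parsed = numerical_dates_from_metadata(meta)
print(parsed)
```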

From an Auspice perspective, negative dates are fine, for instance:

              "num_date": {
                "value": -2616.363587650789
              },

For an example that uses both BCE and CE dates, see the following community dataset: community/ktmeaton/plague-phylogeography/timetree
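
To check which dates actually end up in a dataset JSON, the tree can be walked and every `num_date` value collected. The sketch below assumes the layout shown in the snippet above, with each node carrying `node_attrs.num_date.value` and an optional `children` list; the function name `collect_num_dates` and the toy tree are hypothetical.

```python
def collect_num_dates(node):
    """Recursively gather node_attrs.num_date.value from a node and its descendants."""
    dates = []
    value = node.get("node_attrs", {}).get("num_date", {}).get("value")
    if value is not None:
        dates.append(value)
    for child in node.get("children", []):
        dates.extend(collect_num_dates(child))
    return dates


# Toy tree standing in for the "tree" entry of a dataset JSON.
toy_tree = {
    "name": "root",
    "node_attrs": {"num_date": {"value": -2616.363587650789}},
    "children": [
        {"name": "ancient", "node_attrs": {"num_date": {"value": -1495.5}}},
        {"name": "modern", "node_attrs": {"num_date": {"value": 2021.0}}},
    ],
}
all_dates = collect_num_dates(toy_tree)
print(min(all_dates), max(all_dates))
```

For a real file, the same function could be applied to the tree loaded with `json.load`, which makes it easy to confirm that negative values survived the pipeline.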

I tested this on a dataset with only negative dates (by subtracting 2500 from each date in the Zika dataset) and things worked ok, with only a minor error that the root branch wasn’t in focus as it should be: