Hi, I made my first Nextstrain build:
Sarbecovirus partial RdRp nextstrain tree
The root is an internal node taken from a previous tree, midway between SARS and SARS2. This forces a topology in which the different lineages emerge from the distant past.
This paper seems to say that the root date should be much earlier than 1832 (the date obtained with the rate of 0.0004 mut/site/year set in the Snakemake file). The data lack temporal signal. It makes me think that an approximately log-scale option would be useful when plotting the time tree in Auspice.
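To show how strongly the root date depends on the fixed rate, here is a rough back-of-the-envelope sketch in Python. It assumes a strict clock and a most recent tip sampled in 2020; neither is stated above, and the function names are only illustrative. Under those assumptions, the 1832 root corresponds to a fixed root-to-tip divergence, and a slower rate pushes the root back proportionally.

```python
# Rough strict-clock arithmetic; this is not what the tree-dating software does internally.
# Assumption: the most recent tip was sampled in 2020 (not stated in the post).

def implied_divergence(tip_year, root_year, clock_rate):
    """Root-to-tip divergence (subs/site) implied by a root date and a clock rate."""
    return (tip_year - root_year) * clock_rate

def implied_root_year(tip_year, divergence, clock_rate):
    """Root year implied by a root-to-tip divergence under a strict clock."""
    return tip_year - divergence / clock_rate

TIP_YEAR = 2020       # assumed
ROOT_YEAR = 1832      # root date reported in the post
CLOCK_RATE = 0.0004   # mut/site/year, from the post

divergence = implied_divergence(TIP_YEAR, ROOT_YEAR, CLOCK_RATE)
print(f"implied root-to-tip divergence: {divergence:.4f} subs/site")

# Same divergence with a ten-fold slower (hypothetical) rate
slower_root = implied_root_year(TIP_YEAR, divergence, CLOCK_RATE / 10)
print(f"root year with a 10x slower rate: {slower_root:.0f}")
```

So with the divergence held fixed, the inferred root date is only as good as the rate that was put in, which matters when the data themselves carry no temporal signal.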
Note that the lineage that I called HKU3 doesn’t have any accepted name. It contains most East China sequences: the Hong Kong 2005 sequences (HKU3), the Hubei 2004 bat sequences Rm1 and Rf1, as well as Zhejiang bat-SL-CoVZC45 (which is a recombinant in ORF1b).
The tree is supposed to contain almost all available partial RdRp sequences (many strains have only their partial RdRp sequenced). The main methodology was to BLAST some partial RdRp sequences and collect the Sarbecovirus accessions from the results. Here, a Sarbecovirus is anything with more than 90% alignment in this region to one of the main lineages.
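For anyone who wants to reproduce the selection step, it could look roughly like the sketch below. This is not my actual script. It assumes the BLAST results were saved in the default 12-column tabular format (`-outfmt 6`) and that "more than 90%" is read as percent identity (`pident`), which is only one possible interpretation. The accessions in the example are made up.

```python
import csv
import io

# Default column order of BLAST tabular output (-outfmt 6)
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]

def select_accessions(blast_tsv_text, min_identity=90.0):
    """Return unique subject IDs whose hit identity exceeds min_identity, in first-seen order."""
    reader = csv.DictReader(
        io.StringIO(blast_tsv_text), fieldnames=BLAST_COLUMNS, delimiter="\t"
    )
    kept = []
    for row in reader:
        # Assumption: the 90% cutoff is applied to percent identity
        if float(row["pident"]) > min_identity and row["sseqid"] not in kept:
            kept.append(row["sseqid"])
    return kept

# Hypothetical hits: query ID, subject accession, percent identity, then the other columns
example_hits = "\n".join(
    "\t".join(fields)
    for fields in [
        ["q1", "ACC0001", "97.5", "400", "10", "0", "1", "400", "1", "400", "1e-150", "600"],
        ["q1", "ACC0002", "84.0", "400", "64", "0", "1", "400", "1", "400", "1e-90", "350"],
        ["q2", "ACC0001", "93.0", "400", "28", "0", "1", "400", "1", "400", "1e-130", "520"],
        ["q2", "ACC0003", "91.2", "400", "35", "0", "1", "400", "1", "400", "1e-120", "480"],
    ]
)

accessions = select_accessions(example_hits)
print(accessions)
```

Running one query per main lineage and merging the results this way keeps any accession that passes the cutoff against at least one of them.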