Reference tree for RSV used on the Nextclade website

Hi,

If I understand correctly, the publication ‘The unified proposal for classification of human respiratory syncytial virus below the subgroup level’ mentions a set of 1480 RSV-A sequences and 1385 RSV-B sequences that were used to build the trees for genotype assignment. Is this the set of sequences used for genotype assignment on the Nextclade website and shown in the corresponding phylogenetic tree view? Also, are these the same sequences as those provided here: Classification_proposal/Full-genomes-trees at main · rsv-lineages/Classification_proposal · GitHub?

Best,
Thomas

Hi @Thomas,

I will let our scientists comment with certainty, but in the meantime you could explore (and run) the source code used to produce the trees for the official Nextclade RSV datasets:

You could also experiment and create your own datasets. The process is documented in the nextclade_data repo, which is also the place to submit a new dataset:

Hi Thomas, the Nextclade reference tree uses a subset of all high-quality genomes available on NCBI and, in addition, the genomes from the reference alignments you mention if they are not on NCBI. These are available here (with permission from the authors):

The lineages on the tree are defined via characteristic sets of mutations that were determined using the reference alignments. These mutation sets are documented here

and analogously for RSV-B
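
To illustrate what "defined via characteristic sets of mutations" can mean in practice, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the code used to build the Nextclade datasets or a description of how Nextclade itself assigns lineages. The lineage names (`X`, `X.1`, ...) and the mutations are hypothetical placeholders, not taken from the actual RSV definitions, and `assign_lineage` is a helper made up for this example. The assumed rule is: a sequence matches a lineage if it carries all of that lineage's defining mutations, and the most specific match wins.

```python
# Hypothetical lineage definitions: names and mutations are invented
# for illustration and do NOT correspond to real RSV lineages.
LINEAGE_DEFINITIONS = {
    "X": frozenset(),                               # root: no defining mutations
    "X.1": frozenset({"A100G", "C250T"}),
    "X.1.1": frozenset({"A100G", "C250T", "G400A"}),
    "X.2": frozenset({"T75C"}),
}


def assign_lineage(observed_mutations, definitions=LINEAGE_DEFINITIONS):
    """Return the most specific lineage whose defining mutations are all present.

    "Most specific" is assumed here to mean the matching lineage with the
    largest defining set; ties are broken by lineage name so the result is
    deterministic. Returns None if no lineage matches.
    """
    observed = set(observed_mutations)
    # A lineage matches when its defining set is a subset of the observed set.
    matches = [name for name, required in definitions.items() if required <= observed]
    if not matches:
        return None
    return max(matches, key=lambda name: (len(definitions[name]), name))


if __name__ == "__main__":
    # Mutations of a query sequence relative to the reference (hypothetical).
    query = {"A100G", "C250T", "G400A", "T9C"}
    print(assign_lineage(query))
```

Extra mutations in the query (like `T9C` above) do not prevent a match under this rule; only the defining mutations of each lineage are checked.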

best,
richard

Thank you, both, that answers my question!