Dear Nextstrain team, how can I add the RBD immune escape score and the ACE2 affinity score to the phylogenetic tree?
When we visualize our results in Auspice after running Nextstrain locally on our own data, neither the RBD immune escape score nor the ACE2 affinity score is available in the "Color By" menu.
This data comes from Nextclade, and it's currently only available for the "SARS-CoV-2 relative to BA.2" dataset (technical name: sars-cov-2-21L).
The data you see on nextstrain.org is computed in the ncov-ingest pipeline, which runs Nextclade twice: once with the general-purpose dataset and once with the 21L dataset:
https://github.com/nextstrain/ncov-ingest/blob/3b711bf40a62335bb92a4b0292e4037dada823dd/workflow/snakemake_rules/nextclade.smk#L5-L8
This data is then read from the nextclade.tsv output file and added to the metadata.tsv file:
https://github.com/nextstrain/ncov-ingest/blob/3b711bf40a62335bb92a4b0292e4037dada823dd/bin/join-metadata-and-clades#L44
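If you only need that one piece of ncov-ingest, the join can be approximated with a small standalone script. The sketch below is not the ncov-ingest code. It assumes that Nextclade's TSV identifies sequences in a `seqName` column, that the 21L scores appear in columns named `immune_escape` and `ace2_binding`, and that your metadata.tsv has a `strain` column whose values match your FASTA headers. The function names are hypothetical.

```python
import csv
from pathlib import Path
from typing import Dict

# Assumed Nextclade (21L dataset) column names for the two scores.
SCORE_COLUMNS = ["immune_escape", "ace2_binding"]


def read_scores(nextclade_tsv: Path) -> Dict[str, Dict[str, str]]:
    """Map each Nextclade seqName to its score values (empty string if missing)."""
    with nextclade_tsv.open(newline="") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        return {
            row["seqName"]: {col: row.get(col) or "" for col in SCORE_COLUMNS}
            for row in reader
        }


def join_scores(metadata_tsv: Path, nextclade_tsv: Path, output_tsv: Path) -> None:
    """Write a copy of metadata_tsv with the score columns added, matched on "strain"."""
    scores = read_scores(nextclade_tsv)
    empty = {col: "" for col in SCORE_COLUMNS}
    with metadata_tsv.open(newline="") as src, output_tsv.open("w", newline="") as dst:
        reader = csv.DictReader(src, delimiter="\t")
        original = list(reader.fieldnames or [])
        fieldnames = original + [c for c in SCORE_COLUMNS if c not in original]
        writer = csv.DictWriter(dst, fieldnames=fieldnames, delimiter="\t", lineterminator="\n")
        writer.writeheader()
        for row in reader:
            # Strains with no Nextclade result get empty score cells.
            row.update(scores.get(row["strain"], empty))
            writer.writerow(row)
```

For example, `join_scores(Path("metadata.tsv"), Path("nextclade.tsv"), Path("metadata_with_scores.tsv"))` writes a new file and leaves the original metadata untouched. Empty cells are used for unmatched strains so that every row keeps the same set of columns.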
The metadata.tsv file is then passed to the ncov pipeline, where Augur produces the tree with the colorings you see, as configured here:
https://github.com/nextstrain/ncov/blob/23d1243127e8838a61b7e5c1a72bc419bf8c5a0d/workflow/snakemake_rules/export_for_nextstrain.smk#L205-L219
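On the Augur/Auspice side, colorings are declared in the Auspice config JSON as entries with a `key`, a `title` and a `type`. The sketch below appends two continuous colorings to an existing config. It assumes that the `key` values must match the metadata column names you joined in (`immune_escape` and `ace2_binding` here) and that the merged metadata is what you pass to `augur export v2`. The titles and function names are illustrative and are not taken from the ncov configuration linked above.

```python
import json
from typing import Any, Dict

# Illustrative entries; "key" is assumed to match the metadata column names.
SCORE_COLORINGS = [
    {"key": "immune_escape", "title": "Immune escape vs BA.2", "type": "continuous"},
    {"key": "ace2_binding", "title": "ACE2 binding vs BA.2", "type": "continuous"},
]


def add_score_colorings(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of an Auspice config with the score colorings appended once."""
    updated = dict(config)
    colorings = list(updated.get("colorings", []))
    existing = {c.get("key") for c in colorings}
    colorings += [c for c in SCORE_COLORINGS if c["key"] not in existing]
    updated["colorings"] = colorings
    return updated


def update_config_file(path: str) -> None:
    """Rewrite an Auspice config JSON file in place with the score colorings added."""
    with open(path) as fh:
        config = json.load(fh)
    with open(path, "w") as fh:
        json.dump(add_score_colorings(config), fh, indent=2)
```

The function skips keys that are already present, so running it twice on the same config does not duplicate entries.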
So to get the same result, you'd need to mimic what happens in ncov-ingest and ncov. You probably don't need all the bells and whistles and the complexity, so you can take just the pieces you need.
Alternatively, for a simpler setup with fewer features, you could run Nextclade alone with the 21L dataset, either by using the sars-cov-2-21L dataset in Nextclade CLI:
nextclade run -d sars-cov-2-21L --output-tsv=nextclade.tsv my_sequences.fasta
and you'd get a nextclade.tsv file containing the scores.
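To look at those numbers without building a tree, a short script can read nextclade.tsv and rank the sequences. This is a sketch, not part of Nextclade. It assumes the same column names as the P.S. below (`immune_escape`, `ace2_binding`) plus a `seqName` column, and it treats empty cells as missing values.

```python
import csv
from pathlib import Path
from typing import List, Optional, Tuple


def parse_score(value: Optional[str]) -> Optional[float]:
    """Convert a TSV cell to float; empty or missing cells become None."""
    if value is None or value.strip() == "":
        return None
    return float(value)


def rank_by_immune_escape(nextclade_tsv: Path) -> List[Tuple[str, float, Optional[float]]]:
    """Return (seqName, immune_escape, ace2_binding) tuples, highest immune escape first.

    Sequences without an immune_escape value are skipped.
    """
    rows = []
    with nextclade_tsv.open(newline="") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            escape = parse_score(row.get("immune_escape"))
            if escape is None:
                continue
            rows.append((row["seqName"], escape, parse_score(row.get("ace2_binding"))))
    return sorted(rows, key=lambda r: r[1], reverse=True)
```

Because the rows are looked up by column name rather than position, the sketch does not depend on the column order in the TSV.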
Or by selecting the dataset “SARS-CoV-2 relative to BA.2” in Nextclade Web:
https://clades.nextstrain.org/?dataset-name=sars-cov-2-21L
(Click "Load example" or drop in your own FASTA file, then click "Run". You should see the columns "Immune escape" and "ACE2 binding" in the results table, and also the corresponding colorings on the tree page* if you click the "Tree" button.)
Nextclade is not nearly as thorough as the full phylogenetic analysis with Augur/Auspice, but it is much easier and faster to set up, especially if you only need the immune escape and ACE2 binding numbers.
Nextclade docs, in case you want to set it up:
https://docs.nextstrain.org/projects/nextclade/en/stable/
* P.S. I just realized that Nextclade Web does not show the phenotype values (such as ace2_binding and immune_escape) on the tree page for newly placed nodes, so I submitted a fix for that, which is pending review.