Resource for representative nucleotide changes for Nextstrain clades

Hello all,

Firstly, thanks for the contribution to the field with Nextstrain and all the associated resources!

We would like to use Nextstrain resources in our project, and I have been digging into your resources for a while.

However, after many hours (please excuse me if my wording is not conceptually correct, as I'm a newbie in the virus world), I still do not understand how to create a resource that lists all the mutations each Nextstrain clade carries for SARS-CoV-2. To be clear, I am not looking for the defining mutations that Nextstrain uses with augur clades to assign clades on the tree: ncov/defaults/clades.tsv at master · nextstrain/ncov · GitHub.
What I am looking for is a set of all the mutations that each clade contains.
In the Nextclade CLI, I came across tree.json among the SARS-CoV-2 dataset files. It seems to contain all clades with nucleotide changes, but when I compared individual clades with the mutations shown on covariants.org, I realized it is not complete, and it also differs from what clades.tsv offers.
I have searched a lot and have not found a resource from which I could extract all mutations for each clade.
I would be very glad if anybody could give me ideas or insights on this problem.

Thank you in advance!

To obtain the mutations of a particular clade from tree.json, you need to accumulate all mutations on the path from the root to the sequence/node of interest. The result includes all substitutions and deletions, but not insertions.
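
The accumulation could be sketched roughly as below. This is only a minimal sketch, not an official Nextstrain tool. It assumes the tree follows the Auspice v2 JSON layout, where each node has a "name", optional "children", and nucleotide changes listed under branch_attrs -> mutations -> nuc as strings like "C241T" (with "-" marking a deleted base). The function names and the toy tree are made up for illustration. A later change at the same position overwrites the earlier one, and a change back to the original base is dropped.

```python
import re

# Matches strings such as "C241T" or "G28881-" (assumed mutation format).
MUTATION = re.compile(r"^([A-Z-])(\d+)([A-Z-])$")


def apply_mutations(state, mutations):
    """Return a copy of state (position -> (original, current)) with mutations applied."""
    state = dict(state)
    for mut in mutations:
        m = MUTATION.match(mut)
        if m is None:
            continue  # skip anything that is not in the assumed format
        ref, pos, alt = m.group(1), int(m.group(2)), m.group(3)
        original = state[pos][0] if pos in state else ref
        if alt == original:
            state.pop(pos, None)  # reversion: position is back to its original base
        else:
            state[pos] = (original, alt)
    return state


def mutations_per_node(root):
    """Map each node name to its accumulated nucleotide changes from the root down."""
    result = {}
    stack = [(root, {})]  # explicit stack, so deep trees do not hit the recursion limit
    while stack:
        node, parent_state = stack.pop()
        nuc = node.get("branch_attrs", {}).get("mutations", {}).get("nuc", [])
        state = apply_mutations(parent_state, nuc)
        result[node["name"]] = [
            f"{ref}{pos}{alt}" for pos, (ref, alt) in sorted(state.items())
        ]
        for child in node.get("children", []):
            stack.append((child, state))
    return result


# Hypothetical toy tree, only to show the expected shape of the input.
toy_tree = {
    "name": "root",
    "branch_attrs": {"mutations": {}},
    "children": [
        {
            "name": "A",
            "branch_attrs": {"mutations": {"nuc": ["C241T", "A23403G"]}},
            "children": [
                {
                    "name": "A1",
                    "branch_attrs": {"mutations": {"nuc": ["T241C", "G28881-"]}},
                }
            ],
        }
    ],
}

result = mutations_per_node(toy_tree)
print(result["A1"])
```

For a real file you would load it with json.load and pass the object under its top-level "tree" key (again assuming the Auspice v2 layout). To go from nodes to clades, you would then pick the nodes whose clade label matches and, for example, keep the changes shared by all of them.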

Another resource would be the pango-lineage consensus sequences maintained by @corneliusroemer.

If you compare these to the reference, you can read off all mutations.
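
Reading off the differences could look something like this. It is a rough sketch that assumes the consensus sequence is already aligned to the reference, so both strings have the same length, with "-" for deleted bases and "N" for unknown ones. The thread does not say which format the consensus sequences come in, so the function name and the short example strings are purely illustrative. Insertions relative to the reference cannot be represented this way.

```python
def diff_against_reference(reference, consensus):
    """List changes like 'C2T' between two aligned sequences of equal length."""
    if len(reference) != len(consensus):
        raise ValueError("sequences must be aligned to the same length")
    changes = []
    pairs = zip(reference.upper(), consensus.upper())
    for pos, (ref, alt) in enumerate(pairs, start=1):  # 1-based positions
        if alt != ref and alt != "N":  # treat N as unknown, not as a change
            changes.append(f"{ref}{pos}{alt}")
    return changes


# Made-up six-base example, not real SARS-CoV-2 data.
example = diff_against_reference("ACGTAC", "ATG-AN")
print(example)
```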