Trouble identifying mutations in clade definitions (20J/501Y.V3)

Hi NextStrain community,

My group is trying to make sense of the distinction between the Pango lineages “P.1” and “P.2”. This got me looking at how NextStrain identifies these lineages, but the substitutions listed for P.1 (20J/501Y.V3) don’t seem to include the 501Y mutation.

As I understand it, P.1 is basically synonymous with 20J/501Y.V3. I read the naming strategy and followed it to the clade definition table. My understanding is that a change at nucleotide position 23063 is behind the N501Y mutation (using a handy cheat sheet, 0-indexed). I see that position listed for both 501Y.V1 and 501Y.V2, but not for 501Y.V3. Is this mutation not actually part of the 20J/501Y.V3 definition?
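For anyone who wants to check the position arithmetic, here is a rough sketch of how a spike codon number maps to genome coordinates. It assumes the spike (S) coding sequence starts at 1-based position 21563 in the Wuhan-Hu-1 reference (MN908947.3); the function name is just something made up for illustration. Under that assumption, 23063 comes out as the 1-based position of the first base of codon 501, so it is worth double-checking which indexing convention a given cheat sheet or table uses.

```python
# Assumption: the S coding sequence starts at 1-based position 21563
# in the Wuhan-Hu-1 reference genome (MN908947.3).
S_START_1BASED = 21563


def codon_to_genome_positions(codon_number, gene_start=S_START_1BASED):
    """Return the 1-based genome positions of the three bases of a codon.

    `codon_number` is the 1-based amino acid position within the gene.
    """
    first = gene_start + 3 * (codon_number - 1)
    return [first, first + 1, first + 2]


positions_1based = codon_to_genome_positions(501)
positions_0based = [p - 1 for p in positions_1based]

print(positions_1based)  # [23063, 23064, 23065]
print(positions_0based)  # [23062, 23063, 23064]
```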

I’ve attached a screenshot of the clade definition table, highlighting the search results for 23063 (which finds 501Y.V1 and 501Y.V2).

Any input will be appreciated.
Thanks
Adam

While 501Y is a mutation in P.1, it does not really help define the clade, because several other clades carry the same mutation. The clade definitions Nextstrain uses are not meant to be exhaustive lists of mutations; they only need to be specific enough to uniquely identify the clade.
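To illustrate the idea, here is a toy sketch in Python. The clade names and most positions are invented placeholders, not the actual entries in the clade definition table, and the flat lookup is a simplification of how clades are really assigned. It only shows that a clade can carry a mutation (here T at 23063) without listing it, as long as the listed mutations are enough to tell it apart.

```python
# Hypothetical, simplified definitions: {clade: {position: required base}}.
# These are NOT the real Nextstrain entries; only 23063 is taken from the thread.
CLADE_DEFS = {
    "clade_A": {100: "T", 23063: "T"},
    "clade_B": {200: "G", 23063: "T"},
    # clade_C sequences also carry T at 23063, but the definition omits it
    # because positions 300 and 400 already identify the clade uniquely.
    "clade_C": {300: "A", 400: "C"},
}


def assign_clades(sample_mutations, clade_defs=CLADE_DEFS):
    """Return the clades whose defining mutations are all present in the sample."""
    return [
        clade
        for clade, required in clade_defs.items()
        if all(sample_mutations.get(pos) == base for pos, base in required.items())
    ]


# A sample with the 23063 T mutation plus the two clade_C markers.
sample = {300: "A", 400: "C", 23063: "T", 999: "G"}
print(assign_clades(sample))         # ['clade_C']

# The shared mutation on its own does not identify any clade.
print(assign_clades({23063: "T"}))   # []
```

Adding 23063 to the clade_C entry would not change the result for this sample, which is why leaving it out of the definition loses nothing.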

Thanks for the explanation.