Twice-mutated locations between parent and child lineages

I have been working with data generated by corneliusroemer/pango-sequences on GitHub (Consensus sequences for each Pango lineage), so CC @corneliusroemer.

Since the documentation states the following, I thought I would give this forum a try.

The sequences contained here are the ones used in Nextclade reference trees and produced by code contained in the nextclade_data_workflows/sars-cov-2 repository.

The Caveats section of the documentation states the following:

  • Currently, when one position is mutated twice, e.g. 486F->S->L this gives rise to both a reversion and a new substitution. This might be changed in the future.
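The caveat comes down to how root-relative mutation sets are compared. Below is a minimal Python sketch of that comparison for the documentation's 486F->S->L example. The sets and variable names are illustrative; this is not code from pango-sequences or Nextclade.

```python
# Root-relative mutations at the twice-mutated site (illustrative sets).
parent = {"S:F486S"}   # root F -> S on the path to the parent
child = {"S:F486L"}    # S -> L on the child's own branch, written relative to the root

# Comparing the two root-relative sets yields both entries:
lost = parent - child    # in the parent but not the child -> reported as a "reversion"
gained = child - parent  # in the child but not the parent -> reported as a new substitution

print("reversion:", lost)        # reversion: {'S:F486S'}
print("substitution:", gained)   # substitution: {'S:F486L'}
```

The single change that actually occurred on the child's branch is S486L. It never appears directly, because both entries are expressed relative to the root.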

Well, I have been able to locate all of these twice-mutated lineages, and I am wondering how to handle these special cases.

I am a novice when it comes to the Nextstrain UI. A large portion of these twice-mutated lineages don't seem to be searchable, so I can't verify them. For the few lineages I could find in Nextstrain (e.g. B.1.617.2), the reversion seems to be ignored and the substitution kept. XBB.1.5 is another lineage with twice-mutated positions. I have been trying to figure out how to view the mutations for XBB.1.5 itself, rather than for one of its 261 individual sequences.

Based on the documentation, dropping the duplicated reversions seems a reasonable approach for all of these twice-mutated lineages:

… The algorithm used to create these sequences has a high threshold to allow reversions, almost all sequences need to be reverted, otherwise it’s assumed the reversions are an artefact.
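Here is a minimal Python sketch of the "drop the duplicated reversion" approach. It assumes that nucleotide substitutions look like `A1G` and that deletion ranges such as `29734-29758` should be left alone. The helper names are hypothetical, and the example sets are copied from the XBB.1.5 output further down.

```python
import re

# Plain nucleotide substitutions such as "A1G". Amino-acid changes ("S:F486P")
# and deletion ranges ("29734-29758") intentionally do not match.
NUC_SUB = re.compile(r"^([ACGT])(\d+)([ACGT])$")

def nuc_position(mutation):
    """Return the position of a nucleotide substitution, or None for anything else."""
    match = NUC_SUB.match(mutation)
    return int(match.group(2)) if match else None

def drop_duplicated_reversions(new_substitutions, reversions):
    """Split reversions into (kept, dropped). A reversion is dropped when the child
    lineage also has a new substitution at the same position."""
    new_positions = {nuc_position(m) for m in new_substitutions} - {None}
    kept = {r for r in reversions if nuc_position(r) not in new_positions}
    return kept, set(reversions) - kept

kept, dropped = drop_duplicated_reversions(
    {"T23018C", "A1T", "G27915T"},  # NUC NEW SUBSTITUTION for XBB.1.5
    {"A1G", "29734-29758"},         # NUC ALL REVERSIONS for XBB.1.5
)
print(kept)     # {'29734-29758'}
print(dropped)  # {'A1G'}
```

Only reversions that share a position with a new substitution are removed. A reversion with no competing substitution is kept as a genuine reversion.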

So I am wondering whether there are genuine twice-mutated locations, and what such a location actually tells us about what happened. I get the general idea from the documentation's example (486F->S->L), but I still have a lot of questions about how that can happen.

Below is some sample output from my processing of the corneliusroemer/pango-sequences data.

Output

XBB.1.5

Here is the data for XBB.1.5 and its parent lineage, XBB.1. Several XBB.1.5 sublineages also have these twice-mutated positions.

At nucleotide position 1, there are two nucleotide mutations:

  • one reversion, A1G
  • one substitution, A1T
Looking at variant:  XBB.1.5

	PARENT (XBB.1)
	(1) PARENT ALL MUTS     :  {'S:S373P', 'S:G252V', 'C25584T', 'T27384C', '29734-29758', 'A18163G', 'ORF1a:K47R', 'T21810C', 'S:Y505H', 'S:N969K', 'C23525T', 'N:E31-', 'ORF1a:P3395H', 'S:Q183E', '21633-21641', 'S:P26-', 'C241T', 'A24424T', '28362-28370', 'S:R346T', 'A22688G', 'S:D614G', 'C23854A', 'ORF1a:F3677-', 'S:T478K', 'A23013C', 'S:F486S', 'C27807T', 'C17410T', 'T16342C', 'G22813T', 'ORF9b:P10S', 'E:T9I', 'C21618T', 'C26858T', 'T22679C', 'ORF9b:A29-', 'C15738T', 'C26270T', 'ORF3a:T223I', 'S:S477N', 'S:S375F', 'A29510C', 'C23604A', 'G28883C', 'G22992A', 'A22786C', 'A23403G', 'ORF1a:G1307S', 'C14408T', 'M:Q19E', 'S:T19I', 'S:N440K', 'S:D405N', 'G4184A', 'C12880T', 'S:N679K', 'C3037T', 'C9866T', 'S:Q954H', 'C25416T', 'S:H146Q', 'S:G446S', 'G22317T', 'G28881A', 'ORF9b:E27-', 'N:R203K', 'G22578A', 'ORF1b:R1315C', 'C19955T', 'T15939C', 'S:N764K', 'S:L24-', 'A27259C', 'C9344T', 'S:L368I', 'C22674T', 'ORF1a:L3201F', 'S:R408S', 'C9534T', 'S:K417N', 'S:A27S', 'S:Q498R', 'ORF1a:T842I', 'ORF9b:N28-', 'T22200A', 'ORF1a:T3090I', 'G21987A', 'S:Y144-', 'C22686T', 'T17859C', 'S:N501Y', 'G10447A', 'S:G142D', 'S:N460K', 'C15714T', 'ORF6:D61L', 'N:R32-', 'S:D796Y', 'G22895C', 'N:P13L', 'G22898A', 'E:T11A', 'ORF1a:L3027F', 'C22995A', 'C10449A', 'ORF1b:S959P', 'T23031C', 'S:T376A', 'S:P25-', 'G26709A', 'A20055G', 'C2790T', 'C26577G', 'C22664A', 'G23948T', 'C10198T', 'G28882A', 'C22109G', 'A405G', 'A1G', 'T22942G', 'A28271T', 'S:E484A', 'M:A63T', 'A23055G', 'G15451A', 'G27382C', 'N:S33-', 'S:S371F', 'ORF1a:G3676-', 'G22577C', 'ORF1a:S3675-', 'ORF1b:P314L', 'C28311T', 'C25000T', 'C44T', 'C4321T', 'N:G204R', '11288-11296', 'T23019C', 'C22000A', 'T23599G', 'A19326G', 'S:V445P', 'S:V213E', 'G22775A', 'C26060T', 'T22882G', 'ORF1b:T2163I', 'T24469A', 'ORF1a:S135R', 'S:G339H', 'A9424G', 'A27383T', 'ORF1a:T3255I', 'S:H655Y', 'G22599C', 'T23075C', 'A26275G', 'S:V83A', 'T22896C', 'ORF1b:G662S', 'S:F490S', 'A23063T', 'ORF1b:I1566V', 'N:S413R', 'T670G', 'C10029T', 'S:P681H', '21992-21994'}

	VARIANT (XBB.1.5)
	(2) ALL NEW MUTS        :  {'T23018C', 'S:F486P', '29734-29759', 'ORF8:G8*', 'A1T', 'G27915T'}
	(3) NUC REVERSIONS ONLY :  {'A1G', '29734-29758'}

	INTERSECTION (1) ∩ (3)  :  {'A1G', '29734-29758'}

	NUC NEW SUBSTITUTION    :  {'T23018C', 'A1T', 'G27915T'}
	NUC ALL REVERSIONS      :  {'A1G', '29734-29758'}
	DUP LOCATIONS (2) & (3) :  ['nucleotide substitution: Root[A 1 T] Parent[? 1 T]', 'nucleotide substitution: Root[A 1 G] Parent[? 1 G]']
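In the output above, NUC NEW SUBSTITUTION appears to be the nucleotide-only subset of ALL NEW MUTS. The following Python sketch reproduces that filtering. The format rules are assumptions inferred from the strings shown: `pos-pos` is a deletion range, `GENE:...` is an amino-acid change, and `A1T` is a nucleotide substitution. `mutation_kind` is a hypothetical helper.

```python
import re

def mutation_kind(mutation):
    """Classify a mutation string using the formats seen in the output (assumed)."""
    if re.fullmatch(r"\d+-\d+", mutation):
        return "nuc_deletion_range"  # e.g. "29734-29759"
    if ":" in mutation:
        return "amino_acid"          # e.g. "S:F486P", "ORF8:G8*", "S:Y144-"
    if re.fullmatch(r"[ACGT]\d+[ACGT]", mutation):
        return "nuc_substitution"    # e.g. "A1T"
    return "unknown"

# ALL NEW MUTS for XBB.1.5, copied from the output above
all_new = {"T23018C", "S:F486P", "29734-29759", "ORF8:G8*", "A1T", "G27915T"}
nuc_new = {m for m in all_new if mutation_kind(m) == "nuc_substitution"}
print(sorted(nuc_new))  # ['A1T', 'G27915T', 'T23018C']
```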

Subset of other XBB.1.5 sublineages

Looking at variant:  XBB.1.5.27

	PARENT (XBB.1.5)
	(1) PARENT ALL MUTS     :  {'S:S373P', 'S:G252V', 'C25584T', 'T27384C', 'A18163G', 'ORF1a:K47R', 'T21810C', 'S:Y505H', 'S:N969K', 'C23525T', 'N:E31-', 'ORF1a:P3395H', 'S:Q183E', '21633-21641', 'S:P26-', 'C241T', 'A24424T', '28362-28370', 'S:R346T', 'ORF8:G8*', 'A22688G', 'S:D614G', 'C23854A', 'ORF1a:F3677-', 'S:T478K', 'A23013C', 'C27807T', 'C17410T', 'T16342C', 'G22813T', 'ORF9b:P10S', 'E:T9I', 'C21618T', 'C26858T', 'T22679C', 'ORF9b:A29-', 'C15738T', 'C26270T', 'ORF3a:T223I', 'S:S477N', 'S:S375F', 'A29510C', 'C23604A', 'G28883C', 'G22992A', 'A22786C', 'A23403G', 'ORF1a:G1307S', 'C14408T', 'M:Q19E', 'S:T19I', 'S:N440K', 'S:D405N', 'G4184A', 'C12880T', 'S:N679K', 'C3037T', 'C9866T', 'S:Q954H', 'C25416T', 'S:H146Q', 'S:G446S', 'G22317T', 'G28881A', 'ORF9b:E27-', 'N:R203K', 'G22578A', 'ORF1b:R1315C', 'C19955T', 'T15939C', 'S:N764K', 'S:L24-', 'A27259C', 'C9344T', 'S:L368I', 'C22674T', 'ORF1a:L3201F', 'S:R408S', 'C9534T', 'S:K417N', 'S:A27S', 'S:Q498R', 'ORF1a:T842I', 'ORF9b:N28-', 'T22200A', 'ORF1a:T3090I', 'G21987A', 'G27915T', 'S:Y144-', 'C22686T', 'T17859C', 'S:N501Y', 'G10447A', 'S:G142D', 'S:N460K', '29734-29759', 'C15714T', 'ORF6:D61L', 'N:R32-', 'S:D796Y', 'G22895C', 'N:P13L', 'G22898A', 'E:T11A', 'ORF1a:L3027F', 'C22995A', 'C10449A', 'ORF1b:S959P', 'T23031C', 'S:T376A', 'S:P25-', 'G26709A', 'A20055G', 'C2790T', 'C26577G', 'C22664A', 'G23948T', 'C10198T', 'G28882A', 'C22109G', 'A405G', 'T22942G', 'A28271T', 'S:E484A', 'M:A63T', 'A23055G', 'G15451A', 'G27382C', 'N:S33-', 'S:S371F', 'ORF1a:G3676-', 'G22577C', 'ORF1a:S3675-', 'ORF1b:P314L', 'C28311T', 'C25000T', 'C44T', 'C4321T', 'N:G204R', '11288-11296', 'T23019C', 'C22000A', 'T23599G', 'A19326G', 'S:V445P', 'S:V213E', 'G22775A', 'C26060T', 'T22882G', 'ORF1b:T2163I', 'T24469A', 'ORF1a:S135R', 'S:G339H', 'A9424G', 'A27383T', 'ORF1a:T3255I', 'S:H655Y', 'G22599C', 'T23075C', 'A26275G', 'T23018C', 'S:F486P', 'S:V83A', 'T22896C', 'ORF1b:G662S', 'S:F490S', 'A23063T', 'ORF1b:I1566V', 'N:S413R', 'T670G', 'C10029T', 
'S:P681H', 'A1T', '21992-21994'}

	VARIANT (XBB.1.5.27)
	(2) ALL NEW MUTS        :  {'T10204C', 'S:T478R', 'T17124C', 'C22995G', 'T11431C', 'ORF1a:E1015G', 'A3309G'}
	(3) NUC REVERSIONS ONLY :  {'C22995A'}

	INTERSECTION (1) ∩ (3)  :  {'C22995A'}

	NUC NEW SUBSTITUTION    :  {'T10204C', 'T17124C', 'C22995G', 'T11431C', 'A3309G'}
	NUC ALL REVERSIONS      :  {'C22995A'}
	DUP LOCATIONS (2) & (3) :  ['nucleotide substitution: Root[C 22995 G] Parent[? 22995 G]', 'nucleotide substitution: Root[C 22995 A] Parent[? 22995 A]']
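The DUP LOCATIONS line shows the parent allele as `?`. If PARENT ALL MUTS is root-relative, as it appears to be, the parent allele can be looked up there. XBB.1.5 carries C22995A, so XBB.1.5.27's C22995G would correspond to a single A22995G change on its own branch. This is likely the nucleotide counterpart of S:T478K in the parent list and S:T478R among the new mutations. Below is a minimal Python sketch with hypothetical helper names.

```python
import re

NUC_SUB = re.compile(r"^([ACGT])(\d+)([ACGT])$")

def allele_in_parent(position, root_base, parent_mutations):
    """Parent's base at `position`: the alt base of a root-relative substitution
    there, otherwise the root base. Deletions and amino-acid entries are ignored."""
    for m in parent_mutations:
        match = NUC_SUB.match(m)
        if match and int(match.group(2)) == position:
            return match.group(3)
    return root_base

def parent_to_child_step(child_mutation, parent_mutations):
    """Rewrite a root-relative child substitution as a parent-relative one."""
    root_base, pos, child_base = NUC_SUB.match(child_mutation).groups()
    pos = int(pos)
    return f"{allele_in_parent(pos, root_base, parent_mutations)}{pos}{child_base}"

parent = {"C22995A", "T23018C", "A1T"}  # small subset of XBB.1.5's list
print(parent_to_child_step("C22995G", parent))  # A22995G
print(parent_to_child_step("T17124C", parent))  # T17124C (position not mutated in parent)
```

The same lookup resolves the XBB.1.5 case at position 1: with XBB.1's A1G, the child's A1T becomes G1T.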


Looking at variant:  XBB.1.5.28

	PARENT (XBB.1.5)
	(1) PARENT ALL MUTS     :  {'S:S373P', 'S:G252V', 'C25584T', 'T27384C', 'A18163G', 'ORF1a:K47R', 'T21810C', 'S:Y505H', 'S:N969K', 'C23525T', 'N:E31-', 'ORF1a:P3395H', 'S:Q183E', '21633-21641', 'S:P26-', 'C241T', 'A24424T', '28362-28370', 'S:R346T', 'ORF8:G8*', 'A22688G', 'S:D614G', 'C23854A', 'ORF1a:F3677-', 'S:T478K', 'A23013C', 'C27807T', 'C17410T', 'T16342C', 'G22813T', 'ORF9b:P10S', 'E:T9I', 'C21618T', 'C26858T', 'T22679C', 'ORF9b:A29-', 'C15738T', 'C26270T', 'ORF3a:T223I', 'S:S477N', 'S:S375F', 'A29510C', 'C23604A', 'G28883C', 'G22992A', 'A22786C', 'A23403G', 'ORF1a:G1307S', 'C14408T', 'M:Q19E', 'S:T19I', 'S:N440K', 'S:D405N', 'G4184A', 'C12880T', 'S:N679K', 'C3037T', 'C9866T', 'S:Q954H', 'C25416T', 'S:H146Q', 'S:G446S', 'G22317T', 'G28881A', 'ORF9b:E27-', 'N:R203K', 'G22578A', 'ORF1b:R1315C', 'C19955T', 'T15939C', 'S:N764K', 'S:L24-', 'A27259C', 'C9344T', 'S:L368I', 'C22674T', 'ORF1a:L3201F', 'S:R408S', 'C9534T', 'S:K417N', 'S:A27S', 'S:Q498R', 'ORF1a:T842I', 'ORF9b:N28-', 'T22200A', 'ORF1a:T3090I', 'G21987A', 'G27915T', 'S:Y144-', 'C22686T', 'T17859C', 'S:N501Y', 'G10447A', 'S:G142D', 'S:N460K', '29734-29759', 'C15714T', 'ORF6:D61L', 'N:R32-', 'S:D796Y', 'G22895C', 'N:P13L', 'G22898A', 'E:T11A', 'ORF1a:L3027F', 'C22995A', 'C10449A', 'ORF1b:S959P', 'T23031C', 'S:T376A', 'S:P25-', 'G26709A', 'A20055G', 'C2790T', 'C26577G', 'C22664A', 'G23948T', 'C10198T', 'G28882A', 'C22109G', 'A405G', 'T22942G', 'A28271T', 'S:E484A', 'M:A63T', 'A23055G', 'G15451A', 'G27382C', 'N:S33-', 'S:S371F', 'ORF1a:G3676-', 'G22577C', 'ORF1a:S3675-', 'ORF1b:P314L', 'C28311T', 'C25000T', 'C44T', 'C4321T', 'N:G204R', '11288-11296', 'T23019C', 'C22000A', 'T23599G', 'A19326G', 'S:V445P', 'S:V213E', 'G22775A', 'C26060T', 'T22882G', 'ORF1b:T2163I', 'T24469A', 'ORF1a:S135R', 'S:G339H', 'A9424G', 'A27383T', 'ORF1a:T3255I', 'S:H655Y', 'G22599C', 'T23075C', 'A26275G', 'T23018C', 'S:F486P', 'S:V83A', 'T22896C', 'ORF1b:G662S', 'S:F490S', 'A23063T', 'ORF1b:I1566V', 'N:S413R', 'T670G', 'C10029T', 
'S:P681H', 'A1T', '21992-21994'}

	VARIANT (XBB.1.5.28)
	(2) ALL NEW MUTS        :  {'T17124C', 'C22995G', 'S:T478R'}
	(3) NUC REVERSIONS ONLY :  {'C22995A'}

	INTERSECTION (1) ∩ (3)  :  {'C22995A'}

	NUC NEW SUBSTITUTION    :  {'T17124C', 'C22995G'}
	NUC ALL REVERSIONS      :  {'C22995A'}
	DUP LOCATIONS (2) & (3) :  ['nucleotide substitution: Root[C 22995 G] Parent[? 22995 G]', 'nucleotide substitution: Root[C 22995 A] Parent[? 22995 A]']
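XBB.1.5.27 and XBB.1.5.28 report the same twice-mutated position (22995). Indexing positions by lineage makes shared cases like this easy to spot, so each one can be reviewed once. The following minimal Python sketch uses only the positions from the DUP LOCATIONS lines above. The dictionary layout is an assumption, and the name sort is plain lexicographic, not Pango-aware.

```python
from collections import defaultdict

# Twice-mutated positions per lineage, read off the DUP LOCATIONS lines above.
dup_positions = {
    "XBB.1.5": [1],
    "XBB.1.5.27": [22995],
    "XBB.1.5.28": [22995],
}

def lineages_by_position(dups):
    """Invert lineage -> positions into position -> sorted list of lineages."""
    index = defaultdict(list)
    for lineage, positions in dups.items():
        for pos in positions:
            index[pos].append(lineage)
    return {pos: sorted(names) for pos, names in sorted(index.items())}

print(lineages_by_position(dup_positions))
# {1: ['XBB.1.5'], 22995: ['XBB.1.5.27', 'XBB.1.5.28']}
```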

I finally re-found the link for viewing specific lineages: https://nextstrain.org/staging/nextclade/sars-cov-2/?s=XBB.1.5

So even for XBB.1.5, the reversion is not what is listed there.
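To check the twice-mutated lineages one by one, a link can be generated for each lineage in the same form as the staging URL above. The following minimal Python sketch reuses that base URL and its `s` query parameter. It only builds the links and does not check whether a given lineage is actually present on that tree. `lineage_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Base URL and the `s` parameter are taken from the staging link above.
BASE = "https://nextstrain.org/staging/nextclade/sars-cov-2/"

def lineage_url(lineage):
    """Build a Nextstrain staging link that selects a single lineage."""
    return f"{BASE}?{urlencode({'s': lineage})}"

for name in ["XBB.1.5", "XBB.1.5.27", "XBB.1.5.28"]:
    print(lineage_url(name))
```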