Masking of position 21987 in SARS-CoV-2 builds

Hi,
In the SARS-CoV-2 builds there are several positions in the alignment that are masked due to various problems. See here.
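For anyone unfamiliar with what masking means here, a minimal sketch: the masked site is overwritten with an ambiguous base in every aligned sequence, so it cannot influence the tree. The function name `mask_sites` and the toy sequence are hypothetical, and this is not the code the builds actually run.

```python
def mask_sites(seq: str, sites: list[int], mask_char: str = "N") -> str:
    """Return seq with the given 1-based alignment positions replaced by mask_char."""
    chars = list(seq)
    for pos in sites:
        # Skip positions that fall outside the sequence instead of raising.
        if 1 <= pos <= len(chars):
            chars[pos - 1] = mask_char
    return "".join(chars)


# Toy example on a 10-base sequence; a real alignment row would be ~29.9 kb
# and the site list would include 21987.
toy = "ACGTACGTAC"
masked = mask_sites(toy, [3, 8])
print(masked)
```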

I am particularly wondering about position 21987, which relates to amino acid 142 in Spike. The position is listed as amplicon_drop_or_primer_artefact,back_to_ref which means:

##	amplicon_drop_or_primer_artefact = Amplicon dropout and/or failed primer trimming
##	back_to_ref = The alternate allele is not called for this position due to issues with amplicon dropout and primer trimming. For more details, see: https://github.com/W-L/ProblematicSites_SARS-CoV2/issues/7 and https://github.com/cov-lineages/pango-designation/issues/95
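As a quick check of the coordinate mapping mentioned above, here is a small sketch that converts a genome position to a Spike codon number. It assumes the S gene starts at nucleotide 21563 in the Wuhan-Hu-1 reference (MN908947.3) and that the position is given in 1-based reference coordinates; the function name `nuc_to_codon` is hypothetical.

```python
# Assumption: 1-based start of the S (Spike) gene in Wuhan-Hu-1 (MN908947.3).
SPIKE_START = 21563


def nuc_to_codon(pos: int, gene_start: int = SPIKE_START) -> tuple[int, int]:
    """Map a 1-based genome position to (codon number, position within codon)."""
    offset = pos - gene_start
    if offset < 0:
        raise ValueError("position lies upstream of the gene start")
    # Both returned values are 1-based.
    return offset // 3 + 1, offset % 3 + 1


codon, pos_in_codon = nuc_to_codon(21987)
print(codon, pos_in_codon)  # 142 2
```

Under these assumptions, 21987 is the second base of codon 142, which is where a G-to-A change turns glycine (GGT) into aspartate (GAT), i.e. G142D.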

Does anyone know if this information is still valid? In our data we see the substitution G142D in sequences across different sequencing platforms and protocols. However, it might be overrepresented in the SWIFT protocol, which could point to a primer-related issue. But looking at outbreak.info, G142D is very common: outbreak.info

Our current thinking is that this is complicated by primer artefacts, but that there was a G142D mutation at the base of Delta. We are actively looking into it. Here’s a build from before we masked this position, and you can see the switching between G and D throughout the Delta clade (>80 switches), which points to an artefact of some kind.
