Large, novel Spike deletions of unknown origin (not reproducible outside nextstrain framework)

guy · November 15, 2021, 3:18pm

Dear forum,
I was alerted by my colleagues that a newly sequenced SARS-CoV-2 sample had multiple novel deletions among the first amino acids of the Spike protein.

Bracing for another variant of concern, I ran the sequencing data through my own pipeline (bwa-mem2 + iVar) and submitted both consensus sequences (theirs and mine) to Nextclade.

Nextclade identified large Spike deletions in both sequences.

However, I could not find a single indication of these deletions outside of the nextclade results.

Both consensus sequences had the Spike start codon fully intact.

My variant caller (iVar) found no such deletions, apart from the well-known ones further down the gene (69-70 and so forth).
Other tools, such as the GISAID CoVsurver and cov-glue, didn’t find anything out of the ordinary either.

The consensus sequences (same sequencing data, different pipelines) can be obtained here.

Do you have any idea what happened here?

Thanks in advance,
Guy

rneher · November 16, 2021, 8:47am

Thanks for flagging this. You are right, these deletions aren’t real.

The reason seems to be that the amino acid alignment is thrown off by the large number of Ns early in the sequence, preceding the deletion around position 65 in Spike. We’ll try to fix…
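To check whether a consensus sequence contains the kind of long N stretch described above, one can simply list the runs of N and measure how N-rich a window is. This is a minimal Python sketch; the function names `n_runs` and `n_fraction` and the `min_len` threshold are hypothetical helpers for illustration, not part of Nextclade or iVar.

```python
import re


def n_runs(seq: str, min_len: int = 1) -> list[tuple[int, int]]:
    """Return 1-based inclusive (start, end) coordinates of every run of 'N'/'n'
    whose length is at least min_len."""
    runs = []
    for m in re.finditer(r"[Nn]+", seq):
        if m.end() - m.start() >= min_len:
            runs.append((m.start() + 1, m.end()))
    return runs


def n_fraction(seq: str, start: int, end: int) -> float:
    """Fraction of 'N'/'n' characters in the 1-based inclusive window [start, end]."""
    window = seq[start - 1:end]
    if not window:
        raise ValueError("empty window")
    return sum(c in "Nn" for c in window) / len(window)


if __name__ == "__main__":
    consensus = "ACGNNNNTTNA"  # toy sequence, not real data
    print(n_runs(consensus, min_len=2))  # long N stretches only
    print(n_fraction(consensus, 1, len(consensus)))
```

The coordinates are in the consensus sequence's own frame; relating them to a Spike codon such as 65 still requires an alignment against the reference.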

ivan-aksamentov · November 29, 2021, 1:12am

Hi @guy,

We have improved the alignment of peptides, so it should now be a little better. The change is released in Nextclade CLI 1.5.0 and Nextclade Web 1.8.0.

Let us know if it solves the problems you observed.

See this PR for more details:

github.com/nextstrain/nextclade

feat: skip seed match for aa and infer alignment params from nuc alignment

nextstrain:master ← nextstrain:feat/aa-alignment-params-from-aligned

opened 05:13PM - 19 Nov 21 UTC

ivan-aksamentov

+276 -26

This is an implementation ~of Richard's idea here: https://github.com/nextstrain…/nextclade/issues/609#issuecomment-973891398~ Update: since then we have refined the idea considerably. The aim is to improve peptide alignment, and hopefully to solve problems like #609 and #594, which might be due to the inferred band width of the banded alignment being too small. The proposed solution is to infer the band width and mean shift for peptide alignment from the nucleotide alignment, instead of running another seed matching pass. ~This branch is based on #612 and includes debug traces.~ (Not important) Resolves #609 Resolves #594
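The idea in the PR can be illustrated with a small sketch: read the diagonal offset ("mean shift") and the spread around it off an existing nucleotide alignment, scale both to codons, and use them to restrict a banded global alignment. Everything here is an assumption for illustration (the helper names, the `margin`, the rounding rules, and the simple match/mismatch/gap scores); it is not Nextclade's actual implementation.

```python
import math


def band_params_from_nuc(aligned_ref: str, aligned_qry: str, margin: int = 2) -> tuple[int, int]:
    """Hypothetical helper: derive (mean_shift, band_width) from a gapped nucleotide
    alignment, where shift = query position - reference position at each column."""
    if len(aligned_ref) != len(aligned_qry) or not aligned_ref:
        raise ValueError("expected two non-empty aligned strings of equal length")
    ref_pos = qry_pos = 0
    shifts = []
    for r, q in zip(aligned_ref, aligned_qry):
        ref_pos += r != "-"
        qry_pos += q != "-"
        shifts.append(qry_pos - ref_pos)
    mean_shift = round(sum(shifts) / len(shifts))
    # Widest deviation from the mean diagonal, plus a safety margin (assumed).
    band_width = max(abs(s - mean_shift) for s in shifts) + margin
    return mean_shift, band_width


def nuc_to_aa_band(mean_shift: int, band_width: int) -> tuple[int, int]:
    """One codon is 3 nucleotides: round the shift, round the width up."""
    return round(mean_shift / 3), math.ceil(band_width / 3)


def banded_nw_score(ref: str, qry: str, mean_shift: int, band_width: int,
                    match: int = 1, mismatch: int = -1, gap: int = -2) -> float:
    """Needleman-Wunsch global alignment score, computing only cells (i, j) with
    |(j - i) - mean_shift| <= band_width (i indexes ref, j indexes qry).
    Returns -inf if no path stays inside the band."""
    neg = float("-inf")
    n, m = len(ref), len(qry)
    score = [[neg] * (m + 1) for _ in range(n + 1)]
    score[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if (i == 0 and j == 0) or abs((j - i) - mean_shift) > band_width:
                continue  # origin is fixed; cells outside the band stay -inf
            best = neg
            if i > 0 and j > 0:
                s = match if ref[i - 1] == qry[j - 1] else mismatch
                best = max(best, score[i - 1][j - 1] + s)
            if i > 0:
                best = max(best, score[i - 1][j] + gap)  # gap in query (deletion)
            if j > 0:
                best = max(best, score[i][j - 1] + gap)  # gap in reference (insertion)
            score[i][j] = best
    return score[n][m]


if __name__ == "__main__":
    shift, width = band_params_from_nuc("ACGTACGTAC", "ACG---GTAC")
    print(shift, width, nuc_to_aa_band(shift, width))
    print(banded_nw_score("ACGTACGTAC", "ACGGTAC", shift, width))
    print(banded_nw_score("ACGTACGTAC", "ACGGTAC", 0, 0))  # band too narrow
```

The last call shows the failure mode discussed in the thread: if the band is too narrow to contain the true path, the aligner cannot place the deletion correctly. Deriving the band from the nucleotide alignment keeps it wide enough without a second seed matching pass.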

guy · December 23, 2021, 3:22pm

Hi Ivan,
Yes, this solved it. Thanks for the quick reply!
