Large, novel Spike deletions of unknown origin (not reproducible outside the Nextstrain framework)

Dear forum,
I was alerted by my colleagues that a newly sequenced SARS-CoV-2 sample had multiple novel deletions in the starting amino acids of the Spike protein.

Bracing for another variant of concern, I ran the sequencing data through my own pipeline (bwa-mem2 + iVar) and submitted both consensus sequences to Nextclade.

Nextclade identified large Spike deletions in both sequences:

However, I could not find a single indication of these deletions outside of the Nextclade results.

Both consensus sequences had the Spike start codon fully intact.

My variant caller (iVar) found no such deletions, save for the well-known ones further down the gene (69-70 and so forth).
Other tools, such as the GISAID CoVsurver and CoV-GLUE, didn’t find anything out of the ordinary either.
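
For anyone who wants to repeat this kind of sanity check, here is a rough sketch in Python. It assumes the consensus is in reference coordinates (NC_045512.2, where the Spike gene starts at position 21563) with no upstream indels shifting positions; the function names and the toy genome are made up for illustration.

```python
SPIKE_START = 21563  # 1-based Spike start in NC_045512.2 (assumes reference coordinates)

def spike_start_codon(genome, spike_start=SPIKE_START):
    """Return the three bases at the expected Spike start position."""
    return genome[spike_start - 1:spike_start + 2].upper()

def n_fraction(genome, start, end):
    """Fraction of N bases in the 1-based, inclusive interval [start, end]."""
    region = genome[start - 1:end].upper()
    return region.count("N") / len(region) if region else 0.0

# Toy genome (not real data): filler, an intact ATG, then a masked stretch.
toy = "A" * (SPIKE_START - 1) + "ATG" + "N" * 60 + "ACGT" * 10
print(spike_start_codon(toy))
print(n_fraction(toy, SPIKE_START, SPIKE_START + 62))
```

An intact `ATG` together with a high N fraction right after it would point to masked low-coverage bases rather than a real deletion.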

The consensus sequences (same sequencing data, different pipelines) can be obtained here.

Do you have any idea what happened here?

Thanks in advance,

Thanks for flagging this. You are right, these deletions aren’t real.

The reason seems to be that the amino acid alignment is thrown off by the large number of Ns early in the sequence, preceding the deletion around position 65 in Spike. We’ll try to fix…
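
To illustrate the effect (this is only a toy sketch, not how Nextclade translates or aligns anything): when a stretch of Ns is translated, every affected codon becomes an unknown residue X, so the peptide aligner is left with a long run of Xs and very little to anchor on near the start of the protein. The sequence below is made up, and `translate` and `longest_run` are hypothetical helper names.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(nuc):
    """Translate in frame 0; any codon with a non-ACGT base becomes X."""
    nuc = nuc.upper()
    usable = len(nuc) - len(nuc) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(nuc[i:i + 3], "X") for i in range(0, usable, 3))

def longest_run(seq, char):
    """Length of the longest consecutive run of `char` in `seq`."""
    best = cur = 0
    for c in seq:
        cur = cur + 1 if c == char else 0
        best = max(best, cur)
    return best

# Made-up example: a few real codons, 30 masked bases, then more codons.
toy = "ATGTTTGTTTTTCTTGTT" + "N" * 30 + "CAGTGTGTTAATCTTACA"
peptide = translate(toy)
print(peptide)
print(longest_run(peptide, "X"))
```

With a run like that in the query peptide, an aligner can plausibly score a gap as cheaper than matching the unknown residues, which would show up as a spurious deletion.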


Hi @guy,

We have improved the alignment of peptides, so the results should now be somewhat better. The change is released in Nextclade CLI 1.5.0 and Nextclade Web 1.8.0.

Let us know if it solves the problems you observed.

See this PR for more details:

Hi Ivan,
Yes, this solved it. Thanks for the quick reply!