SARS-CoV-2 E484K mutation and 69-70 deletion

Hi,

I am using IDSeq/Nextclade to identify my SARS-CoV-2 variants, and I export all the S-protein mutation regions to see what was covered in my reads. We noticed that the E484K mutation and the 69-70 deletion were not reported in the output. In an output for samples that are mostly B.1.1.7, the 69-70 deletion was not reported either. Does Nextclade report these regions?

Hi Lucy!

It’s hard to say definitively why S:E484K is not shown. The mutation is either not present in your sequence, or that region is unsequenced (N).
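To make the two possibilities concrete: S:E484K corresponds to the nucleotide change G23012A in Wuhan-Hu-1 (MN908947.3) coordinates, so you can tell “reference base”, “mutated” and “unsequenced (N)” apart by looking at that one position. The sketch below assumes the consensus has already been aligned to the reference with insertions removed (for example, Nextclade’s aligned FASTA output), so that string index i-1 is genome position i. The function name `classify_site` is hypothetical.

```python
def classify_site(aligned_seq: str, pos: int, ref: str, alt: str) -> str:
    """Classify a 1-based reference position in a reference-aligned sequence.

    Assumes aligned_seq is in reference coordinates (insertions stripped),
    so aligned_seq[pos - 1] is the base at genome position pos.
    """
    base = aligned_seq[pos - 1].upper()
    if base == "N":
        return "unsequenced"
    if base == "-":
        # A gap; leading/trailing gaps usually mean the ends were not covered.
        return "deleted"
    if base == alt:
        return "mutated"
    if base == ref:
        return "reference"
    return "other"  # e.g. an IUPAC ambiguity code


# S:E484K is G23012A relative to MN908947.3.
E484K_POS, E484K_REF, E484K_ALT = 23012, "G", "A"

# Toy placeholder sequence (not real data) with the E484K site masked.
toy = ["A"] * 29903
toy[E484K_POS - 1] = "N"
print(classify_site("".join(toy), E484K_POS, E484K_REF, E484K_ALT))  # unsequenced
```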

Can you share the sequence name/accession number with me so I can have a look?

You can see whether the region is sequenced by switching to the “Gene S” view in the gene view dropdown.

If a region is greyed out, it is unsequenced; in my example, S:229-289 is unsequenced.
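The Gene S view uses amino-acid (codon) positions. If you want to compare a range like S:229-289 with nucleotide positions, for example amplicon boundaries or N ranges, the conversion is simple arithmetic. This sketch assumes the standard annotation in which the S gene starts at nucleotide 21563 of MN908947.3; the helper name is hypothetical.

```python
S_START = 21563  # first nucleotide of the S gene in MN908947.3 (Wuhan-Hu-1)


def s_codon_range_to_nt(first_codon: int, last_codon: int) -> tuple[int, int]:
    """Return the 1-based, inclusive genome range spanned by S codons first..last."""
    nt_start = S_START + (first_codon - 1) * 3
    nt_end = S_START + last_codon * 3 - 1
    return nt_start, nt_end


print(s_codon_range_to_nt(229, 289))  # (22247, 22429)
print(s_codon_range_to_nt(484, 484))  # (23012, 23014), the E484 codon
```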

You can check which regions are Ns by hovering over the number in the “Ns” column; a tooltip then shows which regions are Ns.
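If you want the same information outside the web app, for example for many sequences at once, one minimal approach is to scan a reference-aligned sequence for runs of N and report them as 1-based inclusive ranges. This assumes the sequence is already in reference coordinates (insertions removed, as in Nextclade’s aligned FASTA output); the function names are hypothetical and the output only approximates what the tooltip shows.

```python
import re


def n_ranges(aligned_seq: str) -> list[tuple[int, int]]:
    """Return 1-based, inclusive (start, end) ranges of consecutive N bases."""
    return [(m.start() + 1, m.end()) for m in re.finditer(r"[Nn]+", aligned_seq)]


def overlaps(ranges: list[tuple[int, int]], start: int, end: int) -> bool:
    """True if any range shares at least one position with start..end."""
    return any(s <= end and start <= e for s, e in ranges)


seq = "ACGTNNNNACGTNA"
print(n_ranges(seq))  # [(5, 8), (13, 13)]

# E.g. is the S:E484K codon (23012-23014 in MN908947.3) inside an N run?
# overlaps(n_ranges(my_aligned_seq), 23012, 23014)  # my_aligned_seq is hypothetical
```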

Deletions are shown in the “Gaps” column.
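In a reference-aligned sequence, deletions appear as runs of “-”. A sketch that lists internal gap runs, skipping leading and trailing gaps (which usually reflect unsequenced ends rather than real deletions), makes it easy to look for the B.1.1.7 deletion, which is reported at nucleotide level as 21765-21770 in MN908947.3 coordinates. The function name is hypothetical, and the same reference-coordinate assumption applies.

```python
import re


def internal_gap_ranges(aligned_seq: str) -> list[tuple[int, int]]:
    """Return 1-based, inclusive ranges of '-' runs, ignoring terminal gaps."""
    ranges = []
    for m in re.finditer(r"-+", aligned_seq):
        if m.start() == 0 or m.end() == len(aligned_seq):
            continue  # leading/trailing gaps: treat as not sequenced
        ranges.append((m.start() + 1, m.end()))
    return ranges


seq = "--ACGT---ACGT--"
print(internal_gap_ranges(seq))  # [(7, 9)]

# Check for the S:69-70 deletion (my_aligned_seq is hypothetical):
# (21765, 21770) in internal_gap_ranges(my_aligned_seq)
```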

I hope that makes things clearer. Let me know if you have any questions.

If you have specific questions about IDSeq, you could try asking in their repo, GitHub - chanzuckerberg/czid-web: Infectious Disease Sequencing Platform. I’m not familiar with IDSeq, only with Nextclade.

Best,

Cornelius

Hi Cornelius,

I ran all my sequences in IDSeq, which in turn uses Nextclade. How can I find the accession number?

Katie

Hi Cornelius,

I was also wondering what % genome recovery is recommended for accurate SARS-CoV-2 variant calling with Nextclade. I was thinking that a consensus could have low % genome recovery overall while the primers still cover the key regions needed for accurate variant calling. Which key regions of the SARS-CoV-2 genome should be sequenced for Nextclade to call a variant accurately? Thanks!
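One way to frame this question: a consensus with low overall recovery may still be informative if the few sites that distinguish the lineages of interest are covered. The sketch below checks whether any N falls inside a hand-picked list of sites. The list is a hypothetical example in MN908947.3 coordinates (the 21765-21770 deletion, the S:E484 codon and the S:N501 codon), and a deletion shows up as “-” rather than N, so it counts as covered. This is not how Nextclade assigns clades; Nextclade places the whole sequence on a reference tree.

```python
def uncovered_sites(aligned_seq: str, sites: dict[str, tuple[int, int]]) -> list[str]:
    """Return names of sites whose 1-based inclusive range contains any N."""
    return [
        name
        for name, (start, end) in sites.items()
        if "N" in aligned_seq[start - 1 : end].upper()
    ]


# Hypothetical example list of sites, in MN908947.3 coordinates.
KEY_SITES = {
    "S:del69-70": (21765, 21770),
    "S:E484K": (23012, 23014),
    "S:N501Y": (23063, 23065),
}

toy = ["A"] * 29903  # placeholder sequence, not real data
toy[23012 - 1] = "N"  # mask one base of the E484 codon
print(uncovered_sites("".join(toy), KEY_SITES))  # ['S:E484K']
```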

I ran all my sequences in IDSeq, which in turn uses Nextclade. How can I find the accession number?

I’m not quite sure what you mean by accession number, as Nextclade only looks at the sequences you give it. I don’t know how IDSeq uses Nextclade. Maybe you could ask at an IDSeq forum? Or show me a screenshot and explain in more detail what you mean. Happy to help! I’m actually curious how IDSeq incorporates Nextclade, so if you could share a bit about that, it would be super interesting.

I answered here:

Why do you get low % genome recovery? Are you working with wastewater and trying to generate a consensus from wastewater sequencing data? Or are you trying to be more cost-effective by sequencing only parts of the genome from patient samples?

Hi @corneliusroemer, here are links to the IDSeq documentation for its SARS-CoV-2 pipelines:

CZ ID Pipeline Overviews – CZ ID Help Center (zendesk.com)

View SARS-CoV-2 Genomes in Nextclade – CZ ID Help Center (zendesk.com)

Hi @corneliusroemer, thanks so much for your response. This is so helpful! I am working on a paper in which we sequence wastewater for SARS-CoV-2 on a MinION Mk1B using the ARTIC v3 and v4.1 primers. We also used RT-ddPCR with the GT Molecular SARS-CoV-2 kit for variant calling. After analyzing all our data, we noticed that we get 50%-75% genome recovery when the N1 and N2 gene concentrations are >30k gene copies (GC)/100 mL, and 75%-100% when they are >48k GC/100 mL. For the discussion section of the paper, I would like to know what % genome recovery gives the most accurate variant calling. What is a good recovery cutoff? In a quick literature search, I found that most publications set their genome recovery threshold at >90%. I’d value your input, since you are an expert on how Nextclade works.
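Because definitions of % genome recovery vary between pipelines, it may help to state exactly how it is computed. A minimal sketch, assuming recovery means unambiguous A/C/G/T calls divided by the 29,903-nt length of MN908947.3, and using the >90% threshold from the literature as a default. Both the definition and the function names are assumptions, not what IDSeq or Nextclade necessarily compute.

```python
REF_LENGTH = 29903  # length of MN908947.3 (Wuhan-Hu-1)


def genome_recovery(consensus: str, ref_length: int = REF_LENGTH) -> float:
    """Percent of the reference length covered by unambiguous A/C/G/T calls.

    Assumption: recovery = called bases / reference length; other pipelines
    may count ambiguity codes or use the aligned length instead.
    """
    called = sum(consensus.upper().count(b) for b in "ACGT")
    return 100.0 * called / ref_length


def passes_cutoff(consensus: str, cutoff: float = 90.0) -> bool:
    """Apply a recovery threshold (default >90%, as in many publications)."""
    return genome_recovery(consensus) > cutoff


toy = "A" * 20000 + "N" * 9903  # placeholder consensus, not real data
print(round(genome_recovery(toy), 1))  # 66.9
print(passes_cutoff(toy))  # False
```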

Katie

Thanks! Katie