Select SARS-CoV-2 sequences with alpha, beta, gamma, delta mutations

Hi, after downloading the full set of genomes from GISAID, can I run a query to extract all genomes with alpha, beta, gamma, or delta mutations?

Right now, it seems that I can only use “==” inside a query, but not something like “like”.

Please advise.

Thank you & best regards,
Jie

Hi @jiehuang001, if you are using the open metadata (see https://nextstrain.org/blog/2021-07-08-ncov-open-announcement), you can do things like

--query "Nextstrain_clade=='21A (Delta)'"

This would select all sequences assigned to this variant.
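
For the "like"-style matching asked about above, one option is to filter the metadata file directly. The following is only a minimal sketch, assuming the metadata is a tab-separated file with a `Nextstrain_clade` column; the function name `select_by_pattern` and the sample rows are made up for illustration. A pattern match can be handy because more than one clade label can carry the same WHO name (for example, 21A, 21I and 21J are all labelled Delta).

```python
import csv
import io
import re


def select_by_pattern(handle, column, pattern):
    """Yield metadata rows whose `column` value matches the regex `pattern`."""
    regex = re.compile(pattern)
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        # Missing or empty values are treated as an empty string.
        if regex.search(row.get(column) or ""):
            yield row


# Tiny in-memory stand-in for a metadata TSV (strain names are hypothetical).
example = io.StringIO(
    "strain\tNextstrain_clade\n"
    "sampleA\t21A (Delta)\n"
    "sampleB\t20I (Alpha, V1)\n"
    "sampleC\t21J (Delta)\n"
)

delta_rows = list(select_by_pattern(example, "Nextstrain_clade", r"\(Delta\)"))
print([row["strain"] for row in delta_rows])  # ['sampleA', 'sampleC']
```

To use it on a real download, replace the `io.StringIO` object with `open("metadata.tsv", newline="")`, where the file name is whatever you saved the metadata as.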

Depending on what data you download from GISAID, you can run similar queries on pango_lineages. Nextstrain_clade is only present in Nextstrain-supplied files.
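
As a rough illustration of a Pango-lineage-based selection, here is a sketch that maps a lineage name to a WHO label. The lookup table reflects the commonly cited designations (Alpha = B.1.1.7 with Q.* aliases, Beta = B.1.351, Gamma = P.1, Delta = B.1.617.2 with AY.* aliases); treat it as an assumption and check it against the current Pango designations before relying on it. The names `VARIANT_ROOTS` and `who_label` are hypothetical.

```python
# Assumed mapping from WHO label to Pango lineage roots (verify before use).
VARIANT_ROOTS = {
    "Alpha": ["B.1.1.7", "Q"],
    "Beta": ["B.1.351"],
    "Gamma": ["P.1"],
    "Delta": ["B.1.617.2", "AY"],
}


def who_label(lineage):
    """Return the WHO label for a Pango lineage, or None if it is not listed."""
    lineage = (lineage or "").strip()
    for label, roots in VARIANT_ROOTS.items():
        for root in roots:
            # Match the root itself or any sublineage such as "AY.4" or "P.1.10".
            if lineage == root or lineage.startswith(root + "."):
                return label
    return None


for name in ["B.1.1.7", "AY.4", "P.1.10", "B.1.617.1"]:
    print(name, who_label(name))
```

Matching on `root + "."` rather than a bare prefix avoids accidental hits, so B.1.617.1 (a different lineage) is not counted as Delta.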

Thanks, Meher!

I opened the link that you provided. It seems that the metadata listed there has fewer than 1 million rows, while the latest GISAID download has more than 2 million SARS-CoV-2 genomes.

Based on the WHO’s website (Tracking SARS-CoV-2 variants), the Delta variant has a GISAID clade of “G/478K.V1” and a Nextstrain clade of “21A”. But is this strictly a one-to-one relationship? That is, do all viruses belonging to Nextstrain clade “21A” have the Delta mutations, and vice versa? I wish there were a formula for classifying/predicting the alpha, beta, gamma, and delta mutations.

Best regards,
Jie

The covariants.org website provides a list of mutations that define various VoCs:
CoVariants