Perhaps this is the wrong place to ask, but I don’t see any sort of GISAID forum to ask at… I generally update my GISAID dataset for Nextstrain every couple of weeks. I’ve noticed that, whereas all the earlier datasets (two weeks ago and further back) had Nextclade designations for all strains in the metadata, the 4/10/2022 and 4/11/2022 datasets do not. There’s still a Nextclade column, but every sample has ‘?’ for its clade.

Do you know if this is a temporary thing (a technical problem?), or are they moving away from Nextclade designations towards one of the other classification systems? A few of my Nextstrain builds were using that column to look at specific variants in specific locations and so forth. I could probably switch over to pangolin or similar, but it would be a bit of a pain.


I’m sorry for the late reply.

I don’t know why Nextclade clade information was missing. Speculating, it could be that GISAID runs sequences through Nextclade only once every few days so it takes time to update. You could try to contact GISAID through the contact form on the website. I’d be interested to know how things have developed in the past few weeks: did the annotations reappear?

As an alternative that is independent of GISAID annotations, you can always run your sequences through Nextclade CLI yourself to get those Nextclade clade annotations. That’s how Nextstrain does it internally, using ncov-ingest (nextstrain/ncov-ingest on GitHub), a pipeline that ingests SARS-CoV-2 (i.e. nCoV) data from GISAID and GenBank, transforms it, stores it on S3, and triggers Nextstrain nCoV rebuilds.
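
In case it helps, here is a rough sketch of how you could fill the ‘?’ entries back in once you have a Nextclade TSV of your own. It is not how ncov-ingest does it, just a minimal illustration. The function names are made up for this example, and the column names are assumptions: I’m assuming the Nextclade output has `seqName` and `clade` columns, and that your metadata uses `strain` and `Nextstrain_clade`, so adjust them to whatever your files actually contain.

```python
import csv
import io

# Assumption: the Nextclade TSV was produced beforehand, for example with
# something like
#   nextclade run --input-dataset <dataset_dir> --output-tsv nextclade.tsv sequences.fasta
# The exact invocation depends on your Nextclade CLI version; check `nextclade --help`.


def load_clades(nextclade_tsv, name_col="seqName", clade_col="clade"):
    """Read an open Nextclade TSV and return a {sequence name: clade} dict."""
    reader = csv.DictReader(nextclade_tsv, delimiter="\t")
    return {row[name_col]: row[clade_col] for row in reader}


def fill_missing_clades(metadata_rows, clades, strain_col="strain",
                        clade_col="Nextstrain_clade", missing="?"):
    """Return copies of the metadata rows with missing clades filled in.

    Rows that already have a clade are left alone. Rows with no match in
    the Nextclade results keep the `missing` marker.
    """
    filled = []
    for row in metadata_rows:
        row = dict(row)  # copy so the caller's rows are not modified
        if row.get(clade_col, missing) in (missing, ""):
            row[clade_col] = clades.get(row[strain_col], missing)
        filled.append(row)
    return filled


if __name__ == "__main__":
    # Tiny in-memory stand-ins for the real files (hypothetical sample names).
    nextclade_tsv = io.StringIO("seqName\tclade\nSampleA\t21K (Omicron)\n")
    metadata = [
        {"strain": "SampleA", "Nextstrain_clade": "?"},
        {"strain": "SampleB", "Nextstrain_clade": "?"},
    ]
    for row in fill_missing_clades(metadata, load_clades(nextclade_tsv)):
        print(row["strain"], row["Nextstrain_clade"])
```

For real files you would open the Nextclade TSV and your metadata with `csv.DictReader(..., delimiter="\t")` and write the result back out with `csv.DictWriter`. Matching only works if the sequence names in your FASTA are the same as the strain names in the metadata.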

