Dear All,

I am wondering whether GISAID covers all the COVID genomic sequencing data, or whether I still need to download the data from NCBI and ENA and merge it with the GISAID data to get a complete set of COVID genomic sequences?



I think GISAID covers pretty much all the genomic sequencing data. My senior colleague recommended it.

Hi Shicheng,

Apologies for the late reply. I haven’t done a thorough analysis of this (it’s difficult to do because there is no authoritative source of cross-links between GISAID and GenBank sequences).

It’s hard to know how many sequences are GenBank-only. There don’t seem to be too many (maybe because GISAID eventually imports GenBank-only sequences). One exception seems to be Bahrain: on GenBank, there are 1800 sequences with a collection date in the past 6 months (covSPECTRUM), whereas in the GISAID data there are apparently only around 700 for the same time period (covSPECTRUM).

@AngieHinrichs has tried to create and maintain a file of cross-links; I think she would be best placed to answer your question in more detail.
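
In case it helps, here is a rough sketch of how you could count GenBank-only sequences once you have a cross-link table. It is only an illustration: the two-column layout (`genbank`, `gisaid`), the function name `genbank_only`, and all the IDs below are made up for the example and do not reflect the format of any real cross-link file.

```python
import csv
import io


def genbank_only(genbank_accessions, crosslink_rows):
    """Return GenBank accessions with no GISAID counterpart in the cross-link rows."""
    # Keep only rows that actually name a GISAID ID (hypothetical column names).
    linked = {row["genbank"] for row in crosslink_rows if row.get("gisaid")}
    return sorted(set(genbank_accessions) - linked)


# Made-up cross-link table: one row per GenBank accession with a known GISAID match.
CROSSLINKS_TSV = (
    "genbank\tgisaid\n"
    "GB0001\tEPI_ISL_0001\n"
    "GB0002\tEPI_ISL_0002\n"
)

crosslink_rows = list(csv.DictReader(io.StringIO(CROSSLINKS_TSV), delimiter="\t"))

# Made-up list of accessions downloaded from GenBank.
genbank_accessions = ["GB0001", "GB0002", "GB0003"]

print(genbank_only(genbank_accessions, crosslink_rows))  # ['GB0003']
```

Anything this returns is only "GenBank-only" as far as the cross-link table knows, so the result is only as complete as that table.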

