I would like to download SARS-CoV-2 lineage data, but when I search on GISAID I cannot download it.
My goal is to obtain as many lineage records as possible from 2020 to 2022 for the state of Pernambuco, Brazil.
Thanks for your help!
Hello! What data can’t you download? When I filter locations by South America / Brazil / Pernambuco, I get around 8.2k results.
It may not be possible to download all 8.2k sequences in one go, so you could “paginate” by running multiple queries over shifting collection-date windows (for example January 2020 to April 2020, then May 2020 to September 2020, and so on) and downloading each of the smaller sets.
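If you want to plan the windows ahead of time, here is a minimal Python sketch that splits 2020 to 2022 into consecutive, non-overlapping collection-date ranges. It only computes the ranges; you would still enter each one into the GISAID collection date filter yourself. `months_per_window` is a hypothetical knob: pick a value small enough that each query stays downloadable.

```python
from datetime import date, timedelta


def collection_date_windows(start: date, end: date, months_per_window: int = 4) -> list[tuple[date, date]]:
    """Split [start, end] into consecutive, non-overlapping date windows.

    Each window spans `months_per_window` calendar months (counting the
    month it starts in); the last window is cut off at `end`.
    """
    if months_per_window < 1:
        raise ValueError("months_per_window must be at least 1")
    windows = []
    current = start
    while current <= end:
        # First day of the month that comes `months_per_window` months after current's month.
        month_index = current.month - 1 + months_per_window
        next_start = date(current.year + month_index // 12, month_index % 12 + 1, 1)
        window_end = min(next_start - timedelta(days=1), end)
        windows.append((current, window_end))
        # The next window starts the day after this one ends, so there are no gaps.
        current = window_end + timedelta(days=1)
    return windows


for window_start, window_end in collection_date_windows(date(2020, 1, 1), date(2022, 12, 31)):
    print(window_start, "to", window_end)
```

With the defaults this prints nine four-month windows, from 2020-01-01 to 2020-04-30 through 2022-09-01 to 2022-12-31.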
You should be able to download sequences and metadata through the “Input for Augur pipeline” checkbox (select the sequences, then click Download at the bottom right).
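Once you have several smaller downloads, you will probably want to combine their metadata. A minimal Python sketch, assuming each metadata file is tab-separated with a header row and a column that uniquely identifies each sequence. The column name `strain` is an assumption here; check the header of your files and pass the right name as `key`.

```python
import csv
from typing import Iterable


def merge_metadata(files: Iterable[Iterable[str]], key: str = "strain") -> list[dict[str, str]]:
    """Combine rows from several tab-separated metadata files.

    `files` is any iterable of line sources (e.g. open file handles).
    Only the first row seen for each value of `key` is kept, so sequences
    that appear in more than one download are not double-counted.
    """
    seen: set[str] = set()
    merged: list[dict[str, str]] = []
    for handle in files:
        for row in csv.DictReader(handle, delimiter="\t"):
            if row[key] not in seen:
                seen.add(row[key])
                merged.append(row)
    return merged
```

On real files you could call it as `merge_metadata(open(p, newline="", encoding="utf-8") for p in paths)`, where `paths` is your own list of downloaded metadata files. Deduplicating is a cheap safeguard in case two date windows overlap.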
The metadata should include a Pango lineage annotation that GISAID generated with pangoLEARN.
You can also run Nextclade or pangolin (in UShER mode) to produce similar annotations, but to start with you can probably just use the lineages already in the metadata.
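For a first look at those lineages, here is a minimal Python sketch that tallies records per collection year and Pango lineage from metadata rows (dicts such as those produced by `csv.DictReader`). The column names `date` and `pango_lineage` are assumptions; check your metadata header and adjust `date_col` and `lineage_col`. The year is read from the first four characters of the date, so partial dates like `2021-05` still count, and rows without a usable year or lineage are skipped.

```python
from collections import Counter
from typing import Iterable, Mapping, Optional


def lineage_counts_by_year(
    rows: Iterable[Mapping[str, Optional[str]]],
    date_col: str = "date",            # assumed column name
    lineage_col: str = "pango_lineage",  # assumed column name
    years: range = range(2020, 2023),  # 2020 to 2022 inclusive
) -> Counter:
    """Count records per (collection year, lineage) within `years`."""
    counts: Counter = Counter()
    for row in rows:
        # `or ""` also covers missing cells, which DictReader fills with None.
        year_text = (row.get(date_col) or "")[:4]
        lineage = (row.get(lineage_col) or "").strip()
        if not year_text.isdigit() or not lineage:
            continue
        year = int(year_text)
        if year in years:
            counts[(year, lineage)] += 1
    return counts
```

Calling `.most_common(10)` on the result then lists the ten most frequent (year, lineage) pairs.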
I hope this helps!