I'm getting the hang of using augur + auspice, and this is more of a GISAID user gripe, but is there a way to download more than 5000 entries from GISAID at a time?
If I used the GISAID download augur input option several times, does augur accept multiple FASTA and metadata files? Or do I have to merge them manually into single inputs first?
I’m sorry for the late reply. Nextstrain is completely separate from GISAID, so my answer should only be seen as suggestions from a fellow GISAID user (not authoritative).
As far as I’m aware, none of the GISAID users I know can download more than 5000 entries at a time via the web interface. This seems to be a restriction placed by GISAID. I don’t know why; you would have to ask GISAID.
However, some researchers have been given “API” access by GISAID, which is basically a special, private, password-protected download link that allows the user to download all sequences on GISAID (currently >15 million) and metadata.
As far as I’m aware, this API access has to be negotiated individually. I don’t know of any documentation about it on the GISAID website, so you may have to email someone at GISAID and ask.
You can try emailing contact@gisaid.org, but be aware that other users have reported long delays in getting responses.
From conversations with other researchers who have requested and/or received “API” access, GISAID will likely ask what you want to do with the data and then give you a contract to sign. That contract/agreement is often more restrictive than the general terms of access.
Of course, the other option is to manually download 5000 sequences at a time and merge them prior to analysis. I’ve heard that there are tools available to make downloads less tedious, e.g. GISAIDR (GitHub: Wytamma/GISAIDR), which interacts with the GISAID database programmatically. (I’ve never used it myself, though, as Nextstrain has thankfully been granted API access.)
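If you go the manual route, merging the batches could look roughly like the sketch below. This is only my own illustration in plain Python, not an official Nextstrain or GISAID tool. It assumes each batch is a FASTA file plus a tab-separated metadata file, and that the metadata has an ID column called "strain" (that column name is an assumption, so check it against your own files). Records that appear in more than one batch are kept only once.

```python
import csv


def merge_fasta(fasta_paths, out_path):
    """Concatenate FASTA files, keeping only the first record seen per name."""
    seen = set()
    with open(out_path, "w") as out:
        for path in fasta_paths:
            keep = False
            with open(path) as handle:
                for line in handle:
                    if line.startswith(">"):
                        # The full header line (minus ">") is used as the name.
                        name = line[1:].strip()
                        keep = name not in seen
                        seen.add(name)
                    if keep:
                        # Guard against a batch that lacks a final newline.
                        out.write(line if line.endswith("\n") else line + "\n")
    return seen


def merge_metadata(tsv_paths, out_path, id_column="strain"):
    """Concatenate TSV metadata files, keeping the first row seen per ID.

    `id_column` is an assumed column name; the header of the first file
    defines the columns of the merged output.
    """
    seen = set()
    writer = None
    with open(out_path, "w", newline="") as out:
        for path in tsv_paths:
            with open(path, newline="") as handle:
                reader = csv.DictReader(handle, delimiter="\t")
                if writer is None:
                    writer = csv.DictWriter(
                        out,
                        fieldnames=reader.fieldnames,
                        delimiter="\t",
                        extrasaction="ignore",
                        lineterminator="\n",
                    )
                    writer.writeheader()
                for row in reader:
                    if row[id_column] in seen:
                        continue
                    seen.add(row[id_column])
                    writer.writerow(row)
    return seen
```

You would call it with your own file names (the ones here are made up), e.g. `merge_fasta(["batch1.fasta", "batch2.fasta"], "sequences.fasta")` and `merge_metadata(["batch1.tsv", "batch2.tsv"], "metadata.tsv")`. Both functions return the set of names they kept, so you can compare the two sets to check that every sequence has a metadata row.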
Let me know if you have further questions, I’ll try my best to help!