How to download avian influenza fasta and metadata files from GISAID or GenBank in a compatible format?

corneliusroemer · December 15, 2023, 4:04pm

Hi Albert,

This is a great question! We have a small tutorial for this: Preparing Your Metadata — Augur 23.1.1 documentation

For Augur, you generally need a separate sequences.fasta file and a metadata.tsv file. Each sequence header (ID) in the FASTA must also appear in a column of the metadata so that the two can be linked.
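If you want to check that link before running Augur, the Python sketch below is one quick way to do it. It is not part of Augur. It assumes the ID column is called `strain` (a hypothetical default here; pass whatever column your metadata actually uses) and treats the sequence ID as the header text up to the first whitespace.

```python
import csv
import io


def fasta_ids(fasta_text):
    """Return sequence IDs: header text after '>' up to the first whitespace."""
    ids = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            header = line[1:].strip()
            ids.append(header.split()[0] if header else "")
    return ids


def unmatched_ids(fasta_text, metadata_tsv_text, id_column="strain"):
    """Return FASTA IDs with no matching value in the metadata ID column.

    id_column="strain" is an assumption for this sketch; adjust it to your file.
    """
    reader = csv.DictReader(io.StringIO(metadata_tsv_text), delimiter="\t")
    known = {row[id_column] for row in reader}
    return [seq_id for seq_id in fasta_ids(fasta_text) if seq_id not in known]


fasta = ">A/duck/1\nACGT\n>A/duck/2\nACGA\n"
metadata = "strain\tdate\tcountry\nA/duck/1\t2023-10-01\tAustralia\n"
print(unmatched_ids(fasta, metadata))
```

An empty list means every sequence has a metadata row. Any IDs printed are sequences Augur will not be able to link.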

Often, GenBank and GISAID sequence exports contain metadata inside the FASTA sequence headers, e.g. OS123|2023-10-01|Australia. Augur offers the augur parse command as a convenience to split such a file into the required FASTA and metadata TSV.
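To show what that splitting amounts to, here is a rough Python sketch of the same idea. It is not augur parse's implementation. The field names `strain`, `date` and `country` are assumptions chosen for the example header above, and the separator is assumed to be `|`. In practice augur parse does this for you; see `augur parse --help` for the options in your version.

```python
import csv
import io


def split_headers(fasta_text, fields, sep="|"):
    """Split headers like 'OS123|2023-10-01|Australia' into a FASTA + metadata TSV.

    The first field is kept as the sequence ID in the output FASTA so that it
    matches the first column of the metadata. `fields` names are hypothetical.
    """
    out_fasta = []
    rows = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            values = line[1:].strip().split(sep)
            if len(values) != len(fields):
                raise ValueError(f"expected {len(fields)} fields, got {len(values)}: {line}")
            row = dict(zip(fields, values))
            rows.append(row)
            out_fasta.append(">" + row[fields[0]])
        else:
            out_fasta.append(line)  # sequence lines pass through unchanged

    tsv = io.StringIO()
    writer = csv.DictWriter(tsv, fieldnames=fields, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return "\n".join(out_fasta) + "\n", tsv.getvalue()


raw = ">OS123|2023-10-01|Australia\nACGTACGT\n"
sequences, metadata = split_headers(raw, ["strain", "date", "country"])
print(sequences)
print(metadata)
```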

It may do the job just fine for a start. At Nextstrain, we often process the metadata further in so-called ingest workflows to make it less annoying to work with, e.g. by renaming fields and changing values. But this isn't strictly necessary.
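As a rough illustration of that kind of clean-up, the Python sketch below renames columns and normalizes values in a metadata TSV. The column names in `RENAMES` and the mapping in `VALUE_MAP` are hypothetical; replace them with the headers and values in your own export. For real workflows, the augur curate tooling is the better-supported route.

```python
import csv
import io

# Hypothetical column renames; adjust to the headers in your own export.
RENAMES = {"Isolate_Name": "strain", "Collection_Date": "date"}
# Hypothetical value fixes, keyed by the (renamed) column they apply to.
VALUE_MAP = {"country": {"USA": "United States"}}


def curate(tsv_text):
    """Rename columns and map known values to a standard spelling."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    fields = [RENAMES.get(name, name) for name in reader.fieldnames]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Rename keys and trim stray whitespace around values.
        new_row = {RENAMES.get(k, k): (v or "").strip() for k, v in row.items()}
        for column, mapping in VALUE_MAP.items():
            if column in new_row:
                new_row[column] = mapping.get(new_row[column], new_row[column])
        writer.writerow(new_row)
    return out.getvalue()


raw = "Isolate_Name\tCollection_Date\tcountry\nA/duck/1\t2023-10-01\tUSA\n"
print(curate(raw))
```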

@joverlee has written a lot of tooling to make such data massaging easier with the augur curate command. There are also various scripts across our repos that you could use for inspiration.

Some repositories for inspiration:

Mpox ingest that takes data from GenBank/NCBI and produces sequences.fasta and metadata.tsv: mpox/ingest at master · nextstrain/mpox · GitHub
General template to use as a starting point for “ingest”: pathogen-repo-template/ingest at main · nextstrain/pathogen-repo-template · GitHub
hepatitisB ingest: hepatitisB/ingest at 7bd1b05e55a7fe0179195d13d44abaf40755d1ef · nextstrain/hepatitisB · GitHub
Dengue ingest: dengue/ingest/README.md at f513d319055c706a11370b76a95fdad729edc1cc · nextstrain/dengue · GitHub

I hope this helps! Feel free to make a new post if you hit particular challenges!

Best,

Cornelius
