I am currently developing a machine learning model that will train on certain genes in the COVID genome. All that is necessary is the genome as a list of base pairs with a time stamp and maybe a geographical location. Is there a way to download this data as a single file, such as a CSV, TXT, or FASTA file?
You may find what you want among the GenBank-based (really all of INSDC) input files we provide for our SARS-CoV-2 (“ncov”) workflow. Genomes are in the sequences.fasta.zst files, and the dates and locations for them are in the metadata.tsv.zst files.
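If it helps, here is a minimal Python sketch of how those two files could be read once downloaded. It rests on several assumptions not stated above: the files are saved locally under the names shown, the third-party `zstandard` package is installed for decompression, and the metadata has `strain`, `date`, and `country` columns, with `strain` matching the FASTA record names. Check the header row of your copy and adjust the column names accordingly.

```python
import csv
import io


def open_zst_text(path):
    """Open a .zst file as a text stream.

    Assumes the third-party `zstandard` package (pip install zstandard).
    It is imported lazily so the parsers below work without it.
    """
    import zstandard

    raw = open(path, "rb")
    reader = zstandard.ZstdDecompressor().stream_reader(raw)
    return io.TextIOWrapper(reader, encoding="utf-8")


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA text stream, one record at a time."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def read_metadata(handle, id_column="strain", keep=("date", "country")):
    """Map each record ID to the selected metadata columns.

    The column names are assumptions; missing columns become empty strings.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[id_column]: {k: row.get(k, "") for k in keep} for row in reader}


if __name__ == "__main__":
    # Hypothetical local file names; point these at wherever you saved the downloads.
    with open_zst_text("metadata.tsv.zst") as fh:
        metadata = read_metadata(fh)
    with open_zst_text("sequences.fasta.zst") as fh:
        for name, seq in read_fasta(fh):
            info = metadata.get(name, {})
            print(name, info.get("date"), info.get("country"), len(seq))
            break  # show only the first record
```

The FASTA reader streams one genome at a time rather than loading everything, since the full sequence file is large. The metadata dictionary is held in memory here for simplicity; if that proves too heavy, filtering rows while reading is a straightforward change.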