Hi,
If I have 16 million SARS-CoV-2 (SC2) sequences, how can I find the Nextclade_pango assignment for each of them?
You can download the metadata file for the open GenBank data at https://data.nextstrain.org/files/ncov/open/metadata.tsv.zst. It lists the Nextclade_pango assignment for every genome in that dataset.
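In case it helps, here is a minimal sketch of how you might pull the lineage for a set of sequences out of that file once it is decompressed. It assumes the sequence name is in a column called `strain` (an assumption on my part; check the header of your copy), and `lookup_nextclade_pango` is just a hypothetical helper name, not part of any Nextstrain tool. It streams the file row by row, so the whole table never has to fit in memory.

```python
import csv
from typing import Dict, Iterable, Set


def lookup_nextclade_pango(
    tsv_lines: Iterable[str],
    wanted: Set[str],
    id_column: str = "strain",  # assumed name of the sequence ID column
    lineage_column: str = "Nextclade_pango",
) -> Dict[str, str]:
    """Return {sequence name: Nextclade_pango} for the names in `wanted`."""
    # The metadata is tab-separated; QUOTE_NONE keeps stray quote
    # characters in free-text fields from confusing the parser.
    reader = csv.DictReader(tsv_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    found: Dict[str, str] = {}
    for row in reader:
        name = row[id_column]
        if name in wanted:
            found[name] = row[lineage_column]
    return found
```

To use it, decompress first (for example with `zstd -d metadata.tsv.zst`), then open `metadata.tsv` as text and pass the file handle as `tsv_lines` together with the set of names you care about. Names that are not in the open dataset are simply absent from the returned dictionary.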
A broader listing of these sorts of files is here: nextstrain.org/projects/ncov/en/latest/reference/remote_inputs.html
Thanks @trvrb.
BTW, do you know where to find representative genomes for each Pango lineage?
Sure thing. @corneliusroemer has prototype sequences for each Pango lineage on his GitHub repo here: github.com/corneliusroemer/pango-sequences.
Cool! Thanks. @trvrb