How can I find the Nextclade_pango assignment for each SC2 sequence?

Hi,
If I have 16 million SC2 sequences, how can I find the Nextclade_pango assignment for each one?

You can download the metadata file for the open GenBank data at https://data.nextstrain.org/files/ncov/open/metadata.tsv.zst. This file lists the Nextclade_pango value for every genome in the dataset.
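As a rough sketch of how that file could be used at this scale: the script below streams the Zstandard-compressed TSV row by row instead of loading all 16 million rows into memory. It assumes the third-party `zstandard` package is installed (`pip install zstandard`), that the sequence ID column is named `strain` (an assumption; only `Nextclade_pango` is confirmed above), and that fields are not quoted. The helper names `iter_pango`, `open_zst_tsv` and `write_pango_table` are hypothetical.

```python
import csv
import io
from typing import Iterator, TextIO, Tuple


def iter_pango(tsv: TextIO, id_col: str = "strain",
               pango_col: str = "Nextclade_pango") -> Iterator[Tuple[str, str]]:
    """Yield (sequence ID, Nextclade_pango) pairs from a metadata TSV stream.

    Rows are read one at a time, so the whole file is never held in memory.
    The default id_col ("strain") is an assumption about the header.
    """
    # QUOTE_NONE treats quote characters as literal text (assumes unquoted fields).
    reader = csv.DictReader(tsv, delimiter="\t", quoting=csv.QUOTE_NONE)
    missing = {id_col, pango_col} - set(reader.fieldnames or [])
    if missing:
        raise KeyError(f"columns not found in header: {sorted(missing)}")
    for row in reader:
        yield row[id_col], row[pango_col]


def open_zst_tsv(path: str) -> TextIO:
    """Open a .tsv.zst file as a decompressed text stream."""
    import zstandard  # third-party; imported lazily so iter_pango works without it

    raw = open(path, "rb")
    reader = zstandard.ZstdDecompressor().stream_reader(raw)
    return io.TextIOWrapper(reader, encoding="utf-8", newline="")


def write_pango_table(path: str, out: TextIO) -> None:
    """Write a two-column 'ID<TAB>Nextclade_pango' table to `out`."""
    with open_zst_tsv(path) as stream:
        for seq_id, pango in iter_pango(stream):
            out.write(f"{seq_id}\t{pango}\n")
```

For example, `write_pango_table("metadata.tsv.zst", open("pango.tsv", "w"))` would produce a compact lookup table that can then be joined against your own list of sequence IDs.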

A broader list of files like this is available here: nextstrain.org/projects/ncov/en/latest/reference/remote_inputs.html

Thanks, @trvrb.

BTW, do you know where to find representative genomes for each Pango lineage?

Sure thing. @corneliusroemer has prototype sequences for each Pango lineage on his GitHub repo here: github.com/corneliusroemer/pango-sequences.

Cool! Thanks, @trvrb.