Is there an annotation file that maps Pangolin lineages to Nextclade clades, so that I can annotate some SARS-CoV-2 genomes from GISAID with both nomenclatures?
For the moment I could only find:
What I’m looking for is an annotation as follows:
lineage clade
AY.43 21J
AY.4 21J
... ...
The idea is to generate a table as follows:
genome lineage clade
hCoV-19/Germany/BW-RKI-I-195742/2021 AY.43 21J
... ... ...
My current procedure is as follows:
- Parse Pangolin’s lineage designation file to find a target genome
- Look up the target genome in GISAID (as a quality check)
- Parse the metadata associated with the selected genome to verify the lineage (only the Pangolin lineage is reported there)
These steps allow me to build the first part of the table:
genome lineage
hCoV-19/Germany/BW-RKI-I-195742/2021 AY.43
... ...
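The lineage lookup in the steps above could be scripted roughly as follows. This is only a sketch: it assumes the designation file is a CSV with `taxon` and `lineage` columns and that its names may lack the `hCoV-19/` prefix used by GISAID. The helper names and the second row of the sample data are made up for illustration.

```python
import csv
import io

# Stand-in for the lineage designation file. The "taxon"/"lineage" header
# is an assumption; the second row is a made-up placeholder.
LINEAGES_CSV = """taxon,lineage
Germany/BW-RKI-I-195742/2021,AY.43
Example/placeholder-1/2021,AY.4
"""


def strip_prefix(name, prefix="hCoV-19/"):
    """Drop the GISAID-style prefix so names from both sources compare equal."""
    return name[len(prefix):] if name.startswith(prefix) else name


def lineages_for(genomes, handle):
    """Return {genome: lineage} for the requested genomes found in the CSV."""
    wanted = {strip_prefix(g): g for g in genomes}
    found = {}
    for row in csv.DictReader(handle):
        key = strip_prefix(row["taxon"])
        if key in wanted:
            found[wanted[key]] = row["lineage"]
    return found


result = lineages_for(
    ["hCoV-19/Germany/BW-RKI-I-195742/2021"],
    io.StringIO(LINEAGES_CSV),
)
for genome, lineage in result.items():
    print(genome, lineage)
```

With a real file, `io.StringIO(LINEAGES_CSV)` would be replaced by an open file handle.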
Then:
- Paste the genome into Nextclade and wait for the analysis to complete
- Record the assigned clade
With this information, I can then build the table:
genome lineage clade
hCoV-19/Germany/BW-RKI-I-195742/2021 AY.43 21J
... ... ...
The idea behind this is that both lineages and clades can be used for “tagging” a genome, depending on the aim: a given clade (e.g. 21J) includes many lineages (e.g. AY.43 and AY.4 in the example above). I would also rather avoid manual, error-prone procedures ;) Thanks for your input.
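To make the goal concrete, here is a minimal sketch of the join I would like to automate. It assumes Nextclade results are available as a tab-separated file with `seqName` and `clade` columns and that genome names match exactly between the two inputs. The function names and the placeholder genome are hypothetical.

```python
import csv
import io
from collections import Counter, defaultdict

# (genome, lineage) pairs from the first part of the table.
GENOME_LINEAGE = [
    ("hCoV-19/Germany/BW-RKI-I-195742/2021", "AY.43"),
    ("hCoV-19/Example/placeholder-1/2021", "AY.4"),  # made-up placeholder
]

# Stand-in for a Nextclade TSV export; the column names are an assumption.
NEXTCLADE_TSV = (
    "seqName\tclade\n"
    "hCoV-19/Germany/BW-RKI-I-195742/2021\t21J\n"
    "hCoV-19/Example/placeholder-1/2021\t21J\n"
)


def read_clades(handle):
    """Return {genome: clade} from a tab-separated Nextclade-style table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["seqName"]: row["clade"] for row in reader}


def join_tables(genome_lineage, clades):
    """Build (genome, lineage, clade) rows for genomes present in both inputs."""
    return [(g, lin, clades[g]) for g, lin in genome_lineage if g in clades]


def lineage_to_clade(rows):
    """Collapse the rows to {lineage: most frequently observed clade}."""
    counts = defaultdict(Counter)
    for _, lineage, clade in rows:
        counts[lineage][clade] += 1
    return {lin: c.most_common(1)[0][0] for lin, c in counts.items()}


clades = read_clades(io.StringIO(NEXTCLADE_TSV))
table = join_tables(GENOME_LINEAGE, clades)
mapping = lineage_to_clade(table)

for genome, lineage, clade in table:
    print(genome, lineage, clade)
for lineage, clade in mapping.items():
    print(lineage, clade)
```

Taking the most frequent clade per lineage is a design choice of this sketch; a lineage whose genomes are split across clades would need a closer look rather than a single label.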