Where can I look up clade-defining mutations for seasonal flu?

This is a Q&A that happened privately; I'm open-sourcing it here for future reuse.

I am using Nextstrain to assign clades to the H1N1 viruses in my dataset, but I noticed that many viruses were left unassigned. For the H1N1 clades prior to 2014 that Nextclade does not assign, could you share the criteria Nextstrain uses to define them (e.g., clade 7), so that I can apply the same criteria to my data and stay consistent with Nextstrain?

The clade-defining mutations are listed in the clades TSV file in the seasonal-flu repo: https://github.com/nextstrain/seasonal-flu/blob/master/config/clades_h1n1pdm_ha.tsv

If you are making a build with the Augur tools, you can assign these clades with the augur clades command, passing it the clades TSV from above. See the augur clades page in the Augur 23.1.1 documentation.
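If you only want to check by hand which clade definitions a single sequence satisfies, without building a tree, something like the sketch below might help. It is not what augur clades does internally (that command works on a tree plus the mutations reconstructed along it); it is just a flat lookup. It assumes the TSV has the columns clade, gene, site and alt, that sites are 1-based amino acid positions, and that rows with gene "clade" express inheritance from a parent clade, which this sketch simply skips. The clade names and sequences in the example are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def read_clade_definitions(handle):
    """Return {clade: [(gene, site, alt), ...]} from a tab-separated file.

    Assumes columns named clade, gene, site, alt. Blank lines and lines
    starting with '#' are ignored.
    """
    definitions = defaultdict(list)
    rows = (line for line in handle if line.strip() and not line.startswith("#"))
    for row in csv.DictReader(rows, delimiter="\t"):
        if row["gene"] == "clade":
            # Assumption: such rows point at a parent clade rather than a
            # mutation. A flat lookup cannot use them, so skip them.
            continue
        definitions[row["clade"]].append((row["gene"], int(row["site"]), row["alt"]))
    return dict(definitions)


def matching_clades(definitions, sequences):
    """List clades whose defining mutations are all present.

    `sequences` maps a gene name to its sequence as a string; sites are
    treated as 1-based positions into that string.
    """
    matches = []
    for clade, mutations in definitions.items():
        if all(
            gene in sequences
            and 0 < site <= len(sequences[gene])
            and sequences[gene][site - 1] == alt
            for gene, site, alt in mutations
        ):
            matches.append(clade)
    return matches


# Toy definitions, NOT real flu clades.
EXAMPLE_TSV = (
    "clade\tgene\tsite\talt\n"
    "toyA\tHA1\t3\tK\n"
    "toyA\tHA1\t5\tT\n"
    "toyB\tHA1\t2\tS\n"
)

definitions = read_clade_definitions(io.StringIO(EXAMPLE_TSV))
print(matching_clades(definitions, {"HA1": "MNKAT"}))
```

A sequence can satisfy several definitions at once, because nested clades share mutations with their ancestors. Without the tree you would have to decide yourself which match is the most specific, which is one reason to prefer augur clades when you can build a tree.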


Hi Cornelius,

Thank you for posting this. Does this list contain only the mutations that were used to define the clades? For RSV, you had a more extensive list of amino acids reaching >=90% frequency within clades, published in the unified proposal for classification of human RSV below the subgroup level. Is a more complete list also available for flu? Now that I look at it more closely, the clades genome file for RSV seems to be organized hierarchically (config/clades_genome_a.tsv in the nextstrain/rsv repo), whereas the HA file for flu is not (config/h1n1pdm/ha/clades-long.tsv in the nextstrain/seasonal-flu repo).

Thank you!
Thomas