H7 hemagglutinin phylogeny showing FCS

Hi, Influenza H7 hemagglutinin phylogenies are a bit complicated because of the many indels in the most interesting region: the furin cleavage site (FCS), which is a key determinant of low/high pathogenicity.

My current methodology is:

  • to BLAST a full-length H7 HA sequence on NCBI and download the FASTA of the 1720 hits with very good coverage,
  • to take CY130150, which has a large FCS, as the reference and to call augur align without --fill-gaps,
  • to edit the alignment with AliView and Notepad++, moving a few nucleotides to the left or right of each deletion so that the deletions are in-frame,
  • to build a minimum-evolution tree in MEGA to obtain a raw Newick tree,
  • to call augur refine, augur ancestral --keep-ambiguous, augur translate, augur export

You can see the result here: http://babarlelephant.free-hoster.net/dist/index_H7.html?c=gt-HA_337,338,339,340,341

The whole process is a bit hard to reproduce. My main concerns:

  • What is your solution for making the deletions in-frame?
  • Several sequences have an insertion, often not exactly at the FCS, which makes the FCS deletion not a multiple of 3. Since the insertion is discarded in the alignment, the AA translation is a bit sloppy for those sequences (those with an X in the translation).
  • Why does IQ-TREE give a poor result, in contrast to MEGA?
    It is not a problem with the alignment, which is very good. I have not yet tried filling gaps only for the missing nucleotides at the beginning and end of the sequences.
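The idea of filling gaps only at the sequence ends can be sketched in a few lines of Python. This is a minimal illustration, not something augur does for you in this form: the function name `fill_terminal_gaps` is hypothetical, and it assumes each aligned sequence is a plain string that uses "-" for gaps.

```python
def fill_terminal_gaps(aligned_seq: str, fill: str = "N") -> str:
    """Replace leading and trailing '-' with `fill`; keep internal gaps.

    Terminal gaps usually mean "not sequenced", while internal gaps
    (such as the FCS deletions) are real and should stay as gaps.
    """
    without_left = aligned_seq.lstrip("-")
    n_left = len(aligned_seq) - len(without_left)
    core = without_left.rstrip("-")
    n_right = len(without_left) - len(core)
    return fill * n_left + core + fill * n_right


if __name__ == "__main__":
    # The two leading gaps and one trailing gap become N,
    # the internal 3-nt deletion is left untouched.
    print(fill_terminal_gaps("--ATG---CCC-"))
```

Applied to every record of the alignment before tree building, this would let the tree builder treat missing ends as unknown bases while the FCS deletions remain visible as gaps.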


For IQ-TREE it might mainly be a rooting problem.

There is also a tree for H5: http://babarlelephant.free-hoster.net/dist/index_H5.html?c=gt-HA_341,342,343,344,345,346,347

This time I downloaded the data from http://openflu.vital-it.ch/ (all the full-length H5 sequences).
MEGA gave a fairly good result, except for one H5N6 clade, and I am making the deletions in-frame with a simple script. For the alignment I am calling mafft --keeplength --addfragments sequences.fasta reference.fasta > aligned.fasta
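For readers who want to try the same thing, here is one possible way to make deletions in-frame automatically. It is only a sketch and not the script mentioned above. It assumes the alignment keeps the reference coordinates (as with --keeplength), that column `frame_offset` is the first base of a codon, and that a deletion whose length is a multiple of 3 may be slid by one nucleotide to the nearest codon boundary. The name `shift_gaps_in_frame` is hypothetical.

```python
import re


def shift_gaps_in_frame(aligned_seq: str, frame_offset: int = 0) -> str:
    """Slide internal gap runs so that they start on a codon boundary.

    Only runs whose length is a multiple of 3 are moved; other runs
    (e.g. caused by a discarded insertion) are left for manual review.
    Terminal gap runs are ignored. The ungapped sequence never changes.
    """
    seq = list(aligned_seq)
    for match in re.finditer(r"-+", aligned_seq):
        start, end = match.span()
        length = end - start
        if start == 0 or end == len(seq):
            continue  # leading/trailing gaps: missing data, not a deletion
        if length % 3 != 0:
            continue  # sliding cannot repair a frame-breaking deletion
        phase = (start - frame_offset) % 3
        if phase == 0:
            continue  # already codon-aligned
        if phase == 1:
            # Slide the gap one column left: the base before it moves after it.
            moved = seq[start - 1:start]
            if "-" in moved:
                continue  # neighbouring gap run, leave for manual editing
            seq[start - 1:end] = ["-"] * length + moved
        else:
            # Slide the gap one column right: the base after it moves before it.
            moved = seq[end:end + 1]
            if "-" in moved:
                continue
            seq[start:end + 1] = moved + ["-"] * length
    return "".join(seq)


if __name__ == "__main__":
    print(shift_gaps_in_frame("ATGC---GTAA"))  # gap moves one column left
    print(shift_gaps_in_frame("ATGCC---TAA"))  # gap moves one column right
```

Sliding to the nearest boundary is an arbitrary choice: it changes which flanking codon absorbs the moved base, so the amino acids next to the FCS deletion should still be checked by eye.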