H7 hemagglutinin phylogeny showing FCS

Hi, Influenza H7 hemagglutinin phylogenies are a bit complicated because there are many indels in the most interesting region: the furin cleavage site (FCS), which is a key determinant of low versus high pathogenicity.

My current methodology is:

  • BLAST a full-length H7 HA sequence on NCBI and download the FASTA of the 1720 sequences with very good coverage,
  • take CY130150, which has a large FCS, as the reference and call augur align without --fill-gaps,
  • edit the alignment with AliView and Notepad++, moving a few nucleotides to the left or right of the deletions so that the deletions are in-frame,
  • build a minimum evolution tree in MEGA to obtain a raw Newick tree,
  • call augur refine, augur ancestral --keep-ambiguous, augur translate, and augur export.
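
For anyone trying to reproduce this, the augur part of the steps above could be written down roughly as follows. This is only a sketch: every file name is hypothetical, the `v2` subcommand of `augur export` is my assumption (the post only says "augur export"), and it assumes CY130150 is present in the sequence file so that `--reference-name` can find it. The manual editing and the MEGA tree are not scripted; they are represented by the two hand-made input files.

```python
import shlex

REFERENCE = "CY130150"  # reference accession named in the post


def build_commands(prefix: str = "h7") -> list[list[str]]:
    """Return the augur calls of the workflow as argv lists (file names are hypothetical)."""
    edited = f"{prefix}_aligned_edited.fasta"  # alignment after manual in-frame editing
    raw_tree = f"{prefix}_mega.nwk"            # raw Newick tree exported from MEGA
    tree = f"{prefix}_refined.nwk"
    branch = f"{prefix}_branch_lengths.json"
    nt_muts = f"{prefix}_nt_muts.json"
    aa_muts = f"{prefix}_aa_muts.json"
    return [
        # Align against the reference; --fill-gaps is deliberately left out.
        ["augur", "align", "--sequences", f"{prefix}_sequences.fasta",
         "--reference-name", REFERENCE, "--output", f"{prefix}_aligned.fasta"],
        ["augur", "refine", "--tree", raw_tree, "--alignment", edited,
         "--output-tree", tree, "--output-node-data", branch],
        ["augur", "ancestral", "--tree", tree, "--alignment", edited,
         "--keep-ambiguous", "--output-node-data", nt_muts],
        ["augur", "translate", "--tree", tree, "--ancestral-sequences", nt_muts,
         "--reference-sequence", f"{prefix}_reference.gb",
         "--output-node-data", aa_muts],
        ["augur", "export", "v2", "--tree", tree,
         "--node-data", branch, nt_muts, aa_muts, "--output", f"{prefix}.json"],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for command in build_commands():
        print(shlex.join(command))
```

Printing the commands rather than executing them keeps the sketch safe to run on a machine without augur installed; each list could be passed to `subprocess.run(..., check=True)` once the file names match a real project.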

You can see the result here: http://babarlelephant.free-hoster.net/dist/index_H7.html?c=gt-HA_337,338,339,340,341

The whole process is a bit hard to reproduce. My main concerns:

  • What is your solution for making the deletions in-frame?
  • Several sequences have an insertion, often not exactly at the FCS, so that the FCS deletion is not a multiple of 3. Because the insertion is discarded in the alignment, the AA translation is a bit sloppy for those sequences (those with an X in the translation).
  • Why does IQTREE give a poor result, in contrast to MEGA?
    It is not a problem with the alignment, which is very good. I have not yet tried filling gaps only for the missing nucleotides at the beginning and end of the sequences.
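
To make the first question concrete, here is a minimal sketch of what "moving nucleotides to the left or right of a deletion" could look like in code. It is not the script used for the trees above. It assumes the sequence is already aligned to the reference with insertions stripped, that `cds_start` (a hypothetical parameter) is the 0-based alignment column of the first codon position, and that only gap runs whose length is a multiple of 3 should be moved.

```python
import re


def shift_gaps_in_frame(aligned: str, cds_start: int = 0) -> str:
    """Slide internal gap runs of length 3n onto codon boundaries.

    aligned   -- one sequence from a reference-anchored alignment
    cds_start -- 0-based alignment column of the first codon position (assumed)
    """
    seq = list(aligned)
    for match in re.finditer(r"-+", aligned):
        start, end = match.span()
        # Leave leading/trailing gaps (missing data) and frame-breaking gaps alone.
        if start == 0 or end == len(seq) or (end - start) % 3 != 0:
            continue
        offset = (start - cds_start) % 3
        if offset == 1 and seq[start - 1] != "-":
            # Gap starts one base into a codon: move that base to the right of the gap.
            seq[start - 1:end] = ["-"] * (end - start) + [seq[start - 1]]
        elif offset == 2 and seq[end] != "-":
            # Gap starts two bases into a codon: pull the next base to the left of the gap.
            seq[start:end + 1] = [seq[end]] + ["-"] * (end - start)
    return "".join(seq)


if __name__ == "__main__":
    print(shift_gaps_in_frame("ATGC---CCGGG"))
    print(shift_gaps_in_frame("ATGCC---CGGG"))
```

The sketch always moves a single base, which is the smallest shift that restores the frame, and it never changes the ungapped sequence. Gaps whose length is not a multiple of 3 (the insertion case in the second question) are left untouched, since no shift can make them in-frame.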

For IQTREE it might mainly be a rooting problem.

There is also a tree for H5: http://babarlelephant.free-hoster.net/dist/index_H5.html?c=gt-HA_341,342,343,344,345,346,347

This time I downloaded the data from http://openflu.vital-it.ch/ (all the full-length H5 sequences).
MEGA gave a reasonably good result except for one H5N6 clade, and I am making the deletions in-frame with a simple script. For the alignment I am calling mafft --keeplength --addfragments sequences.fasta reference.fasta > aligned.fasta

For alignment with MAFFT, mafft --genafpair --maxiterate 1000 input.fas > output.fas may also give you a good alignment. If the sequence quality is good enough, running mafft (within PhyloSuite or BioAider) in codon alignment mode may also give you an in-frame result.
Otherwise, use augur mask to mask the poorly aligned sites.

The translation problems with indels are one of the primary reasons we developed nextalign. It tries to place indels so that they align with codons, and it produces a separate amino acid alignment that can be used to annotate the tree with amino acid sequences and mutations. It will still strip insertions, though.
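
To illustrate why codon-aligned gaps matter for the translation, here is a small sketch. It is not nextalign's algorithm, only a naive codon-by-codon translation of an already aligned nucleotide sequence, and the function name is made up for the example. A gap that covers a whole codon becomes a clean "-" in the amino acid alignment, while a gap that straddles codon boundaries turns both neighbouring codons into X.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for first, second and third base.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_aligned(aligned: str) -> str:
    """Translate an aligned coding sequence codon by codon.

    "---" becomes "-", any codon with partial gaps or ambiguous bases becomes "X".
    Trailing bases that do not fill a whole codon are ignored.
    """
    protein = []
    for i in range(0, len(aligned) - len(aligned) % 3, 3):
        codon = aligned[i:i + 3].upper()
        if codon == "---":
            protein.append("-")
        else:
            protein.append(CODON_TABLE.get(codon, "X"))
    return "".join(protein)


if __name__ == "__main__":
    print(translate_aligned("ATG---CCCGGG"))  # gap on a codon boundary
    print(translate_aligned("ATGC---CCGGG"))  # same deletion, placed off-frame
```

Both example inputs contain the same ungapped sequence; only the placement of the 3-nucleotide gap differs, which is enough to change the translation from a single deleted residue to two X characters.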
