H7 hemagglutinin phylogeny showing FCS

babarlelephant · May 10, 2021, 12:29pm

Hi, influenza H7 hemagglutinin phylogenies are a bit complicated because of the many indels in the most interesting region: the furin cleavage site (FCS), which is a key determinant of low versus high pathogenicity.

My current methodology is:

to BLAST a full-length H7 HA sequence on NCBI and download the FASTA of the 1720 hits with very good coverage,
to take CY130150, which has a large FCS, as the reference and to call augur align without --fill-gaps,
to edit the alignment with AliView and Notepad++, moving a few nucleotides to the left or right of the deletions to make the deletions in-frame,
to run a minimum evolution tree in MEGA to obtain a raw Newick tree,
to call augur refine, augur ancestral --keep-ambiguous, augur translate, augur export
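
The augur steps above could be recorded as a small script so they are easier to rerun. This is only a minimal sketch: the file names (h7_sequences.fasta, CY130150.gb, mega_raw.nwk, aligned_inframe.fasta and the JSON outputs) are hypothetical, the v2 form of augur export is an assumption, and the manual alignment edit and the MEGA run stay outside the script.

```python
import shlex

def build_pipeline(sequences="h7_sequences.fasta",
                   reference="CY130150.gb",
                   raw_tree="mega_raw.nwk",
                   edited_alignment="aligned_inframe.fasta"):
    """Return the augur calls as argument lists (all file names are placeholders)."""
    return [
        # Align against the reference; --fill-gaps is deliberately not passed.
        ["augur", "align", "--sequences", sequences,
         "--reference-sequence", reference, "--output", "aligned.fasta"],
        # (manual step: edit aligned.fasta into edited_alignment, build raw_tree in MEGA)
        ["augur", "refine", "--tree", raw_tree, "--alignment", edited_alignment,
         "--output-tree", "tree.nwk", "--output-node-data", "branch_lengths.json"],
        ["augur", "ancestral", "--tree", "tree.nwk", "--alignment", edited_alignment,
         "--output-node-data", "nt_muts.json", "--keep-ambiguous"],
        ["augur", "translate", "--tree", "tree.nwk",
         "--ancestral-sequences", "nt_muts.json",
         "--reference-sequence", reference, "--output-node-data", "aa_muts.json"],
        # Assumption: the v2 export subcommand is the one in use.
        ["augur", "export", "v2", "--tree", "tree.nwk",
         "--node-data", "branch_lengths.json", "nt_muts.json", "aa_muts.json",
         "--output", "auspice/h7.json"],
    ]

# Print the commands for review; each list could be passed to subprocess.run instead.
for cmd in build_pipeline():
    print(shlex.join(cmd))
```

Printing the commands first makes it easy to check the paths before anything is executed.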

You can see the result here: http://babarlelephant.free-hoster.net/dist/index_H7.html?c=gt-HA_337,338,339,340,341

The whole process is a bit hard to reproduce. My main concerns:

What is your solution for making the deletions in-frame?
Several sequences have an insertion, often not exactly at the FCS, so the FCS deletion is not a multiple of 3. Because the insertion is discarded in the alignment, the amino acid translation is a bit sloppy for those sequences (the ones with an X in the translation).
Why does IQ-TREE give a poor result compared to MEGA?
It is not a problem with the alignment, which is very good. I haven't yet tried filling gaps only for the missing nucleotides at the beginning and end of the sequences.
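
For the first concern, one possible way to automate the manual edit is sketched below. It assumes each sequence is already aligned to the reference, that the coding region starts at index cds_start (a hypothetical parameter), and that only internal gap runs whose length is a multiple of 3 should be moved. It always shifts the gap to the left, which is an arbitrary choice, and it does not repair deletions whose length is not a multiple of 3.

```python
import re

def shift_gaps_in_frame(aligned, cds_start=0, gap="-"):
    """Move internal gap runs (length divisible by 3) left onto a codon boundary."""
    seq = list(aligned)
    for match in re.finditer(re.escape(gap) + "+", aligned):
        start, end = match.span()
        length = end - start
        if start == 0 or end == len(seq):
            continue  # terminal gaps are missing data, not deletions
        if length % 3 != 0:
            continue  # frame-breaking deletion: left untouched here
        offset = (start - cds_start) % 3
        if offset == 0:
            continue  # already codon-aligned
        left = seq[start - offset:start]
        if start - offset < cds_start or gap in left:
            continue  # not enough nucleotides to move across the gap
        # Gap first, then the nucleotides that used to precede it.
        seq[start - offset:end] = [gap] * length + left
    return "".join(seq)

print(shift_gaps_in_frame("ATGAA---ATTT"))  # prints ATG---AAATTT
```

In the example the codons change from ATG AA- --A TTT to ATG --- AAA TTT, so the deletion removes exactly one codon.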

babarlelephant · May 10, 2021, 2:35pm

For IQ-TREE it might mainly be a rooting problem.

babarlelephant · May 12, 2021, 2:02pm

There is a tree for H5 http://babarlelephant.free-hoster.net/dist/index_H5.html?c=gt-HA_341,342,343,344,345,346,347

This time I downloaded the data from http://openflu.vital-it.ch/ (all the full length H5 sequences).
MEGA gave a reasonably good result except for one H5N6 clade, and I am making the deletions in-frame with a simple script. For the alignment I am calling mafft --keeplength --addfragments sequences.fasta reference.fasta > aligned.fasta

Vetdog · November 12, 2021, 2:32pm

For alignment with MAFFT, mafft --genafpair --maxiterate 1000 input.fas > output.fas may also give you a good alignment. If the sequence quality is satisfactory, running MAFFT (within PhyloSuite or BioAider) in codon alignment mode may also give you an in-frame result.
Otherwise, use augur mask to mask the poorly aligned sites.

rneher · November 13, 2021, 9:06am

The translation problems with indels are one of the primary reasons we developed nextalign. It tries to place indels so that they align with codons, and it produces a separate amino acid alignment that can be used to annotate the tree with amino acid sequences and mutations. It will still strip insertions, though.
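
To illustrate why gap placement matters for the amino acid alignment, here is a minimal sketch of a gap-aware translation. It is not nextalign's algorithm, only a toy example with the standard genetic code: a codon made entirely of gaps becomes "-", and any codon that mixes gaps and nucleotides (or contains ambiguous bases) becomes "X".

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, in TCAG order for each codon position.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate_aligned(nt, gap="-"):
    """Translate an aligned, in-frame nucleotide string codon by codon."""
    protein = []
    for i in range(0, len(nt) - len(nt) % 3, 3):
        codon = nt[i:i + 3].upper()
        if codon == gap * 3:
            protein.append("-")  # whole codon deleted
        else:
            # Partial gaps and ambiguous bases have no table entry and map to X.
            protein.append(CODON_TABLE.get(codon, "X"))
    return "".join(protein)

print(translate_aligned("ATGAA---ATTT"))  # prints MXXF
print(translate_aligned("ATG---AAATTT"))  # prints M-KF
```

The same three-nucleotide deletion yields two X residues when it straddles codons and a single clean gap when it sits on a codon boundary.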
