Hi, Influenza H7 hemagglutinin phylogenies are a bit complicated because of the many indels in the most interesting region: the furin cleavage site (FCS), which is a key determinant of low/high pathogenicity.
My current methodology is:
- to BLAST a full-length H7 HA sequence on NCBI and download the FASTA of the 1720 hits with very good query coverage,
- to take CY130150, which has a long FCS, as the reference and to call augur align without --fill-gaps,
- to edit the alignment with AliView and Notepad++, moving several nucleotides to the left or right of the deletions to make the deletions in-frame,
- to build a minimum evolution tree in MEGA to obtain a raw Newick tree,
- to call augur refine, augur ancestral --keep-ambiguous, augur translate, augur export.
You can see the result here: http://babarlelephant.free-hoster.net/dist/index_H7.html?c=gt-HA_337,338,339,340,341
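To make the manual editing step less error-prone, something like the following could flag the gap runs that still break the reading frame after the alignment. It is only a sketch: the function names and the `cds_start` parameter are made up for illustration and are not part of augur, and it assumes each sequence is already in reference coordinates with `-` as the gap character.

```python
import re


def gap_runs(aligned_seq):
    """Return (start, length) for every run of '-' in an aligned sequence (0-based)."""
    return [(m.start(), m.end() - m.start()) for m in re.finditer(r"-+", aligned_seq)]


def frame_breaking_gaps(aligned_seq, cds_start=0):
    """Return internal gap runs that are not a whole number of codons.

    A run is flagged when its length is not a multiple of 3, or when it does
    not start on a codon boundary relative to cds_start (hypothetical
    parameter: 0-based position of the first coding nucleotide).
    """
    n = len(aligned_seq)
    flagged = []
    for start, length in gap_runs(aligned_seq):
        # Gaps touching either end are treated as missing data, not deletions.
        if start == 0 or start + length == n:
            continue
        if length % 3 != 0 or (start - cds_start) % 3 != 0:
            flagged.append((start, length))
    return flagged


if __name__ == "__main__":
    # Toy sequences, not real H7 data.
    print(frame_breaking_gaps("ATGAAA---CCC"))  # codon-aligned deletion
    print(frame_breaking_gaps("ATGAA---ACCC"))  # same length, shifted by one
```

Running this over every record of the aligned FASTA would give a list of the sequences and positions that still need a manual shift.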
The whole process is a bit hard to reproduce. My main concerns:
- What is your solution for making the deletions in-frame?
- Several sequences have an insertion, often not exactly at the FCS, so their FCS deletion is not a multiple of 3. Because the insertion is discarded in the alignment, the AA translation is a bit sloppy for those sequences (the ones with an X in the translation).
- Why does IQ-TREE give a poor result compared to MEGA?
It is not a problem with the alignment, which is very good. I have not yet tried filling gaps only for the missing nucleotides at the beginning and end of the sequences.
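For reference, the terminal-only gap filling I have in mind would look roughly like this. As far as I understand, augur align --fill-gaps replaces every gap with N, which would also erase the real FCS deletions, so this is a hypothetical helper of my own (the name `fill_terminal_gaps` is not an augur function) that only touches the leading and trailing gaps.

```python
def fill_terminal_gaps(aligned_seq, fill="N"):
    """Replace leading and trailing '-' with `fill`, leaving internal gaps untouched."""
    n = len(aligned_seq)
    lead = n - len(aligned_seq.lstrip("-"))   # number of gaps at the start
    trail = n - len(aligned_seq.rstrip("-"))  # number of gaps at the end
    if lead == n:
        # The sequence is entirely gaps: everything is missing data.
        return fill * n
    return fill * lead + aligned_seq[lead:n - trail] + fill * trail


if __name__ == "__main__":
    # Toy sequence, not real H7 data: the internal deletion is preserved.
    print(fill_terminal_gaps("--ATG---CCC-"))
```

The idea is that the tree builder would then treat the missing ends as unknown nucleotides while the internal deletions stay as gaps.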