Helping iqtree and treetime by removing the unmutated columns from the alignment (Monkeypox)

Hi, even with only 200 sequences, the Monkeypox build is already getting pretty slow (197,000 bases and a lot of Ns).

So I am now removing from the alignment every column in which all sequences have either the same base as the reference or an N.

  • I am writing the correspondence between the coordinates in the original alignment and the new alignment to a .tsv file.
  • Then I’m running iqtree, augur refine --timetree, and augur ancestral.
  • Then I’m parsing the resulting muts.json, rewriting the mutations in the original coordinates, and rewriting the ancestral sequences in the same way.
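For illustration, the column-removal step could be sketched roughly like this in Python. This is only a minimal sketch, not existing augur code: the function name compress_alignment is made up, sequences are assumed to be uppercase and already aligned to the reference, and gaps ("-") are treated as real differences rather than as missing data.

```python
def compress_alignment(reference, sequences, missing="N"):
    """Drop columns where every sequence matches the reference or is missing.

    Hypothetical helper, not part of augur. Returns the compressed reference,
    the compressed sequences, and a list whose i-th entry is the 1-based
    coordinate in the original alignment of compressed column i.
    """
    kept = []
    for pos, ref_base in enumerate(reference):
        column = (seq[pos] for seq in sequences.values())
        # Keep the column only if at least one sequence carries a real change.
        if any(base != ref_base and base != missing for base in column):
            kept.append(pos)

    compressed_ref = "".join(reference[p] for p in kept)
    compressed = {
        name: "".join(seq[p] for p in kept) for name, seq in sequences.items()
    }
    return compressed_ref, compressed, [p + 1 for p in kept]


# Toy data, purely illustrative.
reference = "ACGTACGT"
sequences = {"s1": "ACGTACGT", "s2": "ANGTTCGT", "s3": "ACGAACGN"}
compressed_ref, compressed, coords = compress_alignment(reference, sequences)
```

The coords list is what would go into the .tsv: row i holds the compressed coordinate i and the original coordinate coords[i - 1].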

What do you think? What about adding commands to augur to do this, for example:
augur compress alignment --input alignment.fasta --output-alignment compressed.fasta --output-coordinates coords.tsv
augur compress renumber --input-mutations mutsCompressed.json --input-coordinates coords.tsv --output muts.json
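The renumbering step might look something like the sketch below. It assumes the node-data JSON has a top-level "nodes" object in which each node carries a "muts" list of strings such as "A12T", and that the coordinate map is a list giving the original 1-based position of each compressed column. The helper names (renumber_mutation, renumber_node_data, expand_sequence) are hypothetical and not part of augur.

```python
import re

# Mutation strings are assumed to look like "A12T": ancestral base, position, derived base.
MUTATION = re.compile(r"^([A-Za-z-])(\d+)([A-Za-z-])$")


def renumber_mutation(mutation, kept_positions):
    """Translate one mutation from compressed to original 1-based coordinates."""
    match = MUTATION.match(mutation)
    if match is None:
        raise ValueError(f"unrecognised mutation string: {mutation!r}")
    ancestral, position, derived = match.groups()
    original = kept_positions[int(position) - 1]
    return f"{ancestral}{original}{derived}"


def renumber_node_data(node_data, kept_positions):
    """Return a copy of the node data with every 'muts' entry renumbered."""
    nodes = {}
    for name, node in node_data["nodes"].items():
        new_node = dict(node)
        new_node["muts"] = [
            renumber_mutation(m, kept_positions) for m in node.get("muts", [])
        ]
        nodes[name] = new_node
    return {**node_data, "nodes": nodes}


def expand_sequence(compressed_seq, reference, kept_positions):
    """Rebuild a full-length sequence, using the reference at removed columns."""
    full = list(reference)
    for base, pos in zip(compressed_seq, kept_positions):
        full[pos - 1] = base
    return "".join(full)


# Toy example: compressed columns 1, 2, 3 were originally columns 4, 5, 120.
example = {"nodes": {"NODE_1": {"muts": ["T1A", "A3G"]}}}
renumbered = renumber_node_data(example, [4, 5, 120])
```

Filling the removed columns with the reference base is a choice that seems reasonable for inferred ancestral sequences, since those columns carry no observed variation.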

However, augur refine --timetree would need to be given the length of the original alignment, so that it can scale the clock rate accordingly.

Hi @babarelephant,

in principle, TreeTime (and I believe IQtree as well) should already be doing this. But at least in TreeTime it is not done very efficiently (treetime/sequence_data.py at master · neherlab/treetime · GitHub). Now with MPXV, I should look more closely at why this doesn’t seem to result in the speedup we expect.

TreeTime does allow you to specify an alignment with only variable sites together with --sequence-length; the difference between the two is assumed to be constant sites and is used in branch length calculations.
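To make the scaling concrete, here is a small arithmetic sketch. It is not TreeTime code, and the numbers are made up. A rate or branch length expressed per site on the variable-only alignment is inflated by the ratio of the original length to the number of kept columns, so converting it back is a single multiplication.

```python
import math


def rescale_per_site(value_per_variable_site, n_variable_sites, original_length):
    """Convert a per-site rate or branch length from the variable-only
    alignment to the scale of the full-length alignment.

    Hypothetical helper: the expected number of substitutions stays the same,
    only the number of sites it is divided by changes.
    """
    substitutions = value_per_variable_site * n_variable_sites
    return substitutions / original_length


# Made-up numbers: 2,000 variable columns kept out of 200,000 in total.
rate_full = rescale_per_site(1e-3, 2_000, 200_000)
assert math.isclose(rate_full, 1e-5)
```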
