As we have started to get more data, we have begun to accumulate a number of complex mutation events: for example, three consecutive substitutions, or a substitution and a deletion in close proximity. We know from other mutational studies that these complex mutation events (some are calling these MNVs, multi-nucleotide variants, in the human world) are a single mutation event, not a series of consecutive events.
First off, props to Nextstrain for annotating these correctly at the amino acid level (most tools don't)!
However, looking at the temporal branches, I think these are being counted as more than one event, based on the long branch lengths. For example, we have a local outbreak we are investigating where a triple substitution happens to be a hallmark (20263-20265, ORF1b, Glu2266->Ile).
Look at USA/OR-OHSU-0176/2020 (and its cluster).
So my questions are:
Am I correct that augur is treating these all as separate mutations? In my example, that would be 4 mutations (the triplet plus another single mutation) on this branch rather than 2. If so, are there any workarounds for this apart from hacking the sequences? While these are not the most common mutation event, they occur in every species I know of (~5% of de novo mutations in humans), and we already have a handful of them in the first 100 genomes we sequenced.
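To make the counting question concrete, here is a minimal sketch of collapsing nearby substitutions into single events. This is not how augur works internally; the function name `group_mnv_events` and its `max_gap` parameter (the largest distance between sites that still counts as one event) are hypothetical choices for illustration, and the position 24000 below is a made-up stand-in for the unlinked single mutation.

```python
from typing import List


def group_mnv_events(positions: List[int], max_gap: int = 1) -> List[List[int]]:
    """Group substitution positions into candidate single mutation events.

    Hypothetical helper: sites separated by at most `max_gap` nucleotides are
    merged into one event (max_gap=1 merges only strictly adjacent sites).
    """
    events: List[List[int]] = []
    for pos in sorted(set(positions)):
        # Extend the current event if this site is close to its last site
        if events and pos - events[-1][-1] <= max_gap:
            events[-1].append(pos)
        else:
            events.append([pos])
    return events


# Triplet at 20263-20265 plus one hypothetical unlinked substitution
branch_mutations = [20263, 20264, 20265, 24000]
events = group_mnv_events(branch_mutations)
print(len(branch_mutations), "substitutions ->", len(events), "events")
print(events)
```

With the default `max_gap=1`, this prints `4 substitutions -> 2 events` followed by `[[20263, 20264, 20265], [24000]]`. A larger `max_gap` would also merge changes that are close but not adjacent; the right threshold is a judgment call.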
This tree also has another good example regarding indels. USA/OR-OHSU-0177/2020 has a 6 nt deletion that separates it from the rest of the outbreak. From what I could tell, indels stopped being considered in the main build parameters several weeks ago. I was curious why, and whether changing this setting in our local builds would cause a lot of issues.
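The same issue applies to deletions: in an alignment, a 6 nt deletion appears as six consecutive gap characters, so treating each gap column as its own site could make one event look like six. A minimal sketch that reports each run of gaps as a single deletion event follows. It assumes the query is already aligned to the reference with `-` marking gaps; the function name `find_deletion_events` and the toy sequences are hypothetical and do not reflect augur's actual handling.

```python
from typing import List, Optional, Tuple


def find_deletion_events(ref: str, query: str) -> List[Tuple[int, int]]:
    """Return (start, length) for each run of gaps in `query` relative to `ref`.

    Hypothetical helper. Assumes both strings are already aligned (same
    length) and that '-' marks a gap. Starts are 1-based alignment columns,
    which match reference coordinates only if the reference has no gaps.
    """
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    events: List[Tuple[int, int]] = []
    run_start: Optional[int] = None
    for i, (r, q) in enumerate(zip(ref, query), start=1):
        is_del = q == "-" and r != "-"
        if is_del and run_start is None:
            run_start = i  # a new deletion run begins here
        elif not is_del and run_start is not None:
            events.append((run_start, i - run_start))
            run_start = None
    if run_start is not None:
        # The deletion runs to the end of the alignment
        events.append((run_start, len(ref) + 1 - run_start))
    return events


ref = "ATGCGTACGTTAGC"
query = "ATGC------TAGC"  # toy 6 nt deletion
print(find_deletion_events(ref, query))
```

This prints `[(5, 6)]`: one deletion event of length 6 starting at column 5, rather than six separate changes.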