Understanding divergence

Hi everyone,

I’m trying to understand what exactly divergence is measuring. I’ve looked through the documentation, and I see that for the Nextstrain pipeline it defaults to mutations rather than mutations-per-site. I’ve left it as the default in my builds.

Is divergence measuring just the number of mutations in the sample relative to the Wuhan reference based on the alignment? If so, is this measured in nucleotide or amino acid mutations?

Additionally, I’ve noticed that the same sequence can be assigned a different level of divergence in each run, although the mutations identified don’t change. My guess is that this might be due to slight differences in the alignment, but it’s a very basic guess so please do correct me if it’s wrong.

Sorry for all the questions, and thanks in advance for any clarification!

Hi Josie - by default the units are substitutions per site. For our ncov datasets, however, we multiply this by the genome size to give overall mutations. This won’t always match a simple count of mutations relative to the root (Wuhan/Hu-1 or similar), because the underlying phylogenetic model accounts for reversions and multiple mutations at the same site. There is also some uncertainty in the phylogeny, so as the tree changes between runs, the divergence may change too. If the alignment had changed, you would instead see a different set of mutations relative to the root when clicking on a tip.
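The Python sketch below is a toy illustration of the points above. It is not Nextstrain, augur or TreeTime code. It shows two things. First, divergence accumulates along the branches from the root, so a reversion adds to it even though it leaves no difference from the root sequence. Second, placing the same tip differently in the tree can change its divergence while its mutations relative to the root stay the same. The trees and the placement of mutations are made up. Branch lengths here are simply mutation counts divided by the genome length (assumed to be 29,903 nt, the Wuhan-Hu-1 length). Real builds estimate them under a substitution model.

```python
# Toy illustration (not augur/TreeTime code): root-to-tip divergence versus a
# naive count of differences from the root sequence.

GENOME_LENGTH = 29903  # assumed Wuhan-Hu-1 reference length in nucleotides

# Each hypothetical tree maps child -> (parent, mutations on the branch into
# child). Mutations are made-up (site, from_base, to_base) tuples.
TREE_1 = {
    "internal_1": ("root", [(241, "C", "T"), (3037, "C", "T")]),
    # tip_B sits under internal_1 here, so it needs a reversion at site 3037
    "tip_B": ("internal_1", [(3037, "T", "C"), (14408, "C", "T")]),
}

# Alternative topology: the same tip hangs directly off the root.
TREE_2 = {
    "tip_B": ("root", [(241, "C", "T"), (14408, "C", "T")]),
}


def path_to_root(tree, node):
    """Return the branches (lists of mutations) from the root down to node."""
    branches = []
    while node != "root":
        parent, muts = tree[node]
        branches.append(muts)
        node = parent
    return list(reversed(branches))


def divergence_subs_per_site(tree, node):
    """Sum toy branch lengths (mutations / genome length) from root to node."""
    return sum(len(muts) / GENOME_LENGTH for muts in path_to_root(tree, node))


def divergence_mutations(tree, node):
    """Scale substitutions per site up by the genome length to get mutations."""
    return divergence_subs_per_site(tree, node) * GENOME_LENGTH


def differences_from_root(tree, node):
    """Naive count: sites whose final base differs from the root base."""
    root_base, final_base = {}, {}
    for muts in path_to_root(tree, node):
        for site, from_base, to_base in muts:
            root_base.setdefault(site, from_base)  # first base seen = root base
            final_base[site] = to_base             # last base seen = tip base
    return sum(final_base[s] != root_base[s] for s in final_base)


if __name__ == "__main__":
    for name, tree in [("TREE_1", TREE_1), ("TREE_2", TREE_2)]:
        print(
            name,
            "divergence:", round(divergence_mutations(tree, "tip_B"), 6),
            "differences from root:", differences_from_root(tree, "tip_B"),
        )
```

In both trees `tip_B` differs from the root at the same two sites (241 and 14408). Under `TREE_1`, however, the path to it also carries a mutation and a reversion at site 3037. Its divergence is therefore about 4 mutations there, compared with about 2 under `TREE_2`.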


Thanks for the clarification! This clears up why I wasn’t seeing the same figures.

Thank you (and thanks for such a great tool)!