Make the Auspice entropy panel show mutations relative to an arbitrary reference sequence rather than the (reconstructed) root of the tree

From a user question

The entropy panel shows mutations with respect to the root of the tree. I would like to work around this and show mutations with respect to a reference sequence that is not at the root of the tree. How can I do this without forcing the tree to root to the reference?

This answer was originally written by @jlhudd

The mutations reflected in the entropy panel come from ancestral sequence reconstruction with TreeTime. Each internal node gets assigned a nucleotide sequence that maximizes the likelihood on the tree given its descendants and its parent node. Each node also gets assigned a list of nucleotide mutations based on any mismatches between that node’s sequence and its parent’s. The inferred sequences and mutations usually end up in a file named “nt_muts.json” in our workflows as output from the “augur ancestral” command. We then translate the inferred sequence for each internal node and the observed sequences for the tips, and assign amino acid mutations to each node relative to its immediate parent. These outputs appear in our workflows in a file named “aa_muts.json” as output from the “augur translate” command.

When you run the final “augur export v2” command that pulls all of this information into the main Auspice JSON, the nucleotide and amino acid mutations appear for each node in the tree under a JSON key called “branch_attrs” and then under another key called “mutations”, like in this example I’ve cut from a longer JSON:

...snip...
"branch_attrs": {
  "mutations": {
    "nuc": [
      "G199A",
      "C208T",
      "C302T",
      "G657T",
      "T899G",
      "C1271A",
      "A1541G",
      "A1688G"
    ],
    "HA1": [
      "S45N",
      "T48I",
      "N278K"
    ]
  }
}
...snip...

The logic in Auspice that produces the counts you see in the entropy panel walks through each node visible in the tree and counts how many nucleotide or amino acid mutations it finds at each node based on the data in the Auspice JSON. In the example snippet above, that node would increase the nucleotide mutation counts by 1 across 8 different positions and the amino acid counts by 1 across 3 positions.
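The counting described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Auspice’s actual code (Auspice is written in JavaScript and only counts nodes currently visible in the tree). The function name `count_mutations_per_position` is made up for this example, and it assumes each node is a dict with optional “branch_attrs” and “children” keys, as in the Auspice JSON tree.

```python
import re
from collections import Counter, defaultdict

# Matches mutation strings such as "G199A" or "S45N":
# one state, a 1-based position, one state.
MUTATION_PATTERN = re.compile(r"^(\S)(\d+)(\S)$")


def count_mutations_per_position(root):
    """Tally how many nodes carry a mutation at each position, per gene.

    Hypothetical helper: walks `root` and all of its descendants,
    without any notion of which nodes are visible.
    """
    counts = defaultdict(Counter)
    stack = [root]  # iterative traversal avoids recursion limits on big trees
    while stack:
        node = stack.pop()
        mutations = node.get("branch_attrs", {}).get("mutations", {})
        for gene, muts in mutations.items():
            for mut in muts:
                match = MUTATION_PATTERN.match(mut)
                if match:
                    counts[gene][int(match.group(2))] += 1
        stack.extend(node.get("children", []))
    return counts


# The node from the snippet above, with no children.
node = {
    "branch_attrs": {
        "mutations": {
            "nuc": ["G199A", "C208T", "C302T", "G657T",
                    "T899G", "C1271A", "A1541G", "A1688G"],
            "HA1": ["S45N", "T48I", "N278K"],
        }
    }
}

counts = count_mutations_per_position(node)
print(dict(counts["HA1"]))  # {45: 1, 48: 1, 278: 1}
```

For a full Auspice JSON you would pass the object stored under the top-level “tree” key as `root`.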

Since the entropy panel logic just looks for mutations stored in each node in a specific format, you could choose to generate your own nucleotide and/or amino acid mutations per node with a different approach. For example, you could post-process the “nt_muts.json” file from “augur ancestral” to calculate nucleotide mutations for each internal node’s inferred sequence relative to the reference sequence instead of its parent sequence and use a similar approach with the amino acid mutations. As long as your Auspice JSON stores the mutations in the expected format (like that shown above), the entropy panel will use those values. This should work in principle, although I have not tried it before, so it is possible other aspects of Auspice might not work as expected after modifying mutations like this.
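A rough sketch of that post-processing step might look like the following. It is untested against real Augur output and rests on assumptions: that the node data JSON has a top-level “nodes” key, that each node stores its full sequence under “sequence” and its mutations under “muts”, and that the reference is already aligned to the same coordinates as those sequences. The function names and the toy data are invented for this example.

```python
def mutations_relative_to(reference, sequence, ignore="N-"):
    """List mismatches between `sequence` and `reference` as e.g. "C2T".

    Positions are 1-based. Sites where either sequence has a character in
    `ignore` are skipped; that is a choice made for this sketch, not
    something Augur or Auspice prescribes.
    """
    if len(reference) != len(sequence):
        raise ValueError("reference and sequence must have the same length")
    muts = []
    for position, (ref_state, state) in enumerate(zip(reference, sequence), start=1):
        if ref_state != state and ref_state not in ignore and state not in ignore:
            muts.append(f"{ref_state}{position}{state}")
    return muts


def recompute_node_mutations(node_data, reference):
    """Return a copy of `node_data` with "muts" recomputed against `reference`.

    Assumes the layout {"nodes": {name: {"sequence": ..., "muts": [...]}}}.
    Nodes without a "sequence" entry are copied unchanged.
    """
    updated = {"nodes": {}}
    for name, attrs in node_data["nodes"].items():
        new_attrs = dict(attrs)
        if "sequence" in attrs:
            new_attrs["muts"] = mutations_relative_to(reference, attrs["sequence"])
        updated["nodes"][name] = new_attrs
    return updated


# Toy data: the reference differs from the root at position 2.
reference = "ACGTACGT"
toy_node_data = {
    "nodes": {
        "NODE_0000000": {"sequence": "ATGTACGT", "muts": []},
        "tip_A": {"sequence": "ATGTACGA", "muts": ["T8A"]},
        "tip_B": {"sequence": "ATGTNCGT", "muts": ["A5N"]},
    }
}

updated = recompute_node_mutations(toy_node_data, reference)
print(updated["nodes"]["tip_A"]["muts"])  # ['C2T', 'T8A']
```

Note that with mutations rewritten this way, the per-position counts in the panel would reflect how many nodes differ from the reference at that position, not how many times a mutation arose on the tree.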

The entropy panel is independent of which node is considered the root. It shows either the entropy of alignment columns or the number of mutations at each position.
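For the entropy view, the quantity per alignment column is the Shannon entropy of the states observed in that column. Here is a minimal sketch, assuming natural logarithms and a plain list of aligned sequences as input; the details of how Auspice selects tips and handles gaps or ambiguous characters are not covered, and the function names are made up for this example.

```python
import math
from collections import Counter


def column_entropy(states):
    """Shannon entropy (natural log) of the states in one alignment column."""
    counts = Counter(states)
    total = sum(counts.values())
    # Sum of p * log(1/p) over the observed states.
    return sum((n / total) * math.log(total / n) for n in counts.values())


def alignment_entropy(alignment):
    """Entropy for every column of a list of equal-length aligned sequences."""
    return [column_entropy(column) for column in zip(*alignment)]


alignment = ["ACGT", "ACGA", "ATGA", "ACGT"]
entropies = alignment_entropy(alignment)
# Columns 1 and 3 are invariant, so their entropy is 0.
# Column 4 is split evenly between T and A, so its entropy is log(2).
print(entropies)
```

Because this only depends on the states at the tips, it does not change when the tree is rerooted.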