Adding a new category vs topic? Entropy/Events rate of evolution


As a new user, I am not allowed to create new categories. I think a discussion of rates of evolution, entropy, and events might warrant its own category, rather than just a new topic in General or another category. I am not sure whether a moderator can move this post, but I will explain more here anyway; if it can’t be moved, I can copy/paste or rewrite it later under that category if it is created.

As explained to a new user in the introductions discussion, the code for calculating entropy and events is in the Augur package. I would like to suggest adding another measurement that could be equally or more interesting to see: the observed rate of evolution. I am aware of at least two programs that will output the rate of evolution per site (column) in a multiple sequence alignment: Gary Olsen’s DNArates program and the IQ-TREE program. Also, the ConSurf database has calculated rates for proteins with known 3D structures and provides a view of the 3D structure with each amino acid colored by rate of evolution.

Given that this would be a modification or addition to Augur, rather than Nextstrain, it may be better to discuss it there on GitHub? I think it would be good to discuss it here, because many of us are just users of Nextstrain and not programmers with GitHub accounts.

The point is that rates of evolution are related to entropy, but they are not the same thing: a site that is 50% Ala and 50% Val in an alignment will have the same entropy whether that site mutated once, very early in the evolution (near the root of the tree), or has mutated many times back and forth between Ala and Val. So the “events” plot is more like a rate of evolution, but I am not sure it is the same.
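To make that distinction concrete, here is a minimal sketch in Python. It is not Augur's code: the two toy trees, the function names, and the use of Fitch parsimony to count changes are all my own assumptions for illustration, and the log base (2, giving bits) is also just a choice. Both trees have 4 Ala (A) and 4 Val (V) tips, so the column entropy is identical, but the minimum number of changes needed on the tree differs.

```python
import math
from collections import Counter

def tips(node):
    """Flatten a nested-tuple tree into its list of tip states."""
    if isinstance(node, str):
        return [node]
    return [t for child in node for t in tips(child)]

def shannon_entropy(column):
    """Shannon entropy (in bits) of one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def fitch(node):
    """Return (possible_states, minimum_changes) for a binary tree.

    Fitch parsimony is used here only as a simple stand-in for
    counting mutation events on a tree.
    """
    if isinstance(node, str):
        return {node}, 0
    left, right = node
    left_states, left_changes = fitch(left)
    right_states, right_changes = fitch(right)
    shared = left_states & right_states
    if shared:
        return shared, left_changes + right_changes
    return left_states | right_states, left_changes + right_changes + 1

# Hypothetical tree 1: one change near the root splits the tree into A and V halves.
early_tree = ((("A", "A"), ("A", "A")), (("V", "V"), ("V", "V")))

# Hypothetical tree 2: the same 4 A + 4 V tips, but every pair of sister tips differs.
repeated_tree = ((("A", "V"), ("A", "V")), (("A", "V"), ("A", "V")))

for name, tree in [("early", early_tree), ("repeated", repeated_tree)]:
    entropy = shannon_entropy(tips(tree))
    changes = fitch(tree)[1]
    print(f"{name}: entropy = {entropy:.3f} bits, minimum changes = {changes}")
```

Both trees give an entropy of 1 bit, while the minimum number of changes is 1 for the first tree and 4 for the second. A per-site rate estimated by a program like DNArates or IQ-TREE would also take branch lengths into account, which this sketch ignores.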

Poking around on a few trees now, I see that if I double-click on a branch of a tree, the view not only zooms in to that branch, but the entropy and events plots also become specific to that branch of the tree.

I am attaching an example of a site that seems to have an unbelievable rate of evolution, the spike codon 484 in SARS-CoV-2.

Surely by any measurement of diversity or mutation rate, this codon will stand out as being highly selected for change in SARS-CoV-2, and several publications have shown why it is an important amino acid in the spike glycoprotein.

The HIV-1 M group Envelope glycoprotein is noted for its high diversity, especially in what are known as the V1-V2 and V4 variable loops. Our database provides tools to analyze how the evolution of the virus takes place within and between the subtypes (clades, or lineages within the HIV-1 M group).

And we have prebuilt multiple sequence alignments for the HIV genomes and genes available.

I just found more discussion of entropy and events here: