I am new to Nextstrain. While looking around, I found the “show entropy” button at https://nextstrain.org/ncov/global. From trying it out, I have a rough idea of what it corresponds to. I know that entropy in communication systems roughly quantifies the disorder of a system. People usually talk about the entropy of a probability distribution, for which there is a formula; it is maximal for a uniform distribution, meaning that no information is given and every outcome is equally likely. What confuses me is how this applies here. What is the definition of entropy in this panel, or is there one?

The graph shows the entropy of an alignment column and measures the diversity of the viruses at a particular position in their genome. If, for example, a fraction `p` had base `A` and `1-p` base `G` at position `x`, the entropy would be `-p log(p) - (1-p) log(1-p)`. If there are more than two states, this generalizes to `-sum(p_i log(p_i))` where `i` runs over all states at the position.
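As a rough illustration of the formula, here is a minimal Python sketch that computes the entropy of a single alignment column. The function name `column_entropy` and the toy columns are made up for this example, and the use of the natural logarithm is an assumption (the formula above does not state a base); this is not Nextstrain's own code.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy of one alignment column (natural log, an assumption).

    `column` is any iterable of states, e.g. the bases observed at one
    genome position across all sequences.
    """
    counts = Counter(column)
    total = sum(counts.values())
    # Sum -p_i * log(p_i) over every state i observed at this position.
    return sum(-(n / total) * math.log(n / total) for n in counts.values())


# Hypothetical toy columns, one character per sequence.
conserved = "AAAA"   # every sequence has A: no diversity
two_states = "AAGG"  # p = 0.5 for A and for G
four_states = "ACGT"  # all four bases equally frequent

print(column_entropy(conserved))
print(column_entropy(two_states))
print(column_entropy(four_states))
```

A fully conserved column gives an entropy of zero, and the value grows as the states at the position become more evenly mixed, which matches the maximum-at-uniform behaviour mentioned in the question.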

Dear Richard,

I just joined this community and actually have a question which is somewhat related to the question by emtf.

My interest is in the number that is termed “event”.

I am interested in comparing the presence and absence of mutations at certain locations in the genome and want to use your database to do it. Let us just say that the locations I am interested in are nt 222, nt 224 and nt 23403.

I have selected the events and the nt in the diversity panel on your website (https://nextstrain.org/ncov/global) and downloaded the genetic diversity data (TSV). That file gives the locations and a number called “event” for each location.

For example in today’s data set the event numbers for

222 is 11

224 is not mentioned, which I take to mean that no mutations were ever reported at that location.

23403 is 7.

Since 23403 is a very frequent mutation site, the “event” number is clearly not simply the frequency of mutations at that location. Would you please explain what the number “event” means? If there is an article that explains it, I would appreciate it if you could point me in the right direction.

Hey Agnes - **Events** represent a count of changes in the nucleotide or codon at that position across the (displayed) (sub-)tree. They rely on the ancestral state reconstruction to infer where these changes occurred within the tree.

(somewhat hidden in the docs… https://docs.nextstrain.org/en/latest/guides/share/download-data.html#diversity-entropy-data)
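To make the difference between events and mutation frequency concrete, here is a minimal Python sketch that counts state changes along the branches of a toy tree. The tree layout, the node names, and the function `count_events` are all hypothetical, and the states at internal nodes are simply given rather than reconstructed; it only illustrates the idea described above and is not how Nextstrain implements it.

```python
def count_events(children, state, root):
    """Count branches on which the state at one position changes.

    children: dict mapping a node name to a list of its child node names
    state:    dict mapping every node name to its (inferred) state
    root:     name of the root node
    """
    events = 0
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            # A change between parent and child counts as one event.
            if state[child] != state[node]:
                events += 1
            stack.append(child)
    return events


# Hypothetical toy tree: one A->G change on the branch leading to n1,
# which is then inherited by three of the four tips.
children = {"root": ["n1", "t4"], "n1": ["t1", "t2", "t3"]}
state = {"root": "A", "n1": "G", "t1": "G", "t2": "G", "t3": "G", "t4": "A"}

print(count_events(children, state, "root"))
```

In this toy tree three of the four tips carry `G`, yet there is only a single event, because the change happened once on an internal branch and was inherited. That is one way a site where the mutation is very common in the sampled sequences can still have a small event count.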