How to interpret the entropy data

I have trouble interpreting the entropy data. When I first saw it I assumed it ranged from 0 to 1, but in my analyses I find higher values (1.1 / 1.2 / 1.3). I have read the posts on Nextstrain and some papers, but in general they all limit themselves to saying that entropy is a measure of diversity.

Screenshot attached.

Thanks for your time

Hi David,

Entropy is the Shannon entropy, measuring the “uncertainty” inherent in the possible nucleotides or codons at a given position.

Events represent a count of changes in the nucleotide or codon at that position across the (displayed) (sub-)tree. They rely on the ancestral state reconstruction to infer where these changes occurred within the tree.

(Docs are here, and the code which computes it is here.)
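For intuition, the per-position calculation can be sketched in a few lines. This is a minimal Python sketch of the idea (natural-log Shannon entropy over the states observed at one alignment column), not Auspice's actual implementation, which is in JavaScript:

```python
import math
from collections import Counter

def column_entropy(states):
    """Shannon entropy (natural log) of the states observed at one position.

    `states` is any iterable of per-tip characters, e.g. the nucleotides
    or amino acids seen at a single alignment column across all tips.
    """
    counts = Counter(states)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())

print(column_entropy("AAAA"))  # one state only -> 0.0
print(column_entropy("ACGT"))  # four equally common states -> ln(4) ~ 1.386
```

Note that with four equally common states the value is already well above 1, which is the behaviour being asked about in this thread.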

Hello James
Then should I assume there is an error in the calculation of my data?
Do I have to change the formula?

Thanks for your answer

We also have values > 1 (e.g.), so I’ll try to find time to take a closer look.

The Cross Validated question “Why am I getting information entropy greater than 1?” explains why entropy values > 1 occur: Shannon entropy is bounded by log(k) for k possible states, and with natural logarithms that bound exceeds 1 as soon as there are more than two states.
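A quick sketch of that bound, using natural logarithms (which match the numbers quoted in this thread). The helper name is mine, for illustration only:

```python
import math

def max_entropy(k):
    """Shannon entropy (natural log) of k equally likely states.

    This is the maximum achievable entropy for k states and equals log(k).
    """
    p = 1.0 / k
    return -sum(p * math.log(p) for _ in range(k))

print(max_entropy(2))  # ~0.693 -- two states can never exceed 1
print(max_entropy(3))  # ~1.099 -- already above 1
print(max_entropy(4))  # ~1.386 -- e.g. four nucleotides at one position
```

So a value of 1.1 or 1.3 is not a calculation error; it simply means three or more states are present at that position in comparable frequencies.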

As an example, looking at a recent nCoV build at spike position 371, we have an entropy of 1.056, which is the sum of the contributions −p·ln(p) from each of the 4 observed residues: R (1177 tips / 3199 total tips) contributes 0.36, H (1378 / 3199) contributes 0.36, L (1 / 3199) contributes 0.00252, and P (643 / 3199) contributes 0.322.
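Those per-residue terms can be reproduced directly from the tip counts quoted above (a quick check in Python, using natural logs):

```python
import math

# Tip counts per residue at spike position 371, taken from the example above.
counts = {"R": 1177, "H": 1378, "L": 1, "P": 643}
total = sum(counts.values())  # 3199 tips

# Each residue contributes -p * ln(p); the position's entropy is their sum.
terms = {aa: -(n / total) * math.log(n / total) for aa, n in counts.items()}
entropy = sum(terms.values())

for aa, term in terms.items():
    print(f"{aa}: {term:.5f}")
print(f"entropy: {entropy:.3f}")  # ~1.056, matching the build
```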

Thanks Thanks Thanks Thanks :slightly_smiling_face: