Hello
I have problems interpreting the entropy data. When I first saw it I assumed it ranges from 0 to 1, but in my analyses I find higher values (1.1, 1.2, 1.3). I have read the posts on Nextstrain and some papers, but in general they all limit themselves to saying that entropy is a measure of diversity.
Entropy is Shannon entropy computed from the normalized counts of the possible nucleotides or codons at a given position, measuring the “uncertainty” inherent in that position.
Events represent a count of changes in the nucleotide or codon at that position across the (displayed) (sub-)tree. They rely on the ancestral state reconstruction to infer where these changes occurred within the tree.
(The docs are here and the code which computes it is here.)
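If it helps to make the "events" idea concrete, here is a minimal sketch, not the actual Auspice code: the tree structure and the `count_events` name are made up for illustration. Once ancestral states have been reconstructed, events are just the branches along which the state at the position changes:

```python
def count_events(node, parent_state=None):
    """Count state changes at one position across a (sub-)tree.

    `node` is assumed to be a dict like {"state": "R", "children": [...]},
    where "state" is the reconstructed nucleotide/codon at this position.
    """
    events = 0
    state = node["state"]
    if parent_state is not None and state != parent_state:
        events += 1  # a change occurred on the branch leading to this node
    for child in node.get("children", []):
        events += count_events(child, state)
    return events

# Toy tree: root is R; one subtree changes to H, and one of its tips changes to L.
tree = {
    "state": "R",
    "children": [
        {"state": "R", "children": []},
        {"state": "H", "children": [
            {"state": "H", "children": []},
            {"state": "L", "children": []},
        ]},
    ],
}
print(count_events(tree))  # 2 events: R->H and H->L
```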
As an example, looking at a recent nCoV build at spike position 371 we have an entropy of 1.056, which is the sum of the per-residue terms for each of the 4 observed residues:

- R (1177 of 3199 total tips): 0.368
- H (1378 / 3199): 0.363
- L (1 / 3199): 0.00252
- P (643 / 3199): 0.322
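For anyone who wants to check the arithmetic, here is a quick sketch reproducing that sum. The counts are from the example above; the use of the natural log is an assumption, but it matches the reported per-residue values:

```python
import math

# Counts of each observed residue at spike 371 in the example above.
counts = {"R": 1177, "H": 1378, "L": 1, "P": 643}
total = sum(counts.values())  # 3199 tips

# Each term is -p * ln(p), where p is the residue's frequency among tips.
terms = {aa: -(n / total) * math.log(n / total) for aa, n in counts.items()}
for aa, term in terms.items():
    print(f"{aa}: {term:.3g}")  # R: 0.368, H: 0.363, L: 0.00252, P: 0.322
print(f"entropy = {sum(terms.values()):.3f}")  # entropy = 1.056
```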
Hi James. Is each of these entropy values normalized and then added together (meaning the sum can be >1)? Or how are these entropy values normalized? Also, are they normalized to the whole genome / the protein-coding regions of the genome, or to the ORF they come from? Thanks!
Hi @mathissweet - each position (AA or nt) is computed independently. For a given position, the count of each observed residue/nuc is normalized by the number of (visible) tips, the entropy term for that residue/nuc is calculated, and we report the sum of these terms for that position. Since the natural log is used, the maximum possible value is ln(k) for k observed states (e.g. ln(4) ≈ 1.386 for four residues), which is why sums above 1 can occur. Code here if that helps.
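Put as a sketch of that description (again not the actual Auspice code; `position_entropy` and the visibility mask are made up for illustration):

```python
import math

def position_entropy(column, visible=None):
    """Entropy of one alignment column.

    `column` is a sequence of residues/nucs, one per tip; `visible`
    optionally masks which tips are currently shown in the tree.
    """
    if visible is not None:
        column = [state for state, shown in zip(column, visible) if shown]
    total = len(column)  # number of (visible) tips
    counts = {}
    for state in column:
        counts[state] = counts.get(state, 0) + 1
    # Normalize each count by the number of visible tips, then sum -p*ln(p).
    return sum(-(n / total) * math.log(n / total) for n in counts.values())

print(position_entropy(["R"] * 1177 + ["H"] * 1378 + ["L"] + ["P"] * 643))
# ~1.056, matching the spike 371 example
```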