# How is Shannon entropy calculated per codon?

I would like to know how Shannon entropy is calculated for each codon in Nextstrain: which formula is used, and which characteristics of the data it is based on. In other words, how is the uncertainty at each codon calculated?
For example, is it based on the number of mutations in each codon?


It’s normalized Shannon entropy of the valid nucleotides / codons at each site. Code is here if you’d like to explore more.


Thanks.
Could you download `ncov_global.json` from https://github.com/Developercovid/auspise.git and view it in https://auspice.us/?
For example, for `Codon 220` in `protein N`, the number of mutations (Events) is `1` and the entropy of this codon is `0.690`. But when I calculated the entropy for this codon with the formula you sent, the result was `0.055`. Why?

@Mahan.iz I would also like to know the formula for this. Thanks for sharing this information; the link was very helpful.

@Mahan.iz Hi, you have 102 sequences in total: 47 have the mutation and the remaining 102 - 47 = 55 don't. If you pick a sequence uniformly at random, the probability that it has the mutation is p = 47/102

and the entropy is -p*ln(p)-(1-p)*ln(1-p) = 0.69006827928
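As a quick check of the arithmetic above, here is a minimal Python sketch. It is not the Nextstrain/Auspice code; the function name `shannon_entropy` and the list-of-counts input are my own choices for illustration. It uses the natural logarithm, as in the formula above, and takes the number of sequences in each state at the site (here 47 with the mutation and 55 without).

```python
import math

def shannon_entropy(counts):
    """Shannon entropy (natural log) from the number of sequences in each state.

    Hypothetical helper for illustration; not taken from the Nextstrain code base.
    """
    total = sum(counts)
    entropy = 0.0
    for count in counts:
        if count > 0:  # a state with zero sequences contributes nothing
            p = count / total
            entropy -= p * math.log(p)
    return entropy

# Counts from this thread: 47 of 102 sequences carry the mutation.
by_sequences = shannon_entropy([47, 102 - 47])
print(f"{by_sequences:.3f}")  # 0.690

# Using the number of mutation events (1) instead of the number of sequences.
by_events = shannon_entropy([1, 102 - 1])
print(f"{by_events:.3f}")  # 0.055
```

The second call gives 0.055, so that value may have come from plugging the event count (1) into the formula as if it were the number of sequences carrying the mutation; this is a guess on my part. The entropy depends on how many sequences carry each state at the site, not on how many times the mutation arose on the tree.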
