I would like to know how Shannon entropy is calculated for each codon in Nextstrain: what formula is used, and what characteristics is it based on? In other words, how is the uncertainty of each codon calculated?
For example, is it based on the number of mutations in each codon?
It’s normalized Shannon entropy of the valid nucleotides / codons at each site. Code is here if you’d like to explore more.


Can you download ncov_global.json in and see it in
For example, for codon 220 in protein N, the number of mutations (events) is 1 and the entropy of this codon is 0.690. But when I calculated the entropy of this codon with the formula that you sent, the result was 0.055. Why?

@Mahan.iz Hi, you have 47 sequences with the mutation; the remaining 102 - 47 = 55 sequences don't have it. When picking a sequence uniformly at random, the probability that it has the mutation is p = 47/102,

and the entropy is -p*ln(p)-(1-p)*ln(1-p) = 0.69006827928
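If it helps, the same number can be reproduced with a few lines of Python. This is only a minimal sketch of the formula above, not the actual Nextstrain code: the function name `shannon_entropy` and the labels `"mutant"` / `"reference"` are made up for illustration, and the natural logarithm is assumed, as in the formula above.

```python
import math
from collections import Counter


def shannon_entropy(states):
    """Shannon entropy (natural log) of the states observed at one site.

    `states` is any iterable with one entry per sequence, e.g. the codon
    or amino acid each sequence carries at that position.
    """
    counts = Counter(states)
    total = sum(counts.values())
    # Convert counts to frequencies, then sum -p * ln(p) over observed states.
    return -sum((n / total) * math.log(n / total) for n in counts.values())


# Hypothetical reconstruction of the site discussed above:
# 47 of 102 sequences carry the mutation, the other 55 do not.
site = ["mutant"] * 47 + ["reference"] * 55
print(shannon_entropy(site))
```

The point is that the entropy depends on how many sequences carry each state at the site, not on the number of mutation events on the tree, which is why a site with a single event can still have an entropy close to ln(2).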

