SARS-CoV-2 Mutation Data

My name is Pat Foster and I am retired from the Biology Department at Indiana University, Bloomington, IN, USA. I am a microbiologist interested in mutational processes and their consequences. In particular, for the last several years I have been collecting mutational data from E. coli and have found that mutations can be predicted from sequence context. I would like to look at the SARS-CoV-2 mutation database to see if there are similar sequence biases. Where can I get the data with all the mutations to date?

Hi @plfoster! Thanks for reaching out! We don’t have a way to provide the file with all the mutations directly. However, one could parse this out of the existing data.

All of our data comes from GISAID, and if you sign up for an account there, you should be able to download the sequences. One could either parse the mutations from these sequences or, if you run our ncov pipeline, generate intermediate files which contain the mutations (for example, nt_muts.json). It’s worth noting that these mutations are recorded along the tree, so to get the total number of mutations in a sequence, you’d need to walk along each branch of the tree and ‘count up’ the mutations leading to a tip/sequence.
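To illustrate the ‘count up’ step, here is a minimal sketch in Python. It does not read nt_muts.json directly: the `parent` and `branch_muts` dictionaries and the function name `mutations_to_tip` are hypothetical, and you would need to build those two dictionaries yourself from the tree and the per-branch mutation lists that the pipeline writes out.

```python
from typing import Dict, List, Optional


def mutations_to_tip(
    tip: str,
    parent: Dict[str, Optional[str]],
    branch_muts: Dict[str, List[str]],
) -> List[str]:
    """Collect the mutations on every branch from the root down to `tip`.

    `parent` maps each node name to its parent (None for the root).
    `branch_muts` maps each node name to the mutations on the branch
    leading to that node. Both structures are assumptions for this sketch.
    """
    # Walk upwards from the tip to the root, remembering the path.
    path = []
    node: Optional[str] = tip
    while node is not None:
        path.append(node)
        node = parent.get(node)

    # Replay the path root-first so mutations come out in the order they arose.
    muts: List[str] = []
    for name in reversed(path):
        muts.extend(branch_muts.get(name, []))
    return muts


# Toy tree: root -> NODE_1 -> tipA, and root -> tipB.
parent = {"root": None, "NODE_1": "root", "tipA": "NODE_1", "tipB": "root"}
branch_muts = {
    "root": [],
    "NODE_1": ["C241T", "A23403G"],
    "tipA": ["G28881A"],
    "tipB": ["C3037T"],
}

tip_a_muts = mutations_to_tip("tipA", parent, branch_muts)
print(len(tip_a_muts), tip_a_muts)
```

This counts every change along the path, so a site that mutates and later reverts is counted twice. If you want the net difference from the root instead, you would need to track the state at each position.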

If you use the ncov build, it’s also worth noting that our builds are by default subsamples of the total dataset, so you wouldn’t have data for the whole dataset!

We provide the data behind the trees shown at via GISAID (the terms of sequence sharing don’t allow us to share this data directly). If you sign up, you can then go to as shown here and download “nextregions” and then “Global” to get a JSON file that lists all the mutations observed on the tree.
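Once you have such a JSON file, a tally of substitution types is a reasonable first look at sequence biases. The sketch below assumes the file follows the nested layout used by Nextstrain dataset JSONs, where each node has a `children` list and its mutations sit under `branch_attrs` → `mutations` → `nuc` as strings like `C241T`. Please check this against the file you actually download; the function name `tally_substitutions` and the toy tree are made up for illustration.

```python
from collections import Counter


def tally_substitutions(tree: dict) -> Counter:
    """Count nucleotide substitutions by type (e.g. 'C>T') over all branches.

    Assumes each node is a dict with an optional 'children' list and
    optional mutations under node['branch_attrs']['mutations']['nuc'].
    """
    counts: Counter = Counter()
    stack = [tree]  # explicit stack, so deep trees do not hit the recursion limit
    while stack:
        node = stack.pop()
        muts = node.get("branch_attrs", {}).get("mutations", {}).get("nuc", [])
        for mut in muts:
            ref, alt = mut[0], mut[-1]
            # Skip gaps and ambiguous bases; keep plain A/C/G/T changes only.
            if ref in "ACGT" and alt in "ACGT":
                counts[f"{ref}>{alt}"] += 1
        stack.extend(node.get("children", []))
    return counts


# Toy tree in the assumed layout (not real data).
toy_tree = {
    "name": "root",
    "children": [
        {
            "name": "NODE_1",
            "branch_attrs": {"mutations": {"nuc": ["C241T", "A23403G"]}},
            "children": [
                {
                    "name": "tipA",
                    "branch_attrs": {"mutations": {"nuc": ["C3037T", "T100-"]}},
                }
            ],
        },
        {"name": "tipB", "branch_attrs": {"mutations": {"nuc": ["G28881A"]}}},
    ],
}

counts = tally_substitutions(toy_tree)
print(counts.most_common())
```

For a downloaded file you would load it with `json.load` and pass in the top-level tree object (assumed here to live under a `"tree"` key). Because each mutation is counted once per branch where it arose, this tallies mutation events on the tree rather than how many sequences carry them.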

From a vaccine point of view, can anyone help me understand how many variants there are in the SARS-CoV-2 phylogeny?
All the vaccines, except the inactivated vaccines, target the Spike protein as the antigen to induce antibodies. I can see there is an entropy peak. What does it mean?