Trends in the prevalence of "private mutations"

Hi All. I have an interest in the evolution of respiratory viruses - my co-author and I published a review article in Reviews in Medical Virology last year.

I have a question about the SARS-CoV-2 strains that are excluded from Nextstrain because they have too many private mutations. I’m interested to know if the proportion of excluded strains increases during surges in C-19, and whether it decreases in the lulls.

The reason I’m interested in this is explained in the thread below - originally a Twitter thread.

I would also consider writing an article on this if anyone is interested.

All comments and constructive criticism would be much appreciated - please let me know if I’m missing something important!

Thx Patrick

We’re told that the genome of CoV-2 is so large it needs a proofreading function. Large RNA viruses are said to live on the edge of ‘error catastrophe’, where a small increase in mutation rate would drastically decrease the odds of stable replication

But what does the evolution of the proofreading mechanism itself look like? That’s not obvious

We know the CoV-2 proofreading function reduces the mutation rate by a factor of ~20. It includes a protein called ExoN/NSP14.
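To put rough numbers on the "edge of error catastrophe" idea, here is a minimal arithmetic sketch. The ~20-fold factor comes from the thread and the genome length is the approximate size of the CoV-2 genome, but the per-site error rate `MU_WITH_EXON` is a made-up illustrative value, not a measurement.

```python
GENOME_LENGTH = 30_000        # approximate SARS-CoV-2 genome length (nucleotides)
PROOFREADING_FACTOR = 20      # fold reduction in mutation rate quoted above
MU_WITH_EXON = 1e-6           # HYPOTHETICAL per-site, per-copy error rate


def expected_mutations(mu, length=GENOME_LENGTH):
    """Expected number of new mutations in one copy of the genome."""
    return mu * length


def prob_error_free(mu, length=GENOME_LENGTH):
    """Probability that one copy of the genome carries no new mutation."""
    return (1.0 - mu) ** length


scenarios = [
    ("with proofreading", MU_WITH_EXON),
    ("without proofreading", MU_WITH_EXON * PROOFREADING_FACTOR),
]
for label, mu in scenarios:
    print(
        f"{label}: {expected_mutations(mu):.2f} mutations per copy, "
        f"P(error-free copy) = {prob_error_free(mu):.2f}"
    )
```

Whatever the true per-site rate is, the point of the sketch is that a 20-fold change in it moves the expected mutation count per copy by the same factor, and the chance of an error-free copy falls accordingly.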

But some mutations still get through - so what happens if a mutation lands in eg ExoN itself? That’s likely to increase the mutation rate, including more mutations landing in ExoN

On the other hand, it seems likely that a higher mutation rate would benefit the virus in the short-term. There’s no shortage of mutations within each infected human host, but many fewer mutations are likely to be breathed out and actually make the journey to another host

(Bear in mind also that mutations that increase the chance of transmission might not be selected within the host - in fact the opposite may well be the case – many can only be selected during transmission)

So, with more mutations, it seems likely that our low-fidelity mutant could adapt more quickly to new opportunities than the regular strains

So - the first problem is to explain how this proof-reading mechanism could have evolved and how it could be maintained.

An analogy that’s used is cancer. Cancer cells are favored by “natural selection” in the short-term, but they have no long-term future. It’s a bit like a human situation where you might say “a few selfish people are going to spoil this for everyone”

So how does this work and what would the outward appearance of a low-fidelity mutant be during an epidemic?

Presumably it would generate a surge of cases, outcompeting the high-fidelity strains. However, it would then be unable to stop mutating, and (a bit like in cancer) mutations would accumulate in vital viral functions, so the surge would be followed by a collapse


At that point the hi-fi strains would once again be selected, and we would be back with a more stable strain, maybe now with one or two beneficial new mutations (the hi-fi strains still evolve, slowly)
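The "surge and collapse" argument above can be sketched as a toy two-strain model. This is only an illustration under strong assumptions: the lo-fi strain starts with a fixed short-term advantage, and its fitness is eroded by a constant factor each generation to stand in for accumulating damage. The function name and all parameter values (`advantage`, `decay`, `initial_lofi`) are hypothetical, and the model tracks strain frequency, not case counts.

```python
def simulate_lofi_frequency(generations=120, advantage=0.5, decay=0.01,
                            initial_lofi=0.01):
    """Return the lo-fi strain frequency after each generation (toy model)."""
    freq = initial_lofi
    history = [freq]
    for t in range(generations):
        # Lo-fi fitness relative to hi-fi (fixed at 1.0): a short-term
        # advantage that shrinks as deleterious mutations accumulate.
        w_lofi = (1.0 + advantage) * (1.0 - decay) ** t
        w_hifi = 1.0
        mean_fitness = freq * w_lofi + (1.0 - freq) * w_hifi
        freq = freq * w_lofi / mean_fitness
        history.append(freq)
    return history


history = simulate_lofi_frequency()
peak = max(history)
print(f"start: {history[0]:.3f}")
print(f"peak:  {peak:.3f} at generation {history.index(peak)}")
print(f"end:   {history[-1]:.3g}")
```

With these made-up parameters the lo-fi strain first takes over and is then displaced again by the hi-fi strain. Whether real epidemics behave anything like this is exactly the open question.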

I can’t see how these “cancerous” surges can be avoided, and I can’t see how the resulting cases profile can look like anything other than repeated cycles of “surge and collapse”

It just happens this is exactly what we see with C19 – impressive & unpredictable surges, followed by very rapid peaking & collapse. (I know there’re other explanations **see below.) Here⬇️are cases in India & S Africa – countries where control measures/lockdowns may be difficult

So here’s the problem: if the mutation rate is fluctuating, why don’t we see an increased number of mutations during surges⬇️, and why doesn’t the number of mutations fall back when the hi-fi strains come through at the end of the surge?

Amazingly, there is a possible answer to this question: NextStrain has a policy of excluding any strain with an unusual number of what they call “private mutations” – those that differ between the query sequence and the nearest neighbor sequence
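For readers unfamiliar with the term, here is a simplified sketch of counting "private mutations" as defined above: sites where the query differs from its nearest neighbor. It assumes pre-aligned sequences of equal length and picks the nearest neighbor by plain mismatch count, which is a simplification and not how the Nextstrain tooling actually places sequences. The function names and the toy sequences are hypothetical.

```python
def differing_sites(a, b):
    """0-based positions where two aligned sequences differ (A/C/G/T only)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        i for i, (x, y) in enumerate(zip(a, b))
        if x != y and x in "ACGT" and y in "ACGT"  # skip N and gap characters
    ]


def private_mutations(query, references):
    """Return (nearest reference name, positions private to the query)."""
    nearest = min(
        references,
        key=lambda name: len(differing_sites(query, references[name])),
    )
    return nearest, differing_sites(query, references[nearest])


# Toy, made-up sequences purely for illustration.
references = {"ref_A": "ACGTACGTAC", "ref_B": "ACGTTCGTAC"}
query = "ACGTTCGAAN"

nearest, private = private_mutations(query, references)
print(nearest, private)
```

A quality filter of the kind described would then flag the query if `len(private)` exceeded some threshold; the threshold itself is not given in this thread.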

So here’s my prediction: the proportion of sequences excluded by NextStrain goes up during surges, and falls in the lulls.
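One way this prediction could be checked, sketched under assumptions: take weekly counts of submitted and excluded sequences alongside weekly case counts, compute the excluded proportion, and see whether it tracks cases. All numbers below are invented placeholders, and the variable and function names are hypothetical; real counts would have to come from the sequence metadata and case data.

```python
from math import sqrt


def excluded_proportion(excluded, total):
    """Weekly proportion of sequences excluded for too many private mutations."""
    return [e / t for e, t in zip(excluded, total)]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# MADE-UP weekly numbers, for illustration only.
weekly_cases = [100, 400, 900, 500, 150, 120]
sequences_total = [50, 80, 120, 90, 60, 50]
sequences_excluded = [1, 4, 10, 5, 1, 1]

proportions = excluded_proportion(sequences_excluded, sequences_total)
r = pearson(weekly_cases, proportions)
print("weekly excluded proportion:", [round(p, 3) for p in proportions])
print("correlation with cases:", round(r, 3))
```

A positive correlation in real data would be consistent with the prediction but would not by itself establish the mechanism, since other factors could also change during surges.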

**BTW I know there are other explanations of the extraordinary Covid peaks and monotonic (continuous) falls, such as increasing immunity, behavioural changes in hosts, and non-linear percolation effects

However, IMO these explanations can’t work well: can we believe that immunity and behaviour changed so dramatically that R in South Africa fell from > 1.2 (pink band) to < 0.8 (blue band) in 11 days? We need something really powerful to explain these reversals

And bear in mind that immunity was not high after this particular surge - it was followed by three similar surges in the following months/years!

I was only allowed one picture. I wanted to show this

Also this

One more

Well, 65 people have looked at the thread, but no replies after 12 days. This is very important - doesn’t anyone have thoughts on this?

I’ve been thinking about the conventional explanations of these extraordinary surges and collapses

(1) Increasing immunity cannot plausibly explain why cases collapse at the end of e.g. the first surge in South Africa - when it was followed by several larger surges

(2) Behavioural changes are not well-correlated with cases. For example in December 2021 cases in South Africa first surged, then collapsed, although mobility was increasing during the collapse⬇️

(3) Non-linear percolation effects can explain some of these peaks, but it’s hard to see how they could routinely generate monotonic rises followed by monotonic falls. In an extreme case of non-linearity I’d expect to see curves more like this (based on data from the London Stock Exchange) ⬇️

I think we need a biological mechanism to explain these extraordinary surges and collapses⬇️


Thanks for sharing these thoughts. The challenge I see with testing this idea is that whenever a new variant pops up, there tend to be problems with sequencing quality, as the amplification schemes don’t work as well for the new variant. This generates a lot of noise that will be hard to tease apart from a biological effect.


Thank you for your very interesting reply.

So one interpretation is that scientists have already seen the effect I’m postulating but have misinterpreted it.

I know very little about sequencing, but once primers etc. have been sorted out it should be possible to go back to early samples and see whether the “noise” really was caused by poor amplification or reflected real mutations in the actual sequences.

Is the point here that more rounds of amplification tend to be needed for new variants? If so I guess a rough calculation based on the error rate of PCR could be done to see if stuttering in the early cycles of amplification can explain the extra mutations seen.
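The rough calculation suggested above might look like the following sketch. It assumes the simplest possible model, in which each base has a fixed chance of being miscopied in every cycle, so the expected number of polymerase errors on one final molecule is error rate × amplicon length × cycles. The error rate, amplicon length, and cycle counts are hypothetical placeholders, not values from any protocol.

```python
def expected_pcr_errors(error_rate, amplicon_length, cycles):
    """Expected polymerase errors accumulated on a single final molecule.

    Simplest-case model: every base has the same chance of being miscopied
    in every cycle, and errors are never corrected.
    """
    return error_rate * amplicon_length * cycles


# HYPOTHETICAL values; substitute figures for the actual enzyme and protocol.
ERROR_RATE = 1e-5        # errors per base per cycle
AMPLICON_LENGTH = 400    # bases per amplicon

for cycles in (25, 35, 45):
    errors = expected_pcr_errors(ERROR_RATE, AMPLICON_LENGTH, cycles)
    print(f"{cycles} cycles: {errors:.3f} expected errors per molecule")
```

This is a per-molecule figure. A consensus sequence is built from many molecules, so an error would probably need to arise in the earliest cycles from very little starting template to show up there, which is why the early-cycle case is the one worth checking.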

They’re not the most dramatic peaks, but the USA and Israel ⬇️ in the middle of Delta ⬇️ might contain a lot of low-fidelity strains.

Presumably your collaborators who provide the sequence data understand your policy of excluding noisy-looking sequences, so they probably don’t submit strains with many private mutations in the first place. In that case we may need to ask the people in the lab who generate or select the sequences for submission whether they have noticed this correlation.

I notice that a lot of the outliers at the top of the divergence plot come from far-flung places such as Mauritania, Bolivia, Kosovo. Is there a reason for that?

[I deleted the wrong post - never mind]

I’ve just realized what might have happened to SARS-CoV-1

And also, perhaps, why it’s quite rare for spillovers to become endemic

Maybe CoV-1 got down to just a few strains that had lost their proof-reading function and so had no future

This may also explain why CoV-1’s mutation rate ⬇️ was roughly an order of magnitude greater than the black regression line on the Nextstrain CoV-2 “Clock” plot @