From a newbie: difficulty finding multiple coincident mutations in spike

Using the web-based Nextstrain tool (/ncov/global), I am searching for coincident mutations in the SARS-CoV-2 spike protein S that authors of peer-reviewed literature say have been deposited in GISAID; for example, changes to S at positions 261 and 453 among the Dutch mink sequenced in spring/summer 2020. To do this, I type ‘genotype S 261’ in the Filter Data field and select 261 D from the pull-down menu. The tree then shows one genome containing this S sequence change, and none from the Netherlands or from mink, where I’d expected them (even if I also filter by mink hosts).

Interestingly, if I search for Y453F alone at covariants.org, I can link from there to auspice, and filtering that build for G261D does yield mink sequences on the tree. But when I hover over any of the dotted, highlighted genomes, they do not explicitly say that the genome contains a mutation in S at 261 to D.

I am new to Nextstrain, and I am clearly missing something. Perhaps only the tips of the tree are shown and not the whole list of genomes? If so, how do I change that using the web-based server? Or must I build Nextstrain on my local computer? Please help. Any advice is welcome.

Hi @earturo – due to the size of the data available on GISAID, each Nextstrain “build” (i.e. each tree) shows only a subset of the entire dataset, often less than 1% of the total genomes available. Our subsampling strategy depends on the aim of the build: for instance, @emmahodcroft’s build covariants / S.Y453F preferentially selects samples with that mutation, whereas nextstrain / ncov / global subsamples geographically.

You can filter the above covariants build to highlight genomes with a Spike 261D mutation, but again please be aware that this dataset is not representative of all genomes on GISAID.

To explicitly test the claims from your post, I believe you’d have to log in to GISAID, download all the data, and filter accordingly.
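As a rough sketch of that last filtering step, something like the following could work on a downloaded metadata table. It assumes a tab-separated file with a column named `AA Substitutions` holding values such as `(Spike_G261D,Spike_Y453F)`; the column names, the substitution naming, and the sample rows here are all assumptions made up for illustration, so check them against the file you actually download.

```python
import csv
import io


def parse_substitutions(field):
    """Split a field like '(Spike_G261D,Spike_Y453F)' into a set of names."""
    inner = field.strip().strip("()")
    return {s.strip() for s in inner.split(",") if s.strip()}


def rows_with_all(rows, required, column="AA Substitutions"):
    """Keep only rows whose substitution list contains every required name.

    The default column name is an assumption about the metadata layout.
    """
    required = set(required)
    return [
        row for row in rows
        if required <= parse_substitutions(row.get(column, ""))
    ]


# Made-up rows standing in for a real metadata download (not real data).
SAMPLE = (
    "Virus name\tHost\tAA Substitutions\n"
    "example/A\tmink\t(Spike_G261D,Spike_Y453F,Spike_D614G)\n"
    "example/B\tmink\t(Spike_Y453F,Spike_D614G)\n"
    "example/C\thuman\t(Spike_G261D)\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))
hits = rows_with_all(rows, ["Spike_G261D", "Spike_Y453F"])
names = [row["Virus name"] for row in hits]
print(names)  # ['example/A']
```

For a real file you would replace `io.StringIO(SAMPLE)` with an opened file handle, and you could add a further condition on the host column to restrict the result to mink.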
