Regarding Extracting Nucleotide Mutations

vrmarathe · April 29, 2021, 2:46pm

I am looking to extract the nucleotide mutations from my data downloaded from GISAID. How can I get the nucleotide mutations? Which Python script, command, file or procedure do I need to run to get the mutation data?

james · April 29, 2021, 10:40pm

Hi @vrmarathe, Nextclade is a web-app which will give you mutations relative to the reference in standard numbering.

If you want to compare to another sequence, then a simple approach would be to align your sequences of interest and compare them to get the changes.
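A minimal sketch of the "align and compare" approach, assuming the two sequences are already aligned to each other (for example, a query aligned against the reference), have the same length, and use '-' as the gap character. The function name call_substitutions is hypothetical. It only reports single-nucleotide substitutions in 1-based reference coordinates and skips insertions, deletions and ambiguous bases such as N, so it is much simpler than what Nextclade reports.

```python
from __future__ import annotations


def call_substitutions(ref: str, query: str) -> list[str]:
    """List substitutions of `query` relative to `ref`, e.g. 'C241T'.

    Assumes both sequences are aligned to the same length with '-' for gaps.
    Positions are 1-based along the ungapped reference. Insertions, deletions
    and non-ACGT query bases (such as N) are skipped, not reported.
    """
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    mutations = []
    ref_pos = 0  # position in the ungapped reference
    for ref_base, query_base in zip(ref.upper(), query.upper()):
        if ref_base == "-":
            # Column is an insertion relative to the reference: no coordinate.
            continue
        ref_pos += 1
        if query_base in "ACGT" and query_base != ref_base:
            mutations.append(f"{ref_base}{ref_pos}{query_base}")
    return mutations
```

For example, call_substitutions("ACGTACGT", "ACTTACGA") returns ['G3T', 'T8A'].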

vrmarathe · April 30, 2021, 10:15am

@james I have used the Nextclade web application. Is there a way to run it locally on a larger dataset, or are there specific scripts I could run to do this?

rneher · May 2, 2021, 6:28pm

Yes, there is a CLI that you can run locally. We just released an alpha version of a faster reimplementation:

vrmarathe · May 7, 2021, 10:44pm

@rneher Thanks for the information regarding the Nextclade tool. I have used the Nextclade API, and it filters the data or sequences based on some preconditions. I think they are as follows:
"By default, sequences less than 27,000 bases in length or with more than 3,000 N (unknown) bases are omitted from the analysis. For a basic QC and preliminary analysis of your sequence data, you can use clades.nextstrain.org. This tool will check your sequences for excess divergence, clustered differences from the reference, and missing or ambiguous data. In addition, it will assign nextstrain clades and call mutations relative to the reference."

I am looking for a way to take all the sequences from GISAID and run an analysis based on the criteria of the research I am doing with my advisor. Is there a way to skip the preconditions and filtering? I want to apply my own filtering conditions in the analysis.
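One way to apply your own QC thresholds is to pre-filter the FASTA yourself before running any analysis. The sketch below is hypothetical: the function names and parameters are illustrative and not part of Nextclade. Its defaults simply mirror the 27,000-base and 3,000-N thresholds quoted above, and it counts every character, including N, toward the length, which may not match exactly how Nextclade measures length.

```python
from __future__ import annotations

from typing import Iterable, Iterator


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def passes_filter(seq: str, min_length: int = 27000, max_n: int = 3000) -> bool:
    """Keep sequences that are long enough and not too ambiguous.

    Length counts every character, including N.
    """
    return len(seq) >= min_length and seq.upper().count("N") <= max_n


def filter_fasta(
    lines: Iterable[str], min_length: int = 27000, max_n: int = 3000
) -> list[tuple[str, str]]:
    """Return the (name, sequence) records that pass your own thresholds."""
    return [
        (name, seq)
        for name, seq in read_fasta(lines)
        if passes_filter(seq, min_length=min_length, max_n=max_n)
    ]
```

An open file handle can be passed directly as lines, so a large GISAID download is read line by line; the thresholds are placeholders for whatever criteria your study uses.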

james · May 9, 2021, 11:56pm

If you have access to GISAID data, the nextstrain/ncov pipeline is going to give you the most flexibility in terms of filtering & custom analysis, but has a steeper learning curve than nextalign.

vrmarathe · June 14, 2021, 4:54am

@james @rneher Thanks for the help with the Nextclade CLI tool. I have a question: how can I include the EPI ID or accession ID from the metadata in the Nextclade TSV output? I need it to connect the sequences in the GISAID metadata to the Nextclade output (TSV file).

rneher · June 25, 2021, 7:28pm

Nextclade will report whatever name the sequences are given in the input FASTA file, so simply name your sequences like this:

>my-sequence|EPI_ISL_XXXXX
ACATCTCT...

and my-sequence|EPI_ISL_XXXXX will be the sequence name in the output.
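Building on this naming approach, here is a hypothetical sketch of both halves of the round trip: appending each sequence's accession to its FASTA header before running Nextclade, then splitting the accession back out of the seqName column of the Nextclade TSV to join it with your metadata. It assumes you have already built a mapping from sequence name to EPI_ISL accession from your GISAID metadata (column names there vary, so none are hard-coded), and that the accession is the last '|'-separated field of the name, as in the header format above. The function names are illustrative only.

```python
from __future__ import annotations

import csv
from typing import Iterable, Iterator


def add_accession_to_headers(
    lines: Iterable[str], accession_by_name: dict[str, str]
) -> Iterator[str]:
    """Rewrite '>name' headers as '>name|ACCESSION' (lines yielded without newlines).

    Headers whose name is missing from `accession_by_name` are left unchanged.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            name = line[1:].strip()
            accession = accession_by_name.get(name)
            yield f">{name}|{accession}" if accession else f">{name}"
        else:
            yield line


def join_nextclade_with_metadata(
    tsv_lines: Iterable[str], metadata_by_accession: dict[str, dict[str, str]]
) -> list[dict[str, str]]:
    """Merge metadata fields into each Nextclade TSV row.

    The accession is taken as the text after the last '|' in seqName; rows
    without a matching accession keep only their Nextclade columns.
    """
    joined = []
    for row in csv.DictReader(tsv_lines, delimiter="\t"):
        accession = row["seqName"].rsplit("|", 1)[-1]
        # Metadata values overwrite Nextclade columns that share a name.
        joined.append({**row, **metadata_by_accession.get(accession, {})})
    return joined
```

To write the renamed FASTA, join the yielded lines with "\n". On the TSV side, an open Nextclade output file can be passed directly as tsv_lines.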
