Regarding Extracting Nucleotide Mutations

I am looking to extract the nucleotide mutations from my data downloaded from GISAID. How can I get the nucleotide mutations? Which Python script, command, file, or procedure do I need to run to get the mutation data?

Hi @vrmarathe, Nextclade is a web-app which will give you mutations relative to the reference in standard numbering.

If you want to compare to another sequence, then a simple approach would be to align your sequences of interest and compare them to get the changes.
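As a rough illustration of the "align and compare" approach, here is a minimal Python sketch. It assumes the two sequences have already been aligned by some external tool (so they have equal length and use `-` for gaps), and it only reports substitutions, skipping gaps and `N`. The function name `call_substitutions` is made up for this example and is not part of Nextclade or any other tool.

```python
def call_substitutions(ref_aligned, query_aligned):
    """List substitutions such as 'C241T', numbered by reference position (1-based).

    Both inputs must come from the same pairwise alignment, so they have
    equal length and use '-' for gaps. Deletions, insertions, and unknown
    bases (N) in the query are skipped rather than reported.
    """
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("aligned sequences must have the same length")

    substitutions = []
    ref_pos = 0  # position in the ungapped reference
    for ref_base, query_base in zip(ref_aligned.upper(), query_aligned.upper()):
        if ref_base == "-":
            # Insertion relative to the reference: no reference coordinate.
            continue
        ref_pos += 1
        if query_base in ("-", "N"):
            # Deletion or unknown base: not counted as a substitution here.
            continue
        if ref_base != query_base:
            substitutions.append(f"{ref_base}{ref_pos}{query_base}")
    return substitutions


if __name__ == "__main__":
    # Toy alignment, not real SARS-CoV-2 data.
    print(call_substitutions("ACGT-ACGT", "ATGTTACCN"))
```

Numbering by the ungapped reference position keeps the coordinates comparable across sequences even when the query carries insertions.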

@james I have used the Nextclade web application. Is there a way to run it locally for a larger dataset, or could I run specific scripts that do the same thing?

Yes, there is a CLI that you can run locally. We just released an alpha version of a faster reimplementation:

@rneher Thanks for the information regarding the Nextclade tool. I have used the Nextclade API, and it filters the sequences based on some preconditions. I think they are as follows:
"By default, sequences less than 27,000 bases in length or with more than 3,000 N (unknown) bases are omitted from the analysis. For a basic QC and preliminary analysis of your sequence data, you can use clades.nextstrain.org. This tool will check your sequences for excess divergence, clustered differences from the reference, and missing or ambiguous data. In addition, it will assign nextstrain clades and call mutations relative to the reference."

I am looking for a way to take all the sequences from GISAID and run an analysis based on the conditions of my research with my advisor. Is there a way to skip the preconditions and filtering? I want to use my own filtering conditions for the analysis.

If you have access to GISAID data, the nextstrain/ncov pipeline is going to give you the most flexibility in terms of filtering & custom analysis, but has a steeper learning curve than nextalign.
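If you only need a simple pre-filter before running any of these tools, something like the following Python sketch could work. It is not how the ncov pipeline filters; it just applies a length and `N`-count threshold to FASTA records. The defaults mirror the 27,000 / 3,000 values quoted above and are meant to be replaced by your own conditions. The helper names `read_fasta` and `keep` are hypothetical.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def keep(seq, min_length=27000, max_n=3000):
    """Return True if the sequence is long enough and has few enough N bases.

    The defaults copy the thresholds quoted in this thread; change them
    to whatever your own analysis requires.
    """
    return len(seq) >= min_length and seq.upper().count("N") <= max_n


if __name__ == "__main__":
    # Toy records with tiny thresholds so the example runs instantly.
    toy = [">a|EPI_ISL_XXXXX", "ACGT", "NN", ">b", "AC"]
    for name, seq in read_fasta(toy):
        if keep(seq, min_length=4, max_n=2):
            print(f">{name}\n{seq}")
```

For a real file, pass an open file handle to `read_fasta` instead of the toy list.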

@james @rneher Thanks for the help regarding the Nextclade CLI tool. I have a question: how can I get the EPI ID or Accession ID from the metadata into the Nextclade output TSV file? I need it to connect the sequences in the GISAID metadata with the Nextclade output (TSV file).

nextclade will report whatever name the sequences are given in the input fasta file. So simply name your sequences

>my-sequence|EPI_ISL_XXXXX
ACATCTCT...

and my-sequence|EPI_ISL_XXXXX will be the name in the output.
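With names in that form, joining the Nextclade output back to the metadata could look like the Python sketch below. It assumes the sequence name sits in a `seqName` column of the TSV and that the metadata has a column called `Accession ID`; check both against your own files, since the metadata column name in particular is a guess. The helper names are hypothetical.

```python
import csv
import io


def accession_from_name(seq_name):
    """Return the text after the last '|' in a name like 'my-sequence|EPI_ISL_XXXXX'."""
    return seq_name.rsplit("|", 1)[-1].strip()


def join_on_accession(nextclade_rows, metadata_rows,
                      name_col="seqName", id_col="Accession ID"):
    """Attach metadata columns to each Nextclade row by accession.

    name_col and id_col are assumptions about the column names; rows with
    no matching metadata are kept unchanged.
    """
    metadata_by_id = {row[id_col]: row for row in metadata_rows}
    joined = []
    for row in nextclade_rows:
        merged = dict(row)
        merged.update(metadata_by_id.get(accession_from_name(row[name_col]), {}))
        joined.append(merged)
    return joined


if __name__ == "__main__":
    # Toy in-memory tables standing in for the real TSV files.
    nextclade_tsv = "seqName\tclade\nmy-sequence|EPI_ISL_XXXXX\t20A\n"
    metadata_tsv = "Accession ID\tLocation\nEPI_ISL_XXXXX\tEurope\n"
    nextclade_rows = list(csv.DictReader(io.StringIO(nextclade_tsv), delimiter="\t"))
    metadata_rows = list(csv.DictReader(io.StringIO(metadata_tsv), delimiter="\t"))
    for row in join_on_accession(nextclade_rows, metadata_rows):
        print(row)
```

For real data, replace the `io.StringIO` objects with files opened via `open(path, newline="")`.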