Augur align insertions output - clarification

Dana · August 23, 2021, 8:06am

Hi,
I am using augur align for multiple sequence alignment,
and I am not sure about the insertions csv output file format.

Each column that represents an insertion has a title of the form: Xbp @ ref pos Y
For example:
insertion: 2706bp @ ref pos 1968.

What does the Xbp stand for?
In some cases it matches the nucleotide fragment length that was inserted in that sample, but in other cases it does not.

Thank you in advance,
Dana.

rneher · August 29, 2021, 6:34pm

Hi Dana,

Could you provide more detail? How exactly are you running this? I thought the X bp should match the length of the insertion.

best,
richard

Dana · August 30, 2021, 5:58am

I am running augur align on a fasta file that contains multiple consensus sequences.
I run the command:

augur align \
--sequences not_aligned.fasta \
--reference-sequence REF_NC_045512.2.fasta \
--output aligned.fasta

The alignment output file turns out ok.
Example of insertions csv output:

Thank you (:

james · August 31, 2021, 1:37am

Hi @dana. I believe this is because we remove “-”, “N” and “?” characters from the insertion before reporting it. So in this case it looks like the 739bp insertion is largely due to missing data.
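To make the length mismatch concrete, here is a minimal Python sketch of the stripping described above. It is not augur's actual code: the function name `strip_missing` and the case-insensitive handling are assumptions made for illustration only.

```python
# Characters treated as missing data in this sketch, following the post above.
MISSING = "-N?"


def strip_missing(insertion: str) -> str:
    """Return the insertion with gap and missing-data characters removed.

    Hypothetical helper: lower-case 'n' is also dropped here, which is an
    assumption of this sketch rather than documented augur behaviour.
    """
    return "".join(ch for ch in insertion if ch.upper() not in MISSING)


# A made-up insertion spanning 19 alignment columns, mostly missing data.
raw = "ACGT" + "N" * 10 + "--" + "GG?"
cleaned = strip_missing(raw)

# The column span and the cleaned sequence have different lengths.
print(len(raw), len(cleaned))
```

If the "Xbp" in the header counts alignment columns while the cell holds the cleaned sequence, the two numbers would differ in exactly this way.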

I don’t think we have an easy way for augur align to produce an alignment which does not remove insertions, but you could try running the following and then examining the alignment file itself to see exactly what the insertion in strain “18925” is:

mafft --reorder --anysymbol --nomemsave --adjustdirection --thread <num_threads> <input_fasta> > <output_fasta>
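Once mafft has written the alignment, one way to examine it is to look at the columns where the reference row has a gap. The Python sketch below does that under some assumptions: the reference was included in the input FASTA, the record names shown in the usage comment are placeholders, and the position convention (number of reference bases before the insertion) has not been checked against augur's "ref pos Y". The helper names are hypothetical.

```python
def read_fasta(path):
    """Parse a FASTA file into {name: sequence}; name is the first word of the header."""
    records, name = {}, None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                name = line[1:].split()[0]
                records[name] = []
            elif name is not None:
                records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def insertions_relative_to(ref_aln, sample_aln):
    """Return [(ref_pos, inserted_string)] for runs of columns where the reference is gapped.

    ref_pos is the number of reference bases seen before the run starts.
    The inserted string is returned unstripped, so '-', 'N' and '?' stay visible.
    """
    if len(ref_aln) != len(sample_aln):
        raise ValueError("aligned sequences must have equal length")
    found, run, ref_pos = [], [], 0
    for ref_char, sample_char in zip(ref_aln, sample_aln):
        if ref_char == "-":
            run.append(sample_char)
        else:
            if run:
                found.append((ref_pos, "".join(run)))
                run = []
            ref_pos += 1
    if run:
        found.append((ref_pos, "".join(run)))
    return found


# In-memory demo with made-up sequences.
print(insertions_relative_to("AC--GT-", "ACA-GTT"))

# Usage against a real file (names are placeholders, adjust to your headers):
# aln = read_fasta("<output_fasta>")
# print(insertions_relative_to(aln["<reference_name>"], aln["18925"]))
```

Note that with --adjustdirection, mafft prefixes the names of sequences it reverse-complemented with "_R_", so a record may need to be looked up under that name.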

rneher · August 31, 2021, 7:45pm

You might also be interested in nextalign, which does reference alignments (and translations). We are using this for large SARS-CoV-2 alignments. It reports insertions in a similar way. See here for details:

https://docs.nextstrain.org/projects/nextclade/en/latest/user/nextalign-cli.html
