Question about mafft alignment using a reference sequence

Hi,
I add the following arguments to augur align:

        augur align \
            --sequences {input.sequences} \
            --reference-sequence {input.reference} \
            --nthreads 10 \
            --method mafft \
            --output {output.alignment} \
            --fill-gaps

And the executed command looks like:
mafft --reorder --anysymbol --nomemsave --adjustdirection --thread 10 nextstrain_results/aligned.fasta.to_align.fasta 1> nextstrain_results/aligned.fasta 2> nextstrain_results/aligned.fasta.log

I read in the documentation that adding the --reference-sequence argument strips insertions relative to the reference sequence. But is this happening outside of mafft? Are all input sequences first aligned using mafft and then insertions relative to the reference are stripped? Or is each sequence aligned only to the reference, one by one, and its insertions removed?

Thanks,

Jon

Hi Jon, thanks for writing and sorry for the slow response.

Are all input sequences first aligned using mafft and then insertions relative to the reference are stripped?

Yes. augur align runs mafft on all of the sequences together and then post-processes the mafft output to strip insertions relative to the reference, so the stripping happens outside of mafft. You can see the call to strip_non_reference in the codebase here: github.com/nextstrain/augur/blob/master/augur/align.py#L179.
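
To make the post-processing step concrete, here is a minimal sketch of the idea in Python. It is not augur's actual code: the function name strip_insertions, the dict-based alignment, and the toy sequences are all made up for illustration. It only shows the general technique of dropping every alignment column where the reference has a gap.

```python
def strip_insertions(aligned, reference_name):
    """Drop every alignment column in which the reference has a gap.

    `aligned` maps sequence names to aligned strings of equal length.
    This is a hypothetical helper for illustration, not augur's
    strip_non_reference.
    """
    ref = aligned[reference_name]
    # Columns where the reference has a real base are the ones we keep.
    keep = [i for i, base in enumerate(ref) if base != "-"]
    return {
        name: "".join(seq[i] for i in keep)
        for name, seq in aligned.items()
    }


# Toy alignment (made-up data): sample1 has a 2-base insertion
# relative to the reference, sample2 has a deletion.
aligned = {
    "ref":     "ACG--TAC",
    "sample1": "ACGTTTAC",
    "sample2": "AC---TAC",
}

stripped = strip_insertions(aligned, "ref")
for name, seq in stripped.items():
    print(name, seq)
```

After stripping, every sequence has the same length as the ungapped reference, so alignment coordinates match reference coordinates. Deletions relative to the reference (like the one in sample2) remain as gaps; only the inserted columns are removed.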

Thanks for the clarification!