Using --existing-alignment During Augur Align

Hello!

I’ve been trying to use the --existing-alignment option in augur align. With smaller previous alignments (fewer than 100 total genomes) the command works and the build runs smoothly, but when I use larger previous alignments I get the following error:

ERROR: b''
shell exited 1 when running: mafft --add results/aligned.fasta.to_align.fasta --keeplength --reorder --anysymbol --nomemsave --adjustdirection --thread 1 config/aligned.fasta.ref.fasta 1> results/aligned.fasta 2> results/aligned.fasta.log

Error during alignment
[Wed Jul 22 17:36:11 2020]
Error in rule align:
jobid: 10
output: results/aligned.fasta
shell:

    augur align             --existing-alignment config/aligned.fasta             --sequences results/filtered.fasta             --reference-sequence config/reference.gb             --output results/aligned.fasta             --remove-reference             --fill-gaps
    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job align since they might be corrupted:
results/aligned.fasta
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message.

It’s worth noting that the build runs without error if I remove the --existing-alignment line from the Snakefile. Has anyone run into this issue before?

Hi @colejensen, welcome. :slightly_smiling_face: I expect there’s more detail about the error in the log file results/aligned.fasta.log. Can you share that with us?

Hi @trs. This is the entirety of the results/aligned.fasta.log:

“Check results/aligned.fasta.to_align.fasta”.

The aligned.fasta.to_align.fasta file is empty.

Hey,

Last time I saw that error, it was caused by a corrupted FASTA file with missing newline characters, which merged two genomes into a single line, as shown below.

GGGCTATATAAACGTTTTCGCTTTTCCGTTTACGATATATAGTCTACTCTTGTGCAGAATGAATTCTCGTAACTACATAGCACAAGTAGATGTAGTTAACTTTAATCTCACATAGCAATCTTTAATCAGTGTGTAACATTAGGGAGGACTTGAAAGAGCCACCACATTTTCACCGAGGCCACGCGGAGTACGATCGAGTGTACAGTGAACAATGCTAGGGAGAGCTGCCTATATGGAAGAGCCCTAATGTGTAAAATTAATTTTAGTAGT>HCOV-19/FINLAND/5MAY46S12/2020|EPI_ISL_481700|2020-05-05NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNAGATCTGTTCTCTAAACGAACTTTAAAATCTGTGTGGCTGTCACTCGGCTGCATGCTTAGTGCACTCACGCAGTATAATTAATAACTAATTACTGTCGTTGACAGGACACGAGTAACTCGTCTATCTTCTGCAGGCTGCTTACGGTTTCGTCCGTGTTGCAGCCGATCATCAGCACATCTAGGTTTT
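
A quick way to look for this kind of corruption is to scan the file for a ">" that is not at the start of a line. The following is only a minimal sketch: the function name find_merged_records is made up for illustration, and it assumes that no legitimate header contains a second ">" character.

```python
def find_merged_records(lines):
    """Return 1-based numbers of lines where '>' appears after the first column.

    Assumption: a well-formed FASTA file only has '>' as the first character
    of a header line, so any other position suggests a missing newline.
    """
    suspicious = []
    for number, line in enumerate(lines, start=1):
        # line[1:] skips a legitimate leading '>' on header lines
        if ">" in line[1:]:
            suspicious.append(number)
    return suspicious
```

It takes any iterable of lines, so an open file handle works, for example `find_merged_records(open("config/aligned.fasta"))`. An empty list means no merged records were found by this check.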

Hi @andersonbrito,

I can confirm that the alignment file that produced this error doesn’t have merged genomes.

Ah, I’d bet the empty aligned.fasta.to_align.fasta file is what MAFFT is erroring about. The question then is:

  1. Are there really no new sequences to align? That is, does results/filtered.fasta contain no sequences which aren’t already in config/aligned.fasta?
  2. Or does Augur have a bug which makes it incorrectly write out an empty “to align” file?

Can you check on this? Or, if you provide the original input files to your augur align invocation, we can check. :slight_smile:
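
One way to check the first possibility is to compare the record names in the two files. This is a rough sketch rather than Augur's own logic: the helper names fasta_names and names_to_align are hypothetical, and it assumes sequences are matched by name, taking the name to be the header text up to the first whitespace.

```python
def fasta_names(lines):
    """Collect record names: the header text after '>' up to the first whitespace."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            header = line[1:].strip()
            if header:
                names.add(header.split()[0])
    return names


def names_to_align(filtered_lines, aligned_lines):
    """Names present in the filtered sequences but absent from the existing alignment.

    Assumption: matching is done on names only, not on sequence content.
    """
    return fasta_names(filtered_lines) - fasta_names(aligned_lines)
```

Called as `names_to_align(open("results/filtered.fasta"), open("config/aligned.fasta"))`, an empty set would mean there is nothing new to align under that assumption, which would explain an empty “to align” file.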

Here are the links to the inputs you asked for.



There should be new sequences to align, but I’d love a second opinion.

@trs thank you.

Hmm, there are definitely lots of new sequences to align in those files. The files also work fine for me when I run the following locally:

augur align \
  --existing-alignment ~/Downloads/aligned.fasta \
  --sequences ~/Downloads/filtered.fasta \
  --reference-sequence ~/Downloads/reference.gb \
  --output /tmp/output.fasta \
  --remove-reference \
  --fill-gaps

I don’t know if it makes a difference, but what version of Augur are you using? augur --version will report it if you’re running Augur inside Conda. If you’re using the Nextstrain CLI, nextstrain version --verbose will report it.

So far the only way I can reproduce the error you’re seeing, along with MAFFT’s “Check results/aligned.fasta.to_align.fasta” message, is by setting up inputs such that there’s nothing new to align.

Hey,

I’m not sure if you found a solution for this issue, but I often get that error when the sequences in aligned.fasta exactly match the content of filtered.fasta.

The solution: try removing one of the sequences from aligned.fasta. That sequence is then new relative to the existing alignment, so MAFFT only has to align that one sequence and the build moves on.

ERROR: b''
    shell exited 1 when running: mafft --add results/aligned.fasta.to_align.fasta --keeplength --reorder --anysymbol --nomemsave --adjustdirection --thread 1 config/aligned.fasta.ref.fasta 1> results/aligned.fasta 2> results/aligned.fasta.log

    Error during alignment

It could have other causes, though.
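
If you want to script the workaround, something along these lines might do. It is only a sketch under assumptions: drop_first_shared_record is a made-up helper, records are matched by the header text up to the first whitespace, and the set of names from filtered.fasta is collected separately.

```python
def drop_first_shared_record(aligned_lines, filtered_names):
    """Remove the first aligned record whose name also appears in filtered_names.

    Returns (kept_lines, dropped_name); dropped_name is None if nothing matched.
    Assumption: a record's name is its header text up to the first whitespace.
    """
    kept = []
    dropped = None
    skipping = False
    for line in aligned_lines:
        if line.startswith(">"):
            header = line[1:].strip()
            name = header.split()[0] if header else ""
            # Only the first shared record is skipped; later ones are kept.
            skipping = dropped is None and name in filtered_names
            if skipping:
                dropped = name
        if not skipping:
            kept.append(line)
    return kept, dropped
```

Write kept_lines to a new file and pass that file to --existing-alignment, so the original aligned.fasta stays untouched.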

Cheers