Building a Custom Nextstrain - Alignment Error

I am running Nextstrain in Docker on Windows. I have worked through the tutorials, and they ran just fine.

I am currently testing Nextstrain with a custom bacterial database, using a small set of genomes (see attachment), a reference sequence, and a meta_file.

When I rerun the commands (as in the tutorial):

augur index --sequences data/genomes.fna --output results/sequence_index.tsv

I get the index file.

But when I run the alignment using mafft, I get the following error:

$ augur align --sequences data/genomes.fna --reference-sequence data/ref.fna --output results/aligned.fasta --fill-gaps

using mafft to align via:
        mafft --reorder --anysymbol --nomemsave --adjustdirection --thread 1 results/aligned.fasta.to_align.fasta 1> results/aligned.fasta 2> results/aligned.fasta.log

        Katoh et al, Nucleic Acid Research, vol 30, issue 14
        https://doi.org/10.1093%2Fnar%2Fgkf436


ERROR: Shell exited 1 when running: mafft --reorder --anysymbol --nomemsave --adjustdirection --thread 1 results/aligned.fasta.to_align.fasta 1> results/aligned.fasta 2> results/aligned.fasta.log

Error during alignment: please see the log file 'results/aligned.fasta.log' for more details

I would upload my test files and logs, but as a new user I can't at the moment =/

Best Regards and thanks in advance!

Max

This is the error in my log file:

/usr/local/bin/mafft: line 2747:  1966 Killed                  "$prefix/disttbfast" -q $npickup -E $cycledisttbfast -V "-"$gopdist -s $unalignlevel $legacygapopt $mergearg -W $tuplesize $termgapopt $outnum $addarg $add2ndhalfarg -C $numthreads-$numthreadstb $memopt $weightopt $treeinopt $treeoutopt $distoutopt $seqtype $model -g $gexp -f "-"$gop -Q $spfactor -h $aof $param_fft $algopt $treealg $scoreoutarg $anchoropt -x $maxanchorseparation $oneiterationopt < infile > pre 2>> "$progressfile"

done.
reallocating...
done.
reallocating...
done.
reallocating...
done.
reallocating...
done.
reallocating...
done.
reallocating...
done.
reallocating...
done.
reallocating...
done.
reallocating...
done.
reallocating...
done.
reallocating...
done.
generating a scoring matrix for nucleotide (dist=200) ... done

Step 1/2
1   

Step 2/2
 1 / 7 (1 threads)   
makedirectionlist (nuc) Version 7.475
alg=m, model=DNA200 (2), 1.53 (4.59), 0.37 (1.11), noshift, amax=0.0
1 thread(s)

directionfile = _direction
inputfile = infile
subalignment = 0
subalignmentoffset = 0
reallocating...
done.
reallocating...
done.
reallocating...
done.
reallocating...
done.
reallocating...
done.
reallocating...
done.
reallocating...
done.
7 x 5230115 - 5213322 d
reallocating...
done.
reallocating...
done.
reallocating...
done.
reallocating...
done.
reallocating...
done.
reallocating...
done.
reallocating...
done.
directionfile = _direction
inputfile = orig
subalignment = 0
subalignmentoffset = 0
reallocating...
done.
reallocating...
done.
reallocating...
done.
reallocating...
done.
reallocating...
done.
reallocating...
done.
reallocating...
done.
7 x 5230115 - 5213322 d
reallocating...
done.
reallocating...
done.
reallocating...
done.
reallocating...
done.
reallocating...
done.
reallocating...
done.
reallocating...
done.
nthread = 1
nthreadpair = 1
nthreadtb = 1
ppenalty_ex = 0
reallocating...
done.
reallocating...
done.
reallocating...
done.
reallocating...
done.
reallocating...
done.
reallocating...
done.
reallocating...
done.
stacksize: 8192 kb
reallocating...
done.
reallocating...
done.
reallocating...
done.
reallocating...
done.
reallocating...
done.
reallocating...
done.
reallocating...
done.
generating a scoring matrix for nucleotide (dist=200) ... done
Gap Penalty = -1.53, +0.00, +0.00



Making a distance matrix ..

    1 / 7 (thread    0)
done.

Constructing a UPGMA tree (efffree=0) ... 

    0 / 7
done.

Progressive alignment 1/2... 

STEP     1 / 6 (thread    0) f

@NyxMoiren Welcome! Glad to hear the tutorials worked fine, but sorry that you’re running into this problem.

From your logs, this line is indicative of the problem:

/usr/local/bin/mafft: line 2747:  1966 Killed …

It tells us that a subprocess (pid 1966) that mafft was running was terminated by the operating system. Usually this happens due to out-of-memory conditions, during which the OS will forcibly terminate processes using the most memory to reduce memory pressure.

How much memory does your computer have available?

Can you share more details about the size of the data you’re working with (number of sequences, typical sequence length, etc) or the data itself (data/genomes.fna, data/ref.fna)?
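If it helps to gather those numbers, here is a minimal sketch that counts the sequences in a FASTA file and reports the shortest and longest sequence length. The helper name fasta_stats is hypothetical (it is not part of Augur or Nextstrain), and the path is the one from the commands above.

```shell
# Hypothetical helper: print "count min_length max_length" for a FASTA file.
fasta_stats() {
    awk '
        # A header line closes the previous record (if any) and starts a new one.
        /^>/ { if (seen) record(); seen = 1; len = 0; next }
        # Sequence lines: drop whitespace and Windows line endings, then add up.
        { gsub(/[ \t\r]/, ""); len += length($0) }
        END { if (seen) record(); printf "%d %d %d\n", n, min, max }
        function record() {
            n++
            if (n == 1 || len < min) min = len
            if (len > max) max = len
        }
    ' "$1"
}

# Only runs when the file from the original command is present.
if [ -f data/genomes.fna ]; then
    fasta_stats data/genomes.fna
fi
```

The three numbers, together with the available memory, are usually enough to judge whether an out-of-memory kill is plausible.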

Ah, and because this is Docker on Windows, containers have their own memory limit, which is separate from (and by default smaller than) the physical memory installed in your computer. Docker’s limit can be adjusted upwards if necessary. Running nextstrain check-setup docker will output a quick diagnostic check of the memory limit.
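As a cross-check outside of the Nextstrain CLI, Docker itself can report how much memory it sees. The following is a sketch, assuming the docker command is on your PATH and the daemon is running; the helper name bytes_to_gib is made up for this example.

```shell
# Hypothetical helper: convert a byte count to GiB with one decimal place.
bytes_to_gib() {
    awk -v b="$1" 'BEGIN { printf "%.1f\n", b / 1024 / 1024 / 1024 }'
}

# docker info reports the total memory available to Docker in bytes.
if command -v docker >/dev/null 2>&1; then
    bytes_to_gib "$(docker info --format '{{.MemTotal}}' 2>/dev/null)"
fi
```

The value should be close to the limit that nextstrain check-setup docker prints.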

Good Morning!

I ran

nextstrain check-setup docker

nextstrain-cli is up to date!

Testing your setup…

docker is supported

:heavy_check_mark: yes: docker is installed
:heavy_check_mark: yes: docker run works
:heavy_check_mark: yes: containers have access to >2 GiB of memory (limit is 7.8 GiB)
:heavy_check_mark: yes: image is new enough for this CLI version

Supported Nextstrain runtimes: docker

All good! Default runtime (docker) is supported.

I am trying to build a world distribution of B. anthracis (a bacterium with a 5.2 Mb genome, monophyletic, i.e. highly clonal). I already discovered that I can't use the plasmid sequences and have to work with just the chromosome.

I also tried to set up a .wslconfig file in %UserProfile% (I used notepad.exe to open it) and wrote the following in it (as I have 16 GB RAM and 8 processors at 3.80 GHz):

[wsl2]
memory=12GB
processors=6
swap=10GB

But this did not change anything.

For my data:

These are my metadata file, the structure of the .fna file containing all the sequences, and the reference file.

Hmm. Did you wait long enough before testing, or manually stop and restart the WSL2 VM, per Microsoft’s documentation?
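To check whether the new .wslconfig settings took effect, one option is to shut down the WSL2 VM from Windows (wsl --shutdown in PowerShell or cmd), start Docker again, and then look at the total memory from a Linux shell inside the VM or a container. A minimal sketch, with a made-up helper name (mem_total_gib):

```shell
# On the Windows side (PowerShell or cmd, not this shell), first run:
#   wsl --shutdown
# and then restart Docker Desktop so the VM comes back with the new limits.

# Hypothetical helper: print MemTotal from a meminfo-style file in GiB.
# Defaults to /proc/meminfo when no file is given.
mem_total_gib() {
    awk '/^MemTotal:/ { printf "%.1f\n", $2 / 1024 / 1024 }' "${1:-/proc/meminfo}"
}

if [ -r /proc/meminfo ]; then
    mem_total_gib
fi
```

With memory=12GB in .wslconfig, the number should come out somewhat below 12 rather than near 7.8. Re-running nextstrain check-setup docker should reflect the change as well.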

For a 5.2 Mbase genome with the 7.8 GiB of memory you have available, you may need to use MAFFT’s “memory save” mode, which unfortunately augur align unconditionally disables with the --nomemsave option. Disabling it improves alignment times, but requires more working memory. As an alternative to running augur align, at least until it’s improved, you could try running mafft directly using a version of the command line invocation printed by Augur. You’re also welcome to use other alignment tools. Subsequent Augur commands that take an alignment as input don’t require that you used augur align specifically.
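A direct invocation could look roughly like the sketch below. It takes the command Augur printed and swaps --nomemsave for MAFFT’s --memsave option. The file name results/to_align.fasta is an assumption made for this example (Augur’s own intermediate file may not be kept), and whether memory-save mode is enough for seven 5.2 Mbase genomes in 7.8 GiB is something you would need to test.

```shell
# Command prefix as printed by Augur in the original post.
augur_cmd='mafft --reorder --anysymbol --nomemsave --adjustdirection --thread 1'

# Same flags, but with memory-save mode enabled instead of disabled.
memsave_cmd=$(printf '%s\n' "$augur_cmd" | sed 's/--nomemsave/--memsave/')

if command -v mafft >/dev/null 2>&1 && [ -f data/ref.fna ] && [ -f data/genomes.fna ]; then
    mkdir -p results
    # Reference first, then the genomes (assumes both files end with a newline).
    cat data/ref.fna data/genomes.fna > results/to_align.fasta
    # $memsave_cmd is intentionally unquoted so it splits into command + flags.
    $memsave_cmd results/to_align.fasta \
        1> results/aligned.fasta 2> results/aligned.fasta.log
fi
```

Note that this skips the post-processing augur align does after MAFFT finishes, such as stripping insertions relative to the reference and the --fill-gaps step, so the output is not a drop-in replacement for results/aligned.fasta from augur align.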