Regarding Error with nextstrain - Memory Problems

I am using Nextstrain to analyze the GISAID database, with msa_0908 as the input alignment.

The error message suggests that the process was killed because it ran out of memory. According to my advisor, the VM I am using has around 200 GB of memory and around 5.5 TB of disk space.

MSA_0908 is already aligned. Does this create a problem with Nextstrain?

My questions:

  1. How should I solve this problem? Is it caused by insufficient memory or by something else?

  2. MSA_0908 is already aligned. Does this create a problem with Nextstrain?

The error is shown below:

[Thu Oct 14 11:48:30 2021]
Job 6: Building tree

    augur tree             --alignment results/default-build/aligned.fasta             --tree-builder-args '-ninit 10 -n 4'             --output results/default-build/tree_raw.nwk             --nthreads 4 2>&1 | tee logs/tree_default-build.txt

ERROR: Shell exited from fatal signal SIGKILL when running: iqtree2 -ninit 2 -n 2 -me 0.05 -nt 4 -s results/default-build/aligned-delim.fasta -m GTR -ninit 10 -n 4 > results/default-build/aligned-delim.iqtree.log
Command output was:
/bin/bash: line 1: 40718 Killed iqtree2 -ninit 2 -n 2 -me 0.05 -nt 4 -s results/default-build/aligned-delim.fasta -m GTR -ninit 10 -n 4 > results/default-build/aligned-delim.iqtree.log
The OS may have terminated the command due to an out-of-memory condition.

Building a tree via:
iqtree2 -ninit 2 -n 2 -me 0.05 -nt 4 -s results/default-build/aligned-delim.fasta -m GTR -ninit 10 -n 4 > results/default-build/aligned-delim.iqtree.log
Nguyen et al: IQ-TREE: A fast and effective stochastic algorithm for estimating maximum likelihood phylogenies.
Mol. Biol. Evol., 32:268-274. https://doi.org/10.1093/molbev/msu300

ERROR: TREE BUILDING FAILED
Please see the log file for more details: results/default-build/aligned-delim.iqtree.log

Building original tree took 4195.915202617645 seconds
[Thu Oct 14 12:58:28 2021]
Error in rule tree:
jobid: 6
output: results/default-build/tree_raw.nwk
log: logs/tree_default-build.txt (check log file(s) for error message)
shell:

    augur tree             --alignment results/default-build/aligned.fasta             --tree-builder-args '-ninit 10 -n 4'             --output results/default-build/tree_raw.nwk             --nthreads 4 2>&1 | tee logs/tree_default-build.txt
    
    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Logfile logs/tree_default-build.txt:

ERROR: Shell exited from fatal signal SIGKILL when running: iqtree2 -ninit 2 -n 2 -me 0.05 -nt 4 -s results/default-build/aligned-delim.fasta -m GTR -ninit 10 -n 4 > results/default-build/aligned-delim.iqtree.log
Command output was:
/bin/bash: line 1: 40718 Killed iqtree2 -ninit 2 -n 2 -me 0.05 -nt 4 -s results/default-build/aligned-delim.fasta -m GTR -ninit 10 -n 4 > results/default-build/aligned-delim.iqtree.log
The OS may have terminated the command due to an out-of-memory condition.

Building a tree via:
iqtree2 -ninit 2 -n 2 -me 0.05 -nt 4 -s results/default-build/aligned-delim.fasta -m GTR -ninit 10 -n 4 > results/default-build/aligned-delim.iqtree.log
Nguyen et al: IQ-TREE: A fast and effective stochastic algorithm for estimating maximum likelihood phylogenies.
Mol. Biol. Evol., 32:268-274. https://doi.org/10.1093/molbev/msu300

ERROR: TREE BUILDING FAILED
Please see the log file for more details: results/default-build/aligned-delim.iqtree.log

Building original tree took 4195.915202617645 seconds

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/vishwajeet/data/ncov/.snakemake/log/2021-10-13T040404.505628.snakemake.log

Hi @vrmarathe! We haven’t tested GISAID’s pre-aligned sequences in our workflows, but these alignments should work without much trouble. The main issue you seem to be encountering is that IQ-TREE cannot build a tree from all GISAID data without using more memory than most computers have these days.

Check out our data preparation guide for the ncov workflow to see a couple of examples of how to filter your data down to a reasonable (much smaller) set for phylogenetic analysis.
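
To illustrate the general idea of reducing an alignment before tree building, here is a minimal Python sketch that draws a uniform random sample of sequences from a FASTA file using reservoir sampling, so the full file never has to sit in memory. This is only an illustration, not what the ncov workflow does: the function names, file names, and sample size are all hypothetical. In practice the workflow's own subsampling settings or `augur filter` (with options such as `--group-by` and `--subsample-max-sequences`) would usually be a better fit, because a plain random sample ignores geography and time.

```python
import random
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs one at a time from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def reservoir_sample(records: Iterable[Record], k: int, seed: int = 0) -> List[Record]:
    """Return up to k records chosen uniformly at random in a single pass."""
    rng = random.Random(seed)
    sample: List[Record] = []
    for i, record in enumerate(records):
        if i < k:
            sample.append(record)
        else:
            # Keep the new record with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = record
    return sample


def subsample_fasta(in_path: str, out_path: str, k: int, seed: int = 0) -> int:
    """Write a random subset of k sequences from in_path to out_path."""
    with open(in_path) as fh:
        sample = reservoir_sample(read_fasta(fh), k, seed)
    with open(out_path, "w") as out:
        for header, seq in sample:
            out.write(f">{header}\n{seq}\n")
    return len(sample)
```

A call such as `subsample_fasta("msa_0908.fasta", "subsampled.fasta", 5000)` (both paths and the sample size are placeholders) would produce a file small enough to try as workflow input.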

Hi @jlhudd. Thank you for the information. As I mentioned, the VM has more than 200 GB of memory and more than 5 TB of disk space. Does IQ-TREE require more than 200 GB of memory?

@vrmarathe Yes, in this case, it looks like you are trying to build a tree with all 3+ million SARS-CoV-2 genomes. IQ-TREE’s memory requirements appear to scale with the size of the multiple sequence alignment input, which is now hundreds of gigabytes of uncompressed data. In addition to the memory requirements, the time required to infer a phylogeny of this size with IQ-TREE would be much longer than you’d want to wait.
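
If it helps to gauge the scale of an alignment before handing it to IQ-TREE, a rough sketch like the one below counts the sequences, the longest sequence, and the total number of residue characters in a single streaming pass. The function name is hypothetical, and the numbers describe only the raw input; this does not estimate how much memory IQ-TREE itself would need, which depends on its internals.

```python
from typing import Iterable, Tuple


def alignment_stats(lines: Iterable[str]) -> Tuple[int, int, int]:
    """Return (number of sequences, longest sequence length, total residues).

    Streams over FASTA-formatted lines so the alignment is never loaded whole.
    """
    n_seqs = 0
    longest = 0
    total = 0
    current = 0  # length of the sequence currently being read
    for line in lines:
        if line.startswith(">"):
            longest = max(longest, current)
            n_seqs += 1
            current = 0
        else:
            n = len(line.strip())
            current += n
            total += n
    longest = max(longest, current)
    return n_seqs, longest, total
```

For a file on disk, something like `with open("results/default-build/aligned.fasta") as fh: print(alignment_stats(fh))` would report the counts; the total residue count is roughly the number of bytes of sequence data the tree builder has to represent.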

If you are interested in very large tree-building efforts, check out work from Russ Corbett-Detig et al. at UCSC, like this one that uses the Taxonium and UShER tools to view millions of samples in a single tree.
