Tree building snakemake rule failed, but iqtree successfully built?

fshepherd · April 5, 2021, 7:36pm

Hello all,

I’m building a tree with global data from GISAID, but got an error during the tree building step:

Error in rule tree:
jobid: 5
output: results/spike_global/tree_raw.nwk
log: logs/tree_spike_global.txt (check log file(s) for error message)
shell:
   augur tree             --alignment results/spike_global/aligned.fasta             --tree-builder-args '-ninit 10 -n 4'             --output results/spike_global/tree_raw.nwk             --nthreads 4 2>&1 | tee logs/tree_spike_global.txt
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Logfile logs/tree_spike_global.txt:
Building a tree via:
iqtree -ninit 2 -n 2 -me 0.05 -nt 4 -s results/spike_global/aligned-delim.fasta -m GTR -ninit 10 -n 4 > results/spike_global/aligned-delim.iqtree.log
Nguyen et al: IQ-TREE: A fast and effective stochastic algorithm for estimating maximum likelihood phylogenies.
Mol. Biol. Evol., 32:268-274. https://doi.org/10.1093/molbev/msu300

ERROR: TREE BUILDING FAILED
Please see the log file for more details: results/spike_global/aligned-delim.iqtree.log

Building original tree took 4101.029722690582 seconds

Removing output files of failed job tree since they might be corrupted:
results/spike_global/tree_raw.nwk
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
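
For context on the strict-mode note in the error above: Snakemake runs shell commands under `set -euo pipefail`, so a non-zero exit from the command on the left of the pipe fails the whole rule even when `tee` itself succeeds. Here is a minimal sketch of that behavior, assuming `bash` is on the PATH. The helper name `pipeline_exit_code` is made up for illustration, `false` stands in for a failing `augur tree`, and `cat` stands in for `tee`.

```python
import subprocess


def pipeline_exit_code(script: str, strict: bool) -> int:
    """Run a bash one-liner and return its exit code.

    With strict=True the script is prefixed with `set -euo pipefail`,
    mirroring the strict mode Snakemake applies to shell rules.
    """
    prefix = "set -euo pipefail; " if strict else ""
    result = subprocess.run(
        ["bash", "-c", prefix + script],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


if __name__ == "__main__":
    # Hypothetical stand-ins: `false` for the failing command, `cat` for `tee`.
    print("default mode:", pipeline_exit_code("false | cat", strict=False))
    print("strict mode: ", pipeline_exit_code("false | cat", strict=True))
```

In default mode the pipeline reports the exit status of its last command, so the failure is hidden. In strict mode the failure on the left of the pipe is what the caller sees, which matches the rule failing here even though the log was written.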

However, when I look at the results/spike_global/aligned-delim.iqtree.log file, it looks like the tree was built:

Alignment was printed to results/spike_global/aligned-delim.fasta.uniqueseq.phy

For your convenience alignment with unique sequences printed to results/spike_global/aligned-delim.fasta.uniqueseq.phy

Create initial parsimony tree by phylogenetic likelihood library (PLL)… 28.085 seconds

NOTE: 392 MB RAM (0 GB) is required!
Estimate model parameters (epsilon = 0.500)

Initial log-likelihood: -60288.255

Current log-likelihood: -52884.715
Optimal log-likelihood: -52884.294
Rate parameters: A-C: 0.36331 A-G: 0.86950 A-T: 0.21617 C-G: 0.39216 C-T: 2.05273 G-T: 1.00000
Base frequencies: A: 0.279 C: 0.209 G: 0.206 T: 0.305
Parameters optimization took 2 rounds (70.383 sec)
Computing ML distances based on estimated model parameters… 2837.491 sec
Computing BIONJ tree…
1474.574 seconds
Log-likelihood of BIONJ tree: -52448.971

INITIALIZING CANDIDATE TREE SET

Generating 8 parsimony trees… 224.761 second

Computing log-likelihood of 8 initial trees … 33.582 seconds

Current best score: -52448.971

Do NNI search on 2 best initial trees
Estimate model parameters (epsilon = 0.500)
BETTER TREE FOUND at iteration 1: -52310.505
Finish initializing candidate tree set (12)
Current best tree score: -52310.505 / CPU time: 395.445
Number of iterations: 2

OPTIMIZING CANDIDATE TREE SET

TREE SEARCH COMPLETED AFTER 4 ITERATIONS / Time: 1h:23m:56s

FINALIZING TREE SEARCH

Performs final model parameters optimization

Estimate model parameters (epsilon = 0.050)

Initial log-likelihood: -52310.505
Optimal log-likelihood: -52310.497
Rate parameters: A-C: 0.36195 A-G: 0.89554 A-T: 0.20011 C-G: 0.33955 C-T: 2.09522 G-T: 1.00000
Base frequencies: A: 0.279 C: 0.209 G: 0.206 T: 0.305
Parameters optimization took 1 rounds (11.039 sec)
BEST SCORE FOUND : -52310.497
Total tree length: 1.322

Total number of iterations: 4
CPU time used for tree search: 602.458 sec (0h:10m:2s)
Wall-clock time used for tree search: 605.088 sec (0h:10m:5s)
Total CPU time used: 5036.391 sec (1h:23m:56s)
Total wall-clock time used: 5048.890 sec (1h:24m:8s)

Analysis results written to:
IQ-TREE report: results/spike_global/aligned-delim.fasta.iqtree
Maximum-likelihood tree: results/spike_global/aligned-delim.fasta.treefile
Likelihood distances: results/spike_global/aligned-delim.fasta.mldist
Screen log file: results/spike_global/aligned-delim.fasta.log

Date and Time: Mon Apr 5 13:34:14 2021
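
One way to confirm that the files named in the "Analysis results written to:" section actually exist is to pull the paths out of the log and test each one. This is only a sketch: the labels are copied from the log excerpt above, the function names are hypothetical, and it assumes each result is reported as `label: path` on a single line.

```python
from pathlib import Path

# Labels copied from the "Analysis results written to:" section of the log.
RESULT_LABELS = (
    "IQ-TREE report",
    "Maximum-likelihood tree",
    "Likelihood distances",
    "Screen log file",
)


def result_paths(log_text: str) -> dict:
    """Map each known result label found in the log to the path reported for it."""
    paths = {}
    for line in log_text.splitlines():
        label, sep, value = line.strip().partition(":")
        if sep and label in RESULT_LABELS and value.strip():
            paths[label] = value.strip()
    return paths


def missing_results(log_text: str, base_dir: str = ".") -> list:
    """Return the reported result paths that do not exist under base_dir."""
    return [
        path
        for path in result_paths(log_text).values()
        if not (Path(base_dir) / path).is_file()
    ]


if __name__ == "__main__":
    log_file = Path("results/spike_global/aligned-delim.iqtree.log")
    if log_file.is_file():
        text = log_file.read_text()
        print("reported:", result_paths(text))
        print("missing: ", missing_results(text))
```

An empty "missing" list would only show that IQ-TREE wrote its outputs. It says nothing about which later step returned the non-zero exit code.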

Earlier in the log file there is quite a bit of text that I’m not sure is relevant. Some of it relates to renaming problematic strains, e.g.:

Lu’an_DELIM-QHMJMXEJNNOYGPIHOOFN_133_DELIM-QHMJMXEJNNOYGPIHOOFN_2020_DELIM-QHMJMXEJNNOYGPIHOOFN_EPI_ISL_1069210 → Lu_an_DELIM-QHMJMXEJNNOYGPIHOOFN_133_DELIM-QHMJMXEJNNOYGPIHOOFN_2020_DELIM-QHMJMXEJNNOYGPIHOOFN_EPI_ISL_1069210

…followed by similar output about highly gapped sequences (this appears for quite a few strains; just one is listed here):

Gap/Ambiguity Composition p-value
1 Netherlands_DELIM-QHMJMXEJNNOYGPIHOOFN_Oss_1363500_DELIM-QHMJMXEJNNOYGPIHOOFN_2020_DELIM-QHMJMXEJNNOYGPIHOOFN_EPI_ISL_413581 87.23% failed 0.00%

…and finally text that lists duplicate strains (just one listed here for brevity):

NOTE: Netherlands_DELIM-QHMJMXEJNNOYGPIHOOFN_Tilburg_1363354_DELIM-QHMJMXEJNNOYGPIHOOFN_2020_DELIM-QHMJMXEJNNOYGPIHOOFN_EPI_ISL_413586 is identical to Netherlands_DELIM-QHMJMXEJNNOYGPIHOOFN_Oss_1363500_DELIM-QHMJMXEJNNOYGPIHOOFN_2020_DELIM-QHMJMXEJNNOYGPIHOOFN_EPI_ISL_413581 but kept for subsequent analysis

The warnings related to gaps are expected; I’m trying to create a spike-specific build that works by masking the majority of the sequence. The resulting .phy alignment file looks fine, and the .iqtree file looks like it contains tree info, including a tree in Newick format. But I’m not sure why the snakemake rule fails. Any ideas? Thanks!
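
As a cheap check that the .treefile really holds a complete Newick string, something like the sketch below could be run against it. It is a structural sanity check only, not a parser: it assumes the tree is a single Newick string ending in `;`, and it would be fooled by quoted labels that contain parentheses. The function name is made up for illustration.

```python
from pathlib import Path


def looks_like_newick(text: str) -> bool:
    """Return True if text ends with ';' and has balanced parentheses."""
    text = text.strip()
    if not text.endswith(";"):
        return False
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' appeared before its matching '('
                return False
    return depth == 0


if __name__ == "__main__":
    tree_file = Path("results/spike_global/aligned-delim.fasta.treefile")
    if tree_file.is_file():
        print("plausible Newick:", looks_like_newick(tree_file.read_text()))
```

A truncated tree (for example, one cut off while being written) would fail this check, while a complete one would pass.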
