Snakemake tree rule failed, but IQ-TREE appears to have built the tree?

Hello all,

I’m building a tree with global data from GISAID, but I got an error during the tree-building step:

Error in rule tree:
jobid: 5
output: results/spike_global/tree_raw.nwk
log: logs/tree_spike_global.txt (check log file(s) for error message)
shell:

   augur tree --alignment results/spike_global/aligned.fasta --tree-builder-args '-ninit 10 -n 4' --output results/spike_global/tree_raw.nwk --nthreads 4 2>&1 | tee logs/tree_spike_global.txt

(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Logfile logs/tree_spike_global.txt:
Building a tree via:
iqtree -ninit 2 -n 2 -me 0.05 -nt 4 -s results/spike_global/aligned-delim.fasta -m GTR -ninit 10 -n 4 > results/spike_global/aligned-delim.iqtree.log
Nguyen et al: IQ-TREE: A fast and effective stochastic algorithm for estimating maximum likelihood phylogenies.
Mol. Biol. Evol., 32:268-274. https://doi.org/10.1093/molbev/msu300

ERROR: TREE BUILDING FAILED
Please see the log file for more details: results/spike_global/aligned-delim.iqtree.log

Building original tree took 4101.029722690582 seconds

Removing output files of failed job tree since they might be corrupted:
results/spike_global/tree_raw.nwk
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message

However, when I look at the results/spike_global/aligned-delim.iqtree.log file, it looks like the tree was built:

Alignment was printed to results/spike_global/aligned-delim.fasta.uniqueseq.phy

For your convenience alignment with unique sequences printed to results/spike_global/aligned-delim.fasta.uniqueseq.phy

Create initial parsimony tree by phylogenetic likelihood library (PLL)… 28.085 seconds

NOTE: 392 MB RAM (0 GB) is required!
Estimate model parameters (epsilon = 0.500)

1. Initial log-likelihood: -60288.255
2. Current log-likelihood: -52884.715
Optimal log-likelihood: -52884.294
Rate parameters: A-C: 0.36331 A-G: 0.86950 A-T: 0.21617 C-G: 0.39216 C-T: 2.05273 G-T: 1.00000
Base frequencies: A: 0.279 C: 0.209 G: 0.206 T: 0.305
Parameters optimization took 2 rounds (70.383 sec)
Computing ML distances based on estimated model parameters… 2837.491 sec
Computing BIONJ tree…
1474.574 seconds
Log-likelihood of BIONJ tree: -52448.971

INITIALIZING CANDIDATE TREE SET

Generating 8 parsimony trees… 224.761 second
Computing log-likelihood of 8 initial trees … 33.582 seconds
Current best score: -52448.971

Do NNI search on 2 best initial trees
Estimate model parameters (epsilon = 0.500)
BETTER TREE FOUND at iteration 1: -52310.505
Finish initializing candidate tree set (12)
Current best tree score: -52310.505 / CPU time: 395.445
Number of iterations: 2

OPTIMIZING CANDIDATE TREE SET

TREE SEARCH COMPLETED AFTER 4 ITERATIONS / Time: 1h:23m:56s


FINALIZING TREE SEARCH

Performs final model parameters optimization
Estimate model parameters (epsilon = 0.050)

1. Initial log-likelihood: -52310.505
Optimal log-likelihood: -52310.497
Rate parameters: A-C: 0.36195 A-G: 0.89554 A-T: 0.20011 C-G: 0.33955 C-T: 2.09522 G-T: 1.00000
Base frequencies: A: 0.279 C: 0.209 G: 0.206 T: 0.305
Parameters optimization took 1 rounds (11.039 sec)
BEST SCORE FOUND : -52310.497
Total tree length: 1.322

Total number of iterations: 4
CPU time used for tree search: 602.458 sec (0h:10m:2s)
Wall-clock time used for tree search: 605.088 sec (0h:10m:5s)
Total CPU time used: 5036.391 sec (1h:23m:56s)
Total wall-clock time used: 5048.890 sec (1h:24m:8s)

Analysis results written to:
IQ-TREE report: results/spike_global/aligned-delim.fasta.iqtree
Maximum-likelihood tree: results/spike_global/aligned-delim.fasta.treefile
Likelihood distances: results/spike_global/aligned-delim.fasta.mldist
Screen log file: results/spike_global/aligned-delim.fasta.log

Date and Time: Mon Apr 5 13:34:14 2021

Earlier in the same log file there’s quite a bit of text that I’m not sure is relevant. Some of it is about renaming problematic strain names, e.g.

Lu’an_DELIM-QHMJMXEJNNOYGPIHOOFN_133_DELIM-QHMJMXEJNNOYGPIHOOFN_2020_DELIM-QHMJMXEJNNOYGPIHOOFN_EPI_ISL_1069210 → Lu_an_DELIM-QHMJMXEJNNOYGPIHOOFN_133_DELIM-QHMJMXEJNNOYGPIHOOFN_2020_DELIM-QHMJMXEJNNOYGPIHOOFN_EPI_ISL_1069210

…followed by additional output about highly gapped sequences (reported for quite a few strains; just one listed here):

Gap/Ambiguity  Composition  p-value
1  Netherlands_DELIM-QHMJMXEJNNOYGPIHOOFN_Oss_1363500_DELIM-QHMJMXEJNNOYGPIHOOFN_2020_DELIM-QHMJMXEJNNOYGPIHOOFN_EPI_ISL_413581  87.23%  failed  0.00%

…and finally text that lists duplicate strains (just one listed here for brevity):

NOTE: Netherlands_DELIM-QHMJMXEJNNOYGPIHOOFN_Tilburg_1363354_DELIM-QHMJMXEJNNOYGPIHOOFN_2020_DELIM-QHMJMXEJNNOYGPIHOOFN_EPI_ISL_413586 is identical to Netherlands_DELIM-QHMJMXEJNNOYGPIHOOFN_Oss_1363500_DELIM-QHMJMXEJNNOYGPIHOOFN_2020_DELIM-QHMJMXEJNNOYGPIHOOFN_EPI_ISL_413581 but kept for subsequent analysis

The warnings related to gaps are expected; I’m trying to create a spike-specific build that works by masking the majority of the sequence. The resulting .phy alignment file looks fine, and the .iqtree file appears to contain tree information, including a tree in Newick format. But I’m not sure why the Snakemake rule fails. Any ideas? Thanks!
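
In case it helps anyone reproduce this, one way to check whether the IQ-TREE output is usable independently of augur is a rough sanity check on the Newick file. Below is a minimal sketch, assuming the tree sits at the path reported under "Analysis results written to" in the log. The helper name newick_looks_complete is hypothetical, it only uses the Python standard library, and it is not a full Newick parser nor the check that augur itself performs (parentheses inside quoted labels would confuse it).

```python
from pathlib import Path


def newick_looks_complete(path):
    """Rough sanity check for a Newick file (hypothetical helper).

    Returns True only if the file exists, is non-empty, ends with the
    Newick terminator ';', and has balanced parentheses.
    """
    p = Path(path)
    if not p.is_file():
        return False
    text = p.read_text().strip()
    if not text or not text.endswith(";"):
        return False
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                # A closing parenthesis appeared before its opening one.
                return False
    return depth == 0


if __name__ == "__main__":
    # Path copied from the IQ-TREE log above; adjust for other builds.
    treefile = "results/spike_global/aligned-delim.fasta.treefile"
    print(treefile, "looks complete:", newick_looks_complete(treefile))
```

If this reports that the file looks complete, the problem is more likely in a step that runs after IQ-TREE finishes than in the tree search itself, though that is only a guess from the logs.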