IQ-TREE error: Tree taxa and alignment sequence do not match

Hi all,
I’m suddenly getting an IQ-TREE error that I haven’t seen before when running a Nextstrain build:

[Fri Oct 25 10:57:05 2024]
Job 3: Building tree
Reason: Missing output files: nextstrain_results/tree_raw.nwk; Input files updated by another job: nextstrain_results/aligned.fasta


        augur tree             --alignment nextstrain_results/aligned.fasta             --output nextstrain_results/tree_raw.nwk             --method iqtree             --override-default-args             --substitution-model GTR+F+I+G4             --nthreads 10             --tree-builder-args '-ninit 100 -n 100'
        
Building a tree via:
        iqtree -ntmax 10 -s nextstrain_results/aligned-delim.fasta -m GTR+F+I+G4 -ninit 100 -n 100 > nextstrain_results/aligned-delim.iqtree.log
        Nguyen et al: IQ-TREE: A fast and effective stochastic algorithm for estimating maximum likelihood phylogenies.
        Mol. Biol. Evol., 32:268-274. https://doi.org/10.1093/molbev/msu300


ERROR: Shell exited 2 when running: iqtree -ntmax 10 -s nextstrain_results/aligned-delim.fasta -m GTR+F+I+G4 -ninit 100 -n 100 > nextstrain_results/aligned-delim.iqtree.log
Command output was:
  ERROR: Alignment sequence KX621511 does not appear in the tree
  ERROR: Alignment sequence EU482844 does not appear in the tree
  ERROR: Alignment sequence EU155233 does not appear in the tree
  ERROR: Alignment sequence EU155347 does not appear in the tree
  ERROR: Alignment sequence EU155351 does not appear in the tree
  ERROR: Alignment sequence EU255928 does not appear in the tree
  ....
  ERROR: Alignment sequence MN164857 does not appear in the tree
  ERROR: Alignment sequence MN164862 does not appear in the tree
  ERROR: Tree taxa and alignment sequence do not match (see above)

ERROR: TREE BUILDING FAILED
ERROR: Command '['/bin/bash', '-c', 'set -euo pipefail; iqtree -ntmax 10 -s nextstrain_results/aligned-delim.fasta -m GTR+F+I+G4 -ninit 100 -n 100 > nextstrain_results/aligned-delim.iqtree.log']' returned non-zero exit status 2.
Please see the log file for more details: nextstrain_results/aligned-delim.iqtree.log

Building original tree took 1.5041491985321045 seconds
[Fri Oct 25 10:57:07 2024]
Error in rule tree:
    jobid: 3
    input: nextstrain_results/aligned.fasta
    output: nextstrain_results/tree_raw.nwk
    shell:
        
        augur tree             --alignment nextstrain_results/aligned.fasta             --output nextstrain_results/tree_raw.nwk             --method iqtree             --override-default-args             --substitution-model GTR+F+I+G4             --nthreads 10             --tree-builder-args '-ninit 100 -n 100'
        
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-10-25T105551.303109.snakemake.log

This problem has been reported for earlier versions of IQ-TREE, but it should have been fixed in the version used in Nextstrain. See the GitHub issue “Another issue with ModelFinder: "ERROR: Alignment sequence XXX does not appear in the tree"” (Issue #47, iqtree/iqtree2).

I also tried adding -keep-ident to --tree-builder-args but got the same error.

I can’t work out what is going on. The supposedly missing sequences are present in aligned-delim.fasta, and the error message appears very soon after IQ-TREE starts.

I’m sorry you’re running into this issue, @jonr. It looks like we’ve experienced this in the past, too.

Are you able to share your alignment FASTA here as an attachment? If you are using the Nextstrain CLI, can you share the output of this command, too? (Note that this can take a while to finish running depending on your environment.)

nextstrain version --verbose

If you’re managing your own environment, can you share the output of this command?

iqtree --version

Thanks for answering!

I use the Nextstrain CLI, and here’s the version output:

nextstrain.cli 8.5.3

Python
  /home/jonr/mambaforge/envs/NEXTSTRAIN/bin/python3.11
  3.11.0 | packaged by conda-forge | (main, Oct 25 2022, 06:24:40) [GCC 10.4.0]

Runners
  docker (default)
    nextstrain/base:build-20241023T213105Z (cd13b330b671, 2024-10-23 23:40:05 +0200 CEST)
    augur 26.0.0
    auspice v2.59.1
    fauna 23e57d6
    sacra not present

  conda 
    nextstrain-base unknown

  singularity 
    docker://nextstrain/base (not present)

  ambient 
    unknown

  aws-batch 
    unknown

This is surveillance data so I can’t share the alignment openly, but I’ll send it to you directly.

By the way, I also get the same error running IQ-TREE versions 2.2.2.3 and 2.3.6 outside of the Nextstrain environment.

And when I run MAFFT on filtered.fasta and then IQ-TREE, I don’t get this error…

Sorry for the delay in responding, @jonr, but I’ve tried out your augur tree command above with the data you sent and the same Docker image (nextstrain/base:build-20241023T213105Z) and couldn’t replicate the error. It sounds like you’ve tried a couple different paths through the augur tools and the corresponding standalone tools, but I can’t pinpoint the issue yet.

One possible issue is that IQ-TREE has historically changed sequence names that contain certain invalid characters. augur tree works around this by temporarily rewriting the sequence names in the input alignment with placeholder characters that IQ-TREE accepts, then changing those names back to their original values once the tree is built. This process is what produces aligned-delim.fasta from a given aligned.fasta.
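To make that renaming round trip concrete, here is a minimal sketch. It is not augur’s actual code: the set of characters treated as unsafe, the placeholder format, and the function names are all assumptions chosen for illustration, and the second sequence name is made up.

```python
import re

# Hypothetical set of characters a tree builder might alter in sequence names.
# The actual list handled by augur tree may differ.
UNSAFE = re.compile(r"[/|()*:;,\[\]']")


def mask_names(names):
    """Return (masked_names, reverse_map) with unsafe characters replaced."""
    masked, reverse = [], {}
    for name in names:
        # Encode each unsafe character as a placeholder built from its code point.
        safe = UNSAFE.sub(lambda m: f"_x{ord(m.group()):02X}_", name)
        masked.append(safe)
        reverse[safe] = name
    return masked, reverse


def restore_names(newick, reverse):
    """Swap masked labels in a Newick string back to the original names."""
    # Labels are the runs of characters between Newick delimiters.
    return re.sub(r"[^(),:;]+", lambda m: reverse.get(m.group(), m.group()), newick)


# Made-up names: one plain accession and one containing unsafe characters.
names = ["KY620674", "sample/A|2024"]
masked, reverse = mask_names(names)
tree = f"({masked[0]}:0.1,{masked[1]}:0.2);"
restored = restore_names(tree, reverse)
print(masked)
print(restored)
```

Names without unsafe characters, such as the accessions in this thread, pass through unchanged, which is consistent with KY620674 looking identical in both files.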

You mentioned that you noticed the same error when running IQ-TREE outside of Nextstrain. Did you run IQ-TREE on the original aligned.fasta or on the aligned-delim.fasta? Can you try both inputs to IQ-TREE separately just to confirm? If you still see the same error with the aligned.fasta input when you skip augur tree and use IQ-TREE directly, that helps us rule out augur tree at least. If you only see the error for the aligned-delim.fasta, that points to augur tree causing the issue.

To make sure you’re always using the same IQ-TREE version for these tests, you can jump into a Nextstrain shell with:

nextstrain shell --docker .

And then you can run iqtree like you would with the standalone binary installation. When you run iqtree --version from this shell, you should see something like:

IQ-TREE multicore version 2.3.5.cmaple COVID-edition for Linux x86 64-bit built May 27 2024
Developed by Bui Quang Minh, James Barbetti, Nguyen Lam Tung, Olga Chernomor,
Heiko Schmidt, Dominik Schrempf, Michael Woodhams, Ly Trong Nhan, Thomas Wong

Separately, if you want to also send me the original aligned.fasta that goes with the aligned-delim.fasta that you sent, I could try again to recreate the problem from the file that hasn’t yet been processed by augur tree.

Thanks again,
Running IQ-TREE (v2.3.5 inside the Nextstrain shell, but also v2.3.6 outside of Nextstrain) gives the error for aligned-delim.fasta but not for aligned.fasta.

Entering the Nextstrain runtime (docker)

Mapped volumes:
  /nextstrain/build is from /media/jonr/SATA6TB1/HCV_1_year

Run the command "exit" to leave the runtime.

 Nextstrain  ~/build $ iqtree -ntmax 10 -s nextstrain_results/aligned-delim.fasta -m GTR+F+I+G4 -ninit 100 -n 100

******************************************************
CHECKPOINT: Resuming analysis from nextstrain_results/aligned-delim.fasta.ckp.gz

IQ-TREE multicore version 2.3.5.cmaple COVID-edition for Linux x86 64-bit built May 27 2024
Developed by Bui Quang Minh, James Barbetti, Nguyen Lam Tung, Olga Chernomor,
Heiko Schmidt, Dominik Schrempf, Michael Woodhams, Ly Trong Nhan, Thomas Wong

Host:    27bd6bf3b8d2 (AVX512, FMA3, 251 GB RAM)
Command: iqtree -ntmax 10 -s nextstrain_results/aligned-delim.fasta -m GTR+F+I+G4 -ninit 100 -n 100
Seed:    466968 (Using SPRNG - Scalable Parallel Random Number Generator)
Time:    Mon Nov  4 14:27:43 2024
Kernel:  AVX+FMA - 1 threads (12 CPU cores detected)

HINT: Use -nt option to specify number of threads because your CPU has 12 cores!
HINT: -nt AUTO will automatically determine the best number of threads to use.

WARNING: Number of command-line arguments differs from checkpoint
WARNING: Command-line differs from checkpoint!
Reading alignment file nextstrain_results/aligned-delim.fasta ... Fasta format detected
Reading fasta file: done in 0.0986895 secs using 88.75% CPU
Alignment most likely contains DNA/RNA sequences
Alignment has 913 sequences with 9456 columns, 8053 distinct patterns
5329 parsimony-informative, 669 singleton sites, 3458 constant sites
              Gap/Ambiguity  Composition  p-value
Analyzing sequences: done in 0.0481258 secs using 99.99% CPU
   1  D17763.1        0.00%    passed     94.58%
   2  KY620602        0.26%    passed     95.32%
   3  KY620603        0.25%    passed     95.33%
   4  2302108-HCV     0.39%    passed     97.47%
   5  2305867-HCV     0.37%    passed     98.24%
   6  HCV042023B2     0.90%    passed     94.67%
   7  HCV062022F2     6.40%    passed     95.98%
 ...
 910  KY620877        0.22%    passed     40.89%
 911  KU746825        4.18%    passed     73.64%
 912  HCV102023D1     1.85%    passed     31.82%
 913  Virus2011002    0.75%    passed     19.79%
****  TOTAL           3.09%  2 sequences failed composition chi2 test (p-value<5%; df=3)

CHECKPOINT: Initial tree restored

NOTE: 546 MB RAM (0 GB) is required!
CHECKPOINT: Model parameters restored, LogL: -868690.630
Wrote distance file to... 
--------------------------------------------------------------------
|             INITIALIZING CANDIDATE TREE SET                      |
--------------------------------------------------------------------
CHECKPOINT: Candidate tree set restored, best LogL: -766699.163
Finish initializing candidate tree set (20)
Current best tree score: -766699.163 / CPU time: 0.000
Number of iterations: 252
TREE SEARCH COMPLETED AFTER 252 ITERATIONS / Time: 0h:0m:0s

ERROR: Alignment sequence KY620674 does not appear in the tree
ERROR: Alignment sequence KY620649 does not appear in the tree
ERROR: Alignment sequence KY620653 does not appear in the tree
ERROR: Alignment sequence KY620677 does not appear in the tree
...
(D17763.1,(((KY620602,KY620501),(2305867-HCV,HCV042023B2)),((((KY620603,(2302108-HCV,2260039-HCV)),((((HCV082022A3,KY620604),2164844-HCV),HCV062023B5),HCV072022H1)),(HCV062022F2,((2198448-HCV,KY620606),((((((KY620489,KJ437307),(KY620528,((KJ437313,KX621475),(((KX621519,KX621431),((((KY620369,(KY620372,KX621531)),(KY620370,KY620371)),KY620368),KX621530)),((KY620377,KY620380),(((KY620378,(KY620374,(KY620375,KY620381))),((((KY620376,KX621507),KY620428),(KY620367,KY620373)),(KX621490,(KY620379,KX621508)))),KY620396)))))),KY620529),KY620530),((((((((((KY620506,(KX621429,KY620515)),((KX621453,KY620580),((GQ356216,KY620517),KJ437311))),KY620545),((KU871296,HCV092022A2),(KY620522,KY620539))),(((KY620516,(((KY620538,KY620551),(((KY620554,(KY620487,KY620543)),(KY620486,KY620544)),KY620537)),(KY620510,KY620578))),((KY620519,(KY620562,KY620567)),((KY620550,(KY620555,KX621472)),KY620568))),(KJ437327,KY620518))),((KX621479,((((((MN628595,MN628579),MN628589),MN628590),MN628593),(((MN628581,(MN628591,KY620566)),MN628577),(MN628585,(((MN628597,(MN628588,MN628583)),(MN628578,MN628596)),MN628584)))),((MN628594,MN628575),MN628582))),(((MN628586,MN628576),KY620556),KY620553))),(((((KY620521,(KX621486,KY620579)),(HCV012023C5,(KY620559,((KY620552,MN628574),((2325984-HCV,MN628587),KY620577))))),(((KY620540,(KY620541,((KY620548,KX621503),KY620549))),KY620564),(((KY620520,(JQ717254,KY620536)),KY620533),2223942-HCV))),(KX621477,(KY620542,KY620563))),KY620557)),KY620514),(Virus210413,KC844041)),(KY620565,HCV022023B3))),(((2190609-HCV,(((((HCV032023F3,KY620570),KY620569),KY620574),KY620817),(Virus210409,AY956467))),(KY620400,KY620418)),((KY620413,((((((((((((((((((((((KY620397,KY620387),KY620388),((KY620389,(KY620390,KY620391)),KY620393)),HCV072023B3),(KY620453,KY620640)),(KY620392,(KY620424,KY620644))),KY620669),(KY620475,KY620575)),((KX621522,KY620642),(KY620432,KY620645))),(((((((((2201320-HCV,KY620482),((KY620395,(KY620403,2204675-HCV)),KY620384)),(HCV082022D2,(KY620454,KY620398))),(2200918
-HCV,(KY620420,KY620663))),((HCV092022G1,(HCV062023A2,(KX621489,KY620407))),(KY620410,(KY620421,KY620426)))),((KY620414,KY620469),(((KY620417,KY620401),KY620402),KY620613))),((((((HCV102022H1,2149283-HCV),KY620473),(KX621470,KY620465)),(((KX621498,KY620394),KY620466),((KY620472,HCV022023E1),((HCV112023D3,HCV112023C3),KY620474)))),((KX621516,KY620638),KY620416)),((((KY620411,((KY620406,(KX621518,KY620385)),KX621513)),((KX621492,KY620382),KY620383)),KX621454),((KY620399,KY620386),(KY620470,KY620476))))),((((HCV082023H1,((KY620425,KY620434),(KY620442,GQ356205))),((((KY620408,KY620366),KY620365),2201352-HCV),KY620595)),KY620435),((((KY620427,KY620429),KY620430),2255630-HCV),KY620456))),(KY620422,KY620415))),((((KY620409,((KY620436,KY620437),2149682-HCV)),(((KY620445,KY620433),KY620419),HCV122022F1)),((((KY620404,(KY620405,KX621441)),2276364-HCV),((KX621469,KY620412),(KY620423,KY620643))),(KY620480,Virus210404))),2201160-HCV)),KM587622),((((((KU871297,KY620471),((KY620452,KY620447),((KY620449,KY620446),(KX621481,KY620462)))),(KX621483,KX621478)),(((KY620448,KY620455),KY620464),((KY620450,KY620461),KY620639))),((KY620459,((KY620460,KY620444),KY620443)),KY620451)),((KX621476,KY620463),(KX621488,KY620646)))),((KY620431,((((((KX621537,(KY620588,KY620479)),KY620478),(KY620484,KY620468)),KY620483),(KY620477,(KY620547,KY620573))),((GQ356217,KY620786),JF509175))),(KY620467,((((((((HCV082023B2,KY620594),((KY620571,KY620591),(KY620481,KY620364))),(KY620576,2266432-HCV)),((KY620584,KY620585),KY620581)),(((KY620586,(2201237-HCV,KX621432)),(KY620582,(((((KY620600,KY620624),(KY620625,GQ356201)),KY620617),((((KY620629,(KY620630,KY620752)),KY620619),((((GQ356204,GQ356206),KY620660),KY620635),KY620636)),KY620620)),KY620670))),(((KY620626,KY620612),KY620627),KY620628))),(((2289968-HCV,2151740-HCV),HCV072022C2),HCV082022G3)),(((((((((KY620583,KY620485),GQ356211),((KY620665,KX621434),(((KY620614,KY620615),HCV062023H3),(KU871299,2201513-HCV)))),(KY620745,2325783-HCV)),(((KY620590,KY620621),K
Y620601),(((((KY620589,KX621539),((KY620587,KY620622),HCV092022E2)),(Virus220206,HCV092022D2)),((HCV102023H3,Virus200437),(HCV062023E6,GQ356208))),(GQ356200,HCV022023B1)))),(((KY620616,HCV062023B2),2217828-HCV),KY620623)),((HCV052023B3,2254393-HCV),HCV122022B2)),((KX621501,KY620597),HCV082023A4)),(((KY620598,KY620633),((KY620599,((KY620618,(KY620641,(2281008-HCV,HCV042023A2))),2194578-HCV)),HCV052023B2)),HCV092022C4))),(HCV122022B5,KY620637))))),KY620662),(KY620457,KY620458)),KY620647),(KY620697,KY620675)),KX621534),GQ356215),(KY620803,GQ356209)),(KY620688,GQ356214))),(KM043280,KM043281))))))),((((((((((((((((2155337-HCV,DQ430819),KY620560),(KY620561,KY620493)),((((HCV062023C5,HCV082023B4),HCV082023B3),KY620502),(((HCV102023G3,HCV122023D3),KY620558),HCV032023H1))),((HCV092023F3,HCV102022D2),(KX621487,(Virus210414,HCV112023A1)))),(HCV092022A4,KY620491)),KY620512),((2280987-HCV,HCV122023E1),HCV122022D3)),((KX621527,JF509177),(GQ356203,KY620492))),(((DQ430820,(KY620505,KY620534)),((((((KY620605,HCV072023E3),2160613-HCV),(KY620490,(HCV092023E3,2270512-HCV))),(((KY620494,2149310-HCV),KY620525),(((2158371-HCV,HCV042023C1),(2311538-HCV,2165011-HCV)),HCV082023F3))),(((2201484-HCV,HCV032023E1),Virus210605),(2258561-HCV,((((HCV082023E2,KY620526),KY620527),2161950-HCV),HCV062023G4)))),((HCV062023D4,(HCV052023D2,HCV062023E3)),((HCV122022B4,HCV102022B3),Virus210411)))),(HCV072023A3,(KY620511,KY620508)))),((((((((KY620609,(KY620610,2201381-HCV)),KY620504),HCV112023E2),(((KY620503,2324952-HCV),(HCV092022F3,HCV072023C1)),((((HCV062023E5,HCV082022E3),2260374-HCV),KY620509),(HCV082023C4,HCV122022D5)))),HCV062023G5),2281857-HCV),HCV082023D3),((((KY620611,KY120332),((MN231294,MN231295),HCV102022C1)),(HCV102022F3,HQ912953)),((KY620523,(KY620524,2324251-HCV)),(2253104-HCV,2212315-HCV))))),(((((((KY620607,KY620608),(KY620546,HCV012023B6)),((HCV082023D2,KX621502),HCV122023H4)),((KY620499,KY620507),2257410-HCV)),KY620498),(KX621428,(KY620535,HCV082023C2))),(KY620531,2197190-HCV))),(KY620500
,(KX621426,HCV102023G1))),((((KJ437318,(KY620497,KY620495)),KY620496),KX621541),(HCV112023H2,(HCV072023H3,KY620488)))),KY620513),KY620532))),HCV042023F2);
ERROR: Tree taxa and alignment sequence do not match (see above)

I see now that it reports “ERROR: Tree taxa and alignment sequence do not match (see above)”, and that it’s able to create the candidate trees. Could it be restoring candidate trees based on a different dataset?

I looked at one of the problematic sequences flagged by iqtree, “KY620674”, but it seems to have the same name in both alignments:

(NEXTSTRAIN) jonr@jonr-HP-Z4-G4-Workstation:/media/jonr/SATA6TB1/HCV_1_year$ grep "KY620674" nextstrain_results/aligned-delim.fasta
>KY620674
(NEXTSTRAIN) jonr@jonr-HP-Z4-G4-Workstation:/media/jonr/SATA6TB1/HCV_1_year$ grep "KY620674" nextstrain_results/aligned.fasta
>KY620674
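The grep check above covers one name at a time. The whole set of FASTA headers can be compared against the leaf names of a Newick tree in one pass. This is only a sketch: it assumes simple unquoted labels, the helper names are hypothetical, and the toy alignment and tree below are invented rather than taken from the real data.

```python
import re


def fasta_names(fasta_text):
    """Collect sequence names (first word of each '>' header) from FASTA text."""
    names = set()
    for line in fasta_text.splitlines():
        header = line[1:].split() if line.startswith(">") else []
        if header:
            names.add(header[0])
    return names


def newick_leaves(newick):
    """Collect leaf names from a Newick string with unquoted labels."""
    # A leaf label directly follows '(' or ',' and ends at the next delimiter.
    return set(re.findall(r"(?<=[(,])\s*([^(),:;\s]+)", newick))


def compare_taxa(fasta_text, newick):
    """Return (names only in the alignment, names only in the tree)."""
    seqs, leaves = fasta_names(fasta_text), newick_leaves(newick)
    return sorted(seqs - leaves), sorted(leaves - seqs)


# Toy data: the tree comes from a different dataset than the alignment.
fasta = ">KY620674\nACGT\n>KY620649\nACGA\n"
tree = "((KY620649:0.1,KX621475:0.2):0.05,D17763.1:0.3);"
only_in_alignment, only_in_tree = compare_taxa(fasta, tree)
print(only_in_alignment)
print(only_in_tree)
```

If both lists are non-empty, the tree was most likely built from a different set of sequences than the alignment being read.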

Thanks! That is an interesting result about the candidate trees, but I suspect those trees only appear to work in your example because they were loaded from a previous IQ-TREE run (I’m guessing from the “CHECKPOINT: Initial tree restored” message in the log). When you run iqtree with aligned.fasta and then with aligned-delim.fasta, you’ll also want to use the -redo flag to prevent IQ-TREE from resuming a previous analysis like it did in the example above.

If you run both of these commands, do you get the same results where the first command works and the second command errors out?

iqtree -ntmax 10 -s nextstrain_results/aligned.fasta -m GTR+F+I+G4 -ninit 100 -n 100 -redo

iqtree -ntmax 10 -s nextstrain_results/aligned-delim.fasta -m GTR+F+I+G4 -ninit 100 -n 100 -redo

Also, just to rule out any effect that these specific IQ-TREE settings may have compared to augur tree’s defaults, could you try the same inputs above but with these default settings from Augur?

iqtree -s nextstrain_results/aligned.fasta -m GTR -ninit 2 -n 2 -me 0.05 -nt AUTO -redo

iqtree -s nextstrain_results/aligned-delim.fasta -m GTR -ninit 2 -n 2 -me 0.05 -nt AUTO -redo
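Besides passing -redo, a leftover checkpoint can also be spotted and removed before the run. The sketch below assumes the checkpoint sits next to the alignment as `<alignment>.ckp.gz`, which is the name shown in the log earlier in this thread (nextstrain_results/aligned-delim.fasta.ckp.gz); the name may differ if IQ-TREE is given a custom output prefix. The function names are hypothetical, and the demo uses an empty stand-in file in a temporary directory.

```python
import tempfile
from pathlib import Path


def checkpoint_path(alignment):
    """Checkpoint file name as seen in the log above: <alignment>.ckp.gz."""
    return Path(str(alignment) + ".ckp.gz")


def clear_checkpoint(alignment):
    """Delete a leftover checkpoint; return True if one was removed."""
    ckp = checkpoint_path(alignment)
    if ckp.exists():
        ckp.unlink()
        return True
    return False


# Demo in a temporary directory with an empty stand-in for a real checkpoint.
with tempfile.TemporaryDirectory() as tmp:
    alignment = Path(tmp) / "aligned-delim.fasta"
    checkpoint_path(alignment).touch()
    removed_first = clear_checkpoint(alignment)  # a checkpoint was present
    removed_again = clear_checkpoint(alignment)  # nothing left to remove
print(removed_first, removed_again)
```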

Yes, -redo seems to do the trick! All of the commands you suggested work. I also added -redo to --tree-builder-args in augur tree and it also works.

But I’m not sure why this happened. I have created Nextstrain builds in the same folder before with no problem, and I start the analysis with the --rerun-incomplete and --forceall flags.


Oh good! IQ-TREE creates a checkpoint per input file (and, supposedly, other command line arguments), so if a run with aligned-delim.fasta stopped for some reason with one dataset and you then tried a different dataset with the same input file name, IQ-TREE would load the checkpoint from the earlier dataset and fail when it couldn’t find the same samples in the new one. Normally when you run augur tree, the default IQ-TREE arguments include the -redo flag, so you’d never encounter this specific issue. Because you override the defaults with custom arguments here, -redo was no longer passed, and that introduced the risk of the checkpoint issue.
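One way to guard against this when overriding the defaults is to make sure -redo is always part of the custom argument string. The helper below is a hypothetical sketch for whatever script or workflow assembles the --tree-builder-args value; it is not part of augur.

```python
import shlex


def with_redo(tree_builder_args):
    """Append -redo to a --tree-builder-args string unless it is already there."""
    tokens = shlex.split(tree_builder_args)
    if "-redo" not in tokens:
        tokens.append("-redo")
    return shlex.join(tokens)


custom = with_redo("-ninit 100 -n 100")
unchanged = with_redo("-ninit 100 -n 100 -redo")
print(custom)
```

Applied to the arguments used in this thread, the resulting string matches the fix reported above, where -redo was added to --tree-builder-args by hand.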

The --rerun-incomplete and --forceall flags are Snakemake arguments that control how the whole workflow runs, but they don’t have any effect on the specific commands run by the workflow like MAFFT, IQ-TREE, Augur, etc.

I understand. Thanks for your help!