Thanks again.
Running IQ-TREE (v2.3.5 inside the Nextstrain shell, and also v2.3.6 outside of Nextstrain) gives the error for aligned-delim.fasta
but not for aligned.fasta:
Entering the Nextstrain runtime (docker)
Mapped volumes:
/nextstrain/build is from /media/jonr/SATA6TB1/HCV_1_year
Run the command "exit" to leave the runtime.
Nextstrain ~/build $ iqtree -ntmax 10 -s nextstrain_results/aligned-delim.fasta -m GTR+F+I+G4 -ninit 100 -n 100
CHECKPOINT: Resuming analysis from nextstrain_results/aligned-delim.fasta.ckp.gz
IQ-TREE multicore version 2.3.5.cmaple COVID-edition for Linux x86 64-bit built May 27 2024
Developed by Bui Quang Minh, James Barbetti, Nguyen Lam Tung, Olga Chernomor,
Heiko Schmidt, Dominik Schrempf, Michael Woodhams, Ly Trong Nhan, Thomas Wong
Host: 27bd6bf3b8d2 (AVX512, FMA3, 251 GB RAM)
Command: iqtree -ntmax 10 -s nextstrain_results/aligned-delim.fasta -m GTR+F+I+G4 -ninit 100 -n 100
Seed: 466968 (Using SPRNG - Scalable Parallel Random Number Generator)
Time: Mon Nov 4 14:27:43 2024
Kernel: AVX+FMA - 1 threads (12 CPU cores detected)
HINT: Use -nt option to specify number of threads because your CPU has 12 cores!
HINT: -nt AUTO will automatically determine the best number of threads to use.
WARNING: Number of command-line arguments differs from checkpoint
WARNING: Command-line differs from checkpoint!
Reading alignment file nextstrain_results/aligned-delim.fasta ... Fasta format detected
Reading fasta file: done in 0.0986895 secs using 88.75% CPU
Alignment most likely contains DNA/RNA sequences
Alignment has 913 sequences with 9456 columns, 8053 distinct patterns
5329 parsimony-informative, 669 singleton sites, 3458 constant sites
Gap/Ambiguity Composition p-value
Analyzing sequences: done in 0.0481258 secs using 99.99% CPU
1 D17763.1 0.00% passed 94.58%
2 KY620602 0.26% passed 95.32%
3 KY620603 0.25% passed 95.33%
4 2302108-HCV 0.39% passed 97.47%
5 2305867-HCV 0.37% passed 98.24%
6 HCV042023B2 0.90% passed 94.67%
7 HCV062022F2 6.40% passed 95.98%
910 KY620877 0.22% passed 40.89%
911 KU746825 4.18% passed 73.64%
912 HCV102023D1 1.85% passed 31.82%
913 Virus2011002 0.75% passed 19.79%
**** TOTAL 3.09% 2 sequences failed composition chi2 test (p-value<5%; df=3)
CHECKPOINT: Initial tree restored
NOTE: 546 MB RAM (0 GB) is required!
CHECKPOINT: Model parameters restored, LogL: -868690.630
Wrote distance file to...
CHECKPOINT: Candidate tree set restored, best LogL: -766699.163
Finish initializing candidate tree set (20)
Current best tree score: -766699.163 / CPU time: 0.000
Number of iterations: 252
ERROR: Alignment sequence KY620674 does not appear in the tree
ERROR: Alignment sequence KY620649 does not appear in the tree
ERROR: Alignment sequence KY620653 does not appear in the tree
ERROR: Alignment sequence KY620677 does not appear in the tree
ERROR: Tree taxa and alignment sequence do not match (see above)
I see now that it says "ERROR: Tree taxa and alignment sequence do not match (see above)", even though it is able to restore the candidate trees from the checkpoint. The log also warns that the command line differs from the checkpoint. Could it be restoring candidate trees based on a different dataset?
I looked at one of the problematic sequences flagged by iqtree, “KY620674”, but it seems to have the same name in both alignments:
(NEXTSTRAIN) jonr@jonr-HP-Z4-G4-Workstation:/media/jonr/SATA6TB1/HCV_1_year$ grep "KY620674" nextstrain_results/aligned-delim.fasta
(NEXTSTRAIN) jonr@jonr-HP-Z4-G4-Workstation:/media/jonr/SATA6TB1/HCV_1_year$ grep "KY620674" nextstrain_results/aligned.fasta
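Rather than grepping one name at a time, it might be easier to compare all sequence names in the two alignments at once. Below is a minimal sketch of how that could be done in Python. The helper names (fasta_names, compare_names) are made up for illustration, and it assumes the sequence name is the first whitespace-delimited token of each ">" header line. It only compares the two FASTA files; it does not read the checkpoint file.

```python
from pathlib import Path


def fasta_names(fasta_text):
    """Return the sequence names found in a FASTA-formatted string.

    Assumption: the name is the first whitespace-delimited token
    after '>' on each header line.
    """
    names = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            header = line[1:].strip()
            names.append(header.split()[0] if header else "")
    return names


def compare_names(names_a, names_b):
    """Return (names only in A, names only in B), each sorted."""
    set_a, set_b = set(names_a), set(names_b)
    return sorted(set_a - set_b), sorted(set_b - set_a)


if __name__ == "__main__":
    # Paths taken from the commands above; adjust if yours differ.
    base = Path("nextstrain_results")
    path_a = base / "aligned.fasta"
    path_b = base / "aligned-delim.fasta"
    if path_a.exists() and path_b.exists():
        only_a, only_b = compare_names(
            fasta_names(path_a.read_text()),
            fasta_names(path_b.read_text()),
        )
        print(f"Only in {path_a.name}: {only_a}")
        print(f"Only in {path_b.name}: {only_b}")
    else:
        print("Alignment files not found; nothing to compare.")
```

If both lists come back empty, the names match between the two alignments, which would point towards the restored checkpoint rather than the FASTA headers as the source of the mismatch.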