It’s been a while since I’ve run the local build of Nextstrain on my computer, so I went ahead and updated everything: first from GitHub, then from the CLI using `conda update --all`.
Ever since the update, I have not been able to run the analysis successfully. It looks like a lot has changed, and I was wondering whether someone could help me nail down where this issue is coming from.
I am not sure why it is asking for 3 sequences in the alignment. Is it saying that none of my sequences passed the quality checks, and now only the two reference sequences (Wuhan/Hu-1/2019 and Wuhan/WHO1/2019) remain?
If it helps, those two reference sequences are the only sequences in my aligned-delim.fasta file in the results folder. In the past, almost all of the sequences I’ve given Nextstrain have passed the quality checks, so I’m not sure where to look next.
This is one of the places where sequences may disappear.
The path to the logs looks like this: logs/subsample_{build_name}_{subsample}.txt
In the Snakemake workflow that I linked to above you can see the output files from that rule. It’s worth checking them and the rules that consume these output files.
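One way to see where sequences disappear is to count the records in each intermediate FASTA the workflow writes. Here is a minimal sketch (the `results/` glob pattern is an assumption; adjust it to your build directory):

```python
import glob
import lzma

def count_fasta_records(path):
    """Count sequence records (lines starting with '>') in a FASTA file,
    transparently handling the .xz-compressed files the workflow produces."""
    opener = lzma.open if path.endswith(".xz") else open
    with opener(path, "rt") as handle:
        return sum(1 for line in handle if line.startswith(">"))

# Report a record count for every FASTA under results/, so you can see
# at which step the sequences drop out.
for path in sorted(glob.glob("results/**/*.fasta*", recursive=True)):
    print(f"{count_fasta_records(path):>6}  {path}")
```

Comparing these counts against the rule order in the Snakefile usually narrows the problem to a single step.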
It’s hard to debug something like this from afar, but if you share the logs, input files, and output files, we can try to get there and maybe help others who hit a similar issue.
I’d also be curious whether @jbarnell has figured it out in the meantime, although it’s been quite a while; sorry for that.
Same issue here: I cloned nextstrain and ncov yesterday, running natively.
Not sure why nearly all the example data is being dropped.
Job 10:
Combine and deduplicate FASTAs
Reason: Input files updated by another job: results/default-build/sample-all.txt
augur filter --sequences results/aligned_reference_data.fasta.xz --metadata results/sanitized_metadata_reference_data.tsv.xz --exclude-all --include results/default-build/sample-all.txt --output-sequences results/default-build/default-build_subsampled_sequences.fasta.xz --output-metadata results/default-build/default-build_subsampled_metadata.tsv.xz 2>&1 | tee logs/subsample_regions_default-build.txt
488 strains were dropped during filtering
240 had no metadata
250 of these were dropped by `--exclude-all`
250 strains were added back because they were in results/default-build/sample-all.txt
2 strains passed all filters
Note: You did not provide a sequence index, so Augur will generate one. You can generate your own index ahead of time with `augur index` and pass it with `augur filter --sequence-index`.
0 strains were dropped during filtering
2 strains passed all filters
[Wed Aug 31 10:21:38 2022]
Finished job 7.
10 of 30 steps (33%) done
Select jobs to execute...
[Wed Aug 31 10:21:38 2022]
Job 6: Building tree
Reason: Missing output files: results/default-build/tree_raw.nwk; Input files updated by another job: results/default-build/filtered.fasta
Same error
Error in rule tree:
jobid: 6
output: results/default-build/tree_raw.nwk
log: logs/tree_default-build.txt (check log file(s) for error message)
conda-env: path-to/ncov/.snakemake/conda/606fba2748c6c88ce497ee03a13af39a_
shell:
augur tree --alignment results/default-build/filtered.fasta --tree-builder-args '-ninit 10 -n 4' --exclude-sites defaults/sites_ignored_for_tree_topology.txt --output results/default-build/tree_raw.nwk --nthreads 8 2>&1 | tee logs/tree_default-build.txt
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Hi everyone, I’m currently encountering the same error and was wondering if anyone here has managed to resolve it. If so, would you be kind enough to share how you addressed the issue? Your insights would be greatly appreciated. Thank you in advance!
Hi @limxr01, the problem that @henrykmer described may be different from the problem @jbarnell originally described, but I would guess that there is a mismatch between the strain names in the input sequences and metadata files. For example, in @henrykmer’s log output above, the following lines indicate that augur filter saw 490 strains in total (the 488 it dropped plus the 2 it kept), but 240 of them had no metadata and only 250 had both sequences and metadata:
488 strains were dropped during filtering
240 had no metadata
250 of these were dropped by `--exclude-all`
250 strains were added back because they were in results/default-build/sample-all.txt
2 strains passed all filters
This suggests that the names in the sequences and metadata didn’t match, so augur filter couldn’t link the records and output them during the filtering process. The filter command would report the sequences that don’t have matching metadata records as having no metadata. The solution is to recreate the sequences and metadata files with matching strain names before running the workflow.
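A quick way to spot such mismatches is to compare the strain names in the FASTA headers against the `strain` column of the metadata TSV. This is just a sketch; the input paths are placeholders for whatever files you pass to the workflow:

```python
import csv
import lzma

def fasta_names(path):
    """Strain names from FASTA headers: text after '>' up to the first whitespace."""
    opener = lzma.open if path.endswith(".xz") else open
    with opener(path, "rt") as handle:
        return {line[1:].split()[0] for line in handle if line.startswith(">")}

def metadata_names(path, column="strain"):
    """Strain names from the metadata TSV's strain column."""
    opener = lzma.open if path.endswith(".xz") else open
    with opener(path, "rt") as handle:
        return {row[column] for row in csv.DictReader(handle, delimiter="\t")}

# Hypothetical input paths; substitute your own files.
try:
    seqs = fasta_names("data/sequences.fasta")
    meta = metadata_names("data/metadata.tsv")
    print("in sequences but not metadata:", sorted(seqs - meta)[:10])
    print("in metadata but not sequences:", sorted(meta - seqs)[:10])
except FileNotFoundError as err:
    print("input not found:", err)
```

Any name that shows up in only one of the two sets will be reported by augur filter as having no metadata.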
@limxr01 Can you share the complete command you are using to run your ncov workflow and the complete error output you’re getting, so we have a better sense of the problem?
I forgot to mention, regarding @jbarnell’s original question, that you need at least 3 sequences to build a phylogenetic tree, so IQ-TREE throws an error when it finds only the 2 reference sequences. This error is an internal sanity check in IQ-TREE. In this context, it tells you that something went wrong in the workflow upstream of the tree-building step.
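As a quick pre-flight check before rerunning the tree rule, you can confirm that the filtered alignment holds at least 3 records. A minimal sketch (the path is an assumption based on the failing `augur tree` command above):

```python
import lzma

def has_enough_sequences(path, minimum=3):
    """True if the FASTA at `path` holds at least `minimum` records;
    False if the file is missing or has too few sequences for tree building."""
    opener = lzma.open if path.endswith(".xz") else open
    try:
        with opener(path, "rt") as handle:
            return sum(1 for line in handle if line.startswith(">")) >= minimum
    except FileNotFoundError:
        return False

# Hypothetical path: the alignment handed to `augur tree` in the failing rule.
print("enough sequences for IQ-TREE:",
      has_enough_sequences("results/default-build/filtered.fasta"))
```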
Hi @jlhudd, thank you for your response! It looks like I’m encountering both of the issues you mentioned: some strains are being dropped during filtering due to missing metadata, and IQ-TREE is throwing an error because it only has the two reference sequences to work with. I’ve attached my full command and error log in the .txt file for your reference. failed_complete.txt (19.2 KB)
I’ll double-check my sequence and metadata files for any strain name mismatches as you suggested. If you have any further tips on resolving this, I’d appreciate it! Thank you so much for your help!
Thank you for the log file, @limxr01; that helps a lot! It looks like by the time the workflow reaches the filtering rule (Job 6 in your log file), only one record is left in the sequences and metadata. This suggests something went wrong upstream, between the initial subsampling (where 4334 strains passed the filters) and that filter step.
To get a better idea of when those records got dropped, can you share the full output of the following command run from the ncov directory?