Error in "main_workflow.smk" file

Team,

I have recently been running into an error in “main_workflow.smk” related to latency wait. The error message is attached in the image below. The error only occurs when I run our complete set of local genomes (~4,500 genomes). However, I do not see this error when I run genomes from the last 4 epi weeks, although both runs use identical build.yaml files. The subsampling schemes for both runs are identical except for the volume (~4,500 local genomes, ~1,500 North America, and ~500 global contextual sequences for the complete build, and ~500, ~500, and ~250 genomes respectively for the last 4 epi weeks).

Not sure why this is happening. Yes, I have installed/updated the latest ncov repos in my Ubuntu environment, and have installed/updated the nextstrain conda environment.

My command looks like this: snakemake --cores 6 --configfile my_profiles/nebraskabuild/builds.yaml --latency-wait 30 --use-conda

Error Message:

MissingOutputException in line 665 of /mnt/c/Users/ncov/workflow/snakemake_rules/main_workflow.smk:
Job Missing files after 30 seconds:
results/LocationName/tree_raw.nwk
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Job id: 6 completed successfully, but some output files are missing. 6

Yes, I have increased the latency wait time to 60 seconds and get the same error message.

I have glanced through the page to see if others have reported a similar error message. This was pretty close… (Error in "main_workflow.smk" file in --latency-wait - #4 by james)

  • Yes, I have updated both reference genomes’ names in the metadata

I would appreciate any troubleshooting help.

Team, Thanks for all your work advancing genomic surveillance efforts globally! Couldn’t appreciate you all enough!!

@Bryan, I think this looks like a RAM problem: core dump. Test with the example data provided in the my_profiles directory (example) to see if it works.

@mattoslmp … the same error is reproduced when I use a cluster in the cloud. The example data does not reproduce the same error.

IQ-TREE is crashing with a segfault (SIGSEGV), which means the file Snakemake is expecting (tree_raw.nwk) will never appear (and a longer --latency-wait won’t help, as you’ve observed). This is almost certainly a bug in IQ-TREE and not something the Nextstrain team can directly fix.
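One way to tell a crash like this apart from genuine filesystem latency is to look at the exit status of the failing command. Below is a minimal Python sketch, assuming a POSIX system; `describe_exit` is a hypothetical helper name for illustration and is not part of Snakemake or IQ-TREE.

```python
import signal


def describe_exit(returncode: int) -> str:
    """Describe a process exit status, naming the signal if one killed it."""
    if returncode == 0:
        return "exited normally"
    # Python's subprocess module reports death-by-signal as a negative code;
    # a shell typically reports it as 128 + signal number (139 for SIGSEGV).
    if returncode < 0:
        signum = -returncode
    elif returncode > 128:
        signum = returncode - 128
    else:
        return f"exited with status {returncode}"
    try:
        return f"killed by {signal.Signals(signum).name}"
    except ValueError:
        # Not a known signal number, so treat it as an ordinary exit status.
        return f"exited with status {returncode}"


# Example: a status of 139 from a shell decodes to SIGSEGV on Linux.
print(describe_exit(139))
```

If the status decodes to SIGSEGV, the program died before writing its output, so no amount of waiting will make the file appear. A status above 128 can also be an ordinary exit code chosen by the program, so treat the decoded signal as a hint rather than proof.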

The crash appears to be data dependent since the example data works fine; it may or may not be dependent on the version of IQ-TREE too. In the past, we’ve observed IQ-TREE crash on certain data where the workaround was avoiding certain versions.

In this case, I’d try upgrading the version of IQ-TREE in workflow/envs/nextstrain.yaml from:

  - iqtree=2.1.2

to the latest version available with Conda:

  - iqtree=2.1.4_beta

and re-running your build to see if that helps. If the crash still occurs, you can instead try downgrading to the previous version available with Conda:

  - iqtree=2.0.3
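
If you want to switch between these pins without editing the file by hand each time, something like the following Python sketch could work. It assumes the environment file lists the dependency on its own line as `- iqtree=<version>`, as in the snippets above; `set_iqtree_pin` is a hypothetical helper name, not something provided by the ncov workflow.

```python
import re


def set_iqtree_pin(env_text: str, version: str) -> str:
    """Return env_text with the `- iqtree=...` line pinned to `version`."""
    # Match a list item such as "  - iqtree=2.1.2" on its own line.
    pattern = re.compile(r"^([ \t]*-[ \t]*iqtree)=\S+[ \t]*$", flags=re.MULTILINE)
    new_text, count = pattern.subn(lambda m: f"{m.group(1)}={version}", env_text)
    if count == 0:
        raise ValueError("no iqtree pin found in the environment text")
    return new_text


# Illustrative input only; the real file contains many more dependencies.
sample = "dependencies:\n  - iqtree=2.1.2\n"
print(set_iqtree_pin(sample, "2.1.4_beta"))
```

To apply it, read workflow/envs/nextstrain.yaml, pass its text through the function, and write the result back. Keeping a copy of the original file makes it easy to revert.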

Let us know how it goes! You might also consider filing a bug with the IQ-TREE project.

Team, just following up on this. We are still running into this error. We have upgraded to the latest version of iqtree available with conda (2.1.4_beta) and also downgraded to the previous version of iqtree (2.0.3). Same error message. This only occurs when we run our complete build; builds limited to recently collected samples (the last 8 weeks, for example) are unaffected. We have been contacted by colleagues running into a similar error message. Wondering what’s causing this.

Waiting at most 30 seconds for missing files.
MissingOutputException in line 750 of /mnt/c/Users/bryan/Documents/ncov/workflow/snakemake_rules/main_workflow.smk:
Job Missing files after 30 seconds:
results/Nebraska/nt_muts.json
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Job id: 38 completed successfully, but some output files are missing. 38

Follow-up Update:
I reached out to IQ-TREE. They think it has something to do with the nextstrain snakemake workflow, and not iqtree.

Hmm, the screenshot you posted in your first message clearly shows IQ-TREE crashing with a segfault. Have you shown the IQ-TREE authors that screenshot?

Ah, I see the IQ-TREE issue on GitHub. I’ve left a comment to try to clear things up and link the IQ-TREE authors to the context here. Let’s see what they say.

Hi Folks,

Just hopping over here from GitHub to see if we can sort this out. If you can send the IQ-TREE input file and the command line, it would really help us to try and figure this out. Without this, it can be really, really difficult to help.

I’m a GISAID member, so you can share GISAID data with me if that’s what you’re working with. My contact details are here: Robert Lanfear | ANU Research School of Biology

If you can post as much information as possible (e.g. command lines, IQ-TREE output files, etc.) to the GitHub issue, that will help too. The more of the files that are on GitHub, the easier it is for the developers to work on it.

Rob

Cross-referencing to another report we received of an IQ-TREE segfault.