Error in "main_workflow.smk" file

Team,

I have recently been running into an error in “main_workflow.smk” with latency wait. The error message is attached in the image below. The error only occurs when I run our complete set of local genomes (~4,500 genomes). However, I do not see this error when I run genomes from the last 4 epi weeks, although both runs use identical build.yaml files. Subsampling schemes for both runs are identical except for the volume (~4,500 local genomes, ~1,500 North America, and ~500 global contextual sequences for the complete build, and ~500 + 500 + 250 genomes respectively for the last 4 epi weeks).

Not sure why this is happening. Yes, I have installed/updated the latest ncov repos in my Ubuntu environment, and have installed/updated the nextstrain conda environment.

My command looks like this: snakemake --cores 6 --configfile my_profiles/nebraskabuild/builds.yaml --latency-wait 30 --use-conda

Error Message:

MissingOutputException in line 665 of /mnt/c/Users/ncov/workflow/snakemake_rules/main_workflow.smk:
Job Missing files after 30 seconds:
results/LocationName/tree_raw.nwk
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Job id: 6 completed successfully, but some output files are missing. 6

Yes, I have increased the latency wait time to 60 seconds and get the same error message.

I have glanced through the page to see if others have reported a similar error message. This was pretty close… (Error in "main_workflow.smk" file in --latency-wait - #4 by james)

  • Yes, I have updated both reference genomes’ names in the metadata

I would appreciate any troubleshooting help.

Team, Thanks for all your work advancing genomic surveillance efforts globally! Couldn’t appreciate you all enough!!

@Bryan, I think this looks like a RAM problem: core dump. Test with the example data provided in the my_profiles directory (example) to see if it works.

@mattoslmp … the same error is reproduced when I use a cluster in the cloud. The example data does not reproduce the same error.

IQ-TREE is crashing with a segfault (SIGSEGV), which means the file Snakemake is expecting (tree_raw.nwk) will never appear (and a longer --latency-wait won’t help, as you’ve observed). This is almost certainly a bug in IQ-TREE and not something the Nextstrain team can directly fix.
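To tell a crash apart from real filesystem latency, it can help to scan the job's log output for crash signatures before raising --latency-wait any further. The following is only a minimal sketch: the function name and the list of signatures are assumptions chosen for illustration (generic strings such as SIGSEGV, not an exhaustive or IQ-TREE-specific list), and where the log lives depends on your setup.

```python
import re

# Generic crash signatures (an assumption for illustration, not an exhaustive
# or IQ-TREE-specific list). A match suggests the process died, so the missing
# output file will never appear regardless of --latency-wait.
CRASH_PATTERN = re.compile(r"SIGSEGV|segmentation fault|core dumped", re.IGNORECASE)


def find_crash_lines(log_text):
    """Return (line_number, line) pairs that mention a crash signature."""
    return [
        (number, line.strip())
        for number, line in enumerate(log_text.splitlines(), start=1)
        if CRASH_PATTERN.search(line)
    ]
```

You could read whichever log file your tree-building step wrote, pass its contents to find_crash_lines, and print the result. An empty list does not prove latency is the cause; it only means none of these particular strings were found.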

The crash appears to be data dependent since the example data works fine; it may or may not be dependent on the version of IQ-TREE too. In the past, we’ve observed IQ-TREE crash on certain data where the workaround was avoiding certain versions.

In this case, I’d try upgrading the version of IQ-TREE in workflow/envs/nextstrain.yaml from:

  - iqtree=2.1.2

to the latest version available with Conda:

  - iqtree=2.1.4_beta

and re-running your build to see if that helps. If the crash still occurs, you can instead try downgrading to the previous version available with Conda:

  - iqtree=2.0.3
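If you end up switching between versions a few times, the pin can also be changed with a small script instead of by hand. This is a rough sketch, not part of the ncov workflow: set_iqtree_pin is a hypothetical helper, and it assumes the environment file contains exactly one line of the form "- iqtree=<version>" as shown above.

```python
import re

# Matches a Conda dependency line such as "  - iqtree=2.1.2" and captures
# everything before the "=" so the indentation is preserved.
IQTREE_PIN = re.compile(r"^([ \t]*-[ \t]*iqtree)=\S+", re.MULTILINE)


def set_iqtree_pin(env_text, version):
    """Return env_text with the single `- iqtree=...` line pinned to `version`."""
    new_text, count = IQTREE_PIN.subn(
        lambda match: f"{match.group(1)}={version}", env_text
    )
    if count != 1:
        # Refuse to guess if the pin is absent or appears more than once.
        raise ValueError(f"expected exactly one iqtree pin, found {count}")
    return new_text
```

To use it, read workflow/envs/nextstrain.yaml into a string, call set_iqtree_pin(text, "2.0.3") (or whichever version you are testing), and write the result back. Snakemake should then rebuild the Conda environment on the next run with --use-conda.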

Let us know how it goes! You might also consider filing a bug with the IQ-TREE project.