Troubleshooting parallelism in Snakemake

I want to build two datasets with my ruleset (a minimally modified version of the ncov workflow) from the same Snakemake profile and take advantage of parallelism, so I tried the following:

  1. Defined the second build in the profile’s builds.yaml.
  2. Set my cluster job (self-managed AWS Batch, i.e. not Nextstrain CLI’s AWS Batch mode) to 8 vCPUs.
  3. Set cores: 8 and set-threads: tree=4 in the profile’s config.yaml.
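Concretely, the relevant bits of the profile look roughly like this (a sketch; the build names are placeholders and the build definitions are elided):

```yaml
# profiles/my-profile/config.yaml (sketch)
cores: 8
set-threads: tree=4

# profiles/my-profile/builds.yaml (sketch; build names are placeholders)
builds:
  build-one:
    # ...
  build-two:
    # ...
```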

My understanding is that step 3 should allow two instances of the tree rule to run in parallel, or one instance of tree in parallel with one instance of refine, or two instances of refine in parallel. But instead I see:

  • Multiple instances of subsample do run in parallel;
  • Only one instance of tree or refine runs at a time.
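My expectation comes from simple thread accounting, which I think of roughly like this (a sketch; tree is pinned to 4 threads via set-threads, but the refine and subsample thread counts here are my assumptions, and Snakemake’s actual scheduler does more than sum threads):

```python
# Rough model of Snakemake's core budget with `cores: 8`.
# `tree` is pinned via set-threads; the other counts are assumptions.
CORES = 8
THREADS = {"tree": 4, "refine": 4, "subsample": 1}

def fits(jobs, cores=CORES):
    """Can these jobs run concurrently within the core budget?"""
    return sum(THREADS[job] for job in jobs) <= cores

print(fits(["tree", "tree"]))      # two tree instances fit in 8 cores
print(fits(["tree", "refine"]))    # tree + refine fit too
print(fits(["subsample"] * 8))     # many 1-thread subsamples fit
```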

I read up on Snakemake’s mem_mb resource, looked at the defaults set for it in the ncov workflow’s main_workflow.smk, and wondered whether that was limiting my parallelism. But even after moving to a container with 32 GiB of memory and calling Snakemake with --resources mem_mb=30720, there was still no parallelism. Worse, parallelism went away for the subsample jobs too.
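The way I understand it, passing --resources mem_mb=30720 just adds a second budget alongside the core count, something like this sketch (the per-job numbers are the tree-rule memory estimates worked out below):

```python
# Sketch: jobs must fit within BOTH budgets to run concurrently.
def fits_budgets(jobs, cores=8, mem_mb=30720):
    """jobs is a list of (threads, mem_mb) pairs."""
    return (sum(t for t, _ in jobs) <= cores and
            sum(m for _, m in jobs) <= mem_mb)

# Two tree jobs at 4 threads and ~5.8/5.0 GB each sit well within both
# budgets, so memory alone shouldn't be what blocks the parallelism.
print(fits_budgets([(4, 5800), (4, 5000)]))
```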

The main_workflow.smk file (at ncov commit d7dc587) defines the memory usage of the tree rule like this:

mem_mb=lambda wildcards, input: 40 * int(input.size / 1024 / 1024)

…and the filtered.fasta files used as input to those steps are 145.3 MB and 125.1 MB, so the rule requests roughly 5,800 MB and 5,000 MB respectively; a 16 GiB EC2 instance should be more than enough to run them in parallel! Questions:

  • Am I missing some other lever that needs to be pulled here?
  • How do I troubleshoot Snakemake’s parallel scheduling decisions? I looked through the documentation for the command-line options and nothing jumped out at me.
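(For reference, here’s the memory arithmetic from above as runnable code. I’m treating the reported “MB” file sizes as MiB, which matches the lambda’s division by 1024², but that’s an assumption about how I measured the files.)

```python
# The ncov tree rule's memory estimate, mirrored as a plain function.
def tree_mem_mb(input_size_bytes):
    return 40 * int(input_size_bytes / 1024 / 1024)

for size_mib in (145.3, 125.1):                 # filtered.fasta sizes
    print(tree_mem_mb(size_mib * 1024 * 1024))  # prints 5800, then 5000
```

Note the int() truncation: the requests come out at exactly 5,800 and 5,000 MB, slightly below the naive 40 × 145.3 = 5,812.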