Missing input files for rule all

seschmedes · March 30, 2021, 5:03pm

I need help trying to troubleshoot the following error:

localrules directive specifies rules that are not present in the Snakefile:
upload
download

Building DAG of jobs…
WorkflowError:
Target rules may not contain wildcards. Please specify concrete files or a rule without wildcards.
localrules directive specifies rules that are not present in the Snakefile:
upload
download

Building DAG of jobs…
MissingInputException in line 97 of /blue/bphl-florida/schmedess/data/SARS-CoV-2/A/nextstrain_builds/20210330_A_global/ncov/Snakefile:
Missing input files for rule all:
auspice/ncov_A.json
auspice/ncov_A_tip-frequencies.json

I normally run my Nextstrain builds using a profile with subsampling. For this build, however, I’m trying to include all samples in the input file (no subsampling), and I keep getting the error above. Can you help me troubleshoot?

Thanks,
Sarah Schmedes

jlhudd · March 31, 2021, 5:06pm

Hi @seschmedes, just to make sure I understand how you’ve set up your build, do you currently have a builds.yaml file that does not have a subsampling top-level key? The current workflow doesn’t allow you to skip the subsampling step, although we’ve talked about this as a feature to implement.

The best workaround to get the effect of including all sequences while subsampling is to define an empty subsampling rule. For example, in the example getting started profile, you could define the following simple build and subsampling scheme:

builds:
  global:
    subsampling_scheme: getting-started
    region: global

subsampling:
  getting-started:
    # Define one subsampling rule in the `getting-started` scheme that selects all
    # input sequences.
    all:
      # Define an empty placeholder in the `all` dictionary, so Snakemake will know
      # this is a dictionary.
      empty:

We use a similar approach in the example multiple inputs profile where two different metadata sets get merged with a column named aus added during the merging. This example build uses just the exclude key to filter out strains that are not from the Australian dataset, effectively keeping all data from that Australian dataset.
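
To illustrate why the `empty` placeholder in the config above matters, here is a minimal Python sketch. It relies on the fact that a YAML key with no value loads as a null (`None` in Python) rather than as an empty dictionary. The `rule_settings` helper is hypothetical and only stands in for any workflow code that treats a subsampling rule as a dictionary; it is not taken from the workflow itself.

```python
# What a YAML loader would typically return for the two spellings of the rule.
# A key with no value loads as None, not as an empty dictionary.
with_placeholder = {"all": {"empty": None}}  # "all:" followed by "  empty:"
without_placeholder = {"all": None}          # "all:" with nothing under it


def rule_settings(scheme, rule_name):
    """Return one subsampling rule's settings as a list of (key, value) pairs.

    Hypothetical helper: it assumes the rule is a dictionary, as code that
    reads the subsampling scheme plausibly would.
    """
    return list(scheme[rule_name].items())


print(rule_settings(with_placeholder, "all"))

try:
    rule_settings(without_placeholder, "all")
except AttributeError as error:
    # None has no .items(), so the rule cannot be read as a dictionary.
    print("not a dictionary:", error)
```

With the placeholder, the rule is a dictionary with a single unused key, so no filtering options are applied and every input sequence is kept.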

barneypotter · April 20, 2021, 10:33am

Hi @jlhudd, I have a bit of a follow-up to this (or maybe a separate issue with a similar error message).

I think I am encountering a somewhat related error, though I’ve made some major changes to the pipeline by using a totally custom subsampling scheme that has replaced the entirety of the standard workflow up to the augur tree step, so hopefully this is still a relevant question.

I have some builds that are also getting the Missing input files for rule all error. These builds are all defined in the builds.yaml file (sorry for using screenshots instead of direct blockquotes; there is a lot of less important cruft from other builds that I’m omitting, and I was having trouble formatting my indentations):

Oddly I get errors for the build titled P.1 (and for the other Pango-based builds that are collapsed), but not for the demo build at the top.

I have confirmed that my all (in Snakefile) rule’s input:

matches the output of the last rule from my workflow (in main_workflow.smk):

When I try to run things I get the input file error:

$ time snakemake --cores 35 --profile my_profiles/sars-cov-2-belgium/ -p
Building DAG of jobs…
MissingInputException in line 75 of ~/projects/sars-cov-2-belgium/Snakefile:
Missing input files for rule all:
auspice/sars-cov-2-belgium_P.1.json
auspice/sars-cov-2-belgium_P.1_tip-frequencies.json
auspice/sars-cov-2-belgium_B.1.214_tip-frequencies.json
auspice/sars-cov-2-belgium_B.1.214.json

real 0m0.397s
user 0m0.344s
sys 0m0.046s

I should also note that the builds shown in the builds.yaml file that don’t appear as errors here are the ones that already have their json files in the auspice directory, as they were completed under a previous version of the pipeline.

My only guesses at this time are that either I broke a connection between Snakefile and main_workflow.smk somehow, or that something with how I’ve formatted my builds.yaml file is wrong.

Thanks!

jlhudd · April 21, 2021, 4:53pm

Hey @barneypotter! I’m almost certain that the issue here is that the workflow doesn’t currently allow periods in build names. There isn’t a good reason for this restriction, though, so we should just update the regex for the build_name wildcard to allow these.

In the short term, you can update the regex yourself in your own copy of the workflow or you can rename your builds to replace periods with underscores or some other delimiter.
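
One way to see how a restrictive wildcard regex leads to this error is to check target paths against a constrained output pattern. The following is a minimal sketch, not the workflow's code: `ASSUMED_BUILD_NAME` is an assumed constraint that allows only letters, underscores, and hyphens, the build name `demo` is a stand-in, and `re.fullmatch` only approximates how Snakemake matches a requested file to a rule's output.

```python
import re

# ASSUMPTION: a build_name constraint that allows letters, underscores, and
# hyphens only. The workflow's actual pattern is not quoted in this thread.
ASSUMED_BUILD_NAME = r"(?:[_a-zA-Z-](?!(tip-frequencies|gisaid|zh)))+"


def output_regex(prefix, constraint=ASSUMED_BUILD_NAME):
    """Approximate an output pattern such as auspice/<prefix>_{build_name}.json."""
    return re.compile(
        "auspice/" + re.escape(prefix) + "_(?P<build_name>" + constraint + r")\.json"
    )


def can_be_produced(target, prefix):
    """Return True if the target path fits the constrained output pattern."""
    return output_regex(prefix).fullmatch(target) is not None


for target in [
    "auspice/sars-cov-2-belgium_demo.json",  # hypothetical build name
    "auspice/sars-cov-2-belgium_P.1.json",
]:
    print(target, can_be_produced(target, "sars-cov-2-belgium"))
```

When no rule's output matches a requested file and the file does not already exist on disk, Snakemake reports it as a missing input. That would also fit the observation that builds whose JSON files already exist raised no error.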

barneypotter · April 21, 2021, 5:59pm

That worked great!

It seems that the numbers were also an issue (which explains why things still broke when I tested taking out the period characters).

For anyone who runs into a similar issue I changed the line as follows:

build_name = r'(?:[_a-zA-Z0-9.-](?!(tip-frequencies|gisaid|zh)))+',
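
A quick way to check candidate build names against this pattern outside Snakemake is sketched below. `UPDATED` is the regex from the line above; `NO_DIGITS_OR_PERIODS` is a hypothetical variant with `0-9` and `.` removed from the character class, included only for comparison, and `is_valid_build_name` is an illustrative helper rather than part of the workflow.

```python
import re

# The constraint quoted above.
UPDATED = r"(?:[_a-zA-Z0-9.-](?!(tip-frequencies|gisaid|zh)))+"

# HYPOTHETICAL variant without digits or periods, for comparison only.
NO_DIGITS_OR_PERIODS = r"(?:[_a-zA-Z-](?!(tip-frequencies|gisaid|zh)))+"


def is_valid_build_name(name, constraint=UPDATED):
    """Return True if the whole name satisfies the wildcard constraint."""
    return re.fullmatch(constraint, name) is not None


for name in ["P.1", "B.1.214", "P_1", "P1", "P.1_tip-frequencies"]:
    print(
        name,
        "updated:", is_valid_build_name(name),
        "no digits/periods:", is_valid_build_name(name, NO_DIGITS_OR_PERIODS),
    )
```

The negative lookahead still rejects names in which a character is directly followed by `tip-frequencies`, `gisaid`, or `zh`, so a name like `P.1_tip-frequencies` is not accepted as a build name. Under the variant without digits, `P_1` and `P1` both fail, so renaming periods to underscores alone would not be enough for Pango-style names.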
