I need help trying to troubleshoot the following error:

    MissingInputException in line 97 of /blue/bphl-florida/schmedess/data/SARS-CoV-2/A/nextstrain_builds/20210330_A_global/ncov/Snakefile:

I normally run my Nextstrain builds using a profile with subsampling. However, for this build I'm trying to make one with all samples in the input file (no subsampling), and I keep getting the error above. Can you help me troubleshoot?

Sarah Schmedes

Hi @seschmedes, just to make sure I understand how you've set up your build: do you currently have a builds.yaml file that does not have a subsampling top-level key? The current workflow doesn't allow you to skip the subsampling step, although we've talked about implementing this as a feature.

The best workaround for including all sequences while still going through the subsampling step is to define an empty subsampling rule. For example, in the example getting-started profile, you could define the following simple build and subsampling scheme:

    subsampling_scheme: getting-started
    region: global

    # Define one subsampling rule in the `getting-started` scheme that selects all
    # input sequences.
      # Define an empty placeholder in the `all` dictionary, so Snakemake will know
      # this is a dictionary.
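The placeholder matters because a YAML key with nothing under it (only comments) is parsed as null, not as an empty mapping. As a rough illustration, here is a minimal Python sketch of the kind of check a workflow might run on the parsed config. The `check_subsampling` helper, the build name `global`, and the `placeholder` key are all hypothetical; this is not code from the ncov workflow.

```python
def check_subsampling(config):
    """Return a list of problems in the builds/subsampling sections.

    `config` is assumed to be an already-parsed builds.yaml as nested dicts.
    """
    problems = []
    schemes = config.get("subsampling", {})
    for build_name, build in config.get("builds", {}).items():
        scheme_name = build.get("subsampling_scheme")
        if scheme_name not in schemes:
            problems.append(f"{build_name}: unknown scheme {scheme_name!r}")
            continue
        for rule_name, rule in schemes[scheme_name].items():
            # A YAML key with no value parses to None rather than {}.
            if not isinstance(rule, dict):
                problems.append(f"{build_name}: rule {rule_name!r} is not a dictionary")
    return problems


# `all:` with only comments under it, as a YAML parser would return it.
broken = {
    "builds": {"global": {"subsampling_scheme": "getting-started", "region": "global"}},
    "subsampling": {"getting-started": {"all": None}},
}
# The same rule with a hypothetical placeholder entry, so it parses as a dict.
fixed = {
    "builds": {"global": {"subsampling_scheme": "getting-started", "region": "global"}},
    "subsampling": {"getting-started": {"all": {"placeholder": ""}}},
}

print(check_subsampling(broken))  # ["global: rule 'all' is not a dictionary"]
print(check_subsampling(fixed))   # []
```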

We use a similar approach in the example multiple inputs profile, where two metadata sets are merged and a column named aus is added during the merge. That example build uses only the exclude key to filter out strains that are not from the Australian dataset, effectively keeping all of the Australian data.
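The idea behind that profile can be sketched in plain Python: merge two metadata tables, record which input each strain came from in a per-input column such as `aus`, then exclude every row not flagged as Australian. This is only an illustration of the logic, assuming the merge marks membership with `yes`/`no` values; the helper names and strain names are made up, and in the real workflow the merge and exclusion are done by the workflow's own rules, not by this code.

```python
def merge_metadata(inputs):
    """Merge metadata rows from several named inputs.

    `inputs` maps an input name (e.g. "aus") to a list of row dicts. Each
    merged row gets one column per input: "yes" if the strain appeared in
    that input, "no" otherwise.
    """
    merged = {}
    for name, rows in inputs.items():
        for row in rows:
            record = merged.setdefault(row["strain"], {})
            record.update(row)
            record[name] = "yes"
    # Mark every input a strain did not appear in as "no".
    for record in merged.values():
        for name in inputs:
            record.setdefault(name, "no")
    return list(merged.values())


def exclude_where_not_equal(rows, column, value):
    """Drop rows whose `column` is not `value`, like an aus!=yes exclusion."""
    return [row for row in rows if row.get(column) == value]


# Made-up strains: AUS/1 appears in both inputs, USA/1 only in "global".
inputs = {
    "aus": [{"strain": "AUS/1"}, {"strain": "AUS/2"}],
    "global": [{"strain": "USA/1"}, {"strain": "AUS/1"}],
}
merged = merge_metadata(inputs)
kept = exclude_where_not_equal(merged, "aus", "yes")
print([row["strain"] for row in kept])  # ['AUS/1', 'AUS/2']
```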

Hi @jlhudd, I have a bit of a follow-up to this (or maybe a separate issue with a similar error message).

I think I'm encountering a somewhat related error. I've made some major changes to the pipeline, using a totally custom subsampling scheme that replaces the entire standard workflow up to the augur tree step, so hopefully this is still a relevant question.

I have some builds that are also hitting the "Missing input files for rule all" error. These builds are all defined in the builds.yaml file (sorry for using screenshots instead of direct blockquotes; there is a lot of less important cruft from other builds that I'm omitting, and I was having trouble formatting my indentation :sweat_smile:):

Oddly, I get errors for the build titled P.1 (and for the other Pango-based builds that are collapsed), but not for the demo build at the top.

I have confirmed that the input of my all rule (in Snakefile):

matches the output of the last rule from my workflow (in main_workflow.smk):

When I try to run the workflow, I get the missing input error:

    $ time snakemake --cores 35 --profile my_profiles/sars-cov-2-belgium/ -p
    Building DAG of jobs…
    MissingInputException in line 75 of ~/projects/sars-cov-2-belgium/Snakefile:
    Missing input files for rule all:

    real 0m0.397s
    user 0m0.344s
    sys 0m0.046s

I should also note that the builds in the builds.yaml file that don't appear as errors here are the ones that already have their JSON files in the auspice directory, because they were completed under a previous version of the pipeline.

My only guesses at this point are that I somehow broke the connection between Snakefile and main_workflow.smk, or that something is wrong with how I've formatted my builds.yaml file.


Hey @barneypotter! I’m almost certain that the issue here is that the workflow doesn’t currently allow periods in build names. There isn’t a good reason for this restriction, though, so we should just update the regex for the build_name wildcard to allow these.

In the short term, you can update the regex yourself in your own copy of the workflow or you can rename your builds to replace periods with underscores or some other delimiter.
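To see why a disallowed character surfaces as a MissingInputException for rule all, here is a simplified Python model of target resolution: each requested file must either match some rule's output pattern, with the wildcard restricted by its constraint, or already exist on disk. This is a sketch, not Snakemake's implementation. The output pattern `auspice/ncov_{build_name}.json` is an assumption, and the "old" constraint (letters, underscores, and hyphens only) is inferred from the modified line posted later in this thread.

```python
import re

# Assumed pre-fix constraint: letters, underscores, and hyphens (no digits or periods).
OLD_BUILD_NAME = r"(?:[_a-zA-Z-](?!(tip-frequencies|gisaid|zh)))+"


class MissingInput(Exception):
    """Stand-in for Snakemake's MissingInputException."""


def output_regex(build_name_constraint):
    """Compile a hypothetical output pattern auspice/ncov_{build_name}.json."""
    return re.compile(
        r"auspice/ncov_(?P<build_name>" + build_name_constraint + r")\.json"
    )


def resolve(target, patterns, existing_files):
    """Decide how a target requested by rule all could be satisfied."""
    # Check whether any rule's output pattern matches, wildcard included.
    for pattern in patterns:
        match = pattern.fullmatch(target)
        if match:
            return f"run rule with build_name={match['build_name']!r}"
    # Otherwise the file has to exist already.
    if target in existing_files:
        return "already exists"
    raise MissingInput(f"Missing input files for rule all: {target}")


patterns = [output_regex(OLD_BUILD_NAME)]
# Hypothetical JSON left over from a run under an earlier pipeline version.
existing = {"auspice/ncov_B.1.1.7.json"}

for target in [
    "auspice/ncov_demo.json",
    "auspice/ncov_B.1.1.7.json",
    "auspice/ncov_P.1.json",
    "auspice/ncov_P_1.json",
]:
    try:
        print(target, "->", resolve(target, patterns, existing))
    except MissingInput as err:
        print(target, "->", err)
```

Under these assumptions, `demo` resolves to a rule, the pre-existing `B.1.1.7` JSON is accepted as-is, and both `P.1` and `P_1` fail, the second because the assumed constraint has no digits.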

That worked great!

It seems that the numbers were also an issue (which explains why things still broke when I tested removing the periods).

For anyone who runs into a similar issue, I changed the line as follows:

    build_name = r'(?:[_a-zA-Z0-9.-](?!(tip-frequencies|gisaid|zh)))+',
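As a quick sanity check of the updated constraint, the sketch below tests it against a few Pango-style build names with Python's `re.fullmatch`, then embeds it in output patterns. Two simplifications are assumed: the constraint is tested as a standalone full match (Snakemake actually embeds it in each rule's output pattern), and the `auspice/ncov_{build_name}.json` and `..._tip-frequencies.json` filenames are illustrative. The negative lookahead appears intended to keep a build name from absorbing reserved suffixes such as `_tip-frequencies`.

```python
import re

# Updated build_name constraint, copied from the line above.
BUILD_NAME = r"(?:[_a-zA-Z0-9.-](?!(tip-frequencies|gisaid|zh)))+"


def is_valid_build_name(name):
    """True if the entire name satisfies the build_name constraint."""
    return re.fullmatch(BUILD_NAME, name) is not None


for name in ["demo", "P.1", "B.1.1.7", "B.1.351", "P_1", "global_zh"]:
    print(name, is_valid_build_name(name))

# Hypothetical output pattern for the main Auspice JSON.
MAIN_JSON = re.compile(r"auspice/ncov_(?P<build_name>" + BUILD_NAME + r")\.json")
# The same pattern with a plain character class and no lookahead, for comparison.
NO_LOOKAHEAD = re.compile(r"auspice/ncov_(?P<build_name>[_a-zA-Z0-9.-]+)\.json")

tip_file = "auspice/ncov_P.1_tip-frequencies.json"
print(MAIN_JSON.fullmatch("auspice/ncov_P.1.json")["build_name"])  # P.1
print(MAIN_JSON.fullmatch(tip_file))                               # None
print(NO_LOOKAHEAD.fullmatch(tip_file)["build_name"])              # P.1_tip-frequencies
```

Because the lookahead is checked after every character, any character directly followed by `tip-frequencies`, `gisaid`, or `zh` ends the match, which is why `global_zh` is rejected and why adding `.` and digits to the character class does not let the main-JSON pattern claim the tip-frequencies file.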

