Missing input files for rule all

I need help trying to troubleshoot the following error:

localrules directive specifies rules that are not present in the Snakefile:
upload
download

Building DAG of jobs…
WorkflowError:
Target rules may not contain wildcards. Please specify concrete files or a rule without wildcards.
localrules directive specifies rules that are not present in the Snakefile:
upload
download

Building DAG of jobs…
MissingInputException in line 97 of /blue/bphl-florida/schmedess/data/SARS-CoV-2/A/nextstrain_builds/20210330_A_global/ncov/Snakefile:
Missing input files for rule all:
auspice/ncov_A.json
auspice/ncov_A_tip-frequencies.json

I normally run my nextstrain builds using a profile with subsampling. However, for this build I’m just trying to build with all samples in the input file (no subsampling), and I keep getting the error above. Can you help me troubleshoot?

Thanks,
Sarah Schmedes

Hi @seschmedes, just to make sure I understand how you’ve set up your build, do you currently have a builds.yaml file that does not have a subsampling top-level key? The current workflow doesn’t allow you to skip the subsampling step, although we’ve talked about this as a feature to implement.

The best workaround to get the effect of including all sequences while still running the subsampling step is to define an empty subsampling rule. For example, in the example getting started profile, you could define the following simple build and subsampling scheme:

builds:
  global:
    subsampling_scheme: getting-started
    region: global

subsampling:
  getting-started:
    # Define one subsampling rule in the `getting-started` scheme that selects all
    # input sequences.
    all:
      # Define an empty placeholder in the `all` dictionary, so Snakemake will know
      # this is a dictionary.
      empty:

We use a similar approach in the example multiple inputs profile, where two different metadata sets get merged and a column named aus is added during the merging. This example build uses just the exclude key to filter out strains that are not from the Australian dataset, effectively keeping all data from that Australian dataset.
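As a rough sketch of what that exclude-only rule could look like (the scheme name `keep-australia`, the rule name `all-from-aus`, and the exact filter string are illustrative assumptions here, so check the profile's own builds.yaml for the real values):

```yaml
subsampling:
  # Hypothetical scheme name; a build would point to it via `subsampling_scheme`.
  keep-australia:
    # Hypothetical rule name.
    all-from-aus:
      # Assumed filter value: drop every strain whose `aus` column is not "yes",
      # which keeps everything from the Australian dataset.
      exclude: "--exclude-where 'aus!=yes'"
```

Because the rule sets no group_by or sequence limit, nothing else thins out the strains that pass the exclude filter.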