Problem with max-date (and min-date)

Hello. I’m running into a small problem when trying to use max-date in a special build. I’ve updated Nextstrain to the most recent version and the problem remains, so I don’t know what I’m doing wrong. My builds.yaml is set up like this:

subsampling:
  earlyG:
    focus:
      group_by: "year month"
      max_date: "2020-05-31"
      max_sequences: 1000
      exclude: "--exclude-where 'Nextstrain_clade!=20G'"
    background:
      group_by: "year month"
      max_date: "2020-05-31"
      max_sequences: 100
      exclude: "--exclude-where 'Nextstrain_clade!=20C'"
      priorities:
        type: "proximity"
        focus: "focus"

(I’m trying to look at the early cases of 20G and the genetically closest known 20C cases, in order to understand when and where the variant may have appeared.)

When I try running this build, however, I get an error: the generated filter command contains ‘2020-05-31’ but not the --max-date flag that should precede it:

Error in rule subsample:
    jobid: 23
    output: results/earlyG/sample-focus.fasta, results/earlyG/sample-focus.txt
    log: logs/subsample_earlyG_focus.txt (check log file(s) for error message)
    shell:
        
        augur filter             --sequences results/filtered_gisaid.fasta             --metadata data/metadata_gisaid.tsv             --sequence-index results/combined_sequence_index.tsv             --include defaults/include.txt             --exclude defaults/exclude.txt                          2020-05-31             --exclude-where 'Nextstrain_clade!=20G'                                                                 --group-by year month                          --subsample-max-sequences 1000                          --output results/earlyG/sample-focus.fasta             --output-strains results/earlyG/sample-focus.txt 2>&1 | tee logs/subsample_earlyG_focus.txt
        
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Logfile logs/subsample_earlyG_focus.txt:
ERROR: Could not open file of excluded strains '['defaults/exclude.txt', '2020-05-31']'

Because --max-date is missing from the command, augur treats the date as a second argument to --exclude, i.e. as the name of another exclude file.
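The swallowing behavior can be reproduced with a small stand-alone parser. This is only a sketch: the parser below is a toy, not augur's real one, and it assumes that --exclude accepts one or more values, which the list in the error message suggests.

```python
import argparse

# Toy parser for illustration only (not augur's actual argument parser).
# Assumption: --exclude takes one or more values, as the error message implies.
parser = argparse.ArgumentParser(prog="toy-filter")
parser.add_argument("--exclude", nargs="+", default=[])
parser.add_argument("--max-date")
parser.add_argument("--exclude-where", nargs="+", default=[])

# Arguments as generated when the config value is just "2020-05-31".
broken = parser.parse_args([
    "--exclude", "defaults/exclude.txt", "2020-05-31",
    "--exclude-where", "Nextstrain_clade!=20G",
])

# Arguments as they would look with the --max-date flag present.
fixed = parser.parse_args([
    "--exclude", "defaults/exclude.txt", "--max-date", "2020-05-31",
    "--exclude-where", "Nextstrain_clade!=20G",
])

print(broken.exclude, broken.max_date)  # the bare date ends up in the exclude list
print(fixed.exclude, fixed.max_date)    # the date is parsed as --max-date
```

With the flag missing, the bare date has nowhere else to go, so it is collected by the preceding multi-value option.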

What am I doing wrong? I’ve tried both max_date and max-date in the builds.yaml. Neither works. I have the same problem with min-date.

Hi @mbosmeny – the flag itself needs to be included explicitly in the value, like so:

subsampling:
  earlyG:
    focus:
      group_by: "year month"
      max_date: "--max-date 2020-05-31"
      max_sequences: 1000
      exclude: "--exclude-where 'Nextstrain_clade!=20G'"
    background:
      group_by: "year month"
      max_date: "--max-date 2020-05-31"
      max_sequences: 100
      exclude: "--exclude-where 'Nextstrain_clade!=20C'"
      priorities:
        type: "proximity"
        focus: "focus"

(It’s not well documented which options need to include the argument in the supplied value and which ones don’t; we’ll try to write comprehensive docs for the builds YAML soon!)
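As a rough illustration of why some values need the flag and others don't, here is a minimal sketch of a command template that pastes certain config values verbatim and adds the flag itself for others. The function name render_filter_command and the template are hypothetical; the workflow's real Snakemake rule is not shown in this thread, and only the argument order is taken from the logged command above.

```python
def render_filter_command(scheme: dict) -> str:
    """Build a filter command string from one subsampling scheme (hypothetical helper)."""
    parts = [
        "augur filter",
        "--exclude defaults/exclude.txt",
        scheme.get("max_date", ""),   # pasted verbatim: value must carry its own flag
        scheme.get("exclude", ""),    # pasted verbatim: value must carry its own flag
        f"--group-by {scheme['group_by']}",                      # flag added by the template
        f"--subsample-max-sequences {scheme['max_sequences']}",  # flag added by the template
    ]
    # Drop empty pieces so unset options leave no stray whitespace.
    return " ".join(p for p in parts if p)


base = {
    "group_by": "year month",
    "max_sequences": 1000,
    "exclude": "--exclude-where 'Nextstrain_clade!=20G'",
}

broken = render_filter_command({**base, "max_date": "2020-05-31"})
fixed = render_filter_command({**base, "max_date": "--max-date 2020-05-31"})

print(broken)
print(fixed)
```

In this sketch, group_by and max_sequences work with bare values because the template supplies their flags, while max_date and exclude only work when the flag is written into the value.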

I see. Thanks for the quick response!