I have a build that I want to restrict to a specific range of dates. The relevant parts of my config are:
builds:
  # Wave one focused build
  waveone:
    subsampling_scheme: waves-scheme # use a custom subsampling scheme defined below
    country: Zambia
    min_date: 2020-05-13
    max_date: 2020-10-05

filter:
  zambia: # when deprecated - remove this line to nest the below to filter:
    min_length: 5000 # Allow shorter genomes. Parameter used to filter alignment.
    skip_diagnostics: True # skip diagnostics (which can remove genomes) for this input

# STAGE 2: Subsampling parameters
subsampling:
  waves-scheme:
    # filter each dataset for each build
    allFromzambia:
      #exclude: "--exclude-where 'country!={country}'"
      min_date: "--min-date {min_date}"
      max_date: "--max-date {max_date}"
    allFromworldwide:
      exclude: "--exclude-where 'country={country}'"
      min_date: "--min-date {min_date}"
      max_date: "--max-date {max_date}"
    worldwideglobalBackground:
      exclude: "--exclude-where 'country={country}'"
      group_by: year month
      seq_per_group: 5

The JSON output file from this build contains entries throughout 2020 and 2021. Looking closer, there are fewer than 500 Zambia sequences within the date range, but the augur filter output:
--output results/waveone/sample-allFromzambia.fasta
has almost 700 sequences. Clearly I am doing something wrong or misunderstanding something, but I am unsure where to start. Any help is greatly appreciated.
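
A minimal sketch of one way the counts above could be cross-checked. It assumes the metadata is a tab-separated file with `strain`, `date`, and `country` columns; those column names and the helper names `fasta_ids`, `in_range`, and `summarize` are assumptions for illustration, not part of the workflow. It uses only the Python standard library and treats incomplete dates (for example `2020-05-XX`) as out of range, which may differ from how augur handles them.

```python
import csv
from datetime import date, datetime


def fasta_ids(handle):
    """Return the sequence names taken from the FASTA header lines."""
    return [line[1:].strip() for line in handle if line.startswith(">")]


def in_range(value, start, end):
    """True if a complete YYYY-MM-DD date string falls within [start, end]."""
    try:
        parsed = datetime.strptime(value, "%Y-%m-%d").date()
    except ValueError:
        return False  # incomplete or missing dates are counted as out of range
    return start <= parsed <= end


def summarize(fasta_handle, metadata_handle, country, start, end):
    """Classify every sequence in the FASTA by country and date range."""
    # Assumed metadata layout: tab-separated with strain, date, country columns.
    rows = {row["strain"]: row
            for row in csv.DictReader(metadata_handle, delimiter="\t")}
    counts = {"total": 0, "country_in_range": 0, "country_out_of_range": 0,
              "other_country": 0, "no_metadata": 0}
    for name in fasta_ids(fasta_handle):
        counts["total"] += 1
        row = rows.get(name)
        if row is None:
            counts["no_metadata"] += 1
        elif row["country"] != country:
            counts["other_country"] += 1
        elif in_range(row["date"], start, end):
            counts["country_in_range"] += 1
        else:
            counts["country_out_of_range"] += 1
    return counts
```

Calling `summarize` with open handles for results/waveone/sample-allFromzambia.fasta and the metadata file, `"Zambia"`, `date(2020, 5, 13)`, and `date(2020, 10, 5)` would show whether the extra sequences come from other countries or from outside the date range.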
Thanks in advance!