I have a large set of sequences to run in a number of builds. Is there a way to pass the index that I already created with augur index within the build? Was expecting to do something like this:
inputs:
  - name: "worldwide"
    metadata: data/worldwide/metadata.tsv
    sequences: data/worldwide/sequences.fasta
    index: data/worldwide/sequences.fasta.index
Apologies if I missed this in the documentation somewhere…
rneher
December 20, 2021, 8:40pm
You didn't miss anything. This is currently not possible without reaching into the workflow.
         - max-date: {params.max_date}
         - {params.exclude_ambiguous_dates_argument}
         - exclude: {params.exclude_argument}
         - include: {params.include_argument}
         - query: {params.query_argument}
         - priority: {params.priority_argument}
        """
    input:
        sequences = _get_unified_alignment,
        metadata = _get_unified_metadata,
        sequence_index = rules.index_sequences.output.sequence_index,
        include = config["files"]["include"],
        priorities = get_priorities,
        exclude = config["files"]["exclude"]
    output:
        sequences = "results/{build_name}/sample-{subsample}.fasta",
        strains="results/{build_name}/sample-{subsample}.txt",
    log:
        "logs/subsample_{build_name}_{subsample}.txt"
    benchmark:
        "benchmarks/subsample_{build_name}_{subsample}.txt"

        for subsample in subsampling_settings
    ]

rule combine_samples:
    message:
        """
        Combine and deduplicate FASTAs
        """
    input:
        sequences=_get_unified_alignment,
        sequence_index=rules.index_sequences.output.sequence_index,
        metadata=_get_unified_metadata,
        include=_get_subsampled_files,
    output:
        sequences = "results/{build_name}/{build_name}_subsampled_sequences.fasta.xz",
        metadata = "results/{build_name}/{build_name}_subsampled_metadata.tsv.xz"
    log:
        "logs/subsample_regions_{build_name}.txt"
    benchmark:
        "benchmarks/subsample_regions_{build_name}.txt"
    conda: config["conda_environment"]
We know that filter and index are currently a big pain and are working on ways to circumvent this.
Thanks @rneher - good to know!