How to pass a sequence index into a build?

dbridges · December 20, 2021, 11:56am

I have a large set of sequences to run in a number of builds. Is there a way to pass the index that I already created with augur index into the build? I was expecting to do something like this:

    inputs:
      - name: "worldwide"
        metadata: data/worldwide/metadata.tsv
        sequences: data/worldwide/sequences.fasta
        index: data/worldwide/sequences.fasta.index

Apologies if I missed this in the documentation somewhere…

rneher · December 20, 2021, 8:40pm

You didn't miss anything. This is currently not possible without reaching into the workflow.

https://github.com/nextstrain/ncov/blob/master/workflow/snakemake_rules/main_workflow.smk#L365

               - max-date: {params.max_date}
               - {params.exclude_ambiguous_dates_argument}
               - exclude: {params.exclude_argument}
               - include: {params.include_argument}
               - query: {params.query_argument}
               - priority: {params.priority_argument}
              """
          input:
              sequences = _get_unified_alignment,
              metadata = _get_unified_metadata,
              sequence_index = rules.index_sequences.output.sequence_index,
              include = config["files"]["include"],
              priorities = get_priorities,
              exclude = config["files"]["exclude"]
          output:
              sequences = "results/{build_name}/sample-{subsample}.fasta",
              strains="results/{build_name}/sample-{subsample}.txt",
          log:
              "logs/subsample_{build_name}_{subsample}.txt"
          benchmark:
              "benchmarks/subsample_{build_name}_{subsample}.txt"

https://github.com/nextstrain/ncov/blob/master/workflow/snakemake_rules/main_workflow.smk#L487

                for subsample in subsampling_settings
            ]


    rule combine_samples:
        message:
            """
            Combine and deduplicate FASTAs
            """
        input:
            sequences=_get_unified_alignment,
            sequence_index=rules.index_sequences.output.sequence_index,
            metadata=_get_unified_metadata,
            include=_get_subsampled_files,
        output:
            sequences = "results/{build_name}/{build_name}_subsampled_sequences.fasta.xz",
            metadata = "results/{build_name}/{build_name}_subsampled_metadata.tsv.xz"
        log:
            "logs/subsample_regions_{build_name}.txt"
        benchmark:
            "benchmarks/subsample_regions_{build_name}.txt"
        conda: config["conda_environment"]

We know that filter and index are currently a big pain and are working on ways to circumvent this.
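
For anyone who does want to reach into the workflow: both rules above take their index straight from rules.index_sequences.output.sequence_index, so an index: key under inputs is never read. One way to change that would be to point sequence_index at an input function instead. The sketch below is plain Python and is not part of the ncov workflow. The function name resolve_sequence_index, the index key, and the DEFAULT_INDEX path are all hypothetical and only illustrate the fallback logic.

```python
from pathlib import Path

# Hypothetical placeholder. In the real workflow this path comes from
# rules.index_sequences.output.sequence_index.
DEFAULT_INDEX = "results/sequence_index.tsv"


def resolve_sequence_index(inputs, default_index=DEFAULT_INDEX):
    """Return a user-supplied index path, or fall back to the workflow's own.

    `inputs` is the list of dicts under the `inputs:` config key. The
    `index` key is an assumption taken from the question above; the
    workflow does not currently recognise it.
    """
    candidates = [entry["index"] for entry in inputs if "index" in entry]

    # Only trust a prebuilt index when there is a single input. The rules
    # above read sequences via _get_unified_alignment, which suggests that
    # multiple inputs are combined, and a per-input index would not
    # describe the combined file.
    if len(inputs) == 1 and len(candidates) == 1:
        if Path(candidates[0]).is_file():
            return candidates[0]

    # Otherwise let the workflow build the index as it does today.
    return default_index
```

Called with the config from the first post, this would return data/worldwide/sequences.fasta.index when that file exists and the default path otherwise. Wiring it into the Snakemake rules is left out here because it depends on workflow internals not shown in the excerpts.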

dbridges · December 21, 2021, 6:20am

Thanks @rneher - good to know!
