Is it possible to supply multiple inputs as a list of strain names?

dlu · July 12, 2021, 9:44pm

The example here that uses multiple inputs (Running an analysis starting from multiple inputs | Tutorial: Using Nextstrain for SARS-CoV-2) is so useful for filtering and coloring!

Is there an easy way to supply each of the separate inputs as a list of strain names, if all strains are already included in the GISAID metadata and FASTA? Or are the only options to subset the metadata and FASTA before running Nextstrain, or to write another rule in the Snakemake file?

Thanks!

james · July 14, 2021, 9:18pm

Hi @dlu - you can supply a list of strains to force-include in the analysis by adding them to a text file and specifying this in the builds YAML like so:

files:
  include: "defaults/include.txt" # this is the default include file

If you wish to restrict the analysis to only these sequences, I think you can specify a dummy subsampling definition in the builds.yaml like so. It appears to select very few sequences, but the build will in fact include all of the force-included sequences.

builds:
  your_build_name:
    subsampling_scheme: restrictive
subsampling:
  restrictive:
    main:
      group_by: "year"
      seq_per_group: 100

Note: make sure the strain you are using to root the tree is listed in the include.txt file. By default this is "Wuhan/Hu-1/2019".
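
For anyone generating the include file from a script, here is a minimal Python sketch that writes one strain name per line and appends the root strain if it is missing. The function names (`build_include_lines`, `write_include_file`) are hypothetical helpers, not part of Nextstrain, and the default root is simply the one named in the note above.

```python
from pathlib import Path

# Root strain named in the note above; change it if your build roots on another strain.
DEFAULT_ROOT = "Wuhan/Hu-1/2019"


def build_include_lines(strains, root=DEFAULT_ROOT):
    """Return de-duplicated strain names in input order, with the root strain present."""
    seen = set()
    lines = []
    # Appending the root last means it is only added if it was not already listed.
    for name in list(strains) + [root]:
        name = name.strip()
        if name and name not in seen:
            seen.add(name)
            lines.append(name)
    return lines


def write_include_file(strains, path, root=DEFAULT_ROOT):
    """Write the strain names to `path`, one per line, and return them."""
    lines = build_include_lines(strains, root)
    Path(path).write_text("\n".join(lines) + "\n")
    return lines
```

Calling `write_include_file(my_strains, "my_profiles/include.txt")` (the path is only an illustration) and then pointing `files: include:` at that file should give the setup described above.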

dlu · July 14, 2021, 10:06pm

That’s good to know, thanks James!! include.txt is really very useful.

Adding a note for others: the strains listed in include.txt must also be present in the metadata and FASTA supplied in the input section for it to work.
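
A quick way to check this before running the build is sketched below in Python. It rests on a few assumptions that may not match your files: the metadata is an uncompressed TSV with a `strain` column, each FASTA header is exactly the strain name, and the include file has one name per line with `#` starting a comment. The function names are hypothetical and not part of Nextstrain.

```python
import csv


def read_include(path):
    """Strain names from an include file; blank lines and # comments are skipped."""
    names = []
    with open(path) as handle:
        for line in handle:
            name = line.split("#", 1)[0].strip()
            if name:
                names.append(name)
    return names


def metadata_strains(path, column="strain"):
    """Set of strain names in a tab-separated metadata file (assumed column name)."""
    with open(path, newline="") as handle:
        return {row[column] for row in csv.DictReader(handle, delimiter="\t")}


def fasta_strains(path):
    """Set of sequence names, taken as the full header line after '>'."""
    with open(path) as handle:
        return {line[1:].strip() for line in handle if line.startswith(">")}


def missing_strains(include_path, metadata_path, fasta_path):
    """Report which included strains are absent from the metadata or the FASTA."""
    wanted = read_include(include_path)
    in_metadata = metadata_strains(metadata_path)
    in_fasta = fasta_strains(fasta_path)
    return {
        "metadata": [s for s in wanted if s not in in_metadata],
        "fasta": [s for s in wanted if s not in in_fasta],
    }
```

If both lists in the result are empty, every strain in include.txt was found in both inputs. Compressed inputs would need to be opened with `gzip` or `lzma` instead of `open`.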
