Is it possible to supply multiple inputs as a list of strain names?

The tutorial has an example that uses multiple inputs: Running an analysis starting from multiple inputs | Tutorial: Using Nextstrain for SARS-CoV-2
It is so useful for filtering and coloring!

Is there an easy way to supply each of the separate inputs as a list of strain names, if all the strains are already included in the GISAID metadata and FASTA? Or are the only options to subset the metadata and FASTA before running Nextstrain, or to write another rule in the Snakemake file?


Hi @dlu - you can supply a list of strains to force-include in the analysis by adding them to a text file and specifying this in the builds YAML like so:

  include: "defaults/include.txt" # this is the default include file

If you wish to restrict the analysis to only these sequences, I think you can specify a dummy subsampling definition (in the builds.yaml) like so, which appears to select very few sequences but will in fact include all of the force-included sequences.

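    subsampling:
      restrictive:          # scheme name; a build selects it with subsampling_scheme: restrictive
        dummy:              # sample name (a placeholder, any name should do)
          group_by: "year"
          seq_per_group: 100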

Note: make sure to include the strain you are using to root the tree in the include TXT file. By default this is "Wuhan/Hu-1/2019".

That’s good to know, thanks James!! include.txt is really very useful.

Adding a note for others that the strains listed in include.txt must also be present in the metadata and fasta supplied in the input section for it to work.
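
In case it helps anyone, here is a minimal sketch of a pre-flight check for that last point. It is not part of the ncov workflow, just a standalone helper, and it rests on a few assumptions: the include file has one strain name per line (blank lines and anything after a `#` are ignored), the metadata is a TSV with a `strain` column, and each FASTA header is exactly the strain name. The function names are made up for illustration.

```python
import csv
from pathlib import Path


def read_include(path):
    """Return strain names from an include file.

    Assumes one name per line; blank lines and text after '#' are skipped.
    """
    strains = []
    for line in Path(path).read_text().splitlines():
        name = line.split("#", 1)[0].strip()
        if name:
            strains.append(name)
    return strains


def metadata_strains(path, column="strain"):
    """Return the set of strain names in a tab-separated metadata file."""
    with open(path, newline="") as handle:
        return {row[column] for row in csv.DictReader(handle, delimiter="\t")}


def fasta_strains(path):
    """Return the set of FASTA record names (header lines without the '>')."""
    with open(path) as handle:
        return {line[1:].strip() for line in handle if line.startswith(">")}


def missing_strains(include_path, metadata_path, fasta_path):
    """List include-file strains that are absent from the metadata or the FASTA."""
    wanted = read_include(include_path)
    in_metadata = metadata_strains(metadata_path)
    in_fasta = fasta_strains(fasta_path)
    return {
        "metadata": [s for s in wanted if s not in in_metadata],
        "fasta": [s for s in wanted if s not in in_fasta],
    }
```

Calling `missing_strains(...)` with the paths to your include file, metadata TSV and FASTA returns two lists; if both are empty, every force-included strain (including the root, if you listed it) is present in the inputs. If your FASTA headers carry extra fields after the strain name, `fasta_strains` would need adjusting.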