Is there an easy way to supply each of the separate inputs as a list of strain names, if all strains are already included in the GISAID metadata and FASTA? Or are the only options to subset the metadata and FASTA before running Nextstrain, or to write another rule in the Snakemake file?
Hi @dlu - you can supply a list of strains to force-include in the analysis by adding them to a text file and specifying this in the builds YAML like so:
files:
  include: "defaults/include.txt"  # this is the default include file
If you wish to restrict the analysis to only these sequences, I think you can specify a dummy subsampling definition in the builds YAML. Such a definition appears to select very few sequences, but the build will in fact include all of the force-included sequences.
That’s good to know, thanks James!! include.txt is really very useful.
Adding a note for others that the strains listed in include.txt must also be present in the metadata and fasta supplied in the input section for it to work.
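In case it helps anyone hitting this, here is a minimal sketch of a pre-flight check that reports which strains in the include file are absent from the metadata or the FASTA. It is not part of the Nextstrain workflow; the function names are made up for illustration, and it assumes uncompressed inputs, a tab-separated metadata file with a `strain` column, FASTA headers whose first whitespace-delimited token is the strain name, and that `#` starts a comment in the include file.

```python
import csv


def include_names(lines):
    """Strain names from include-file lines, skipping blanks and '#' comments (assumed syntax)."""
    names = []
    for line in lines:
        name = line.split("#", 1)[0].strip()
        if name:
            names.append(name)
    return names


def metadata_names(lines, column="strain"):
    """Set of strain names from tab-separated metadata lines (column name is an assumption)."""
    return {row[column] for row in csv.DictReader(lines, delimiter="\t")}


def fasta_names(lines):
    """Set of strain names taken from the first token of each FASTA header line."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def missing_strains(include_lines, metadata_lines, fasta_lines):
    """Report force-included strains that are missing from the metadata or the FASTA."""
    wanted = include_names(include_lines)
    in_metadata = metadata_names(metadata_lines)
    in_fasta = fasta_names(fasta_lines)
    return {
        "metadata": [name for name in wanted if name not in in_metadata],
        "fasta": [name for name in wanted if name not in in_fasta],
    }
```

Each function takes any iterable of lines, so you can pass open file handles directly, for example `missing_strains(open("defaults/include.txt"), open(metadata_path, newline=""), open(fasta_path))`, where `metadata_path` and `fasta_path` stand in for whatever files you listed in the inputs section. Two empty lists in the result mean every force-included strain was found in both files.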