I am trying to run a few hundred samples through Nextstrain against a global background from GISAID. I list the sample names in the “include.txt” file, and the log files show them being added back in. The samples show up in the sequence-diagnostics.tsv output and each has an associated align_#.txt, but none of the JSON files seem to include them. The log files do not indicate that they were filtered out. All of the samples have >99% complete genomes, and the metadata was carefully prepared to match the required format.
Can you help me figure out why the samples don’t appear in the JSON output files?
What exact pipeline/Snakefile are you running? Many of our runs have two filtering steps: one that aligns all sensible sequences, and a later one that picks a relevant subsample. Could it be that you are forcing inclusion of your samples only in the first step and not in the second?
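One way to find out which of the two filtering steps drops the samples is to compare include.txt against the FASTA written after each step. Below is a minimal sketch of a hypothetical shell helper, `missing_strains` (it is not part of Nextstrain or the ncov workflow). It assumes include.txt holds one strain name per line and that the strain name is the first word of each FASTA header. The intermediate file names depend on your build, so you would need to locate them in your own results directory.

```shell
# missing_strains INCLUDE_FILE FASTA_FILE
# Hypothetical helper, not part of the Nextstrain/ncov workflow.
# Prints each strain named in INCLUDE_FILE (one name per line) that has no
# matching FASTA header (">name ...") in FASTA_FILE. Blank lines and lines
# starting with "#" in INCLUDE_FILE are ignored; trailing whitespace and
# Windows line endings are stripped before comparing.
missing_strains() {
  awk '
    # First file read (the FASTA): remember the first word of every header.
    FILENAME == ARGV[1] {
      sub(/\r$/, "")
      if (sub(/^>/, "")) seen[$1] = 1
      next
    }
    # Second file read (the include list): normalise each line.
    {
      sub(/\r$/, "")
      sub(/[ \t]+$/, "")
    }
    # Skip blank lines and comment lines.
    $0 == "" || /^#/ { next }
    # Report every listed strain that never appeared as a header.
    !($0 in seen) { print }
  ' "$2" "$1"
}
```

Running `missing_strains include.txt <FASTA>` first on the output of the initial filter and then on the subsampled output should show where the strains disappear: if they are all present after the first step but missing after subsampling, the second filter is not force-including them.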
I don't believe I am forcing inclusion of the samples in the second step. I am running everything with the defaults from the SARS-CoV-2 tutorial, except that I use only the global build and list my samples in include.txt.
My snakemake call looks like: snakemake --profile my_profiles/modified_example -p
I am looking at main_workflow.smk, but I don't readily see how to force inclusion of my samples during subsampling. Can you guide me through this or point me to a page that describes it?
I see now that I have to make additional changes according to:
I will spend more time with the advanced customization example and will come back if I have questions!