What is the purpose of "subsampling" in the workflow?

emtf · January 6, 2021, 4:06am

Hi, I’m new to Nextstrain and I was wondering: why does the Snakemake workflow include a “subsampling” step? Is it because the total number of sequences is huge, and we’d like to focus on only a small part of it based on some criteria? If so, wouldn’t a name like “criteria-based sampling” be clearer? Also, what exactly does “subsampling” do? Are the sequences that meet the criteria chosen uniformly at random, or in some other way? Many thanks.
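The idea the question describes can be sketched in Python: first keep only the sequences that meet some criteria, then pick sequences uniformly at random within groups. This is a minimal illustration under stated assumptions, not Nextstrain's implementation. The `subsample` function, the metadata fields `country` and `month`, and the per-group cap are all hypothetical. Nextstrain's `augur filter` command offers comparable grouping options such as `--group-by` and `--sequences-per-group`, but this sketch does not claim to reproduce its exact behavior.

```python
import random
from collections import defaultdict


def subsample(records, group_keys, per_group, predicate=lambda r: True, seed=0):
    """Hypothetical criteria-based subsampler (illustration only).

    1. Keep only records for which `predicate` returns True (the "criteria").
    2. Group the survivors by the values of `group_keys`.
    3. From each group, draw up to `per_group` records uniformly at random.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    groups = defaultdict(list)
    for rec in records:
        if predicate(rec):
            key = tuple(rec[k] for k in group_keys)
            groups[key].append(rec)

    selected = []
    for key in sorted(groups):  # deterministic group order
        members = groups[key]
        # Small groups are kept whole; large groups are capped at per_group.
        selected.extend(rng.sample(members, min(per_group, len(members))))
    return selected


# Hypothetical metadata: 10 sequences from each of 3 countries in one month.
metadata = [
    {"strain": f"{c}/{i}", "country": c, "month": "2020-03"}
    for c in ("USA", "Brazil", "Kenya")
    for i in range(10)
]

# Criteria: exclude Kenya; then keep at most 2 sequences per (country, month).
picked = subsample(
    metadata,
    group_keys=["country", "month"],
    per_group=2,
    predicate=lambda r: r["country"] != "Kenya",
)
print(len(picked))  # 4
```

Capping each group, rather than sampling uniformly from the whole pool, keeps a heavily sampled group from dominating the output. The fixed `seed` makes repeated runs select the same sequences.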

Related topics

- Perform analysis merging my dataset and South America dataset, without subsampling · General · 3 replies · 613 views · February 24, 2022
- Separate subsampling procedure · General · 0 replies · 423 views · January 5, 2021
- Number of subsampled metadata and sequences lower than indexed · General · 1 reply · 349 views · October 31, 2022
- Augur error while subsampling - updated · Help and Getting Started · 0 replies · 504 views · November 21, 2020
- How to select representative data from GISAID · Help and Getting Started · 0 replies · 323 views · June 5, 2022