Hi, I’m new to Nextstrain and I was wondering: why is there a “subsampling” step in the Snakemake workflow? Is it because the total number of sequences is huge and we’d like to focus on only a small part of it based on some criteria? If so, wouldn’t it be better to call it something like “criteria-ing” or “criteria-based sampling”? Also, what exactly does “subsampling” do? Are the sequences that meet the criteria then chosen uniformly at random, or in some other way? Many thanks.