Subsample different datasets

Hi,
I have one dataset (FASTA and metadata) containing only hospitalized patients, and a second dataset from GISAID that I want to use as background.

I want to include all sequences from the hospital set and subsample the GISAID set.

Is it possible to do this using the input “name”, for example? Or do I need to add a new column indicating whether each sample is “hospitalized” and subsample based on that?

Thanks, Jon

Hi Jon,

This can be achieved using the “name” of the input.
Here’s an example of a config file that includes all sequences from one input and does subsampling on the second input:
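The selection logic can be sketched in plain Python. This is a hypothetical illustration, not the workflow's implementation. It assumes the combined metadata carries one yes/no column per input name (here `hospital` and `gisaid`). Every hospital record is kept, and a capped random sample is drawn from the GISAID records. The function name `subsample`, the input names, and the cap `max_background` are all illustrative assumptions.

```python
import random


def subsample(records, focal_input="hospital", background_input="gisaid",
              max_background=2, seed=0):
    """Keep every focal record; randomly sample up to max_background others.

    Assumes (hypothetically) that each record has one "yes"/"no" field per
    input name, marking which input the sequence came from.
    """
    # Include all sequences from the focal (hospital) input.
    focal = [r for r in records if r.get(focal_input) == "yes"]
    # Background candidates: from the GISAID input, skipping any record
    # already selected as focal so nothing is picked twice.
    background = [r for r in records
                  if r.get(background_input) == "yes"
                  and r.get(focal_input) != "yes"]
    # Seeded RNG so the subsample is reproducible between runs.
    rng = random.Random(seed)
    sampled = rng.sample(background, min(max_background, len(background)))
    return focal + sampled


# Hypothetical toy metadata: two hospital sequences, four GISAID sequences.
metadata = [
    {"strain": "hosp/1", "hospital": "yes", "gisaid": "no"},
    {"strain": "hosp/2", "hospital": "yes", "gisaid": "no"},
    {"strain": "bg/1", "hospital": "no", "gisaid": "yes"},
    {"strain": "bg/2", "hospital": "no", "gisaid": "yes"},
    {"strain": "bg/3", "hospital": "no", "gisaid": "yes"},
    {"strain": "bg/4", "hospital": "no", "gisaid": "yes"},
]

selected = subsample(metadata)
print([r["strain"] for r in selected])
```

Filtering on a per-input column in this way avoids adding a separate “hospitalized” column by hand, because the input name already identifies where each sequence came from.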

Best,
Jover


Thanks a lot, Jover! This is perfect! 😄