I have a dataset (FASTA and metadata) with only hospital patients, and a second dataset from GISAID that I want to use as background.
What I want to do is include all the sequences from the hospitalized set and subsample the GISAID set.
Is it possible to achieve this, for example using “name”? Or do I need to add a new column specifying “hospitalized” or not and subsample based on that?
This can be achieved using the “name” of the input.
Here’s an example of a config file that includes all sequences from one input and does subsampling on the second input:
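As a minimal sketch of this kind of config, assuming the Nextstrain ncov workflow's builds.yaml format, where each input's "name" becomes a yes/no column in the combined metadata. The file paths, build name, scheme name, and sequence count below are hypothetical placeholders, so adjust them to your own setup:

```yaml
# Sketch only: paths, build name, scheme name, and counts are hypothetical.
inputs:
  - name: hospitalized
    metadata: data/hospitalized_metadata.tsv
    sequences: data/hospitalized_sequences.fasta
  - name: gisaid
    metadata: data/gisaid_metadata.tsv
    sequences: data/gisaid_sequences.fasta

builds:
  hospital-build:
    subsampling_scheme: hospital-scheme

subsampling:
  hospital-scheme:
    # Keep every sequence that came from the "hospitalized" input
    # (assumes the workflow adds a "hospitalized" column with yes/no values).
    all_hospitalized:
      exclude: "--exclude-where 'hospitalized!=yes'"
    # Subsample everything else (the GISAID background).
    gisaid_background:
      exclude: "--exclude-where 'hospitalized=yes'"
      group_by: "country year month"
      max_sequences: 500
```

The first subsampling rule has no max_sequences or group_by, so nothing from the hospitalized input should be dropped by it. Only the background rule is grouped and capped.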
Thanks a lot Jover! This is perfect!