Subsampling a local DENV dataset based on genetic similarity


I hope you are doing well! My goal is to subsample 300 sequences that are genetically similar to a distinct set of sequences, defined by the country they were collected in (Colombia, for example). I have gone through a few discussions on subsampling DENV, which suggested that the DENV workflow isn’t suited to accepting custom subsampling schemes just yet. What would be the best alternative way to implement the build/snakefile below:
denv_col.yaml.txt (1.9 KB)
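
To make the goal concrete, here is a rough Python sketch of the kind of selection I mean. It is only an illustration under several assumptions, not something taken from the DENV workflow: the sequences are already aligned to equal length, "genetically similar" means the smallest Hamming distance to the nearest sequence in the focal set, and the country is the second `|`-separated field of the FASTA header. The function names (`parse_fasta`, `hamming`, `closest_to_focal`, `to_fasta`) and the header layout are hypothetical.

```python
from typing import Dict, List


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence} (insertion order preserved)."""
    records: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def country_of(header: str) -> str:
    """Assumed header layout: 'strain|country'. Returns '' if no country field."""
    parts = header.split("|")
    return parts[1] if len(parts) > 1 else ""


def hamming(a: str, b: str) -> int:
    """Count mismatches between two aligned sequences, skipping gaps and Ns."""
    return sum(
        1 for x, y in zip(a, b)
        if x != y and x not in "-N" and y not in "-N"
    )


def closest_to_focal(records: Dict[str, str], focal_country: str, n: int) -> List[str]:
    """Return the n non-focal headers closest to any focal sequence."""
    focal = [s for h, s in records.items() if country_of(h) == focal_country]
    if not focal:
        raise ValueError(f"no sequences found for country {focal_country!r}")
    # Score each context sequence by its distance to the nearest focal sequence.
    scored = sorted(
        (min(hamming(seq, f) for f in focal), header)
        for header, seq in records.items()
        if country_of(header) != focal_country
    )
    return [header for _, header in scored[:n]]


def to_fasta(records: Dict[str, str], headers: List[str]) -> str:
    """Write the chosen records back out as FASTA text."""
    return "".join(f">{h}\n{records[h]}\n" for h in headers)
```

With the real data this would be called as `closest_to_focal(records, "Colombia", 300)`, and the focal sequences plus the selected ones would be written out with `to_fasta`. The pairwise comparison is quadratic in the number of sequences, so it is meant to show the intent rather than to scale to a large dataset.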

I only need the FASTA file of the subsampled data. Thank you for your help! I can provide the input files by email if needed!