I have encountered a new question regarding the usage of augur filter commands and would greatly appreciate your guidance.
My objective is to randomly subsample sequences from Bangladesh for the months of May, June, July, and August, selecting 1, 4, 13, and 4 sequences for those months, respectively. I attempted to use the following syntax; however, there seems to be an issue with the part --sequences-per-group 1 4 13 4. Reviewing the instructions, I did not find any guidance on this specific scenario. The documentation I found only mentions the ability to select one sample per month from each country. Is it possible to use augur filter to specify a fixed number of sequences for specific months within a country? If yes, could anyone help me correct the syntax below? Thanks a lot!!
This is a great question that's relevant to my current work.
--sequences-per-group takes a single number which is the size used for all groups.
There are ongoing discussions to support varying group sizes, but currently this is beyond the capabilities of a single augur filter call. It can be done using multiple calls to augur filter. Something like this:
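As a minimal sketch of the multi-call approach (not a definitive recipe), the Python snippet below builds one augur filter command per month plus a final command that combines the results. The file names (metadata.tsv, sequences.fasta, the per-month .txt files, subsampled.*), the year 2021, the seed, and the metadata column name country are all assumptions made for illustration. It relies on --subsample-max-sequences to cap each month, and on --exclude-all with --include accepting several strain-list files, which I believe holds for recent Augur versions.

```python
import shlex

# Hypothetical inputs: file names, year, and seed are placeholders, not from the thread.
METADATA = "metadata.tsv"
SEQUENCES = "sequences.fasta"
YEAR = 2021
SEED = 0

# (label, first day, last day, number of sequences to keep) for each month
TARGETS = [
    ("may", "05-01", "05-31", 1),
    ("june", "06-01", "06-30", 4),
    ("july", "07-01", "07-31", 13),
    ("august", "08-01", "08-31", 4),
]


def build_commands(targets=TARGETS, country="Bangladesh"):
    """Return augur filter commands as argument lists: one per month, then one to combine."""
    commands = []
    strain_lists = []
    for label, start, end, count in targets:
        strains_file = f"{label}.txt"
        strain_lists.append(strains_file)
        commands.append([
            "augur", "filter",
            "--metadata", METADATA,
            # Assumes the metadata has a column named "country".
            "--query", f"country == '{country}'",
            "--min-date", f"{YEAR}-{start}",
            "--max-date", f"{YEAR}-{end}",
            # Randomly keep at most `count` sequences from this month.
            "--subsample-max-sequences", str(count),
            "--subsample-seed", str(SEED),
            "--output-strains", strains_file,
        ])
    # Final call: drop everything, then force-include the strains picked per month.
    commands.append([
        "augur", "filter",
        "--metadata", METADATA,
        "--sequences", SEQUENCES,
        "--exclude-all",
        "--include", *strain_lists,
        "--output-sequences", "subsampled.fasta",
        "--output-metadata", "subsampled.tsv",
    ])
    return commands


# Print the commands so they can be reviewed and pasted into a shell or a Snakemake rule.
for command in build_commands():
    print(shlex.join(command))
```

Note that --subsample-max-sequences is an upper bound: if a month has fewer sequences than requested, you simply get all of them. Printing the commands rather than running them keeps the sketch easy to check before anything touches your data.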
Great question and very useful answer @victorlin! I've been struggling to apply multiple filters myself. I can do this for ncov using the configfile, but is it possible to achieve something similar to the code below using multiple calls to augur filter directly in the Snakefile? Specifically I want to subsample strains that are closest to a specific country.
The proximity-based sampling in your config is implemented as part of the ncov Snakemake workflow. I haven't tried myself, but if you want to use it in another Snakemake workflow, you should be able to copy the relevant rules/functions/scripts referenced in main_workflow.smk and modify the rule inputs/outputs.
We have future plans to make this available in an Augur command, but no clear timeline.