Hello, I’ve constructed some (non-clock-like) consensus genomes using different parameters and want to construct trees with only my genomes plus the Wuhan reference. The parameter settings for variant calling and consensus construction might cause the genomes to be filtered out by augur, but I’d like to force it to construct a tree without filtering them out so I can compare trees with different parameter settings.
I was doing this brute force by putting all taxa in the include.txt file, but I’d also like to be able to subsample since I have about 12K genomes and the include file would overrule subsampling (can one subsample on non-geographic metadata such as sex? sorry to embed unrelated q here). I’ve been testing things out with a toy example of 100 genomes and trying to subsample grouping by sex with 25 seq_per_group.
So is there a way to turn off the filtering globally?
Thanks!
Stacia
So the diagnostics won’t run on data with input_name. You can see our config here. We added this b/c the reference strain (root of the tree) was getting removed in extreme conditions XD
It sounds like you want to just subsample and not filter. Both are controlled by augur filter. In the docs, the difference is in the parameters under metadata filters vs. subsampling, though you’re right that using --include will override any subsampling.
You should be able to subsample without filtering by simply not providing any of the parameters under metadata filters. As @dlu noted, you can use --group-by sex or any other combination (e.g. --group-by year month sex for temporal sampling) to get what you’re looking for.
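To make the interaction between grouping and forced includes concrete, here is a minimal Python sketch of the idea. It is not augur's implementation: the function `subsample_by_group`, the toy metadata, and the reference name `Wuhan/Hu-1/2019` are all made up for illustration. It only shows why grouping on a column like sex with 25 sequences per group trims the dataset, while listing every taxon as a forced include keeps everything.

```python
import random
from collections import defaultdict


def subsample_by_group(records, group_cols, per_group, force_include=(), seed=0):
    """Keep up to `per_group` strains per group, then add forced strains.

    Hypothetical helper for illustration only; augur filter has its own logic.
    """
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    groups = defaultdict(list)
    for rec in records:
        # A group is identified by the values of the chosen metadata columns.
        key = tuple(rec.get(col, "unknown") for col in group_cols)
        groups[key].append(rec["strain"])

    kept = set()
    for names in groups.values():
        kept.update(rng.sample(names, min(per_group, len(names))))

    # Forced strains are added back no matter what the sampling step chose.
    known = {rec["strain"] for rec in records}
    kept.update(name for name in force_include if name in known)
    return sorted(kept)


# Toy metadata: 100 genomes split evenly by sex, plus a placeholder reference.
records = [
    {"strain": f"sample_{i:03d}", "sex": "F" if i % 2 else "M"}
    for i in range(100)
]
records.append({"strain": "Wuhan/Hu-1/2019", "sex": "unknown"})

# Group by sex, 25 per group, force only the reference.
picked = subsample_by_group(
    records, ["sex"], per_group=25, force_include=["Wuhan/Hu-1/2019"]
)

# Force every taxon: the subsampling step no longer reduces anything.
everything = subsample_by_group(
    records, ["sex"], per_group=25,
    force_include=[rec["strain"] for rec in records],
)

print(len(picked), len(everything))
```

In this toy case the first call keeps 25 genomes per sex plus the reference, and the second keeps all 101 records, which mirrors the behaviour described above when the include file lists every taxon. Keeping only the reference in the include file is what lets the grouping do its job.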