Way to turn off filters?

Hello, I’ve constructed some (non-clock-like) consensus genomes using various parameter settings and want to construct trees with only my genomes plus the Wuhan reference. The parameter settings for variant calling and consensus construction might cause the genomes to be filtered out by augur, but I’d like to force it to construct a tree without filtering them out so I can compare trees built with different parameter settings.

I was doing this by brute force, putting all taxa in the include.txt file, but I’d also like to be able to subsample, since I have about 12K genomes and the include file overrides subsampling. (Side question, sorry to embed it here: can one subsample on non-geographic metadata such as sex?) I’ve been testing things with a toy example of 100 genomes, subsampling grouped by sex with seq_per_group set to 25.

So is there a way to turn off the filtering globally?
Thanks!
Stacia

Any help on this? I’m wondering whether I can turn off filtering completely, or just certain kinds of filtering.

Would this be appropriate? See the Workflow config file reference in the SARS-CoV-2 Workflow documentation.

Just add this to builds.yaml:

```yaml
filter:
  input_name:
    skip_diagnostics: True
```

so the diagnostics won’t run on sequences from the input named input_name (substitute the name of your own input). You can see our config here. We added this because the reference strain (the root of the tree) was getting removed in extreme conditions XD
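To make the input_name placeholder concrete, here is a minimal sketch of how it might tie to an entry under `inputs` in the same builds.yaml. The input name `my_genomes` and the file paths are hypothetical, and this is only a fragment, not a complete build config:

```yaml
# Hypothetical input; replace the name and paths with your own.
inputs:
  - name: my_genomes
    metadata: data/my_genomes_metadata.tsv
    sequences: data/my_genomes.fasta

# The key under `filter` must match the input's `name` above,
# so diagnostics are skipped only for this input.
filter:
  my_genomes:
    skip_diagnostics: True
```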

For subsampling, I believe query and group_by allow custom column headers, since they are just passed through as options to augur filter.
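Applied to the toy example (grouping by sex, 25 sequences per group), a custom subsampling scheme might look roughly like the fragment below. The scheme name `by_sex`, sample name `all_by_sex`, and build name `my_build` are hypothetical, and it assumes your metadata TSV has a column named `sex`:

```yaml
builds:
  my_build:
    # Use the custom scheme defined below instead of a default one.
    subsampling_scheme: by_sex

subsampling:
  by_sex:
    all_by_sex:
      # Becomes augur filter's --group-by option; assumes a `sex`
      # column exists in the metadata.
      group_by: "sex"
      # Becomes augur filter's --sequences-per-group option.
      seq_per_group: 25
```

If everything is still listed in include.txt, those strains will be force-included regardless of this scheme, so the include file would need to be trimmed for the subsampling to have an effect.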

Hi @stacia, are you using a SARS-CoV-2 workflow config file (e.g. ncov-tutorial/genomic-surveillance.yaml)? If so, could you provide a link so we can understand the context better?


If not, and you’re using augur filter directly:

It sounds like you want to subsample without filtering. Both are handled by augur filter; in its docs, the filtering options are listed under "metadata filters" and the subsampling options under "subsampling". You’re right, though, that --include forces the listed strains into the output, overriding any subsampling.

You should be able to subsample without filtering by simply not providing any of the parameters under metadata filters. As @dlu noted, you can use --group-by sex (with --sequences-per-group 25 to match your toy example) or any other combination of metadata columns (e.g. --group-by year month sex for temporal sampling) to get what you’re looking for.