While using augur refine, clock-filter-iqd 4 has been enough to get rid of major outliers in most analyses I have seen. I’m using this parameter in an analysis with 3300 genomes, but clock-filter-iqd 4 removes nearly half of the genomes shown in the clock view below.
If I remove clock-filter-iqd, no tips are pruned, as expected. But even if I set it to clock-filter-iqd 100, it still removes 1000+ genomes from the build.
Inspecting the tree above in TempEst, this is the root-to-tip plot and distribution of residuals I get.
I’m trying to get rid of only the samples in blue above (n ~ 40), but I have figured out neither how to set clock-filter-iqd to do so, nor why augur refine is purging 1000+ samples no matter how clock-filter-iqd is set.
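As a rough illustration of what an IQD-based clock filter does: fit a line to root-to-tip distance against sampling date, take the residual of each tip, and flag tips whose absolute residual exceeds n times the interquartile distance of all residuals. This is a simplified sketch, not TreeTime's actual implementation (which, among other things, can reroot the tree before fitting). The function names and the input format are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def percentile(values, q):
    """Percentile with linear interpolation between sorted values."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def clock_outliers(tips, n_iqd):
    """tips: dict of name -> (numeric date, root-to-tip distance).

    Returns the set of tip names whose residual from the fitted
    clock line is larger than n_iqd interquartile distances.
    """
    names = list(tips)
    dates = [tips[name][0] for name in names]
    dists = [tips[name][1] for name in names]
    slope, intercept = fit_line(dates, dists)
    residuals = [d - (slope * t + intercept) for t, d in zip(dates, dists)]
    iqd = percentile(residuals, 75) - percentile(residuals, 25)
    return {name for name, r in zip(names, residuals) if abs(r) > n_iqd * iqd}
```

Note that in this sketch both the fitted line and the IQD are computed from all tips together, so the set of genomes that ends up in the tree affects which tips cross the threshold.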
I guess I found the cause of this issue.
My pipeline has a rule align that relies on an --existing-alignment to speed up the alignment step, but this file contained several extra genomes not found in the original --sequences. Those extra genomes, carried over during the alignment step, were in large part the ones being purged in the rule refine step.
After trimming the --existing-alignment to keep only (most of) the genomes found in the --sequences file, setting clock-filter-iqd 3 was enough to prune around 30 of those outlier genomes in blue above, which lie within the three interquartile limits.
Not sure if it fully explains the issue, but fixing the --existing-alignment did the trick.
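For anyone hitting the same thing, a check along these lines can list the genomes that are in the existing alignment but not in the sequences file, and write a trimmed alignment. It is a minimal sketch: it assumes plain FASTA input and takes the record ID to be the first word after ">". The function names and the file paths passed to `trim_alignment` are hypothetical.

```python
def record_id(header):
    """ID of a FASTA header line: the first word after '>'."""
    parts = header[1:].split()
    return parts[0] if parts else ""


def fasta_ids(lines):
    """Set of record IDs found in an iterable of FASTA lines."""
    return {record_id(line) for line in lines if line.startswith(">")}


def filter_fasta(lines, keep):
    """Yield only the lines of records whose ID is in `keep`."""
    writing = False
    for line in lines:
        if line.startswith(">"):
            writing = record_id(line) in keep
        if writing:
            yield line


def trim_alignment(sequences_path, alignment_path, output_path):
    """Write a copy of the alignment limited to IDs present in the sequences file.

    Returns the sorted list of extra IDs that were dropped.
    """
    with open(sequences_path) as handle:
        wanted = fasta_ids(handle)
    with open(alignment_path) as handle:
        extra = fasta_ids(handle) - wanted
    with open(alignment_path) as src, open(output_path, "w") as dst:
        dst.writelines(filter_fasta(src, wanted))
    return sorted(extra)
```

Calling `trim_alignment("sequences.fasta", "existing_alignment.fasta", "existing_alignment.trimmed.fasta")` (example file names) would return the extra IDs, which is a quick way to see how many unexpected genomes the alignment carries.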
Still on this topic, and taking advantage of this thread about clock filter:
Is there a way to protect certain leaves against pruning when clock-filter-iqd is used? For example, among thousands of genomes, a few dozen are pruned, but I want to prevent two leaves from being pruned. How can I do that?
Can I automatically export the list of pruned leaves?
Sorry about the delay. I can’t think of an automated way to do that right now. Also, treetime’s reporting of filtered sequences needs improvement. So sorry, I can’t really help.
You could just filter once, add back the ones you want to keep, and rerun without the filter.
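One way to script that workaround, as a sketch only: compare the tip names in the tree that went into augur refine with the tips in the tree that came out of the filtered run, which gives the list of pruned leaves, then subtract the leaves you want to protect. The helper names are made up, and the regex-based Newick parsing assumes unquoted tip names without bracketed comments.

```python
import re


def tip_names(newick):
    """Extract tip names from a Newick string.

    Tips are the labels that directly follow '(' or ','.
    Internal node labels follow ')' and are therefore skipped.
    """
    return set(re.findall(r"[(,]\s*([^(),:;\s]+)", newick))


def pruned_and_exclude(tree_before, tree_after, protected):
    """Return (pruned, exclude) as sorted lists of tip names.

    pruned:  tips present before the filtered run but missing after it
    exclude: pruned tips minus the ones listed in `protected`
    """
    pruned = tip_names(tree_before) - tip_names(tree_after)
    exclude = pruned - set(protected)
    return sorted(pruned), sorted(exclude)
```

Writing the `exclude` names one per line to a text file gives something that can be passed to `augur filter --exclude`. The build can then be rerun from the filtered sequences with augur refine and no clock-filter-iqd, so the protected leaves stay in.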