The ‘--clock-filter-iqd’ option removes sequences that deviate from the root-to-tip vs. time regression by more than n_iqd interquartile ranges. It filters out my new sequences because they differ from the previously included sequences. The sequences I have added are also very similar to one another, so many of them fall on the same branch, and Nextstrain removes as many of them as it can.
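To make the filter concrete, here is a minimal Python sketch of an IQD-style clock filter. It assumes the filter works roughly like TreeTime's clock filter: regress root-to-tip divergence on sampling date, compute each tip's residual, and flag tips whose absolute residual exceeds n_iqd times the interquartile range of all residuals. The function names and input dictionaries are hypothetical; this is not augur's code.

```python
from typing import Dict, List, Tuple


def linear_fit(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of ys = intercept + rate * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    rate = sxy / sxx  # assumes at least two distinct sampling dates
    return mean_y - rate * mean_x, rate


def percentile(values: List[float], q: float) -> float:
    """q-th percentile with linear interpolation between sorted values."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def clock_outliers(dates: Dict[str, float],
                   divergence: Dict[str, float],
                   n_iqd: float) -> List[str]:
    """Hypothetical helper: strains whose residual exceeds n_iqd interquartile ranges."""
    strains = sorted(dates)
    xs = [dates[s] for s in strains]
    ys = [divergence[s] for s in strains]
    intercept, rate = linear_fit(xs, ys)
    # Residual = observed root-to-tip divergence minus the clock's prediction.
    residuals = {s: divergence[s] - (intercept + rate * dates[s]) for s in strains}
    values = list(residuals.values())
    iqd = percentile(values, 75) - percentile(values, 25)
    return [s for s in strains if abs(residuals[s]) > n_iqd * iqd]
```

Because every residual is measured against one regression line fitted to all tips, a batch of new sequences that sits well off that line can be flagged together, which matches the behaviour described above.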
If I remove the ‘--clock-filter-iqd’ step, Nextstrain forces these sequences into the build, but it infers collection dates in the future for them, which stretches out the phylogenetic tree.
Is there a way to keep all or most of these sequences in the refine step without the tree being re-rooted and these sequences being assigned future collection dates?
Hi @dwwhall - by default, augur refine uses root-to-tip (RTT) regression to infer the best root, and it sounds like the sequences you are adding are changing the root. You can use --root STRAIN [STRAIN2] to root on a single strain, or on a monophyletic group defined by several strains, instead (this is what we do in our nCoV pipeline).
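As a rough illustration of RTT-based rooting, the Python sketch below tries each internal node of a small unrooted tree as the root, regresses root-to-tip distance on sampling date, and keeps the root with the smallest residual sum of squares. This is a simplification under stated assumptions: TreeTime can also place the root partway along a branch and its objective may differ, and the edge-list tree format and function names are hypothetical. Passing --root with strain names roots on those strains instead of searching.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # (node, node, branch length); hypothetical format


def distances_from(root: str, edges: List[Edge]) -> Dict[str, float]:
    """Path length from `root` to every node, treating the tree as undirected."""
    adj: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for a, b, length in edges:
        adj[a].append((b, length))
        adj[b].append((a, length))
    dist = {root: 0.0}
    stack = [root]
    while stack:
        node = stack.pop()
        for nbr, length in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + length
                stack.append(nbr)
    return dist


def rtt_rss(root: str, edges: List[Edge], tip_dates: Dict[str, float]) -> float:
    """Residual sum of squares of the root-to-tip vs. date regression."""
    dist = distances_from(root, edges)
    tips = sorted(tip_dates)
    xs = [tip_dates[t] for t in tips]
    ys = [dist[t] for t in tips]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    rate = sxy / sxx
    intercept = mean_y - rate * mean_x
    return sum((y - intercept - rate * x) ** 2 for x, y in zip(xs, ys))


def best_root(edges: List[Edge], tip_dates: Dict[str, float]) -> str:
    """Pick the internal node whose RTT regression fits best (simplified search)."""
    degree: Dict[str, int] = defaultdict(int)
    for a, b, _ in edges:
        degree[a] += 1
        degree[b] += 1
    candidates = sorted(node for node, d in degree.items() if d > 1)
    return min(candidates, key=lambda node: rtt_rss(node, edges, tip_dates))
```

Adding many sequences changes the set of tips the regression is fitted to, so the best-scoring root can move; fixing the root avoids that dependence.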
but the collection date will infer these sequences in the future to spread out the phylogenetic tree.
Are you able to provide sampling dates for these strains (in the metadata TSV)? That would save us from having to infer their dates.
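To find which strains still need a sampling date, a minimal Python sketch could scan the metadata TSV. It assumes the columns are named "strain" and "date" (adjust if your metadata differs) and treats anything other than a full YYYY-MM-DD value, such as a blank field or an ambiguous "2020-XX-XX", as not exact; the function name is hypothetical.

```python
import csv
import io
import re
from typing import List, TextIO

# Assumption: an exact sampling date is written as YYYY-MM-DD.
EXACT_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def strains_without_exact_dates(handle: TextIO,
                                strain_col: str = "strain",
                                date_col: str = "date") -> List[str]:
    """List strains in a tab-separated metadata file that lack an exact date."""
    reader = csv.DictReader(handle, delimiter="\t")
    flagged = []
    for row in reader:
        date = (row.get(date_col) or "").strip()
        if not EXACT_DATE.match(date):
            flagged.append(row[strain_col])
    return flagged


if __name__ == "__main__":
    example = "strain\tdate\nA\t2020-03-15\nB\t2020-XX-XX\nC\t\n"
    print(strains_without_exact_dates(io.StringIO(example)))  # ['B', 'C']
```

For a real file, open it with `open(path, newline="")` and pass the handle in place of the `io.StringIO` example.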