Can someone please point me to the setting for, or an explanation of, the sequences placed into
flagged-sequences.tsv? Specifically, what does ‘too high divergence’ mean?
I am seeing some samples excluded with
‘too high divergence 15.23>15’, or with similar values that fall close to the
threshold of 15, and I’d like to raise it slightly above 15.
Yes, these parameters are hard-coded in
scripts/diagnostic.py (not ideal, I know).
We recently relaxed these numbers because the old values were too strict for the diversity now accumulating globally.
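For anyone who wants to see what such a check looks like, here is a minimal sketch. It is not the code from scripts/diagnostic.py: the function name and the max_divergence parameter are made up for illustration, and only the message format is copied from the example quoted above.

```python
# Hypothetical sketch of a divergence check; NOT the actual code in
# scripts/diagnostic.py. Names and the default value are assumptions.

def flag_divergence(divergence, max_divergence=15):
    """Return a flag message if divergence exceeds the threshold, else None."""
    if divergence > max_divergence:
        # Same shape as the message seen in flagged-sequences.tsv
        return f"too high divergence {divergence:.2f}>{max_divergence}"
    return None


print(flag_divergence(15.23))                     # flagged with the default threshold
print(flag_divergence(15.23, max_divergence=16))  # passes with a relaxed threshold
```

With a constant like this, raising the threshold is a one-line change, but in the real script you would have to find and edit the corresponding hard-coded value yourself.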
Hi, can you share a copy of some version of flagged-sequences.tsv and the related files? For example, I’d like to count the number of sequences from Belgium and eventually see their flagging messages. Thank you.
The ‘excess divergence’ field seems to be the most useful for excluding sequences that sit on an isolated long branch in the tree, but it is unclear whether it is used, and if so, how.
We are filtering on excess divergence: if its absolute value is too large, the sequence is added to the exclude file and will then be dropped.
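As a rough illustration of that filter, here is a small sketch. The column names (strain, excess_divergence) and the cutoff of 4.0 are assumptions chosen for the example, not values taken from the pipeline, so check them against the actual file and script before relying on this.

```python
import csv
import io

# Hypothetical sketch: column names and the cutoff are assumptions,
# not values read from the real pipeline.

def strains_to_exclude(tsv_text, max_abs_excess=4.0):
    """Return strains whose absolute excess divergence exceeds the cutoff."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    excluded = []
    for row in reader:
        excess = float(row["excess_divergence"])
        # Both unusually high and unusually low values are flagged
        if abs(excess) > max_abs_excess:
            excluded.append(row["strain"])
    return excluded


# Made-up rows, only to show the behaviour
example = "strain\texcess_divergence\nA\t0.5\nB\t-6.2\nC\t7.0\n"
print(strains_to_exclude(example))
```

The returned names would then be appended to the exclude file, one per line, so that they are dropped on the next run.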