Inconsistencies in the result of augur traits

I have been running augur traits on different subsamples of a dataset. My base dataset has 20,000 sequences, and each run takes an average of 4,000 sequences from it using the group_by option. However, when I view the results in Auspice, I notice that the inferred transmission events vary between runs.

As an example, I have regions A, B, C and D:
First repetition infers the first transmission event of variant X from region A to B
Second repetition infers the first transmission event of variant X from region B to C
Third repetition infers the first transmission event of variant X from region D to C

I know from the literature that the starting point is region B, and in all executions the first sequences of the variant are found in region B.

Taking this result into account, can I assume that Nextstrain cannot infer the initial transmission events of a variant? Or am I making a mistake?

In my configuration I use group_by (division year month), and in augur traits I use a sampling bias correction of 2.5. Reviewing the sequences of each run, I see that they are selected evenly across all regions.

Hi Juan! Inference of ancestral states is inherently a statistical problem. If you have different trees, you can get different results. TreeTime uses maximum likelihood methods, which give only limited information about how likely a particular transmission path is compared to all the other possible paths.

If different runs give different results, it just means that all of these paths are compatible with the data. Does that make sense?
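One way to make this concrete is to tally the first inferred transmission event across your repetitions and look at how much support each path gets. Below is a minimal Python sketch of that idea. The function name `summarize_first_events` and the event list are hypothetical (the list is just the three repetitions from your example, typed in by hand); nothing here is part of augur or TreeTime.

```python
from collections import Counter


def summarize_first_events(events):
    """Return the fraction of repetitions that inferred each (source, target) pair.

    `events` is a list of (source_region, target_region) tuples, one per
    subsampled run. This is a hypothetical helper, not an augur function.
    """
    counts = Counter(events)
    total = len(events)
    # most_common() orders the pairs from most to least frequently inferred
    return {pair: n / total for pair, n in counts.most_common()}


# Hypothetical input: the first transmission event from each of three runs
replicates = [("A", "B"), ("B", "C"), ("D", "C")]

support = summarize_first_events(replicates)
for (source, target), fraction in support.items():
    print(f"{source} -> {target}: inferred in {fraction:.0%} of runs")
```

If no single path dominates across many repetitions, that is a sign the data do not strongly favor one of them over the others.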

See: "Inference of transition between discrete characters and 'mugration' models" (TreeTime 0.10.0 documentation)
and maybe also, for a deep dive: "TreeTime: Maximum-likelihood phylodynamic analysis" (Virus Evolution, Oxford Academic)
