Using trait subcommand to infer location of unsampled nodes

amcasto · September 6, 2021, 12:42am

I have a collection of SARS-CoV-2 genomes that came from different facilities. I’d like to use the subcommand trait to determine/estimate the number of viral introduction events into each facility that are represented by the sequences I have.
I’ve created a sequence set combining my sequences with samples from GISAID collected around the same time and in the same geographic area. I’ve made a metadata file that gives each of my sequences a designation corresponding to the facility it came from and all the GISAID sequences an “other” designation.
I have successfully gotten “trait” to output location assignments for unsampled nodes (either a facility or the “other” designation). However, I’ve noticed that if I run the same tree and metadata file through trait again, these assignments can change, so I assume that there is a stochastic process that determines unsampled node assignments. Is there a way to calculate probabilities that node x will be assigned designation y without running trait multiple times and summarizing the results of those runs?
The number of samples I have from some facilities is really small (as few as 2 sequences). Is it still worthwhile to run this analysis, or do I really need larger sample sizes for the output to mean anything?
Finally, is there a way to get the node numbers (indexes) created by refine to display on trees visualized with auspice? What about getting output from trait to display on trees?
Thanks!

james · September 7, 2021, 2:35am

Hi Amanda - augur traits uses discrete trait analysis (DTA), which models migration between locations in the same way as mutations between character states, and therefore inherits the same assumptions. There are papers which discuss these assumptions and propose alternative methods, e.g. De Maio et al. With deme sizes as low as 2 I would be skeptical of the results here, and for your data I would suggest comparing to other models. augur traits has some parameters which affect the equilibrium frequencies and transition rate factors of the underlying model, and it can output confidence values for the different states at each node, but again the results will depend on sampling. If you supply the output node-data JSON from augur traits to augur export then the resulting dataset will include these values (see here for docs relating to the nextstrain/ncov pipeline); auspice displays them by colouring the internal branches of the tree for that trait, and hovering over a branch will reveal the confidence values.
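As a rough illustration of reading those per-node confidence values outside auspice, here is a minimal Python sketch. It assumes the node-data JSON has a top-level "nodes" object keyed by node name, and that the confidences for a column called facility are stored under facility_confidence; the column name, node names, and numbers below are made up for the example, so check them against your own file.

```python
import json

# Hypothetical excerpt of a node-data JSON as written by augur traits.
# The layout ("nodes" -> node name -> "<trait>_confidence") is an assumption;
# the trait name "facility" and all values are invented for illustration.
node_data_text = """
{
  "nodes": {
    "NODE_0000379": {
      "facility": "facility_A",
      "facility_confidence": {"facility_A": 0.62, "other": 0.38}
    },
    "NODE_0000380": {
      "facility": "other",
      "facility_confidence": {"other": 0.97, "facility_A": 0.03}
    }
  }
}
"""


def state_probabilities(node_data, trait):
    """Return {node name: {state: confidence}} for nodes that carry confidences."""
    key = f"{trait}_confidence"
    return {
        name: attrs[key]
        for name, attrs in node_data["nodes"].items()
        if key in attrs
    }


def uncertain_nodes(probabilities, threshold=0.9):
    """List nodes whose best-supported state has confidence below the threshold."""
    return sorted(
        name
        for name, dist in probabilities.items()
        if max(dist.values()) < threshold
    )


node_data = json.loads(node_data_text)
probs = state_probabilities(node_data, "facility")
low_support = uncertain_nodes(probs)
print(low_support)
```

For a real run you would replace the inline string with the contents of your own node-data file. The 0.9 threshold is arbitrary and only serves to flag nodes whose assignment should be read with caution.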

Finally, to get the node numbers (created by refine) from an auspice visualisation, you can right-click on the displayed branch and click “inspect” (wording may be slightly different depending on browser). The SVG element will have an id along the lines of id="branchS_NODE_0000379", where NODE_0000379 is the node name in the intermediate files.
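If you end up copying several of these ids, a small helper can strip the prefix so the result matches the node names in the intermediate files. This is only a sketch: the branchS_ prefix is taken from the example id above, and the function name is made up.

```python
def node_name_from_branch_id(element_id, prefix="branchS_"):
    """Strip the SVG id prefix to recover the node name used in intermediate files.

    The default prefix comes from the example id in this thread; other
    prefixes may exist and would need to be passed in explicitly.
    """
    if not element_id.startswith(prefix):
        raise ValueError(f"unexpected element id: {element_id!r}")
    return element_id[len(prefix):]


name = node_name_from_branch_id("branchS_NODE_0000379")
print(name)
```

The returned name can then be used as a key into the "nodes" section of a node-data JSON.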

amcasto · September 8, 2021, 5:07pm

Thanks so much, James. I appreciate the information.
