I have a collection of SARS-CoV-2 genomes that came from different facilities. I’d like to use the `augur traits` subcommand to estimate the number of viral introduction events into each facility that are represented by the sequences I have.

I’ve created a sequence set combining my sequences with samples from GISAID collected around the same time and in the same geographic area. I’ve made a metadata file that gives each of my sequences a designation corresponding to the facility it came from and all the GISAID sequences an “other” designation.

I have successfully gotten `augur traits` to output location assignments for unsampled nodes (either a facility or the “other” designation). However, I’ve noticed that if I run the same tree and metadata file through traits repeatedly, these assignments change, so I assume that a stochastic process determines unsampled node assignments. **Is there a way to calculate probabilities that node x will be assigned designation y without running traits multiple times and summarizing the results of those runs?**

The number of samples I have from some facilities is really small (as few as 2 sequences). **Is it still worthwhile to run this analysis or do I really need larger sample sizes for the output to mean anything?**

**Finally, is there a way to get the node numbers (indexes) created by refine to display on trees visualized with auspice? What about getting output from traits to display on trees?**

Thanks!

Hi Amanda - `augur traits` uses discrete trait analysis (DTA), which models migration like mutations, and therefore inherits the same assumptions. There are papers which discuss these and propose alternative methods, e.g. De Maio et al. With deme sizes as low as 2 I would be skeptical of the results here, and for your data I would suggest comparing to other models. Augur traits has some parameters which affect the equilibrium frequencies and transition rate factors of the underlying model, and can output confidence values for different states per node, but again the results will be dependent on sampling. If you supply the output node-data JSON from `augur traits` to `augur export`, the resulting dataset will include these values (see the docs relating to the `nextstrain/ncov` pipeline); auspice displays them by colouring the internal branches of the tree for that trait, and hovering over a branch reveals its confidence values.
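To make "models migration like mutations" and "confidence values per node" a bit more concrete, here is a toy sketch of the idea behind DTA. It is not augur's implementation: it assumes a symmetric k-state model where every facility-to-facility change is equally likely, a single hypothetical rate `mu`, and a uniform prior on the root state, and it runs Felsenstein's pruning algorithm on a made-up three-tip tree to get the probability of each state at the root. (Marginals at other internal nodes would need an extra pre-order pass, which is omitted here.)

```python
import math

# Toy discrete trait analysis (DTA): the trait evolves along branches under a
# continuous-time Markov chain, exactly like a single-site substitution model.
# ASSUMPTIONS (hypothetical, not augur's code): symmetric k-state model, one
# rate `mu` for every state change, uniform prior on the root state.


def transition_prob(i, j, t, k, mu):
    """P(state j at the end of a branch of length t | state i at its start)."""
    decay = math.exp(-k * mu * t / (k - 1))
    if i == j:
        return 1.0 / k + (k - 1) / k * decay
    return 1.0 / k - decay / k


def partial_likelihood(node, states, mu):
    """Felsenstein pruning: {state: P(tip traits below node | node in state)}.

    A tip is a string (its observed trait value); an internal node is a list
    of (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return {s: 1.0 if s == node else 0.0 for s in states}
    k = len(states)
    result = {s: 1.0 for s in states}
    for child, branch_length in node:
        child_lik = partial_likelihood(child, states, mu)
        for s in states:
            # Sum over the child's possible states, weighted by branch transitions.
            result[s] *= sum(
                transition_prob(s, c, branch_length, k, mu) * child_lik[c]
                for c in states
            )
    return result


def root_state_probabilities(tree, states, mu=1.0):
    """Marginal probability of each state at the root (uniform root prior)."""
    lik = partial_likelihood(tree, states, mu)
    total = sum(lik.values())
    return {s: lik[s] / total for s in states}


# Hypothetical tree: a clade of two facilityA tips, plus one "other" tip.
states = ["facilityA", "facilityB", "other"]
tree = [
    ([("facilityA", 0.1), ("facilityA", 0.1)], 0.2),
    ("other", 0.5),
]
root_probs = root_state_probabilities(tree, states)
for state, p in root_probs.items():
    print(f"{state}: {p:.3f}")
```

Because these probabilities come entirely from the tip states and branch lengths, adding or removing a single tip can noticeably shift them, which is one way to see why a deme of 2 sequences gives the model very little to work with.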

Finally, to get the node numbers (created by refine) from an auspice visualisation, you can right-click on the displayed branch and click “Inspect” (the wording may differ slightly depending on the browser). The SVG element will have an id along the lines of `id="branchS_NODE_0000379"`, where `NODE_0000379` is the node name in the intermediate files.
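Once you have a node name from the SVG id, you can look up that node's trait assignment in the node-data JSON written by `augur traits`. The sketch below assumes the id always has the `branchS_` prefix seen in the example id, and that traits was run with its `--confidence` option so that each node carries the assigned state under the column name plus a `<column>_confidence` dict of state probabilities; check both against your own files. The column name `facility` and the JSON values are made-up example data.

```python
import json

# ASSUMPTIONS: the "branchS_" id prefix and the "<column>_confidence" key layout
# are taken on trust from the example above; the data below is hypothetical.
# With a real file you would use: node_data = json.load(open("traits.json"))


def node_name_from_svg_id(svg_id, prefix="branchS_"):
    """Strip the auspice SVG id prefix to recover the tree node name."""
    if not svg_id.startswith(prefix):
        raise ValueError(f"unexpected id format: {svg_id!r}")
    return svg_id[len(prefix):]


def trait_confidence(node_data, node_name, column):
    """Return (assigned state, {state: probability}) for one node."""
    node = node_data["nodes"][node_name]
    return node[column], node.get(f"{column}_confidence", {})


node_data = json.loads("""
{"nodes": {"NODE_0000379": {"facility": "facilityA",
                            "facility_confidence": {"facilityA": 0.82,
                                                    "other": 0.18}}}}
""")

name = node_name_from_svg_id('branchS_NODE_0000379')
state, confidence = trait_confidence(node_data, name, "facility")
print(name, state, confidence)
```

Reading the probabilities straight from this JSON avoids re-running traits many times just to tally how often each node gets each designation.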

Thanks so much, James. I appreciate the information.