How to deal with MCC BEAST Tree uncertain data?

Hello,

I am in charge of submitting a dataset to the nextstrain pipeline.
After following the different tutorials on nextstrain.org, I attempted to import a BEAST MCC tree into the nextstrain pipeline. To do this, I first ran augur import beast as follows :arrow_down:

augur import beast --mcc ./data/WA261_mcct --output-tree OUTPUT_TREE --output-node-data OUTPUT_NODE_DATA --tip-date-regex "[0-9]{4}"

The tip labels contain the dates (in years). Please find below the output of this command :arrow_down:

Success parsing BEAST nexus
Parsed BEAST traits:
name n(internal) n(terminal)
location2_median 260 0
location1_median 260 0
length 260 261
posterior 260 0
height_median 260 261
location1 260 261
location2 260 261
height 260 261
location1_80%HPD_1 260 0
location2_80%HPD_1 260 0
height_confidence 260 261
location.rate_median 259 261
location.rate 259 261
rate 259 261
length_median 259 261
rate_median 259 261
length_confidence 259 261
rate_confidence 259 261
location.rate_confidence 259 261
location1_80%HPD_2 80 0
location2_80%HPD_2 80 0
location1_80%HPD_3 33 0
location2_80%HPD_3 33 0
location1_80%HPD_4 13 0
location1_80%HPD_5 6 0
location2_80%HPD_4 13 0
location2_80%HPD_5 6 0
location1_80%HPD_6 4 0
location2_80%HPD_6 4 0
location1_80%HPD_7 3 0
location2_80%HPD_7 3 0
location1_80%HPD_9 1 0
location1_80%HPD_8 1 0
location1_80%HPD_11 1 0
location1_80%HPD_10 1 0
location2_80%HPD_10 1 0
location2_80%HPD_11 1 0
location2_80%HPD_8 1 0
location2_80%HPD_9 1 0

Inferred 2018.00 as the most recent tip date
Tree root is 122.49 years prior to most recent tip
Temporal phylogeny spans 1895.51 - 2018.00


Successfully parsed BEAST MCC tree ./data/WA261_mcct
Files produced:
OUTPUT_TREE
OUTPUT_NODE_DATA

I then ran the following command :arrow_down:

augur export v2 --tree OUTPUT_TREE2 --node-data OUTPUT_NODE_DATA2 --output auspice/2ndtrymcctree.json

However, this leads to an error :arrow_down:

Traceback (most recent call last):
  File "/usr/local/bin/augur", line 33, in <module>
    sys.exit(load_entry_point('nextstrain-augur', 'console_scripts', 'augur')())
  File "/nextstrain/augur/augur/__main__.py", line 10, in main
    return augur.run( argv[1:] )
  File "/nextstrain/augur/augur/__init__.py", line 75, in run
    return args.command.run(args)
  File "/nextstrain/augur/augur/export.py", line 22, in run
    return run_v2(args)
  File "/nextstrain/augur/augur/export_v2.py", line 1024, in run_v2
    node_attrs=node_attrs
  File "/nextstrain/augur/augur/export_v2.py", line 423, in set_colorings
    colorings = [c for c in colorings if _is_valid(c)]
  File "/nextstrain/augur/augur/export_v2.py", line 423, in <listcomp>
    colorings = [c for c in colorings if _is_valid(c)]
  File "/nextstrain/augur/augur/export_v2.py", line 370, in _is_valid
    trait_values = get_values_across_nodes(node_attrs, key) # e.g. list of countries, regions etc
  File "/nextstrain/augur/augur/export_v2.py", line 183, in get_values_across_nodes
    vals.add(data.get(key))
TypeError: unhashable type: 'list'

I assume that augur does not handle cases where uncertainty around
ancestral locations is represented as a list of polygons (refer to “location2_80%HPD_1” below):
:arrow_down:

One solution that was suggested to me is to post-process the node-data JSON and discard the uncertainty data. This works: afterwards I was able to run augur export v2 without triggering the error above.
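For anyone wanting to try the same workaround, here is a minimal sketch of that post-processing step. It assumes the node-data JSON keeps its per-node traits under a top-level "nodes" key, and it keeps any key ending in _confidence or _entropy untouched. The function names (strip_list_traits, clean_file) and the example values are made up for illustration; this is not part of augur.

```python
import json

# Suffixes assumed to mark companion data for a trait rather than a trait itself.
COMPANION_SUFFIXES = ("_confidence", "_entropy")


def strip_list_traits(node_data):
    """Return a copy of node_data without list- or dict-valued traits.

    Keys ending in a companion suffix are kept whatever their value type.
    """
    cleaned_nodes = {}
    for node_name, traits in node_data.get("nodes", {}).items():
        cleaned_nodes[node_name] = {
            key: value
            for key, value in traits.items()
            if key.endswith(COMPANION_SUFFIXES)
            or not isinstance(value, (list, dict))
        }
    return {**node_data, "nodes": cleaned_nodes}


def clean_file(in_path, out_path):
    """Read a node-data JSON file, strip list traits, write the result."""
    with open(in_path) as handle:
        node_data = json.load(handle)
    with open(out_path, "w") as handle:
        json.dump(strip_list_traits(node_data), handle, indent=2)


# Illustrative data only: the node name and numbers are invented.
example = {
    "nodes": {
        "NODE_0000001": {
            "location2": -5.77,
            "height": 12.3,
            "height_confidence": [11.9, 12.8],
            "location2_80%HPD_1": [-5.9, -5.8, -5.6],
        }
    }
}
cleaned = strip_list_traits(example)
print(sorted(cleaned["nodes"]["NODE_0000001"]))
# ['height', 'height_confidence', 'location2']
```

On real files this would be called as something like clean_file("OUTPUT_NODE_DATA", "OUTPUT_NODE_DATA2") before running augur export v2 on the cleaned file.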

However, I think it would be possible to integrate such a feature into the augur code, either in augur import beast or in augur export, so that these problematic values are ignored.

Please let me know if you have any ideas for fixing this issue!

Hey @kelianP – thanks for the detailed report, really helpful.

I assume that augur does not handle cases where uncertainty around
ancestral locations is represented as a list of polygons (refer to “location2_80%HPD_1” below)

Yeah, the issue is that node-data files (such as the output from augur import beast) define trait-value pairs on each node, such as location2 = -5.77..., which eventually become colorings in the auspice visualisation (e.g. you can colour the tree by “location2” and see the numeric values across the tree). There are two other allowed formats of data here: <trait_key>_confidence and <trait_key>_entropy, which store data as an array (of length 2) and a dictionary, respectively. These are used to convey extra information about <trait_key> rather than being their own colouring.
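To make those three shapes concrete, here is a rough sketch that sorts the entries of a single node into "coloring", "confidence", "entropy" or "unsupported", following the description above. The classify_trait helper and the example values are hypothetical and only illustrate the rule; augur's own checks may differ.

```python
def classify_trait(key, value):
    """Classify one trait-value pair according to the formats described above."""
    if key.endswith("_confidence"):
        # Described above as an array of length 2.
        if isinstance(value, (list, tuple)) and len(value) == 2:
            return "confidence"
        return "unsupported"
    if key.endswith("_entropy"):
        # Described above as a dictionary.
        return "entropy" if isinstance(value, dict) else "unsupported"
    if isinstance(value, (str, int, float)):
        # A plain scalar can become a coloring.
        return "coloring"
    return "unsupported"


# Illustrative node: the numbers are invented apart from the -5.77 quoted above.
node = {
    "location2": -5.77,
    "height": 12.3,
    "height_confidence": [11.9, 12.8],
    "location2_80%HPD_1": [-5.9, -5.8, -5.6],
}

kinds = {key: classify_trait(key, value) for key, value in node.items()}
for key, kind in kinds.items():
    print(f"{key}: {kind}")
```

Here only location2_80%HPD_1 comes out as "unsupported": it is a list, but its key does not end in _confidence.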

Here we have a trait-value pair location2_80%HPD_1 = <list> which augur export v2 doesn’t know what to do with. Is there a way these data should (could) be visualised in auspice? The simplest fix would be to stop such data being exported by augur import beast (src code if you want to have a go!).
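The TypeError itself is easy to reproduce outside augur. The sketch below is a simplified stand-in for the step in the traceback that gathers a trait's values into a set; collect_values and the node names are made up, and this is not augur's actual code.

```python
def collect_values(node_attrs, key):
    """Gather the distinct values of one trait across all nodes."""
    vals = set()
    for data in node_attrs.values():
        if key in data:
            vals.add(data[key])  # set.add needs a hashable value; a list is not
    return vals


# Illustrative node attributes with invented values.
node_attrs = {
    "NODE_A": {"location2": -5.77, "location2_80%HPD_1": [-5.9, -5.6]},
    "NODE_B": {"location2": -5.12},
}

# Scalar values are fine.
print(collect_values(node_attrs, "location2"))

# A list-valued trait triggers the same error as in the report.
try:
    collect_values(node_attrs, "location2_80%HPD_1")
    error_message = None
except TypeError as err:
    error_message = str(err)
print(error_message)  # unhashable type: 'list'
```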


Hey @james thanks a lot for your answer. I am a beginner in programming but I will see what I can do to ignore these values in import beast.

PS : I’m new to the nextstrain community but it’s very nice to see such mutual help! :wink:
