Displaying trees from NCBI Pathogen Browser in auspice.us

NCBI generates phylogenetic trees from sequences submitted to NARMS, PulseNet, and GenomeTrakr, which can be found here: Isolates Browser - Pathogen Detection - NCBI

I’m interested in visualizing these trees using Auspice. They can be downloaded as Newick files, but they don’t display properly when imported into Auspice. I was wondering if you all had time to take a look and see whether the Newick files can be easily converted into a format that will display properly. I’ve attached a small Newick file I pulled from the pathogen browser to this post.
pathogen_browser.nwk (1.2 KB)

There are a few things in that tree that might be confusing the Newick parser, but the first place I would look is the long tip names with commas in them. Commas are used as delimiters in Newick format, so any parser that isn’t respecting the quotes around those names is going to do something weird or panic and die.

Ex: ‘clinical, 2021-06-01, USA, PNUSAS204121, PDT001057153.1’ as a tip name
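If it helps while the parser question gets sorted out, here is a rough Python sketch of one way to convert such a file so that a minimal parser no longer sees commas inside names. It rewrites each single-quoted label as an unquoted one, replacing whitespace and Newick punctuation with underscores. The function name `unquote_labels` is made up for illustration, and the sketch assumes the file uses straight single quotes (with `''` as an escaped quote) and has no quotes inside bracketed comments. Note that distinct names could collapse to the same label, and some strict parsers read underscores in unquoted labels as spaces.

```python
import re

# A single-quoted Newick label; '' inside the quotes is an escaped quote.
QUOTED_LABEL = re.compile(r"'((?:[^']|'')*)'")
# Characters that have special meaning in Newick, plus whitespace.
UNSAFE = re.compile(r"[\s()\[\]':;,]+")


def unquote_labels(newick: str) -> str:
    """Replace quoted labels with unquoted, underscore-separated labels."""

    def _clean(match: re.Match) -> str:
        label = match.group(1).replace("''", "'")  # undo quote escaping
        return UNSAFE.sub("_", label).strip("_")

    return QUOTED_LABEL.sub(_clean, newick)


if __name__ == "__main__":
    tree = (
        "('clinical, 2021-06-01, USA, PNUSAS204121, PDT001057153.1':0.1,"
        "'other, 2020-01-02, USA, X1, PDT2.1':0.2);"
    )
    print(unquote_labels(tree))
```

The branch lengths and tree structure are left untouched; only the text between quotes is rewritten.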

@ewolfsohn When you say you’re importing these into Auspice, are you dragging them onto auspice.us or converting them to Auspice JSONs via Augur?

@Jason’s hunch about parsing is a good guess.

I am dragging them onto auspice.us.

Ah, it’s definitely a Newick parsing issue then. The minimal parser used by auspice.us doesn’t support the single-quoted names allowed by the Newick format and used in those trees from NCBI. We should definitely replace that parser with a better one.

I’ve proposed changes to auspice.us to swap in a better Newick parser. It won’t be on auspice.us until accepted and merged, but you can test it out for now on https://auspice-us-trs-improved-udntd9.herokuapp.com/.

@ewolfsohn Those changes are now deployed to auspice.us, so you should be able to view the NCBI Pathogens trees there.

It’s working great for me! Thank you for looking into it.

Unfortunately, our new Newick parsing introduced a few other bugs and I’ve had to switch back to the old way. This means that the handling of quoted strain names (such as those in the tree in this post) is problematic again. To avoid rendering incorrect trees, I’ve made trees with quotes in them result in an error while we sort this out. Thanks!
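For anyone who wants to check a file before dragging it onto auspice.us, a quick pre-check along these lines might help. This is only a sketch in Python, not the code auspice.us runs; `has_quoted_labels` and `check_newick` are hypothetical names, and the check assumes that any single quote in the file signals a quoted label, which is deliberately conservative.

```python
def has_quoted_labels(newick: str) -> bool:
    """Return True if the Newick text contains any single quote."""
    return "'" in newick


def check_newick(newick: str) -> str:
    """Return the text unchanged, or raise if it contains quoted labels."""
    if has_quoted_labels(newick):
        raise ValueError(
            "Tree contains quoted labels; re-export with simpler tip labels."
        )
    return newick


if __name__ == "__main__":
    print(has_quoted_labels("('a, b':1,c:2);"))
    print(has_quoted_labels("(a:1,c:2);"))
```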

@ewolfsohn You can change the tip labels when you download the Newick file from NCBI. I usually choose the strain only as the label. Whatever field you choose, it also makes it easier to match the metadata file to the tip labels. I’ve attached demo files pulled from NCBI that work on auspice.us.
pdp_metadata.tsv (1.6 MB)
pdp_tree.newick (157.6 KB)
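
To confirm that the tip labels and the metadata actually line up, something like the following Python sketch could be used. It assumes the tree has plain unquoted tip labels and that the first column of the TSV holds the same values used as tip labels; I haven't checked the column layout of the attached files, so treat that as an assumption. The names `tip_labels` and `unmatched_tips` are made up for this example.

```python
import csv
import io
import re

# A tip label directly follows "(" or ","; internal node labels follow ")".
TIP = re.compile(r"[(,]\s*([^\s()\[\]':;,]+)")


def tip_labels(newick: str) -> set:
    """Collect unquoted tip labels from a Newick string."""
    return set(TIP.findall(newick))


def unmatched_tips(newick: str, metadata_tsv: str) -> list:
    """List tip labels with no row in the metadata (first column = label)."""
    reader = csv.reader(io.StringIO(metadata_tsv), delimiter="\t")
    next(reader, None)  # skip the header row
    known = {row[0] for row in reader if row}
    return sorted(tip_labels(newick) - known)


if __name__ == "__main__":
    tree = "((A:0.1,B:0.2):0.05,C:0.3);"
    metadata = "strain\tcountry\nA\tUSA\nB\tUSA\n"
    print(unmatched_tips(tree, metadata))
```

With real files, the two strings would come from reading the downloaded tree and metadata with `open(...).read()`. An empty list means every tip has a metadata row.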

