NCBI generates phylogenetic trees from sequences submitted to NARMS, PulseNet, and GenomeTrakr, which can be found here: Isolates Browser - Pathogen Detection - NCBI
I’m interested in visualizing these trees using Auspice. They can be downloaded as Newick files, but they don’t display properly when imported into Auspice. I was wondering if you all had time to take a look and see if the Newick files can be easily converted into a format that will display properly. I’ve attached a small Newick file I pulled from the pathogen browser to this post.
pathogen_browser.nwk (1.2 KB)
A few things in that tree might be confusing the Newick parser, but the first place I would look is the long tip names with commas in them. Commas are used as delimiters in Newick format, so any parser that isn’t respecting the quotes around those names is going to do something weird or panic and die.
Ex: ‘clinical, 2021-06-01, USA, PNUSAS204121, PDT001057153.1’ as a tip name
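To illustrate the point about commas, here is a minimal Python sketch comparing a naive comma split with a quote-aware one. It is not the parser used by any particular tool. The sibling list, the `tipB` name, and the branch lengths are made up for illustration, and the helper only handles a flat list of siblings (no nested parentheses).

```python
def split_naive(siblings):
    """Split on every comma, ignoring quotes (what a minimal parser might do)."""
    return siblings.split(",")


def split_quote_aware(siblings):
    """Split only on commas that fall outside single-quoted labels."""
    parts, current, in_quote = [], [], False
    i = 0
    while i < len(siblings):
        ch = siblings[i]
        if ch == "'":
            # Inside a quoted label, a doubled quote ('') is an escaped quote.
            if in_quote and i + 1 < len(siblings) and siblings[i + 1] == "'":
                current.append("''")
                i += 2
                continue
            in_quote = not in_quote
            current.append(ch)
        elif ch == "," and not in_quote:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    parts.append("".join(current))
    return parts


# Hypothetical sibling list: one quoted NCBI-style tip name plus an invented tip.
siblings = "'clinical, 2021-06-01, USA, PNUSAS204121, PDT001057153.1':0.1,tipB:0.2"

print(len(split_naive(siblings)))        # the quoted name is broken into pieces
print(len(split_quote_aware(siblings)))  # the quoted name stays intact
```

The naive version treats each comma inside the quoted name as a node boundary, which is the kind of behavior that would garble the tree.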
@ewolfsohn When you say you’re importing these into Auspice, are you dragging them onto auspice.us or converting them to Auspice JSONs via Augur?
@Jason’s hunch about parsing is a good guess.
I am dragging them onto auspice.us.
Ah, it’s definitely a Newick parsing issue then. The minimal parser used by auspice.us doesn’t support the single-quoted names allowed by the Newick format and used in those trees from NCBI. We should definitely replace that parser with a better one.
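For any parser that lacks support for quoted names, one possible workaround is to rewrite the quoted labels as unquoted ones before loading the file. The Python sketch below does this with regular expressions. The function name and the example tree are hypothetical, and it assumes the file contains no square-bracket comments with apostrophes in them. Note that it changes the tip names, so any metadata keyed on the original names would need the same transformation.

```python
import re

# A single-quoted Newick label; '' inside the quotes is an escaped quote.
QUOTED_LABEL = re.compile(r"'((?:[^']|'')*)'")
# Characters that carry meaning in Newick or are unsafe in unquoted labels.
UNSAFE = re.compile(r"[\s,;:()\[\]']+")


def unquote_labels(newick):
    """Replace each quoted label with an unquoted, underscore-joined version."""
    def repl(match):
        label = match.group(1).replace("''", "'")
        return UNSAFE.sub("_", label).strip("_")
    return QUOTED_LABEL.sub(repl, newick)


# Hypothetical two-tip tree using the tip name quoted in this thread.
tree = "('clinical, 2021-06-01, USA, PNUSAS204121, PDT001057153.1':0.1,tipB:0.2);"
print(unquote_labels(tree))
# (clinical_2021-06-01_USA_PNUSAS204121_PDT001057153.1:0.1,tipB:0.2);
```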
I’ve proposed changes to auspice.us to swap in a better Newick parser. It won’t be on auspice.us until accepted and merged, but you can test it out for now on https://auspice-us-trs-improved-udntd9.herokuapp.com/.
@ewolfsohn Those changes are now deployed to auspice.us, so you should be able to view the NCBI Pathogens trees there.
It’s working great for me! Thank you for looking into it.