I’ve been downloading tree.json files for SARS-CoV-2 from the nextclade_data GitHub repository (e.g. nextclade_data/data/datasets/sars-cov-2/references/MN908947 on the release branch) and parsing the JSON structure into a tree, but those JSON files don’t appear to have branch lengths embedded in them. As an alternative, I tried downloading the Newick files from the SARS-CoV-2 browser (e.g. on auspice), but those Newick trees seem to be missing many of the samples that are in the JSON files (a few random examples: strains USA/MI-CDC-ASC210067726/2021, England/NORW-EAAA4/2020, Netherlands/Limburg_4/2020). I’m just trying to understand why the sample sets differ. Can anyone point me to documentation on this, or let me know whether branch length data are available for the tree.json files? Thanks a lot!
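To pin down exactly which strains differ, one option is to collect the tip names from both files and diff the two sets. Below is a minimal sketch using only the Python standard library. It assumes the tree.json follows the Auspice v2 layout (each node has a "name" and optional "children", with the root under the top-level "tree" key), and it uses a deliberately simplified regex for Newick labels that does not handle quoted labels or labels containing spaces. The function names are hypothetical; for real-world Newick files a proper parser such as Biopython's Bio.Phylo would be safer.

```python
import json
import re
from typing import Iterator

# Hypothetical, simplified pattern: a tip label is whatever follows "(" or ","
# up to the next structural character. Internal-node labels follow ")" and are
# therefore skipped. Quoted labels and labels containing spaces are NOT handled.
_NEWICK_TIP = re.compile(r"[(,]\s*([^\s(),:;]+)")


def iter_tip_names(node: dict) -> Iterator[str]:
    """Yield the name of every leaf under `node` (iterative, so deep trees are fine)."""
    stack = [node]
    while stack:
        current = stack.pop()
        children = current.get("children", [])
        if children:
            stack.extend(children)
        else:
            yield current["name"]


def tips_from_auspice_json(path: str) -> set[str]:
    # Assumes the Auspice v2 layout, with the root node under the top-level "tree" key.
    with open(path) as fh:
        return set(iter_tip_names(json.load(fh)["tree"]))


def tips_from_newick(text: str) -> set[str]:
    """Return the set of tip labels in a Newick string (simplified regex, see above)."""
    return set(_NEWICK_TIP.findall(text))


# Example (file names are placeholders):
# with open("tree.nwk") as fh:
#     only_in_json = tips_from_auspice_json("tree.json") - tips_from_newick(fh.read())
```

The set difference in each direction shows which strains are only in one of the two trees.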
The tree files you’ve been downloading are used for Nextclade’s tree placement and aren’t necessarily associated with the trees shown on https://nextstrain.org/ncov/open. That said, a brief look shows that they do have div properties in the node_attrs object for each node, which should be a cumulative divergence value. (Though I’m not sure offhand of the specifics of how that value is produced for Nextclade’s trees.)
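If div is cumulative, each node’s branch length should simply be its own div minus its parent’s div. Here is a minimal Python sketch of that, assuming the Auspice v2 node layout (div stored as a plain number under node_attrs, children under "children", the root under the top-level "tree" key) and unique node names. branch_lengths and load_tree are hypothetical helpers, not part of any Nextstrain tool.

```python
import json


def branch_lengths(tree: dict) -> dict[str, float]:
    """Map each node name to its branch length: its own div minus its parent's div.

    Assumes node["node_attrs"]["div"] holds the cumulative divergence as a plain
    number. The root gets length 0.0; a non-root node with no "div" inherits its
    parent's value, which gives it a branch length of 0.
    """
    lengths: dict[str, float] = {}
    # Stack of (node, parent_div) pairs; iterative to avoid recursion limits on deep trees.
    stack = [(tree, None)]
    while stack:
        node, parent_div = stack.pop()
        default = 0.0 if parent_div is None else parent_div
        div = node.get("node_attrs", {}).get("div", default)
        lengths[node["name"]] = 0.0 if parent_div is None else div - parent_div
        for child in node.get("children", []):
            stack.append((child, div))
    return lengths


def load_tree(path: str) -> dict:
    # Assumes the root node sits under the top-level "tree" key of the JSON file.
    with open(path) as fh:
        return json.load(fh)["tree"]


# Example (file name is a placeholder):
# lengths = branch_lengths(load_tree("tree.json"))
```

The resulting lengths are in whatever units div uses, which isn’t confirmed here, so it’s worth checking them before combining these trees with trees from other sources.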
Thanks, that’s super helpful. I would never have realised that “div” contained divergence values. For what it’s worth, the div values in those JSON files seem to be integers between 0 and about 47 (in the file of samples up to 2021-07-08).
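For anyone who wants to check the same thing on another dataset, here is a small sketch that walks every node and reports the range of div values and whether they are all whole numbers. It assumes the Auspice v2 node layout (div as a plain number under node_attrs, children under "children"). iter_divs and summarize_divs are hypothetical helper names.

```python
from typing import Iterator


def iter_divs(tree: dict) -> Iterator[float]:
    """Yield node_attrs["div"] for every node that has one (iterative traversal)."""
    stack = [tree]
    while stack:
        node = stack.pop()
        div = node.get("node_attrs", {}).get("div")
        if div is not None:
            yield div
        stack.extend(node.get("children", []))


def summarize_divs(tree: dict) -> tuple[float, float, bool]:
    """Return (min div, max div, whether every div is a whole number)."""
    divs = list(iter_divs(tree))
    if not divs:
        raise ValueError("no node in the tree has a div value")
    all_whole = all(float(d).is_integer() for d in divs)
    return min(divs), max(divs), all_whole


# Example (file name is a placeholder; needs `import json`):
# with open("tree.json") as fh:
#     print(summarize_divs(json.load(fh)["tree"]))
```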