Hi, I’m trying to add a SARS-CoV-2 Newick tree (all data up to mid-2021; it could be either the GISAID or the Open dataset) to a bioinformatics pipeline. I can get the Newick file by clicking “Download Data” on the viewing page, but ideally I would like to automate this, or at least encode a download URL into my script for reproducibility. Is there a way to get a Newick tree programmatically like this? Or is this restriction imposed to force the user to go through the click-through agreement for data reuse?
You can download our Nextstrain dataset JSON files containing the trees for each region in the Open datasets at URLs like:
https://data.nextstrain.org/files/ncov/open/global/global.json
https://data.nextstrain.org/files/ncov/open/oceania/oceania.json
but note that (for historical reasons) this corresponds to the 6 month subsampled dataset for each region:
https://nextstrain.org/ncov/open/global/6m
https://nextstrain.org/ncov/open/oceania/6m
and not the other timespans (i.e. 1m, 2m, all-time).
Please do note these are all subsampled data and not “all data up to mid 2021”.
Other SARS-CoV-2 data files available are described at Remote inputs — SARS-CoV-2 Workflow documentation.
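If it helps with scripting this, here is a minimal Python sketch of downloading one of those JSON files and turning its tree into a Newick string. It rests on some assumptions rather than on anything official: that the file follows the usual Auspice v2 layout (a top-level "tree" object whose nodes have "name", optional "children", and a cumulative divergence in node_attrs["div"]), and that branch lengths in divergence units are what you want. The function names fetch_dataset and to_newick are made up for this example, and names are written unquoted, so strain names containing spaces or parentheses would need extra handling.

```python
import gzip
import json
import urllib.request


def fetch_dataset(url):
    """Download a dataset JSON and return it as a dict."""
    with urllib.request.urlopen(url) as response:
        raw = response.read()
    # The body may arrive gzip-compressed; detect that by its magic bytes.
    if raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    return json.loads(raw)


def to_newick(root):
    """Convert an Auspice-style tree node and its descendants to Newick.

    Assumes node_attrs["div"] is cumulative divergence from the root, so a
    branch length is the child's div minus its parent's div. Nodes lacking
    "div" (and the root itself) are written without a branch length.
    Iterative rather than recursive, to avoid hitting Python's recursion
    limit on deep trees.
    """
    pieces = {}  # id(node) -> finished Newick text for that subtree
    stack = [(root, None, False)]
    while stack:
        node, parent_div, visited = stack.pop()
        div = node.get("node_attrs", {}).get("div")
        children = node.get("children", [])
        if children and not visited:
            # Revisit this node once all of its children are finished.
            stack.append((node, parent_div, True))
            for child in children:
                stack.append((child, div, False))
            continue
        text = node.get("name", "")
        if children:
            inner = ",".join(pieces.pop(id(c)) for c in children)
            text = "(" + inner + ")" + text
        if div is not None and parent_div is not None:
            text += ":" + format(div - parent_div, ".10g")
        pieces[id(node)] = text
    return pieces[id(root)] + ";"


if __name__ == "__main__":
    url = "https://data.nextstrain.org/files/ncov/open/global/global.json"
    dataset = fetch_dataset(url)
    # Assumes "tree" holds a single root node rather than a list of subtrees.
    with open("global.nwk", "w") as handle:
        handle.write(to_newick(dataset["tree"]) + "\n")
```

Since the file at that URL is updated over time, saving the downloaded JSON alongside your results (or recording its download date and checksum) is probably the safer route to reproducibility than relying on the URL alone.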
Thanks for your fast reply, and for the links. As you may have gathered, for my use-case it is the all-time data I’m particularly interested in, sadly, so I will have to do some more digging. But it’s very useful to know that these are the available links - much appreciated.
Take a look at https://cov2tree.org/ and its source data files in http://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/UShER_SARS-CoV-2/. The information icon at the top right of cov2tree.org has an additional description of the tree and methods, plus links to other data.
Thanks. UShER is a good bet for huge trees. At the moment I’m sticking to Nextstrain comparisons, but I’m sure we’ll move on to UShER at some point soon.