Iterative use of nextstrain or parameters tuning

qwerty123 · July 23, 2021, 7:41pm

Dear all,

I am facing the following problem:
Building the tree takes too long because of the amount of data I have. It currently takes a week to compute, and the dataset grows every week. (I already have 20k samples after subsampling, and soon it will be about 40-50k.)

(I am still using the default parameters for Nextstrain. In other words, I haven’t changed anything in main_workflow.smk)

Is it possible to somehow optimise (or tune) some of the parameters in one of the .smk files to speed up the calculations?

Or maybe I can build a new tree not from scratch, but based on the previous one?
For example, I have a tree that is already built, and then I get new data. Can I build a new tree that includes the new data, starting from the tree I already have?

rneher · July 26, 2021, 8:27pm

Thanks for reaching out.

Unfortunately, neither the processing pipeline (augur/treetime) nor our visualization is well suited to handling more than 10-20k sequences. There are a few ways to speed things up: adjusting the tree-building parameters, skipping the confidence calculation for the timetree, or skipping the timetree calculation altogether. But to give specific advice, we would need to know more about your use case.
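
As a rough illustration of those options (a sketch, not the workflow's actual rules), the snippet below only assembles `augur tree` and `augur refine` command lines and does not run them. The helper names and all file names are hypothetical. The flags (`--method`, `--nthreads`, `--timetree`, `--date-confidence`) exist in augur, but whether switching the tree builder to FastTree or dropping the confidence step gives a useful speed-up on a particular dataset is an assumption to check against your own runs.

```python
from typing import List


def tree_command(alignment: str, output: str,
                 fast: bool = True, threads: int = 4) -> List[str]:
    """Assemble an `augur tree` command line (hypothetical helper, nothing is executed)."""
    cmd = ["augur", "tree",
           "--alignment", alignment,
           "--output", output,
           "--nthreads", str(threads)]
    if fast:
        # Assumption: an approximate builder (FastTree) is an acceptable
        # trade of accuracy for speed on a very large alignment.
        cmd += ["--method", "fasttree"]
    return cmd


def refine_command(tree: str, alignment: str, metadata: str,
                   output_tree: str, output_node_data: str,
                   timetree: bool = True,
                   date_confidence: bool = False) -> List[str]:
    """Assemble an `augur refine` command line (hypothetical helper, nothing is executed)."""
    cmd = ["augur", "refine",
           "--tree", tree,
           "--alignment", alignment,
           "--metadata", metadata,
           "--output-tree", output_tree,
           "--output-node-data", output_node_data]
    if timetree:
        cmd.append("--timetree")
        if date_confidence:
            # Confidence intervals on node dates are only meaningful
            # when a timetree is actually inferred.
            cmd.append("--date-confidence")
    return cmd


if __name__ == "__main__":
    # Placeholder file names, chosen only for this example.
    print(" ".join(tree_command("aligned.fasta", "tree_raw.nwk", threads=8)))
    print(" ".join(refine_command("tree_raw.nwk", "aligned.fasta", "metadata.tsv",
                                  "tree.nwk", "branch_lengths.json")))
```

With `timetree=False` the sketch drops both `--timetree` and `--date-confidence`, which corresponds to skipping the timetree calculation altogether. With the default arguments it keeps the timetree but skips the confidence calculation.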
