Digging into the ½ million sequences with JavaScript + filtering + dynamic Nextstrain tree

Hi, I made a demo showing that, once the SNPs of the 500,000 sequences have been extracted, it is possible to run a filtering script in a web page that selects a few thousand sequences and then adds them to a small precomputed Nextstrain (divergence) tree.

Obviously the tree I am generating is NOT a real maximum likelihood phylogenetic tree. The algorithm that adds the sequences to the tree is the simplest and fastest one you can imagine: for each sequence, if a mutation on a branch is not in the sequence, skip the whole subtree (so reversions and missing data generate some duplicated parasite lineages).
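To make the placement rule concrete, here is a minimal sketch of how I would write it down, not the exact code of the demo. The node shape ({ name, mutations, children }), the function name placeSequence and the mutation labels are hypothetical. I also assume the new leaf is attached under the deepest node whose whole path is compatible with the sequence.

```javascript
// Hypothetical node shape: { name, mutations: [...], children: [...] },
// where `mutations` lists the mutations on the branch leading to the node.
function placeSequence(root, name, seqMutations) {
  const seq = new Set(seqMutations);
  let best = { node: root, path: [] };

  function visit(node, path) {
    // Keep the node that explains the most mutations so far.
    if (path.length > best.path.length) best = { node, path };
    for (const child of node.children || []) {
      // Skip the whole subtree as soon as one branch mutation is missing
      // from the sequence (reversion or missing data ends up here too).
      if (child.mutations.every((m) => seq.has(m))) {
        visit(child, path.concat(child.mutations));
      }
    }
  }
  visit(root, []);

  // The new branch carries only the mutations not already on the path.
  const onPath = new Set(best.path);
  const leaf = {
    name,
    mutations: seqMutations.filter((m) => !onPath.has(m)),
    children: [],
  };
  best.node.children = best.node.children || [];
  best.node.children.push(leaf);
  return leaf;
}

// Illustrative tree and sequence (labels are placeholders).
const demoRoot = {
  name: "root",
  mutations: [],
  children: [{ name: "A", mutations: ["C241T"], children: [] }],
};
placeSequence(demoRoot, "sample1", ["C241T", "T100C"]);
```

A sequence that has lost C241T (reversion or missing data) would be skipped at branch A and attached at the root with its remaining mutations, which is where the duplicated lineages come from. Since new leaves are pushed into the same tree, later sequences can also land under them.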

The main difficulties I am encountering:

  • optimizing everything to make it faster,

  • finding an algorithm to generate a dynamic timetree,

  • the map works but then gets buggy. Auspice is a bit hard to customize; all I did was add

    if (window.alreadyaDatasetInMemory) datasetJson = window.alreadyaDatasetInMemory;

    in loadData.js/fetchDataAndDispatch to force auspice to use my dynamically created tree.
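For the auspice hook above, the in-memory object has to look like a dataset JSON. Below is a rough sketch of turning a simple tree into that shape, counting divergence as the number of mutations from the root. The input node shape and the function names toAuspiceNode and buildDataset are hypothetical. The output field names (version, meta, tree, node_attrs.div, branch_attrs.mutations.nuc) follow my understanding of the Auspice v2 dataset JSON and should be checked against the schema; a real dataset will likely need more meta fields.

```javascript
// Input (hypothetical): { name, mutations: [...], children: [...] }.
// Output: a node shaped like an Auspice v2 tree node (assumed field names).
function toAuspiceNode(node, parentDiv = 0) {
  // Divergence here is simply the cumulative count of mutations.
  const div = parentDiv + node.mutations.length;
  const out = {
    name: node.name,
    node_attrs: { div },
    branch_attrs: { mutations: { nuc: node.mutations } },
  };
  if (node.children && node.children.length > 0) {
    out.children = node.children.map((child) => toAuspiceNode(child, div));
  }
  return out;
}

function buildDataset(root) {
  return {
    version: "v2",
    meta: {
      title: "Dynamic tree (sketch)",
      updated: new Date().toISOString().slice(0, 10),
      panels: ["tree"],
    },
    tree: toAuspiceNode(root),
  };
}

// In the browser, before auspice loads its data:
// window.alreadyaDatasetInMemory = buildDataset(myTree);
```

Counting mutations gives a divergence tree only; it says nothing about dates, which is why the timetree remains an open problem for me.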

Nextclade (the ‘show tree’ button) implements something closely related, though its algorithm for building the dynamic tree seems to try every node to find the closest one, and it does not use the dynamically created nodes.