Hi @eakin
Nextclade dev here.
Happy to see there’s interest in the sort command.
I believe the only thing the sort command needs from the server is the so-called “minimizer index” file. You can download it and provide it using --input-minimizer-index-json (shortcut -m).
$ nextclade sort --help
...
-m, --input-minimizer-index-json <INPUT_MINIMIZER_INDEX_JSON>
Path to input minimizer index JSON file.
By default, the latest reference minimizer index is fetched from the dataset server (default or customized with `--server` argument). If this argument is provided, the algorithm skips fetching the default index and uses the index provided in the JSON file.
Supports the following compression formats: "gz", "bz2", "xz", "zst". Use "-" to read uncompressed data from standard input (stdin).
...
The “minimizer index json” is a large JSON file that maps dataset names to their “minimizers” (hashed sequence fragments).
Release versions of Nextclade simply download it from https://data.clades.nextstrain.org/v3/minimizer_index.json
You can also find the latest (unstable) version of minimizer_index.json in the data repo’s data_output/ directory here, on the master branch. That directory is the current snapshot of the dataset server.
The stable version is in the same place, but on the release branch.
So what you can do is download the minimizer_index.json file on an internet-enabled machine:
curl -fsSLo minimizer_index.json https://data.clades.nextstrain.org/v3/minimizer_index.json
and then provide it to the sort command like this on your offline machine:
nextclade sort -m minimizer_index.json ...
I believe in this case Nextclade CLI won’t make any network requests. If I understood correctly, that’s your goal.
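If it helps, here is a minimal sketch of a sanity check you could run on the downloaded file before copying it to the offline machine. The check_minimizer_index function is a hypothetical helper of mine, not part of Nextclade, and it assumes the index is a JSON document whose top level is an object (starts with "{"). It only catches empty or obviously wrong files (e.g. a saved error page) and does not validate the contents.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nextclade): rough sanity check for a
# downloaded minimizer index file.
check_minimizer_index() {
  index_file="$1"

  # The file must exist and be non-empty; a failed transfer often leaves
  # a missing or 0-byte file behind.
  if [ ! -s "$index_file" ]; then
    echo "missing or empty: $index_file" >&2
    return 1
  fi

  # Assumption: the top level of the index is a JSON object, so the first
  # non-whitespace character should be "{".
  first_char=$(tr -d ' \t\r\n' < "$index_file" | head -c 1)
  if [ "$first_char" != "{" ]; then
    echo "does not look like a JSON object: $index_file" >&2
    return 1
  fi

  echo "ok: $index_file"
}

# Example (on the offline machine, after copying the file over):
#   check_minimizer_index minimizer_index.json && nextclade sort -m minimizer_index.json ...
```

If the check fails, re-download the file before moving it across.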
For advanced use-cases, e.g. if you have your own datasets and references and you want to be able to detect/sort based on them, you could generate your own minimizer index file and use it offline or on your own dataset server. In our data repo the minimizer index is prepared as a part of the rebuild script here. And the prototype of the sorting/detection algo is here - it’s the same thing the Nextclade CLI does in Rust, but rewritten in Python (for dev/testing/prototyping purposes).
If you already have datasets organized in a directory structure similar to the nextclade_data repo, then you could run:
./scripts/rebuild --input-dir 'data/' --output-dir 'data_output/' --no-pull
This should produce data_output/minimizer_index.json for the sort command, as well as data_output/index.json for the dataset list and dataset get commands, and everything else required for a custom Nextclade dataset server to be operational.
Feel free to give the -m parameter a try and let me know if it works for you and whether you have ideas on how to improve things.