Hi @Achattwood
The complexity of it depends on where exactly you want to add these sequences. Please give some more information, perhaps screenshots.
I’ll answer the questions literally for now:
Is there a way to recover previous reference dataset versions in Nextclade?
There are several ways to do this, but I think none of them is very convenient currently. The easiest is probably to pre-configure Nextclade Web to load a dataset by its “path” and “tag” like this:
https://clades.nextstrain.org/?dataset-name=nextstrain/rsv/a/EPI_ISL_412866&dataset-tag=2025-09-09--12-13-13Z
However, the dataset-tag= parameter currently seems to be buggy and causes a crash for all tags except the latest. I’ll investigate and try to fix it.
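For reference, a URL of this shape can also be assembled programmatically. This is a minimal Python sketch that assumes only the two query parameters shown in the example URL (dataset-name and dataset-tag); the function name is hypothetical. Note that tags are written with a double hyphen between the date and the time, as in the nextclade_data release tags.

```python
from urllib.parse import urlencode

NEXTCLADE_WEB = "https://clades.nextstrain.org/"


def pinned_dataset_url(dataset_name: str, dataset_tag: str) -> str:
    """Build a Nextclade Web URL that preselects a dataset path and a tag.

    Hypothetical helper: only the parameter names come from the example URL.
    """
    query = urlencode(
        {"dataset-name": dataset_name, "dataset-tag": dataset_tag},
        safe="/",  # keep the slashes of the dataset path readable
    )
    return f"{NEXTCLADE_WEB}?{query}"


url = pinned_dataset_url("nextstrain/rsv/a/EPI_ISL_412866", "2025-09-09--12-13-13Z")
print(url)
```

Until the bug is fixed, a URL built this way is only expected to work for the latest tag.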
Once it’s working, if you know a tag, you can roll back to it. All tags are listed on the “Releases” page of the nextclade_data GitHub repo: https://github.com/nextstrain/nextclade_data/releases?q="rsv"&expanded=true or in the data_output/index.json file: https://github.com/nextstrain/nextclade_data/blob/1e827a5c13f27553e5cd1b7b2d7c3f3c96e2c2fc/data_output/index.json#L1977-L2035
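If you would rather extract the tags from index.json than read them off the page, something like the following Python sketch could work. The field names used here (collections, datasets, path, versions, tag) are assumptions about the layout of the index and should be checked against the linked file; the sample data is a hand-written stand-in and the second tag is a placeholder, not a real release.

```python
def tags_for_dataset(index: dict, dataset_path: str) -> list[str]:
    """Collect the version tags listed for one dataset path.

    Assumes an index shaped like:
    {"collections": [{"datasets": [{"path": ..., "versions": [{"tag": ...}]}]}]}
    """
    tags = []
    for collection in index.get("collections", []):
        for dataset in collection.get("datasets", []):
            if dataset.get("path") == dataset_path:
                tags += [v["tag"] for v in dataset.get("versions", []) if "tag" in v]
    return tags


# Hand-written stand-in for a downloaded index.json (assumed layout).
sample_index = {
    "collections": [
        {
            "datasets": [
                {
                    "path": "nextstrain/rsv/a/EPI_ISL_412866",
                    "versions": [
                        {"tag": "2025-09-09--12-13-13Z"},
                        {"tag": "PLACEHOLDER-OLDER-TAG"},
                    ],
                }
            ]
        }
    ]
}

print(tags_for_dataset(sample_index, "nextstrain/rsv/a/EPI_ISL_412866"))
```

With a real copy of the file, the dictionary would come from json.load() instead of being written by hand.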
Similarly, if you use Nextclade CLI, you can pass --tag to the nextclade dataset get command when fetching dataset files.
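As a rough illustration, here is how such a command could be put together from a Python script. The --tag flag is the one mentioned above; --name and --output-dir are the usual companions of nextclade dataset get, but check nextclade dataset get --help for your CLI version. The helper name and the output directory are hypothetical.

```python
import shlex


def dataset_get_command(name: str, tag: str, output_dir: str) -> list[str]:
    """Build the argument list for fetching one pinned dataset version."""
    return [
        "nextclade", "dataset", "get",
        "--name", name,              # dataset path
        "--tag", tag,                # version to pin
        "--output-dir", output_dir,  # hypothetical local directory
    ]


cmd = dataset_get_command(
    "nextstrain/rsv/a/EPI_ISL_412866",
    "2025-09-09--12-13-13Z",
    "rsv_a_dataset",
)
# Print the command for copy-pasting into a terminal.
print(shlex.join(cmd))
```

The sketch only prints the command; to run it from Python, the same list can be passed to subprocess.run(cmd, check=True) on a machine where Nextclade CLI is installed.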
If you want to inspect dataset files, the “Releases” page contains zip archives of the datasets for each tag. Alternatively, you can find all versions of all datasets in the data_output/ directory of the repo.
More complicated ways, but with more control:
If you want to use a single custom dataset or an entire dataset server, you need to host these datasets somewhere: on your local computer, on a local network or on the internet. GitHub also works. See the docs.
Relevant docs:
Can I manually add the three lost sequences back to the reference dataset?
I am not sure where you want to add them exactly.
I haven’t checked the datasets, sorry. If you want to add sequences to the input fasta (the sequences being analyzed), you could drag and drop multiple files in Web, append them to an existing fasta file in the existing dataset (see FASTA format), use the ?input-fasta URL parameter in Web (multiple occurrences are allowed), or provide multiple positional arguments in CLI.
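To illustrate the ?input-fasta option, here is a small Python sketch that builds a Web URL with the parameter repeated once per file. It assumes each value is a URL from which the fasta file can be downloaded; the example.org addresses and the function name are placeholders.

```python
from urllib.parse import urlencode

NEXTCLADE_WEB = "https://clades.nextstrain.org/"


def web_url_with_inputs(dataset_name: str, fasta_urls: list[str]) -> str:
    """Build a Nextclade Web URL with one input-fasta parameter per file."""
    params = [("dataset-name", dataset_name)]
    params += [("input-fasta", fasta_url) for fasta_url in fasta_urls]
    # A list of pairs lets the same key appear several times in the query.
    return NEXTCLADE_WEB + "?" + urlencode(params, safe="/:")


inputs_url = web_url_with_inputs(
    "nextstrain/rsv/a/EPI_ISL_412866",
    [
        "https://example.org/my_sequences.fasta",      # placeholder
        "https://example.org/three_extra_seqs.fasta",  # placeholder
    ],
)
print(inputs_url)
```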
If you are talking about samples on the reference tree, then you’d have to re-build the tree, as explained in the “Dataset creation guide”.
If I rewrite the lesson plan, will a future update erase my efforts?
Yes, likely. Regarding example input sequences: these are chosen fairly arbitrarily by dataset authors, based on their own considerations. Usually, their only purpose is to showcase the various features of the dataset. The examples are not significant for Nextclade’s main use case, because most users come with their own data to be analyzed. If we are talking about input sequences, then you can drag and drop any fasta file, with any sequences you like.
Regarding reference trees: the subsampling step during tree construction is non-deterministic, so it’s hard to predict which sequences end up on a tree. There are some toggles if you are willing to build your own tree, but it’s quite involved technically.
For reproducibility, you’d want to “freeze” the dataset version used: either by using a concrete tag, or by using your custom dataset or at least custom input sequences (if that’s the only requirement).
You’d also want to freeze the software version. We keep updating the software, fixing bugs and adding new features. This could cause unintended consequences, which are hard to predict on our side, though we try to avoid any obvious breakage for users. Currently you cannot select the version of the Nextclade Web software; it’s always the latest. You can host your own copy of Nextclade Web though, since the software is open-source. And you can download any version of Nextclade CLI if that’s what you use.
Let me know if it helps at all or how I can assist further.
P.S. We might need to consider adding some more convenient ways to select dataset versions in Web. So far the assumption was that most Web users would always want to use the latest and greatest, with the ability to “freeze” the version for reproducibility being an “advanced” feature of CLI.