--configfile ./my_profiles/fl_delta/builds.yaml
Building DAG of jobs…
CreateCondaEnvironmentException:
The 'conda' command is not available in the shell /bin/bash that will be used by Snakemake. You have to ensure that it is in your PATH, e.g., first activating the conda base environment with `conda activate base`.
File "/usr/local/lib/python3.7/site-packages/snakemake/deployment/conda.py", line 232, in create
File "/usr/local/lib/python3.7/site-packages/snakemake/deployment/conda.py", line 343, in __new__
File "/usr/local/lib/python3.7/site-packages/snakemake/deployment/conda.py", line 356, in __init__
File "/usr/local/lib/python3.7/site-packages/snakemake/deployment/conda.py", line 396, in _check
I’ve reinstalled, but still get the same issue. I’m sure the issue is on my end, but if anyone could give some guidance, I would appreciate it.
Hi @gvestal, the current Nextstrain container doesn’t support conda, so I think this can be fixed by dropping the --use-conda argument (which is being passed to Snakemake running in the container).
nextstrain build --docker . --cores 4 --configfile ./my_profiles/fl_delta/builds.yaml
# P.S. the --docker argument isn't needed if docker is the default environment,
# run `nextstrain check-setup` to see the default
If you want to use conda to manage dependencies, you can also run Nextstrain “natively” (i.e., not within a container) via:
# ensure conda & snakemake are available in the current environment
nextstrain build --native . --cores 4 --use-conda --configfile ./my_profiles/fl_delta/builds.yaml
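Before running natively, it can help to confirm that the tools are actually visible to the shell, since that is what the error above complains about. The following is only a rough sketch: `check_tool` is a hypothetical helper name, not part of Nextstrain or Snakemake, and it only tests whether a command is on PATH, not whether it works.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Nextstrain): report whether a command
# is on PATH for the current shell. It always returns 0 and only prints.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

# Tools needed for a native build that uses --use-conda.
for tool in conda snakemake nextstrain; do
  check_tool "$tool"
done
```

If `conda` is reported as missing, activating the base environment first (as the error message suggests) is the likely fix.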
Thank you for the clarification. I managed to get a build done, but I had a follow-up question about subsampling. Below is the subsampling scheme we used for the FL Delta build. We uploaded our sequences to UShER, found the nearest neighbor sequences, combined both sequence sets, and are attempting to build a tree to determine the transmission of Delta into FL. The subsampling scheme worked, but is there an optimal way to create a subsampling scheme for this? Or, as I suspect, is Nextstrain really not ideal for building that? Any feedback would be appreciated!
We hope that Nextstrain can be used for workflows such as these, and are excited to hear about your plan.
We uploaded our sequences to UShER, found the nearest neighbor sequences, combined both sequence sets and are attempting to build a tree to determine the transmission of Delta into FL.
If you want to use all of the nearest neighbors (and your data) provided by UShER, and assuming data/sequences.fasta represents this, then you can build a tree by using a dummy subsampling scheme which essentially doesn’t do any subsampling:
subsampling:
  delta:
    division:
      group_by: "year"
      seq_per_group: 10000000 # i.e. no subsampling
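The dummy scheme above only avoids subsampling if the cap is larger than the dataset. A quick way to check is to count the FASTA header lines, as in the sketch below. It assumes that data/sequences.fasta is an uncompressed FASTA file; `count_fasta_records` is a hypothetical helper name, and the limit is simply copied from the scheme above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count FASTA records by counting header lines,
# i.e. lines that start with ">". Assumes an uncompressed FASTA file.
count_fasta_records() {
  grep -c '^>' "$1" || true   # grep exits non-zero when the count is 0
}

fasta="data/sequences.fasta"   # path assumed from the reply above
limit=10000000                 # seq_per_group from the scheme above

if [ -f "$fasta" ]; then
  n=$(count_fasta_records "$fasta")
  if [ "$n" -lt "$limit" ]; then
    echo "$n sequences: below the seq_per_group cap"
  else
    echo "$n sequences: a group may exceed the cap and be subsampled"
  fi
fi
```

Because the cap applies per group (here, per year), a total below the cap guarantees that no single group exceeds it. Sequences can still be dropped by the workflow's other filters.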
If you want to combine the above dataset with some other contextual sequences, or if the above dataset is too large and you need to reduce it via subsampling, then let me know and I’ll try to help.
Adding to @james’s response above, you can also skip subsampling by omitting the subsampling_scheme from your build definition or setting the value to all.
Either of the following examples will allow you to effectively skip subsampling and use all sequences defined in your inputs (that also pass your standard filters).
builds:
  fl_delta:
    region: North America
    country: USA
    division: Florida
Or with an explicit subsampling scheme:
builds:
  fl_delta:
    subsampling_scheme: all
    region: North America
    country: USA
    division: Florida