Problems with Using sanitize.sequences.py

I am running nextstrain on msa_0908.tar.xz. When I run the sanitize.sequences.py step of the analysis, I get the following error. I chose the MSA file because each sequence in it has a unique ID (an EPI ID or accession ID). How should I solve it?

Hi @vrmarathe - I haven’t been able to confirm, but my guess is that we cannot read a .tar.xz input. Could you extract the actual sequences file and provide that as input? (This file can be .xz compressed.)
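
If it helps, one way to pull the sequences file out of the archive is Python's standard tarfile module. This is only a sketch: the helper name extract_fasta_members and the list of file suffixes are my own assumptions, so adjust them to match whatever is actually inside msa_0908.tar.xz.

```python
import shutil
import tarfile
from pathlib import Path

# Hypothetical suffix list; adjust to the file names in your archive.
FASTA_SUFFIXES = (".fasta", ".fa", ".fasta.xz", ".fa.xz")


def extract_fasta_members(archive_path, out_dir):
    """Extract FASTA-like files from a .tar.xz archive into out_dir.

    Returns the list of extracted file paths.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    # "r:xz" opens an xz-compressed tar archive for reading.
    with tarfile.open(archive_path, "r:xz") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue
            if not member.name.lower().endswith(FASTA_SUFFIXES):
                continue
            # Write under the member's base name only, so a path stored in
            # the archive cannot place files outside out_dir.
            target = out_dir / Path(member.name).name
            with archive.extractfile(member) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            extracted.append(target)
    return extracted
```

For example, extract_fasta_members("msa_0908.tar.xz", "data") would write any matching files into a data directory, and one of those paths can then be given to the workflow as the sequences input.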

@james Thank you for the help.
I have another problem. I have a VM where I run the nextstrain analysis.
When I start the analysis and then close the VM session in my PuTTY window, nextstrain stops executing; I saw this when I logged back in to check whether it was still running. I solved the same problem for nextclade by using nohup, but when I tried that with nextstrain it did not work. Is there a way to run the analysis after I close the session? Snakemake also gives me the following error:

Building DAG of jobs…
Error: Directory cannot be locked. This usually means that another Snakemake instance is running on this directory. Another possibility is that a previous run exited unexpectedly.

I have another question. I have used nextclade to get some data in the form of .tsv files. What is the difference between nextclade and nextstrain outputs?

I am running nextstrain on msa_0908.tar.xz. When I run the sanitize.sequences.py step of the analysis, I get the following error. I chose the MSA file because each sequence in it has a unique ID (an EPI ID or accession ID). How should I solve it?

Hi @vrmarathe, sorry for the delayed response to your original issue. The issue with sanitize sequences has been fixed as of last week. You can update to the latest version of the ncov workflow to get the corrected version of the sanitize sequences script.

Is there a way to run the analysis after I close the session?

I would recommend avoiding nohup for long-running workflows like the ncov workflow. Although it allows you to close a terminal window, you lose the ability to track the workflow’s progress and catch issues close to when they first occur. If you are comfortable minimizing your VM window, the workflow will still effectively “run in the background” in a way that allows you to check in on it periodically.

You can also run Nextstrain workflows on a cloud provider like AWS Batch with the Nextstrain CLI. Using AWS Batch requires a bit of initial setup on your part, but it would let you submit your workflows to a remote server so they don’t depend on the state of your local computer.

Building DAG of jobs…
Error: Directory cannot be locked. This usually means that another Snakemake instance is running on this directory. Another possibility is that a previous run exited unexpectedly.

This error suggests that closing your VM window terminates the Snakemake process that nextstrain started without shutting it down properly. Snakemake creates a lock directory when it starts running a workflow; this lock prevents multiple workflows from running at the same time on the same files. You should be able to run nextstrain build . --unlock to ask Snakemake to remove the lock.
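
If you want to check for a leftover lock before unlocking, something like the sketch below may help. It assumes Snakemake keeps its lock files under .snakemake/locks inside the workflow directory, which is an implementation detail that could differ between versions, and the function name is my own. Please make sure no other Snakemake process is actually running before you unlock.

```python
from pathlib import Path


def find_lock_files(workdir="."):
    """Return lock files left behind in the workflow directory, if any.

    Assumption: Snakemake stores its locks under <workdir>/.snakemake/locks.
    """
    lock_dir = Path(workdir) / ".snakemake" / "locks"
    if not lock_dir.is_dir():
        return []
    return sorted(p for p in lock_dir.iterdir() if p.is_file())


if __name__ == "__main__":
    locks = find_lock_files(".")
    if locks:
        print("Possible stale lock files:")
        for path in locks:
            print(f"  {path}")
        print("If no workflow is running, try: nextstrain build . --unlock")
    else:
        print("No lock files found.")
```

An empty result only means that this particular directory holds no lock files; it does not prove that no workflow is running elsewhere.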

I have used nextclade to get some data in the form of .tsv files. What is the difference between nextclade and nextstrain outputs?

Nextclade’s web outputs are designed to be a human- and computer-readable summary of an analysis. Nextstrain’s output files are primarily .json files that are designed to be read by Auspice or Augur.
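
As a rough illustration of the difference, a Nextclade TSV can be read as a flat table with one row per sequence, while a Nextstrain dataset is a nested JSON document. The sketch below assumes the TSV has a clade column and that the JSON is an Auspice dataset with a top-level tree key holding either one root node or a list of root nodes. The function names are my own.

```python
import csv
import json
from collections import Counter


def count_clades(nextclade_tsv):
    """Count sequences per clade in a tab-delimited Nextclade TSV.

    Assumption: the file has a header row with a "clade" column.
    """
    with open(nextclade_tsv, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return Counter(row["clade"] for row in reader)


def count_tips(auspice_json):
    """Count tips (nodes without children) in an Auspice dataset JSON.

    Assumption: the dataset has a top-level "tree" key.
    """
    with open(auspice_json) as handle:
        dataset = json.load(handle)
    tree = dataset["tree"]
    # Accept either a single root node or a list of root nodes.
    stack = list(tree) if isinstance(tree, list) else [tree]
    tips = 0
    while stack:
        node = stack.pop()
        children = node.get("children", [])
        if children:
            stack.extend(children)
        else:
            tips += 1
    return tips
```

The TSV needs only a table reader, whereas the JSON has to be walked as a tree, which is why the JSON files are usually viewed through Auspice instead of being opened directly.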