Problems with Using sanitize.sequences.py

vrmarathe · October 2, 2021, 12:03am

I am running Nextstrain on msa_0908.tar.xz. When I run the sanitize.sequences.py step of the analysis, I get the following error. I chose the MSA file because it has a unique ID (EPI ID or accession ID). How should I solve it?

james · October 4, 2021, 3:09am

Hi @vrmarathe - I haven’t been able to confirm, but my guess is that we cannot read a .tar.xz input. Could you extract the actual sequences file and provide that as input? (This file can be .xz compressed.)
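In case it helps, here is a minimal Python sketch of that extraction step using only the standard library. The list of suffixes and the function name extract_sequences are my own assumptions for illustration; I don't know what the sequences file inside msa_0908.tar.xz is actually called, so adjust the suffixes to match what you find in the archive.

```python
import shutil
import tarfile
from pathlib import Path

# Hypothetical suffixes used to recognise a sequences file inside the archive.
SEQUENCE_SUFFIXES = (".fasta", ".fa", ".fasta.xz", ".fa.xz")


def extract_sequences(archive_path, output_dir):
    """Copy sequence files out of a .tar.xz archive and return their paths."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with tarfile.open(archive_path, mode="r:xz") as archive:
        for member in archive.getmembers():
            if member.isfile() and member.name.endswith(SEQUENCE_SUFFIXES):
                # Keep only the file name so nothing is written outside output_dir.
                target = output_dir / Path(member.name).name
                with archive.extractfile(member) as source, open(target, "wb") as sink:
                    shutil.copyfileobj(source, sink)
                extracted.append(target)
    return extracted
```

Calling extract_sequences("msa_0908.tar.xz", "data") would then return the extracted paths, and one of those can be given to the workflow as the sequences input.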

vrmarathe · October 5, 2021, 9:08am

@james Thank you for the help.
I have another problem. I have a VM where I run the nextstrain analysis.
When I start the analysis and then close my PuTTY session to the VM, Nextstrain stops executing; I noticed this when I logged back in to check whether it was still running. I solved the same problem for Nextclade by using nohup, but when I tried that with Nextstrain it did not work. Is there a way to run the analysis after I close the session? Snakemake also gives me this error:

Building DAG of jobs…
Error: Directory cannot be locked. This usually means that another Snakemake instance is running on this directory. Another possibility is that a previous run exited unexpectedly.

I have another question: I have used Nextclade to get some data in the form of .tsv files. What is the difference between Nextclade and Nextstrain outputs?

jlhudd · October 26, 2021, 5:08pm

I am running Nextstrain on msa_0908.tar.xz. When I run the sanitize.sequences.py step of the analysis, I get the following error. I chose the MSA file because it has a unique ID (EPI ID or accession ID). How should I solve it?

Hi @vrmarathe, sorry for the delayed response to your original issue. The issue with sanitize sequences has been fixed as of last week. You can update to the latest version of the ncov workflow to get the corrected version of the sanitize sequences script.

Is there a way to run the analysis after I close the session?

I would recommend avoiding nohup for long-running workflows like the ncov workflow. Although it allows you to close a terminal window, you lose the ability to track the workflow’s progress and catch issues close to when they first occur. If you are comfortable minimizing your VM window, the workflow will still effectively “run in the background” in a way that allows you to check in on it periodically.

You can also run Nextstrain workflows on a cloud provider like AWS Batch with the Nextstrain CLI. Using AWS Batch requires a bit of initial setup on your part, but it would let you submit your workflows to a remote server so they don’t depend on the state of your local computer.

Building DAG of jobs…
Error: Directory cannot be locked. This usually means that another Snakemake instance is running on this directory. Another possibility is that a previous run exited unexpectedly.

This error suggests that closing your VM window terminates the Snakemake process that nextstrain started without shutting it down properly. Snakemake creates a lock directory when it starts running a workflow; this lock prevents multiple workflows from running at the same time on the same files. You should be able to run nextstrain build . --unlock to ask Snakemake to remove the lock.
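If you want to check for a leftover lock before unlocking, something like the following Python sketch could work. It assumes that Snakemake keeps its lock files under .snakemake/locks inside the workflow directory, which is my understanding of its layout and not something I have verified for your version; the helper names lock_files and unlock are hypothetical. The unlock helper just calls the same command mentioned above.

```python
import subprocess
from pathlib import Path


def lock_files(workdir):
    """Return the lock files found in workdir (assumed path: .snakemake/locks)."""
    locks_dir = Path(workdir) / ".snakemake" / "locks"
    if not locks_dir.is_dir():
        return []
    return sorted(path for path in locks_dir.iterdir() if path.is_file())


def unlock(workdir):
    """Ask Snakemake, via the Nextstrain CLI, to remove the lock on workdir."""
    # Equivalent to running: nextstrain build <workdir> --unlock
    subprocess.run(["nextstrain", "build", str(workdir), "--unlock"], check=True)
```

Only unlock when you are sure that no other Snakemake process is still running in that directory, since the lock exists to protect against exactly that case.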

I have used Nextclade to get some data in the form of .tsv files. What is the difference between Nextclade and Nextstrain outputs?

Nextclade’s web outputs are designed to be a human- and computer-readable summary of an analysis. Nextstrain’s output files are primarily .json files that are designed to be read by Auspice or Augur.
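To make the difference concrete, here is a rough Python sketch that reads each kind of output. It assumes the Nextclade file is a tab-separated table with one row per sequence and that the Nextstrain file is an Auspice-style JSON with a top-level "tree" object whose nodes have "name" and optional "children" keys. The function names are hypothetical, and the exact columns and keys may differ between versions.

```python
import csv
import json


def read_nextclade_tsv(path):
    """Load a Nextclade .tsv file as a list of dicts, one per sequence."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def tip_names(auspice_json_path):
    """Collect the names of the leaf nodes in an Auspice-style tree JSON."""
    with open(auspice_json_path) as handle:
        dataset = json.load(handle)
    names = []
    stack = [dataset["tree"]]  # assumed top-level key
    while stack:
        node = stack.pop()
        children = node.get("children", [])
        if children:
            stack.extend(children)
        else:
            names.append(node["name"])
    return sorted(names)
```

The TSV is a flat table that you can open in a spreadsheet or filter row by row, while the JSON is a nested tree that is mostly meant to be loaded by Auspice for visualization.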
