Error in "main_workflow.smk" file in --latency-wait

Hi,
Yes, I installed the Nextstrain Conda environment. I ran the Nextstrain tutorial on my system with the SARS-CoV-2 example data and was able to produce the .JSON file. Then I prepared my own data from GISAID following the Nextstrain data preparation guidelines. I believe that once the TSV and FASTA files are ready, I just need to change the builds.yaml and config.yaml files. I then ran the following command after activating the Nextstrain environment on my system.

nextstrain build . --cores 4 --use-conda --configfile ./my_profiles/test/builds.yaml

Please find the error below:
The error says “MissingOutputException in line 99 of /home/sksunny/ncov/workflow/snakemake_rules/main_workflow.smk:”

The message suggests increasing the latency wait time, but I do not know how to do that:
“This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.”

Hi @ssunny – the error is occurring because your metadata is missing a "date_submitted" column – see here for all the required metadata fields.
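If it helps to check a metadata file before starting a build, here is a minimal Python sketch that reports which expected columns are absent from the TSV header. The `REQUIRED` list is only an illustrative assumption (it is not the workflow's full list of required fields), and the sample row uses a made-up strain name.

```python
import csv
import io


def missing_columns(tsv_file, required):
    """Return the names in `required` that are absent from the TSV header row."""
    reader = csv.reader(tsv_file, delimiter="\t")
    header = next(reader, [])  # empty file -> every required column is missing
    return [name for name in required if name not in header]


# Assumption: an illustrative subset only, not the complete list of required fields.
REQUIRED = ["strain", "date", "date_submitted"]

# Hypothetical in-memory metadata standing in for a file such as data/your_data.tsv.
sample = io.StringIO("strain\tdate\tregion\nexample/1/2021\t2021-01-01\tAsia\n")
missing = missing_columns(sample, REQUIRED)
print(missing)  # ['date_submitted']
```

To check a real file, pass an open file handle instead of the `io.StringIO` object, for example `open("data/your_data.tsv", newline="")`.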

Thanks a lot @james. Unfortunately, even after including the “date_submitted” column, I got the following error. My colleague was able to run the same data files (TSV and FASTA) and YAML files in his Nextstrain environment, but they didn’t work in mine (mine is a more recently updated version). Could it be related to the updated version? Any suggestions, please. I highly appreciate your help.

treetime.UnknownMethodError: TreeTime.reroot – ERROR: unsupported rooting mechanisms or root not found

Another question, is it required to keep the “date_submitted” column always in metadata?

treetime.UnknownMethodError: TreeTime.reroot – ERROR: unsupported rooting mechanisms or root not found

We’ve most commonly run into this when trying to root the tree using a sample which has been filtered out in previous steps, or which was not included in the starting dataset. The default config uses “Wuhan/Hu-1/2019” but other pipelines use “Wuhan-Hu-1/2019”. Both are in the default include.txt so that they don’t get filtered. Without seeing your pipeline I don’t know if you are using a different root, a different include.txt, or a sample which doesn’t exist in your dataset to begin with.
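One quick way to rule out the last case is to check whether either root name actually appears among your FASTA headers. The Python sketch below is a rough check, assuming the sequence name is the first whitespace-delimited token after `>` and that it must match the root name exactly; the helper names and the sample sequences are made up for illustration.

```python
import io


def fasta_names(fasta_file):
    """Collect sequence names: the first whitespace-delimited token after '>'."""
    names = set()
    for line in fasta_file:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def find_root(candidates, names):
    """Return the candidate root names that are present in the dataset."""
    return [name for name in candidates if name in names]


# The two root names mentioned above.
ROOT_CANDIDATES = ["Wuhan/Hu-1/2019", "Wuhan-Hu-1/2019"]

# Hypothetical in-memory FASTA standing in for a file such as data/your_data.fasta.
fasta = io.StringIO(">example/1/2021\nACGT\n>Wuhan/Hu-1/2019\nACGT\n")
present = find_root(ROOT_CANDIDATES, fasta_names(fasta))
print(present)  # ['Wuhan/Hu-1/2019']
```

An empty list means neither root name is in the sequences, so the rooting step would have nothing to root on. This only checks the starting data; it does not tell you whether a later filtering step removed the sample.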

is it required to keep the “date_submitted” column always in metadata?

It shouldn’t be, but currently a few of our scripts require this (e.g. scripts/diagnostic.py which caused the error in the original post). We’ll try to improve this.

Thanks. I think the problem was that the name of the reference sequence was updated from “Wuhan/Who-1/2019” to “Wuhan/Hu-1/2019”.

I think the problem was that the name of the reference sequence was updated from “Wuhan/Who-1/2019” to “Wuhan/Hu-1/2019”.

This sounds right. We recently updated the reference used to root the time tree, but we forgot to update our reference sequences and metadata in the workflow repository to match. As of about three weeks ago this issue was fixed, so if you update your local copy of the workflow, you’ll have the correct reference data locally.

You can ensure that you always have the correct reference sequences in your builds by adding a references input in your builds.yaml file like so:

inputs:
  - name: your-data
    metadata: data/your_data.tsv
    sequences: data/your_data.fasta
  - name: references
    metadata: data/references_metadata.tsv
    sequences: data/references_sequences.fasta

Thanks for this solution @jlhudd - that fixed a number of builds that were failing. Can this be added to one of the my_profiles examples, as it seems like a good thing to include by default?

After many tries without solution, I found this post, and adding reference files as inputs solved my problem.
