Hello! Thanks for creating this tutorial. I’m trying to run through it and have some comments:
When I went to the starting page, I wasn't sure whether to dive into the complete walkthrough at the "Preparing your data" link or do the Quickstart. The "Preparing your data" page seems to assume you've already installed auspice/augur, since it refers to files in the example "data" directory, while the Quickstart assumes you already understand how to run the analysis. A nicer setup would be installation instructions first, then a simple toy test case to confirm everything works, then the rest of the tutorial (rather than having "Set up and installation" as step 2 when its files are needed for step 1). I did do the Quickstart and ran snakemake on the example directory, but it's at step 58 of 106, and every step shows a "Missing output files" error as it scrolls by on the screen… so a toy example would have been nice for checking that things work.
Shouldn't that say "snakemake --profile ./my_profiles/example" instead of my-analyses? Or am I supposed to have a my-analyses directory, and is that why I got over 100 error messages?
Just running the example took quite a while (on a server with 32 processors and 125 GB of RAM; maybe mention the machine requirements in the Quickstart). I thought I'd run through the tutorial since it was only an hour's investment (Ph.D. in computer science, figured I could knock this right out), but I've already spent more than 45 minutes installing and waiting for it to run… and I'm not sure it installed correctly, since I got so many errors.
(I just saw nextstrain check-setup --set-default on the nextstrain.org installation instructions page; maybe add that to the Quickstart. That command hung for me at "Testing your setup…", though.)
Hi @stacia - thank you so much for writing such detailed feedback on the tutorial!
over 100 error messages?
Eek, that doesn't sound right! It sounds like we need to double-check something. You're right that it should be my_profiles instead of my-analyses. We changed the name a few times and clearly missed a couple of spots in the tutorial; apologies!
This is incredibly valuable feedback, and we clearly need to look into it more, since this isn't the behaviour we intended. Let us figure out what's going wrong and we'll get back to you. I'm so sorry this hasn't been a simple experience!
If the messages you're seeing are Snakemake's per-rule reasons (for example, "Missing output files"), then everything is actually working as expected. We set up the example profile to use Snakemake's --reason flag, which reports why Snakemake is running each rule. The most common reason to run a rule is that its output files don't exist yet. Other reasons include input files that are newer than the existing output, which warrants an update, or a forced execution of the rule.
These reasons have been useful for us when debugging workflows, but they seem more alarming than reassuring when running the workflow for the first time. I'm going to make a note to comment this option out of the example profile.
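For anyone who wants to hide these messages in their own copy before the example changes, here is a minimal shell sketch. It assumes the profile lives at my_profiles/example/config.yaml and turns the flag on with a line starting `reason:` (Snakemake profiles map config keys onto long-form command-line options). Both the path and that line are assumptions, so check your copy first; the helper name disable_reason is hypothetical.

```shell
# Hypothetical helper: comment out the `reason` key in a Snakemake profile
# config so Snakemake no longer reports why each rule is run.
# Assumes the option appears at the start of a line as `reason: ...`.
disable_reason() {
  config="$1"  # path to the profile's config.yaml (assumed location)
  if [ ! -f "$config" ]; then
    echo "No such file: $config" >&2
    return 1
  fi
  # -i.bak edits the file in place and keeps the original as config.yaml.bak
  sed -i.bak 's/^reason:/# reason:/' "$config"
}

# Example usage (path is an assumption):
# disable_reason ./my_profiles/example/config.yaml
```

The .bak copy makes it easy to restore the original profile if you want the reasons back for debugging.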
Just to make sure we cover everything in your comments, I’ve outlined the primary issues we need to address here:
I just wanted to share that the first pages of the ncov tutorial have been reorganized and edited so that the first page is a quick setup and workflow test with a small dataset, followed by the data preparation page.
If folks have other comments about how this tutorial can be improved, please let us know here.