Feedback on tutorial

Hello! Thanks for creating this tutorial. I’m trying to run through it and have some comments:

  1. When I went to the starting page, I wasn’t sure if I should dive into the complete walkthrough at the Preparing your data link or do the Quickstart. It seems like the Preparing your data link assumes you’ve installed Auspice/Augur since it refers to files in the example “data” directory. The Quickstart assumes you already understand running the analysis. A nice setup would be to have installation instructions, then a simple toy test case to make sure it works, then the rest of the tutorial (rather than Setup and installation as step 2 when the files are needed for step 1). I did the Quickstart and ran snakemake on the example directory, but it’s at step 58 of 106 steps and each step has a “Missing output files” error as it scrolls by on the screen… so a toy example would have been nice to check functionality.

  2. Shouldn’t that say “snakemake --profile ./my_profiles/example” instead of my-analyses? Or should I have my-analyses and that’s why I got over 100 error messages?

Just running the example took quite a while (on a server with 32 procs and 125 GB RAM; maybe mention the machine requirements in the Quickstart). I thought I’d run through the tutorial since it was just an hour investment (Ph.D. in computer science, figured I could knock this right out), but it’s already been >45 minutes installing and waiting for that to run… and I’m not quite sure it installed correctly since I got so many errors.

(Just saw nextstrain check-setup --set-default on the nextstrain.org installation instructions page; maybe add that to the Quickstart. Though that command hung for me on “Testing your setup…”.)

Hi @stacia - thank you so much for writing such detailed feedback on the tutorial!

over 100 error messages?

Eek - that doesn’t sound right! Indeed, it sounds like we need to double-check something. You’re right that it should be my_profiles instead of my-analyses – we changed the name a few times and clearly we missed a couple of spots in the tutorial - apologies!

This is incredibly valuable and clearly we need to look into this more, as this isn’t our intended behaviour, for sure! Let us get an idea of what’s going wrong and we’ll get back to you! I’m so sorry this hasn’t been a simple experience!

Hi @stacia! Echoing @emmahodcroft’s post, thank you for this extremely helpful feedback!

Regarding the missing output messages, would you be able to confirm whether those messages look something like this example?

[Mon Jul 6 10:38:42 2020]
Job 5: Constructing colors file
Reason: Missing output files: results/global/colors.tsv

If they do look like this, then everything is actually working as expected. We have the example profile set up to use Snakemake’s --reason flag, which reports why Snakemake is running each rule. The most common reason to run a rule is that the output files for that rule don’t exist yet. Other reasons include input files that are newer than the existing output, which warrants an update to that output, or a forced execution of the rule.

These reasons have been useful for us when debugging workflows, but it seems like they are more alarming than reassuring when running the workflow for the first time. I’m going to make a note to comment this out from the example.
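
In the meantime, here is a rough sketch for checking whether a saved log contains only these informational reasons or also something that looks like a real failure. It is not part of the workflow. The function name summarize_log is hypothetical, and treating any line that mentions “Error” or “Exception” as a failure is only an assumption about what a genuine problem would look like in the output.

```python
import re

# Matches the informational lines printed when Snakemake runs with --reason.
REASON_RE = re.compile(r"^\s*Reason:\s*(?P<reason>.+)$")
# Assumption: a real failure mentions "Error" or "Exception" somewhere in the line.
ERROR_RE = re.compile(r"Error|Exception")


def summarize_log(text):
    """Count --reason lines by kind and collect lines that look like real errors."""
    reasons = {}
    errors = []
    for line in text.splitlines():
        match = REASON_RE.match(line)
        if match:
            # Group by the part before the first colon, e.g. "Missing output files".
            kind = match.group("reason").split(":", 1)[0].strip()
            reasons[kind] = reasons.get(kind, 0) + 1
        elif ERROR_RE.search(line):
            errors.append(line.strip())
    return reasons, errors


# Sample text copied from the log excerpt earlier in this thread.
sample = """\
[Mon Jul 6 10:38:42 2020]
Job 5: Constructing colors file
Reason: Missing output files: results/global/colors.tsv
"""

reasons, errors = summarize_log(sample)
print(reasons)
print(errors)
```

If the second list comes back empty and the first only counts “Missing output files”, the run is most likely just building outputs for the first time.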

Just to make sure we cover everything in your comments, I’ve outlined the primary issues we need to address here:

  1. Replace the “quickstart” guide with an initial installation and quick test guide with a minimal viable output to visualize
  2. Remove --reason flag from example profile to disable alarming “missing output” messages
  3. Fix broken references to profiles (e.g., my-analyses should be my_profiles)
  4. Revise description of the overall runtime for the tutorial to reflect “hands on” time vs. run time
  5. Include description of reasonable machine requirements in the installation/quick test document
  6. Follow up on issue with blocked nextstrain check-setup command

Does this seem like an accurate summary of the issues?

Hi! Yes, that’s exactly what they were. Now I see that it is actually spread out over a couple lines which makes it much clearer:
Job 103:
Filtering to
- excluding strains in defaults/exclude.txt

Reason: Missing output files: results/filtered.fasta

Job 23:
        Adjusting metadata for build 'north-america_usa_washington'

Reason: Missing output files: results/north-america_usa_washington/metadata_adjusted.tsv

That list looks good. BTW the check-setup hung until I killed it. If I find more issues, should I follow up on this thread or start a new one?
–Stacia

Perfect! I’ll create proper issues for these in GitHub and update the post above with links, in case you want to follow any of them.

This is a great place for all tutorial feedback. We can always fork this discussion out into separate threads, if this topic starts to diverge too much from the tutorial content.

@stacia Would you actually mind opening an issue for the nextstrain command line problem you described? This will allow us to capture details about your environment that could help with troubleshooting.

@trs will probably be the person most interested in following up with this.

Sure! Just did this.

I just wanted to share that the first pages of the ncov tutorial have been reorganized and edited such that the first page is a quick setup and workflow test with a small data set. This page is then followed by the data preparation page.

If folks have other comments about how this tutorial can be improved, please let us know here.