I am running in Docker on Windows and I have run through the tutorials, which worked fine. Now I am trying to run Nextstrain with a collection of flu sequences and I’m getting stuck on this command (as in the tutorial):
I have a file flu_outgroup.gb from Genbank, but I consistently get the following error: ERROR: Cannot read reference sequence. make sure the file config/flu_outgroup.gb contains one sequence in genbank or fasta format. I’ve read through this documentation and I think my reference looks okay based on that guidance. I’ve tried it with another reference sequence in FASTA format and get the same result. Can you provide any insight into why this reference sequence isn’t working?
Hi @maryj, thank you for sharing your reference GenBank file! It looks like the issue is that there are multiple loci in your GenBank file (one per gene segment) and augur align expects a single reference sequence. In this case, the augur error is not entirely accurate, since augur can read the GenBank file but it doesn’t know which reference sequence to use.
Our seasonal flu phylogenetic workflows usually produce a single phylogeny per gene segment. For example, we would make a HA-specific tree using a reference sequence like the one I’ve attached here which only includes “LOCUS 4” from your original file. Can you try running the augur align command above with this reference file and your HA input sequences and let us know if that works?
Thanks for this explanation - I wondered if that might be the case. Using the HA reference you provided does seem to work. To clarify, does this mean I should use multiple reference sequences to make multiple, locus-specific trees with influenza data?
That’s right, you’ll need one reference sequence file per locus so you can make one tree per locus. You can see this pattern in the configuration file for the quickstart GISAID analysis where the path to the reference FASTA is parameterized by the {segment} value. For each of the gene segments defined in the configuration file, the workflow will look for the corresponding reference file path (e.g., config/h3n2/ha/reference.fasta for H3N2 HA).
Most steps in our seasonal flu workflow are parameterized to run per segment within each “build” defined in the build configuration file like the one linked to above. For example, we produce one alignment FASTA per build and gene segment.
To run the workflow with all 8 gene segments, you’d modify the configuration file linked above such that the segments list looks like this:
segments:
- ha
- na
- pa
- pb1
- pb2
- np
- ns
- mp
And you’d update the reference and annotation entries in the build configuration file to point to corresponding FASTA and GFF files per gene segment. If you were to use the existing Nextstrain seasonal flu workflow, you’d need to convert your GenBank file to these separate FASTA and GFF files or you could use one of the existing references in the config/ directory of the seasonal-flu repository.
Thank you for this information! But now I’m hopelessly lost trying to make sense of the configuration files and Snakemake rules in the seasonal flu workflow. I can’t even get the example build to run. When I run this:
I’m sorry about that, @maryj! Can you share the contents of the log file (logs/select_titers_ci_build_cdc_cell_fra.txt)? This file should include details about the nature of the error in the workflow.
It may also be useful to work through this synchronously through a Zoom discussion, if the error message in that file doesn’t explain what’s happening and that’s something you are available for. We host weekly office hours on Thursdays from 10-11am Pacific Time, that might be a good venue for continuing this troubleshooting session.
Thanks! That was a helpful error message. I just made a couple of changes to the seasonal flu repository that should fix that issue and also help avoid it in the future. You can download the most recent version of the workflow by running git pull in the seasonal-flu directory.
The example build we were recommending in the README for that repository is not really the ideal starting point for most users, so I’ve updated the README to reference a simpler example build. You can run this build like so: