Run basic analysis on example data

yzhang2168 · June 25, 2020, 7:43pm

I’m trying to run the example data after installation, following the instructions here:

ncov$ snakemake --profile ./my_config/example

Error: profile given but no config.yaml found.
I don’t have my_config/ after installation. The closest is config/.
I tried:
$ snakemake --profile ./config/config.yaml
// Error: profile given but no config.yaml found.

$ snakemake --profile config/
// snakemake: error: unrecognized arguments: --conda_environment=…/envs/nextstrain.yaml --s3_staging_url=s3://nextstrain-staging --slack_token=None --slack_channel=#ncov-gisaid-updates --sequences=data/sequences.fasta --metadata=data/metadata.tsv --reference_node_name=USA/WA1/2020

jlhudd · June 26, 2020, 2:38am

Thank you for trying out the new tutorial! We haven’t merged all of the changes that are described in the tutorial into the master branch of the ncov repository, so the my_config directory doesn’t exist yet.

We should have these new changes merged within the week, but in the meantime you can explore some of the existing profiles. The my_config/example profile is similar to the existing profile at profiles/king-county. You can run this like so:

snakemake --profile profiles/king-county
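The `--profile` flag expects a directory that contains a `config.yaml` of default command-line options. That is why pointing it at a file (`./config/config.yaml`) or at a directory that does not exist yet (`./my_config/example`) fails with "profile given but no config.yaml found", and why `config/` is accepted but then fails differently: its `config.yaml` is a workflow configuration, so snakemake tries to read each of its keys as a command-line option. A minimal pre-flight check, sketched under the assumption of a bash shell; `check_profile` is a hypothetical helper, not part of snakemake or the ncov repository:

```bash
# Hypothetical helper: check that a directory looks like a snakemake profile,
# i.e. that <profile>/config.yaml exists.
check_profile() {
    local profile_dir="${1%/}"   # drop one trailing slash, if present
    if [ ! -d "$profile_dir" ]; then
        echo "not a directory: $profile_dir" >&2
        return 1
    fi
    if [ ! -f "$profile_dir/config.yaml" ]; then
        echo "no config.yaml in: $profile_dir" >&2
        return 1
    fi
    echo "ok: $profile_dir/config.yaml"
}
```

For example, `check_profile profiles/king-county` should print `ok: profiles/king-county/config.yaml`, while `check_profile my_config/example` fails until the tutorial changes are merged.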

A more advanced existing profile is profiles/swiss. This profile is tuned to run on a specific cluster for a subset of the Nextstrain team, but the builds.yaml file in this directory is a great example of how to implement custom builds and subsampling.

You can check back at this discussion board for an announcement when the new ncov repository changes are finally available.

yzhang2168 · June 26, 2020, 3:30am

Hi! Thanks for the response!
I just downloaded nextstrain and I am just trying it out to see if the environment is set up correctly, so any sample data is fine.

I tried

snakemake --profile profiles/king-county

It built a DAG and then failed here:
Job 16: Downloading metadata and fasta files from S3
Reason: Missing output files: data/sequences.fasta, data/metadata.tsv

    aws s3 cp s3://nextstrain-ncov-private/metadata.tsv.gz - | gunzip -cq >data/metadata.tsv
    aws s3 cp s3://nextstrain-ncov-private/sequences.fasta.gz - | gunzip -cq > data/sequences.fasta

/bin/bash: aws: command not found
[Thu Jun 25 20:25:25 2020]
Error in rule download:
jobid: 16
output: data/sequences.fasta, data/metadata.tsv
shell:

    aws s3 cp s3://nextstrain-ncov-private/metadata.tsv.gz - | gunzip -cq >data/metadata.tsv
    aws s3 cp s3://nextstrain-ncov-private/sequences.fasta.gz - | gunzip -cq > data/sequences.fasta
    
    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job download since they might be corrupted:
data/metadata.tsv
Job failed, going on with independent jobs.
Exiting because a job execution failed. Look above for error message
Complete log: /Users/yuzhang/Documents/Nextstrain/ncov/.snakemake/log/2020-06-25T202514.473169.snakemake.log

So the sequences and metadata are hosted on AWS, not GISAID? Are the data identical to those in GISAID, or have you curated them in some way?

yzhang2168 · June 26, 2020, 3:38am

Also, I’m very new to this. What tutorial should I follow to get started? What are the profiles? Is there a high-level description of what each job does and the related files?

emmahodcroft · June 26, 2020, 7:16am

Thanks for your enthusiasm and for trying this! At the moment the easiest path may be to wait for the new tutorial which should be up in the next few days and make things much easier and clearer.

I believe the error you are getting is because you need to manually include the data to run the example build. You can use the example data that comes in the repository, download the data yourself from GISAID, or provide your own data. All the options are laid out here: https://github.com/nextstrain/ncov/blob/master/docs/running_old.md#using-the-example-data

Once you have data/sequences.fasta and data/metadata.tsv this rule hopefully shouldn’t fail anymore.
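A minimal sketch of the copy step for the example data, assuming the repository's `example_data/` directory holds files named `sequences.fasta` and `metadata.tsv` (the names the workflow expects under `data/`). The helper name `stage_example_data` is hypothetical:

```bash
# Hypothetical helper: copy the example inputs to the paths the workflow expects.
# Assumes the source directory contains sequences.fasta and metadata.tsv.
stage_example_data() {
    local src="${1:-example_data}"
    local dest="${2:-data}"
    local f
    mkdir -p "$dest" || return 1
    for f in sequences.fasta metadata.tsv; do
        if [ ! -f "$src/$f" ]; then
            echo "missing input: $src/$f" >&2
            return 1
        fi
        cp "$src/$f" "$dest/$f" || return 1
    done
    echo "staged $dest/sequences.fasta and $dest/metadata.tsv"
}
```

Run from the top of the ncov checkout as `stage_example_data`. Once both outputs of the `download` rule already exist, snakemake should have no reason to run that rule and fetch them from S3.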

yzhang2168 · June 26, 2020, 8:15pm

So I copied the two files under example_data/ to data/
and then ran:

snakemake --profile profiles/king-county

and got this error:
Job 84:
Filtering to
- excluding strains in config/exclude.txt

Reason: Missing output files: results/filtered.fasta

    augur filter \
        --sequences data/sequences.fasta \
        --metadata data/metadata.tsv \
        --include config/include.txt \
        --max-date 2020.4849726775956 \
        --exclude config/exclude.txt \
        --exclude-where division='USA' date='2020' date='2020-01-XX' date='2020-02-XX' date='2020-03-XX' date='2020-04-XX' date='2020-05-XX' date='2020-06-XX' date='2020-01' date='2020-02' date='2020-03' date='2020-04' date='2020-05' date='2020-06' \
        --min-length 27000 \
        --output results/filtered.fasta 2>&1 | tee logs/filtered.txt

/bin/bash: augur: command not found
[Fri Jun 26 19:39:04 2020]
Error in rule filter:
jobid: 84
output: results/filtered.fasta
log: logs/filtered.txt (check log file(s) for error message)
shell:

    augur filter \
        --sequences data/sequences.fasta \
        --metadata data/metadata.tsv \
        --include config/include.txt \
        --max-date 2020.4849726775956 \
        --exclude config/exclude.txt \
        --exclude-where division='USA' date='2020' date='2020-01-XX' date='2020-02-XX' date='2020-03-XX' date='2020-04-XX' date='2020-05-XX' date='2020-06-XX' date='2020-01' date='2020-02' date='2020-03' date='2020-04' date='2020-05' date='2020-06' \
        --min-length 27000 \
        --output results/filtered.fasta 2>&1 | tee logs/filtered.txt
    
    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Job failed, going on with independent jobs.
Exiting because a job execution failed. Look above for error message

Can you help? I don’t know how to attach the log file.

As I mentioned earlier, I am just trying to see if I can run Nextstrain successfully. Which data I use does not matter at this stage.

emmahodcroft · June 27, 2020, 7:56am

This error suggests it can’t find augur on your computer. Are you sure you have installed it? You can see many different installation methods in the menu on the left of the “What is Nextstrain?” page of the Nextstrain documentation.

You should then be able to run augur --version and some version numbers should come up. Does that work for you?
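Both `aws: command not found` and `augur: command not found` mean bash could not find an executable of that name on `PATH`. A minimal sketch for checking several tools at once before starting snakemake, assuming a bash shell; `require_commands` is a hypothetical helper and the tool names passed to it are only examples:

```bash
# Hypothetical helper: report which commands can or cannot be found on PATH.
# Returns non-zero if any are missing, so it can gate a later step.
require_commands() {
    local cmd missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found: $cmd -> $(command -v "$cmd")"
        else
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `require_commands snakemake augur && augur --version` only asks augur for its version once both commands resolve.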

yzhang2168 · June 29, 2020, 2:59pm

I later saw that error.
I followed the instructions and ran:

python3 -m pip install nextstrain-augur

Initially there was a /usr/local/ error. I re-ran the command and got a bunch of "Requirement already satisfied" messages.

emmahodcroft · June 29, 2020, 3:11pm

Can you now run augur --version? It should work if augur has installed properly.

yzhang2168 · June 29, 2020, 3:30pm

It does not find augur. I don’t know if it’s not installed properly or if the path is not found. Is there any way I can start over?

trs · June 29, 2020, 6:29pm

This sounds like a permissions issue during installation: pip cannot write files to /usr/local/. For example, it wants to create /usr/local/bin/augur but can’t, and thus the augur command cannot be found.

There are various ways around this, but I think overall you’ll have an easier time if you instead follow our instructions for a local installation with Conda or a containerized installation using the Nextstrain CLI.
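To see whether the permissions problem described above applies, one can check whether the directory pip wanted to write into is writable by the current user and whether it is on `PATH`. A minimal sketch, assuming a bash shell and taking `/usr/local/bin` from the description above; `check_bin_dir` is a hypothetical helper:

```bash
# Hypothetical helper: is a bin directory writable, and is it on PATH?
check_bin_dir() {
    local dir="${1%/}"   # drop one trailing slash, if present
    if [ -w "$dir" ]; then
        echo "writable: $dir"
    else
        echo "not writable: $dir"
    fi
    # Pad PATH with colons so the first and last entries also match.
    case ":$PATH:" in
        *":$dir:"*) echo "on PATH: $dir" ;;
        *)          echo "not on PATH: $dir" ;;
    esac
}
```

Running `check_bin_dir /usr/local/bin` prints one line per check. If the directory is not writable, a Conda or containerized installation sidesteps the problem rather than working around it.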

yzhang2168 · June 29, 2020, 6:59pm

Hi! Thanks!
I had tried one thing to bypass the problem, and the /usr/local/ error didn’t pop up again, so I thought it was installed, but apparently not.
I saw the conda option but haven’t tried it yet. Thank you!

yzhang2168 · June 29, 2020, 7:32pm

I installed nextstrain on a different computer, which did have augur in place. Following the instructions, I ran the same sample data and it failed. Can you help interpret the problem?

ncov$ snakemake --profile profiles/king-county/
Building DAG of jobs…
Updating job 81 (aggregate_alignments).
Using shell: /bin/bash
Provided cores: 8
Rules claiming more threads will be scaled down.
Job counts:
    count   jobs
    1       all
    2       export
    2       finalize
    2       incorporate_travel_history
    7

[Mon Jun 29 12:30:34 2020]
Job 17: Exporting data files for for auspice
Reason: Missing output files: results/usa/ncov_with_accessions.json

    augur export v2 \
        --tree results/usa/tree.nwk \
        --metadata results/usa/metadata_adjusted.tsv \
        --node-data results/usa/branch_lengths.json results/usa/nt_muts.json results/usa/aa_muts.json results/usa/legacy_clades.json results/usa/clades.json results/usa/recency.json results/usa/traits.json \
        --auspice-config config/auspice_config.json \
        --colors results/usa/colors.tsv \
        --lat-longs config/lat_longs.tsv \
        --title 'Genomic epidemiology of novel coronavirus - Usa-focused subsampling' \
        --description config/description.md \
        --output results/usa/ncov_with_accessions.json 2>&1 | tee logs/export_usa.txt

[Mon Jun 29 12:30:34 2020]
Job 21: Exporting data files for for auspice
Reason: Missing output files: results/usa_washington/ncov_with_accessions.json

    augur export v2 \
        --tree results/usa_washington/tree.nwk \
        --metadata results/usa_washington/metadata_adjusted.tsv \
        --node-data results/usa_washington/branch_lengths.json results/usa_washington/nt_muts.json results/usa_washington/aa_muts.json results/usa_washington/legacy_clades.json results/usa_washington/clades.json results/usa_washington/recency.json results/usa_washington/traits.json \
        --auspice-config config/auspice_config.json \
        --colors results/usa_washington/colors.tsv \
        --lat-longs config/lat_longs.tsv \
        --title 'Genomic epidemiology of novel coronavirus - Usa_Washington-focused subsampling' \
        --description config/description.md \
        --output results/usa_washington/ncov_with_accessions.json 2>&1 | tee logs/export_usa_washington.txt

WARNING: You asked for a color-by for trait ‘pangolin_lineage’, but it has no values on the tree. It has been ignored.

WARNING: You asked for a color-by for trait ‘GISAID_clade’, but it has no values on the tree. It has been ignored.

WARNING: These values for trait location were not specified in your provided color scale: maricopa county, kenner. Auspice will create colors for them.

WARNING: These values for trait division were not specified in your provided color scale: south america, oceania, europe, asia. Auspice will create colors for them.

WARNING: These values for trait country were not specified in your provided color scale: south america, asia, oceania, europe. Auspice will create colors for them.

WARNING: These values for trait region were not specified in your provided color scale: asia. Auspice will create colors for them.

Validating schema of ‘results/usa_washington/aa_muts.json’…
Validating config file config/auspice_config.json against the JSON schema
Validating schema of ‘config/auspice_config.json’…
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/bin/augur", line 11, in <module>
    sys.exit(main())
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/augur/__main__.py", line 10, in main
    return augur.run( argv[1:] )
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/augur/__init__.py", line 74, in run
    return args.command.run(args)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/augur/export.py", line 22, in run
    return run_v2(args)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/augur/export_v2.py", line 861, in run_v2
    set_node_attrs_on_tree(data_json, node_attrs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/augur/export_v2.py", line 509, in set_node_attrs_on_tree
    author_data = create_author_data(node_attrs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/augur/export_v2.py", line 491, in create_author_data
    node_author_info[node_name]["value"] = author + " {}".format("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"[index])
IndexError: string index out of range
WARNING: You asked for a color-by for trait ‘pangolin_lineage’, but it has no values on the tree. It has been ignored.

WARNING: You asked for a color-by for trait ‘GISAID_clade’, but it has no values on the tree. It has been ignored.

WARNING: These values for trait location were not specified in your provided color scale: maricopa county, kenner. Auspice will create colors for them.

WARNING: These values for trait division were not specified in your provided color scale: south america, oceania, europe, asia. Auspice will create colors for them.

WARNING: These values for trait country were not specified in your provided color scale: south america, europe, oceania, asia. Auspice will create colors for them.

WARNING: These values for trait region were not specified in your provided color scale: asia. Auspice will create colors for them.

Validating schema of ‘results/usa/aa_muts.json’…
Validating config file config/auspice_config.json against the JSON schema
Validating schema of ‘config/auspice_config.json’…
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/bin/augur", line 11, in <module>
    sys.exit(main())
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/augur/__main__.py", line 10, in main
    return augur.run( argv[1:] )
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/augur/__init__.py", line 74, in run
    return args.command.run(args)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/augur/export.py", line 22, in run
    return run_v2(args)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/augur/export_v2.py", line 861, in run_v2
    set_node_attrs_on_tree(data_json, node_attrs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/augur/export_v2.py", line 509, in set_node_attrs_on_tree
    author_data = create_author_data(node_attrs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/augur/export_v2.py", line 491, in create_author_data
    node_author_info[node_name]["value"] = author + " {}".format("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"[index])
IndexError: string index out of range
[Mon Jun 29 12:30:35 2020]
Error in rule export:
jobid: 17
output: results/usa/ncov_with_accessions.json
log: logs/export_usa.txt (check log file(s) for error message)
shell:

    augur export v2 \
        --tree results/usa/tree.nwk \
        --metadata results/usa/metadata_adjusted.tsv \
        --node-data results/usa/branch_lengths.json results/usa/nt_muts.json results/usa/aa_muts.json results/usa/legacy_clades.json results/usa/clades.json results/usa/recency.json results/usa/traits.json \
        --auspice-config config/auspice_config.json \
        --colors results/usa/colors.tsv \
        --lat-longs config/lat_longs.tsv \
        --title 'Genomic epidemiology of novel coronavirus - Usa-focused subsampling' \
        --description config/description.md \
        --output results/usa/ncov_with_accessions.json 2>&1 | tee logs/export_usa.txt
    
    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Job failed, going on with independent jobs.
[Mon Jun 29 12:30:35 2020]
Error in rule export:
jobid: 21
output: results/usa_washington/ncov_with_accessions.json
log: logs/export_usa_washington.txt (check log file(s) for error message)
shell:

    augur export v2 \
        --tree results/usa_washington/tree.nwk \
        --metadata results/usa_washington/metadata_adjusted.tsv \
        --node-data results/usa_washington/branch_lengths.json results/usa_washington/nt_muts.json results/usa_washington/aa_muts.json results/usa_washington/legacy_clades.json results/usa_washington/clades.json results/usa_washington/recency.json results/usa_washington/traits.json \
        --auspice-config config/auspice_config.json \
        --colors results/usa_washington/colors.tsv \
        --lat-longs config/lat_longs.tsv \
        --title 'Genomic epidemiology of novel coronavirus - Usa_Washington-focused subsampling' \
        --description config/description.md \
        --output results/usa_washington/ncov_with_accessions.json 2>&1 | tee logs/export_usa_washington.txt
    
    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Job failed, going on with independent jobs.
Exiting because a job execution failed. Look above for error message

jlhudd · June 29, 2020, 10:11pm

@yzhang2168, this bug should be fixed in the augur 9.0.0 release that we just made a couple hours ago. You can confirm that the bug is fixed by upgrading augur from inside your conda environment. For example, you might run:

# Activate your environment, if you haven't already.
conda activate nextstrain

# Upgrade augur to version 9.0.0
python3 -m pip install --upgrade nextstrain-augur

Can you try this out and let us know if you still see the same error? Thank you!
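After upgrading, it can help to confirm that the installed augur is at least 9.0.0 before re-running the export step. A minimal sketch, assuming `augur --version` prints a first line ending in a dotted version (for example `augur 9.0.0`); `version_ge` and `augur_at_least` are hypothetical helpers, and pre-release suffixes such as `rc1` are not handled:

```bash
# Hypothetical helper: succeed if dotted version $1 >= $2, comparing
# numeric fields left to right and treating missing fields as 0.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, "."); m = split(b, y, ".")
        k = (n > m) ? n : m
        for (i = 1; i <= k; i++) {
            xi = (i <= n) ? x[i] + 0 : 0
            yi = (i <= m) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# Hypothetical helper: compare the augur on PATH against a minimum version.
augur_at_least() {
    local min="$1" installed
    if ! command -v augur >/dev/null 2>&1; then
        echo "augur not found on PATH" >&2
        return 1
    fi
    # Take the last whitespace-separated field of the first output line.
    installed=$(augur --version 2>&1 | awk 'NR == 1 { print $NF }')
    if ! printf '%s\n' "$installed" | grep -Eq '^[0-9]+(\.[0-9]+)*$'; then
        echo "could not parse a version from: $installed" >&2
        return 1
    fi
    echo "installed augur: $installed"
    version_ge "$installed" "$min"
}
```

Inside the activated environment, `augur_at_least 9.0.0` succeeds only if the upgrade took effect.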

yzhang2168 · June 30, 2020, 12:10am

Great! Thank you! I haven’t set up conda yet. Will keep you posted.

yzhang2168 · June 30, 2020, 12:17am

Does anyone have answers to my other two questions, on the Entropy panel data and the frequencies program? I’d appreciate any information you may have.
