I am hoping someone can help me get past an error I’m having when trying to run a local dataset in the context of a global dataset.
My aim is to use a curated dataset of GISAID IDs stored in a text file to produce metadata.tsv and sequence.fasta files.
Then, I’d like to create a random sample of global data and output their metadata and fasta files.
Finally, I’d like to build a tree for Auspice using the two datasets.
However, I’m getting the following error when running the build:
augur traits is using TreeTime version 0.8.4
WARNING: no states found for discrete state reconstruction.
Traceback (most recent call last):
  File "ncov/ncov-8/.snakemake/conda/8ec83933e66c3abe39927a0c6a3f5220/bin/augur", line 10, in <module>
    sys.exit(main())
  File "ncov/ncov-8/.snakemake/conda/8ec83933e66c3abe39927a0c6a3f5220/lib/python3.8/site-packages/augur/__main__.py", line 10, in main
    return augur.run( argv[1:] )
  File "ncov/ncov-8/.snakemake/conda/8ec83933e66c3abe39927a0c6a3f5220/lib/python3.8/site-packages/augur/__init__.py", line 75, in run
    return args.command.run(args)
  File "ncov/ncov-8/.snakemake/conda/8ec83933e66c3abe39927a0c6a3f5220/lib/python3.8/site-packages/augur/traits.py", line 179, in run
    mugration_states[node.name][column+'_confidence'] = node.__getattribute__(column+'_confidence')
AttributeError: 'Clade' object has no attribute 'country_exposure_confidence'
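The warning just before the traceback suggests that augur traits found no usable values for country_exposure. One way to check this directly is to count the non-missing values of that column in the metadata. This is only a minimal sketch: it assumes the metadata is a gzipped TSV with a header row, and that empty strings and "?" count as missing. The function name and the set of missing markers are my own assumptions, not part of augur.

```python
import csv
import gzip
from collections import Counter

# Assumed markers for missing values; adjust to match your metadata.
MISSING = {"", "?"}

def count_states(metadata_path, column):
    """Count non-missing values of `column` in a gzipped TSV file.

    Returns None if the column is absent from the header.
    """
    with gzip.open(metadata_path, "rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        if column not in (reader.fieldnames or []):
            return None
        counts = Counter()
        for row in reader:
            # Short rows yield None for trailing fields; treat as empty.
            value = (row[column] or "").strip()
            if value not in MISSING:
                counts[value] += 1
        return counts
```

If count_states("data/focal_metadata_gisaid.tsv.gz", "country_exposure") returns None or an empty counter, that would be consistent with the "no states found" warning.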
My Steps
1. Installed the program
Ran the getting_started build and everything worked
2. Prepared the Data
Used Method 3 from Preparing your Data
- Downloaded the FASTA and metadata tar.xz files
- Sanitized the FASTA file
python3 scripts/sanitize_sequences.py \
--sequences data/sequences_fasta.tar.xz \
--strip-prefixes "hCoV-19/" \
--output data/sequences_gisaid.fasta.gz
- Indexed the sequences
augur index \
--sequences data/sequences_gisaid.fasta.gz \
--output data/sequence_index_gisaid.tsv.gz
- Prepared the metadata
python3 scripts/sanitize_metadata.py \
--metadata data/metadata_tsv.tar.xz \
--parse-location-field Location \
--rename-fields 'Virus name=strain' 'Accession ID=gisaid_epi_isl' 'Collection date=date' \
--strip-prefixes "hCoV-19/" \
--output data/metadata_gisaid.tsv.gz
- Created a text file of GISAID IDs to include (“hCoV-19/” prefix already removed)
- Used the text file to output subsampled metadata and sequences
augur filter \
--metadata data/metadata_gisaid.tsv.gz \
--sequence-index data/sequence_index_gisaid.tsv.gz \
--sequences data/sequences_gisaid.fasta.gz \
--exclude-all \
--include focal_gisaid_ids.txt \
--output-metadata data/focal_metadata_gisaid.tsv.gz \
--output-sequences data/focal_sequences_gisaid.fasta.gz
- Filtered GISAID database to create a sample global dataset
augur filter \
--metadata data/metadata_gisaid.tsv.gz \
--exclude-ambiguous-dates-by any \
--subsample-max-sequences 1000 \
--group-by region year month \
--output-strains strains_global.txt
augur filter \
--metadata data/metadata_gisaid.tsv.gz \
--sequence-index data/sequence_index_gisaid.tsv.gz \
--sequences data/sequences_gisaid.fasta.gz \
--exclude-all \
--include strains_global.txt \
--output-metadata data/context_metadata_gisaid.tsv.gz \
--output-sequences data/context_sequences_gisaid.fasta.gz
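Since both filter steps above rely on an include file, it can help to confirm that every ID in focal_gisaid_ids.txt (or strains_global.txt) actually appears in the sanitized metadata. A minimal sketch, assuming one strain name per line in the text file and a strain column in the gzipped metadata TSV; the helper name is hypothetical:

```python
import csv
import gzip

def missing_ids(ids_path, metadata_path, id_column="strain"):
    """Return IDs listed in `ids_path` that are absent from the metadata."""
    with open(ids_path) as handle:
        # Skip blank lines and comment lines starting with '#'.
        wanted = [line.strip() for line in handle
                  if line.strip() and not line.startswith("#")]
    with gzip.open(metadata_path, "rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        present = {row[id_column] for row in reader}
    # Preserve the order of the include file in the result.
    return [name for name in wanted if name not in present]
```

An empty list from missing_ids("focal_gisaid_ids.txt", "data/metadata_gisaid.tsv.gz") would mean every focal ID has a matching metadata row.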
3. Created and ran the build
- Copied the example_global_context profile and edited the build to
inputs:
  - name: focal
    metadata: "data/focal_metadata_gisaid.tsv.gz"
    sequences: "data/focal_sequences_gisaid.fasta.gz"
  - name: contextual
    metadata: "data/contextual_metadata_gisaid.tsv.gz"
    filtered: "data/contextual_sequences_gisaid.fasta.gz"
builds:
  global:
    subsampling_scheme: custom-scheme
use_nextalign: true
filter:
  focal:
    skip_diagnostics: True
subsampling:
  custom-scheme:
    focal:
      query: --query "focal == 'yes'"
    contextual:
      query: --query "contextual == 'yes'"
skip_travel_history_adjustment: True
- Ran the build
snakemake --cores 2 --profile ./my_profiles/custom_build -p
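A quick pre-flight check before running snakemake is to confirm that every file path listed under inputs exists on disk. A minimal sketch using only the standard library; the paths are copied from the build config above, and the helper name is hypothetical:

```python
from pathlib import Path

def missing_files(paths, base_dir="."):
    """Return the subset of `paths` that do not exist under `base_dir`."""
    base = Path(base_dir)
    return [p for p in paths if not (base / p).is_file()]

# Paths as written under `inputs` in the build config above.
INPUT_PATHS = [
    "data/focal_metadata_gisaid.tsv.gz",
    "data/focal_sequences_gisaid.fasta.gz",
    "data/contextual_metadata_gisaid.tsv.gz",
    "data/contextual_sequences_gisaid.fasta.gz",
]

if __name__ == "__main__":
    for path in missing_files(INPUT_PATHS):
        print(f"missing: {path}")
```

Note that the filter step earlier wrote its output to data/context_*.gz, while the config refers to data/contextual_*.gz, so this check would flag those two paths unless the files were renamed.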