AttributeError: 'Clade' object has no attribute 'country_exposure_confidence'

I am hoping someone can help me get past an error I’m having when trying to run a local dataset in the context of a global dataset.

My aim is to use a curated dataset of GISAID IDs stored in a text file to produce metadata.tsv and sequence.fasta files.

Then, I’d like to create a random sample of global data and output their metadata and fasta files.

Finally, I’d like to build a tree for Auspice using the two datasets.

However, I’m getting the following error when running the build:

augur traits is using TreeTime version 0.8.4
WARNING: no states found for discrete state reconstruction.
Traceback (most recent call last):
  File "ncov/ncov-8/.snakemake/conda/8ec83933e66c3abe39927a0c6a3f5220/bin/augur", line 10, in <module>
    sys.exit(main())
  File "ncov/ncov-8/.snakemake/conda/8ec83933e66c3abe39927a0c6a3f5220/lib/python3.8/site-packages/augur/__main__.py", line 10, in main
    return augur.run( argv[1:] )
  File "ncov/ncov-8/.snakemake/conda/8ec83933e66c3abe39927a0c6a3f5220/lib/python3.8/site-packages/augur/__init__.py", line 75, in run
    return args.command.run(args)
  File "ncov/ncov-8/.snakemake/conda/8ec83933e66c3abe39927a0c6a3f5220/lib/python3.8/site-packages/augur/traits.py", line 179, in run
    mugration_states[node.name][column+'_confidence'] = node.__getattribute__(column+'_confidence')
AttributeError: 'Clade' object has no attribute 'country_exposure_confidence'

My Steps

1. Installed the program

Ran the getting_started build and everything worked

2. Prepared the Data

Used Method 3 from Preparing your Data

  1. Downloaded the FASTA and metadata tar.xz files

  2. Sanitized the FASTA file by

python3 scripts/sanitize_sequences.py \
    --sequences data/sequences_fasta.tar.xz \
    --strip-prefixes "hCoV-19/" \
    --output data/sequences_gisaid.fasta.gz
  3. Indexed the sequences
augur index \
    --sequences data/sequences_gisaid.fasta.gz \
    --output data/sequence_index_gisaid.tsv.gz
  4. Prepared the metadata
python3 scripts/sanitize_metadata.py \
    --metadata data/metadata_tsv.tar.xz \
    --parse-location-field Location \
    --rename-fields 'Virus name=strain' 'Accession ID=gisaid_epi_isl' 'Collection date=date' \
    --strip-prefixes "hCoV-19/" \
    --output data/metadata_gisaid.tsv.gz
  5. Created a txt file with GISAID ids to be included (“hCoV-19/” already removed)

  6. Used the txt file to output subsampled metadata and sequences

augur filter \
    --metadata data/metadata_gisaid.tsv.gz \
    --sequence-index data/sequence_index_gisaid.tsv.gz \
    --sequences data/sequences_gisaid.fasta.gz \
    --exclude-all \
    --include focal_gisaid_ids.txt \
    --output-metadata data/focal_metadata_gisaid.tsv.gz \
    --output-sequences data/focal_sequences_gisaid.fasta.gz
  7. Filtered the GISAID database to create a sample global dataset
augur filter \
    --metadata data/metadata_gisaid.tsv.gz \
    --exclude-ambiguous-dates-by any \
    --subsample-max-sequences 1000 \
    --group-by region year month \
    --output-strains strains_global.txt
augur filter \
    --metadata data/metadata_gisaid.tsv.gz \
    --sequence-index data/sequence_index_gisaid.tsv.gz \
    --sequences data/sequences_gisaid.fasta.gz \
    --exclude-all \
    --include strains_global.txt \
    --output-metadata data/context_metadata_gisaid.tsv.gz \
    --output-sequences data/context_sequences_gisaid.fasta.gz
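
A quick sanity check after the two filter steps is to confirm that every name in the include list actually made it into the output metadata. The following is only a minimal sketch: it assumes the include file holds one strain name per line and that the metadata has a `strain` column (as produced by the rename in step 4); the helper names `read_ids`, `strains_in_metadata`, and `missing_ids` are made up for illustration and are not part of augur.

```python
import csv
import gzip


def read_ids(path):
    """Read one strain name per line, skipping blank lines and '#' comments."""
    ids = set()
    with open(path) as handle:
        for line in handle:
            name = line.strip()
            if name and not name.startswith("#"):
                ids.add(name)
    return ids


def strains_in_metadata(path, column="strain"):
    """Collect the values of one column from a TSV file (gzipped or plain)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        return {row[column] for row in csv.DictReader(handle, delimiter="\t")}


def missing_ids(ids_path, metadata_path, column="strain"):
    """Return the requested names that do not appear in the metadata."""
    return sorted(read_ids(ids_path) - strains_in_metadata(metadata_path, column))
```

For example, `missing_ids("focal_gisaid_ids.txt", "data/focal_metadata_gisaid.tsv.gz")` would list any focal names that were dropped; an empty list means all of them were found.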

3. Created and ran the build

  1. Copied the example_global_context profile and edited the build to
inputs:
  - name: focal
    metadata: "data/focal_metadata_gisaid.tsv.gz"
    sequences: "data/focal_sequences_gisaid.fasta.gz"
  - name: contextual
    metadata: "data/contextual_metadata_gisaid.tsv.gz"
    filtered: "data/contextual_sequences_gisaid.fasta.gz"

builds:
  global:
    subsampling_scheme: custom-scheme

use_nextalign: true

filter:
  focal:
    skip_diagnostics: True

subsampling:
  custom-scheme:
    focal:
      query: --query "focal == 'yes'"
    contextual:
      query: --query "contextual == 'yes'"

skip_travel_history_adjustment: True
  2. Ran the build

snakemake --cores 2 --profile ./my_profiles/custom_build -p

Hmm, the message "WARNING: no states found for discrete state reconstruction" makes me wonder which traits we are trying to reconstruct. By default this is "country_exposure", which may not be a column in your metadata?

You can change this by adding the following to your builds.yaml:

traits:
  global: # use the name of your build
    sampling_bias_correction: 2.5
    columns: ["country"] # or "region", or an empty array, as appropriate.
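
To decide which column to list there, it can help to look at the metadata header first. This is just a rough sketch using the Python standard library; `metadata_columns` and `missing_columns` are hypothetical helper names, and the file is assumed to be a tab-separated file that may or may not be gzipped.

```python
import csv
import gzip


def metadata_columns(path):
    """Return the header row of a tab-separated metadata file (gzipped or plain)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        # An empty file has no header; return an empty list in that case.
        return next(csv.reader(handle, delimiter="\t"), [])


def missing_columns(path, wanted):
    """Return the names in `wanted` that are absent from the metadata header."""
    present = set(metadata_columns(path))
    return [name for name in wanted if name not in present]
```

Calling `missing_columns("data/focal_metadata_gisaid.tsv.gz", ["country_exposure", "country", "region"])` would return the candidate trait columns that are not in the file; anything it returns should not be listed under `columns`.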

P.S. In your custom-scheme it looks like you are not actually subsampling, instead taking all of the focal and all of the contextual sequences. If so, you can change to subsampling_scheme: all which will speed things up a bit.

P.P.S. A common gotcha is forgetting to update ./my_profiles/custom_build/config.yaml so that it actually refers to ./my_profiles/custom_build/builds.yaml - could you double-check this?
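
If it is useful, that check can be scripted. The sketch below is deliberately crude: it does not parse YAML, it only looks for the builds.yaml path on any non-comment line of config.yaml, and it assumes both files live in the same profile directory. The function name `profile_references_builds` is invented for this example.

```python
from pathlib import Path


def profile_references_builds(profile_dir):
    """Return True if config.yaml in the profile mentions that profile's builds.yaml."""
    profile = Path(profile_dir)
    # Path drops a leading "./", so this matches both "./my_profiles/..." and "my_profiles/...".
    expected = (profile / "builds.yaml").as_posix()
    for line in (profile / "config.yaml").read_text().splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # ignore commented-out entries
        if expected in stripped:
            return True
    return False
```

Run from the top of the ncov directory, `profile_references_builds("./my_profiles/custom_build")` returning `False` would suggest that config.yaml still points at the builds.yaml of the profile it was copied from.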