Problems with `augur traits` and `augur frequencies` using supplied sequences

Hi!
Thanks for writing such a detailed tutorial, as well as all of your great software. I was able to follow it and successfully run the getting_started example workflow using conda envs (rather than Docker).

I am now trying to run my own data (800 seqs) without context using a very simple custom build, but am running into the following error. Might someone be able to help me troubleshoot?

My metadata.tsv looks like this:

name	date	nosocomial_acquisition	sex	virus	region
a39-ob_3_05/2020-11-16	2020-11-16	Staff	F	SARS-CoV-2	Europe

Snakemake output

[Mon Jun 21 18:40:27 2021]
Job 5:
        Refining tree
          - estimate timetree
          - use skyline coalescent timescale
          - estimate marginal node dates
 


        augur refine
            --tree results/default-build/tree_raw.nwk
            --alignment results/default-build/aligned.fasta
            --metadata results/default-build/metadata_adjusted.tsv.xz
            --output-tree results/default-build/tree.nwk
            --output-node-data results/default-build/branch_lengths.json
            --root Wuhan/Hu-1/2019
            --timetree
            --clock-rate 0.0008
            --clock-std-dev 0.0004
            --coalescent skyline
            --date-inference marginal
            --divergence-unit mutations
            --date-confidence
            --no-covariance
            --clock-filter-iqd 4 2>&1 | tee logs/refine_default-build.txt
 
Activating conda environment: /gpfs2/well/bag/uzv018/sars2/nextstrain/ncov/.snakemake/conda/f794c0c2346483e881efe86baebefe4d
augur refine is using TreeTime version 0.8.1

11.26   TreeTime.reroot: with method or node: Wuhan/Hu-1/2019
Traceback (most recent call last):   
  File "/gpfs2/well/bag/uzv018/sars2/nextstrain/ncov/.snakemake/conda/f794c0c2346483e881efe86baebefe4d/bin/augur", line 10, in <module>
    sys.exit(main())
  File "/gpfs2/well/bag/uzv018/sars2/nextstrain/ncov/.snakemake/conda/f794c0c2346483e881efe86baebefe4d/lib/python3.8/site-packages/augur/__main__.py", line 10, in main
    return augur.run( argv[1:] )
  File "/gpfs2/well/bag/uzv018/sars2/nextstrain/ncov/.snakemake/conda/f794c0c2346483e881efe86baebefe4d/lib/python3.8/site-packages/augur/__init__.py", line 75, in run
    return args.__command__.run(args)
  File "/gpfs2/well/bag/uzv018/sars2/nextstrain/ncov/.snakemake/conda/f794c0c2346483e881efe86baebefe4d/lib/python3.8/site-packages/augur/refine.py", line 206, in run
    tt = refine(tree=T, aln=aln, ref=ref, dates=dates, confidence=args.date_confidence,
  File "/gpfs2/well/bag/uzv018/sars2/nextstrain/ncov/.snakemake/conda/f794c0c2346483e881efe86baebefe4d/lib/python3.8/site-packages/augur/refine.py", line 42, in refine
    tt.clock_filter(reroot=reroot, n_iqd=clock_filter_iqd, plot=False) #use whatever was specified
  File "/gpfs2/well/bag/uzv018/sars2/nextstrain/ncov/.snakemake/conda/f794c0c2346483e881efe86baebefe4d/lib/python3.8/site-packages/treetime/treetime.py", line 327, in clock_filter
    self.reroot(root='least-squares' if reroot=='best' else reroot, covariation=False, clock_rate=fixed_clock_rate)
  File "/gpfs2/well/bag/uzv018/sars2/nextstrain/ncov/.snakemake/conda/f794c0c2346483e881efe86baebefe4d/lib/python3.8/site-packages/treetime/treetime.py", line 453, in reroot
    raise UnknownMethodError('TreeTime.reroot -- ERROR: unsupported rooting mechanisms or root not found')
treetime.UnknownMethodError: TreeTime.reroot -- ERROR: unsupported rooting mechanisms or root not found

My builds.yaml file looks like this:

inputs:
  - name: sars2-oxfordshire
    metadata: data/data/metadata.tsv
    sequences: data/data/sequences.fasta

files:
  auspice_config: "my_profiles/sars2-oxfordshire/my_auspice_config.json"

frequencies:
  min_date: 2020-11-16
  max_date: 2020-01-04

Thank you,
Bede

Ok, it looks like Wuhan/Hu-1/2019 needs to be present in the input set. I didn’t see this requirement in the getting started guide. So far so good!

I’m running into more issues. All of the input sequences have been generated with the ARTIC bioinformatics protocol, and have <10% N content.

  1. Missing Wuhan ref – addressed.

  2. Error running augur traits – addressed by commenting out a line in main_workflow.smk, as I think @james suggested in issue 426.

[Mon Jun 21 20:48:02 2021]
Job 24: 
        Inferring ancestral traits for ['country_exposure']
          - increase uncertainty of reconstruction by 2.5 to partially account for sampling bias
        


        augur traits
            --tree results/default-build/tree.nwk
            --metadata results/default-build/metadata_adjusted.tsv.xz
            --output results/default-build/traits.json
            --columns country_exposure
            --confidence
            --sampling-bias-correction 2.5 2>&1 | tee logs/traits_default-build.txt
        
Activating conda environment: /gpfs2/well/bag/uzv018/sars2/nextstrain/ncov/.snakemake/conda/f794c0c2346483e881efe86baebefe4d

[Mon Jun 21 20:48:05 2021]
Job 26: Estimating censored KDE frequencies for tips


        augur frequencies
            --method kde
            --metadata results/default-build/metadata_adjusted.tsv.xz
            --tree results/default-build/tree.nwk
            --min-date 2020-11-16
            --max-date 2020-01-04
            --pivot-interval 1
            --pivot-interval-units weeks
            --narrow-bandwidth 0.05
            --proportion-wide 0.0
            --output results/default-build/tip-frequencies.json 2>&1 | tee logs/tip_frequencies_default-build.txt
        
augur traits is using TreeTime version 0.8.1
WARNING: no states found for discrete state reconstruction.
Traceback (most recent call last):
  File "/gpfs2/well/bag/uzv018/sars2/nextstrain/ncov/.snakemake/conda/f794c0c2346483e881efe86baebefe4d/bin/augur", line 10, in <module>
    sys.exit(main())
  File "/gpfs2/well/bag/uzv018/sars2/nextstrain/ncov/.snakemake/conda/f794c0c2346483e881efe86baebefe4d/lib/python3.8/site-packages/augur/__main__.py", line 10, in main
    return augur.run( argv[1:] )
  File "/gpfs2/well/bag/uzv018/sars2/nextstrain/ncov/.snakemake/conda/f794c0c2346483e881efe86baebefe4d/lib/python3.8/site-packages/augur/__init__.py", line 75, in run
    return args.__command__.run(args)
  File "/gpfs2/well/bag/uzv018/sars2/nextstrain/ncov/.snakemake/conda/f794c0c2346483e881efe86baebefe4d/lib/python3.8/site-packages/augur/traits.py", line 179, in run
    mugration_states[node.name][column+'_confidence'] = node.__getattribute__(column+'_confidence')
AttributeError: 'Clade' object has no attribute 'country_exposure_confidence'
  3. Latest error running augur frequencies

I’m not sure where to go from here. Any pointers appreciated.

[Tue Jun 22 08:40:41 2021]
Job 25: Estimating censored KDE frequencies for tips


        augur frequencies
            --method kde
            --metadata results/default-build/metadata_adjusted.tsv.xz
            --tree results/default-build/tree.nwk
            --min-date 2020-11-16
            --max-date 2020-01-04
            --pivot-interval 1
            --pivot-interval-units weeks
            --narrow-bandwidth 0.05
            --proportion-wide 0.0
            --output results/default-build/tip-frequencies.json 2>&1 | tee logs/tip_frequencies_default-build.txt
        
Activating conda environment: /gpfs2/well/bag/uzv018/sars2/nextstrain/ncov/.snakemake/conda/f794c0c2346483e881efe86baebefe4d
Traceback (most recent call last):   
  File "/gpfs2/well/bag/uzv018/sars2/nextstrain/ncov/.snakemake/conda/f794c0c2346483e881efe86baebefe4d/bin/augur", line 10, in <module>
    sys.exit(main())
  File "/gpfs2/well/bag/uzv018/sars2/nextstrain/ncov/.snakemake/conda/f794c0c2346483e881efe86baebefe4d/lib/python3.8/site-packages/augur/__main__.py", line 10, in main
    return augur.run( argv[1:] )
  File "/gpfs2/well/bag/uzv018/sars2/nextstrain/ncov/.snakemake/conda/f794c0c2346483e881efe86baebefe4d/lib/python3.8/site-packages/augur/__init__.py", line 75, in run
    return args.__command__.run(args)
  File "/gpfs2/well/bag/uzv018/sars2/nextstrain/ncov/.snakemake/conda/f794c0c2346483e881efe86baebefe4d/lib/python3.8/site-packages/augur/frequencies.py", line 172, in run
    frequencies = kde_frequencies.estimate(tree)
  File "/gpfs2/well/bag/uzv018/sars2/nextstrain/ncov/.snakemake/conda/f794c0c2346483e881efe86baebefe4d/lib/python3.8/site-packages/augur/frequency_estimators.py", line 1181, in estimate
    frequencies = self.estimate_tip_frequencies_to_proportion(tips, proportion=1.0)
  File "/gpfs2/well/bag/uzv018/sars2/nextstrain/ncov/.snakemake/conda/f794c0c2346483e881efe86baebefe4d/lib/python3.8/site-packages/augur/frequency_estimators.py", line 1094, in estimate_tip_frequencies_to_proportion
    normalized_freq_matrix = self.estimate_frequencies(
  File "/gpfs2/well/bag/uzv018/sars2/nextstrain/ncov/.snakemake/conda/f794c0c2346483e881efe86baebefe4d/lib/python3.8/site-packages/augur/frequency_estimators.py", line 1039, in estimate_frequencies
    density_matrix = cls.get_densities_for_observations(tip_dates, pivots, max_date=max_date, **kwargs)
  File "/gpfs2/well/bag/uzv018/sars2/nextstrain/ncov/.snakemake/conda/f794c0c2346483e881efe86baebefe4d/lib/python3.8/site-packages/augur/frequency_estimators.py", line 984, in get_densities_for_observations
    if (obs < pivots[0] or obs > pivots[-1]) or (max_date is not None and obs > max_date):
IndexError: index 0 is out of bounds for axis 0 with size 0

Are these "POSSIBLE ERROR" warnings about a leaf with no matching sequence expected, or do they indicate a problem?

0.25    ***WARNING: TreeAnc._attach_sequences_to_nodes: NO SEQUENCE FOR LEAF:
        a748-lc_13_52/2020-12-22

0.25    ***WARNING: TreeAnc: 1 nodes don't have a matching sequence in the
        alignment. POSSIBLE ERROR.   

0.28    -TreeAnc.infer_ancestral_sequences with method: probabilistic, joint
0.28    --TreeAnc._ml_anc_joint: type of reconstruction: Joint

Hi @bede, thank you for sharing your progress as you work through these issues. We should definitely make it clearer that the Wuhan/Hu-1/2019 reference sequence must be present in the input, since the workflow passes it to TreeTime as the root, and we should probably check for it as early as possible so you don’t have to get to TreeTime to find out. I’ve created an issue in the ncov repository to at least document this requirement. I also created a separate issue to implement a check for this root sequence in the workflow as early as possible.
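
In the meantime, a check like this can be run by hand before starting the workflow. This is only a rough sketch in plain Python, not code from the ncov workflow: the helper names `fasta_names` and `has_root` are made up for illustration, and it assumes an uncompressed FASTA whose record names are the first whitespace-delimited token of each header line.

```python
def fasta_names(lines):
    """Yield the record name from each FASTA header line.

    `lines` is any iterable of text lines, e.g. an open file handle.
    The name is taken to be the first whitespace-delimited token after '>'.
    """
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:  # skip malformed headers that are just '>'
                yield fields[0]


def has_root(lines, root_name="Wuhan/Hu-1/2019"):
    """Return True if a record named `root_name` appears in the FASTA lines."""
    return any(name == root_name for name in fasta_names(lines))


# Example usage (the path is the one from the builds.yaml in this thread):
# with open("data/data/sequences.fasta") as handle:
#     if not has_root(handle):
#         raise SystemExit("Root sequence Wuhan/Hu-1/2019 is missing from the input")
```

The same root name would also need a matching row in the metadata, which this sketch does not check.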

Regarding the frequencies error, it looks like the problem might be that the max date (Jan 4, 2020) occurs before the min date (Nov 16, 2020), so there are no valid timepoints for Augur to estimate frequencies at. It is a bug in Augur that you didn’t get an error message clearly explaining this issue with the inputs and I’ve created an Augur issue for us to fix this.
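
To make the date problem concrete, here is a minimal sketch of the kind of validation that would have caught it up front. It is not Augur's code: `check_frequency_dates` is a hypothetical helper, and it assumes both values are given as ISO `YYYY-MM-DD` strings like the ones in your builds.yaml.

```python
from datetime import date


def check_frequency_dates(min_date, max_date):
    """Parse two ISO dates and raise ValueError unless min_date < max_date."""
    start = date.fromisoformat(min_date)
    end = date.fromisoformat(max_date)
    if start >= end:
        raise ValueError(
            f"min_date ({start}) must be earlier than max_date ({end}); "
            "otherwise there are no timepoints to estimate frequencies at"
        )
    return start, end


# The values from this thread fail the check:
try:
    check_frequency_dates("2020-11-16", "2020-01-04")
except ValueError as error:
    print(f"Invalid frequencies config: {error}")
```

Any `max_date` later than `min_date` passes, for example `2021-01-04` if that was the year you intended (that is a guess on my part).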

It looks like once we also fix the existing issue with augur traits, resolving the other issues should get you most of the way to a full build. Let us know what happens next though after you update your frequencies dates! :slight_smile: