Error in Job 3: Exporting data files for for auspice


I’m trying to use the tool on COVID-19 samples, but I’m running into the following error:

  Job 3: Exporting data files for for auspice

    augur export v2 \
        --tree results/global/tree.nwk \
        --metadata data/metadata.tsv \
        --node-data results/global/branch_lengths.json \
                    results/global/nt_muts.json \
                    results/global/aa_muts.json \
                    results/global/subclades.json \
                    results/global/clades.json \
                    results/global/recency.json \
                    results/global/traits.json \
        --auspice-config my_profiles/covid/my_auspice_config.json \
        --include-root-sequence \
        --colors results/global/colors.tsv \
        --lat-longs defaults/lat_longs.tsv \
        --title 'Genomic epidemiology of novel coronavirus - Global subsampling' \
        --description my_profiles/covid/ \
        --output results/global/ncov_with_accessions.json 2>&1 | tee logs/export_global.txt

    Validating schema of 'results/global/aa_muts.json'...
    Traceback (most recent call last):
      File "/home/charbel/miniconda3/envs/nextstrain/bin/augur", line 10, in <module>
      File "/home/charbel/miniconda3/envs/nextstrain/lib/python3.8/site-packages/augur/", line 10, in main
    return argv[1:] )
      File "/home/charbel/miniconda3/envs/nextstrain/lib/python3.8/site-packages/augur/", line 75, in run
      File "/home/charbel/miniconda3/envs/nextstrain/lib/python3.8/site-packages/augur/", line 22, in run
    return run_v2(args)
      File "/home/charbel/miniconda3/envs/nextstrain/lib/python3.8/site-packages/augur/", line 903, in run_v2
    node_data, node_attrs, node_data_names, metadata_names = parse_node_data_and_metadata(T, args.node_data, args.metadata)
      File "/home/charbel/miniconda3/envs/nextstrain/lib/python3.8/site-packages/augur/", line 863, in parse_node_data_and_metadata
    if node["strain"] in node_attrs: # i.e. this node name is in the tree
    KeyError: 'strain'

What could be causing this error? I’ve double-checked my metadata and input sequences many times, and everything seems fine.

Note that I’ve run the pipeline on the example data and it worked fine.

Hi @cgem. The error looks like it’s coming from an improperly formatted node-data file, where a node is missing a name ('strain'). If nothing looks wrong in those files, feel free to email them to us and we can take a look. It’s also worth checking the output messages from the earlier commands in the pipeline for anything unusual that could have produced a malformed node-data file.
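
In case it helps with the inspection, here is a rough Python sketch for a first pass over the inputs. The traceback doesn't show which input the `node` without a 'strain' key came from, so the sketch looks at both the `--metadata` file and the `--node-data` files. It assumes the metadata is a tab-separated file whose header should contain a `strain` column, and that each node-data JSON has a top-level `"nodes"` object keyed by node name. The function names `metadata_has_strain_column` and `node_data_problems` are made up for this example and are not part of augur; the paths are the ones from the command above.

```python
import csv
import json
from pathlib import Path


def metadata_has_strain_column(metadata_path):
    """Return True if the TSV header has a column named exactly 'strain'."""
    with open(metadata_path, newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle, delimiter="\t"), [])
    return "strain" in header


def node_data_problems(json_path):
    """Return a list of readable problems found in one node-data JSON file."""
    with open(json_path, encoding="utf-8") as handle:
        data = json.load(handle)
    # Assumption: node-data files keep per-node values under a "nodes" object.
    nodes = data.get("nodes") if isinstance(data, dict) else None
    if not isinstance(nodes, dict):
        return ["no top-level 'nodes' object"]
    problems = []
    for name, attrs in nodes.items():
        if not name.strip():
            problems.append("node with an empty name")
        if not isinstance(attrs, dict):
            problems.append(f"node {name!r} does not map to an object")
    return problems


if __name__ == "__main__":
    # Paths copied from the command in the post; adjust for your own build.
    metadata = Path("data/metadata.tsv")
    node_data_files = [
        Path("results/global") / name
        for name in (
            "branch_lengths.json", "nt_muts.json", "aa_muts.json",
            "subclades.json", "clades.json", "recency.json", "traits.json",
        )
    ]
    if metadata.exists() and not metadata_has_strain_column(metadata):
        print(f"{metadata}: no column named 'strain' in the header")
    for path in node_data_files:
        if path.exists():
            for problem in node_data_problems(path):
                print(f"{path}: {problem}")
```

If this prints nothing, the files pass these basic checks and the problem is probably something subtler, so sending the files over is still the best next step.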