Error in Job 3: Exporting data files for for auspice


I’m trying to use the tool on COVID-19 samples, however I’m facing the following error:

  Job 3: Exporting data files for for auspice

    augur export v2 \
        --tree results/global/tree.nwk \
        --metadata data/metadata.tsv \
        --node-data results/global/branch_lengths.json results/global/nt_muts.json results/global/aa_muts.json results/global/subclades.json results/global/clades.json results/global/recency.json results/global/traits.json \
        --auspice-config my_profiles/covid/my_auspice_config.json \
        --include-root-sequence \
        --colors results/global/colors.tsv \
        --lat-longs defaults/lat_longs.tsv \
        --title 'Genomic epidemiology of novel coronavirus - Global subsampling' \
        --description my_profiles/covid/ \
        --output results/global/ncov_with_accessions.json 2>&1 | tee logs/export_global.txt

    Validating schema of 'results/global/aa_muts.json'...
    Traceback (most recent call last):
      File "/home/charbel/miniconda3/envs/nextstrain/bin/augur", line 10, in <module>
      File "/home/charbel/miniconda3/envs/nextstrain/lib/python3.8/site-packages/augur/", line 10, in main
        return argv[1:] )
      File "/home/charbel/miniconda3/envs/nextstrain/lib/python3.8/site-packages/augur/", line 75, in run
      File "/home/charbel/miniconda3/envs/nextstrain/lib/python3.8/site-packages/augur/", line 22, in run
        return run_v2(args)
      File "/home/charbel/miniconda3/envs/nextstrain/lib/python3.8/site-packages/augur/", line 903, in run_v2
        node_data, node_attrs, node_data_names, metadata_names = parse_node_data_and_metadata(T, args.node_data, args.metadata)
      File "/home/charbel/miniconda3/envs/nextstrain/lib/python3.8/site-packages/augur/", line 863, in parse_node_data_and_metadata
        if node["strain"] in node_attrs: # i.e. this node name is in the tree
    KeyError: 'strain'

What could be causing this error? I have double-checked my metadata and input sequences many times, and everything seems fine.

Note that I’ve run the pipeline on the example data and it worked fine.

Hi @cgem. The error looks like it's coming from an improperly formatted node-data file, where a node is missing a name ('strain'). If nothing appears amiss when you look at those files, feel free to email us the files and we can take a look. It might also be worth checking the output messages from the earlier commands in the pipeline for anything that could have produced a malformed node-data file.
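One way to follow that suggestion without reading every file by eye is a quick structural check of each node-data JSON passed to `--node-data`. This is a minimal sketch that assumes the node-data layout is a top-level `"nodes"` object mapping each node name to an object of attributes; `check_node_data` is a hypothetical helper, not an augur command.

```python
import json


def check_node_data(path):
    """Return a list of structural problems found in one node-data JSON file.

    Assumes (hypothetically) the layout {"nodes": {"<node name>": {...}}}.
    An empty list means nothing obvious is wrong.
    """
    with open(path) as fh:
        try:
            data = json.load(fh)
        except json.JSONDecodeError as err:
            return [f"{path}: not valid JSON ({err})"]

    # The per-node attributes are expected under a top-level "nodes" object.
    nodes = data.get("nodes") if isinstance(data, dict) else None
    if not isinstance(nodes, dict):
        return [f"{path}: missing a top-level 'nodes' object"]

    problems = []
    for name, attrs in nodes.items():
        # JSON object keys are always strings, so only emptiness needs checking.
        if not name.strip():
            problems.append(f"{path}: node with an empty name")
        if not isinstance(attrs, dict):
            problems.append(f"{path}: node {name!r} has non-object attributes")
    return problems
```

For example, `check_node_data("results/global/aa_muts.json")` can be called on each file from the command above; any non-empty result points at the file worth inspecting more closely.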

Hello @james, thanks for your reply.
It turned out that the error was caused by a different ordering of the samples in the FASTA and metadata files. The pipeline ran to completion after fixing the issue, and I managed to visualize the results in auspice.
I’ll leave the post up for anyone who might face the same issue in the future.

Does the order still matter? How did you find out and go about fixing this?

I'm having the same issue, but since I'm dealing with 20k entries, I'm not sure how to begin fixing it.

Hi @omarkr - the order of entries in FASTA vs metadata doesn’t matter. Could you post the error message you are getting?
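Since the order of entries doesn't matter, one way to begin on a large dataset like the 20k entries mentioned above is to check that every sequence name has exactly one matching metadata row. The sketch below assumes tab-separated metadata whose ID column is named `strain` (the key the failing line in the traceback looks up) and that a sequence name is the FASTA header text up to the first whitespace; `compare_ids` and the other helpers are hypothetical, not part of augur.

```python
import csv
from collections import Counter


def read_fasta_ids(path):
    """Return sequence names: header text after '>' up to the first whitespace."""
    ids = []
    with open(path) as fh:
        for line in fh:
            if line.startswith(">"):
                header = line[1:].strip()
                ids.append(header.split()[0] if header else "")
    return ids


def read_metadata_strains(path, id_column="strain"):
    """Return the values of the ID column from a tab-separated metadata file."""
    with open(path, newline="") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        if reader.fieldnames is None or id_column not in reader.fieldnames:
            raise ValueError(f"{path}: no {id_column!r} column, found {reader.fieldnames}")
        # Short rows yield None for missing fields, so normalise to "".
        return [(row[id_column] or "").strip() for row in reader]


def compare_ids(fasta_path, metadata_path):
    """Report names present in only one file and strains listed more than once."""
    fasta_ids = set(read_fasta_ids(fasta_path))
    strains = read_metadata_strains(metadata_path)
    counts = Counter(strains)  # linear time, fine for tens of thousands of rows
    strain_set = set(strains)
    return {
        "only_in_fasta": sorted(fasta_ids - strain_set),
        "only_in_metadata": sorted(strain_set - fasta_ids),
        "duplicate_strains": sorted(s for s, n in counts.items() if n > 1),
    }
```

Running `compare_ids("data/sequences.fasta", "data/metadata.tsv")` (the FASTA path here is a placeholder) returns three sorted lists; any entries in them are a reasonable starting point before rerunning the workflow.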