Error in jobid: 22


I am trying to run some custom/local COVID sequencing data through the Nextstrain custom data tutorial, replacing the suggested GISAID data with local data. I was able to run the custom data tutorial as written, and also with downloaded GISAID data; the problem only appears with my local data. I tried running a minimized version of the local data using the columns that the data preparation tutorial defines as required: strain, date,

I get an error in job 22:

    Note: You did not provide a sequence index, so Augur will generate one. You can generate your own index ahead of time with augur index and pass it with augur filter --sequence-index.
    ERROR: Query contains a column that does not exist in metadata.
    [Mon Sep 5 20:04:05 2022]
    Error in rule filter:
    jobid: 22
    output: results/custom-build/filtered.fasta, results/custom-build/filtered_log.tsv
    log: logs/filtered_custom-build.txt (check log file(s) for error message)

    augur filter \
        --sequences results/custom-build/masked.fasta \
        --metadata results/custom-build/metadata_with_index.tsv \
        --include defaults/include.txt \
        --query '(`reference_data` == '"'"'yes'"'"' & _length >= 27000) | (`custom_data` == '"'"'yes'"'"' & _length >= 27000)' \
        --max-date 2022-09-06 \
        --min-date 2019-12-01 \
        --exclude-ambiguous-dates-by any \
        --exclude defaults/exclude.txt results/custom-build/excluded_by_diagnostics.txt \
        --exclude-where division='USA' \
        --output results/custom-build/filtered.fasta \
        --output-log results/custom-build/filtered_log.tsv 2>&1 | tee logs/filtered_custom-build.txt;

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
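Since the error doesn't say which column is missing, one way to narrow it down is to compare the columns the query references against the header of metadata_with_index.tsv. This is a minimal Python sketch, not part of augur; the regex is a rough approximation of pandas query syntax that only catches backtick-quoted names and bare names directly before a comparison operator, which is enough for the query above.

```python
from __future__ import annotations

import re
import sys

# Matches either a backtick-quoted column name (group 1) or a bare
# identifier immediately followed by a comparison operator (group 2).
_COLUMN_PATTERN = re.compile(r"`([^`]+)`|\b([A-Za-z_]\w*)\s*(?:==|!=|>=|<=|>|<)")


def query_columns(query: str) -> set[str]:
    """Return the column names referenced in a pandas-style query string."""
    return {m.group(1) or m.group(2) for m in _COLUMN_PATTERN.finditer(query)}


def missing_query_columns(query: str, header_line: str) -> list[str]:
    """List query columns absent from a tab-separated metadata header line."""
    header = set(header_line.rstrip("\r\n").split("\t"))
    return sorted(query_columns(query) - header)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (script name is hypothetical):
    #   python check_query.py results/custom-build/metadata_with_index.tsv
    query = ("(`reference_data` == 'yes' & _length >= 27000) | "
             "(`custom_data` == 'yes' & _length >= 27000)")
    with open(sys.argv[1]) as handle:
        print(missing_query_columns(query, handle.readline()))
```

It prints the query columns that are not present in the header, which at least tells you whether `_length` or one of the data flags is the culprit.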

The obvious fix would be to make sure the metadata contains every column used in the query. I tried including everything in my source metadata (including length), but it doesn't carry over when Nextstrain writes metadata_with_index.tsv. That file has working reference_data / custom_data flags, but only has length values for the reference data. The custom length column I added to the original metadata gets relabeled as “_length_x”. I think this is the source of my issue, but I can't figure out how to get around it. Is there a mismatch between the FASTA and my metadata? I can run this FASTA through Nextclade on the web, and it gets through the masking stage without issue. Help!
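On the relabeling: `_length_x` matches what pandas does by default when two merged tables both contain a column with the same name (`DataFrame.merge` uses `suffixes=("_x", "_y")`). Assuming the workflow builds metadata_with_index.tsv with a merge like that (not confirmed against the workflow source), a `_length` column in my own metadata would clash with the one the workflow adds. A mismatch between FASTA headers and the metadata `strain` column could also plausibly explain custom rows without length values. This minimal Python sketch checks both; `WORKFLOW_ADDED_COLUMNS` and the script name are hypothetical, names are compared exactly as written, and each FASTA ID is taken as the first whitespace-delimited word of the header (as Biopython does).

```python
from __future__ import annotations

import csv
import io
import sys

# Hypothetical list: columns assumed to be added by the workflow itself when it
# writes metadata_with_index.tsv. If your own metadata already has one of these,
# a pandas merge would rename both copies (e.g. _length_x / _length_y).
WORKFLOW_ADDED_COLUMNS = ("_length",)


def fasta_ids(fasta_text: str) -> set[str]:
    """Return FASTA record IDs (first whitespace-delimited token after '>')."""
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def check_inputs(fasta_text: str, metadata_text: str) -> dict[str, list[str]]:
    """Compare FASTA IDs with the metadata 'strain' column and flag column clashes."""
    reader = csv.DictReader(io.StringIO(metadata_text), delimiter="\t")
    columns = reader.fieldnames or []
    if "strain" not in columns:
        raise ValueError(f"metadata has no 'strain' column; found {columns}")
    # Exact string comparison: stray spaces or different naming count as mismatches.
    strains = {row["strain"] for row in reader}
    ids = fasta_ids(fasta_text)
    return {
        "in_fasta_not_metadata": sorted(ids - strains),
        "in_metadata_not_fasta": sorted(strains - ids),
        "colliding_columns": sorted(set(columns) & set(WORKFLOW_ADDED_COLUMNS)),
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_inputs.py my_sequences.fasta my_metadata.tsv
    with open(sys.argv[1]) as fasta, open(sys.argv[2], newline="") as meta:
        for key, values in check_inputs(fasta.read(), meta.read()).items():
            print(f"{key}: {len(values)}", values[:10])
```

If `colliding_columns` is non-empty, dropping that column from my own metadata and letting the workflow compute it would be the first thing to try; non-empty ID lists point to a naming mismatch instead.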

Complete log: 2022-09-05T200355.712908.snakemake.log (Google Drive)