Hi,
I’m trying to use the steps in the “ingest” pipeline for Mpox. I have my own FASTA files, which I converted to an NDJSON file using the “fasta-to-ndjson” script. However, after combining the GenBank dataset with my own dataset, I get an error from what I think is the “transform-field-names” script, but it’s difficult to understand exactly why it fails:
Error in rule curate:
jobid: 1
input: data/sequences.ndjson, data/all-geolocation-rules.tsv, defaults/annotations.tsv
output: data/metadata_raw.tsv, results/sequences.fasta
log: logs/curate.txt (check log file(s) for error details)
shell:
(cat data/sequences.ndjson | ./vendored/transform-field-names --field-map "collected"="date" "submitted"="date_submitted" "genbank_accession"="accession" "submitting_organization"="institution" | augur curate normalize-strings | ./vendored/transform-strain-names --strain-regex ^.+$ --backup-fields accession | augur curate format-dates --date-fields date date_submitted --expected-date-formats %Y %Y-%m %Y-%m-%d %Y-%m-%dT%H:%M:%SZ | ./vendored/transform-genbank-location | augur curate titlecase --titlecase-fields region country division location --articles and d de del des di do en l la las le los nad of op sur the y --abbreviations USA | ./vendored/transform-authors --authors-field authors --default-value ? --abbr-authors-field abbr_authors | ./vendored/apply-geolocation-rules --geolocation-rules data/all-geolocation-rules.tsv | ./vendored/merge-user-metadata --annotations defaults/annotations.tsv --id-field accession | ./bin/ndjson-to-tsv-and-fasta --metadata-columns accession genbank_accession_rev strain date region country division location host date_submitted sra_accession abbr_authors reverse authors institution --metadata data/metadata_raw.tsv --fasta results/sequences.fasta --id-field accession --sequence-field sequence ) 2>> logs/curate.txt
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Removing output files of failed job curate since they might be corrupted:
data/metadata_raw.tsv, results/sequences.fasta
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-03-19T120905.859379.snakemake.log
My sequences in the NDJSON format look like this:
{"strain":"202401196","reference":"NC_063383.1","location":"Norway","collected":"2024-01-01","sequence":"ATTTTACTATTTTATTTAG...."
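One check that might help narrow this down is a small script like the sketch below. It reads the combined NDJSON line by line and reports lines that are blank, are not valid JSON, or lack the field names on the left-hand side of the --field-map in the curate rule. The names check_ndjson, check_file and EXPECTED_FIELDS are made up for this example, and treating those fields as required is only an assumption. I don't know whether a missing field is actually what makes the pipeline fail.

```python
import json

# Left-hand names from the --field-map in the curate rule, plus the sequence field.
# Assumption: treating these as "expected" is a guess made for this check only.
EXPECTED_FIELDS = [
    "collected",
    "submitted",
    "genbank_accession",
    "submitting_organization",
    "sequence",
]


def check_ndjson(lines, expected_fields=EXPECTED_FIELDS):
    """Return (line_number, problem) tuples for an iterable of NDJSON lines."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append((number, "blank line"))
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as error:
            problems.append((number, f"invalid JSON: {error.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "not a JSON object"))
            continue
        missing = [field for field in expected_fields if field not in record]
        if missing:
            problems.append((number, "missing fields: " + ", ".join(missing)))
    return problems


def check_file(path):
    """Print every problem found in the NDJSON file at the given path."""
    with open(path, encoding="utf-8") as handle:
        for number, problem in check_ndjson(handle):
            print(f"line {number}: {problem}")
```

Calling check_file("data/sequences.ndjson") would then list the line numbers worth looking at, which can be compared against whatever ends up in logs/curate.txt.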