Error in rule colours: jobid 5

Hi,

I’m currently getting started with nanopore sequencing for COVID strain testing at SickKids DPLM. I’m used to Illumina WES/WGS sequencing, so this is totally new to me.

I ran the installation and setup on our CentOS 7 HPC system with no issues, using my own installation of miniconda3.

Unfortunately, when trying to run the example step:
snakemake --cores 1 --profile ./my_profiles/getting_started

I’m getting the following error:
[Wed Oct 14 06:09:17 2020]
Job 5: Constructing colors file

    python3 scripts/assign-colors.py             --ordering defaults/color_ordering.tsv             --color-schemes defaults/color_schemes.tsv             --output results/global/colors.tsv             --metadata data/example_metadata.tsv 2>&1 | tee logs/colors_global.txt

Traceback (most recent call last):
  File "scripts/assign-colors.py", line 22, in <module>
    for line in f.readlines():
  File "/hpf/largeprojects/pray/llau/programs/miniconda3/miniconda3_2020/envs/nextstrain/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 7071: ordinal not in range(128)
[Wed Oct 14 06:09:18 2020]
Error in rule colors:
jobid: 5
output: results/global/colors.tsv
log: logs/colors_global.txt (check log file(s) for error message)
shell:

    python3 scripts/assign-colors.py             --ordering defaults/color_ordering.tsv             --color-schemes defaults/color_schemes.tsv             --output results/global/colors.tsv             --metadata data/example_metadata.tsv 2>&1 | tee logs/colors_global.txt
    
    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

The log file shows the same error as above:
(nextstrain) [llau@qlogin11 logs]$ cat colors_global.txt
Traceback (most recent call last):
  File "scripts/assign-colors.py", line 22, in <module>
    for line in f.readlines():
  File "/hpf/largeprojects/pray/llau/programs/miniconda3/miniconda3_2020/envs/nextstrain/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 7071: ordinal not in range(128)

I’ve tried adding an explicit encoding when opening the file:
with open(args.ordering, 'r+', encoding='utf-8') as f:
Unfortunately, that caused more errors downstream when writing the output:
f.write(trait_name + "\t" + trait_value + "\t" + color + "\n")

Any help would be greatly appreciated! Thanks so much!
Lynette

Welcome, @llau! I believe the issue here is that your HPC system is using an ASCII (not UTF-8) locale (like C) but the assign-colors.py script assumes a UTF-8 locale (like en_CA.UTF-8).

For troubleshooting purposes, can you run the locale command on your HPC system and paste the output here?

The long-term fix is to update the assign-colors.py script to be explicit about its file encoding instead of assuming the locale’s encoding will be UTF-8. As a workaround in the meantime, however, you can try running snakemake after overriding the default locale to use a UTF-8 one. For example:

LC_ALL=en_CA.UTF-8 snakemake --cores 1 --profile ./my_profiles/getting_started

We fixed a similar issue in Augur itself earlier this year, but your problem seems to be within the ncov-specific assign-colors.py.
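
To make the long-term fix concrete, here is a minimal sketch of what being explicit about the encoding could look like. It assumes the script reads and writes tab-separated text with plain open() calls. The helper names (read_lines, write_rows) and the demo row are hypothetical and not taken from assign-colors.py. The point is that both the read and the write pass encoding="utf-8", so neither depends on the locale.

```python
import os
import tempfile


def read_lines(path):
    """Read a text file as UTF-8 regardless of the active locale."""
    with open(path, "r", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def write_rows(path, rows):
    """Write tab-separated rows as UTF-8 regardless of the active locale."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write("\t".join(row) + "\n")


if __name__ == "__main__":
    # Hypothetical demo row with a non-ASCII character:
    # "\u00e3" (a with tilde) is the two bytes 0xc3 0xa3 in UTF-8.
    rows = [("location", "S\u00e3o Paulo", "#4068CF")]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "colors.tsv")
        write_rows(path, rows)
        # ascii() keeps the printout safe even if stdout is ASCII-only.
        print(ascii(read_lines(path)))
```

Opening only the input with an explicit encoding is probably not enough: any output file opened without one would still fall back to the locale's encoding, which may be why the write step then started failing.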

Let me know what locale says and whether the workaround above works for you.

Hi trs! Thanks so much for your help!!!

(nextstrain) [llau@qlogin11 ~]$ locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=C.UTF-8
LC_CTYPE="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_TIME="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_PAPER="C.UTF-8"
LC_NAME="C.UTF-8"
LC_ADDRESS="C.UTF-8"
LC_TELEPHONE="C.UTF-8"
LC_MEASUREMENT="C.UTF-8"
LC_IDENTIFICATION="C.UTF-8"
LC_ALL=
(nextstrain) [llau@qlogin11 ~]$

I’m trying the workaround now!

Ah, I think these warning messages from locale are telling. The C.UTF-8 locale that’s set is a UTF-8 locale (so that’s good!), but it appears unsupported on your HPC system (so that’s bad!). It appears this means that something (Python or the system) ends up falling back to the basic C locale, which uses ASCII.
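
If you want to see the fallback from inside Python, the small sketch below reports the encoding that open() uses when none is given, then reproduces the same class of error on a UTF-8 sequence that starts with byte 0xc3. The sample string is my own illustration and not the actual content of the ncov files.

```python
import locale

# Encoding that open() uses when no encoding argument is given.
# Under a plain C/POSIX locale on Python 3.6 this is typically an ASCII codec.
print("default text encoding:", locale.getpreferredencoding(False))

# "\u00e9" (e with acute accent) encoded as UTF-8 is the two bytes 0xc3 0xa9.
data = "caf\u00e9".encode("utf-8")

try:
    data.decode("ascii")
except UnicodeDecodeError as err:
    # Same class of error as in the traceback: byte 0xc3 is outside ASCII.
    print("ascii decode failed:", err)

# Decoding explicitly as UTF-8 succeeds whatever the locale is.
text = data.decode("utf-8")
print("utf-8 decode ok, length:", len(text))
```

Running this in the same environment as snakemake should show whether Python has fallen back to an ASCII codec there.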

Thanks so much @trs!

Good to know. I have contacted our HPC admins for help, as I’m beta testing their rollout of CentOS 7.

I have included one of our sequenced samples in the GISAID FASTA and metadata files. Do you know approximately how long the run will take and how much RAM it needs? I used 4 cores, but the job was killed after ~10 hrs because it exceeded the 32 GB I requested. I have tried again with 100 GB…

Thanks again for the help!

Lynette

@llau I’d be curious to hear if you got this working or not. 🙂

Apologies for missing your question about the runtime and RAM. I don’t have a good handle on the current numbers for those, but we could probably dig up some metrics from our own recent runs if they’d still be helpful.

Hi @trs!

Yes! I did! Thanks so much! I managed to get it working, and it finished in ~10 hrs using 64 GB and 4 threads! But this is using all the GISAID sequences along with ours.

Thanks so much again!

Awesome! Glad to hear it’s working!