Nextclade CLI error - Alignment matrix size exceeds maximum value

I’m running the nextclade CLI (latest version) over the recent data from GISAID EpiCoV (Submitted from 2024-12-23), and it threw about 15 warning messages that I hadn’t seen before, e.g.

2025-01-04 00:27:26.782 [W] nextclade_ordered_writer.rs:169: In sequence #6650 ‘hCoV-19/USA/CA-PFZ01901/2024|EPI_ISL_19640838|2024-09-12’: When processing sequence #6650 ‘hCoV-19/USA/CA-PFZ01901/2024|EPI_ISL_19640838|2024-09-12’: Alignment matrix size 1048210062 exceeds maximum value 500000000. The threshold can be adjusted using CLI flag ‘--max-band-area’ or using ‘maxBandArea’ field in the dataset’s pathogen.json. Note that this sequence will not be included in the results.

My command line was:
nextclade run -d sars-cov-2 --input-tree nightly.json -q --output-tsv=C:/Dev/nextclade-tools/output\ALL-2024-12-23-1.tsv C:/Dev/nextclade-tools/input\ALL-2024-12-23-1.fasta

Not a huge issue (among almost 10K samples) and probably triggered by something new in the data, but I thought I would raise it here.

I’ve just processed the last week of data and there were no further occurrences of that issue among a further ~10K samples. So it’s probably not worth putting much time into.

Hi @mike_honey

With this warning Nextclade reports that it is unable to align certain sequences. And without alignment we cannot really do much with them.

This can happen when input sequences are too divergent or of very low quality. Raw, uncurated data taken from databases can contain mistakes, for example viruses from animal hosts or sequences of an entirely different virus. This is expected to some extent.

When this happens, the TSV file will contain the reasons for the failures in the error and warning columns, so you can tell which samples were not processed (and you could then extract them by name and study them separately if that’s something of interest).
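
As a rough sketch of that extraction step, the Python below reads a Nextclade TSV and collects the names of rows whose error column is non-empty. The column names `seqName` and `errors` are assumptions here, so check them against the header of your own file. The helper `failed_sequences` and the sample rows are made up for illustration.

```python
import csv
import io

def failed_sequences(tsv_file, name_col="seqName", error_col="errors"):
    """Return (name, error) pairs for rows with a non-empty error column.

    name_col and error_col are assumed column names; adjust to your TSV header.
    """
    reader = csv.DictReader(tsv_file, delimiter="\t")
    failed = []
    for row in reader:
        error = (row.get(error_col) or "").strip()
        if error:
            failed.append((row[name_col], error))
    return failed

# Tiny made-up TSV standing in for real Nextclade output
sample = io.StringIO(
    "seqName\tclade\terrors\n"
    "sample_A\t24A\t\n"
    "sample_B\t\tAlignment matrix size exceeds maximum value\n"
)
for name, error in failed_sequences(sample):
    print(name, "->", error)
```

For a real run you would pass an open file instead, e.g. `open("ALL-2024-12-23-1.tsv", newline="", encoding="utf-8")`, and then use the returned names to pull the matching records out of the input FASTA.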

You could also try giving Nextclade more computational power by increasing --max-band-area. This will significantly increase memory and CPU usage, but it lets the alignment algorithm work harder on these kinds of problematic sequences.
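
To get a feel for how large that threshold would need to be, here is a minimal sketch that scans saved warning lines for the "Alignment matrix size N exceeds maximum value M" message and reports the largest N. It assumes you have captured the warnings as text; the helper name `largest_requested_area` is hypothetical, and the pattern is based only on the message quoted above.

```python
import re

# Pattern taken from the wording of the warning quoted in this thread
PATTERN = re.compile(r"Alignment matrix size (\d+) exceeds maximum value (\d+)")

def largest_requested_area(log_lines):
    """Return the largest matrix size mentioned in the warnings, or None if there are none."""
    sizes = []
    for line in log_lines:
        match = PATTERN.search(line)
        if match:
            sizes.append(int(match.group(1)))
    return max(sizes, default=None)

log = [
    "[W] ... Alignment matrix size 1048210062 exceeds maximum value 500000000. ...",
    "[I] some unrelated line",
]
print(largest_requested_area(log))
```

For the warning above, the reported size is roughly twice the default limit, so --max-band-area would need to be at least that large for the sequence to be attempted at all. That does not guarantee the alignment will then succeed.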

Another funny situation we have encountered is that a small percentage of the sequences people have are reverse-complemented. You could use the flag --retry-reverse-complement so that, if alignment fails, Nextclade reverse-complements the original sequence and tries to align that instead.

This could let you “squeeze” out a few more sequences, even if they are slightly broken.
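
For anyone unfamiliar with the term, reverse-complementing means reading the sequence backwards and swapping each base for its pair (A with T, C with G). The sketch below only illustrates the idea and is not Nextclade's implementation. As a simplification, it leaves N and all other ambiguity codes unchanged.

```python
# Translation table for the four standard bases, in upper and lower case
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Swap each base for its complement, then reverse the string.

    Characters other than A, C, G, T (e.g. N) are passed through unchanged.
    """
    return seq.translate(COMPLEMENT)[::-1]

print(reverse_complement("ATGCCN"))
```

Applying the function twice returns the original sequence, which is a quick sanity check if you want to test whether a failed sample was simply submitted in the opposite orientation.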