Can Nextclade quality-control other poxvirus genomes?

In our study we want to perform a comparative genomic analysis of monkeypox virus and other poxviruses, so we want to run quality control on all genomes using Nextclade. Monkeypox virus genomes return qc.overallScore results, but other poxviruses (e.g. Nile crocodilepox virus, NC_008030) fail with this message:

2022-09-19 16:49:20.138 [W] In sequence #32 ‘NC008030’: Unable to align: no seed matches. Details: number of seed matches: 1. Note that this sequence will not be included in the results.

Nextclade's alignment algorithm, mutation detection and the calculation of its various metrics are suited mostly to samples that are similar to the reference sequence.

In your case, the virus diverges from monkeypox so much that Nextclade cannot even find seeds for the alignment. Seeds are short fragments that Nextclade compares between the reference and the query sequence; where they match, it can roughly align the two sequences, which makes the process very fast. Your log reports only a single seed match, which is not enough, so aligning this virus against a monkeypox reference will probably not work.
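To illustrate why a divergent genome yields almost no seeds, here is a minimal sketch of exact k-mer seed matching. It is not Nextclade's implementation: the seed length `k=21`, the sampling `step=100` and the `min_matches` threshold are hypothetical values chosen for illustration, and the real algorithm differs in detail. Each match pairs a query position with a unique reference position, and the difference between the two gives the rough offset used to place the query on the reference.

```python
def kmer_index(seq, k):
    """Map every k-mer in seq to the list of positions where it occurs."""
    index = {}
    for i in range(len(seq) - k + 1):
        index.setdefault(seq[i:i + k], []).append(i)
    return index


def seed_matches(ref, qry, k=21, step=100):
    """Sample a k-mer from the query every `step` bases and return
    (query_pos, ref_pos) pairs for k-mers found exactly once in ref.

    k and step are illustrative values, not Nextclade's parameters.
    """
    index = kmer_index(ref, k)
    matches = []
    for q in range(0, len(qry) - k + 1, step):
        hits = index.get(qry[q:q + k], [])
        if len(hits) == 1:  # skip repeated k-mers: they give an ambiguous anchor
            matches.append((q, hits[0]))
    return matches


def can_seed(ref, qry, min_matches=2, **kwargs):
    """Hypothetical rule: require a minimum number of seeds before aligning."""
    return len(seed_matches(ref, qry, **kwargs)) >= min_matches
```

A closely related query (for example, a fragment of the reference) produces many seeds that all agree on the same offset, whereas an unrelated or highly divergent sequence produces few or none, which is the situation the warning above describes.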

Nextclade determines the reference sequence and other virus-specific parameters from a so-called "dataset". For example, you are using one of the monkeypox datasets, while someone else might be using a SARS-CoV-2 dataset, which would not work for monkeypox.

You could add a new dataset by providing a directory with dataset files. You could take the monkeypox dataset as a starting point and modify it to suit your needs: for example, replace the reference sequence (reference.fasta) and the gene annotation (genemap.gff), and create a reference tree (tree.json) that better suit your virus.
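The approach of starting from the monkeypox dataset can be sketched as a small script that copies a local dataset directory and overwrites selected files. This is a hypothetical helper, not part of Nextclade: it assumes you already have a local copy of a monkeypox dataset and your own replacement files, and it only handles the three files named above. Any other files in the copied dataset are kept unchanged and may also need editing for a different virus.

```python
import shutil
from pathlib import Path

# Files named in the discussion that usually need replacing for a new virus.
# Other files in the copied dataset are kept as-is and may also need editing.
REPLACEABLE = ("reference.fasta", "genemap.gff", "tree.json")


def scaffold_dataset(template_dir, new_dir, replacements):
    """Copy an existing dataset directory and overwrite selected files.

    template_dir: local copy of a dataset (e.g. a monkeypox dataset).
    new_dir: destination for the new dataset; must not exist yet.
    replacements: mapping from dataset file name to the path of the new file.
    """
    template_dir, new_dir = Path(template_dir), Path(new_dir)
    unknown = set(replacements) - set(REPLACEABLE)
    if unknown:
        raise ValueError(f"unexpected dataset files: {sorted(unknown)}")
    # Copy everything first, so files we do not replace are carried over.
    shutil.copytree(template_dir, new_dir)
    for name, src in replacements.items():
        shutil.copyfile(src, new_dir / name)
    return new_dir
```

For example, `scaffold_dataset("mpxv_dataset", "crocodilepox_dataset", {"reference.fasta": "NC_008030.fasta"})` would produce a copy whose reference is the crocodilepox genome (all paths here are hypothetical). The gene map and tree still have to be rebuilt to match the new reference.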

The current datasets are stored in this GitHub repository:

And this is the latest hMPXV dataset:

The reference trees are prepared using Nextstrain Augur here:

More information about datasets is available in the documentation: