Using influenza datasets in clades.nextstrain.org

jmediavi · March 11, 2025, 1:41pm

Hello, I have experience using https://clades.nextstrain.org/ for SARS-CoV-2 genome sequence analysis, but am having difficulty with influenza A.

I am trying to use it primarily to identify mutations in partial HA and NA fragments.

For SARS-CoV-2, you can upload a partial sequence (e.g. a partial spike gene fragment) and get a result. When I try the same with different influenza A reference datasets, I get the error message “Unable to align: seed alignment was unable to find any matches that are long enough.”

I tried using a full-length NA segment against the Influenza A H1N1pdm NA reference dataset, and it still gave the same error.

Is it necessary to have the full-length HA or NA sequence for it to work?

Thank you in advance,
Jose

rneher · March 15, 2025, 10:33am

hi @jmediavi

partial sequences for HA and NA should definitely work. Could you share one such example?

richard

jmediavi · March 19, 2025, 9:22pm

Hi Richard,

Sorry for the delay in responding. Attached is an example of a Sanger fragment from the NA gene of an H1N1 strain.

The sequence is also pasted below for convenience.
It is 525 bp long, but when I upload it to Nextclade using the Influenza A H1N1pdm NA reference dataset, I get the following error message:
When calculating seed matches: Unable to align: seed alignment was unable to find any matches that are long enough. Only matches of at least 40 nucleotides long are considered (configurable using ‘min match length’ CLI flag or dataset property). This is likely due to low quality of the provided sequence, or due to using incorrect reference sequence.

Please let me know if I am doing something wrong…

Thanks,
Jose

83_R-NA_1157_R_B11.ab1
TTCTCCCTATCNNNACACCATTGCCGTATTTAAATGAAAATCCCTTTACCCCATTTGCTCCATTAGACGATACTGGGCCACAACTGCCTGTCTTATCATTAGGGCGTGGATTGTCTCCGAAAACCCCACTGCATATGTATCCCATCTGATATTCCAGATTCTGGTTGAAAGACACCCAAGGTCGATTTGAGCCATGCCAATTATCCCTGCACACACATGTGATTTCACTAGAATCAGGGTAACAGGAGCATTCTTCATAGTGATAATTAGGGGCCTTCATTTCGACTGATTTGGTTATCTTTCCCTTCTCTATTCTGAAGATTTTGTATGAGGCCTGTCCATCACTTGGTCCATCGGTCATTATGGTAAAGCAAGAACCATTTACACATGCACATTCAGACTCTTGTGTTCTCAATATCTTATTCCTCCAACTCTTGATAGTGTCTGTTATTATGCCATTGTATTTTAACACAGCCACTGCCCCATTGTCTGGGCCANAAATTCCNATTNNTAGCCAATTGGNGC

(attachments)

influenza test.txt (550 Bytes)

rneher · March 20, 2025, 7:19am

I think your sequence is reverse complemented. Try rerunning with the reverse complement of your sequence.

best,
richard
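For readers who hit the same error, here is a minimal sketch of how a reverse complement can be produced in plain Python before re-uploading. The function name `reverse_complement` and the variable `fragment` are illustrative, not part of Nextclade; the table covers the standard IUPAC nucleotide codes so that ambiguous bases such as `N` in Sanger reads survive the conversion.

```python
# Complement table for IUPAC nucleotide codes (upper and lower case).
# Ambiguity codes map to their complements, e.g. R (A/G) <-> Y (C/T).
_COMPLEMENT = str.maketrans(
    "ACGTURYSWKMBDHVNacgturyswkmbdhvn",
    "TGCAAYRSWMKVHDBNtgcaayrswmkvhdbn",
)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide sequence."""
    # Complement each base, then reverse the whole string.
    return seq.translate(_COMPLEMENT)[::-1]


# First 19 bases of the fragment posted in this thread.
fragment = "TTCTCCCTATCNNNACACC"
print(reverse_complement(fragment))
```

The printed sequence can be pasted into the web interface or written to a FASTA file in place of the original read.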

jmediavi · March 20, 2025, 12:46pm

Hi Richard,

Well, that’s embarrassing! I thought Nextclade would interpret a sequence in either orientation. Thanks for clarifying!

Regards
Jose

rneher · March 20, 2025, 1:06pm

no worries. For some viruses we allow reverse complements, but for flu this is apparently not switched on.
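When the orientation of a read is unknown, a quick local check before uploading can save a round trip. The following is a rough sketch only, and not how Nextclade itself aligns sequences: it counts exact k-mers shared with a reference in each orientation. The names `guess_orientation` and `kmers`, the default `k = 12`, and the reference string in the usage example are all assumptions made for illustration.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement for A/C/G/T/N sequences."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set:
    """Return the set of all length-k substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def guess_orientation(query: str, reference: str, k: int = 12) -> str:
    """Guess read orientation by counting exact k-mers shared with the reference."""
    ref_kmers = kmers(reference, k)
    forward_hits = len(kmers(query, k) & ref_kmers)
    reverse_hits = len(kmers(reverse_complement(query), k) & ref_kmers)
    if forward_hits == reverse_hits:
        return "unknown"
    return "forward" if forward_hits > reverse_hits else "reverse"


# Hypothetical reference string used only to exercise the function.
reference = "ATGAATCCAAACCAAAAGATAATAACCATTGGTTCGGTCTGTATGACAATTGGAATGGCTAAC"
read = reverse_complement(reference[10:40])
print(guess_orientation(read, reference))
```

Exact k-mer matching is sensitive to sequencing errors and divergence, so a result of "unknown" does not mean the read is unusable; it only means this crude check could not decide.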

jmediavi · March 20, 2025, 1:18pm

One more question: my segment was identified as NA clade C.5.3.1.

I cannot find much information on this clade. Is there a list somewhere of known influenza A clades that would help put it in context?

Thanks again,
Jose

rneher · March 20, 2025, 2:17pm

these are just labels; there is no phenotypic significance attached to them.
