Query about M2-2 gene annotation discrepancy in Nextstrain RSV-A module

Dear Nextstrain Team,

I hope this message finds you well. I am writing to seek clarification regarding the annotation of the M2-2 gene in the RSV-A module on Nextstrain.

I noticed a discrepancy between the M2-2 gene coordinates in Nextstrain and those in the NCBI reference sequence (PP109421.1):

  • Nextstrain annotation: M2-2 = 8193–8465 (+)
  • NCBI (PP109421.1): M2-2 = 8199–8465 (+)

This 6-nucleotide difference at the start of the gene results in a translation discrepancy:

  • Nextstrain translation: TTMPKIMILPDKYPCSINSILITSNYRVTMYNQKNTLYINQNNQNSHIYPPDQPFNEIHWTSQDLIDATQNFLQHLGITDDIYTIYILVS*
  • NCBI translation: MPKIMILPDKYPCSINSILITSNYRVTMYNQKNTLYINQNNQNSHIYPPDQPFNEIHWTSQDLIDATQNFLQHLGITDDIYTIYILVS* (lacking the initial “TT”).
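
The arithmetic behind this comparison can be checked with a short sketch. It uses only the coordinates and protein strings quoted above, and it assumes the coordinates are 1-based and inclusive (the GenBank convention). The function and variable names are illustrative and do not come from Nextstrain or NCBI tooling.

```python
# Coordinates as quoted in this thread; assumed 1-based and inclusive.
NEXTSTRAIN_M2_2 = (8193, 8465)
NCBI_M2_2 = (8199, 8465)

# Translations as quoted in this thread ("*" marks the stop codon).
NEXTSTRAIN_AA = (
    "TTMPKIMILPDKYPCSINSILITSNYRVTMYNQKNTLYINQNNQNSHIYPPDQPFNEIHWTSQDLIDATQNF"
    "LQHLGITDDIYTIYILVS*"
)
NCBI_AA = (
    "MPKIMILPDKYPCSINSILITSNYRVTMYNQKNTLYINQNNQNSHIYPPDQPFNEIHWTSQDLIDATQNF"
    "LQHLGITDDIYTIYILVS*"
)


def codon_count(start: int, end: int) -> int:
    """Number of codons (stop included) in an inclusive 1-based interval."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"interval {start}-{end} is not a multiple of 3")
    return length // 3


def compare_annotations() -> dict:
    """Summarize how the two M2-2 annotations differ."""
    extra_nt = NCBI_M2_2[0] - NEXTSTRAIN_M2_2[0]
    n_extra_aa = len(NEXTSTRAIN_AA) - len(NCBI_AA)
    return {
        # Extra nucleotides at the 5' end of the Nextstrain annotation.
        "extra_nt": extra_nt,
        # Residues present only in the Nextstrain translation.
        "extra_aa": NEXTSTRAIN_AA[:n_extra_aa],
        # A shift that is a multiple of 3 keeps both annotations in one frame.
        "same_frame": extra_nt % 3 == 0,
        # Everything after the extra residues should match exactly.
        "same_downstream": NEXTSTRAIN_AA[n_extra_aa:] == NCBI_AA,
    }


if __name__ == "__main__":
    for key, value in compare_annotations().items():
        print(f"{key}: {value}")
    print("Nextstrain codons:", codon_count(*NEXTSTRAIN_M2_2))
    print("NCBI codons:", codon_count(*NCBI_M2_2))
```

Because the start positions differ by a multiple of three, both annotations share one reading frame and the same stop codon, so the only difference is the N-terminal extension.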

Could you kindly clarify:

  1. What is the basis for the 8193 start position in Nextstrain (e.g., experimental evidence, historical annotation, or alignment-based inference)?
  2. How should this discrepancy be resolved for downstream analyses? Is there a recommended canonical annotation for RSV-A M2-2?

Thank you for your time and expertise. I greatly appreciate your work in maintaining this invaluable resource.

Best regards,
Jingqi Yang

Dear Jingqi Yang,

Thank you for bringing this to our attention. The underlying reason is that we used to use the annotation of a related strain, LR699737, which (for reasons unclear to me) adds the six nucleotides coding for TT to the start of the M2-2 protein. I will review this and correct it as necessary.

thanks again,
richard

Thanks again. I have fixed the annotations. The changes should be live soon.

Thank you very much for fixing the annotations - I really appreciate your corrections and the work you put into making those updates.