Question about frequency values in builds files

I have a question about the seasonal-flu builds of nextflu. I thought the “frequency” column in the output file builds/<build_name>//forecast_.tsv was the frequency estimate of a tip/strain at the current timepoint, and that the sums of those values across different groupings (haplotype, subclade, LBI value ranges) are also what would be shown in the Auspice visualization for that timepoint.
Are my assumptions correct or did I misinterpret things?

@adalisan Your assumptions are generally correct! The forecasts TSV depends on the tip frequencies JSON that you see visualized in Auspice, but there is a forecast-specific change to the frequencies in the data flow between the frequencies JSON and the forecasts TSV. The main steps are:

  1. The workflow converts the frequencies JSON to a TSV. During this process, tips with frequency <0.00001 get set to 0 and all frequencies get renormalized to sum to 1. This is the main reason you would see a difference between the frequency column of the forecast TSV and what appears in Auspice. This renormalization minimizes the number and effect of very-low-frequency sequences on the forecasts.
  2. The frequencies TSV gets joined with other node data and ends up in the tip attributes TSV.
  3. The forecast tips rule uses the frequency per strain from the tip attributes table to make the forecast TSV table.
  4. All tips with nonzero frequencies at the forecast timepoint get records in the forecast TSV.
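
To make the effect of steps 1 and 4 concrete, here is a minimal Python sketch of the thresholding and renormalization. It is not the workflow's actual code: the function name `renormalize_frequencies`, the strain names, and the frequency values are all made up for illustration. Only the 0.00001 threshold and the "set to 0, then rescale to sum to 1" behavior come from the description above.

```python
def renormalize_frequencies(frequencies, min_frequency=1e-5):
    """Zero out tips below min_frequency, then rescale the rest to sum to 1.

    Hypothetical helper for illustration; not part of the workflow itself.
    """
    kept = {
        strain: (freq if freq >= min_frequency else 0.0)
        for strain, freq in frequencies.items()
    }
    total = sum(kept.values())
    if total == 0:
        # Nothing survived the threshold, so there is nothing to rescale.
        return kept
    return {strain: freq / total for strain, freq in kept.items()}


# Made-up tip frequencies at one timepoint, as they might appear in Auspice.
auspice_frequencies = {
    "strain_A": 0.6,
    "strain_B": 0.399995,
    "strain_C": 0.000005,  # below the threshold, so it is set to 0
}

forecast_frequencies = renormalize_frequencies(auspice_frequencies)

# Step 4: only tips with nonzero frequency would get forecast records.
forecast_tips = [s for s, f in forecast_frequencies.items() if f > 0]
```

In this toy case, `strain_C` drops out of the forecast table and the two remaining frequencies are each slightly higher than their Auspice values, which is the kind of small discrepancy you would expect when comparing the two outputs.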

Which forecast models are you running? Let me know if you have other questions or issues with this part of the workflow. As far as I know, I’m the only person who regularly uses it, so any feedback to improve it would be great. :slight_smile:

Thank you for your quick reply. I am currently running the ne_star+lbi forecast model for H3N2, as working with HI titer data is a bit tricky at the moment. This is limiting for the H1N1 and B/Victoria subtypes, so eventually I will need to use HI titer models.


Cool! We do have non-epitope sites defined for H1N1pdm based on the Caton et al. 1982 definition of epitopes. Coordinates in that non-epitope file are the inverse of the “epitope” sites file which we sourced from Table S1 of Igarashi et al. 2010.