Strain frequency

Hello, how are the strain-specific frequencies shown for each timepoint in this visualization calculated? I tried calculating a weekly average from metadata.tsv, but it didn’t match. I would like to reproduce the frequencies shown in this visualization as data.

Hi, take a look at https://nextstrain.org/charon/getDataset?prefix=ncov/gisaid/global/all-time&type=tip-frequencies

See the “pivots” array at the end of the JSON: it is an array of N dates, where N is the number of weeks between the oldest and the most recent strain.

Then, for each tip of https://nextstrain.org/ncov/gisaid/global/all-time, the JSON contains an array of N values: a smoothed curve with a peak at the collection date of the tip.

At each pivot, these smoothed curves sum to 1 across all tips (the precomputation of the “per-tip frequency curves” is the KDE part).

Then sum the curves of all tips labeled as Alpha to get the frequency curve for Alpha; do the same for Delta, and so on.
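For anyone who wants to do this in code, here is a rough Python sketch of the summing step. It assumes the JSON has a top-level "pivots" list plus one entry per tip of the form {"frequencies": [...]}; check that against the file you download. The tip_to_clade mapping and the toy numbers are made up for illustration; in practice you would build the mapping from your metadata or from the tree.

```python
from collections import defaultdict

def clade_frequencies(tip_freqs, tip_to_clade):
    """Sum per-tip frequency curves into one curve per clade.

    Assumed layout (verify against the real file):
      {"pivots": [...], "<tip name>": {"frequencies": [...]}, ...}
    """
    pivots = tip_freqs["pivots"]
    totals = defaultdict(lambda: [0.0] * len(pivots))
    for tip, entry in tip_freqs.items():
        # Skip "pivots" and any other non-tip keys.
        if not isinstance(entry, dict) or "frequencies" not in entry:
            continue
        clade = tip_to_clade.get(tip)
        if clade is None:
            continue  # tip without a clade label
        for i, value in enumerate(entry["frequencies"]):
            totals[clade][i] += value
    return pivots, dict(totals)

# Toy data in the assumed layout (not real Nextstrain values).
toy = {
    "pivots": [2020.0, 2020.5],
    "tipA": {"frequencies": [0.5, 0.25]},
    "tipB": {"frequencies": [0.25, 0.25]},
    "tipC": {"frequencies": [0.25, 0.5]},
    "generated_by": {"program": "augur"},
}
labels = {"tipA": "Alpha", "tipB": "Alpha", "tipC": "Delta"}  # hypothetical mapping
pivots, freqs = clade_frequencies(toy, labels)
print(freqs)
```

With the toy data, Alpha is the sum of tipA and tipB at each pivot and Delta is just tipC.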

When you do the same on a subtree, the tip frequency curves no longer sum to 1; that is what the “normalize frequencies” button is for.
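If you want the same effect on your own subset, a simple approach is to divide each curve by the total of the selected curves at every pivot. This is only a sketch of that idea; I have not checked it against what the button does internally, and the function name and numbers are made up.

```python
def normalize_curves(curves):
    """Rescale curves so that they sum to 1 at each pivot.

    `curves` maps a label (e.g. a clade name) to a list of frequencies,
    all of the same length. Pivots where the total is 0 are left at 0.
    """
    n = len(next(iter(curves.values())))
    totals = [sum(curve[i] for curve in curves.values()) for i in range(n)]
    return {
        label: [v / t if t > 0 else 0.0 for v, t in zip(curve, totals)]
        for label, curve in curves.items()
    }

# Toy subtree whose curves only add up to 0.5 and 0.25 at the two pivots.
subtree = {"Alpha": [0.375, 0.125], "Delta": [0.125, 0.125]}
normalized = normalize_curves(subtree)
print(normalized)
```

After rescaling, the values at each pivot add up to 1 again within the subset.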

@babarlelephant this worked beautifully! Thanks for explaining!

One more question: how do I translate the “pivot” values in the JSON into calendar dates for each week? For example, what is 2020.9304? Is this year 2020 and the rest is a fraction of the year 2020?

Is this year 2020 and the rest is a fraction of the year 2020?

Yup! If you’re comfortable in JavaScript, here’s how auspice performs that conversion from a numeric value to a YYYY-MM-DD string.
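If JavaScript is not your thing, here is a minimal Python sketch of the same idea: take the integer part as the year and turn the fraction into a number of days into that year. This is not a port of the auspice code. It truncates to whole days, so it can differ by a day from implementations that round differently.

```python
from datetime import date, timedelta

def decimal_year_to_date(value):
    """Convert a decimal year such as 2020.9304 to a calendar date.

    The fraction is scaled by the number of days in that year
    (366 in leap years) and truncated to whole days.
    """
    year = int(value)
    start = date(year, 1, 1)
    days_in_year = (date(year + 1, 1, 1) - start).days
    return start + timedelta(days=int((value - year) * days_in_year))

example = decimal_year_to_date(2020.9304)
print(example.isoformat())  # 2020-12-06
```

Applying this to every value in “pivots” gives one calendar date per week.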

Thanks @james. My date conversion (done in R) is off by a day, but I think I can use it. Is frequency data aggregated by month available anywhere? I assume the weekly data is weighted, so I can’t just aggregate it myself.