Titer measurement formats and conversion to csv for nextflu

I have CSV files that were parsed from the published WIC-Crick reports. I believe these are in table format (strain vs strain). How do I put these in the format that the nextflu pipeline uses? I was able to get a rethinkdb instance working, but initializing the database is a blocker. I am not sure whether the fauna repository has any scripts for creating the necessary tables in tdb with the necessary fields. Is there a basic script I can easily modify to convert the CSVs into one CSV file while applying the error correction and using nextflu-compatible field names?

Hi @adalisan, the Crick reports are a great resource! Since titer data are a relatively niche component of the Nextstrain ecosystem, we don’t have formal documentation about how to prepare these data for use in a Nextstrain analysis. The best starting point would be the Nextstrain workflow we developed as part of Lee and Hadfield et al. 2023. This workflow shows how we format titer data and pass them to Augur commands like augur titers. It also shows how to build a measurements panel to visualize these data on a quantitative scale, as discussed in the paper.

That workflow starts from a previously curated table of public titer measurements that were originally published in Bedford et al. 2014. You can find these data, along with a corresponding data curation guide for the sequences used in the workflow mentioned above, in the data subdirectory of the same GitHub repository. You can also jump directly to the HI titer data to get a sense of their format.

When working with these kinds of CSV or TSV data, we typically use third-party tools like tsv-utils and csvtk to accomplish standard tasks like renaming or reordering columns, adding new columns, etc.
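If you would rather script the reshaping step than chain command-line tools, here is a minimal Python sketch that turns a strain-by-strain table into one row per measurement. It is only an illustration: the column names (`virus_strain`, `serum_strain`, `titer`, `source`, `assay`), the `source` label, and the toy titer values are my assumptions, not the pipeline's confirmed schema, so check them against the HI titer file in the workflow before relying on them. It also assumes rows are test viruses and columns are reference sera.

```python
import csv
import io

# Assumed output columns; verify against the workflow's titer table.
FIELDNAMES = ["virus_strain", "serum_strain", "titer", "source", "assay"]


def wide_to_long(wide_csv_text, source="hypothetical_report", assay="hi"):
    """Reshape a strain-by-strain titer table into one record per measurement."""
    rows = list(csv.reader(io.StringIO(wide_csv_text)))
    header, body = rows[0], rows[1:]
    serum_strains = header[1:]  # the first header cell labels the virus column
    records = []
    for row in body:
        virus_strain = row[0].strip()
        for serum_strain, titer in zip(serum_strains, row[1:]):
            titer = titer.strip()
            if not titer:  # empty cell: this pair was not measured
                continue
            records.append({
                "virus_strain": virus_strain,
                "serum_strain": serum_strain.strip(),
                "titer": titer,  # kept as text so values like "<40" survive
                "source": source,
                "assay": assay,
            })
    return records


def to_tsv(records, fieldnames=FIELDNAMES):
    """Serialize the long-format records as tab-separated text with a header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Toy input with made-up titers; "A/Example/1/2023" is a hypothetical strain.
example = """virus,A/Darwin/6/2021,A/Example/1/2023
A/Darwin/6/2021,640,320
A/Example/1/2023,,1280
"""
long_rows = wide_to_long(example)
tsv_text = to_tsv(long_rows)
```

Running `wide_to_long` once per report and concatenating the resulting lists gives a single combined table that you can write out with `to_tsv`.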

@jlhudd Regarding integrating these data as input to augur titers alongside GISAID metadata, how do you make sure the strain names match? Are the standardized names based on GISAID strain names, or is there a standardized strain name format like A//<IsolateNumber/ID>/ ?

@adalisan Complete strain names for influenza viruses collected from humans should be formatted as <subtype>/<location>/<some id>/<year> and what you find in GISAID will usually follow that pattern.
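As a rough way to flag names that do not follow this four-field pattern, something like the following Python sketch could help. The regular expression and the function name `is_complete_strain_name` are my own illustration, not part of Augur or GISAID tooling, and the check is deliberately loose: it only requires four slash-separated fields ending in a four-digit year.

```python
import re

# Assumed shape: <subtype>/<location>/<some id>/<year>, with a 4-digit year.
# Names with extra fields (e.g. a host field for non-human viruses) or with
# two-digit years will not match and need separate handling.
STRAIN_PATTERN = re.compile(r"^[^/]+/[^/]+/[^/]+/\d{4}$")


def is_complete_strain_name(name):
    """Return True if the name has four slash-separated fields ending in a year."""
    return STRAIN_PATTERN.match(name.strip()) is not None


names = ["A/Darwin/6/2021", "DARWIN/6"]
incomplete = [name for name in names if not is_complete_strain_name(name)]
```

Here `incomplete` collects the names that need manual review before you try to match them against GISAID metadata.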

The titer data will sometimes include abbreviations of the reference strain names (the names of the viruses used to raise antisera) that do not follow this standard (e.g., DARWIN/6). In these cases, we often have to make our best guess about the corresponding full strain name based on temporal and genetic context (e.g., knowing the data are H3N2 HA strains from 2023-2024 narrows the options for DARWIN/6 down to A/Darwin/6/2021). Then we build a map of the abbreviations to the complete names to use when preparing the titer data tables for use in Augur. Whether the titers use these abbreviations or some other naming convention often depends on the center performing the assays, though.
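To make the mapping step concrete, here is a small Python sketch of applying such an abbreviation map to long-format titer records. The only mapping entry comes from the example above; the record layout, the `serum_strain` field name, and the function name `expand_reference_names` are hypothetical, and the "fewer than three slashes" test for spotting abbreviations is just a heuristic I am assuming for illustration.

```python
# Hand-curated map from report abbreviations to complete strain names.
# Only this one entry comes from the discussion above; extend it by hand.
ABBREVIATION_TO_STRAIN = {
    "DARWIN/6": "A/Darwin/6/2021",
}


def expand_reference_names(records, name_map, field="serum_strain"):
    """Replace abbreviated names and report abbreviations missing from the map."""
    resolved = []
    unmapped = set()
    for record in records:
        name = record[field].strip()
        key = name.upper()  # abbreviations are matched case-insensitively
        if key in name_map:
            resolved.append({**record, field: name_map[key]})
            continue
        if name.count("/") < 3:  # heuristic: too few fields for a full name
            unmapped.add(name)
        resolved.append(dict(record))
    return resolved, sorted(unmapped)


# Toy records with made-up titers; "UNKNOWN/1" is a hypothetical abbreviation.
records = [
    {"serum_strain": "DARWIN/6", "titer": "640"},
    {"serum_strain": "A/Darwin/6/2021", "titer": "320"},
    {"serum_strain": "UNKNOWN/1", "titer": "80"},
]
resolved, unmapped = expand_reference_names(records, ABBREVIATION_TO_STRAIN)
```

Anything that ends up in `unmapped` is a name you still need to resolve by hand from the temporal and genetic context, then add to the map.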