Titer measurement formats and conversion to csv for nextflu

adalisan · July 17, 2024, 9:58pm

I have csv files that were parsed from the published WIC-Crick reports. I believe these are in table format (strain vs strain). How do I put these in the format that the nextflu pipeline uses? I was able to get a rethinkdb instance working, but initializing the database is a blocker. I am not sure if the fauna repository has any scripts for creating the necessary tables in tdb with the necessary fields. Is there a basic script I can easily modify to convert the csvs into one csv file while applying the error correction and using nextflu-compatible field names?

jlhudd · July 18, 2024, 4:15pm

Hi @adalisan, the Crick reports are a great resource! Since titer data are a relatively niche component of the Nextstrain ecosystem, we don’t have formal documentation about how to prepare these data for use in a Nextstrain analysis. The best starting point would be the Nextstrain workflow we developed as part of Lee and Hadfield et al. 2023. This workflow shows how we format titer data and pass them to Augur commands like augur titers. It also shows how to build a measurements panel to visualize these data on a quantitative scale as discussed in the paper.

That workflow starts from a previously curated table of public titer measurements that were originally published in Bedford et al. 2014. You can find these data and a corresponding data curation guide for the sequences used in the workflow mentioned above in the data subdirectory of the same GitHub repository. You can also jump directly to the HI titer data to get a sense of their format.

When working with these kinds of CSV or TSV data, we typically use third-party tools like tsv-utils and csvtk to accomplish standard tasks like renaming or reordering columns, adding new columns, etc.
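For a strain-vs-strain table like the ones described in the original question, the reshaping step can also be done in a few lines of Python. This is only a minimal sketch under assumptions that are not confirmed anywhere in this thread: that the first column holds the test virus names, that the header row holds the reference (serum) strain names, and that empty cells mean "not measured". The output field names (`virus_strain`, `serum_strain`, `titer`, `source`) and the `source` value are hypothetical, so check them against the HI titer data in the workflow repository before relying on them.

```python
import csv
import io


def wide_to_long(wide_csv_text, source="hypothetical_source"):
    """Turn a strain-by-strain titer table into one record per measurement.

    Assumes (not confirmed by the thread): rows are test viruses, columns are
    reference/serum strains, and blank cells are missing measurements.
    """
    reader = csv.reader(io.StringIO(wide_csv_text))
    header = next(reader)
    serum_strains = [name.strip() for name in header[1:]]

    records = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        virus_strain = row[0].strip()
        for serum_strain, titer in zip(serum_strains, row[1:]):
            titer = titer.strip()
            if not titer:
                continue  # no measurement for this virus/serum pair
            records.append(
                {
                    "virus_strain": virus_strain,
                    "serum_strain": serum_strain,
                    "titer": titer,  # kept as text so values like "<40" survive
                    "source": source,
                }
            )
    return records


# Made-up example table, for illustration only.
example = (
    "virus,A/Darwin/6/2021,A/Darwin/9/2021\n"
    "A/Test/1/2023,640,320\n"
    "A/Test/2/2023,,<40\n"
)
rows = wide_to_long(example)
```

The resulting list of dictionaries can be written out with `csv.DictWriter`, after which column renaming or reordering can be handled with tsv-utils or csvtk as described above.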

adalisan · July 30, 2024, 4:28pm

@jlhudd Regarding the integration of this data for input into augur titers, and with GISAID metadata, how do you make sure the strain names match? Are the standardized names based on GISAID strain names, or is there a standardized strain name format like A//<IsolateNumber/ID>/ ?

jlhudd · July 30, 2024, 5:54pm

@adalisan Complete strain names for influenza viruses collected from humans should be formatted as <subtype>/<location>/<some id>/<year> and what you find in GISAID will usually follow that pattern.
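As a rough way to spot names that do not follow this pattern before matching them against GISAID metadata, a check along these lines may help. It is a sketch only: the regular expression is an assumption that encodes "four slash-separated fields ending in a four-digit year" and nothing more, and the function name is hypothetical. It does not verify that the subtype, location, or id are real.

```python
import re

# Assumed pattern: <subtype>/<location>/<some id>/<year>, with a 4-digit year.
# Names with extra fields or two-digit years will be flagged as not matching.
STRAIN_NAME_PATTERN = re.compile(r"^[^/]+/[^/]+/[^/]+/\d{4}$")


def looks_like_full_strain_name(name):
    """Return True if the name has the four-field shape described above."""
    return STRAIN_NAME_PATTERN.match(name.strip()) is not None


names = ["A/Darwin/6/2021", "DARWIN/6"]
flagged = [name for name in names if not looks_like_full_strain_name(name)]
```

Anything that ends up in `flagged` is a candidate for manual review or for an abbreviation lookup.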

The titer data will sometimes include abbreviations of the reference strain names (the names of the viruses used to raise antisera) that do not follow this standard (e.g., DARWIN/6). In these cases, we often have to make our best guess about the corresponding full strain name based on temporal and genetic context (e.g., knowing that the data are H3N2 HA strains from 2023-2024 narrows the options down to A/Darwin/6/2021). Then we build a map of the abbreviations to the complete names to use when preparing the titer data tables for use in Augur. Whether the titers use these abbreviations or some other naming approach often depends on the center performing the assays, though.
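Such an abbreviation map can be as simple as a dictionary lookup. The sketch below is illustrative rather than the code used in any Nextstrain workflow: the function and variable names are hypothetical, the only entry is the DARWIN/6 example from this post (a best guess that holds for the H3N2 2023-2024 context), and matching abbreviations case-insensitively is an assumption.

```python
# Hypothetical map from abbreviated reference names to complete strain names.
# The single entry comes from the example above; real maps are curated by hand.
ABBREVIATION_TO_STRAIN = {
    "DARWIN/6": "A/Darwin/6/2021",
}


def expand_reference_name(name, mapping=ABBREVIATION_TO_STRAIN):
    """Return the complete strain name for a known abbreviation.

    Names that are not in the map are returned unchanged (whitespace trimmed),
    so already-complete names pass through untouched.
    """
    cleaned = name.strip()
    return mapping.get(cleaned.upper(), cleaned)


reference_names = ["DARWIN/6", "Darwin/6 ", "A/Texas/50/2012"]
expanded = [expand_reference_name(name) for name in reference_names]
```

Keeping the map in its own file, separate from the conversion code, makes it easier to review each guess and to maintain different maps for different centers.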
