Question about "augur refine"

Dear mentors,

I have a question about the --coalescent setting in the augur refine command, which accepts three kinds of value: a coalescent time scale in units of inverse clock rate (a float), optimization as a scalar (opt), or a skyline plot (skyline). How do we determine which setting is most appropriate for our data? I’ve noticed that the choice varies between pathogens: for instance, Zika uses opt, while some influenza-related articles mention “skyline”. If we aim to construct a time-resolved tree for seasonal influenza, such as H1N1pdm09 or H3N2, which setting should we select? Is this option required, or could I construct the time-resolved tree directly with this command: augur refine -a HA_aligned.fasta --tree HA_raw.nwk --metadata HA_metadata.tsv --timetree --divergence-units mutations --output-tree HA_tree.time.nwk --output-node-data HA_refine.node.json? Am I missing any necessary settings? I’m somewhat puzzled by this and would appreciate your guidance!

Hi @Emma316, the value to choose for the --coalescent argument of augur refine depends primarily on the information you have about your system and the complexity of the coalescent model you need for your analysis. The best place to start on this topic is the TreeTime paper (Sagulenko et al. 2018), which describes the algorithms behind the augur refine command. Specifically, section 2.5 of that paper describes the different coalescent models provided by TreeTime.

If you know the coalescent rate for your system, you can pass it as a floating point value to the --coalescent argument. If you want TreeTime to infer the coalescent rate for you as a single scalar value, you can pass “opt” or “const” instead. If you want to allow the coalescent rate to change through time, pass “skyline” (see the paper for more details and a reference for this). For seasonal flu, we use “const”. This inference of a scalar coalescent rate is common across Nextstrain analyses, including Zika, measles, and SARS-CoV-2. Ebola is an exception to the rule and is used as an example of the skyline method in the TreeTime documentation. When in doubt, you probably want to stick with “opt”.
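
To make the choices above concrete, here is a minimal Python sketch that assembles the command from the question with an explicit --coalescent value. The helper names (validate_coalescent, refine_command) are made up for this illustration and are not part of augur, the file names are the ones from the question, and rejecting non-positive numbers is this sketch's own assumption rather than documented augur behaviour. It only builds and prints the command; it does not run augur.

```python
import shlex

# Keyword values named in the answer above; anything else must parse as a float.
KEYWORDS = {"opt", "const", "skyline"}


def validate_coalescent(value):
    """Return the value as a string if it is a known keyword or a positive number."""
    text = str(value)
    if text in KEYWORDS:
        return text
    try:
        number = float(text)
    except ValueError:
        raise ValueError(f"unsupported --coalescent value: {text!r}") from None
    # Assumption of this sketch: a fixed coalescent time scale should be positive.
    if not number > 0:
        raise ValueError("a fixed coalescent time scale must be positive")
    return text


def refine_command(coalescent="opt"):
    """Build the augur refine argument list from the question, plus --coalescent."""
    return [
        "augur", "refine",
        "--alignment", "HA_aligned.fasta",
        "--tree", "HA_raw.nwk",
        "--metadata", "HA_metadata.tsv",
        "--timetree",
        "--coalescent", validate_coalescent(coalescent),
        "--divergence-units", "mutations",
        "--output-tree", "HA_tree.time.nwk",
        "--output-node-data", "HA_refine.node.json",
    ]


if __name__ == "__main__":
    # Print one shell-ready command per option so they can be compared side by side.
    for choice in ("const", "opt", "skyline", 10.0):
        print(shlex.join(refine_command(choice)))
```

For seasonal flu, the line printed for "const" matches the setting described above; the numeric example (10.0) is an arbitrary placeholder, not a recommended value. Any of the printed lines can be pasted into a shell where augur is installed.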