Hi, all the metadata is filled in (including precise locations for most sequences): http://babarlelephant.free-hoster.net/dist/index_sarbecovirus-1000-12000.html?c=clade_membership&ci&p=full&r=division&transmissions=hide
ORF1a is easier to align: it has few recombination events, and its substitution rate is mostly uniform and not too low. With this region, PrC31 comes out very close to SARS-CoV-2.
The clock rate was estimated by comparing the ORF1a SNPs of the May 2021 SARS-CoV-2 sequences against Wuhan-Hu-1, which gives 0.0005438 subs/site/year.
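For reference, here is a minimal sketch of that kind of estimate, not the exact script I ran. It assumes the rate is simply the mean number of SNPs divided by (number of aligned sites × elapsed years). The helper names (`count_snps`, `years_between`, `clock_rate`), the toy sequences and the dates are all hypothetical placeholders.

```python
from datetime import date

VALID_BASES = set("ACGT")


def count_snps(ref: str, seq: str) -> int:
    """Count differing positions between two aligned sequences.

    Positions where either sequence has a gap or an ambiguous base are skipped.
    """
    return sum(
        1
        for a, b in zip(ref.upper(), seq.upper())
        if a in VALID_BASES and b in VALID_BASES and a != b
    )


def years_between(start: date, end: date) -> float:
    """Elapsed time in years, using a 365.25-day year."""
    return (end - start).days / 365.25


def clock_rate(mean_snps: float, n_sites: int, years: float) -> float:
    """Substitutions per site per year from a mean SNP count."""
    return mean_snps / (n_sites * years)


if __name__ == "__main__":
    # Toy aligned sequences, purely illustrative (not real ORF1a data).
    reference = "ACGTACGTACGTACGTACGT"
    samples = ["ACGTACGTACGAACGTACGT", "ACGTACCTACGTACGTACGA"]

    mean_snps = sum(count_snps(reference, s) for s in samples) / len(samples)
    # Hypothetical sampling dates for the reference and the later sequences.
    elapsed = years_between(date(2020, 1, 1), date(2021, 5, 15))
    print(clock_rate(mean_snps, len(reference), elapsed))
```

With real data, `n_sites` would be the length of the aligned ORF1a region and `mean_snps` the average over all May 2021 sequences.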
The confidence intervals given by TreeTime are very tight. I wonder whether they are realistic or whether I chose a wrong parameter.
My other question: is it possible to upload the JSON, FASTA, metadata and Snakemake file to nextstrain/community?
Thanks a lot.
rule refine:
    message:
        """
        Refining tree
          - estimate timetree
          - use {params.coalescent} coalescent timescale
          - estimate {params.date_inference} node dates
        """
    input:
        tree = rules.tree.output.tree,
        alignment = rules.align.output,
        metadata = input_metadata
    output:
        tree = "results/tree.nwk",
        node_data = "results/branch_lengths.json"
    params:
        coalescent = "opt",
        date_inference = "marginal",
        root = "internal-node-root",
        clock_rate = 0.0005438
    shell:
        """
        augur refine \
            --tree {input.tree} \
            --alignment {input.alignment} \
            --metadata {input.metadata} \
            --output-tree {output.tree} \
            --output-node-data {output.node_data} \
            --coalescent {params.coalescent} \
            --timetree \
            --date-inference {params.date_inference} \
            --date-confidence \
            --keep-polytomies \
            --precision 1 \
            --clock-rate {params.clock_rate}
        """
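
To put numbers on how tight the intervals are, something like the following could summarize them from the node-data file. This is only a sketch: it assumes that `results/branch_lengths.json` has a top-level `nodes` dictionary and that each dated node carries a `num_date_confidence` pair `[lower, upper]` in decimal years. That layout is my assumption, so please check it against the actual file. The function names are hypothetical.

```python
import json


def ci_widths(node_data: dict) -> dict:
    """Return {node_name: width in years} for nodes that have a date interval.

    Assumes node_data["nodes"][name]["num_date_confidence"] == [lower, upper].
    Nodes without that key are skipped.
    """
    widths = {}
    for name, attrs in node_data.get("nodes", {}).items():
        interval = attrs.get("num_date_confidence")
        if interval is not None:
            lower, upper = interval
            widths[name] = upper - lower
    return widths


def load_ci_widths(path: str) -> dict:
    """Read a node-data JSON file and compute the interval widths."""
    with open(path) as handle:
        return ci_widths(json.load(handle))


if __name__ == "__main__":
    # Path taken from the rule's output; adjust if your layout differs.
    widths = load_ci_widths("results/branch_lengths.json")
    for name, width in sorted(widths.items(), key=lambda item: item[1]):
        print(f"{name}\t{width:.3f}")
```

Sorting by width makes it easy to see whether only the tips are narrow or whether the deep internal nodes are too.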