Date confidence, ORF1a of Sarbecovirus, 104 genomes

Hi, all the metadata fields are filled in (including precise locations for most sequences).

ORF1a is easier to align, has few recombination events, and has a substitution rate that is mostly uniform and not too low. In this region PrC31 also comes out very close to SARS-CoV-2.

The clock rate was estimated by comparing the ORF1a SNPs of the May 2021 SARS-CoV-2 sequences with Wuhan-Hu-1, giving 0.0005438 subs/site/year.
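The post does not say exactly how the SNP counts were turned into a rate. One simple way to do it (an assumption, not necessarily the method used here) is a least-squares regression of per-site divergence from the reference on time elapsed since the reference was sampled, with the line forced through the reference. This is similar in spirit to a root-to-tip regression with the root fixed at Wuhan-Hu-1. The function names and inputs below are hypothetical.

```python
from datetime import date


def decimal_year(d):
    """Convert a date to a decimal year (e.g. 2020-07-02 -> 2020.5)."""
    start = date(d.year, 1, 1)
    days_in_year = (date(d.year + 1, 1, 1) - start).days
    return d.year + (d - start).days / days_in_year


def rate_through_origin(snp_counts, seq_length, years_since_ref):
    """Least-squares slope of per-site divergence vs. elapsed time, with the
    regression line forced through zero divergence at the reference date.

    snp_counts: SNPs of each sequence relative to the reference
    seq_length: number of aligned sites compared (e.g. ORF1a length)
    years_since_ref: sampling time of each sequence minus the reference's
    Returns substitutions/site/year.
    """
    divergence = [n / seq_length for n in snp_counts]
    num = sum(t * d for t, d in zip(years_since_ref, divergence))
    den = sum(t * t for t in years_since_ref)
    return num / den
```

Usage would look like `rate_through_origin(snps, L, [decimal_year(d) - decimal_year(ref_date) for d in dates])`, where `snps`, `L`, `dates` and `ref_date` stand for your own per-sequence SNP counts, the compared ORF1a length, the sample dates and the reference's collection date.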

The confidence intervals given by TreeTime are very tight. I wonder whether this is real or whether I chose a wrong parameter.
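One possible contributor to tight intervals (an assumption, not checked against this run): when the rate is fixed with `--clock-rate`, the reported intervals may not include any uncertainty in the rate itself. Under a strict clock a node's age before the tips is roughly its divergence from the tips divided by the rate, so a relative error in the rate carries straight through to the age. `augur refine` has a `--clock-std-dev` option to accompany a supplied rate; whether it widens the intervals in this setup is worth testing. The sketch below, with hypothetical names and a simple normal-style band on the rate, only illustrates that scaling.

```python
def node_age_years(divergence, rate):
    """Strict-clock age of a node before the tips: divergence / rate.

    divergence: substitutions/site between the node and the tips
    rate: substitutions/site/year
    """
    return divergence / rate


def age_range_from_rate(divergence, rate, rate_sd, z=1.96):
    """Ages implied by rate + z*sd and rate - z*sd (an illustrative band,
    assumed here). A faster rate gives a younger node, so the faster rate
    yields the lower age bound."""
    fast = rate + z * rate_sd
    slow = rate - z * rate_sd
    if slow <= 0:
        raise ValueError("rate band reaches zero; the upper age bound is unbounded")
    return node_age_years(divergence, fast), node_age_years(divergence, slow)
```

For example, `age_range_from_rate(0.02, 0.0005438, 0.0001)` uses the post's rate with a made-up divergence and standard deviation; the width of the returned range comes entirely from the rate band, which a fixed-rate run would not see.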

My other question: is it possible to upload the JSON, FASTA, metadata and Snakemake files to nextstrain/community?

Thanks a lot.

```
rule refine:
    message:
        """
        Refining tree
          - estimate timetree
          - use {params.coalescent} coalescent timescale
          - estimate {params.date_inference} node dates
        """
    input:
        tree = rules.tree.output.tree,
        alignment = rules.align.output,
        metadata = input_metadata
    output:
        tree = "results/tree.nwk",
        node_data = "results/branch_lengths.json"
    params:
        coalescent = "opt",
        date_inference = "marginal",
        root = "internal-node-root",  # defined but not passed to augur refine below
        clock_rate = 0.0005438  # fixed rate, subs/site/year
    shell:
        """
        augur refine \
            --tree {input.tree} \
            --alignment {input.alignment} \
            --metadata {input.metadata} \
            --output-tree {output.tree} \
            --output-node-data {output.node_data} \
            --coalescent {params.coalescent} \
            --timetree \
            --date-inference {params.date_inference} \
            --date-confidence \
            --keep-polytomies \
            --precision 1 \
            --clock-rate {params.clock_rate}
        """
```
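To see how tight the intervals actually are node by node, the node-data JSON written by the rule can be inspected directly. This sketch assumes the layout `augur refine` writes when `--date-confidence` is set: a top-level `nodes` object whose entries carry `numdate` and a two-element `num_date_confidence`. Check the keys against your own file; the helper names are hypothetical.

```python
import json
import os


def ci_widths_from_nodes(nodes):
    """Map node name -> (numdate, interval width in years) for every node
    that carries a num_date_confidence pair; other nodes are skipped."""
    widths = {}
    for name, attrs in nodes.items():
        ci = attrs.get("num_date_confidence")
        if ci is None:
            continue
        lo, hi = ci
        widths[name] = (attrs.get("numdate"), hi - lo)
    return widths


def load_nodes(path):
    """Read the 'nodes' section of an augur node-data JSON file."""
    with open(path) as fh:
        return json.load(fh)["nodes"]


if __name__ == "__main__":
    path = "results/branch_lengths.json"  # output.node_data of the refine rule
    if os.path.exists(path):
        widths = ci_widths_from_nodes(load_nodes(path))
        # Print the ten widest intervals first
        ranked = sorted(widths.items(), key=lambda kv: kv[1][1], reverse=True)
        for name, (numdate, width) in ranked[:10]:
            print(f"{name}\t{numdate}\t{width:.3f} years")
```

Keeping the pure `ci_widths_from_nodes` separate from file loading makes it easy to compare the same nodes across runs with different `refine` parameters.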

BEAST (same alignment and fixed clock rate) agrees with TreeTime.

My take here is that model misspecification and time-dependent rates of evolution will render this extrapolation pretty meaningless. Doing the same with HIV gives results that are orders of magnitude off. I would not read much into it.