Hi all,
I’m using the SARS-CoV-2 all-time global dataset (auspice) to generate a small tree by adding some new samples.
Is there any documentation available about the filters that are applied to generate the sets for this tree?
Best regards,
Diego
Hi @diegogotex,
There is no specific documentation page, but the workflow configuration file has all the information. Here are the relevant entries for the ncov/global/all-time dataset:
builds:
  global_all-time:
    subsampling_scheme: nextstrain_global_all_time

subsampling:
  # Custom subsampling logic for global region over all-time
  # 4320 total (expect ~3200)
  nextstrain_global_all_time:
    all:
      group_by: "country year month"
      group_by_weights: "defaults/population_weights.tsv"
      max_sequences: 4320
This scheme groups sequences by country, year, and month. It aims to sample evenly across monthly intervals, with country-level population weighting applied within each month’s sample, for a maximum of 4320 sequences in total.
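As a rough illustration of that idea, here is a minimal Python sketch of how a total budget could be split evenly across months and then by country weight within each month. It is not the workflow's actual implementation: the function name target_sizes, the country names "A" and "B", and the weights are all made up for the example, and it ignores details such as groups that have fewer sequences available than their target.

```python
from itertools import product


def target_sizes(max_sequences, months, country_weights):
    """Hypothetical helper: split max_sequences evenly over months,
    then proportionally to country weight within each month."""
    per_month = max_sequences / len(months)
    total_weight = sum(country_weights.values())
    return {
        (country, month): per_month * weight / total_weight
        for month, (country, weight) in product(months, country_weights.items())
    }


# Made-up weights for illustration only; the real values live in
# defaults/population_weights.tsv.
weights = {"A": 3.0, "B": 1.0}
months = ["2020-01", "2020-02"]

sizes = target_sizes(4320, months, weights)
for group, size in sorted(sizes.items()):
    print(group, size)
```

With these toy inputs, each month gets half of the 4320 budget, and within a month country "A" gets three times the target of country "B".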
– Victor