Filter parameters for all-time global SARS-CoV-2 dataset

Hi all,

I’m using the SARS-CoV-2 all-time global dataset (auspice) to generate a small tree by adding some new samples.

Is there any documentation available about the filters that are applied to generate the sets for this tree?

Best regards,
Diego

Hi @diegogotex,

There is no specific documentation page, but the workflow configuration file has all the information. Here are the relevant entries for the ncov/global/all-time dataset.

builds:
  global_all-time:
    subsampling_scheme: nextstrain_global_all_time

subsampling:
  # Custom subsampling logic for global region over all-time
  # 4320 total (expect ~3200)
  nextstrain_global_all_time:
    all:
      group_by: "country year month"
      group_by_weights: "defaults/population_weights.tsv"
      max_sequences: 4320

This scheme aims to sample evenly across year-month intervals, weighting countries by population within each month's sample, up to a maximum of 4320 sequences in total (the config comment notes that about 3200 are expected in practice).
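To make the allocation idea concrete, here is a rough Python sketch of "even across months, population-weighted within a month". It is only an illustration of the description above, not the actual implementation used by the workflow. The function name `allocate_targets`, the `available` argument, and the example countries and weights are all hypothetical.

```python
def allocate_targets(max_sequences, months, weights, available=None):
    """Illustrative only: split a sequence budget evenly across months,
    then across countries in proportion to their weights.

    `available` (hypothetical) maps (country, month) to the number of
    sequences that actually exist; targets are capped at that number.
    """
    per_month = max_sequences / len(months)
    total_weight = sum(weights.values())
    targets = {}
    for month in months:
        for country, weight in weights.items():
            target = per_month * weight / total_weight
            if available is not None:
                # A group cannot contribute more sequences than it has.
                target = min(target, available.get((country, month), 0))
            targets[(country, month)] = target
    return targets


# Made-up weights and months, for illustration only.
weights = {"A": 3.0, "B": 1.0}
months = ["2020-01", "2020-02", "2020-03"]
targets = allocate_targets(4320, months, weights)
```

With these made-up inputs each month gets 1440 of the 4320 slots, split 1080 to country A and 360 to country B. If some groups have fewer sequences than their target (the optional `available` cap), the realized total drops below the maximum, which is one possible reason a run could yield fewer than 4320 sequences.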

– Victor

Thank you @victorlin!
