Subsampling sequences genetically related to a focal sample

enelson · January 14, 2022, 9:59pm

Hi, I’m confused about the format for custom subsampling, and the instructions in the Nextstrain docs are fairly sparse on this. Could you look at my builds.yaml file to see whether the build will give me what I want?

My input files contain data from California (the first sequences) and from all over the USA (the remaining sequences). I want to build a tree with N_1 CA sequences plus N_2 sequences subsampled from the USA that are genetically similar to the CA sequences. I’m not sure if what I’m doing is right.
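
As a rough illustration of what "genetically similar" means in this goal, the toy sketch below ranks context sequences by their smallest Hamming distance to any focal sequence and keeps the closest ones. It is only a conceptual sketch under simplifying assumptions (short, pre-aligned toy sequences; the sample names and the `hamming` and `closest_context` helpers are hypothetical). It is not the ncov workflow's own proximity calculation.

```python
def hamming(a: str, b: str) -> int:
    """Count positions where two equal-length aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def closest_context(focal: dict, context: dict, n: int) -> list:
    """Return the names of the n context sequences nearest to any focal sequence."""
    def min_dist(seq: str) -> int:
        # Distance to the focal set = distance to the nearest focal sequence.
        return min(hamming(seq, f) for f in focal.values())

    # Sort by distance, breaking ties by name so the result is deterministic.
    ranked = sorted(context, key=lambda name: (min_dist(context[name]), name))
    return ranked[:n]


# Hypothetical toy data: two focal (CA) sequences and four context (USA) sequences.
focal = {"CA/1": "ACGTACGT", "CA/2": "ACGTACGA"}
context = {
    "TX/1": "ACGTACGG",
    "NY/1": "TTTTACGT",
    "WA/1": "ACGTTCGT",
    "FL/1": "GGGGGGGG",
}

picked = closest_context(focal, context, n=2)
print(picked)
```

Here N_2 corresponds to `n`, and the focal dictionary plays the role of the N_1 California sequences.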

Here is the text of my build file with N_1 = N_2 = 500. It’s just a slightly modified version of the “custom-county” subsampling section in the ncov/example/builds.yaml file. Will this give me what I want?

inputs:
  - name: shared-id-gisaid-data
    metadata: data/shared-id-gisaid_metadata.tsv
    sequences: data/shared-id-gisaid.fasta.gz

builds:
  shared-id-gisaid-build:
    subsampling_scheme: custom-division
    region: North America
    country: USA
    division: California

subsampling:
  custom-division:
    focal:
      group_by: "division"
      max_sequences: 500
      query: --query "(country == '{country}') & (division == '{division}')"
    related:
      group_by: "division"
      max_sequences: 500
      exclude: "--exclude-where 'division={division}'"
      priorities:
        type: "proximity"
        focus: "focal"

files:
  auspice_config: "my_profiles/example/my_auspice_config.json"
  description: "my_profiles/example/my_description.md"
  include: "defaults/include.txt"
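
To check what the `{country}` and `{division}` placeholders in the file above would turn into, here is a minimal sketch that fills them from the build entry with Python's `str.format`. It assumes the workflow substitutes the build's own `country` and `division` values into these strings; the `expand` helper and variable names are hypothetical and are not part of the ncov code.

```python
# Build attributes copied from the builds section of the file above.
build = {"region": "North America", "country": "USA", "division": "California"}

# Filter templates copied from the custom-division subsampling scheme.
focal_query = "--query \"(country == '{country}') & (division == '{division}')\""
related_exclude = "--exclude-where 'division={division}'"


def expand(template: str, build: dict) -> str:
    """Fill {placeholders} in a filter template from the build's attributes."""
    return template.format(**build)


resolved_query = expand(focal_query, build)
resolved_exclude = expand(related_exclude, build)

print(resolved_query)    # --query "(country == 'USA') & (division == 'California')"
print(resolved_exclude)  # --exclude-where 'division=California'
```

Under that assumption, the focal group keeps only USA/California records, and the related group drops California records before proximity to the focal group is used to prioritize what remains.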
