Tutorial/example for combining new sequences with GISAID data

I posted this on the GitHub issues page but I think that wasn’t the right spot for this kind of request. Thank you for all of your incredible resources! I was wondering if it would be possible to add a tutorial or set of example files for explicitly walking beginners through a use case many of them have: when you’ve sequenced your own samples and want to build a tree using them + some appropriate contextual samples from GISAID. Right now it seems like the tutorials mostly focus on subsampling from GISAID or on preparing your own samples and metadata from scratch, and it’s been a little challenging figuring out how to configure YAML files and so forth for combining both (for people like us who are not really fluent in this whole ecosystem). If something like this already exists, my apologies and feel free to point me there!

Hi @aamtemp - we have a number of tutorials in the docs but none of them cover the exact scenario you describe, as querying GISAID for contextual sequences isn’t currently possible. We do have contextual GenBank datasets publicly available - you could add in the entire alignment and subsample accordingly, or add in a pre-subsampled alignment to use as context.
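For anyone wanting to try the "add in a pre-subsampled alignment as context" route by hand, here is a minimal sketch of the merging step in plain Python. It is not part of any Nextstrain workflow. The function names (`parse_fasta`, `merge_sequences`, `to_fasta`) are made up for illustration, and it assumes the strain name is the first whitespace-delimited token of each FASTA header.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {strain_name: sequence}.

    Assumption: the strain name is the first token after '>'.
    """
    records: Dict[str, str] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            # Sequence lines may be wrapped, so append to the current record.
            records[name] += line
    return records


def merge_sequences(own: Dict[str, str], context: Dict[str, str]) -> Dict[str, str]:
    """Combine own and contextual sequences; own samples win on name clashes."""
    merged = dict(context)
    merged.update(own)
    return merged


def to_fasta(records: Dict[str, str]) -> str:
    """Serialize records back to FASTA text, one sequence per line."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records.items())
```

With real files you would pass open file handles to `parse_fasta` and write the result of `to_fasta` to a new file. The metadata tables need the same treatment, with matching strain names in both.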

I do something similar for influenza on a regular basis. I can produce a small tutorial that shows the tools I use. It’s actually FigTree + seqkit + augur, repeated until a representative tree is achieved.

However, this is not automated and requires manual selection of appropriate contextual viruses. In the case of influenza the right context is never static, so it requires constant adjustment (for our needs, anyway). But it is possible to automate if you are focusing on a geographical region.
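As a rough illustration of what automating the selection for one geographical region could look like, here is a small Python sketch. It is not the workflow described above. The function name `pick_context` and the metadata column names `strain`, `region`, and `date` (as `YYYY-MM-DD`) are assumptions, and it simply caps the number of contextual strains kept per calendar month.

```python
import random
from collections import defaultdict
from typing import Dict, List


def pick_context(rows: List[Dict[str, str]], region: str,
                 per_month: int, seed: int = 0) -> List[str]:
    """Pick up to `per_month` strains per calendar month from one region.

    Assumes each row has 'strain', 'region', and 'date' (YYYY-MM-DD) keys.
    """
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    by_month = defaultdict(list)
    for row in rows:
        if row["region"] == region:
            by_month[row["date"][:7]].append(row["strain"])

    chosen: List[str] = []
    for month in sorted(by_month):
        strains = sorted(by_month[month])  # sort first so input order is irrelevant
        rng.shuffle(strains)
        chosen.extend(sorted(strains[:per_month]))
    return chosen
```

The resulting strain names could then be written to a text file and used to pull the matching sequences out of the contextual alignment. `augur filter` also has its own grouping and per-group sampling options, which may be a better fit than a hand-rolled script.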
