The most basic of build help

omarkr · November 1, 2021, 11:54pm

Hello,

I'm fairly new to manipulating scripts, and I would appreciate some guidance on the builds and parameters YAMLs.
I followed a tutorial to create a custom profile in my_profiles and edit the builds YAML.

I'm not exactly sure which entries are required and which are optional. Also, where do I adjust the parameters for all the tools? (For example, I do not want any entries filtered, so I don't really need any ‘exclude’ parameters.) I imagine that if I do not specify some parameters, they fall back to default values.

As an example,
I uploaded entries onto GISAID, so that will be my starting point for this Nextstrain analysis. I'd appreciate a guide on how to download from there. I know there are several ways (tar, or extracted FASTA and metadata), but I'm not sure if there is a difference between them. I'll have roughly 80 entries, and do not need to filter them in any way. They are all from the same country, just different states. The end result should be an Auspice map showing the strains in each area.

The error message I get has something to do with all entries being filtered out due to “ambiguous date in any”. I imagine this is because of default values, and because my dates when submitted to GISAID were dd-mm-yyyy, whereas Nextstrain needs them to be yyyy-mm-dd?
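
If the date format does turn out to be the cause, the metadata could be converted before running the build. The following is only a minimal sketch, assuming the metadata is a tab-separated file with a column named `date` holding DD-MM-YYYY values; the function name `normalise_dates` and its parameters are made up for illustration and are not part of Nextstrain.

```python
import csv
import io
from datetime import datetime


def normalise_dates(tsv_text, column="date", source_format="%d-%m-%Y"):
    """Return the TSV text with values in `column` rewritten as YYYY-MM-DD.

    Values that do not match `source_format` (for example dates that are
    already YYYY-MM-DD) are left unchanged.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        try:
            parsed = datetime.strptime(row[column], source_format)
            row[column] = parsed.strftime("%Y-%m-%d")
        except ValueError:
            pass  # not in the assumed source format; keep the value as it is
        writer.writerow(row)
    return out.getvalue()
```

To use it, read the metadata file into a string, pass it through `normalise_dates`, and write the result to a new file that the builds YAML points to.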

tl;dr: what is the minimum I need in the builds YAML, and where do I change values in the parameters, such as removing the need for filtering?

Many thanks

james · November 2, 2021, 1:12am

Hi @omarkr - I suggest starting with the tutorials for running SARS-CoV-2 builds in nextstrain. Specifically, the data preparation section explains how to download data from GISAID for analysis in Nextstrain.

“I'm not exactly sure which entries are required, and which are optional.”

The simplest starting point would be a YAML as follows:

inputs:
  - name: example-data # change as needed
    metadata: data/example_metadata.tsv # change as needed
    sequences: data/example_sequences.fasta # change as needed

This will create a default build with no subsampling. The tutorials should provide examples of how to customise the build as desired.

Please get back in touch with any specific questions!

omarkr · November 7, 2021, 2:37am

So I sort of managed to go through the general tutorial and apply my data to it, all the way to Auspice, which is good.

But I'm wondering if there’s anything I gain from attempting the SARS-CoV-2 tutorial on my SARS-CoV-2 data, or is the output the same whichever path I choose here?

james · November 8, 2021, 10:52pm

The inputs you provide (sequences + metadata) will determine which genomes are in the final auspice visualisation. If you have specific sequences you wish to analyse, then you’ll need to provide these as inputs; otherwise you may be better off viewing some of the datasets listed on nextstrain.org/sars-cov-2 which subsample the entire dataset depending on which geographical area is of interest.

omarkr · November 11, 2021, 12:00am

I notice that the web version of Nextclade gives the option to download auspice.json files. Is there a way to feed this JSON from web Nextclade to auspice.us, but with just the sequences we provided to Nextclade? Basically, the phylogeny that Auspice generates from that JSON is far too populated if I just want to know the relationship between my sequences and a few references.

james · November 11, 2021, 2:17am

I don’t believe so. While you could write a script to prune out the sequences in the JSON which you didn’t provide, I’d worry about the accuracy of inferences drawn from that data. (This depends on your data and what you want to know; it may be good enough for some questions.) The best approach would be to run this data through the nCoV workflow.
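
For reference, the pruning script mentioned above could look roughly like this. It is a sketch only, assuming the downloaded file follows the Auspice v2 layout with a single nested `tree` object whose nodes carry `name` and optional `children` keys; the function names are hypothetical. It does not re-infer anything, so branch lengths and inferred attributes still come from the full tree, and internal nodes left with a single child are not collapsed.

```python
import json


def prune_tree(node, keep):
    """Return a copy of `node` containing only tips named in `keep`, or None."""
    children = node.get("children")
    if not children:
        # A tip: keep it only if its name was requested.
        return dict(node) if node.get("name") in keep else None
    kept = [c for c in (prune_tree(child, keep) for child in children) if c is not None]
    if not kept:
        return None  # no retained tips below this internal node
    pruned = dict(node)
    pruned["children"] = kept
    return pruned


def prune_auspice_json(dataset, keep):
    """Return a copy of the dataset whose tree is restricted to `keep`."""
    pruned = prune_tree(dataset["tree"], set(keep))
    if pruned is None:
        raise ValueError("none of the requested names are tips in this tree")
    result = dict(dataset)
    result["tree"] = pruned
    return result
```

A typical use would be to load the file with `json.load`, call `prune_auspice_json` with the names of your own sequences plus a few references, and write the result back out with `json.dump` before dragging it onto auspice.us.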

P.S. I would suggest adding in some contextual (background) sequences to preserve the overall structure of the tree, but without knowing what data you are analysing it’s hard to say more.
