Using existing alignment

Hi,

Is it possible to feed Nextstrain my own alignment file? That is, can I give it my own alignment and run only the subsequent steps on it, rather than using Nextstrain’s alignment rule?
Both my context and focal datasets are aligned and have the same length.

Thanks!

Hi @Sanna – there are two alignment steps in the nCoV workflow (which I think is what you’re using?).

The first alignment happens per input, and aligns everything before we move on to the subsampling step(s). This can be skipped by defining an input like so:

inputs:
  - name: your-input-name
    metadata: "path-to-metadata-tsv"
    aligned: "path-to-alignment"

The second alignment happens after subsampling, before we build the phylogeny. There is no way to skip this currently, although it should be quick as it only happens on the subsampled sequences.
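Since an `aligned` input bypasses the first alignment step, the file presumably has to be a proper alignment already. Below is a minimal sketch, assuming a plain FASTA file, that confirms every record has the same length before you point `aligned:` at it. It uses only the Python standard library, and the function names are hypothetical helpers, not part of Nextstrain.

```python
"""Check that every record in a FASTA alignment has the same length."""
import sys


def read_fasta(path):
    """Yield (name, sequence) pairs; the name is the first word of the header."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                header = line[1:].strip()
                name = header.split()[0] if header else ""
                chunks = []
            else:
                # Sequence may be wrapped over several lines; join them later.
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def alignment_lengths(path):
    """Map each distinct sequence length to the number of records with it."""
    counts = {}
    for _, seq in read_fasta(path):
        counts[len(seq)] = counts.get(len(seq), 0) + 1
    return counts


if __name__ == "__main__" and len(sys.argv) == 2:
    lengths = alignment_lengths(sys.argv[1])
    if len(lengths) == 1:
        print(f"OK: all sequences have length {next(iter(lengths))}")
    else:
        print(f"Length mismatch (length -> count): {lengths}")
        sys.exit(1)
```

Run it as `python check_alignment.py path-to-alignment` (the script name is hypothetical). Equal lengths are necessary but not sufficient: this does not check that the sequences are aligned to the reference the workflow expects.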

Hi @james
Thank you for your response!
Is there a way to skip the subsampling step? I have my own focal and context sequences and I would like to create a phylogeny of those only. How would I do that?
Thanks again!!

Absolutely – if you don’t explicitly supply a subsampling scheme, or specify “all”, then no subsampling will be used. E.g. the first 2 of the following builds won’t subsample, but the third will use the subsampling scheme “my-scheme” (which itself must be defined):

builds:
  buildA:
    region: global # this key won't be used, but I think you need at least one key per build?
  buildB:
    subsampling_scheme: all
  buildC:
    subsampling_scheme: my-scheme

Hi James, thank you for your response.
I tried subsampling_scheme: all and got an error saying that my builds file is not valid.
I looked at the defaults and it looks like the default is no subsampling:

subsampling:
  # Default subsampling logic to select all strains from all inputs (i.e., no subsampling).
  all:
    all:
      no_subsampling: true

So I tried this and my builds.yaml file looks like this:

#Define inputs
inputs:
  - name: test_run
    metadata: data/metadata_delim.tsv
    aligned: data/msa.fasta

#Define builds

refine:
  root: "hCoV-19/Wuhan/WIV04/2019"

With this too, I get the following error:


Could you please help me with this?
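Independently of the config error, a quick sanity check on custom inputs like these is that the sequence names in the alignment match the metadata. Below is a minimal sketch using only the Python standard library. It assumes the metadata is tab-separated and identifies sequences in a `strain` column (adjust `column` if your file uses a different name). The helper names are hypothetical, and this check is not part of Nextstrain itself.

```python
"""Compare FASTA sequence names with the strain column of a TSV metadata file."""
import csv
import sys


def fasta_names(path):
    """Return sequence names (first word of each '>' header line) in file order."""
    names = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                words = line[1:].split()
                if words:
                    names.append(words[0])
    return names


def metadata_strains(path, column="strain"):
    """Return the set of values in `column` of a tab-separated metadata file."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise ValueError(f"no {column!r} column in header: {reader.fieldnames}")
        return {row[column] for row in reader}


def compare_ids(fasta_path, metadata_path, column="strain"):
    """Return (names only in the FASTA, names only in the metadata), both sorted."""
    names = set(fasta_names(fasta_path))
    strains = metadata_strains(metadata_path, column)
    return sorted(names - strains), sorted(strains - names)


if __name__ == "__main__" and len(sys.argv) == 3:
    only_fasta, only_metadata = compare_ids(sys.argv[1], sys.argv[2])
    print(f"{len(only_fasta)} sequences without metadata: {only_fasta[:10]}")
    print(f"{len(only_metadata)} metadata rows without a sequence: {only_metadata[:10]}")
```

For the files in the config above, this would be run as `python compare_ids.py data/msa.fasta data/metadata_delim.tsv` (the script name is hypothetical). Ideally both printed lists are empty.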