Filter for creating a dengue phylogenetic tree

Hello Nextstrain developers, I have a question about filtering when building the dengue phylogenetic tree. I am trying to keep all of the sequences from my local samples so that they appear on the dengue tree, but after running nextstrain build our sequences are missing from the main phylogenetic tree. Do you have any suggestions?

Hi @Soon_tare, I’m not a Nextstrain developer, but I think it would be helpful to them if you can copy-paste to show the configuration code where you’re attempting to include all local samples. How many local samples do you have?


@AngieHinrichs Thank you for your kindness. This is the config file I use. We modified a few config values, such as file paths, but most of the file matches the original. We have around 200 samples.

Hi @Soon_tare,

Just to confirm, did you modify the workflow to use the new config parameters serotype, sequences_local and metadata_local? These parameters are not supported in Nextstrain’s dengue workflow, so if you did not modify the workflow, then your data are not included in the analysis.

-Jover

Hi @joverlee ,

Thanks a lot for your recommendation. We tried modifying some data and updated the config for running Nextstrain. But the main problem is that when we generate the phylogenetic tree, our local data doesn’t show up. Do you have any suggestions for us?

Best,
Jane

Hi Jane,

I see you have in the config file:

sequences_local: "../try/example_data/sequences_denv3.fasta"
metadata_local: "../try/example_data/metadata_denv3.tsv"

In case you haven’t already, you may need to modify the filter rule (or connect the config) in phylogenetic/rules/prepare_sequences.smk#L58-L59 to something like:

input:
        sequences = "../try/example_data/sequences_{serotype}.fasta",
        metadata = "../try/example_data/metadata_{serotype}.tsv",

Then I would manually add sequence IDs for your samples to phylogenetic/defaults/denv3/include.txt, which bypasses filters such as --exclude-where country=? region=? date=? is_lab_host='true' and --min-length.
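
With around 200 samples, adding the IDs by hand is tedious. Below is a minimal Python sketch of one way to do it, assuming the first word of each FASTA header is the ID the workflow matches on and that include.txt holds one ID per line with optional # comments. The helper names fasta_ids and append_to_include are hypothetical and not part of the dengue workflow.

```python
from pathlib import Path


def fasta_ids(fasta_path):
    """Return the sequence IDs (first word of each '>' header line) in a FASTA file."""
    ids = []
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                header = line[1:].strip()
                if header:
                    ids.append(header.split()[0])
    return ids


def append_to_include(ids, include_path):
    """Append IDs not already listed in the include file; return the IDs added."""
    include_file = Path(include_path)
    existing_text = include_file.read_text() if include_file.exists() else ""
    # Assumption: anything after '#' on a line is a comment.
    existing = {line.split("#")[0].strip() for line in existing_text.splitlines()}
    existing.discard("")
    new_ids = []
    for seq_id in ids:
        if seq_id not in existing:
            existing.add(seq_id)  # also skips duplicates within `ids`
            new_ids.append(seq_id)
    if new_ids:
        with open(include_file, "a") as handle:
            if existing_text and not existing_text.endswith("\n"):
                handle.write("\n")
            handle.write("\n".join(new_ids) + "\n")
    return new_ids


# Example usage with the paths mentioned in this thread:
# ids = fasta_ids("../try/example_data/sequences_denv3.fasta")
# append_to_include(ids, "phylogenetic/defaults/denv3/include.txt")
```

The IDs written here need to match the strain IDs in your metadata file exactly, otherwise the include list will not pick them up.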

And test with only the denv3 build:

nextstrain build phylogenetic auspice/dengue_denv3_genome.json

I’m also going to look into adding --output-log reasons_dropped.tsv to the filter rule described here, to provide some diagnostics.
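
Once the filter step writes such a log, something like the following could show which filters were logged for each local sample. This is only a sketch: it assumes the log is a tab-separated file with strain and filter columns, so adjust the column names if yours differ. The function name logged_filters is hypothetical.

```python
import csv
from collections import defaultdict


def logged_filters(log_path, sample_ids, id_column="strain", reason_column="filter"):
    """Map each sample ID that appears in the filter log to the filter names logged for it."""
    wanted = set(sample_ids)
    reasons = defaultdict(list)
    with open(log_path, newline="") as handle:
        # Assumption: tab-separated log with a header row.
        for row in csv.DictReader(handle, delimiter="\t"):
            if row[id_column] in wanted:
                reasons[row[id_column]].append(row[reason_column])
    return dict(reasons)


# Example usage (the list of IDs is a placeholder for your own sample IDs):
# for strain, filters in logged_filters("reasons_dropped.tsv", my_ids).items():
#     print(strain, filters)
```

Samples that are missing from the tree but do not appear in the log at all were probably never read by the filter step, which would point to the input paths rather than the filters.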

Please let me know if the above doesn’t work. We also have Nextstrain office hours this Thursday at 10am Pacific time to help debug; let me know and I can add you.

After some internal discussion, we ended up revamping how sequences are spiked into the dengue workflow. Based on your provided config, the key change is in the phylogenetic/defaults/config_dengue.yaml file, where the inputs are defined (here); you can modify it to something like:

inputs:
  - name: ncbi
    metadata: "https://data.nextstrain.org/files/workflows/dengue/metadata_{serotype}.tsv.zst"
    sequences: "https://data.nextstrain.org/files/workflows/dengue/sequences_{serotype}.fasta.zst"

# move local data within phylogenetic folder if possible
additional_inputs:
  - name: local
    metadata: "try/example_data/metadata_{serotype}.tsv"
    sequences: "try/example_data/sequences_{serotype}.fasta"

And test by running:

nextstrain build phylogenetic auspice/dengue_denv3_genome.json

Please see the “Adding your own data” section in the repository for full details on how to configure this.
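
To confirm that the local samples made it into the final tree after the build, a short check like this may help. It is a sketch that assumes the output is an Auspice v2 dataset JSON with a top-level "tree" key holding nested "children" nodes. The helper names tip_names and missing_from_tree are hypothetical.

```python
import json


def tip_names(auspice_json_path):
    """Return the set of tip (leaf) names in an Auspice dataset JSON."""
    with open(auspice_json_path) as handle:
        dataset = json.load(handle)
    tree = dataset["tree"]
    # The tree entry may be a single root node or a list of root nodes.
    stack = list(tree) if isinstance(tree, list) else [tree]
    tips = set()
    while stack:
        node = stack.pop()
        children = node.get("children", [])
        if children:
            stack.extend(children)
        else:
            tips.add(node["name"])
    return tips


def missing_from_tree(sample_ids, auspice_json_path):
    """Return the sorted sample IDs that are not tips in the tree."""
    return sorted(set(sample_ids) - tip_names(auspice_json_path))


# Example usage (run from the phylogenetic directory; my_ids is a placeholder):
# print(missing_from_tree(my_ids, "auspice/dengue_denv3_genome.json"))
```

An empty list means every local sample is present as a tip. Any IDs it returns are the ones to look up in the filter diagnostics.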

Thank you for your suggestion, @quietjen. I will try it and study the recommended section in detail.