Customizing Colors in Nextstrain SARS-CoV-2 Temporal Phylogenetic Tree for Enhanced Analysis

Good day Nextstrain Community,

I hope this message finds you well. I’m currently working on a SARS-CoV-2 genomic surveillance project for the Philippines, utilizing Nextstrain to build temporal phylogenetic trees from curated GISAID data. I’ve successfully set up a build following the Nextstrain team’s tutorial (Setup and installation — SARS-CoV-2 Workflow documentation) and generated a working tree.

Project Details:

  • Data Source: Curated SARS-CoV-2 genome sequences from GISAID, focused on Philippine samples.
  • Goal: To perform temporal phylogenetic analysis to understand the spread and evolution of SARS-CoV-2 variants within the Philippines.
  • Current Status: A Nextstrain build is functioning, and a phylogenetic tree is generated.

Specific Question: Customizing Color Schemes for Enhanced Visualization and Analysis

My primary concern revolves around the visualization of the generated phylogenetic tree. Specifically, I’m struggling to effectively differentiate samples and clades using color coding.

  1. Current Issue:
  • The default color scheme results in many samples/clades being assigned similar or identical colors, making it difficult to distinguish patterns and track specific lineages visually.
  • I have difficulty identifying which parameter in the build configuration is responsible for the color assignments.
  2. Desired Outcome:
  • I need guidance on how to configure the colors parameter within the Nextstrain build to achieve a more informative and visually distinct color scheme.
  • Specifically, I would like to:
    • Color samples based on specific metadata (e.g., collection date, region, variant designation).
    • Ensure distinct colors for major clades and lineages of interest.
    • Understand how to generate a color scale that is appropriate for temporal data.
  3. Specific Requests:
  • Could you provide examples of color configurations within the auspice_config.json or builds.yaml that demonstrate how to:
    • Assign colors based on metadata fields.
    • Manually specify colors for particular clades or samples.
    • Create temporal color gradients.
  • Are there recommended color palettes or strategies for visualizing temporal phylogenetic data in Nextstrain?
  • Which parameters are responsible for the automatic color generation, and how can I override them?
  • Are there any debugging steps I can take to better understand how Nextstrain assigns colors?
  • Are there any Nextstrain commands that can be used to view the current color assignments of the tree?

I am using nextstrain.cli 8.5.4 on Ubuntu 22.04 (jammy), running on the Windows Subsystem for Linux.

I appreciate any insights, code snippets, or best practices you can share to help me improve the clarity and analytical value of my Nextstrain visualizations.

Thank you for your time and assistance.

Sincerely,
John_Kim

Hi @John_Kim,

Great to see you were able to successfully create a dataset. You can specify colors for specific values using a custom colors TSV file. That documentation page has information on how to format and use the file. You would want a file like the one below:

division	China	<hex color code>
division	Bangsamoro Autonom…	<hex color code>
division	Soccsksargen	<hex color code>
…

Note that the columns must be separated by a tab character and not spaces, which is a common mistake.
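As a rough illustration of the format above, here is a minimal Python sketch that builds such a file with real tab characters and then checks each row. Everything in it is an assumption for illustration and not part of the workflow: the function names, the placeholder division names, the output filename colors_example.tsv, and the evenly spaced hue palette (just one simple way to get visually distinct colors). It also assumes that lines starting with "#" are comments and skips them during the check.

```python
import colorsys
import re

# A six-digit hex color such as #D94C4C.
HEX_COLOR = re.compile(r"^#[0-9A-Fa-f]{6}$")


def distinct_hex_colors(n, saturation=0.65, value=0.85):
    """Return n hex colors whose hues are evenly spaced around the color wheel."""
    colors = []
    for i in range(n):
        r, g, b = colorsys.hsv_to_rgb(i / n, saturation, value)
        colors.append(
            "#{:02X}{:02X}{:02X}".format(round(r * 255), round(g * 255), round(b * 255))
        )
    return colors


def build_colors_tsv(field, values):
    """Build the TSV text: one 'field<TAB>value<TAB>hex color' row per value."""
    rows = []
    for value, color in zip(values, distinct_hex_colors(len(values))):
        rows.append("\t".join([field, value, color]))
    return "\n".join(rows) + "\n"


def find_bad_rows(tsv_text):
    """Return (line number, line) pairs that are not three tab-separated
    columns ending in a hex color. Blank lines are skipped, and so are lines
    starting with '#', which are assumed here to be comments."""
    bad = []
    for number, line in enumerate(tsv_text.splitlines(), start=1):
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) != 3 or not HEX_COLOR.match(columns[2]):
            bad.append((number, line))
    return bad


if __name__ == "__main__":
    # Placeholder values; replace with the values found in your metadata.
    divisions = ["Division A", "Division B", "Division C"]
    text = build_colors_tsv("division", divisions)
    with open("colors_example.tsv", "w", encoding="utf-8") as handle:
        handle.write(text)
    # An empty list means every row has three tab-separated columns.
    print(find_bad_rows(text))
```

Running find_bad_rows on the text of an existing colors file is a quick way to catch rows that were typed with spaces instead of tabs.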

Let me know if that doesn’t work, I’d be happy to help you debug further.

– Victor

Good day @victorlin ,

I hope you are doing well. I have now generated a [custom colors TSV file]. I apologize if I have missed a step, but could you please indicate which dependency or file within the SARS-CoV-2 workflow requires the path to this TSV file?

Regards,
John_Kim

Hi @John_Kim,

Nice, that colors TSV file looks like the right format. The documentation page I referenced earlier shows how to use the file. Here is the URL directly: Customizing visualization — SARS-CoV-2 Workflow documentation

So you should add something like this to your YAML config:

files:
  colors: "path/to/colors_updated.tsv"

– Victor

Good Day @victorlin ,

Thank you very much for your response. Could you please confirm, sir, whether this is the config file you are referring to?

Regards,
John_Kim

Hi @John_Kim,

Not quite. If you followed the tutorial in our documentation, it would be a file like ncov-tutorial/custom-data.yaml. In our documentation, we call this the workflow config file. The defaults/parameters.yaml file in your screenshot above provides the default values and should be left as-is.

– Victor