Customize Nextstrain

Dear colleague,

My name is Anita Howe. I am a senior scientist at the British Columbia Centre for Disease Control in Canada. I am also the coordinator of HCV SHARED, an international collaboration to study HCV drug resistance and transmission.

Thank you for setting up the Nextstrain program. I have downloaded the Zika tutorial and plan to use it to generate an “HCV Nextstrain”. So far, the display has been terrific on my demo set. I have a few questions, and I hope you can help me understand.

  1. How were the travel lines constructed? Are they based primarily on the sample collection dates, or on both the collection dates and the inferred phylogeny?

  2. In the Genotype and Diversity panel, does “Event” mean the number of cases with a mutation at each amino acid position in the dataset? I assume “Entropy” refers to the genetic diversity at each amino acid position.

  3. Because of privacy agreements, we would prefer not to show individual-level data, such as the information box that appears after clicking a dot in the Phylogeny panel. Is it possible to turn off the display of some of the information inside that box? For example, could we keep Gender in the “Color by” menu on the left panel, but not show “male” or “female” inside the box?

  4. Lastly, is the phylogenetic tree based primarily on the sequences, or does it also take the sample collection dates into account? Some of our samples have only the year (yyyy) rather than a full dd-mm-yyyy date. Should I include these samples in our dataset, and how might they affect the phylogenetic tree?

Sorry for the long list of questions. I look forward to hearing from you, and hopefully we will have HCV on your website someday.

Thanks

Anita

Hi @Anita – thanks for reaching out! Looking forward to seeing more HCV data in Nextstrain. You might also be interested in comparing results and analyses with https://hcv.amsterdamumc.nl. As for your questions:

  1. They use the inferred dates and locations of the nodes in the tree. See https://nextstrain.org/docs/visualisation/map-interpretation#transmissions-lines for more.
  2. Event means the number of switches (changes) at that position across the tree. More info here: https://nextstrain.org/docs/visualisation/download-data#diversity-entropy-data
  3. Not currently. There are a few alternatives that may be useful: a private Nextstrain group, which requires a login to access the data, or moving the private data into a separate CSV file that you distribute only to certain users, who can then drag it onto the Nextstrain window to add it to the analysis on the fly.
  4. It’s hard to answer without seeing the analysis steps you are running, but if you are using augur refine --timetree, then dates will be inferred and this will affect the tree, though mostly the branch lengths rather than the topology (this page has more info).
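Answer 2 can be illustrated with a short Python sketch. This is a simplified illustration under stated assumptions, not Auspice's actual code: “entropy” is taken here as the Shannon entropy (natural log) of the residues observed in the tip sequences at each position, and “events” as the number of parent-to-child branches where the residue at that position changes, given sequences for internal nodes from an ancestral reconstruction. The function names, node names, and toy sequences are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(column):
    """Shannon entropy (natural log) of the residues observed at one position."""
    counts = Counter(column)
    total = sum(counts.values())
    # p * log(1/p) is non-negative, so a fully conserved position gives 0.0
    return sum((n / total) * math.log(total / n) for n in counts.values())


def entropy_per_position(tip_sequences):
    """Entropy at each aligned position across the sampled (tip) sequences."""
    length = len(tip_sequences[0])
    return [shannon_entropy([seq[i] for seq in tip_sequences]) for i in range(length)]


def events_per_position(branches, node_sequences):
    """Count the branches on which the residue at each position changes.

    branches: list of (parent_name, child_name) pairs covering the whole tree.
    node_sequences: dict of name -> aligned sequence, including inferred ancestors.
    """
    length = len(next(iter(node_sequences.values())))
    events = [0] * length
    for parent, child in branches:
        parent_seq, child_seq = node_sequences[parent], node_sequences[child]
        for i in range(length):
            if parent_seq[i] != child_seq[i]:
                events[i] += 1
    return events


if __name__ == "__main__":
    # Hypothetical toy tree: root -> A, root -> NODE_1, NODE_1 -> B, NODE_1 -> C
    nodes = {"root": "MKV", "A": "MKV", "NODE_1": "MRV", "B": "MRV", "C": "MRI"}
    branches = [("root", "A"), ("root", "NODE_1"), ("NODE_1", "B"), ("NODE_1", "C")]
    tips = [nodes[name] for name in ("A", "B", "C")]
    print([round(h, 3) for h in entropy_per_position(tips)])  # [0.0, 0.637, 0.637]
    print(events_per_position(branches, nodes))  # [0, 1, 1]
```

The toy example shows why the two measures differ: entropy describes how mixed the residues are among the samples you have, while events count how many times a change happened along the tree. In this example, the two samples sharing R at the second position inherited it from a single change on one branch, so that position has one event.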

Thanks,
James

Dear James,

Thank you for your prompt response. I will definitely go to the links for more information.

Best regards
Anita

Dear James,

Thanks again for the links and information. I read the page on travel lines several times, but I still cannot quite interpret our data. I should apologize up front: I am not trained in bioinformatics, so I have probably missed many nuances. The tutorial says, “Each individual line represents a parent-child branch in the phylogenetic tree where the value of the current geographic resolution differs.” I interpret this as: the inferred ancestral (parent) sequence is phylogenetically related to the actual child sequence, but their locations differ, so a travel line connects the two. Is this correct?
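The rule quoted above can be sketched in a few lines of Python. This is a minimal illustration, not Auspice's implementation: it assumes every internal node already carries an inferred location (for example from an ancestral trait reconstruction such as augur traits), and the Node class, node names, and locations are hypothetical. It shows that the parent end of a line is usually an inferred internal node of the tree rather than another sampled sequence.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # Observed for sampled tips; assumed to be inferred for internal nodes.
    location: str
    children: list = field(default_factory=list)


def transmission_lines(root):
    """Return (parent, child, from_location, to_location) for every branch
    whose two ends have different locations."""
    lines = []
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.children:
            if child.location != node.location:
                lines.append((node.name, child.name, node.location, child.location))
            stack.append(child)
    return lines


if __name__ == "__main__":
    # Hypothetical tree: an inferred Hamburg ancestor gives rise to an inferred
    # Stuttgart ancestor, one of whose sampled descendants is again in Hamburg.
    tree = Node("NODE_0", "Hamburg", [
        Node("tip_A", "Hamburg"),
        Node("NODE_1", "Stuttgart", [
            Node("tip_B", "Stuttgart"),
            Node("tip_C", "Hamburg"),
        ]),
    ])
    for line in transmission_lines(tree):
        print(line)
```

In this sketch, the Hamburg-to-Stuttgart line connects two inferred ancestors (NODE_0 and NODE_1), not two sampled sequences; tip_A and tip_B are related only through those ancestors.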

I have taken some snapshots of our data and I hope you can help me to understand.

Picture 1. Travel line 1 runs from Hamburg to Stuttgart. Could you please tell me where the parent is in this case? How are DE2bd85b (orange dot) and DE8173e7 (turquoise dot) related to each other from the travel-line perspective?

Pictures 2a and 2b. Pictures 2a and 2b show the same view, except that 2a is colored by “sampling date” and 2b is colored by “GT” (genotype in HCV). The tutorial says, “If the selected color-by is not present at the origin of an individual transmission, then the line will be colored grey.” I am confused about whether the sequences in Germany and Australia are related or not. Could you please elaborate? Second,

I will continue to read through the

(Attachment Questions.pptx is missing)

Dear James, my PowerPoint file was rejected, so I saved it as JPG images. Please see attached.

Thanks
Anita