I have been viewing and using Nextstrain trees since 2020, primarily to follow the evolution of SARS-CoV-2 variants of interest. I am new to creating my own Nextstrain trees and to setting up a “groups” page for my group (HIV Databases at LANL).
I know that for many data sets the sequences come from GISAID, and there are restrictions on redistributing those sequences publicly, for example by sharing the multiple sequence alignment used to generate the tree. However, for many other data sets and trees on the Nextstrain site, all of the data came from GenBank, so it should be possible to share the alignment.

The issue I am interested in is this: when I view a tree and the entropy/events plot below it and find a site or small region of interest, it is very difficult to locate exactly where that site or region falls in the genomes. In the example screenshot I am uploading here, nucleotide 1530 has high entropy and a high number of events, and it lies just inside the E2 region of the polyprotein gene. But locating that column in my own multiple sequence alignment of HCV genomes is not easy, because my alignment is not exactly the same as the one used for this tree.
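One workaround, assuming the position shown in the entropy plot is numbered relative to the reference genome used for the build and that this same reference sequence is also present (gapped) in your own alignment, is to walk along the aligned reference and count ungapped bases until you reach the position of interest. The sketch below illustrates that idea; the function name `ref_pos_to_aln_col` is hypothetical and not part of Nextstrain or Augur.

```python
def ref_pos_to_aln_col(aligned_ref: str, ref_pos: int, gap_chars: str = "-.") -> int:
    """Map a 1-based reference nucleotide position to a 1-based alignment column.

    Hypothetical helper: assumes `aligned_ref` is the reference sequence as it
    appears (with gaps) in your own multiple sequence alignment.
    """
    if ref_pos < 1:
        raise ValueError("ref_pos must be >= 1")
    count = 0  # number of ungapped reference bases seen so far
    for col, base in enumerate(aligned_ref, start=1):
        if base not in gap_chars:
            count += 1
            if count == ref_pos:
                return col
    raise ValueError(f"reference has only {count} ungapped bases")


if __name__ == "__main__":
    # Toy aligned reference: reference base 3 ("G") sits in alignment column 5.
    print(ref_pos_to_aln_col("AC--GT-A", 3))
```

With a real alignment, passing the aligned reference row and `1530` would give the column to inspect, but only if the tree and your alignment really share the same reference numbering.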