I have a few questions about the SARS-CoV-2 phylogenetic tree published at Nextstrain (https://nextstrain.org/ncov/global).
1. I found that only about 5000 SARS-CoV-2 genomes from GISAID were used to create the tree. How were these genomes selected?
2. How is the tree rooted?
3. There seem to be multiple SARS-CoV-2 samples on a single branch. What is the relationship between these samples? Are they identical in genome or protein sequence?
4. I noticed that the X-axis of the SARS-CoV-2 tree is date (from Dec 2019 to Sep 2020). How are sampling dates assigned in the tree? Does the tree show only samples whose sampling dates fall within that range?
I am quite new to building and interpreting phylogenetic trees, and I am really confused by the Nextstrain tree. I hope someone can help me with these questions. Also, are there any papers or references on the algorithm Nextstrain uses to build a phylogenetic tree with dates on the X-axis? Thanks!