I noticed some discrepancies in the country labels of Zika samples. For example, in Figure 1, two samples uploaded by China (Z16019 and GZ02/2016) were incorrectly labeled as Venezuela. In Figure 2, multiple samples from China were marked as American Samoa, and some entries appear to be duplicates. I’m not sure why these errors occurred.
Hi @pangmingfan,
Thanks for reporting these issues.
Regarding the location discrepancies, Nextstrain annotates location by where the infection happened rather than by who uploaded the sample, which allows the visualization to show transmission patterns. For example, GZ01/2016 is reported in "Morphologic and Molecular Characterization of a Strain of Zika Virus Imported into Guangdong, China" to be a case returning from Venezuela. I have not tracked down the papers for the other sequences, but they should all be returning travel cases as well.
As for the duplicates, we can look into whether they are true duplicate samples and, if so, exclude one of them from the build.
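A first pass at spotting candidate duplicates could look something like the sketch below. It is only an illustration and not the actual build code: the function name and the sample names are made up, and identical sequences are only a hint that two entries might be the same sample, so each group would still need a manual check against the records.

```python
import hashlib
from collections import defaultdict


def find_identical_sequences(sequences):
    """Group sample names whose sequences are identical.

    `sequences` maps a sample name to its nucleotide sequence.
    Gaps, line breaks, and letter case are ignored before comparing.
    """
    groups = defaultdict(list)
    for name, seq in sequences.items():
        normalized = seq.replace("-", "").replace("\n", "").upper()
        digest = hashlib.sha256(normalized.encode("ascii")).hexdigest()
        groups[digest].append(name)
    # Keep only groups with more than one member: the candidate duplicates.
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)


# Hypothetical toy input, not real Zika sequences.
example = {
    "sample_A": "ACGTACGT",
    "sample_B": "acgtacgt",
    "sample_C": "ACGTACGA",
}
candidates = find_identical_sequences(example)
print(candidates)
```

Hashing keeps the comparison cheap for full genomes, but it only catches exact matches; near-identical sequences from the same patient would need a distance-based comparison instead.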
Best,
Jover
Hi Jover,
Thanks for your detailed explanation!
According to the rules you mentioned, cases imported into China from abroad (e.g., Venezuela) should have their location marked as the country of origin (i.e., Venezuela). In reality, based on statistics from China and the WHO (A Review of the Recent Epidemiology of Zika Virus Infection - PubMed), there has been no local transmission of Zika in China to date, and all reported cases have been imported.
However, in the current tree, there are over a dozen cases labeled as originating from China. If we follow the principle you described, these cases should not be designated as “China”. I noticed this issue because I observed a path on the map indicating transmission from China to Venezuela, which, in fact, does not exist. The correct direction should be importation from Venezuela to China, without causing local transmission within China.
I’m not sure whether the above issue needs to be corrected. Do similar issues also occur with other countries?
Thanks,
Mingfan
Thank you for noting the other samples labeled with country “China”. We will be tracking these down in ingest: Annotate geolocation for samples with country "China" · Issue #99 · nextstrain/zika · GitHub.
This can definitely occur with other countries since we use the geolocation from the NCBI records by default. We need to manually annotate locations based on linked papers or news reports.
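To give a rough idea of what a manual annotation amounts to, here is a minimal sketch. It assumes a tab-separated override list with three columns (accession, field, value); the function name, the accessions, and the records are hypothetical placeholders, and this is not the actual ingest code.

```python
import csv
import io


def apply_annotations(records, annotations_tsv):
    """Override metadata fields using (accession, field, value) rows.

    Records without a matching annotation keep their default values,
    e.g. the geolocation taken from the NCBI record.
    """
    overrides = {}
    reader = csv.reader(io.StringIO(annotations_tsv), delimiter="\t")
    for row in reader:
        # Skip blank lines, comment lines, and incomplete rows.
        if len(row) < 3 or row[0].startswith("#"):
            continue
        accession, field, value = row[:3]
        overrides.setdefault(accession, {})[field] = value
    return [{**rec, **overrides.get(rec["accession"], {})} for rec in records]


# Hypothetical records and annotation; accessions are placeholders.
records = [
    {"accession": "EXAMPLE_1", "strain": "sample_A", "country": "China"},
    {"accession": "EXAMPLE_2", "strain": "sample_B", "country": "Brazil"},
]
annotations = "# accession\tfield\tvalue\nEXAMPLE_1\tcountry\tVenezuela\n"

annotated = apply_annotations(records, annotations)
for rec in annotated:
    print(rec["strain"], rec["country"])
```

Keeping the overrides in a separate list, rather than editing the downloaded metadata, means the corrections survive each time the records are fetched again.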
Thank you for taking the time to double-check the related samples.

