How to look at data from each country?

yoojinepark · October 27, 2020, 8:53pm

I am trying to see which data are from Canada specifically. Is there a way to visualize how many data points were submitted from Canada, along with detailed info (e.g. number of data points, which city/region, who submitted them, which strains, etc.)? I am new to Nextstrain - any help or suggestions would be much appreciated!

Thanks!

eharkins · November 24, 2020, 7:01pm

Hi @yoojinepark! Good question; some recent interface improvements by @james hopefully make this slightly more intuitive - here is a tweet we posted explaining how to use the new interface to do things like filter by country and other metadata: https://twitter.com/nextstrain/status/1329535390273957894. This kind of filtering is tracked in the URL like so: https://nextstrain.org/ncov/north-america?f_country=Canada. The interface will tell you how many sequences are included within each filtering category, and hovering over sequences in the tree will show who submitted them. Hope this helps - Eli
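As a rough illustration of the URL pattern above, here is a minimal Python sketch that builds such a link. The `f_country=Canada` parameter and the base URL come from the example link; the helper name `filtered_url` and the assumption that other metadata filters follow the same `f_<field>` pattern are illustrative only, not a documented API.

```python
from urllib.parse import urlencode


def filtered_url(base, **filters):
    """Build a dataset URL with filters expressed as f_<field>=<value>.

    Hypothetical helper: only f_country is confirmed by the example link;
    other field names are assumed to follow the same pattern.
    """
    query = urlencode({f"f_{field}": value for field, value in filters.items()})
    return f"{base}?{query}"


url = filtered_url("https://nextstrain.org/ncov/north-america", country="Canada")
print(url)
```

Opening the resulting link in a browser should show the North America build with the Canada filter already applied.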

ECG · February 19, 2021, 8:42pm

Hi @eharkins & Nextstrain team!

First: Thanks for making this website! Congratulations for your work!

I’m interested in comparing the latest lineage frequencies between different countries worldwide (I don’t have a specific interest in the tree). Although I have read the tweet, I have several questions on how to achieve that and how to interpret the visualized data.

Depending on how I access the data for a specific country, I get different lineage frequencies. For example, if I want to know the latest frequency of B.1.1.7 in Germany: if I filter for Europe > Germany I get a 21% frequency, while if I filter for Germany directly (using the drop-down of countries inside Europe and selecting Germany) I get a 12% frequency. I guess I’m missing something here. What’s the difference between the two ways of filtering? Which is the best way to filter if I want to compare frequencies among countries all over the world?
What is the denominator of the frequency percentages? On covariants.org this info is provided together with the frequencies, but I can’t find it in Nextstrain.
Is the latest frequency calculated per week, as on covariants.org, or how is it calculated?

Thank you very much for your time and help!
All the best-
Elvira

emmahodcroft · February 23, 2021, 9:52am

Hi @ECG, Thanks for writing! I’m afraid I’m not 100% clear on what you mean by filtering for German sequences in two different ways. Certainly if you are filtering German sequences on different builds, then these frequencies may change due to subsampling. For example, if you access the Germany build maintained by Neher Lab and filter to Germany, it shows about 27% currently.

If you use the Nextstrain Europe build and filter to Germany, it’s also about 27%.

On the global build this is significantly lower, at about 12% - but the global builds are now extremely subsampled (~4,000 sequences out of >500,000 available), so I’d urge caution when using them to look at country-level patterns - they’re better used just for larger-scale, longer-time-period interpretation.

The denominator of the % for frequencies is the total number of sequences that are visible in the tree given the currently applied filters. If no filters are on, that means it’ll use all the sequences in the tree from the appropriate time-slice. Note that this is different from CoVariants.org, which uses raw sequence data (not a tree) that’s not significantly subsampled, whereas Nextstrain frequencies here reflect the sequences in the tree & thus any subsampling therein.
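To make the denominator concrete, here is a small Python sketch of that calculation. It is only an illustration under stated assumptions: the records are made-up toy data, the function name `lineage_frequency` is hypothetical, and the time-slice is ignored for simplicity. It is not how Nextstrain itself computes frequencies.

```python
def lineage_frequency(records, lineage, **filters):
    """Share of the currently visible sequences that belong to `lineage`.

    `records` stands in for the sequences in one tree; `filters` mimics the
    metadata filters applied in the interface (e.g. country="Germany").
    """
    visible = [r for r in records if all(r.get(k) == v for k, v in filters.items())]
    if not visible:
        return None  # nothing visible, so the frequency is undefined
    hits = sum(1 for r in visible if r["lineage"] == lineage)
    return hits / len(visible)


# Toy data only: invented for illustration, not real sequence counts.
toy_tree = [
    {"country": "Germany", "lineage": "B.1.1.7"},
    {"country": "Germany", "lineage": "B.1.177"},
    {"country": "Germany", "lineage": "B.1.177"},
    {"country": "Germany", "lineage": "B.1.160"},
    {"country": "France", "lineage": "B.1.1.7"},
    {"country": "France", "lineage": "B.1.160"},
]

germany_only = lineage_frequency(toy_tree, "B.1.1.7", country="Germany")
no_filter = lineage_frequency(toy_tree, "B.1.1.7")
print(germany_only, no_filter)
```

Because the denominator is whatever is visible in the tree, the same lineage and country can give different percentages on builds that were subsampled differently.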

Frequencies are shown per-week on Nextstrain.org. Note that for CoVariants.org they’re shown per week for the Per Variant plots, but per 2 weeks for the Per Country plots (to smooth jitter).

I hope all this helps!

ECG · February 23, 2021, 11:43am

Hi @emmahodcroft

Indeed, this helps a lot! Sorry if that was not very clear, but you got it right! Thank you very much for your time!
So, if I want to assess the country-level variant distribution, I should use the continental or country build rather than the global build, since those are less affected by subsampling.
