Hi, I have uploaded SARS-CoV-2 FASTA sequences derived from wastewater samples to GISAID. On Nextclade, I have filtered for Host = Environment, but I can see only 5 samples, even though many more have been uploaded to GISAID, and none of the 5 are the ones we uploaded. Our submitting lab does not appear in the filter either. Is it possible to see the wastewater samples on Nextclade? Or at least the samples from our lab? Thank you very much for the amazing work you are doing.
Carlotta Olivero
Hi @Carlotta, due to the huge number of sequences on GISAID (many millions), we heavily subsample the data for our Nextstrain datasets, with the aim of providing a representative view of the pandemic. This subsampling considers time and location (e.g.), so I would expect only a few wastewater samples to be included. Furthermore, there are multiple QC filters, and wastewater samples may be more likely to fail them.
You could run a specific build with a sampling scheme focused on wastewater samples. The best starting point would be this tutorial, and don’t hesitate to post here if you get stuck.
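As a rough illustration of what "focusing on wastewater samples" means at the metadata level, here is a minimal Python sketch that keeps only rows whose host is "Environment" and, optionally, whose submitting lab matches a given name. It assumes a tab-separated metadata file with columns named `host` and `submitting_lab`; those column names and the helper functions `read_metadata` and `select_wastewater` are hypothetical for this example, not part of the Nextstrain workflow's API. In an actual build, the selection would be expressed through the workflow's own filtering and subsampling configuration as described in the tutorial.

```python
import csv
import io
from typing import Iterable, Optional


def read_metadata(tsv_text: str) -> list[dict]:
    """Parse tab-separated metadata text into a list of row dicts (hypothetical helper)."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def select_wastewater(rows: Iterable[dict], lab: Optional[str] = None) -> list[dict]:
    """Keep rows whose (assumed) 'host' column is 'Environment'.

    If `lab` is given, additionally require that the (assumed) 'submitting_lab'
    column contains it, compared case-insensitively.
    """
    selected = []
    for row in rows:
        # Wastewater/environmental samples are assumed to carry host == "Environment".
        if row.get("host", "").strip().lower() != "environment":
            continue
        # Optional substring match on the submitting lab name.
        if lab is not None and lab.lower() not in row.get("submitting_lab", "").lower():
            continue
        selected.append(row)
    return selected
```

For example, `select_wastewater(read_metadata(text), lab="some lab name")` returns only the environmental rows from that lab; a substring match is used because lab names are often written slightly differently across submissions.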
P.S. If data from your lab is in a Nextstrain dataset but the Submitting Lab / Originating Lab isn’t available as a filter, then that’s a bug! Could you let us know the dataset and lab name, and I’ll chase this up.
Thank you very much for this exhaustive explanation. I will try to follow the tutorial!
Regarding the Submitting Lab / Originating Lab, examples of the sequences I have uploaded to GISAID are EPI_ISL_13699833 (hCoV-19/env/Italy/PIE-ARPA-22BL10879/2022) and EPI_ISL_12751712 (hCoV-19/env/Italy/PIE-ARPA-22BL10880/2022). Can you see the submitting lab for these?
Many thanks for the help!