Since GISAID contains so many sequences, I’d like to know whether there is an effective and reliable way to select the informative ones. Nextstrain appears to be built using a subsampling scheme. Is random sampling enough? And how can we be sure that we are not leaving out any important information?
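To make the question concrete, here is a minimal sketch of the two approaches I have in mind: uniform random sampling versus sampling a fixed number of sequences per group. The record fields (`id`, `country`, `month`), the grouping key, and the toy data are all hypothetical, and this is only my own illustration, not Nextstrain's actual implementation.

```python
import random
from collections import defaultdict


def random_subsample(records, n, seed=0):
    """Draw n records uniformly at random (or all of them if fewer exist)."""
    rng = random.Random(seed)
    return rng.sample(records, min(n, len(records)))


def grouped_subsample(records, per_group, key, seed=0):
    """Draw up to per_group records from each group defined by key(record)."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec)
    picked = []
    for name in sorted(groups):  # sorted so the result is reproducible
        members = groups[name]
        picked.extend(rng.sample(members, min(per_group, len(members))))
    return picked


def country_month(rec):
    """Hypothetical grouping key: (country, collection month)."""
    return (rec["country"], rec["month"])


# Hypothetical toy metadata: country A is heavily over-represented.
records = (
    [{"id": f"A{i}", "country": "A", "month": "2020-03"} for i in range(95)]
    + [{"id": f"B{i}", "country": "B", "month": "2020-03"} for i in range(5)]
)

uniform = random_subsample(records, 6)
grouped = grouped_subsample(records, 3, country_month)

print("uniform:", sorted(r["id"] for r in uniform))
print("grouped:", sorted(r["id"] for r in grouped))
```

With uniform sampling, the under-represented country B can easily be missed entirely, whereas the grouped version always keeps up to three sequences from each country and month. My worry is whether either approach can still drop a rare but informative lineage.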
Thank you very much.