[Non-Nextstrain survey] What are the pain points in getting data ready for phylogenomic analysis?

Hi all!

I’m new to this forum, so please accept my apologies if this post is off topic.

I’m a developer of the open source software tool cogent3 for genomic data wrangling and molecular evolutionary analyses. (cogent3 is a direct descendant of PyCogent, of which I was co-lead developer with Rob Knight.)

I’m running a survey to find out what major computational challenges our community faces in getting data ready for phylogenomic analyses.

If this is of interest to you, you can fill out the survey at https://forms.gle/VSt8TKdWtzUfe5A99

It should take less than 2 minutes.

Please forward to any colleagues who you think might be interested!

Thank you!



Filling it out just now. Would it be possible to share your results? If not in raw form, then maybe in aggregate? We all love data :wink:

Some feedback:

  1. Do you really mean “species” here or in fact “samples/sequences”? If I analyze 2000 SARS-CoV-2 virus genomes, those are of one species. What should I tick?

  2. What if I analyze protein-coding RNA, as in SARS-CoV-2?



Great comments! I’ve updated the form for both questions.

Happy to share the results, of course! It will probably be a few weeks, but feel free to pester me if I haven’t delivered them by then.
