Feedback on some default exclusions for nCoV workflow

Hi all,

I’ve recently begun testing out the nCoV workflow for work. Thanks for the great tools and workflow - very useful.

My focus is on researching SARS-CoV-2 from an Australian perspective. I followed the tutorial to define a new Australia-focused build. The analysis ran smoothly, and it was great to visualise the results using Auspice. However, there were fewer Australian genomes in the final tree than I expected.

Evidently I need to modify my build (subsampling schemes etc.) to get the output I want. In any case, I also took a look at defaults/exclude.txt to see if any Australian genomes were being excluded by default, and why. I last “pulled” the GitHub repo on Friday 18th September, and as of that date there are 46 Australian genomes excluded. However, if I’m interpreting the stated reasons correctly, I don’t think these genomes need to be excluded by default.
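In case it helps anyone doing the same check, here is a minimal Python sketch of how the Australian entries could be pulled out of an exclude list. It assumes the file holds one strain name per line, with `#` starting a comment; the function name `excluded_with_prefix` and the second sample entry are hypothetical and only for illustration.

```python
def excluded_with_prefix(lines, prefix):
    """Collect strain names that start with `prefix`, skipping blanks and comments."""
    strains = []
    for line in lines:
        # Assumption: anything after '#' on a line is a comment.
        name = line.split("#", 1)[0].strip()
        if name.startswith(prefix):
            strains.append(name)
    return strains


# Illustrative sample only; the second strain name is made up.
sample = [
    "# future collection dates",
    "Australia/VIC1948/2020",
    "",
    "Elsewhere/Hypothetical1/2020  # illustrative entry",
]
print(excluded_with_prefix(sample, "Australia/"))
```

To run it against a local copy, the lines could be read with `open("defaults/exclude.txt")` and passed to the same function.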

For example, 42 genomes are excluded for having “future collection dates”. I checked through these samples, and their collection dates aren’t in the future. An example is “Australia/VIC1948/2020”, which has a collection date of 2020-06-09 (9th June 2020). Four genomes are also excluded for “collection dates from Jan 6 with divergence resembling a much more recent virus”. However, the collection dates for these samples are actually 2020-06-01 (1st June 2020). Based on this, I think the exclusions came from reading the collection dates as YYYY-DD-MM rather than the correct YYYY-MM-DD: read that way, 2020-06-01 becomes 6th January and 2020-06-09 becomes 6th September. If I’m wrong here, please do correct me.
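To illustrate the suspected mix-up, here is a small Python sketch that parses the same string under both conventions. It only demonstrates the ambiguity; it is not how the exclusions were actually generated, and the helper name `parse_both_ways` is hypothetical.

```python
from datetime import datetime


def parse_both_ways(date_string):
    """Return (date read as YYYY-MM-DD, date read as YYYY-DD-MM).

    An entry is None when the string is not a valid date under that format.
    """
    results = []
    for fmt in ("%Y-%m-%d", "%Y-%d-%m"):
        try:
            results.append(datetime.strptime(date_string, fmt).date())
        except ValueError:
            results.append(None)
    return tuple(results)


for s in ("2020-06-09", "2020-06-01"):
    correct, swapped = parse_both_ways(s)
    print(s, "-> YYYY-MM-DD:", correct, "| YYYY-DD-MM:", swapped)
```

Under the swapped reading, 2020-06-09 comes out as 6th September 2020 and 2020-06-01 as 6th January 2020, which would line up with the two exclusion reasons quoted above.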

I’ll remove these from the default exclusions in my local copy of defaults/exclude.txt. I just thought this feedback might be useful for updating the exclude.txt file in the GitHub repo :slight_smile:

Cheers,
Charles

Thanks Charles – my guess is that the GISAID metadata has been updated since we flagged them as “future collection dates”, rather than the dates having been misread as YYYY-DD-MM.

We should now remove them from the exclude list – I’ll make a GitHub issue to track this.

Thanks!
james

Update: see https://github.com/nextstrain/ncov/issues/492