What happens when 2 genomes have the same name but different sequences?

My understanding is that Nextstrain removes duplicates from the input datasets by comparing both the sequence names and the sequences themselves. What happens if 2 records have the same name but different sequences? Will the run fail or just throw a warning?

A related question: Nextstrain strips the “hCoV-19/” prefix from sequence names. Does that happen for all input data, and does it happen before de-duplication?

We use the latest master branch.

Thank you for such a useful tool and great support to the users!

Answering myself: if 2 sequences have identical names and identical sequences, the duplicate is removed quietly. If 2 sequences have identical names but different sequences, only the first one is kept, and if the user sets error_on_duplicates=True, the names are written into a record.
Reference ncov code
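
For anyone who wants to see the logic spelled out, here is a minimal sketch of the behaviour I described above. It is my own illustration, not the actual ncov code: the function name `drop_duplicates`, the `(name, sequence)` tuple input, and the use of a SHA-256 hash to compare sequences are all assumptions. In particular, I assume here that the recorded names end up in an error when `error_on_duplicates=True`; the real script may report them differently.

```python
import hashlib


def drop_duplicates(records, error_on_duplicates=False):
    """Keep the first record seen for each name.

    `records` is an iterable of (name, sequence) string pairs.
    Returns (kept_records, conflicting_names), where conflicting_names
    lists names that appeared again with a *different* sequence.
    """
    hash_by_name = {}   # name -> hash of the first sequence seen
    kept = []
    conflicting = []

    for name, seq in records:
        digest = hashlib.sha256(seq.encode("utf-8")).hexdigest()
        if name in hash_by_name:
            # Same name, different sequence: remember the name,
            # but still keep only the first record.
            if hash_by_name[name] != digest and name not in conflicting:
                conflicting.append(name)
            # Same name (identical or not): drop this record quietly.
            continue
        hash_by_name[name] = digest
        kept.append((name, seq))

    # Assumption: the recorded names are surfaced as an error on request.
    if conflicting and error_on_duplicates:
        raise ValueError("Duplicate names with different sequences: "
                         + ", ".join(conflicting))
    return kept, conflicting


records = [("A", "ACGT"), ("A", "ACGT"), ("B", "GGCC"), ("B", "GGTT")]
kept, conflicting = drop_duplicates(records)
print(kept)
print(conflicting)
```

With the default `error_on_duplicates=False`, the second "A" (an exact duplicate) and the second "B" (same name, different sequence) are both dropped without any error, and only "B" is recorded as a conflict.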
