Samples identified as belonging to a parent clade

deweyh · March 4, 2021, 9:59pm

Hello all,

I have a sample that, according to Nextclade, should be in 20H/501Y.V2, but when I build the tree for all my samples it is placed in 20C. I have manually verified the clade assignment from the sample's BAM file, and when I color the tree by nucleotide at the two positions that separate 20H/501Y.V2 from 20C, both mutations are present.

I have verified that the clades.tsv file I'm using is up to date, and I'm struggling to think of what else I could be missing that's causing this error. Samples in 20I/501Y.V1 are all correctly identified, so I know the pipeline can at least correctly detect the S:N501Y mutation.

Any suggestion as to where I’m going wrong with this is greatly appreciated!

Thanks in advance,
Hannah

Edit: I have solved the problem, in what feels like a somewhat hacky way, by adding another sample that is in 20H/501Y.V2. This seems to help the algorithm recognize that our sample does belong to that variant.

However, I suspect there is another solution that would let me include only samples from my area and still get the correct clades. If anyone has an idea where I'm going wrong, I would appreciate hearing it.

Thanks,
Hannah

rneher · March 6, 2021, 10:22am

Hello Hannah,

The clade assignment in the ncov pipeline works by identifying signature mutations and then labeling the largest clade that carries all of them. This can go wrong when the tree doesn't contain sufficient background diversity, which might have been the case in your example.

best,
richard
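To illustrate the rule Richard describes, here is a toy Python sketch; it is not augur's actual implementation. It assumes ancestral alleles have already been inferred for internal nodes, uses hypothetical positions (100, 200, 300) rather than the real clades.tsv definitions, and labels the parent clade before the child clade so the child label overwrites the parent label on the subtree it claims. All names (Node, assign_clade, PARENT_SIG, CHILD_SIG and the samples) are made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    alleles: dict  # position -> base; internal nodes assumed already reconstructed
    children: list = field(default_factory=list)


def iter_nodes(node):
    """Yield a node and all of its descendants (pre-order)."""
    yield node
    for child in node.children:
        yield from iter_nodes(child)


def count_tips(node):
    """Number of leaves below (and including) this node."""
    if not node.children:
        return 1
    return sum(count_tips(child) for child in node.children)


def has_signature(node, signature):
    """True if the node carries every signature allele."""
    return all(node.alleles.get(pos) == base for pos, base in signature.items())


def assign_clade(root, clade, signature, labels):
    """Label the largest subtree whose root carries all signature alleles."""
    carriers = [n for n in iter_nodes(root) if has_signature(n, signature)]
    if not carriers:
        return None  # no node qualifies, so the clade is not assigned at all
    best = max(carriers, key=count_tips)
    for node in iter_nodes(best):
        labels[node.name] = clade
    return best.name


# Hypothetical signatures (NOT the real clades.tsv rows):
# the child clade adds two mutations on top of the parent's.
PARENT_SIG = {100: "T"}
CHILD_SIG = {100: "T", 200: "G", 300: "A"}

sample_a = Node("sample_A", {100: "T", 200: "G", 300: "A"})
sample_b = Node("sample_B", {100: "T", 200: "C", 300: "C"})
node_1 = Node("NODE_1", {100: "T", 200: "C", 300: "C"}, [sample_a, sample_b])
root = Node("root", {100: "C", 200: "C", 300: "C"}, [node_1])

labels = {}
assign_clade(root, "20C", PARENT_SIG, labels)  # parent first ...
assign_clade(root, "20H", CHILD_SIG, labels)   # ... then the child overwrites its subtree
print(labels)  # {'NODE_1': '20C', 'sample_A': '20H', 'sample_B': '20C'}
```

In this toy, sample_A is the only node carrying all three child-clade alleles, so it alone moves to 20H while its sibling stays in 20C; a single representative is enough as long as every signature site is called.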

trvrb · March 6, 2021, 11:50pm

I agree with Richard here. My best guess is that the single 20H/501Y.V2 sample had an N (an uncalled base) at one of the signature mutation sites listed in ncov/clades.tsv (nextstrain/ncov on GitHub, master branch), and that adding the second sample made it clear this clade carried all of the signature mutations.

I think clades are intended to be allowed to have just a single representative, but I'd have to confirm this to be sure.
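The scenario Trevor describes can be sketched with a similar toy model. As an assumption, internal nodes are filled in by a simple majority vote over their children that ignores N; this is a crude stand-in for the ancestral reconstruction the pipeline actually performs (augur ancestral), not a description of it. The positions, tree shapes and sample names are hypothetical, and the helpers (infer_internal, label_clade) are made up for this example.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    alleles: dict = field(default_factory=dict)  # position -> base ("N" = uncalled)
    children: list = field(default_factory=list)


def infer_internal(node, positions):
    """Toy fill-in: an internal node takes the most common non-N base of its children."""
    for child in node.children:
        infer_internal(child, positions)
    if node.children:
        for pos in positions:
            called = [c.alleles.get(pos, "N") for c in node.children]
            called = [b for b in called if b != "N"]
            node.alleles[pos] = Counter(called).most_common(1)[0][0] if called else "N"


def iter_nodes(node):
    yield node
    for child in node.children:
        yield from iter_nodes(child)


def count_tips(node):
    return 1 if not node.children else sum(count_tips(c) for c in node.children)


def label_clade(root, signature):
    """Return the tips under the largest node carrying every signature allele."""
    carriers = [n for n in iter_nodes(root)
                if all(n.alleles.get(p) == b for p, b in signature.items())]
    if not carriers:
        return []
    best = max(carriers, key=count_tips)
    return sorted(n.name for n in iter_nodes(best) if not n.children)


SIGNATURE = {200: "G", 300: "A"}  # hypothetical positions, not real clades.tsv rows


def background():
    return [Node("bg1", {200: "C", 300: "C"}), Node("bg2", {200: "C", 300: "C"})]


# Case 1: a single child-clade sample with an N at one signature site.
solo = Node("root", children=[Node("sample_A", {200: "G", 300: "N"}), *background()])
infer_internal(solo, SIGNATURE)
print(label_clade(solo, SIGNATURE))  # []

# Case 2: a second sample, called at that site, lets the shared ancestor carry the signature.
pair = Node("NODE_1", children=[Node("sample_A", {200: "G", 300: "N"}),
                                Node("sample_B", {200: "G", 300: "A"})])
duo = Node("root", children=[pair, *background()])
infer_internal(duo, SIGNATURE)
print(label_clade(duo, SIGNATURE))  # ['sample_A', 'sample_B']
```

In the first case no node carries both signature alleles, so the sample falls back to whatever parent clade it already had; in the second, the inferred ancestor NODE_1 carries both, and the N-masked sample is labeled through it.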
