I have been reviewing the results of my runs and noticed something curious: in the results folder, the classification of the clades that predate the variants (20A, 20B, 20C …) does not match the classification I see when I upload the JSON file.
| Clade | # result | # JSON |
|-------|----------|--------|
| 20A   | 20       | 80     |
| 20B   | 40       | 10     |
| 20C   | 40       | 10     |
I imagine it has something to do with the tree, but I don’t know the exact explanation. Could someone explain why this happens?
Hi Juan, I’m not quite sure I understand what you’re comparing. Can you give a few more details about which files (ideally with exact path) of which workflow (github repo) you’re looking at and with which data you’re running the workflow?
Hello, thanks for answering all my posts.
These are the files I use to compare the clade classification.
In addition, when I repeat the run with the same dataset, the clade classification differs between runs, while the file metadata_with_nextclade_qc.tsv stays the same in both.
Note that the first run has no 20B assignments at all, whereas in the second run 20B appears, and in large numbers.
As I mentioned, this does not happen with clades that are classified as variants, I imagine because their point mutations are better established. For the clades that predate the variants, though, I don’t know what is going on.
Thank you very much for your time, Cornelius.
Ok I think I get what you’re comparing now. My hypothesis is:
- Metadata clades come from a run of Nextclade, using Nextclade datasets. So every time you run your data through it, you’ll get the same clade results.
- The “Clade” coloring in the Auspice output/tree is constructed in your workflow based on clades.tsv and augur clades. The output of augur clades depends purely on the tree that you built from your data alone, which is why it may change between runs. If you have scarce data for a clade, augur clades might miss it because the branches that define it are not stable.
I hope that makes sense.
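To check this on your own data, one option is to count tips per clade in the Auspice JSON and compare against the clade column in the metadata. The following is only a minimal sketch under a few assumptions: the JSON is an Auspice v2 file whose tips carry the clade under `node_attrs.clade_membership.value` (the attribute that augur clades writes, as far as I know), and the metadata column is called `Nextstrain_clade`. The column name and the small in-memory example data are hypothetical, so adjust them to your files.

```python
import csv
import io
from collections import Counter


def tree_clade_counts(auspice_json, attr="clade_membership"):
    """Count tips per clade in a parsed Auspice v2 JSON (a dict)."""
    tree = auspice_json["tree"]
    # The "tree" entry may be one root node or a list of root nodes.
    stack = list(tree) if isinstance(tree, list) else [tree]
    counts = Counter()
    while stack:
        node = stack.pop()
        children = node.get("children", [])
        if children:
            stack.extend(children)  # internal node: keep descending
        else:
            # Tip: read the clade label, if any was assigned.
            label = node.get("node_attrs", {}).get(attr, {}).get("value")
            counts[label or "unassigned"] += 1
    return counts


def metadata_clade_counts(tsv_handle, column="Nextstrain_clade"):
    """Count rows per clade in a metadata TSV (column name is an assumption)."""
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    return Counter(row[column] or "unassigned" for row in reader)


def compare(tree_counts, meta_counts):
    """Return {clade: (tips in tree, rows in metadata)} where the two differ."""
    clades = sorted(set(tree_counts) | set(meta_counts))
    return {
        clade: (tree_counts[clade], meta_counts[clade])
        for clade in clades
        if tree_counts[clade] != meta_counts[clade]
    }


# Hypothetical toy data standing in for the real files.
example_tree = {
    "tree": {
        "name": "root",
        "children": [
            {"name": "s1", "node_attrs": {"clade_membership": {"value": "20A"}}},
            {
                "name": "NODE_1",
                "node_attrs": {"clade_membership": {"value": "20A"}},
                "children": [
                    {"name": "s2", "node_attrs": {"clade_membership": {"value": "20A"}}},
                    {"name": "s3", "node_attrs": {"clade_membership": {"value": "20C"}}},
                ],
            },
        ],
    }
}
example_tsv = "strain\tNextstrain_clade\ns1\t20A\ns2\t20B\ns3\t20C\n"

if __name__ == "__main__":
    tree_counts = tree_clade_counts(example_tree)
    meta_counts = metadata_clade_counts(io.StringIO(example_tsv))
    for clade, (in_tree, in_meta) in compare(tree_counts, meta_counts).items():
        print(f"{clade}: tree={in_tree} metadata={in_meta}")
```

For real files you would load the JSON with `json.load` and pass an open handle to the metadata TSV. Running it on the outputs of two runs should show whether only the tree-based counts move while the Nextclade-based counts stay fixed.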