Quadratic weighted kappa (QWK) is an index of agreement between a set of predictions and a set of multiclass labels. Beyond simply measuring how often predictions and labels match exactly, it accounts for how far apart the two classes are when they disagree, which makes it very useful in a clinical AI context where classes are often ordinal grades.
It is a generalization of Cohen's kappa, which measures inter-rater agreement while correcting for agreement that would occur by chance, but treats every disagreement as equally severe. In addition to taking the label distance into account, QWK keeps this correction for agreement occurring by chance (random agreement).
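For reference, unweighted Cohen's kappa compares the observed agreement p_o with the agreement expected by chance p_e (this is the standard definition, not something specific to the sources linked below):

```latex
% Unweighted Cohen's kappa: 1 means perfect agreement, 0 means chance-level agreement
\kappa = \frac{p_o - p_e}{1 - p_e}
```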
The agreement expected by chance is captured in an expected matrix E, computed as the outer product of the histograms of the actual class labels and predicted class labels, normalized so that its entries sum to the total number of samples.
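As a sketch of that step, writing n_i^actual and n_j^pred for the two histogram counts and n for the total number of samples (notation chosen here for illustration):

```latex
% Expected matrix: outer product of the two label histograms, scaled to sum to n
E_{ij} = \frac{n^{\mathrm{actual}}_{i} \, n^{\mathrm{pred}}_{j}}{n}
```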
Here's a simple example. When ophthalmologists grade diabetic retinopathy, a disagreement between grades 3 and 4 is far less serious than a disagreement between grades 0 and 5. A model whose errors look like the former case would get a higher QWK, while one whose errors look like the latter would get a lower score.
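To make that concrete, assuming six ordinal grades 0–5 as in the example above and the usual quadratic weight (i - j)^2 / (k - 1)^2 for k classes:

```latex
% Penalty weights for the two disagreements in the example (k = 6 classes)
w_{3,4} = \frac{(3-4)^2}{(6-1)^2} = \frac{1}{25} = 0.04, \qquad
w_{0,5} = \frac{(0-5)^2}{(6-1)^2} = \frac{25}{25} = 1.0
```

So the 0-versus-5 disagreement is penalized 25 times more heavily than the 3-versus-4 one.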
The QWK is calculated by first setting up the observed confusion matrix O of (actual, predicted) label pairs, then the quadratic weight matrix and the expected matrix E described above; the score compares the weighted observed disagreement to the weighted expected disagreement, as shown below.
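Concretely, for k classes the weights and final score take the standard form:

```latex
% Quadratic weights and the resulting kappa
w_{ij} = \frac{(i-j)^2}{(k-1)^2}, \qquad
\kappa = 1 - \frac{\sum_{i,j} w_{ij}\, O_{ij}}{\sum_{i,j} w_{ij}\, E_{ij}}
```

A minimal NumPy sketch of the whole computation, assuming integer labels 0..k-1 (the function name and the toy labels are just for illustration, not taken from the links below):

```python
import numpy as np

def quadratic_weighted_kappa(actual, predicted, num_classes):
    """Quadratic weighted kappa for integer labels in 0..num_classes-1."""
    actual = np.asarray(actual, dtype=int)
    predicted = np.asarray(predicted, dtype=int)

    # Observed confusion matrix O: counts of (actual, predicted) pairs.
    O = np.zeros((num_classes, num_classes), dtype=float)
    for a, p in zip(actual, predicted):
        O[a, p] += 1

    # Quadratic weight matrix: penalty grows with squared label distance.
    idx = np.arange(num_classes)
    W = (idx[:, None] - idx[None, :]) ** 2 / (num_classes - 1) ** 2

    # Expected matrix E: outer product of the two label histograms,
    # normalized to the same total count as O.
    hist_actual = np.bincount(actual, minlength=num_classes)
    hist_pred = np.bincount(predicted, minlength=num_classes)
    E = np.outer(hist_actual, hist_pred).astype(float)
    E *= O.sum() / E.sum()

    # Kappa: 1 minus weighted observed over weighted expected disagreement.
    return 1.0 - (W * O).sum() / (W * E).sum()

if __name__ == "__main__":
    y_true = [0, 1, 2, 3, 4, 5, 3, 2]
    y_pred = [0, 1, 2, 4, 4, 5, 2, 2]
    print(quadratic_weighted_kappa(y_true, y_pred, num_classes=6))
```

In practice, scikit-learn's `cohen_kappa_score(y_true, y_pred, weights="quadratic")` should give the same value, so the hand-rolled version is mainly useful for understanding the metric.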
Links
- https://www.youtube.com/watch?v=fOR_8gkU3UE
- http://kagglesolutions.com/r/evaluation-metrics--quadratic-weighted-kappa