What is Quadratic Weighted Kappa?

Quadratic Weighted Kappa (abbreviated QDK in this post) is a measure of agreement between a set of predictions and a set of multiclass labels. Rather than only counting exact matches, as accuracy does, it also accounts for how far apart the predicted class and the actual class are. That makes it very useful in a clinical AI context, where labels are often ordered grades and a near miss is less serious than a large one.

It is a generalization of Cohen's kappa, which is $\kappa \equiv \frac{p_{o}-p_{e}}{1-p_{e}}$, where $p_{o}$ is the observed agreement and $p_{e}$ is the agreement expected by chance.
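
To make the link to Cohen's kappa concrete, here is one way to write the weighted form. The symbols $O$ (normalized confusion matrix), $E$ (normalized expected matrix), $w$ (weights) and $N$ (number of classes) are notation introduced here for illustration:

$$\kappa_{w} = 1 - \frac{\sum_{i,j} w_{ij}\, O_{ij}}{\sum_{i,j} w_{ij}\, E_{ij}}, \qquad w_{ij} = \frac{(i-j)^{2}}{(N-1)^{2}}$$

If the quadratic weights are replaced by $w_{ij} = 0$ for $i = j$ and $w_{ij} = 1$ otherwise, the numerator becomes $1 - p_{o}$ and the denominator becomes $1 - p_{e}$, so

$$\kappa_{w} = 1 - \frac{1 - p_{o}}{1 - p_{e}} = \frac{p_{o} - p_{e}}{1 - p_{e}},$$

which is the unweighted kappa above.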

In addition to taking the distance between labels into account, the QDK also accounts for the possibility of inter-rater agreement occurring by chance, or random agreement.

The agreement by chance is calculated by taking the outer product of the histograms of the actual class labels and the predicted class labels, then normalizing the result so that its entries sum to 1.
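
As a small sketch of that step, the snippet below builds the expected matrix for a toy set of labels. The data and variable names are made up for illustration, and it assumes the classes are integers starting at 0.

```python
import numpy as np

n_classes = 3
actual = np.array([0, 0, 1, 2])  # toy ground-truth labels (illustrative)
preds = np.array([0, 1, 1, 2])   # toy predictions (illustrative)

# Histogram of each label set: how often each class occurs
actual_hist = np.bincount(actual, minlength=n_classes)  # [2, 1, 1]
pred_hist = np.bincount(preds, minlength=n_classes)     # [1, 2, 1]

# Outer product: entry (i, j) is proportional to the chance of seeing
# actual class i together with predicted class j if the two were independent
expected = np.outer(actual_hist, pred_hist).astype(float)
expected /= expected.sum()  # normalize so the entries sum to 1

# The diagonal holds the exact agreement expected by chance
chance_agreement = np.trace(expected)
print(expected)
print(chance_agreement)
```

The diagonal sum here is the $p_{e}$ term of the unweighted kappa. The weighted version uses every cell of the matrix.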

Here's a simple example. When ophthalmologists grade diabetic retinopathy, two graders who give the same image a 3 and a 4 clearly disagree less than two graders who give it a 0 and a 5. The former case would have a higher QDK, while the latter would have a lower score.
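
To put numbers on that intuition, this sketch computes the quadratic penalty for each pair of grades. It assumes six grades (0 through 5), and `quadratic_weight` is a helper name chosen here for illustration.

```python
def quadratic_weight(i, j, n_classes):
    """Quadratic disagreement weight between grades i and j, scaled to [0, 1]."""
    return (i - j) ** 2 / (n_classes - 1) ** 2


n_classes = 6  # assumption: grades run from 0 to 5

near = quadratic_weight(3, 4, n_classes)  # adjacent grades
far = quadratic_weight(0, 5, n_classes)   # opposite ends of the scale
print(near, far)
```

A weight of 0 means no penalty and 1 is the maximum penalty, so the 0-versus-5 disagreement costs 25 times as much as the 3-versus-4 one.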

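The QDK is calculated by first setting up a confusion matrix, then building a matrix of quadratic weights and a matrix of expected (chance) agreement, and finally comparing the weighted observed disagreement with the weighted expected disagreement.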

import numpy as np
from sklearn.metrics import confusion_matrix


def qdk(actual, preds, n_classes):
    '''
    Calculates the QDK metric for assessing agreement between a set of labels and predictions.
    :param actual: a list of the actual class scores (integers from 0 to n_classes - 1)
    :param preds: a list of the predicted class scores (integers from 0 to n_classes - 1)
    :param n_classes: the number of possible classes
    :return: qdk_score: a float with the value of the metric
    '''

    # make confusion matrix: conf[i][j] counts samples with actual class i and predicted class j
    conf = confusion_matrix(actual, preds, labels=np.arange(n_classes))

    # make weights - squared difference between actual and predicted score, scaled to [0, 1]
    weights = np.zeros((n_classes, n_classes))
    for i in range(n_classes):
        for j in range(n_classes):
            weights[i][j] = ((i - j) ** 2) / ((n_classes - 1) ** 2)

    # make expected matrix from the class histograms
    # (bin edges are centered on the integer class values so each class gets its own bin)
    bin_edges = np.arange(n_classes + 1) - 0.5
    actual_hist = np.histogram(actual, bins=bin_edges)
    pred_hist = np.histogram(preds, bins=bin_edges)

    expected = np.outer(actual_hist[0], pred_hist[0])

    # calculate QDK score
    expected = expected / expected.sum()  # normalize both matrices so their entries sum to 1
    conf = conf / conf.sum()
    qdk_score = 1 - (np.sum(np.multiply(weights, conf)) / np.sum(np.multiply(weights, expected)))

    return qdk_score
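
As a sanity check on a hand-written implementation, scikit-learn's `cohen_kappa_score` accepts `weights="quadratic"` and can serve as a reference. The sketch below uses made-up grades in which both sets of predictions get the same two images wrong, once by a single grade and once by the full width of the scale.

```python
from sklearn.metrics import accuracy_score, cohen_kappa_score

# Toy grades, for illustration only
actual = [0, 1, 2, 3, 4, 5]
near_miss = [0, 1, 2, 4, 3, 5]  # grades 3 and 4 swapped
far_miss = [5, 1, 2, 3, 4, 0]   # grades 0 and 5 swapped

# Accuracy cannot tell the two apart: both have 4 of 6 correct
near_acc = accuracy_score(actual, near_miss)
far_acc = accuracy_score(actual, far_miss)

# Quadratic weighted kappa penalizes the far misses much more heavily
near_kappa = cohen_kappa_score(actual, near_miss, weights="quadratic")
far_kappa = cohen_kappa_score(actual, far_miss, weights="quadratic")

print(near_acc, far_acc)
print(near_kappa, far_kappa)
```

Both prediction sets have the same accuracy, but the quadratic weighted kappa is about 0.94 for the near misses and about -0.43 for the far misses. A negative value means the agreement is worse than what chance alone would give.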

