Tuesday, April 2, 2013

The Area Under the ROC Curve

In a previous post I discussed predictive model diagnostics in the context of the pseudo R-squared statistic (as well as the percentage of correct predictions) and also demonstrated the construction of an ROC curve. However, I did not expand on an interpretation of the ROC curve. Before discussing the ROC curve further, let's construct what is referred to as a confusion table, which visualizes the potential outcomes in a binary prediction scenario:

                 Observed = 1            Observed = 0
Predicted = 1    TP (true positive)      FP (false positive)
Predicted = 0    FN (false negative)     TN (true negative)

Some Definitions:

TP = true positive, FP = false positive, FN = false negative, TN = true negative

% of Correct Predictions: For Y ∈ {0,1}, the percentage of total correct predictions (again see here for more details), or (TP + TN)/(TP + FP + TN + FN)

Precision: Percentage of predicted 1's that are actually 1's, or TP/(TP + FP)

Recall: Percentage of total observed or true 1's correctly classified, or TP/(TP + FN); also the true positive rate

False Positive Rate: FP/(FP + TN)

F1-Score: The harmonic mean of precision and recall, or (2*Precision*Recall)/(Precision + Recall)

True Positive Rate: TP/(TP + FN) = Recall

Sensitivity: = Recall

Specificity: Percentage of total observed or true 0's correctly classified, or TN/(FP + TN), which equals 1 - false positive rate
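
To make these definitions concrete, here is a minimal Python sketch (the counts TP, FP, FN, and TN are hypothetical values chosen purely for illustration, not numbers from any model in this post) that computes each metric from the four cells of the confusion table:

# Hypothetical confusion-table counts, for illustration only
TP, FP, FN, TN = 40, 10, 5, 45

accuracy = (TP + TN) / (TP + FP + TN + FN)                 # % of correct predictions
precision = TP / (TP + FP)                                 # share of predicted 1's that are truly 1
recall = TP / (TP + FN)                                    # true positive rate / sensitivity
false_positive_rate = FP / (FP + TN)
specificity = TN / (FP + TN)                               # equals 1 - false_positive_rate
f1_score = 2 * precision * recall / (precision + recall)   # harmonic mean of precision and recall

print(accuracy, precision, recall, false_positive_rate, specificity, f1_score)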

All of the metrics above are based on classifying predictions according to a cutoff: if the predicted probability exceeds some threshold 'c', we assign that observation a class value of 1; otherwise the observation is assigned a value of 0. Each of these metrics therefore depends on a single chosen cutoff (one could examine multiple cutoffs and search for an optimal value of c).
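
As a quick sketch of how a cutoff works in practice, the Python snippet below (the predicted probabilities p_hat and outcomes y are made-up values used only for illustration) classifies each observation as 1 when its predicted probability exceeds c and tallies the confusion-table counts:

# Made-up predicted probabilities and observed 0/1 outcomes, for illustration only
p_hat = [0.92, 0.75, 0.60, 0.48, 0.35, 0.20, 0.81, 0.10]
y     = [1,    1,    0,    1,    0,    0,    1,    0]

c = 0.5  # chosen cutoff

# classify: 1 if the predicted probability exceeds the cutoff, else 0
y_pred = [1 if p > c else 0 for p in p_hat]

TP = sum(1 for yp, yo in zip(y_pred, y) if yp == 1 and yo == 1)
FP = sum(1 for yp, yo in zip(y_pred, y) if yp == 1 and yo == 0)
FN = sum(1 for yp, yo in zip(y_pred, y) if yp == 0 and yo == 1)
TN = sum(1 for yp, yo in zip(y_pred, y) if yp == 0 and yo == 0)

print(TP, FP, FN, TN)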

As explained in a previous post, the ROC curve is constructed by examining all possible cutoffs. The ROC curve visualizes the tradeoff between the true positive rate and the false positive rate, or sensitivity vs. 1 - specificity. In particular, we are usually interested in the area under the ROC curve (AUROC or c-statistic). The ROC curve is a measure of a model's discriminatory power. The area under the ROC curve can be interpreted as the probability that a classifier will correctly rank a randomly chosen training example with a positive outcome higher than a randomly chosen example with a negative outcome (Cook, 2007).
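
The sketch below illustrates both views of the statistic, again with made-up scores and outcomes: it traces the ROC curve by sweeping the cutoff across all predicted probabilities, computes the area under the curve with the trapezoid rule, and compares that with the direct ranking interpretation (the proportion of positive/negative pairs in which the positive example receives the higher score):

# Made-up predicted probabilities and observed 0/1 outcomes, for illustration only
p_hat = [0.92, 0.75, 0.60, 0.48, 0.35, 0.20, 0.81, 0.10]
y     = [1,    1,    0,    1,    0,    0,    1,    0]

pos = [p for p, yo in zip(p_hat, y) if yo == 1]  # scores of observed 1's
neg = [p for p, yo in zip(p_hat, y) if yo == 0]  # scores of observed 0's

# Trace the ROC curve by sweeping the cutoff across all observed scores (plus a sentinel)
points = []
for c in sorted(set(p_hat)) + [float("inf")]:
    tpr = sum(p >= c for p in pos) / len(pos)    # sensitivity at this cutoff
    fpr = sum(p >= c for p in neg) / len(neg)    # 1 - specificity at this cutoff
    points.append((fpr, tpr))
points.sort()

# Area under the ROC curve via the trapezoid rule
auc_trapezoid = sum((x2 - x1) * (y1 + y2) / 2
                    for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Ranking interpretation: P(score of a random positive > score of a random negative),
# counting ties as 1/2
pairs = [(pp, pn) for pp in pos for pn in neg]
auc_rank = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
               for pp, pn in pairs) / len(pairs)

print(auc_trapezoid, auc_rank)  # the two values agree

Both calculations return the same value, which is the sense in which the area under the curve summarizes ranking performance across all cutoffs rather than at any single one.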

This measure is used increasingly in the machine learning community and is preferred over single-cutoff measures of fit like precision or the F1-Score because it evaluates model performance across all considered cutoff values rather than at an arbitrarily chosen cutoff (Bradley, 1997). It also assigns low scores to random or one-class-only classifiers (Bradley, 1997).

References: 
Bradley, A. P. (1997). The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition, 30(7), 1145-1159.

Provost, F. J., Fawcett, T., & Kohavi, R. (1998). The case against accuracy estimation for comparing induction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning (ICML '98), pp. 445-453. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.

Cook, N. R. (2007). Use and misuse of the receiver operating characteristic curve in risk prediction. Circulation, 115, 928-935.

Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27, 861-874.
