BiGYaN's Random Thoughts: July 2012

Sunday, July 22, 2012

ROC Curve

My recent work on text classification introduced me to the ROC (Receiver Operating Characteristic) curve, which is a very effective way to compare classifiers against each other and to decide on the cutoff value between classes.

It's best explained with a simple example. Say you have a binary classification problem and have generated three train-test datasets from the original data. Say you are using a Support Vector Machine (SVM) as the classifier. After training the SVM on all three sets, you want to select the "best" of the three resulting classifiers. How do you go about it?

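Here's where ROC comes to the rescue. Sort the test points by the SVM's output score, from highest to lowest. Walking down the sorted list, compute for each data point the true positive rate (the fraction of all positives seen so far) and the false positive rate (the fraction of all negatives seen so far), and plot these on an x-y graph with the false positive rate on the x-axis and the true positive rate on the y-axis. The area under the curve (AUC) gives you a measure of the effectiveness of each classifier: 1.0 is a perfect ranking, and 0.5 is no better than random guessing. Moreover, you can use the shape of the graph to choose the score cutoff that separates the positive class from the negative class.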
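To make the procedure concrete, here is a minimal sketch in plain Python. It is only an illustration, not the code I used: the function names (roc_points, auc, best_cutoff) and the toy scores are made up, labels are assumed to be 1 for positive and 0 for negative, and the cutoff rule (pick the point that maximizes true positive rate minus false positive rate, often called Youden's J) is just one common heuristic among several.

```python
def roc_points(scores, labels):
    """Return ROC points as (fpr, tpr, threshold), sweeping from high to low score.

    Assumes labels are 1 (positive) or 0 (negative).
    """
    total_pos = sum(1 for y in labels if y == 1)
    total_neg = len(labels) - total_pos
    if total_pos == 0 or total_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    # Sort by classifier score, most confidently positive first.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    points = [(0.0, 0.0, float("inf"))]  # nothing is predicted positive yet
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item of a group of tied scores.
        is_last_of_group = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if is_last_of_group:
            points.append((fp / total_neg, tp / total_pos, score))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0, _), (x1, y1, _) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def best_cutoff(points):
    """Pick the threshold maximizing tpr - fpr (one heuristic, not the only one)."""
    fpr, tpr, threshold = max(points, key=lambda p: p[1] - p[0])
    return threshold


if __name__ == "__main__":
    # Hypothetical classifier scores and true labels for eight test points.
    scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3]
    labels = [1, 1, 1, 0, 0, 1, 0, 0]
    pts = roc_points(scores, labels)
    print("AUC:", auc(pts))
    print("cutoff:", best_cutoff(pts))
```

Running this once per classifier on its own test set and comparing the AUC values is one way to pick the "best" of the three. Points scoring at or above the returned cutoff would be labelled positive.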

Here are a few links to learn more about the ROC curve:

Introduction to ROC, "A Framework for Evaluating Predictive Capability of Classifiers Using ROC Approach" by Artur Dubrawski
Tutorial on "The Many Faces of ROC Analysis in Machine Learning" by Peter A. Flach, delivered at ICML 2004. Slides: part 1, part 2, part 3
Wikipedia