Monday 29 February 2016

Calculation of ROC using R script

Use the following script to calculate the ROC curve and plot the true positive rate against the false positive rate for a random forest trained on your input training data.

data <- read.csv(file = "INPUT training file", sep = ",") # set sep to "," or "\t" or " " to match your file
pred <- data[, 1:20] # Columns holding the features used to describe each sequence (20 here)
fac <- as.factor(data$Factor) # Class tag for each sequence, either positive or negative; must be a factor for classification
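
The script expects one row per sequence, with the feature columns first and a label column named Factor. As a minimal sketch of that layout, the following builds a small stand-in data frame and checks it before it would be passed to randomForest. The column names f1..f20, the Pos/Neg tags, and the random values are made up for illustration and are not taken from any real training file.

```r
# Toy stand-in for the training file; names f1..f20 and all values are hypothetical.
set.seed(1)
n <- 10
toy <- as.data.frame(matrix(runif(n * 20), nrow = n))
names(toy) <- paste0("f", 1:20)
toy$Factor <- rep(c("Pos", "Neg"), length.out = n)

toy.pred <- toy[, 1:20]            # feature columns
toy.fac  <- as.factor(toy$Factor)  # class labels as a factor

# Basic sanity checks before training
stopifnot(ncol(toy.pred) == 20)    # expected number of features
stopifnot(nlevels(toy.fac) == 2)   # exactly two classes
stopifnot(!anyNA(toy.pred))        # no missing feature values

levels(toy.fac)  # factor levels are sorted alphabetically
```

The level order matters later on: the columns of rf$votes follow the factor levels, so it is worth checking which level is the positive class before picking a column.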

library (randomForest)
library (ROCR)
library (pROC)

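# randomForest() reports the out-of-bag (OOB) error itself; cv.fold is an argument of rfcv(), not of randomForest()
rf <- randomForest(pred, fac, mtry = 4, ntree = 500, do.trace = 100, na.action = na.fail, importance = TRUE) # use mtry and ntree optimized parameters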

OOB.votes <- rf$votes

OOB.pred <- OOB.votes[, "Pos"] # OOB vote fraction for the positive class; here Pos is the tag for Positive data

print("Area under the curve")
auc(fac, OOB.pred) # pROC: auc(response, predictor)

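pred.obj <- prediction(OOB.pred, fac) # ROCR object pairing the OOB scores with the true labels

RP.perf <- performance(pred.obj, "rec", "prec") # recall (y-axis) against precision (x-axis)
plot(RP.perf)

ROC.perf <- performance(pred.obj, "tpr", "fpr") # true positive rate (y-axis) against false positive rate (x-axis)
plot(ROC.perf)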
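
To make clear what the ROC plot and the AUC value represent, here is a minimal base R sketch that computes the same quantities by hand on six made-up scores. The helpers roc.points and trapezoid.auc are hypothetical names written for this illustration and are not part of ROCR or pROC. The sketch steps through one observation at a time, so it assumes there are no tied scores.

```r
# Hand-computed ROC points and AUC; helper names and data are illustrative only.
roc.points <- function(scores, labels, positive) {
  ord <- order(scores, decreasing = TRUE)        # highest score first
  is.pos <- labels[ord] == positive
  tpr <- c(0, cumsum(is.pos) / sum(is.pos))      # true positive rate at each cutoff
  fpr <- c(0, cumsum(!is.pos) / sum(!is.pos))    # false positive rate at each cutoff
  data.frame(fpr = fpr, tpr = tpr)
}

trapezoid.auc <- function(roc) {
  # Area under the piecewise-linear curve through the ROC points
  sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)
}

scores <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.2)              # made-up vote fractions
labels <- c("Pos", "Pos", "Neg", "Pos", "Neg", "Neg")  # made-up true classes

toy.roc <- roc.points(scores, labels, positive = "Pos")
toy.auc <- trapezoid.auc(toy.roc)
print(toy.auc)  # 8/9: 8 of the 9 Pos/Neg pairs are ranked correctly
```

On real OOB votes, where ties are common, the package functions used in the script above are the safer choice.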


For more details please visit
https://cran.r-project.org/web/packages/randomForest/randomForest.pdf
https://cran.r-project.org/web/packages/ROCR/index.html
https://cran.r-project.org/web/packages/pROC/index.html