ConfusionMatrix
confusionMatrix(actual, predicted, cutoff = 0.5)
Accuracy
Accuracy is the percentage of correctly classified instances
out of all instances.
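As a minimal sketch of that definition, accuracy can be computed in base R as the share of predictions that match the true labels. The label vectors here are hypothetical, made up purely for illustration.

```r
# Hypothetical true and predicted labels (illustration only)
actual    <- c("pos", "pos", "neg", "neg", "neg", "pos", "neg", "neg")
predicted <- c("pos", "neg", "neg", "neg", "pos", "pos", "neg", "neg")

# Proportion of predictions that equal the true label
accuracy <- mean(actual == predicted)
print(accuracy)   # 6 of 8 correct
```

Multiply by 100 to express the result as a percentage.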
Kappa
Kappa, or Cohen's Kappa, is like classification accuracy, except that it is normalized against
the baseline of random chance on your dataset. It is a more useful measure on problems
with imbalanced classes (e.g. with a 70% to 30% split between classes 0 and 1, you can
achieve 70% accuracy by predicting that every instance belongs to class 0).
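The 70/30 example can be worked through in base R. This is a sketch using the standard Cohen's Kappa formula, (observed - expected) / (1 - expected), on simulated labels; the variable names are illustrative and not part of any package.

```r
# 70/30 class split, with a model that always predicts class 0
actual    <- factor(c(rep(0, 70), rep(1, 30)), levels = c(0, 1))
predicted <- factor(rep(0, 100), levels = c(0, 1))

tab <- table(predicted, actual)
n   <- sum(tab)

# Observed agreement is plain accuracy
observed <- sum(diag(tab)) / n
# Agreement expected by chance, from the row and column totals
expected <- sum(rowSums(tab) * colSums(tab)) / n^2

cohen_kappa <- (observed - expected) / (1 - expected)
print(c(accuracy = observed, kappa = cohen_kappa))
```

Accuracy comes out at 0.7, while Kappa is 0: the model does no better than chance agreement given the class balance.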
Sensitivity
Sensitivity is the true positive rate, also called recall. It is the proportion of instances
from the positive (first) class that were predicted correctly.
Specificity
Specificity is also called the true negative rate. It is the proportion of instances from the
negative (second) class that were predicted correctly.
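Both rates can be read off a 2x2 table of predictions against true labels. The sketch below uses base R and hypothetical labels, with "pos" set as the first factor level to match the convention above.

```r
# Hypothetical labels; "pos" is the first (positive) class
lv        <- c("pos", "neg")
actual    <- factor(c("pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg", "neg"), levels = lv)
predicted <- factor(c("pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg", "pos", "pos"), levels = lv)

# Rows are predictions, columns are true labels
tab <- table(predicted, actual)
tp <- tab["pos", "pos"]   # positives predicted as positive
fn <- tab["neg", "pos"]   # positives predicted as negative
tn <- tab["neg", "neg"]   # negatives predicted as negative
fp <- tab["pos", "neg"]   # negatives predicted as positive

sensitivity <- tp / (tp + fn)   # true positive rate
specificity <- tn / (tn + fp)   # true negative rate
print(c(sensitivity = sensitivity, specificity = specificity))
```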
RMSE
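RMSE, or Root Mean Squared Error, is the square root of the mean squared difference between
the predictions and the observations. It can be read as a typical size of the prediction
error, in the same units as the output variable.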
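A minimal base R sketch of the calculation, using made-up observations and predictions:

```r
# Hypothetical observed and predicted values (illustration only)
obs  <- c(3.0, 5.0, 2.5, 7.0)
pred <- c(2.5, 5.0, 4.0, 8.0)

# Square the errors, average them, then take the square root
rmse <- sqrt(mean((pred - obs)^2))
# Mean absolute error, shown for comparison
mae  <- mean(abs(pred - obs))
print(c(RMSE = rmse, MAE = mae))
```

Because the errors are squared before averaging, large errors weigh more heavily, so RMSE is never smaller than the mean absolute error.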
R Squared
R², spoken as R Squared and also called the coefficient of determination, provides a
goodness-of-fit measure for the predictions to the observations.
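Two common ways of computing R Squared are sketched below in base R on made-up data: one minus the ratio of residual to total sum of squares, and the squared correlation between observations and predictions. Which definition a given tool reports is worth checking in its documentation, since the two can differ when the predictions do not come from a least-squares fit.

```r
# Hypothetical observed and predicted values (illustration only)
obs  <- c(3.0, 5.0, 2.5, 7.0)
pred <- c(2.5, 5.0, 4.0, 8.0)

# Definition 1: 1 - residual sum of squares / total sum of squares
ss_res <- sum((obs - pred)^2)
ss_tot <- sum((obs - mean(obs))^2)
r2_traditional <- 1 - ss_res / ss_tot

# Definition 2: squared Pearson correlation
r2_corr <- cor(obs, pred)^2

print(c(traditional = r2_traditional, correlation = r2_corr))
```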
ROC
ROC metrics are only suitable for binary classification problems (i.e. two classes).
To calculate ROC information, you must change the summaryFunction in your trainControl to be
twoClassSummary. This will calculate the Area Under the ROC Curve (AUROC), also called just
Area Under the Curve (AUC), along with sensitivity and specificity.
# load packages
library(caret)
library(mlbench)
# load the dataset
data(PimaIndiansDiabetes)
# prepare resampling method
trainControl <- trainControl(method="cv", number=5, classProbs=TRUE,
    summaryFunction=twoClassSummary)
set.seed(7)
fit <- train(diabetes~., data=PimaIndiansDiabetes, method="glm", metric="ROC",
    trControl=trainControl)
# display results
print(fit)
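To give some intuition for what the AUC value measures, the sketch below computes it by hand in base R with the rank-sum (Mann-Whitney) formulation: the fraction of positive/negative pairs in which the positive instance gets the higher score. The labels and scores are made up, and this is not how caret computes the value internally.

```r
# Hypothetical labels (1 = positive) and predicted probabilities
actual <- c(1, 1, 1, 0, 0, 0, 0)
score  <- c(0.9, 0.8, 0.35, 0.7, 0.3, 0.2, 0.1)

n_pos <- sum(actual == 1)
n_neg <- sum(actual == 0)

# Rank all scores in ascending order (ties get average ranks)
r <- rank(score)

# Rank-sum formula for the area under the ROC curve
auc <- (sum(r[actual == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
print(auc)   # 11 of the 12 positive/negative pairs are ordered correctly
```

An AUC of 1 means every positive is scored above every negative, while 0.5 corresponds to random ordering.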
LogLoss
Logarithmic Loss (or LogLoss) is used to evaluate binary classification, but it is more common
for multi-class classification algorithms. Specifically, it evaluates the probabilities estimated by
the algorithms.
# prepare resampling method
trainControl <- trainControl(method="cv", number=5, classProbs=TRUE,
    summaryFunction=mnLogLoss)
set.seed(7)
fit <- train(Species~., data=iris, method="rpart", metric="logLoss",
    trControl=trainControl)
# display results
print(fit)
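To show what the metric evaluates, here is a hand calculation of multi-class LogLoss in base R: the negative mean log of the probability assigned to the true class. The class labels and probability matrix are hypothetical, and the clipping constant eps is an assumption added to avoid log(0).

```r
# Hypothetical true classes and predicted class probabilities (rows sum to 1)
actual <- factor(c("a", "b", "c", "a"), levels = c("a", "b", "c"))
probs  <- matrix(c(0.7, 0.2, 0.1,
                   0.1, 0.8, 0.1,
                   0.2, 0.3, 0.5,
                   0.4, 0.4, 0.2),
                 ncol = 3, byrow = TRUE,
                 dimnames = list(NULL, levels(actual)))

# Pick out the probability each row assigns to its true class
p_true <- probs[cbind(seq_along(actual), as.integer(actual))]

# Clip away from 0 and 1 so the log stays finite (eps is an assumed value)
eps    <- 1e-15
p_true <- pmin(pmax(p_true, eps), 1 - eps)

logloss <- -mean(log(p_true))
print(logloss)
```

Lower values are better: confident correct predictions contribute little, while confident wrong predictions are penalized heavily.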
