trainControl:
Used to configure how a model is evaluated, i.e. how the training data is resampled.
Available methods include boot, cv, repeatedcv, LOOCV, etc.
By default, simple bootstrap resampling is used for line 3 in the algorithm above. Others are available, such as repeated K-fold cross-validation, leave-one-out, etc. The function trainControl can be used to specify the type of resampling:
bootstrap: rows are sampled at random from the dataset with replacement; number sets how many bootstrap resamples are drawn.
trainControl <- trainControl(method="boot", number=100)
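To make "with replacement" concrete, here is a minimal base-R sketch of a single bootstrap resample. It is an illustration only, not caret's internal code; the names boot_idx and oob_idx are made up for this example, and it assumes the model is scored on the rows that were never drawn (the out-of-bag rows).

```r
# One bootstrap resample in base R (illustrative sketch, not caret internals).
set.seed(1)                     # arbitrary seed, only for reproducibility
n <- nrow(iris)                 # iris ships with R (150 rows)

# Draw n row indices with replacement: some rows repeat, some are never drawn.
boot_idx <- sample(n, size = n, replace = TRUE)

# Rows that were never drawn form the out-of-bag set (assumed test set here).
oob_idx <- setdiff(seq_len(n), boot_idx)

train_set <- iris[boot_idx, ]   # same size as the original data
test_set  <- iris[oob_idx, ]    # the held-out rows

nrow(train_set)
nrow(test_set)
```

With number=100, this draw would be repeated 100 times and the results averaged.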
cross-validation: k-fold cross-validation splits the dataset into k subsets (folds). Each fold is
held out in turn while the model is trained on the remaining k-1 folds. This is repeated until every
instance has been used for testing exactly once, and the per-fold results are combined into an overall accuracy estimate.
trainControl <- trainControl(method="cv", number=10)
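The fold splitting described above can be sketched in base R as follows. This is a simplified illustration under the assumption of purely random fold assignment; caret builds its own folds internally and may balance the outcome classes across them. The names k and fold_id are hypothetical.

```r
# k-fold assignment in base R (simplified sketch, not caret internals).
set.seed(1)                     # arbitrary seed, only for reproducibility
n <- nrow(iris)                 # 150 rows
k <- 10

# Give every row a fold label 1..k, then shuffle the labels.
fold_id <- sample(rep(seq_len(k), length.out = n))

for (i in seq_len(k)) {
  test_idx  <- which(fold_id == i)   # the held-out fold
  train_idx <- which(fold_id != i)   # the other k-1 folds
  # A model would be fit on iris[train_idx, ] and scored on iris[test_idx, ].
}

table(fold_id)                  # number of rows in each fold
```

Every row appears in exactly one test fold, so each instance is predicted once.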
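repeatedcv: k-fold cross-validation repeated several times with different random splits; number=10, repeats=3 gives 30 resamples in total.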
trainControl <- trainControl(method="repeatedcv", number=10, repeats=3)
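LOOCV: leave-one-out cross-validation; each observation is held out once while the model is trained on all the others.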
trainControl <- trainControl(method="LOOCV")
library(caret)
# load the iris dataset
data(iris)
# define training control
trainControl <- trainControl(method="boot", number=100)
# evaluate the model (method="nb" is naive Bayes and needs the klaR package installed)
fit <- train(Species~., data=iris, trControl=trainControl, method="nb")
# display the results
print(fit)
train:
set.seed(825)
gbmFit1 <- train(Class ~ ., data = training,
                 method = "gbm",
                 trControl = fitControl,
                 ## This last option is actually one
                 ## for gbm() that passes through
                 verbose = FALSE)
gbmFit1
Tuning:
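Random search: set search = "random" in trainControl and give the number of random parameter combinations to try via tuneLength in train.
trainControl(search = "random")
train(tuneLength = number)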
# Random Search
# (seed, x, dataset and metric are assumed to be defined earlier)
trainControl <- trainControl(method="repeatedcv", number=10, repeats=3, search="random")
set.seed(seed)
mtry <- sqrt(ncol(x))  # not passed to train() below
rfRandom <- train(Class~., data=dataset, method="rf", metric=metric, tuneLength=15, trControl=trainControl)
print(rfRandom)
plot(rfRandom)
Grid search: pass an explicit grid of candidate parameter values to train via tuneGrid.
gbmGrid <- expand.grid(interaction.depth = c(1, 5, 9),
                       n.trees = (1:30)*50,
                       shrinkage = 0.1,
                       n.minobsinnode = 20)
nrow(gbmGrid)
set.seed(825)
gbmFit2 <- train(Class ~ ., data = training,
                 method = "gbm",
                 trControl = fitControl,
                 verbose = FALSE,
                 ## Now specify the exact models
                 ## to evaluate:
                 tuneGrid = gbmGrid)
gbmFit2
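To see what a tuning grid actually contains, the sketch below builds a much smaller grid with base R's expand.grid. The name smallGrid and its values are hypothetical, chosen only to keep the output short; the column names mirror the gbm grid above.

```r
# expand.grid returns one row per combination of the supplied values.
smallGrid <- expand.grid(interaction.depth = c(1, 3),
                         n.trees = c(50, 100, 150),
                         shrinkage = 0.1,
                         n.minobsinnode = 20)

nrow(smallGrid)   # 2 * 3 * 1 * 1 = 6 candidate models
smallGrid         # each row is one parameter setting to evaluate
```

By the same arithmetic, the gbmGrid above has 3 * 30 * 1 * 1 = 90 rows, so train fits 90 candidate settings per resample.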
