Balanced Data Partition
library(caret)
set.seed(3456)
trainIndex <- createDataPartition(iris$Species, p = .8, list = FALSE, times = 1)
head(trainIndex)
p: the proportion of the data that goes to training (here .8, i.e. 80%)
times: the number of partitions to create
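As a minimal sketch of how the returned indices are typically used, the snippet below splits iris into training and test sets. The names irisTrain and irisTest are illustrative choices, not part of caret.

```r
library(caret)

set.seed(3456)
trainIndex <- createDataPartition(iris$Species, p = .8, list = FALSE, times = 1)

# With list = FALSE, trainIndex is a one-column matrix of row numbers.
# irisTrain / irisTest are hypothetical names chosen for this example.
irisTrain <- iris[trainIndex, ]
irisTest  <- iris[-trainIndex, ]

# The split is stratified on Species, so the class counts
# should be balanced within each subset.
table(irisTrain$Species)
table(irisTest$Species)
```

Because the sampling is done within each level of Species, neither subset should end up dominated by a single class.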
Similarly, createResample can be used to make simple bootstrap samples, and createFolds can be used to generate balanced cross-validation groupings from a set of data.
createFolds(y, k = 10, list = TRUE, returnTrain = FALSE)
createMultiFolds(y, k = 10, times = 5)
createTimeSlices(y, initialWindow, horizon = 1, fixedWindow = TRUE,
skip = 0)
groupKFold(group, k = length(unique(group)))
createResample(y, times = 10, list = TRUE)
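To make two of the signatures above concrete, here is a short sketch that calls createFolds and createResample on iris. The variable names (folds, heldOut, boots) and the choice of times = 3 are assumptions made for illustration only.

```r
library(caret)

set.seed(3456)

# 10 cross-validation folds; with returnTrain = FALSE each list element
# holds the row indices that are held out for that fold.
folds <- createFolds(iris$Species, k = 10, list = TRUE, returnTrain = FALSE)
sapply(folds, length)

# Collect all held-out indices; together they should cover every row once.
heldOut <- sort(unlist(folds, use.names = FALSE))

# 3 bootstrap samples; each element is a set of row indices drawn
# with replacement, so duplicates are expected.
boots <- createResample(iris$Species, times = 3, list = TRUE)
sapply(boots, length)
```

Setting returnTrain = TRUE in createFolds would instead return the rows used for fitting in each fold, which is the form expected by the index argument of trainControl.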
Data Splitting for Time Series
Simple random sampling is probably not the best way to resample time series data. Hyndman and Athanasopoulos (2013) discuss rolling forecasting origin techniques that move the training and test sets in time. caret contains a function called createTimeSlices that can create the indices for this type of splitting.
The three parameters for this type of splitting are:
initialWindow: the initial number of consecutive values in each training set sample
horizon: the number of consecutive values in each test set sample
fixedWindow: a logical; if FALSE, the training set always starts at the first sample and the training set size varies over data splits
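The effect of these parameters is easiest to see on a toy series. The sketch below uses a hypothetical series of 20 points with initialWindow = 5 and horizon = 2; the values and the names y, slices and growing are assumptions for illustration.

```r
library(caret)

# Hypothetical series of 20 observations; only its length matters here.
y <- 1:20

# Fixed window: every training slice has exactly 5 consecutive points,
# followed by a 2-point test slice.
slices <- createTimeSlices(y, initialWindow = 5, horizon = 2, fixedWindow = TRUE)
slices$train[[1]]   # rows 1 to 5
slices$test[[1]]    # rows 6 to 7
slices$train[[2]]   # rows 2 to 6 (window slides forward by one)

# Growing window: training always starts at row 1 and gets longer.
growing <- createTimeSlices(y, initialWindow = 5, horizon = 2, fixedWindow = FALSE)
growing$train[[2]]  # rows 1 to 6
```

The function returns a list with train and test elements, each holding one index vector per split. The skip argument from the signature can be raised above 0 to thin out the number of splits.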
