I have a data frame called b. I split it into a training set and a test set:
smp_size <- floor(0.75 * nrow(b))
set.seed(123)
train_ind <- sample(seq_len(nrow(b)), size = smp_size)
b_train <- b[train_ind, ]
b_test <- b[-train_ind, ]
b contains a column, say x, that I use as a factor() with many different categories.
I use b_train to fit a linear model with lm(). After that I call predict() with the lm() object and b_test. The problem is that b_train$x does not include all of the categories in b$x. Therefore predict() cannot be used, since b_test$x contains categories that are not in b_train$x.
How can I make sure that all categories are included in b_train$x?
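For illustration, the failure can be reproduced with a small sketch. The toy data and the fixed split indices here are made up (they are not from the original data frame) and are chosen so that the level "d" ends up only in the test set:

```r
# Hypothetical toy data: level "d" occurs only once
b <- data.frame(
  x = factor(c("a", "a", "a", "b", "b", "b", "c", "c", "c", "d")),
  y = c(1.0, 1.2, 0.9, 2.1, 1.9, 2.0, 3.2, 2.8, 3.0, 4.1)
)

# Fixed (non-random) split, picked so that "d" lands only in the test set
train_ind <- c(1, 2, 4, 5, 7, 8)
b_train <- b[train_ind, ]
b_test  <- b[-train_ind, ]

fit <- lm(y ~ x, data = b_train)

# predict() stops with an error because "d" was never seen during fitting;
# tryCatch() captures the error object so the script keeps running
res <- tryCatch(predict(fit, newdata = b_test), error = function(e) e)
print(conditionMessage(res))
```

The error arises because lm() drops unused factor levels when fitting, so the model has no coefficient for "d".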
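This can be done easily with the createDataPartition() function from the caret package, which samples within each level of the factor, so each category present in b$x should end up in the training set.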
library(caret)
samp <- createDataPartition(as.factor(b$x), p = 0.75, list = FALSE)
train <- b[samp, ]
test <- b[-samp, ]
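If caret is not available, the same idea can be sketched in base R by sampling row indices within each level of x. This is a hand-rolled alternative under stated assumptions, not what createDataPartition() does internally: the toy data frame is made up, and the rule of taking at least one row per level is a choice made for this example.

```r
set.seed(123)

# Hypothetical toy data with unbalanced levels, including a singleton "d"
b <- data.frame(
  x = factor(c(rep("a", 8), rep("b", 6), rep("c", 2), "d")),
  y = rnorm(17)
)

# Row indices grouped by level (droplevels() avoids empty groups)
idx_by_level <- split(seq_len(nrow(b)), droplevels(b$x))

# Take about 75% of each level, but always at least one row.
# sample.int() on positions avoids the sample() pitfall with length-1 vectors.
train_ind <- unlist(lapply(idx_by_level, function(idx) {
  n_take <- max(1, floor(0.75 * length(idx)))
  idx[sample.int(length(idx), n_take)]
}), use.names = FALSE)

b_train <- b[train_ind, ]
b_test  <- b[-train_ind, ]

# Levels that occur in the test set but not in the training set
missing_levels <- setdiff(unique(as.character(b_test$x)),
                          unique(as.character(b_train$x)))

fit  <- lm(y ~ x, data = b_train)
pred <- predict(fit, newdata = b_test)
```

An empty missing_levels means predict() will not hit unseen categories. A level with a single observation ends up in the training set only, so it is never evaluated on the test set.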
Answered By – Not_Dave