Preprocessing (center and scale) only specific variables (numeric variables)

Issue

I have a dataframe that consists of numerical and non-numerical variables. I am trying to fit a logistic regression model predicting my variable "risk" based on all other variables, optimizing AUC using 6-fold cross-validation.
However, I want to center and scale only the numerical explanatory variables. My code raises no errors or warnings, but I cannot figure out how to tell train(), through preProcess or in some other way, to center and scale just my numerical variables.

Here is the code:

test <- train(risk ~ .,
              method = "glm",
              data = df,
              family = binomial(link = "logit"),
              preProcess = c("center", "scale"),
              trControl = trainControl(method = "cv",
                                       number = 6,
                                       classProbs = TRUE,
                                       summaryFunction = prSummary),
              metric = "AUC")

Solution

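You could center and scale all numerical variables in the original df first and then apply train() to the scaled df. Because the scaling then happens before resampling, the means and standard deviations are computed on the full data rather than within each fold.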

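library(dplyr)
library(caret)

# Center and scale numeric columns only; factors such as risk are left as-is.
# scale() returns a one-column matrix, so as.numeric() restores a plain vector.
df <- df %>%
        dplyr::mutate(dplyr::across(where(is.numeric), ~ as.numeric(scale(.x))))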

test <- train(risk ~ .,
              method = "glm",
              data = df,
              family = binomial(link = "logit"),
              trControl = trainControl(method = "cv",
                                       number = 6,
                                       classProbs = TRUE,
                                       summaryFunction = prSummary),
              metric = "AUC")
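
To check what the scaling step does before fitting, here is a minimal base R sketch. The toy data frame and its column names (age, income, sex) are made up for illustration and only stand in for the asker's df; nothing here requires dplyr or caret.

```r
# Hypothetical toy data standing in for the asker's df
df <- data.frame(
  age    = c(23, 35, 47, 59, 61, 30),
  income = c(1800, 2500, 4000, 5200, 6100, 2100),
  sex    = factor(c("f", "m", "f", "m", "f", "m")),
  risk   = factor(c("bad", "good", "good", "bad", "good", "bad"))
)

# Logical vector marking which columns are numeric
num_cols <- vapply(df, is.numeric, logical(1))

# Center and scale only those columns; as.numeric() drops the
# one-column matrix (and its attributes) that scale() returns
df_scaled <- df
df_scaled[num_cols] <- lapply(df[num_cols], function(x) as.numeric(scale(x)))

# Inspect the result: numeric columns should have mean ~0 and sd ~1,
# while the factor columns are unchanged
print(colMeans(df_scaled[num_cols]))
print(sapply(df_scaled[num_cols], sd))
str(df_scaled)
```

If the scaling has to be re-estimated inside each cross-validation fold, one option that may be worth looking at is passing a recipes object to train() instead of a formula, since a recipe can select numeric predictors explicitly.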

Answered By – AlSub

Answer Checked By – Robin (AngularFixing Admin)
