What are the default train and test set sizes for the forecast() function in R?

Issue

I have used the TBATS model on my data and when I apply the forecast() function, it automatically forecasts two years in the future. I haven’t specified any training set or testing set, so how do I know how much data it used to predict the next two years?

The data I’m dealing with is Uber travel times data from Jan 2016 to Jan 2020. I have daily data (sampling frequency = 1) for 18 cities and each city has a different sample size (they range from 1422 days to 1459 days).

This is what the Amsterdam travel times data looks like

I have stored the vector of travel times as an msts object because the series has multiple seasonal periods, which the TBATS model can handle.

When I calculate RMSE, MAE, MAPE and MSE, I get very low values in general, so how can I know which data TBATS is training on?

Here is my code:

library(forecast)

data <- read.csv('C:/users/Datasets/Final Datasets/final_a.csv', header = TRUE, sep = ",")
y <- msts(data$MeanTravelTimeSeconds, start=c(2016,1), seasonal.periods=c(7.009615384615385, 30.5, 91.3, 365.25))

fit <- tbats(y)
plot(fit)
fc <- forecast(fit)
autoplot(fc, ylab = "Travel Time in Seconds")

# Check residuals (ACF and histogram)
checkresiduals(fc)

# RMSE
rmse <- sqrt(fit$variance)

# MAE
res <- residuals(fit)
mae <- mean(abs(res))

# MAPE
pt <- (res)/y
mape <- mean(abs(pt))

# MSE (Mean Squared Error)
mse <- mean(res^2)

This is what the forecast looks like

The performance results for the TBATS model for Amsterdam are:

RMSE: 0.06056063
MAE: 0.04592825
MAPE: 6.474616e-05
MSE: 0.00366759

If I had to manually select the test and train sets, how should I modify my code in order to do so?

Solution

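If you call forecast(fit) as you did, there is no automatic train/test split: tbats(y) was fitted to the entire series, and forecast() extrapolates beyond its last observation. With no h argument, the horizon defaults to twice the largest seasonal period (2 × 365.25 days here), which is why you get about two years. The error measures you computed come from the residuals, i.e. the fitted values compared with the training data, so they are in-sample measures.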
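To check this behaviour yourself, here is a minimal sketch using the built-in monthly USAccDeaths series rather than the Uber data (the names fit_all and fc_default are only illustrative). It shows that the fitted values cover every observation and that the default horizon is two seasonal cycles.

```r
library(forecast)

# Fit TBATS to the whole series: no observations are held out
fit_all <- tbats(USAccDeaths)

# One fitted value per observation, so all of the data was used for training
length(fitted(fit_all)) == length(USAccDeaths)

# No h supplied: the horizon defaults to twice the (largest) seasonal period
fc_default <- forecast(fit_all)
length(fc_default$mean)  # number of forecast steps
start(fc_default$mean)   # forecasts begin right after the last observation
```

For this monthly series the default horizon is 2 * 12 steps; for a daily series with a 365.25-day period it works out to roughly two years.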

If you want to use a test set as well, see the example below. Fit the model to the training portion only, forecast over a horizon h equal to the length of the test set, and compare the forecasts with the held-out observations.

library(forecast)

# Training Data
n_train <- round(length(USAccDeaths) * 0.8)
train <- head(USAccDeaths, n_train)

# Test Data
n_test <- length(USAccDeaths) - n_train
test <- tail(USAccDeaths, n_test)

# Model Fit
fit <- tbats(train)

# Forecast over the same horizon as the test data
fc <- forecast(fit, h = n_test)

# Point Forecasts 
fc$mean
#            Jan       Feb       Mar       Apr       May       Jun       Jul
# 1977                      7767.513  7943.791  8777.425  9358.863 10034.996
# 1978  7711.478  7004.621  7767.513  7943.791  8777.425  9358.863 10034.996
#            Aug       Sep       Oct       Nov       Dec
# 1977  9517.860  8370.509  8706.441  8190.262  8320.606
# 1978  9517.860  8370.509  8706.441  8190.262  8320.606

test # for comparison with the point forecasts
#        Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
# 1977              7726  8106  8890  9299 10625  9302  8314  8850  8265  8796
# 1978  7836  6892  7791  8192  9115  9434 10484  9827  9110  9070  8633  9240
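Rather than comparing the numbers by eye, the accuracy() function from the forecast package can summarise the errors on both sets. The following is a sketch that repeats the same 80/20 split on USAccDeaths so that it runs on its own; the name acc is illustrative.

```r
library(forecast)

# Same 80/20 split as in the example above
n_train <- round(length(USAccDeaths) * 0.8)
train   <- head(USAccDeaths, n_train)
test    <- tail(USAccDeaths, length(USAccDeaths) - n_train)

fit <- tbats(train)
fc  <- forecast(fit, h = length(test))

# Passing the held-out data gives a "Training set" row (in-sample)
# and a "Test set" row (out-of-sample)
acc <- accuracy(fc, test)
acc[, c("RMSE", "MAE", "MAPE")]
```

The "Test set" row is the one that tells you how well the model forecasts data it has not seen; the "Training set" row corresponds to the kind of in-sample figures computed in the question.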

It is also worth plotting the forecasts and the fitted values on top of the full series to see how they line up with the observed data.

autoplot(USAccDeaths) + autolayer(fc) + autolayer(fitted(fit))
#autoplot(USAccDeaths) +  autolayer(fitted(fit))
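The same idea can be applied to a daily series with multiple seasonal periods, as in the question. The sketch below is only an illustration: it uses a synthetic daily series (the Uber file is not available here), a reduced set of seasonal periods, and a hypothetical 90-day holdout. It splits the raw vector first, builds the msts object from the training part only, and computes the error measures by hand on the held-out days. Fitting TBATS with a yearly period on daily data can take a little while.

```r
library(forecast)

# Synthetic daily data standing in for the travel times (assumption)
set.seed(1)
n <- 1096                                    # about three years of days
t <- seq_len(n)
x <- 1500 + 100 * sin(2 * pi * t / 7) +
  200 * sin(2 * pi * t / 365.25) + rnorm(n, sd = 30)

# Hold out the last 90 days as the test set (hypothetical choice)
n_test  <- 90
n_train <- n - n_test
train_x <- x[seq_len(n_train)]
test_x  <- x[(n_train + 1):n]

# Build the msts object from the training data only, then fit
y_train <- msts(train_x, start = c(2016, 1), seasonal.periods = c(7, 365.25))
fit <- tbats(y_train)

# Forecast exactly as many days as were held out
fc <- forecast(fit, h = n_test)

# Out-of-sample errors on the original scale
err  <- test_x - as.numeric(fc$mean)
rmse <- sqrt(mean(err^2))
mae  <- mean(abs(err))
mape <- mean(abs(err / test_x)) * 100        # in percent
mse  <- mean(err^2)
c(RMSE = rmse, MAE = mae, MAPE = mape, MSE = mse)
```

Computing the errors from fc$mean against the held-out values, rather than from residuals(fit), is what makes these out-of-sample measures.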


Answered By – Suren

Answer Checked By – Marilyn (AngularFixing Volunteer)
