I have used the TBATS model on my data and when I apply the forecast() function, it automatically forecasts two years in the future. I haven’t specified any training set or testing set, so how do I know how much data it used to predict the next two years?
The data I’m dealing with is Uber travel times data from Jan 2016 to Jan 2020. I have daily data (sampling frequency = 1) for 18 cities and each city has a different sample size (they range from 1422 days to 1459 days).
I have stored the vector of travel times as an msts object, since the series has multiple seasonal periods, which the TBATS model can handle.
When I calculate RMSE, MAE, MAPE and MSE, I get very low values in general, so how can I know which data TBATS is training on?
Here is my code:
data <- read.csv('C:/users/Datasets/Final Datasets/final_a.csv', TRUE, ",")
y <- msts(data$MeanTravelTimeSeconds, start=c(2016,1), seasonal.periods=c(7.009615384615385, 30.5, 91.3, 365.25))
fit <- tbats(y)
plot(fit)
fc <- forecast(fit)
autoplot(fc, ylab = "Travel Time in Seconds")

# Check residuals (ACF and histogram)
checkresiduals(fc)

# RMSE
rmse <- sqrt(fit$variance)
# MAE
res <- residuals(fit)
mae <- mean(abs(res))
# MAPE
pt <- (res)/y
mape <- mean(abs(pt))
# MSE (Mean Squared Error)
mse <- mean(res^2)
The performance results for the TBATS model for Amsterdam are:
RMSE: 0.06056063
MAE: 0.04592825
MAPE: 6.474616e-05
MSE: 0.00366759
If I had to manually select the test and train sets, how should I modify my code in order to do so?
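If you call tbats(y) on the whole series, as you did, the model is trained on all of the data and there is no test set. forecast(fit) then predicts beyond the end of the series; since you did not pass a horizon h, it falls back to the default, which for seasonal data is twice the largest seasonal period (here about two years). The error measures you computed come from residuals(fit), that is, from the fitted values on the training data, so they are in-sample errors and will tend to look optimistic.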
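As a quick check of the point above, the following sketch fits TBATS to a full series and confirms that there is one fitted value per observation, and that forecast() picks a horizon on its own when h is omitted. It uses the built-in USAccDeaths series as a stand-in for your data; the names fit_full and fc_full are only illustrative.

```r
library(forecast)

# Fit on the full series: every observation is training data
fit_full <- tbats(USAccDeaths)

# One fitted value (and so one in-sample residual) per observation
n_obs    <- length(USAccDeaths)
n_fitted <- length(fitted(fit_full))

# No h given, so forecast() falls back to its default horizon
fc_full   <- forecast(fit_full)
h_default <- length(fc_full$mean)

cat("observations:", n_obs,
    " fitted values:", n_fitted,
    " default horizon:", h_default, "\n")
```

If the number of fitted values equals the length of the series, nothing was held out, and any error measure built from residuals() describes the fit rather than forecast accuracy.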
If you want a test set as well, see the example below. Fit the model on the training data only, forecast to a horizon h equal to the length of the test set, and compare the forecasts with the held-out observations.
library(forecast)

# Training Data: first 70% of the 72 monthly observations (50 months)
n_train <- round(length(USAccDeaths) * 0.7)
train <- head(USAccDeaths, n_train)

# Test Data: the remaining 22 months (Mar 1977 to Dec 1978)
n_test <- length(USAccDeaths) - n_train
test <- tail(USAccDeaths, n_test)

# Model Fit (training data only)
fit <- tbats(train)

# Forecast for the same horizon as the test data
fc <- forecast(fit, n_test)

# Point Forecasts
fc$mean
#            Jan       Feb       Mar       Apr       May       Jun       Jul
# 1977                      7767.513  7943.791  8777.425  9358.863 10034.996
# 1978  7711.478  7004.621  7767.513  7943.791  8777.425  9358.863 10034.996
#            Aug       Sep       Oct       Nov       Dec
# 1977  9517.860  8370.509  8706.441  8190.262  8320.606
# 1978  9517.860  8370.509  8706.441  8190.262  8320.606

test # for comparison with the point forecasts
#        Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
# 1977              7726  8106  8890  9299 10625  9302  8314  8850  8265  8796
# 1978  7836  6892  7791  8192  9115  9434 10484  9827  9110  9070  8633  9240
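To turn that comparison into numbers, one option is the accuracy() function from the forecast package, which reports training-set and test-set error measures side by side when given the forecast object and the held-out data. This is a minimal sketch on USAccDeaths that holds out the final 12 months; the split point and the names fit_train, fc_test and acc are assumptions made for illustration.

```r
library(forecast)

# Hold out the last 12 months as the test set
train <- window(USAccDeaths, end = c(1977, 12))
test  <- window(USAccDeaths, start = c(1978, 1))

# Fit on the training data only, then forecast over the test period
fit_train <- tbats(train)
fc_test   <- forecast(fit_train, h = length(test))

# Rows: "Training set" (in-sample) and "Test set" (out-of-sample)
acc <- accuracy(fc_test, test)
print(acc[, c("RMSE", "MAE", "MAPE")])
```

The "Test set" row is the one to report as forecast accuracy; the "Training set" row corresponds to the kind of in-sample figures computed from residuals(fit).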
It is also worth plotting the forecasts and the fitted values against the full series to see how they behave:
autoplot(USAccDeaths) + autolayer(fc) + autolayer(fitted(fit))
#autoplot(USAccDeaths) + autolayer(fitted(fit))
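The same idea can be adapted to a daily series with several seasonal periods, such as the one in the question. The sketch below is only an illustration: it uses a synthetic series rather than the Uber data, the seasonal periods (7 and 30.5) and the 28-day test window are arbitrary choices, and the error measures are computed by hand on the original scale. The raw vector is split first, and only the training part is turned into an msts object.

```r
library(forecast)

# Synthetic daily series (an assumption standing in for the real data)
set.seed(1)
n   <- 400
day <- seq_len(n)
x   <- 1000 + 50 * sin(2 * pi * day / 7) +
       30 * sin(2 * pi * day / 30.5) + rnorm(n, sd = 10)

# Split the raw vector: last 28 days are the test set
n_test  <- 28
n_train <- n - n_test
train   <- msts(x[1:n_train], seasonal.periods = c(7, 30.5))
test    <- x[(n_train + 1):n]

# Fit on the training data only and forecast the test period
fit_ms <- tbats(train)
fc_ms  <- forecast(fit_ms, h = n_test)

# Out-of-sample errors on the original scale
err  <- test - as.numeric(fc_ms$mean)
rmse <- sqrt(mean(err^2))
mae  <- mean(abs(err))
mape <- 100 * mean(abs(err / test))  # in percent

cat("RMSE:", rmse, " MAE:", mae, " MAPE (%):", mape, "\n")
```

With the real data, x would be data$MeanTravelTimeSeconds and the seasonal periods would be the ones already used in the question; fitting with a 365.25-day period on roughly 1400 observations can take noticeably longer.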
Answered By – Suren