## Issue

I have used the TBATS model on my data and when I apply the forecast() function, it automatically forecasts two years in the future. I haven’t specified any training set or testing set, so how do I know how much data it used to predict the next two years?

The data I’m dealing with is Uber travel times data from Jan 2016 to Jan 2020. I have daily data (sampling frequency = 1) for 18 cities and each city has a different sample size (they range from 1422 days to 1459 days).

I have set the vector of travel times as an `msts` object because it has multiple seasonalities, which the TBATS model can use.

When I calculate RMSE, MAE, MAPE and MSE, I get very low values in general, so how can I know which data TBATS is training on?

Here is my code:

```
library(forecast)  # provides msts(), tbats(), forecast(), checkresiduals()
data <- read.csv('C:/users/Datasets/Final Datasets/final_a.csv', TRUE, ",")
y <- msts(data$MeanTravelTimeSeconds, start=c(2016,1), seasonal.periods=c(7.009615384615385, 30.5, 91.3, 365.25))
fit <- tbats(y)
plot(fit)
fc <- forecast(fit)
autoplot(fc, ylab = "Travel Time in Seconds")
# Check residuals (ACF and histogram)
checkresiduals(fc)
# RMSE
rmse <- sqrt(fit$variance)
# MAE
res <- residuals(fit)
mae <- mean(abs(res))
# MAPE
pt <- (res)/y
mape <- mean(abs(pt))
# MSE (Mean Squared Error)
mse <- mean(res^2)
```

The performance results for the TBATS model for Amsterdam are:

```
RMSE: 0.06056063
MAE: 0.04592825
MAPE: 6.474616e-05
MSE: 0.00366759
```

If I had to manually select the test and train sets, how should I modify my code in order to do so?

## Solution

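If you call `forecast(fit)` as you did, the model has been fitted to all of the data you passed to `tbats()`, so the whole series is the training set. The error measures you computed from the residuals are therefore in-sample (training) errors, and the forecasts extend beyond the end of that data.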
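To check how much data the model saw and how far ahead it forecasts, you can inspect the fitted object directly. The following is a minimal sketch on the built-in `USAccDeaths` series rather than the Uber data; `use.parallel = FALSE` is only there to keep the run in a single process. As far as I know, when `h` is omitted, `forecast()` on a TBATS fit defaults to twice the largest seasonal period, which would explain the two years you see.

```r
library(forecast)

# Fit on the full series: every observation is training data
fit <- tbats(USAccDeaths, use.parallel = FALSE)

# One fitted value per observation used for training
n_used <- length(fitted(fit))
n_used == length(USAccDeaths)

# No horizon given, so forecast() picks a default
fc <- forecast(fit)
h_default <- length(fc$mean)
h_default

# The forecasts start after the last training observation
end(USAccDeaths)
start(fc$mean)
```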

If you want to use a test set as well, see the example below. Fit the model on the training data only, forecast to a horizon `h` equal to the length of the test set, and compare the forecasts with the held-out observations.

```
library(forecast)
# Training Data
n_train <- round(length(USAccDeaths) * 0.7)  # 50 of 72 months; the output below covers the remaining 22
train <- head(USAccDeaths, n_train)
# Test Data
n_test <- length(USAccDeaths) - n_train
test <- tail(USAccDeaths, n_test)
# Model Fit
fit <- tbats(train)
# Forecast for the same horizon as the test data
fc <- forecast(fit, n_test)
# Point Forecasts
fc$mean
#            Jan       Feb       Mar       Apr       May       Jun       Jul
# 1977                      7767.513  7943.791  8777.425  9358.863 10034.996
# 1978  7711.478  7004.621  7767.513  7943.791  8777.425  9358.863 10034.996
#           Aug      Sep      Oct      Nov      Dec
# 1977 9517.860 8370.509 8706.441 8190.262 8320.606
# 1978 9517.860 8370.509 8706.441 8190.262 8320.606
test # for comparison with the point forecasts
#        Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
# 1977              7726  8106  8890  9299 10625  9302  8314  8850  8265  8796
# 1978  7836  6892  7791  8192  9115  9434 10484  9827  9110  9070  8633  9240
```
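Rather than comparing the numbers by eye, you can let `accuracy()` from the forecast package compute the error measures. This is a small sketch, not the only way to do it: it holds out the last 12 months of `USAccDeaths` with `window()` (my choice of split, purely for illustration) and reports RMSE, MAE, MAPE and others for both the training and the test set. Note that `accuracy()` reports MAPE in percent.

```r
library(forecast)

# Hold out the final year as a test set (illustrative split)
train <- window(USAccDeaths, end = c(1977, 12))
test  <- window(USAccDeaths, start = c(1978, 1))

# Fit on the training data only; use.parallel = FALSE keeps it single-process
fit <- tbats(train, use.parallel = FALSE)

# Forecast exactly as many steps as there are test observations
fc <- forecast(fit, h = length(test))

# Rows: "Training set" (in-sample) and "Test set" (out-of-sample)
acc <- accuracy(fc, test)
acc[, c("RMSE", "MAE", "MAPE")]
```

The "Test set" row is the one that tells you how well the model predicts data it has not seen; the "Training set" row corresponds to what you computed from the residuals.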

It is also worth plotting the forecasts and the fitted values on top of the full series, so you can see how they track the training and test data:

```
autoplot(USAccDeaths) + autolayer(fc) + autolayer(fitted(fit))
#autoplot(USAccDeaths) + autolayer(fitted(fit))
```
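The same pattern should carry over to a daily `msts` object like yours. The sketch below uses a synthetic daily series (the values, the 210-day length and the two seasonal periods are made up for illustration, not taken from the Uber data) and splits it with `subset()`, which keeps the `msts` attributes. The `use.arma.errors = FALSE` and `use.parallel = FALSE` arguments are only there to keep the example fast; you may want to drop them for a real fit.

```r
library(forecast)

# Synthetic daily series with weekly and roughly monthly seasonality
set.seed(1)
n <- 210
t <- seq_len(n)
x <- 600 + 30 * sin(2 * pi * t / 7) + 15 * sin(2 * pi * t / 30.5) + rnorm(n, sd = 5)
y <- msts(x, seasonal.periods = c(7, 30.5))

# 80/20 split by position; subset() returns msts objects
n_train <- round(0.8 * length(y))
train <- subset(y, end = n_train)
test  <- subset(y, start = n_train + 1)

# Fit on the training part only, then forecast over the test period
fit <- tbats(train, use.arma.errors = FALSE, use.parallel = FALSE)
fc  <- forecast(fit, h = length(test))

# Out-of-sample errors are in the "Test set" row
acc <- accuracy(fc, test)
acc
```

With your data you would replace `x` by `data$MeanTravelTimeSeconds` and keep your own `seasonal.periods`.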

Answered By – Suren