Direct Forecasting with Multiple Time Series - Sequences

Nickalus Redell

2020-04-19

Purpose

This vignette is a shorter version of the Direct Forecasting with Multiple Time Series vignette. The goal here is to illustrate the workflow for forecasting factor outcomes. To keep this brief, we’ll skip model exploration with nested cross-validation by training 1 forecast model across the entire dataset for each direct forecast horizon.

For this problem, the outcome will be artificially binned into 4 factor levels using cut().

Setup

To forecast with multiple/grouped/hierarchical time series in forecastML, your data need the following characteristics:

Example - Direct Forecasting with Factors

To illustrate forecasting with multiple time series, we’ll use the data_buoy dataset that comes with the package. This dataset consists of daily sensor measurements of several environmental conditions collected by 14 buoys in Lake Michigan from 2012 through 2018. The data were obtained from NOAA’s National Buoy Data Center available at https://www.ndbc.noaa.gov/ using the rnoaa package.

Load Packages and Data

data_buoy_gaps consists of:

library(forecastML)
library(dplyr)
library(DT)
library(ggplot2)
library(xgboost)

data("data_buoy_gaps", package = "forecastML")

data_buoy_gaps$wind_spd <- cut(data_buoy_gaps$wind_spd, breaks = c(-1, 3, 5, 8, 10),
                               ordered_result = TRUE)  # Create the factor outcome.

DT::datatable(head(data_buoy_gaps), options = list(scrollX = TRUE))

forecastML::fill_gaps

data <- forecastML::fill_gaps(data_buoy_gaps, date_col = 1, frequency = '1 day', 
                              groups = 'buoy_id', static_features = c('lat', 'lon'))

print(list(paste0("The original dataset with gaps in data collection is ", nrow(data_buoy_gaps), " rows."), 
      paste0("The modified dataset with no gaps in data collection from fill_gaps() is ", nrow(data), " rows.")))
## [[1]]
## [1] "The original dataset with gaps in data collection is 23646 rows."
## 
## [[2]]
## [1] "The modified dataset with no gaps in data collection from fill_gaps() is 31225 rows."

Dynamic Features

data$day <- lubridate::mday(data$date)
data$year <- lubridate::year(data$date)

Plot Wind Speed Outcome

p <- ggplot(data[!is.na(data$wind_spd), ], aes(x = date, y = 1, fill = wind_spd, color = wind_spd))
p <- p + geom_tile()
p <- p + facet_wrap(~ ordered(buoy_id), scales = "fixed")
p <- p + theme_bw() + theme(axis.text.y = element_blank(), axis.ticks.y = element_blank()) + 
  xlab(NULL) + ylab(NULL)
p

Model Training

data$buoy_id <- as.numeric(factor(data$buoy_id))

Training dataset - forecastML::create_lagged_df

  • We have 3 datasets for training models that forecast 1, 1 to 7, and 1 to 30 days into the future. We’ll view the 1-day-ahead training data below.
outcome_col <- 1  # The column position of our 'wind_spd' outcome (after removing the 'date' column).

horizons <- c(1, 7, 30)  # Forecast 1, 1:7, and 1:30 days into the future.

lookback <- c(1:30, 360:370)  # Features from 1 to 30 days in the past and annually.

dates <- data$date  # Grouped time series forecasting requires dates.
data$date <- NULL  # Dates, however, don't need to be in the input data.

frequency <- "1 day"  # A string that works in base::seq(..., by = "frequency").

dynamic_features <- c("day", "year")  # Features that change through time but which will not be lagged.

groups <- "buoy_id"  # 1 forecast for each group or buoy.

static_features <- c("lat", "lon")  # Features that do not change through time.
type <- "train"  # Create a model-training dataset.

data_train <- forecastML::create_lagged_df(data, type = type, outcome_col = outcome_col,
                                           horizons = horizons, lookback = lookback,
                                           dates = dates, frequency = frequency,
                                           dynamic_features = dynamic_features,
                                           groups = groups, static_features = static_features, 
                                           use_future = FALSE)

DT::datatable(head(data_train$horizon_1), options = list(scrollX = TRUE))


  • The plot below shows the feature map for any lagged features across forecast horizons. Here, we set all non-dynamic and non-static features to have the same lags (refer to the custom lags vignette to see how this could be modified). Notice that features that don’t support direct forecasting to the given horizon–e.g., lags of 1 to 29 days for the 30-day-horizon model–are silently dropped.
p <- plot(data_train)  # plot.lagged_df() returns a ggplot object.
p <- p + geom_tile(NULL)  # Remove the gray border for a cleaner plot.
p

CV setup - forecastML::create_windows

  • We’ll model with 0 external validation datasets and use a simple cross-validation setup in train_model().
windows <- forecastML::create_windows(data_train, window_length = 0)

plot(windows, data_train)


  • Now we’ll use the group_filter = "buoy_id == 1" argument to get a closer look at 1 of our 14 time series. The user-supplied filter is passed to dplyr::filter() internally.
plot(windows, data_train, group_filter = "buoy_id == 1") 

User-defined modeling function

  • A user-defined wrapper function for model training that takes the following arguments:
    • 1: A horizon-specific data.frame made with create_lagged_df(..., type = "train") (e.g., my_lagged_df$horizon_h),
    • 2: optionally, any number of additional named arguments which can be passed as ‘…’ in train_model()
    • and returns a model object or list containing a model that will be passed into the user-defined predict() function.

Any data transformations, hyperparameter tuning, or inner loop cross-validation procedures should take place within this function, with the limitation that it ultimately needs to return() a model suitable for the user-defined predict() function; a list can be returned to capture meta-data such as pre-processing pipelines or hyperparameter results.

  • Notice that the xgboost-specific input datasets are created within this wrapper function.
# The value of outcome_col can also be set in train_model() with train_model(outcome_col = 1).
model_function <- function(data, outcome_col = 1) {
  
  # xgboost cannot model factors directly so they'll be converted to numbers.
  data[] <- lapply(data, as.numeric)
  
  # xgboost cannot handle missing outcomes data so we'll remove this.
  data <- data[!is.na(data[, outcome_col]), ]
  
  data[, outcome_col] <- data[, outcome_col] - 1  # xgboost needs factors to start at 0.

  indices <- 1:nrow(data)
  
  set.seed(224)
  train_indices <- sample(1:nrow(data), ceiling(nrow(data) * .8), replace = FALSE)
  test_indices <- indices[!(indices %in% train_indices)]

  data_train <- xgboost::xgb.DMatrix(data = as.matrix(data[train_indices, 
                                                           -(outcome_col), drop = FALSE]),
                                     label = as.matrix(data[train_indices, 
                                                            outcome_col, drop = FALSE]))

  data_test <- xgboost::xgb.DMatrix(data = as.matrix(data[test_indices, 
                                                          -(outcome_col), drop = FALSE]),
                                    label = as.matrix(data[test_indices, 
                                                           outcome_col, drop = FALSE]))

  params <- list("objective" = "multi:softprob",
                 "eval_metric" = "mlogloss",
                 "num_class" = 4)  # Hard-coding the number of factor levels.

  watchlist <- list(train = data_train, test = data_test)
  
  set.seed(224)
  model <- xgboost::xgb.train(data = data_train, params = params, 
                              max.depth = 8, nthread = 2, nrounds = 30,
                              metrics = "rmse", verbose = 0, 
                              early_stopping_rounds = 5, 
                              watchlist = watchlist)

  return(model)
}

Model training - forecastML::train_model

  • This should take ~1 minute to train our ‘3 forecast horizons’ * ‘1 validation datasets’ = 3 models.

  • The user-defined modeling wrapper function could be much more elaborate, in which case many more models could potentially be trained here.

  • These models could be trained in parallel on any OS with the very flexible future package by un-commenting the code below and setting use_future = TRUE. To avoid nested parallelization, models are either trained in parallel across forecast horizons or validation windows, whichever is longer (when equal, the default is parallel across forecast horizons).

#future::plan(future::multiprocess)  # Multi-core or multi-session parallel training.

model_results <- forecastML::train_model(lagged_df = data_train,
                                         windows = windows,
                                         model_name = "xgboost",
                                         model_function = model_function, 
                                         use_future = FALSE)
  • We can access the xgboost model for any horizon or validation window. Here, we show a summary() of the 1-step-ahead model for the first validation window which is 2012.
summary(model_results$horizon_1$window_1$model)
##                 Length  Class              Mode       
## handle                1 xgb.Booster.handle externalptr
## raw             1090563 -none-             raw        
## best_iteration        1 -none-             numeric    
## best_ntreelimit       1 -none-             numeric    
## best_score            1 -none-             numeric    
## niter                 1 -none-             numeric    
## evaluation_log        3 data.table         list       
## call                 10 -none-             call       
## params                7 -none-             list       
## callbacks             2 -none-             list       
## feature_names       128 -none-             character  
## nfeatures             1 -none-             numeric

Forecast

User-defined prediction function

The following user-defined prediction function is needed for each model:

  • A wrapper function that takes the following 2 positional arguments:
    • 1: The model returned from the user-defined modeling function (could be a list containing the model).
    • 2: A data.frame of the model features from forecastML::create_lagged_df(..., type = "train").
  • and returns a data.frame of predictions with 1 or 3 columns. A 1-column data.frame will produce point forecasts, and a 3-column data.frame can be used to return point, lower, and upper forecasts (column names and order do not matter).

Function to predict class probabilities

  • Returns a 4-column data.frame of predicted probabilities–one for each factor.
# If 'model' is passed as a named list, the prediction model would be accessed with model$model or model["model"].
prediction_function_prob <- function(model, data_features) {
  
  # xgboost cannot model factors directly so they'll be converted to numbers.
  data_features[] <- lapply(data_features, as.numeric)
  
  x <- xgboost::xgb.DMatrix(data = as.matrix(data_features))
  data_pred <- data.frame("y_pred" = predict(model, x, reshape = TRUE))  # 'reshape' returns a wide data.frame.
  return(data_pred)
}

Function to predict class levels

  • Returns a 1-column data.frame with the predicted factor level.
# We'll define a global variable with the factor levels.
factor_levels <- levels(data_buoy_gaps$wind_spd)

# If 'model' is passed as a named list, the prediction model would be accessed with model$model or model["model"].
prediction_function_level <- function(model, data_features) {
  
  # xgboost cannot model factors directly so they'll be converted to numbers.
  data_features[] <- lapply(data_features, as.numeric)
  
  x <- xgboost::xgb.DMatrix(data = as.matrix(data_features))
  data_pred <- data.frame("y_pred" = predict(model, x, reshape = TRUE))  # 'reshape' returns a wide data.frame.
  
  data_pred$y_pred <- apply(data_pred, 1, which.max)  # Find the column with the highest probability.
  data_pred$y_pred <- dplyr::recode(data_pred$y_pred, `1` = factor_levels[1], `2` = factor_levels[2], 
                                    `3` = factor_levels[3], `4` = factor_levels[4])

  data_pred$y_pred <- factor(data_pred$y_pred, levels = factor_levels, ordered = TRUE)

  data_pred <- data_pred[, "y_pred", drop = FALSE]
  return(data_pred)
}

Historical model fit

  • Here, we’re predicting on our validation dataset with predicted (a) probabilities and (b) factor levels.
data_pred_prob <- predict(model_results, prediction_function = list(prediction_function_prob), data = data_train)

data_pred_level <- predict(model_results, prediction_function = list(prediction_function_level), data = data_train)
  • With 14 buoys and 3 direct forecast horizons, there are a total of 42 forecasts to plot. We’ll filter these in our plot to better see the actuals and predictions for 2 buoys at a direct forecast horizon of 7 days.
plot(data_pred_prob, horizons = 7, group_filter = "buoy_id %in% c(1, 2)")


  • Below are the factor level plots for the same buoys and horizon but zoomed in to 2018.
inspect_dates <- seq(as.Date("2018-01-01"), as.Date("2018-12-31"), by = "1 day")

plot(data_pred_level[data_pred_level$date_indices %in% inspect_dates, ], horizons = 7, group_filter = "buoy_id %in% c(1, 2)")

Historical prediction error - forecastML::return_error

  • Let’s take a quick look at our historical forecast error..

  • At present, mean absolute error–based on the binary misclassification rate where 0 is a correct prediction and 1 is incorrect–is the only error metric available for forecasting factor levels, and error metrics for predicted factor probabilities are not currently supported. Additional error metrics will continue to be added to forecastML.

data_error <- forecastML::return_error(data_pred_level, metric = "mae")

plot(data_error, type = "horizon", metric = "mae")

plot(data_error, type = "global", metric = "mae")

Forecast

  • We have 3 datasets that support forecasting 1, 1 to 7, and 1 to 30 days into the future. We’ll view the 1-day-ahead forecasting data below.

  • Note that the index and horizon columns are removed internally when passed into the user-defined predict() function.

type <- "forecast"  # Create a forecasting dataset for our predict() function.

data_forecast <- forecastML::create_lagged_df(data, type = type, outcome_col = outcome_col,
                                              horizons = horizons, lookback = lookback,
                                              dates = dates, frequency = frequency,
                                              dynamic_features = dynamic_features,
                                              groups = groups, static_features = static_features, 
                                              use_future = FALSE)

DT::datatable(head(data_forecast$horizon_1), options = list(scrollX = TRUE))

Dynamic features and forecasting

  • Our dynamic features day and year were not lagged in our modeling dataset. This was the right choice from a modeling perspective; however, in order to forecast ‘h’ steps ahead, we need to know their future values for each forecast horizon. At present, there’s no function in forecastML to autofill the future values of dynamic, non-lagged features so we’ll simply do it manually below.
for (i in seq_along(data_forecast)) {
  data_forecast[[i]]$day <- lubridate::mday(data_forecast[[i]]$index)  # When dates are given, the 'index` is date-based.
  data_forecast[[i]]$year <- lubridate::year(data_forecast[[i]]$index)
}

Forecast

  • Now we’ll forecast 1, 1:7, and 1:30 days into the future with predict(..., data = data_forecast).

  • The first time step into the future is max(dates) + 1 * frequency. Here, this is 12-31-2018 + 1 * ‘1 day’ or 1-1-2019.

data_forecasts_prob <- predict(model_results, prediction_function = list(prediction_function_prob), data = data_forecast)

data_forecasts_level <- predict(model_results, prediction_function = list(prediction_function_level), data = data_forecast)
  • We’ll focus on 2 buoys in our plots.
plot(data_forecasts_prob, group_filter = "buoy_id %in% c(1, 2)")

plot(data_forecasts_level, group_filter = "buoy_id %in% c(1, 2)")

Forecast Combination - forecastML::combine_forecasts

data_combined_prob <- forecastML::combine_forecasts(data_forecasts_prob)
data_combined_level <- forecastML::combine_forecasts(data_forecasts_level)

# Plot the final forecasts.
plot(data_combined_prob, group_filter = "buoy_id %in% c(1, 2)")

plot(data_combined_level, group_filter = "buoy_id %in% c(1, 2)")