Overview
The Prophet-XGBoost hybrid model combines Facebook's Prophet forecasting procedure with XGBoost gradient boosting. Prophet decomposes the time series into trend, seasonality, and holiday effects; an XGBoost model is then trained on the residuals (the part of the series Prophet leaves unexplained) to capture remaining non-linear patterns. The aim is a more accurate forecast for series that show both clear, interpretable calendar patterns and intricate non-linear dynamics of the kind tree-based models capture well.
Architecture & Components
The Prophet-XGBoost hybrid model typically follows a two-stage sequential process:
- Stage 1: Prophet Modeling (Trend, Seasonality, Holidays)
A Prophet model is first applied to the raw time series data. Prophet excels at modeling piecewise linear or logistic trends, multiple seasonalities (yearly, weekly, daily) using Fourier series, and the impact of holidays. After fitting, Prophet generates in-sample predictions, and the **residuals** (the differences between the actual values and Prophet's fitted values) are calculated. These residuals are assumed to primarily contain the non-linear patterns that Prophet's additive model could not fully capture.
$ R_t = Y_t - \hat{Y}_t^{\text{Prophet}} $
Where $R_t$ are the residuals, $Y_t$ is the actual value, and $\hat{Y}_t^{\text{Prophet}}$ is Prophet's fitted value.
- Stage 2: XGBoost Modeling (Non-linear Residuals)
An XGBoost model is then trained on these residuals. XGBoost, a gradient boosting algorithm, is effective at capturing complex non-linear relationships and interactions between features. For time series residuals, it is typically trained on lagged residuals and, optionally, time-based features (e.g., day of week, month) to predict how far future values will deviate from Prophet's forecast.
$ \hat{R}_t^{\text{XGBoost}} = \text{XGBoost}(R_{t-1}, R_{t-2}, \dots, \text{TimeFeatures}_t) $
Where $\hat{R}_t^{\text{XGBoost}}$ is XGBoost's forecast of the residual.
- Final Forecast Combination:
The final forecast is obtained by summing the forecasts from both components: the forecast from Prophet and the non-linear residual forecast from XGBoost.
$ \hat{Y}_t^{\text{Hybrid}} = \hat{Y}_t^{\text{Prophet}} + \hat{R}_t^{\text{XGBoost}} $
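As a worked illustration of the three equations above, the sketch below uses made-up numbers in place of real Prophet and XGBoost outputs. All variable names and values are illustrative assumptions, not part of either library.

```python
# Toy numbers standing in for real model outputs (all values are made up).
actual = [120.0, 125.0, 131.0]           # Y_t on the training window
prophet_fitted = [118.0, 126.5, 129.0]   # Prophet's in-sample fitted values

# Stage 1: residuals R_t = Y_t - Yhat_t^Prophet
residuals = [y - y_hat for y, y_hat in zip(actual, prophet_fitted)]

# Forecast step: Prophet's forecast plus XGBoost's residual forecast
prophet_forecast = 134.0        # Yhat_t^Prophet for the next step (assumed)
xgb_residual_forecast = 1.2     # Rhat_t^XGBoost for the same step (assumed)
hybrid_forecast = prophet_forecast + xgb_residual_forecast

print("Residuals:", residuals)
print("Hybrid forecast:", hybrid_forecast)
```

A positive residual means Prophet under-predicted at that step, so a positive residual forecast shifts the hybrid forecast upward.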
Conceptual diagram of the Prophet-XGBoost hybrid model, showing sequential processing.
When to Use Prophet-XGBoost Hybrid
The Prophet-XGBoost hybrid model is particularly effective for:
- Time series with strong seasonality and holidays, combined with complex non-linear residuals: This is common in business data where predictable calendar effects interact with subtle, hard-to-model non-linearities.
- Achieving high forecasting accuracy: The two models have complementary strengths, so the hybrid can outperform standalone Prophet or XGBoost when Prophet's residuals still contain learnable structure.
- When interpretability of the trend and seasonal components is desired: Prophet provides a clear, interpretable baseline.
- Handling missing data and outliers: Prophet's robustness to these issues is maintained, and XGBoost can also handle missing values.
- As a fallback for challenging series where neither Prophet nor XGBoost alone gives acceptable accuracy.
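Whether the residual stage actually helps is an empirical question for each dataset. A minimal check, sketched here with made-up holdout numbers and a hand-written error function, is to compare the Prophet-only error with the hybrid error on the same held-out period and keep the second stage only if it lowers the error.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Made-up holdout values, for illustration only.
actual = [100.0, 104.0, 109.0, 115.0]
prophet_only = [98.0, 105.0, 106.0, 111.0]
residual_forecast = [1.5, -0.5, 2.0, 3.0]   # hypothetical XGBoost output

# Hybrid forecast = Prophet forecast + residual forecast, step by step
hybrid = [p + r for p, r in zip(prophet_only, residual_forecast)]

mae_prophet = mean_absolute_error(actual, prophet_only)
mae_hybrid = mean_absolute_error(actual, hybrid)

# Keep the residual stage only if it lowers the holdout error.
use_hybrid = mae_hybrid < mae_prophet
print(f"Prophet MAE: {mae_prophet}, hybrid MAE: {mae_hybrid}, use hybrid: {use_hybrid}")
```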
Pros and Cons
Pros
- Enhanced Accuracy: Leverages Prophet's robust handling of seasonality and holidays with XGBoost's predictive power for complex non-linearities.
- Improved Robustness: Benefits from Prophet's resilience to missing data and outliers, and XGBoost's general robustness.
- Interpretability: Prophet's decomposition provides clear insight into the trend, seasonal, and holiday components.
- Addresses Limitations: Compensates for Prophet's difficulty with complex non-linearities and for XGBoost's inability to extrapolate trends beyond the range seen in training.
- Fast & Efficient Residual Modeling: XGBoost is known for its speed and efficiency in training.
Cons
- Increased Complexity: More challenging to implement and manage due to the need to train and integrate two separate models.
- Higher Computational Cost: Involves training two models sequentially.
- Error Propagation: Errors from the Prophet model can propagate to the XGBoost model.
- Feature Engineering for XGBoost: Requires careful manual creation of features from residuals and time components.
- Hyperparameter Tuning: Requires tuning parameters for both Prophet and XGBoost components.
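The feature-engineering step noted in the cons amounts to turning the residual series into a supervised-learning table. The sketch below shows one way to build it in plain Python; `make_lag_rows` is a hypothetical helper name, and the residual values are made up.

```python
def make_lag_rows(residuals, lag_steps):
    """Return (features, targets) for a one-step-ahead residual model.

    Each feature row holds the previous `lag_steps` residuals,
    ordered most recent first (lag 1, lag 2, ...).
    """
    features, targets = [], []
    for t in range(lag_steps, len(residuals)):
        # residuals[t-1], residuals[t-2], ..., residuals[t-lag_steps]
        features.append([residuals[t - k] for k in range(1, lag_steps + 1)])
        targets.append(residuals[t])
    return features, targets

# Made-up residuals; the first `lag_steps` points have no full lag history
# and therefore produce no training row.
X, y = make_lag_rows([0.5, -1.0, 2.0, 0.0, 1.5], lag_steps=2)
for row, target in zip(X, y):
    print(row, "->", target)
```

Whatever lag ordering is chosen here must be reproduced exactly when building inputs at forecast time.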
Example Implementation
Implementing a Prophet-XGBoost hybrid model involves several steps: fitting Prophet, extracting residuals, preparing residuals for XGBoost (with lagged features), training XGBoost, and combining forecasts. Here's a conceptual Python example demonstrating this process.
Python Example (Conceptual)
import pandas as pd
import numpy as np
from prophet import Prophet
import xgboost as xgb
from sklearn.metrics import mean_absolute_error, mean_squared_error
import matplotlib.pyplot as plt
# 1. Generate sample data with trend, seasonality, and some non-linearity
np.random.seed(42)
n_samples = 365 * 2 # 2 years of daily data
time_idx = pd.date_range(start='2020-01-01', periods=n_samples, freq='D')
# Linear trend + strong yearly seasonality
linear_seasonal_component = 100 + 0.5 * np.arange(n_samples) + 30 * np.sin(np.arange(n_samples) * 2 * np.pi / 365)
# Add some non-linear, autoregressive-like noise (residuals for XGBoost)
non_linear_residuals_base = np.zeros(n_samples)
for i in range(1, n_samples):
    non_linear_residuals_base[i] = 0.4 * non_linear_residuals_base[i-1] + np.random.normal(0, 2) * (1 + np.cos(i/100))
original_series = linear_seasonal_component + non_linear_residuals_base
df_prophet = pd.DataFrame({'ds': time_idx, 'y': original_series})
# 2. Split data into train and test sets (chronological)
train_size = int(n_samples * 0.8)
train_df_prophet, test_df_prophet = df_prophet.iloc[0:train_size], df_prophet.iloc[train_size:n_samples]
train_series_actual = df_prophet['y'].iloc[0:train_size]
test_series_actual = df_prophet['y'].iloc[train_size:n_samples]
# --- Stage 1: Prophet Modeling ---
# 3. Fit Prophet model to capture linear/seasonal patterns
m = Prophet(yearly_seasonality=True, weekly_seasonality=True, daily_seasonality=False)
m.fit(train_df_prophet)
# Create future dataframe for in-sample predictions to get residuals
future_in_sample = m.make_future_dataframe(periods=0, include_history=True)
prophet_train_pred_df = m.predict(future_in_sample)
prophet_train_pred = prophet_train_pred_df['yhat'].iloc[0:train_size]
# 4. Get Prophet in-sample residuals
prophet_residuals = train_series_actual - prophet_train_pred
print("Prophet Model Fitted.")
print(f"\nProphet Residuals (first 5): {prophet_residuals.head().values}")
# --- Stage 2: XGBoost Modeling on Residuals ---
# 5. Prepare residuals for XGBoost (create lagged features and time features)
df_residuals = pd.DataFrame({'ds': train_df_prophet['ds'], 'residuals': prophet_residuals})
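def create_features_for_xgboost(df_res, lag_steps):
    """Add calendar features and lagged residuals; drop rows without full lag history."""
    df_res['day_of_week'] = df_res['ds'].dt.dayofweek
    df_res['month'] = df_res['ds'].dt.month
    df_res['day_of_year'] = df_res['ds'].dt.dayofyear
    for i in range(1, lag_steps + 1):
        # lag_res_1 is the most recent residual, lag_res_2 the one before it, etc.
        df_res[f'lag_res_{i}'] = df_res['residuals'].shift(i)
    return df_res.dropna()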
lag_steps = 7 # Number of past residuals to use as input for XGBoost
df_residuals_features = create_features_for_xgboost(df_residuals.copy(), lag_steps)
features_xgb = [col for col in df_residuals_features.columns if col not in ['ds', 'residuals']]
X_residuals_xgb, y_residuals_xgb = df_residuals_features[features_xgb], df_residuals_features['residuals']
# 6. Build and train XGBoost model on residuals
xgb_model = xgb.XGBRegressor(n_estimators=100, learning_rate=0.1, max_depth=5, random_state=42)
print("\nStarting XGBoost training on Prophet residuals...")
xgb_model.fit(X_residuals_xgb, y_residuals_xgb)
print("XGBoost training complete.")
# --- Forecasting and Combination ---
# 7. Make multi-step forecasts
forecast_steps = len(test_series_actual)
# Prophet forecast for the future
future_prophet = m.make_future_dataframe(periods=forecast_steps, include_history=False)
prophet_forecast_future_df = m.predict(future_prophet)
prophet_forecast_future = prophet_forecast_future_df['yhat'].values
# XGBoost forecast for future residuals (recursive prediction)
future_residuals_xgb = []
# Start with the last 'lag_steps' training residuals, most recent first,
# so the order matches the lag_res_1 ... lag_res_N training columns
current_residuals_sequence = prophet_residuals.values[-lag_steps:][::-1]
# Create a dataframe of future dates to extract time features
future_dates_df = pd.DataFrame({'ds': test_df_prophet['ds']})
future_dates_df['day_of_week'] = future_dates_df['ds'].dt.dayofweek
future_dates_df['month'] = future_dates_df['ds'].dt.month
future_dates_df['day_of_year'] = future_dates_df['ds'].dt.dayofyear
time_feature_values = future_dates_df[['day_of_week', 'month', 'day_of_year']].to_numpy(dtype=float)
for i in range(forecast_steps):
    # Combine time features with current lagged residuals, in training column order
    input_row = np.concatenate((time_feature_values[i], current_residuals_sequence))
    input_features_for_xgb = pd.DataFrame([input_row], columns=features_xgb)
    # predict() returns a length-1 array; keep the scalar
    next_residual_pred = float(xgb_model.predict(input_features_for_xgb)[0])
    future_residuals_xgb.append(next_residual_pred)
    # The new prediction becomes lag 1; the oldest lag drops off
    current_residuals_sequence = np.concatenate(([next_residual_pred], current_residuals_sequence[:-1]))
# 8. Combine forecasts
hybrid_forecast = prophet_forecast_future + np.array(future_residuals_xgb)
# 9. Evaluate Hybrid Model
mae = mean_absolute_error(test_series_actual, hybrid_forecast)
rmse = np.sqrt(mean_squared_error(test_series_actual, hybrid_forecast))
print(f"\nHybrid Model MAE: {mae:.3f}")
print(f"Hybrid Model RMSE: {rmse:.3f}")
# 10. Plotting Results
plt.figure(figsize=(14, 7))
plt.plot(train_df_prophet['ds'], train_series_actual, label='Training Data', color='blue')
plt.plot(test_df_prophet['ds'], test_series_actual, label='Actual Test Data', color='orange')
plt.plot(test_df_prophet['ds'], hybrid_forecast, label='Prophet-XGBoost Hybrid Forecast', color='green', linestyle='--')
plt.title('Prophet-XGBoost Hybrid Time Series Forecasting')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.grid(True)
plt.show()
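The recursive residual forecast in step 7 can be isolated into a small, model-agnostic helper. This is a minimal sketch: `recursive_forecast` and `predict_one` are hypothetical names, and a simple stand-in rule replaces the trained XGBoost model. The point it illustrates is that the lag window must be ordered the same way as the training columns (lag 1 first) and that each prediction is fed back in as the newest lag.

```python
def recursive_forecast(predict_one, last_residuals, steps):
    """Roll a one-step residual model forward `steps` times.

    `predict_one` takes lags ordered most recent first and returns a float.
    `last_residuals` is ordered oldest to newest, as in a training series.
    """
    lags = list(reversed(last_residuals))   # most recent first
    forecasts = []
    for _ in range(steps):
        next_value = predict_one(lags)
        forecasts.append(next_value)
        # The prediction becomes lag 1; the oldest lag drops off.
        lags = [next_value] + lags[:-1]
    return forecasts

# Stand-in model (an assumption for illustration): half of the most recent residual.
forecasts = recursive_forecast(lambda lags: 0.5 * lags[0], [1.0, 4.0], steps=3)
print(forecasts)
```

Because every step consumes earlier predictions rather than observed residuals, errors can compound over long horizons, which is one concrete form of the error propagation listed among the cons.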
Dependencies & Resources
Dependencies: pandas, numpy, prophet, xgboost, scikit-learn, matplotlib (for plotting).
- A Hybrid Forecasting Model Based on Prophet and LSTM for Time Series Prediction (MDPI Paper, discusses Prophet hybrids)
- Improved Sales Forecasting using Trend and Seasonality Decomposition with LightGBM (discusses Prophet-LightGBM hybrid, similar concept)
- Hybrid-Time-Series-Modeling GitHub Repository (includes conceptual hybrid frameworks)