
ARIMA Model

AutoRegressive Integrated Moving Average

Overview

The AutoRegressive Integrated Moving Average (ARIMA) model is one of the most fundamental and widely used statistical methods for time series forecasting. It is a class of models that captures common temporal structure in time series data. The name reflects its three key components: Autoregression (AR), Integration (I), and Moving Average (MA).

Architecture & Components

ARIMA models are defined by three parameters: (p, d, q).

  • AR (p) - Autoregression: This component suggests that the value of the series at a given time point is a linear combination of its own past values. The parameter 'p' is the order of the autoregression, meaning it specifies the number of lagged observations to include in the model. For example, if p=2, the model uses the values from the previous two time points to predict the current value.
  • I (d) - Integration: This component is used to make the time series stationary by differencing. Stationarity (roughly, statistical properties such as the mean that do not change over time) is a key assumption for many time series models. The parameter 'd' is the degree of differencing, representing the number of times the raw observations are differenced. For example, if d=1, the model uses the difference between consecutive observations (see the differencing sketch after this list).
  • MA (q) - Moving Average: This component suggests that the value of the series is a linear combination of past forecast errors. The parameter 'q' is the order of the moving average, specifying the size of the moving average window. This allows the model to account for random shocks or unexpected events from the past.
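
To make the differencing component concrete, here is a minimal sketch using pandas, with a small hypothetical series `y` (illustrative values only), showing what a d=1 transformation looks like before any AR or MA terms are fit:

import pandas as pd

# Hypothetical series with a linear trend (illustrative values only)
y = pd.Series([10.0, 12.0, 15.0, 19.0, 24.0])

# d = 1: first-order differencing replaces each value with the change
# from the previous observation, which removes a linear trend
y_diff = y.diff().dropna()
print(y_diff.tolist())  # [2.0, 3.0, 4.0, 5.0]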

Mathematical Formulation

A non-seasonal ARIMA(p,d,q) model can be written as:

$ (1 - \sum_{i=1}^{p} \phi_i L^i) (1-L)^d Y_t = c + (1 + \sum_{i=1}^{q} \theta_i L^i) \epsilon_t $

Where:

  • $Y_t$ is the time series value at time t.
  • $L$ is the lag operator ($L Y_t = Y_{t-1}$).
  • $\phi_i$ are the parameters of the autoregressive part.
  • $\theta_i$ are the parameters of the moving average part.
  • $\epsilon_t$ is the white noise error term.
  • $d$ is the degree of differencing.
  • $c$ is a constant.
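
As a concrete case, take p = 1, d = 1, q = 1 and write $\Delta Y_t = Y_t - Y_{t-1}$ for the first difference. The ARIMA(1,1,1) model is then:

$ (1 - \phi_1 L)(1 - L) Y_t = c + (1 + \theta_1 L) \epsilon_t $

which, after expanding the lag polynomials, becomes the more familiar recursion:

$ \Delta Y_t = c + \phi_1 \Delta Y_{t-1} + \epsilon_t + \theta_1 \epsilon_{t-1} $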

When to Use ARIMA

ARIMA is particularly well-suited for:

  • Forecasting data with a clear trend and without significant non-linearities.
  • Data that is stationary or can be made stationary through differencing (see the stationarity-check sketch after this list).
  • Situations where model interpretability is important. The model's parameters have clear statistical interpretations.
  • As a baseline model to compare against more complex methods.
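
To check the stationarity condition in practice, a common approach is the Augmented Dickey-Fuller test from `statsmodels`. The sketch below assumes a pandas Series named `series` (such as the one constructed in the example that follows); the exact threshold for "stationary enough" is a judgment call:

from statsmodels.tsa.stattools import adfuller

# Null hypothesis of the ADF test: the series has a unit root (non-stationary).
# A small p-value (e.g. < 0.05) is evidence that the series is stationary.
stat, p_value, *_ = adfuller(series.dropna())
print(f"ADF statistic: {stat:.3f}, p-value: {p_value:.3f}")

# If the raw series is non-stationary, difference once (d = 1) and test again
stat_d, p_value_d, *_ = adfuller(series.diff().dropna())
print(f"After differencing: ADF statistic: {stat_d:.3f}, p-value: {p_value_d:.3f}")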

Example Implementation

Here's how to implement an ARIMA model in Python using the `statsmodels` library. We first generate sample data with a linear trend, fit the model on a training split, and then plot the forecast against the held-out data.


# Import necessary libraries
import pandas as pd
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
import matplotlib.pyplot as plt

# Generate sample data with a trend
np.random.seed(42)
n_samples = 150
time = np.arange(n_samples)
data = 20 + time * 0.5 + np.random.randn(n_samples) * 5
series = pd.Series(data, index=pd.date_range(start='2020-01-01', periods=n_samples, freq='D'))

# Split data into train and test
train_size = 120
train, test = series[:train_size], series[train_size:]

# Fit the ARIMA model
# We choose (p,d,q) = (5,1,0): d=1 differences away the linear trend,
# p=5 uses the last five lags, and q=0 omits moving-average terms.
model = ARIMA(train, order=(5,1,0))
model_fit = model.fit()

# Make a forecast
forecast_steps = len(test)
forecast = model_fit.forecast(steps=forecast_steps)

# Print and plot the results
print("Forecasted Values:")
print(forecast)

plt.figure(figsize=(12, 6))
plt.plot(train.index, train, label='Training Data')
plt.plot(test.index, test, label='Actual Data', color='orange')
plt.plot(test.index, forecast, label='ARIMA Forecast', color='green', linestyle='--')
plt.title('ARIMA Model Forecast')
plt.legend()
plt.show()
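
One simple way to quantify forecast accuracy, continuing from the variables defined above, is the mean absolute error on the held-out test window; this is just one illustrative metric, not the only sensible choice:

# Mean absolute error of the forecast against the held-out test data
mae = np.mean(np.abs(test.values - forecast.values))
print(f"Test MAE: {mae:.2f}")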