
Autoformer Model

Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting

Overview

Autoformer is a Transformer-based model designed for long-term time series forecasting. It addresses the limitations of prior Transformer models, which struggled with intricate temporal patterns and efficiency for long sequences. Autoformer introduces a novel decomposition architecture and an Auto-Correlation mechanism, allowing it to progressively decompose trend and seasonal components during the forecasting process. This approach significantly enhances its ability to capture complex temporal patterns and achieves state-of-the-art accuracy, particularly in long-horizon forecasting.

Architecture & Components

Autoformer's architecture builds upon the Transformer framework with two primary innovations:

  • Deep Decomposition Architecture: Autoformer renovates the Transformer into a deep decomposition architecture. Unlike models that perform decomposition as a separate preprocessing step, Autoformer adaptively separates the raw time series signal into its trend and seasonal components *within* the model during both training and inference. The trend component ($x_{trend}$) is typically extracted by applying a moving average (MA) filter with a specified kernel size ($k$). This inductive bias improves forecasting by allowing the model to learn each component more effectively.
  • Series-wise Auto-Correlation Mechanism: Inspired by stochastic process theory, Autoformer replaces the conventional self-attention mechanism with an Auto-Correlation mechanism. Rather than comparing every time step with every other, it discovers period-based dependencies by aligning similar sub-sequences (e.g., matching Mondays with other Mondays). This reduces computational complexity from quadratic to log-linear ($O(L \log L)$) and fits the periodic nature of many real-world time series. Because the series-wise connection inherently preserves sequential order, Autoformer does not need the explicit position embeddings required by other Transformers. (A toy sketch of the decomposition and the period-based dependency idea follows this list.)
  • Encoder-Decoder Structure: Autoformer retains the standard Transformer encoder-decoder structure. The encoder processes the input sequence, and the decoder generates the forecast, leveraging the decomposed components and the Auto-Correlation mechanism.
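
Below is a minimal, self-contained PyTorch sketch of the two core ideas above. It is not the repository's exact code: it shows a moving-average decomposition block that splits a series into trend and seasonal parts, and FFT-based autocorrelation (via the Wiener-Khinchin theorem) used to surface the dominant lags that Auto-Correlation aggregates over. The kernel size, toy sine input, and top-k value are illustrative choices.

import torch
import torch.nn as nn

class SeriesDecomp(nn.Module):
    """Moving-average decomposition: trend = moving average, seasonal = residual."""
    def __init__(self, kernel_size: int):
        super().__init__()
        self.kernel_size = kernel_size
        self.avg = nn.AvgPool1d(kernel_size=kernel_size, stride=1, padding=0)

    def forward(self, x):  # x: (batch, length, channels)
        # Pad both ends by repeating edge values so the output keeps the input length
        front = x[:, :1, :].repeat(1, (self.kernel_size - 1) // 2, 1)
        end = x[:, -1:, :].repeat(1, self.kernel_size // 2, 1)
        padded = torch.cat([front, x, end], dim=1)
        trend = self.avg(padded.permute(0, 2, 1)).permute(0, 2, 1)
        seasonal = x - trend
        return seasonal, trend

# Toy series: noisy sine wave with a period of roughly 31 steps (2 * pi * 5)
x = torch.sin(torch.arange(0, 200, dtype=torch.float32) / 5).reshape(1, 200, 1)
x = x + 0.1 * torch.randn(1, 200, 1)
seasonal, trend = SeriesDecomp(kernel_size=25)(x)

# Period discovery: autocorrelation computed as the inverse FFT of the power spectrum
spec = torch.fft.rfft(seasonal.squeeze(-1), dim=-1)
autocorr = torch.fft.irfft(spec * torch.conj(spec), n=seasonal.shape[1], dim=-1)
top_lags = torch.topk(autocorr[:, 1:], k=3, dim=-1).indices + 1  # skip lag 0
print("Dominant lags:", top_lags.tolist())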

Autoformer is a deterministic model, providing a single point forecast rather than a distribution of possible future values.

Figure: Conceptual diagram of Autoformer's architecture with decomposition and Auto-Correlation.

When to Use Autoformer

Autoformer is an excellent choice for:

  • Long-horizon time series forecasting problems where trends and seasonality unfold over extended periods.
  • Structured and periodic data, where its Auto-Correlation mechanism can effectively identify recurring patterns.
  • Scenarios requiring robust performance in both clean and noisy environments, as its decomposition mechanism helps filter high-frequency noise.
  • When interpretability of trend and seasonal components is desired, as it explicitly models them.
  • As a state-of-the-art model for various practical applications including energy, traffic, economics, weather, and disease forecasting.

Pros and Cons

Pros

  • State-of-the-Art for Long-Term Forecasting: The original paper reports a 38% relative improvement over prior models across six real-world benchmarks (energy, traffic, economics, weather, disease).
  • Superior Noise Resilience: Its built-in decomposition mechanism effectively filters high-frequency noise, maintaining predictive stability even in perturbed environments.
  • Interpretable Decomposition: Explicitly separates and models trend and seasonal components, offering insights into the forecast.
  • Efficient: The Auto-Correlation mechanism reduces computational complexity to log-linear ($O(L \log L)$), making it efficient for long sequences.
  • No Positional Embedding Needed: Inherently preserves sequential information, simplifying the architecture.

Cons

  • Deterministic Output: Provides only a single point forecast, limiting its ability to quantify forecast uncertainty (no probability distribution).
  • Requires PyTorch: Official and most common implementations are in PyTorch, limiting direct TensorFlow usage without adaptations.
  • Data Requirements: Like most deep learning models, it generally requires sufficient historical data for optimal performance.

Example Implementation

Autoformer is primarily implemented in PyTorch, with the official code provided by THUML. HuggingFace Transformers and NeuralForecast also offer implementations. Here's a conceptual example using the THUML repository's approach, which typically involves running bash scripts for specific datasets.

PyTorch Example (using THUML/Autoformer)

                        
# 1. Clone the official Autoformer repository
# git clone https://github.com/thuml/Autoformer.git
# cd Autoformer

# 2. Install Python 3.6 and PyTorch 1.9.0 (or compatible versions)
# pip install -r requirements.txt # (assuming a requirements.txt exists or install manually)

# 3. Download datasets
# The datasets are typically provided via a Google Drive link in the repository's README.
# Download them and place them in a './dataset' folder in the root of the cloned repository.
# Example: make get_dataset (only if the repository provides such a Makefile target; otherwise download manually)

# 4. Run a training script for a specific dataset (e.g., ETTm1)
# These scripts are located in the './scripts' directory.
echo "Running Autoformer training script for ETTm1 dataset..."
bash ./scripts/ETT_script/Autoformer_ETTm1.sh

# This script will typically:
# - Set up model parameters (e.g., sequence length, prediction length, number of encoder/decoder layers)
# - Load the ETTm1 dataset
# - Train the Autoformer model
# - Evaluate its performance (MSE, MAE) and save results to './result.txt' or similar.

# Example of what the script might contain (simplified):
# python -u run.py \
#   --model Autoformer \
#   --data ETTm1 \
#   --features M \
#   --seq_len 96 \
#   --label_len 48 \
#   --pred_len 24 \
#   --e_layers 2 \
#   --d_layers 1 \
#   --factor 3 \
#   --enc_in 7 \
#   --dec_in 7 \
#   --c_out 7 \
#   --des Exp \
#   --itr 1 \
#   --train_epochs 10 \
#   --batch_size 32 \
#   --learning_rate 0.0001 \
#   --root_path ./dataset/ETT-small/ \
#   --data_path ETTm1.csv \
#   --checkpoints ./checkpoints/

echo "Autoformer training script executed. Check './result.txt' or specified output directory for results."

# For inference, you would typically load a trained model checkpoint and use its predict method.
# The repository's 'predict.ipynb' (in Chinese) provides a workflow example.
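
NeuralForecast Example (PyTorch)

NeuralForecast (from Nixtla) ships an Autoformer implementation behind a high-level fit/predict API. The sketch below is a minimal example, assuming the library's Autoformer model class, the NeuralForecast wrapper, and the bundled AirPassengersDF sample data; hyperparameters such as input_size and max_steps are illustrative, so check the NeuralForecast documentation for the exact signatures in your version.

# pip install neuralforecast
from neuralforecast import NeuralForecast
from neuralforecast.models import Autoformer
from neuralforecast.utils import AirPassengersDF  # sample DataFrame with columns: unique_id, ds, y

horizon = 12  # forecast 12 months ahead

# Define an Autoformer with a 24-step input window and a short training run
model = Autoformer(h=horizon, input_size=24, max_steps=100)

# Wrap it, fit on the sample monthly series, and produce forecasts
nf = NeuralForecast(models=[model], freq="M")
nf.fit(df=AirPassengersDF)
forecasts = nf.predict()

print(forecasts.head())  # one forecast column per fitted model (e.g., 'Autoformer')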
                        

HuggingFace Transformers Example (PyTorch)

The HuggingFace Transformers library includes Autoformer as AutoformerConfig / AutoformerForPrediction. This implementation is PyTorch-only (there is no TensorFlow port), and it wraps the Autoformer architecture with a distributional output head, so inference produces sampled forecast trajectories that can be averaged into a point forecast. The example below initializes a small model from a config and runs inference on a synthetic series; the hyperparameters are illustrative rather than tuned values.

                        
import numpy as np
import torch
from transformers import AutoformerConfig, AutoformerForPrediction

# 1. Generate sample data
context_length = 96        # length of the encoder input window
prediction_length = 24     # forecast horizon
lags_sequence = [1, 2, 3, 4, 5, 6, 7]
# past_values must cover context_length + max(lags_sequence) steps so lag features can be built
history_length = context_length + max(lags_sequence)

# Simulate a univariate series: sine wave + linear trend + noise
t = np.arange(history_length + prediction_length)
series = np.sin(t / 10) * 10 + t * 0.1 + np.random.randn(len(t)) * 2

past_values = torch.tensor(series[:history_length], dtype=torch.float32).unsqueeze(0)    # (1, history_length)
future_values = torch.tensor(series[history_length:], dtype=torch.float32).unsqueeze(0)  # only for training/plotting
past_observed_mask = torch.ones_like(past_values)  # all past points observed

# Simple normalized time-index feature for the past and future windows
past_time_features = (torch.arange(history_length, dtype=torch.float32)
                      .reshape(1, -1, 1) / history_length)
future_time_features = (torch.arange(history_length, history_length + prediction_length,
                                     dtype=torch.float32).reshape(1, -1, 1) / history_length)

# 2. Load a pre-trained Autoformer model, or initialize a small one from a config
# model = AutoformerForPrediction.from_pretrained("huggingface/autoformer-tourism-monthly")
config = AutoformerConfig(
    context_length=context_length,
    prediction_length=prediction_length,
    input_size=1,                 # univariate series
    lags_sequence=lags_sequence,
    num_time_features=1,          # the single time feature created above
    encoder_layers=2,
    decoder_layers=1,
    d_model=64,
)
model = AutoformerForPrediction(config)

# 3. Run inference: generate() draws sampled future trajectories from the output distribution
model.eval()
with torch.no_grad():
    outputs = model.generate(
        past_values=past_values,
        past_time_features=past_time_features,
        past_observed_mask=past_observed_mask,
        future_time_features=future_time_features,
    )

# outputs.sequences: (batch_size, num_parallel_samples, prediction_length)
forecast = outputs.sequences.mean(dim=1).squeeze().numpy()  # point forecast = sample mean

# 4. Display the forecast
print("HuggingFace Autoformer inference complete.")
print(f"Forecast shape: {forecast.shape}")
print(f"First 5 forecasted values: {forecast[:5]}")

# Plotting (optional)
# import matplotlib.pyplot as plt
# plt.figure(figsize=(14, 7))
# plt.plot(np.arange(history_length), past_values.numpy().squeeze(), label='Past Values')
# plt.plot(np.arange(history_length, history_length + prediction_length),
#          future_values.numpy().squeeze(), label='Actual Future Values', color='orange')
# plt.plot(np.arange(history_length, history_length + prediction_length),
#          forecast, label='Autoformer Forecast', linestyle='--', color='green')
# plt.title('Autoformer Time Series Forecast (HuggingFace Transformers)')
# plt.xlabel('Time Step')
# plt.ylabel('Value')
# plt.legend()
# plt.grid(True)
# plt.show()
                        

Dependencies & Resources

Dependencies: numpy, pandas, torch (PyTorch), transformers (HuggingFace library), neuralforecast (optional, for the NeuralForecast example), matplotlib (for plotting).
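
Assuming a standard Python environment, these packages can be installed with pip as sketched below (versions unpinned; the THUML scripts were originally developed against older PyTorch releases, so pin versions there as needed):

# Install the packages used by the examples on this page
pip install numpy pandas matplotlib torch transformers neuralforecast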