Time Series Concepts

Featured Dataset: Pollution-Data-17-indian-cites

Domain: Environment / Air Quality
PM2.5 pollution data for 17 Indian cities, including preprocessed and raw datasets. Useful for urban air quality prediction and time series analysis.

Types of Time Series Data

Story for Kids

Imagine you're keeping a diary:

  • Univariate: Every day, you write down the temperature. That's one thing, over time.
  • Multivariate: Every day, you write down the temperature, how much you played outside, and what you ate for lunch. That's many things, over time.
  • Categorical: Some days you write "holiday" or "school day" in your diary. These are special labels, not numbers.
  • Numerical: Numbers you write, like your test scores or how many candies you ate.
  • Exogenous Variables: Maybe you write about the weather, and you notice you play outside more when it's sunny. The weather is an "extra" thing that affects your playtime.
  • Missing Data: Oops! You forgot to write in your diary one day. That's a missing entry.
  • Outliers: One day, you ate 100 candies at a party! That's way more than usual—a special, unusual day.
Simple Tip: Time series data is just keeping track of things over time, like a diary!
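The diary idea above can be sketched in pandas. The column names, the values, and the outlier threshold (5 times the median absolute deviation) are all assumptions made up for illustration, not part of any featured dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical diary: one row per day, several columns (multivariate).
day_types = ["school", "school", "school", "school", "holiday", "school", "school"]
diary = pd.DataFrame(
    {
        "temperature": [21.0, 22.5, np.nan, 23.0, 22.0, 21.5, 22.8],  # numerical, one missing day
        "candies": [3, 4, 2, 3, 100, 4, 3],                          # numerical, one unusual day
        "day_type": day_types,                                       # categorical labels
    },
    index=pd.date_range("2024-01-01", periods=7, freq="D"),
)

# A single column tracked over time is a univariate series.
temperature = diary["temperature"]

# Missing data: count the days with no temperature entry.
n_missing = int(temperature.isna().sum())

# Outliers: flag values far from the median, measured in units of the
# median absolute deviation (MAD). The factor 5 is an arbitrary choice.
candies = diary["candies"]
abs_dev = (candies - candies.median()).abs()
is_outlier = abs_dev > 5 * abs_dev.median()

print(n_missing)
print(diary.index[is_outlier].tolist())
```

A median-based rule is used here because a single extreme value (the 100 candies) would inflate a mean and standard deviation and could hide itself.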

Example Time Series Datasets

Dataset | Type | Description | Link
Air Passengers | Univariate | Monthly totals of international airline passengers (1949-1960). | Download
Sunspots | Univariate | Monthly mean sunspot numbers (1749-present). | Download
Electricity Load Diagrams | Multivariate | 15-min electricity consumption of 370 customers (2011-2014). | UCI
Beijing PM2.5 | Multivariate | Hourly PM2.5 data with weather covariates (2010-2014). | UCI
Rossmann Store Sales | Multivariate, Categorical | Daily sales data for 1,115 stores with promotions, holidays, etc. | Kaggle
M4 Competition | Univariate, Various | 100,000+ time series from finance, economics, demographics, etc. | Official
Exchange Rate | Univariate | Daily exchange rates for major currencies (1990-2016). | Download
Household Power Consumption | Multivariate | Minute-averaged measurements of electric power usage (2006-2010). | UCI
COVID-19 Global Cases | Multivariate | Daily confirmed, deaths, and recovered cases by country. | GitHub
Retail Sales | Univariate | Monthly US retail sales (1992-present). | FRED

See also: Awesome Public Datasets: Time Series

Trend

Story for Kids

Imagine you're climbing a hill. Some days you go up, some days you go down a little, but overall, you're getting higher and higher. That's a trend—like your height as you grow up, it usually goes up over time!

Simple Tip: If you see your numbers going up or down for a long time, that's a trend.

A trend is a long-term increase or decrease in the data. Trends can be linear (straight line), exponential (growing faster over time), or more complex. Recognizing a trend helps you understand the underlying direction of your data.

Figure: Upward Trend

Real-World Analogy

Think of a trend as the overall direction of a river: even if the water ripples up and down, the river flows downhill (or uphill, in data!).

Mathematical Intuition

A linear trend can be modeled as: y_t = a + bt, where a is the intercept and b is the slope (b > 0 for an upward trend, b < 0 for a downward trend).

Practical Tip

Always check for trends before modeling. If present, consider detrending your data for models that require stationarity.

Example: Rolling Mean

data['rolling_mean'] = data['value'].rolling(window=12).mean()
data[['value', 'rolling_mean']].plot()
Further Reading: Trend
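As a complement to the rolling mean, a linear trend of the form a + bt can be estimated by least squares and subtracted to detrend the series. The sketch below uses a synthetic series; the intercept 10, slope 0.5, and noise level are assumed values chosen only for illustration.

```python
import numpy as np

# Synthetic series: intercept 10, slope 0.5, plus small random noise (assumed values).
rng = np.random.default_rng(0)
t = np.arange(100)
y = 10 + 0.5 * t + rng.normal(0, 1, size=t.size)

# Least-squares fit of y = a + b*t; polyfit returns the highest power first.
b_hat, a_hat = np.polyfit(t, y, deg=1)

# Detrend by subtracting the fitted line; what remains fluctuates around zero.
detrended = y - (a_hat + b_hat * t)

print(round(b_hat, 2))
```

Subtracting a fitted line only helps when the trend really is close to linear; for curved trends, differencing or a more flexible fit may be a better choice.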

Seasonality

Story for Kids

Think about ice cream sales. In summer, everyone wants ice cream, but in winter, not so much. This happens every year, like a birthday or a holiday. That's seasonality—a pattern that repeats over and over.

Simple Tip: If something happens again and again at the same time (like every summer), that's seasonality.

Seasonality refers to regular, periodic fluctuations in a time series, such as higher ice cream sales in summer or increased electricity usage in winter.

Figure: Seasonal Pattern

Real-World Analogy

Like the changing seasons, some data rises and falls in a predictable pattern every year, month, or week.

Mathematical Intuition

Seasonality can be modeled as: y_t = a + bt + S_t, where S_t is the seasonal component, which repeats with a fixed period m (S_t = S_{t-m}).

Practical Tip

Use seasonal decomposition to separate and analyze seasonal effects before forecasting.

Example: Seasonal Decomposition

from statsmodels.tsa.seasonal import seasonal_decompose
# period=12 assumes monthly data with a yearly cycle; adjust it to your data's frequency
result = seasonal_decompose(data['value'], model='additive', period=12)
result.plot()
Further Reading: Seasonality

Stationarity

Story for Kids

Imagine a bouncing ball that always bounces to the same height, no matter when you throw it. The ball's bounces don't get bigger or smaller over time. That's stationarity—things stay the same, on average.

Simple Tip: If your numbers don't change their "average" or "spread" over time, they're stationary.

A stationary time series has statistical properties (mean, variance, autocorrelation) that do not change over time. Many forecasting models assume stationarity.

Figure: Stationary Series

Real-World Analogy

Imagine a heart rate monitor: the line bounces up and down, but the average stays the same over time.

Mathematical Intuition

A (weakly) stationary process has constant mean and variance, E[y_t] = μ and Var[y_t] = σ², and an autocovariance Cov(y_t, y_{t-k}) that depends only on the lag k, not on t.

Practical Tip

If your data is not stationary, try differencing or detrending before modeling with ARIMA-type models.

Example: Augmented Dickey-Fuller Test

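from statsmodels.tsa.stattools import adfuller
# Null hypothesis: the series has a unit root (is non-stationary); drop NaNs before testing
result = adfuller(data['value'].dropna())
print('ADF Statistic:', result[0])
print('p-value:', result[1])  # a small p-value (e.g. < 0.05) is evidence against a unit root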
Further Reading: Stationarity

Noise

Story for Kids

You're listening to your favorite song, but there's static on the radio. The static is noise—it's not part of the song, and it makes it harder to hear the music.

Simple Tip: Noise is the "extra stuff" that makes it hard to see the real pattern.

Noise is the random variation in a time series that cannot be explained by trend or seasonality. Reducing noise can help improve model accuracy.

Figure: Random Noise

Real-World Analogy

Noise is like static on a radio: unpredictable and not part of the underlying signal.

Mathematical Intuition

Noise is often modeled as a random variable with mean zero: ε_t ~ N(0, σ²).

Practical Tip

Smoothing techniques (like moving averages) can help reduce noise before modeling.

Example: Smoothing

data['smoothed'] = data['value'].rolling(window=3).mean()
Further Reading: Noise

Autocorrelation & Partial Autocorrelation

Story for Kids

If you did well on your spelling test last week, you might do well this week too, because you studied hard. That's autocorrelation—when what happened before helps predict what happens next.

Simple Tip: If today's number is a lot like yesterday's, that's autocorrelation.

Autocorrelation measures the correlation of a time series with its own past values. Partial autocorrelation measures the correlation at a specific lag, controlling for shorter lags.

Figure: ACF Plot

Real-World Analogy

Autocorrelation is like an echo: how much does the past "echo" into the present?

Mathematical Intuition

The autocorrelation at lag k: Corr(y_t, y_{t-k}).

Practical Tip

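Use ACF and PACF plots to identify AR and MA terms for ARIMA models: a sharp cutoff in the PACF after lag p suggests an AR(p) term, and a sharp cutoff in the ACF after lag q suggests an MA(q) term.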

Example: Plotting ACF and PACF

from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
plot_acf(data['value'])
plot_pacf(data['value'])
Further Reading: Autocorrelation
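To make Corr(y_t, y_{t-k}) concrete, the sample autocorrelation can be computed by hand. This is a minimal sketch on a synthetic sine wave with an assumed period of 12; the helper name autocorr is hypothetical.

```python
import numpy as np

def autocorr(y, k):
    """Sample autocorrelation of y at lag k (k >= 1)."""
    y = np.asarray(y, dtype=float)
    d = y - y.mean()
    # Covariance between the series and its k-step-shifted copy,
    # divided by the overall variance.
    return float(np.sum(d[k:] * d[:-k]) / np.sum(d * d))

# Synthetic series that repeats every 12 steps (assumed for illustration).
t = np.arange(240)
y = np.sin(2 * np.pi * t / 12)

r6 = autocorr(y, 6)    # half a cycle apart: strongly negative
r12 = autocorr(y, 12)  # a full cycle apart: strongly positive

print(round(r6, 3), round(r12, 3))
```

A seasonal series shows this signature in an ACF plot: peaks at multiples of the period and troughs halfway between them.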

Lag, Windowing, and Feature Engineering

Story for Kids

If you want to guess how many candies you'll eat today, you might look at how many you ate yesterday, or the day before. That's using "lag." If you look at the average over the last 3 days, that's a "window."

Simple Tip: Looking at the past helps you guess the future!

Lag features use previous time steps as input for forecasting. Windowing aggregates values over a window. Feature engineering creates new variables to improve model performance.

Figure: timeline of time steps t-3, t-2, t-1, t

Real-World Analogy

Lag is like remembering what happened yesterday to predict today.

Mathematical Intuition

Lag features: y_{t-1}, y_{t-2}, ...

Practical Tip

Create lag and rolling window features to help models learn temporal dependencies.

Example: Creating Lag Features

data['lag_1'] = data['value'].shift(1)
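# Shift first so the window covers only past values (t-3..t-1) and does not leak the current value
data['rolling_mean_3'] = data['value'].shift(1).rolling(window=3).mean()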
Further Reading: Feature Engineering
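Lag and window features are usually assembled into one table in which every row holds only information available before the target was observed. The sketch below assumes a tiny made-up series; the column names lag_2, mean_prev_3, and target are hypothetical.

```python
import pandas as pd

# Made-up series for illustration.
data = pd.DataFrame({"value": [5.0, 7.0, 6.0, 8.0, 9.0, 7.0]})

features = pd.DataFrame(
    {
        "lag_1": data["value"].shift(1),   # y_{t-1}
        "lag_2": data["value"].shift(2),   # y_{t-2}
        # Mean of the three values before t; shifting first keeps y_t out of the window.
        "mean_prev_3": data["value"].shift(1).rolling(window=3).mean(),
        "target": data["value"],           # y_t, the value to predict
    }
).dropna()  # the first rows lack enough history and are dropped

print(features)
```

Dropping the incomplete rows costs a few observations at the start of the series, which is the usual price of using lagged inputs.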

Evaluation Metrics

Story for Kids

Imagine you guess how many candies you'll eat each day, and then check how close your guess was. If you're usually close, you're good at guessing! That's what metrics do—they tell you how good your guesses (forecasts) are.

Simple Tip: Metrics are your "score" for how well you predicted.

Common metrics for time series forecasting include MAE (Mean Absolute Error), RMSE (Root Mean Squared Error), and MAPE (Mean Absolute Percentage Error).

Figure: Forecast vs Actual

Real-World Analogy

Metrics are like a report card for your model: they tell you how close your predictions are to reality.

Mathematical Intuition

MAE: mean(|y - ŷ|), RMSE: sqrt(mean((y - ŷ)²)), MAPE: 100 × mean(|(y - ŷ) / y|), which is undefined when any actual value y is zero.

Practical Tip

Compare multiple metrics to get a full picture of model performance.

Example: Calculating MAE and RMSE

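from sklearn.metrics import mean_absolute_error, mean_squared_error
mae = mean_absolute_error(y_true, y_pred)
# Take the square root explicitly; the squared=False argument was deprecated and later removed in newer scikit-learn releases
rmse = mean_squared_error(y_true, y_pred) ** 0.5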
Further Reading: Forecast Accuracy
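The three metrics named in this section (MAE, RMSE, MAPE) can also be computed directly with NumPy, which makes their definitions explicit. The numbers below are invented for illustration.

```python
import numpy as np

# Invented actuals and forecasts.
y_true = np.array([100.0, 200.0, 400.0])
y_pred = np.array([110.0, 180.0, 400.0])

errors = y_true - y_pred

mae = float(np.mean(np.abs(errors)))                   # average absolute miss
rmse = float(np.sqrt(np.mean(errors ** 2)))            # penalizes large misses more
mape = float(np.mean(np.abs(errors / y_true)) * 100)   # average miss in percent

print(mae, round(rmse, 3), round(mape, 3))
```

Here the RMSE is larger than the MAE because the single 20-unit miss is weighted more heavily once squared. MAPE divides by the actual values, so it should be avoided when they can be zero or close to zero.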

Train/Test Splitting for Time Series

Story for Kids

You practice math problems all week (training), then take a test on Friday (testing). You use what you learned to do well on the test.

Simple Tip: Practice on old data, test on new data!

Unlike random splits in standard ML, time series splits must respect temporal order. Use the earliest data for training and the most recent for testing.

Figure: Train | Test split along the time axis

Real-World Analogy

Like learning from the past to predict the future: you train on history, then test on what comes next.

Mathematical Intuition

Split data chronologically: train = data[:-n], test = data[-n:]

Practical Tip

Never use future data to predict the past! Always split by time, not randomly.

Example: Time-based Split

train = data.iloc[:-12]
test = data.iloc[-12:]
Further Reading: Data Partitioning

Data Preprocessing

Story for Kids

Before you bake a cake, you wash your hands, measure the flour, and get everything ready. That's preprocessing—getting your data ready before you use it.

Simple Tip: Clean and prepare your data before you use it!

Preprocessing steps include handling missing values, scaling, and transforming data to improve model performance.

Figure: Raw → Fill → Scale → Ready

Real-World Analogy

Preprocessing is like cleaning and prepping ingredients before cooking a meal.

Mathematical Intuition

Fill missing: data.ffill(), Scale: (x - μ) / σ

Practical Tip

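Always preprocess your data before modeling, and compute preprocessing parameters (such as the mean and standard deviation used for scaling) from the training data only, so that no information from the test period leaks into the model.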

Example: Filling Missing Values

data['value'] = data['value'].ffill()
Further Reading: Data Preprocessing
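Filling and scaling can be chained as in the sketch below. The series is made up, and the choice to take μ and σ from the training portion only is an assumption about good practice for time series rather than something the text prescribes.

```python
import numpy as np
import pandas as pd

# Made-up series with two missing entries.
s = pd.Series([1.0, np.nan, 3.0, 5.0, np.nan, 7.0, 9.0, 11.0])

# Forward fill: each gap takes the last observed value.
filled = s.ffill()

# Chronological split: the first six points train, the last two test.
train, test = filled.iloc[:6], filled.iloc[6:]

# Standardize with statistics from the training portion only.
mu = train.mean()
sigma = train.std(ddof=0)
train_scaled = (train - mu) / sigma
test_scaled = (test - mu) / sigma

print(filled.tolist())
```

Forward fill cannot repair a gap at the very start of a series, since there is no earlier value to carry forward.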

White Noise

Story for Kids

Imagine you're listening to a radio that's not tuned to any station. All you hear is "shhhhhh"—that's white noise! It's just random sound, with no pattern at all.

Simple Tip: White noise is like a bag of totally random numbers—no way to guess what comes next!

White noise in time series is a sequence of random values with a constant mean and variance, and no autocorrelation. It is used as a baseline for randomness in data.
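A quick way to see the "no autocorrelation" property is to simulate white noise and measure its lag-1 autocorrelation. This is a sketch with an assumed Gaussian distribution and an arbitrary seed; white noise does not have to be Gaussian.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000

# Independent draws with mean 0 and variance 1 (Gaussian by assumption).
eps = rng.normal(0.0, 1.0, size=n)

# Lag-1 sample autocorrelation.
d = eps - eps.mean()
r1 = float(np.sum(d[1:] * d[:-1]) / np.sum(d * d))

# Rough 95% band for the autocorrelation of white noise.
band = 1.96 / np.sqrt(n)

print(round(r1, 3), round(band, 3))
```

When the residuals of a fitted model have autocorrelations that mostly stay inside this band, the model has likely captured the predictable structure in the data.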

Random Walk

Story for Kids

Pretend you're flipping a coin to decide if you take a step forward or backward. Heads, you go forward; tails, you go back. You never know where you'll end up! That's a random walk.

Simple Tip: A random walk is when each new position is the last position plus a totally random step.

A random walk is a time series where each value is the previous value plus a random step. It is non-stationary and unpredictable in the long run.
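The coin-flip story corresponds to a cumulative sum of random steps. The sketch below assumes steps of +1 or -1 with equal probability and shows that taking differences of the walk gives back the steps.

```python
import numpy as np

rng = np.random.default_rng(1)

# Coin flips: +1 for a step forward, -1 for a step back.
steps = rng.choice([-1, 1], size=1000)

# Each position is the previous position plus the new step.
walk = np.cumsum(steps)

# Differencing the walk recovers the original random steps.
recovered = np.diff(walk, prepend=0)

print(walk[:5], recovered[:5])
```

This is why a random walk is non-stationary while its first difference is stationary: the steps are white noise, but their running total can drift arbitrarily far.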

Differencing

Story for Kids

If you want to know how much you grew each year, you subtract your height last year from your height this year. That's differencing—looking at the change, not the value.

Simple Tip: Differencing helps you see changes, not just levels, and can turn a trend into a flat line.

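Differencing is a technique to make a time series stationary by subtracting a previous value from the current value. First-order differencing (y_t - y_{t-1}) removes a linear trend, and seasonal differencing (y_t - y_{t-m}, with m the seasonal period) removes a repeating seasonal pattern.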
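The effect of differencing can be checked on a synthetic series with a known trend and a known repeating pattern. The slope of 2, the 12-step pattern, and the series length are assumptions for illustration.

```python
import numpy as np
import pandas as pd

t = np.arange(48)

# A pattern that repeats exactly every 12 steps, laid on top of a linear trend.
seasonal = np.tile([0, 1, 3, 6, 8, 9, 9, 8, 6, 3, 1, 0], 4)
y = pd.Series(5 + 2 * t + seasonal, dtype=float)

# First difference y_t - y_{t-1}: the trend becomes a constant,
# but the seasonal ups and downs are still visible.
first_diff = y.diff().dropna()

# Seasonal difference y_t - y_{t-12}: the repeating pattern cancels,
# leaving only the trend's growth over 12 steps (2 * 12 = 24).
seasonal_diff = y.diff(12).dropna()

print(first_diff.head(3).tolist(), seasonal_diff.unique().tolist())
```

On real data the result is never exactly constant, but the same logic guides the choice between ordinary and seasonal differencing.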

Forecast Horizon

Story for Kids

If you want to know what you'll get for lunch tomorrow, that's a short forecast. If you want to know what you'll get for lunch next year, that's a long forecast!

Simple Tip: Forecast horizon is how far into the future you're trying to predict.

The forecast horizon is the length of time into the future for which predictions are made. Short, medium, and long horizons require different modeling strategies.

Backtesting

Story for Kids

It's like practicing for a spelling bee by testing yourself on last year's words to see how well you would have done.

Simple Tip: Backtesting means checking your predictions on old data to see if your method works.

Backtesting is the process of testing a forecasting model on historical data to evaluate its performance before using it for future predictions.
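One common way to backtest is a walk-forward loop: at each step the model sees only the past, forecasts one step ahead, and is then scored against the value that actually occurred. The function names and the naive "repeat the last value" forecaster below are hypothetical choices for this sketch.

```python
import numpy as np

def walk_forward_mae(y, initial_train, forecast_fn):
    """One-step-ahead walk-forward backtest; returns the mean absolute error."""
    y = np.asarray(y, dtype=float)
    abs_errors = []
    for t in range(initial_train, len(y)):
        history = y[:t]              # only data before time t is visible
        pred = forecast_fn(history)  # forecast for time t
        abs_errors.append(abs(y[t] - pred))
    return float(np.mean(abs_errors))

def naive_forecast(history):
    """Predict that the next value equals the most recent one."""
    return history[-1]

y = [10, 12, 11, 13, 12, 14, 13, 15]
mae = walk_forward_mae(y, initial_train=4, forecast_fn=naive_forecast)

print(mae)
```

Any forecaster that maps a history array to a single number can be passed as forecast_fn, so different methods can be scored on exactly the same folds.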

Overfitting

Story for Kids

If you memorize every question on last year's test, you might do great on that test, but not so well on a new one. That's overfitting—learning too much from the past and not being ready for the future.

Simple Tip: Overfitting is when your model is too good at remembering the past, but not good at guessing the future.

Overfitting happens when a model learns the noise in the training data instead of the true pattern, resulting in poor predictions on new data.

Underfitting

Story for Kids

If you don't study at all, you won't do well on any test. That's underfitting—your model doesn't learn enough from the data.

Simple Tip: Underfitting is when your model is too simple and misses important patterns.

Underfitting occurs when a model is too simple to capture the underlying pattern in the data, resulting in poor performance on both training and new data.

Seasonal Decomposition

Story for Kids

It's like taking apart a toy to see all the pieces—trend, seasonality, and noise. You can see what each part does!

Simple Tip: Decomposition helps you see the different parts of your data: trend, seasonality, and noise.

Seasonal decomposition separates a time series into trend, seasonal, and residual (noise) components to better understand and model the data.

Smoothing

Story for Kids

If you draw a wiggly line and then try to draw a smoother line that follows the same path, you're smoothing out the bumps.

Simple Tip: Smoothing helps you see the big picture by ignoring little ups and downs.

Smoothing techniques, like moving averages, help reduce noise and reveal the underlying trend in a time series.

Exponential Smoothing

Story for Kids

Imagine you remember what happened yesterday, but you remember last week a little less. Exponential smoothing gives more importance to recent days.

Simple Tip: Exponential smoothing is a way to make predictions by focusing more on recent data.

Exponential smoothing assigns exponentially decreasing weights to older observations, making recent data more influential in forecasting.
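Simple exponential smoothing can be written as a one-line recursion, s_t = α·y_t + (1 - α)·s_{t-1}. The sketch below assumes α = 0.5, initializes with the first observation, and cross-checks the result against pandas' ewm with adjust=False, which uses the same recursion.

```python
import numpy as np
import pandas as pd

def simple_exp_smoothing(y, alpha):
    """s_0 = y_0; s_t = alpha * y_t + (1 - alpha) * s_{t-1}."""
    s = [float(y[0])]
    for value in y[1:]:
        s.append(alpha * value + (1 - alpha) * s[-1])
    return np.array(s)

# Made-up observations.
y = np.array([10.0, 12.0, 11.0, 15.0, 14.0])

smoothed = simple_exp_smoothing(y, alpha=0.5)

# Same recursion via pandas.
pandas_smoothed = pd.Series(y).ewm(alpha=0.5, adjust=False).mean().to_numpy()

print(smoothed.tolist())
```

A larger α reacts faster to recent changes but smooths less; a smaller α gives a steadier line that lags behind sudden shifts.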

Rolling Window

Story for Kids

It's like looking at your last 5 test scores to see how you're doing, instead of just the most recent one.

Simple Tip: A rolling window looks at a set number of past points to calculate averages or other stats.

A rolling window calculates statistics (like mean or sum) over a fixed number of past observations, moving forward one step at a time.

Drift

Story for Kids

If you're walking and you slowly start to go off the path, that's drift. In data, it means the average is slowly changing over time.

Simple Tip: Drift is a slow, steady change in your data's average.

Drift refers to a gradual change in the mean level of a time series, often due to external factors or long-term trends.

Structural Break

Story for Kids

If you're riding your bike and suddenly hit a bump, your path changes. In data, a structural break is a sudden change in the pattern.

Simple Tip: A structural break is when something big changes in your data, like a new rule or event.

A structural break is a sudden, significant change in the pattern or behavior of a time series, often caused by external events.

Cyclical Patterns

Story for Kids

Some things go up and down, but not on a regular schedule—like how you might get more homework some months than others.

Simple Tip: Cyclical patterns are like seasonality, but not on a fixed schedule.

Cyclical patterns are long-term fluctuations in a time series that are not of fixed period, often related to economic or business cycles.

Lead and Lag

Story for Kids

If your friend always tells a joke and you laugh a minute later, your laugh "lags" behind the joke. If you laugh before the joke, that's a "lead"!

Simple Tip: Lag is when something happens after, lead is when something happens before.

Lead and lag refer to the relationship between two time series, where one series may move before (lead) or after (lag) the other.

Seasonal Adjustment

Story for Kids

Imagine you want to know if you're really getting better at basketball, but you always play more in summer. To see your real progress, you have to "adjust" for the summer effect.

Simple Tip: Seasonal adjustment removes repeating patterns so you can see the real trend.

Seasonal adjustment is the process of removing seasonal effects from a time series to better analyze trends and cycles.

Cointegration

Story for Kids

If you and your best friend always walk together, even if you wander a little, you never get too far apart. That's cointegration—two things that move together over time.

Simple Tip: Cointegration means two series are linked, even if they wander.

Cointegration occurs when two or more non-stationary series move together in such a way that their combination is stationary.

Granger Causality

Story for Kids

If your dog barks before the mail comes, you might think the barking "causes" the mail to arrive! Granger causality checks if one thing helps predict another.

Simple Tip: If knowing the past of one thing helps predict another, that's Granger causality.

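Granger causality tests whether past values of one time series improve the prediction of another series beyond what that second series' own past values already provide. It measures predictive usefulness, not true cause and effect.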

Impulse Response

Story for Kids

If you drop a pebble in a pond, you see ripples. An impulse response shows how a sudden change affects things over time.

Simple Tip: Impulse response is how a sudden event changes the future.

Impulse response measures the effect of a one-time shock to one variable on the future values of a time series system.

Forecast Interval / Prediction Interval

Story for Kids

If you guess you'll get 8 out of 10 on your test, but you're not sure, you might say, "I'll probably get between 7 and 9." That's a prediction interval.

Simple Tip: A forecast interval gives a range where the future value is likely to fall.

A forecast interval is a range of values within which a future observation is expected to fall, with a certain probability.
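A rough prediction interval can be built from the spread of past forecast errors. The sketch below assumes the errors are approximately normal with mean zero, so that about 95% of outcomes fall within 1.96 standard deviations; the error values and the point forecast are invented.

```python
import numpy as np

# Invented one-step-ahead forecast errors from earlier predictions.
past_errors = np.array([-1.0, 0.5, 1.5, -0.5, -1.5, 1.0])

point_forecast = 8.0

# Sample standard deviation of the errors.
sigma = float(past_errors.std(ddof=1))

# 1.96 is the normal quantile for a two-sided 95% interval (normality is assumed).
z = 1.96
lower = point_forecast - z * sigma
upper = point_forecast + z * sigma

print(round(lower, 2), round(upper, 2))
```

Intervals usually widen as the forecast horizon grows, which this one-step sketch does not capture.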

Autoregressive Model (AR)

Story for Kids

If you always get a score close to what you got last time, your next score depends on your last one. That's an autoregressive model.

Simple Tip: AR models use the past to predict the future.

An AR model predicts future values using a linear combination of past values.
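The simplest case is AR(1), y_t = φ·y_{t-1} + ε_t. The sketch below simulates such a series with an assumed φ = 0.7 and recovers the coefficient by least squares, regressing each value on the one before it.

```python
import numpy as np

rng = np.random.default_rng(0)
phi = 0.7   # assumed true coefficient
n = 5000

# Simulate y_t = phi * y_{t-1} + noise.
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + rng.normal()

# Least-squares slope (no intercept) of y_t on y_{t-1}.
prev_values, next_values = y[:-1], y[1:]
phi_hat = float(np.dot(prev_values, next_values) / np.dot(prev_values, prev_values))

# One-step-ahead forecast from the last observation.
forecast = phi_hat * y[-1]

print(round(phi_hat, 2))
```

Dedicated libraries estimate higher-order AR models with intercepts and diagnostics; this sketch only shows the core idea of predicting from past values.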

Moving Average Model (MA)

Story for Kids

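If you guess your next test score and then see how far off you were, you can use those past misses to correct your next guess. That's the idea behind a moving average model. (Despite the name, it is not the same as simply averaging your last few scores.)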

Simple Tip: MA models use past errors (mistakes) to predict the future.

An MA model predicts future values using past forecast errors.

ARMA, ARIMA, SARIMA

Story for Kids

Sometimes you need both your past scores and your past mistakes to guess your next score. That's ARMA. If you also need to "fix" a trend, that's ARIMA. If you need to handle seasons, that's SARIMA.

Simple Tip: ARMA, ARIMA, and SARIMA are models that use past values, errors, and adjustments for trends and seasons.

ARMA combines autoregressive and moving average models. ARIMA adds differencing for trends. SARIMA adds seasonal components.

State Space Models

Story for Kids

Imagine you're trying to guess where a hidden mouse is in a maze, using clues from the sounds it makes. State space models help you guess hidden things from what you can see.

Simple Tip: State space models track hidden states using what you observe.

State space models represent a time series through hidden (latent) state variables that evolve over time and observed variables that depend on them. The hidden states are often estimated with a Kalman filter.

Kalman Filter

Story for Kids

If you're following a friend in a crowd and sometimes lose sight of them, you use clues to guess where they are. The Kalman filter helps you keep track, even with noisy clues.

Simple Tip: Kalman filters help you estimate things you can't see directly, using noisy data.

A Kalman filter is an algorithm that estimates the state of a dynamic system from a series of incomplete and noisy measurements.

Fourier Analysis

Story for Kids

If you hear a song, you can break it into notes. Fourier analysis breaks a time series into cycles, like finding the "notes" in your data.

Simple Tip: Fourier analysis finds cycles and patterns in your data.

Fourier analysis decomposes a time series into sinusoidal components of different frequencies.

Spectral Density

Story for Kids

It's like seeing which notes are loudest in a song. Spectral density shows which cycles are strongest in your data.

Simple Tip: Spectral density tells you which cycles are most important.

Spectral density measures the strength of different frequency components in a time series.
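Both ideas can be tried with NumPy's FFT: the transform splits the series into frequencies, and the squared magnitude at each frequency (the periodogram) is a rough estimate of the spectral density. The two sine waves below, with assumed periods of 12 and 30 steps, are synthetic.

```python
import numpy as np

n = 240
t = np.arange(n)

# Synthetic series: a strong 12-step cycle plus a weaker 30-step cycle.
y = 3 * np.sin(2 * np.pi * t / 12) + 1 * np.sin(2 * np.pi * t / 30)

# Periodogram: squared magnitude of the Fourier coefficients.
spectrum = np.abs(np.fft.rfft(y)) ** 2
freqs = np.fft.rfftfreq(n, d=1.0)   # cycles per time step

# Skip index 0 (the mean) and find the strongest frequency.
k = int(np.argmax(spectrum[1:]) + 1)
dominant_period = 1 / freqs[k]

print(round(dominant_period, 2))
```

With real data the peaks are blurred by noise and by cycles that do not fit the sample length exactly, so smoothed spectral estimates are often preferred.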

Volatility

Story for Kids

If your scores jump up and down a lot, that's high volatility. If they're always about the same, that's low volatility.

Simple Tip: Volatility is how much your data jumps around.

Volatility is the degree of variation in a time series, often measured by the standard deviation.

Heteroskedasticity

Story for Kids

If your test scores are sometimes close together and sometimes spread out, that's heteroskedasticity.

Simple Tip: Heteroskedasticity means the "spread" of your data changes over time.

Heteroskedasticity is when the variance of a time series changes over time.

Unit Root

Story for Kids

If you keep walking and never come back to where you started, you have a unit root! It means your path can wander forever.

Simple Tip: A unit root means your data can drift away and not return to a fixed level.

A unit root in a time series means the series is non-stationary and can wander without bound. Testing for unit roots helps decide if differencing is needed.

Spurious Regression

Story for Kids

If you notice that ice cream sales and sunburns both go up in summer, you might think eating ice cream causes sunburn! But really, it's just the season. That's a spurious relationship.

Simple Tip: Spurious regression is when two things look related, but really aren't.

Spurious regression happens when two unrelated non-stationary series appear to be related due to trends or other factors.

GARCH

Story for Kids

If some days are very noisy and some days are quiet, GARCH helps you predict when the noise will get bigger or smaller.

Simple Tip: GARCH models changing ups and downs (volatility) over time.

GARCH (Generalized Autoregressive Conditional Heteroskedasticity) models time-varying volatility by letting the current variance depend on both past squared errors and past variances. It is often used in finance.

ARCH

Story for Kids

If you notice that big jumps in your scores are followed by more big jumps, that's ARCH.

Simple Tip: ARCH models how big changes can cluster together.

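ARCH (Autoregressive Conditional Heteroskedasticity) models the variance of the current error as a function of past squared errors, which captures the clustering of high-volatility and low-volatility periods in time series data.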

Multicollinearity

Story for Kids

If you ask your two best friends for advice and they always say the same thing, it's hard to know who's really helping. That's multicollinearity.

Simple Tip: Multicollinearity is when predictors are too similar to each other.

Multicollinearity occurs when two or more predictors in a model are highly correlated, making it hard to separate their effects.

Cross-Correlation

Story for Kids

If you and your friend both clap at the same time, or one after the other, cross-correlation measures how your claps match up.

Simple Tip: Cross-correlation shows how two series move together, even with a delay.

Cross-correlation measures the similarity between two time series as one is shifted in time relative to the other.
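A simple use of cross-correlation is finding the delay at which one series best matches another. In this sketch the second series is a noisy copy of the first delayed by 4 steps (an assumed value), and the helper name best_lag is hypothetical.

```python
import numpy as np

def best_lag(x, y, max_lag):
    """Return the lag k in [0, max_lag] that maximizes Corr(x_t, y_{t+k}), plus all scores."""
    scores = {}
    for k in range(max_lag + 1):
        a = x[: len(x) - k]   # x_t
        b = y[k:]             # y_{t+k}
        scores[k] = float(np.corrcoef(a, b)[0, 1])
    return max(scores, key=scores.get), scores

rng = np.random.default_rng(3)
x = rng.normal(size=500)

# y follows x by 4 steps, with a little extra noise.
y = np.roll(x, 4) + 0.1 * rng.normal(size=500)

lag, scores = best_lag(x, y, max_lag=10)

print(lag, round(scores[lag], 2))
```

A clear peak at a positive lag suggests that x leads y by that many steps, although, as with any correlation, it does not prove that x drives y.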

Transfer Function Models

Story for Kids

If you turn a faucet, the water doesn't come out instantly—it takes a moment. Transfer function models show how one thing affects another over time.

Simple Tip: Transfer function models show how changes in one series affect another, possibly with a delay.

Transfer function models describe the relationship between an input and output time series, including delays and dynamic effects.

Intervention Analysis

Story for Kids

If your school changes the lunch menu, you might see a sudden change in how many kids buy lunch. That's an intervention!

Simple Tip: Intervention analysis studies the effect of big events or changes on your data.

Intervention analysis examines how a specific event or change affects a time series.

Regime Switching

Story for Kids

If you're sometimes happy and sometimes sad, and you switch between these moods, that's like regime switching in data.

Simple Tip: Regime switching means your data can change between different patterns or rules.

Regime switching models allow a time series to switch between different states or behaviors, such as high and low volatility.

Nonlinear Time Series

Story for Kids

If your path home from school has lots of twists and turns, it's not a straight line. That's nonlinear!

Simple Tip: Nonlinear time series have patterns that aren't straight or simple.

Nonlinear time series models capture relationships that a fixed linear combination of past values cannot describe, such as threshold effects or behavior that changes with the level of the series.

Long Memory Processes

Story for Kids

If you remember things that happened a long time ago, not just yesterday, your memory is long. Some data "remembers" the distant past too!

Simple Tip: Long memory means the past keeps affecting the present for a long time.

Long memory processes have correlations that decay slowly, so distant past values still influence the present.

Panel Data

Story for Kids

If you keep a diary for yourself and all your friends, you have panel data—lots of time series for different people!

Simple Tip: Panel data is many time series for different things or people.

Panel data combines time series for multiple subjects, like people, companies, or countries.

Nowcasting

Story for Kids

If you want to know what's happening right now, not just in the future, you're nowcasting!

Simple Tip: Nowcasting is making predictions about the present, not just the future.

Nowcasting uses the latest data to estimate what's happening right now, often before official numbers are available.

Forecast Combination

Story for Kids

If you ask all your friends to guess how many candies are in a jar, and then average their guesses, you're combining forecasts!

Simple Tip: Forecast combination means using several predictions together for a better answer.

Forecast combination takes predictions from multiple models and combines them, often improving accuracy.
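The simplest combination is an equal-weight average. In the invented example below, one model runs consistently high and the other consistently low, so their errors partly cancel when averaged.

```python
import numpy as np

# Invented actuals and two sets of forecasts.
y_true = np.array([10.0, 12.0, 14.0, 16.0])
model_a = np.array([11.0, 13.0, 15.0, 17.0])   # always 1 too high
model_b = np.array([8.0, 10.0, 12.0, 14.0])    # always 2 too low

# Equal-weight combination.
combined = (model_a + model_b) / 2

def mae(pred):
    """Mean absolute error against y_true."""
    return float(np.mean(np.abs(y_true - pred)))

print(mae(model_a), mae(model_b), mae(combined))
```

Averaging helps most when the models make different kinds of mistakes; if both err in the same direction, the combination lands between them rather than beating the better one.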

Benchmarking

Story for Kids

If you want to know if you're running fast, you compare your time to your best friend's. That's benchmarking!

Simple Tip: Benchmarking means comparing your results to a standard or another method.

Benchmarking compares the performance of models or forecasts to a standard or baseline method.