Time Series Concepts
Featured Dataset: Pollution-Data-17-indian-cites
Types of Time Series Data
Story for Kids
Imagine you're keeping a diary:
- Univariate: Every day, you write down the temperature. That's one thing, over time.
- Multivariate: Every day, you write down the temperature, how much you played outside, and what you ate for lunch. That's many things, over time.
- Categorical: Some days you write "holiday" or "school day" in your diary. These are special labels, not numbers.
- Numerical: Numbers you write, like your test scores or how many candies you ate.
- Exogenous Variables: Maybe you write about the weather, and you notice you play outside more when it's sunny. The weather is an "extra" thing that affects your playtime.
- Missing Data: Oops! You forgot to write in your diary one day. That's a missing entry.
- Outliers: One day, you ate 100 candies at a party! That's way more than usual—a special, unusual day.
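The diary vocabulary above can be sketched with a tiny pandas DataFrame. The column names (`temperature`, `candies`, `day_type`) and all values are made up for illustration, and the outlier rule (more than three times the median) is just one simple choice, not a standard.

```python
import numpy as np
import pandas as pd

# A tiny, made-up "diary": one row per day.
diary = pd.DataFrame(
    {
        "temperature": [21.0, 22.5, np.nan, 23.0, 22.0],  # numerical, one missing entry
        "candies": [2, 3, 2, 100, 3],                      # numerical, one outlier
        "day_type": ["school", "school", "holiday", "holiday", "school"],  # categorical
    },
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)

univariate = diary["candies"]                      # one variable over time
multivariate = diary[["temperature", "candies"]]   # several variables over time

# Days where the temperature entry is missing.
missing_days = diary.index[diary["temperature"].isna()]

# Illustrative rule (an assumption): flag values above 3x the median.
outlier_days = diary.index[diary["candies"] > 3 * diary["candies"].median()]

print("Missing temperature on:", list(missing_days.strftime("%Y-%m-%d")))
print("Candy outlier on:", list(outlier_days.strftime("%Y-%m-%d")))
```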
Example Time Series Datasets
Dataset | Type | Description | Link |
---|---|---|---|
Air Passengers | Univariate | Monthly totals of international airline passengers (1949-1960). | Download |
Sunspots | Univariate | Monthly mean sunspot numbers (1749-present). | Download |
Electricity Load Diagrams | Multivariate | 15-min electricity consumption of 370 customers (2011-2014). | UCI |
Beijing PM2.5 | Multivariate | Hourly PM2.5 data with weather covariates (2010-2014). | UCI |
Rossmann Store Sales | Multivariate, Categorical | Daily sales data for 1,115 stores with promotions, holidays, etc. | Kaggle |
M4 Competition | Univariate, Various | 100,000+ time series from finance, economics, demographics, etc. | Official |
Exchange Rate | Univariate | Daily exchange rates for major currencies (1990-2016). | Download |
Household Power Consumption | Multivariate | Minute-averaged measurements of electric power usage (2006-2010). | UCI |
COVID-19 Global Cases | Multivariate | Daily confirmed, deaths, and recovered cases by country. | GitHub |
Retail Sales | Univariate | Monthly US retail sales (1992-present). | FRED |
See also: Awesome Public Datasets: Time Series
Trend
Story for Kids
Imagine you're climbing a hill. Some days you go up, some days you go down a little, but overall, you're getting higher and higher. That's a trend—like your height as you grow up, it usually goes up over time!
A trend is a long-term increase or decrease in the data. Trends can be linear (straight line), exponential (growing faster over time), or more complex. Recognizing a trend helps you understand the underlying direction of your data.
Real-World Analogy
Think of a trend as the overall direction of a river: even if the water ripples up and down, the river flows downhill (or uphill, in data!).
Mathematical Intuition
A linear trend can be modeled as: y = a + bt, where b is the slope.
Practical Tip
Always check for trends before modeling. If present, consider detrending your data for models that require stationarity.
Example: Rolling Mean
    data['rolling_mean'] = data['value'].rolling(window=12).mean()
    data[['value', 'rolling_mean']].plot()
Further Reading: Trend
Seasonality
Story for Kids
Think about ice cream sales. In summer, everyone wants ice cream, but in winter, not so much. This happens every year, like a birthday or a holiday. That's seasonality—a pattern that repeats over and over.
Seasonality refers to regular, periodic fluctuations in a time series, such as higher ice cream sales in summer or increased electricity usage in winter.
Real-World Analogy
Like the changing seasons, some data rises and falls in a predictable pattern every year, month, or week.
Mathematical Intuition
Seasonality can be modeled as: y = a + bt + S_t, where S_t is the seasonal component.
Practical Tip
Use seasonal decomposition to separate and analyze seasonal effects before forecasting.
Example: Seasonal Decomposition
    from statsmodels.tsa.seasonal import seasonal_decompose
    result = seasonal_decompose(data['value'], model='additive', period=12)
    result.plot()
Further Reading: Seasonality
Stationarity
Story for Kids
Imagine a bouncing ball that always bounces to the same height, no matter when you throw it. The ball's bounces don't get bigger or smaller over time. That's stationarity—things stay the same, on average.
A stationary time series has statistical properties (mean, variance, autocorrelation) that do not change over time. Many forecasting models assume stationarity.
Real-World Analogy
Imagine a heart rate monitor: the line bounces up and down, but the average stays the same over time.
Mathematical Intuition
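A (weakly) stationary process has a constant mean and variance, E[y_t] = μ and Var[y_t] = σ², and an autocovariance Cov(y_t, y_{t-k}) that depends only on the lag k, not on the time t.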
Practical Tip
If your data is not stationary, try differencing or detrending before modeling with ARIMA-type models.
Example: Augmented Dickey-Fuller Test
    from statsmodels.tsa.stattools import adfuller
    result = adfuller(data['value'])
    print('ADF Statistic:', result[0])
    print('p-value:', result[1])
Further Reading: Stationarity
Noise
Story for Kids
You're listening to your favorite song, but there's static on the radio. The static is noise—it's not part of the song, and it makes it harder to hear the music.
Noise is the random variation in a time series that cannot be explained by trend or seasonality. Reducing noise can help improve model accuracy.
Real-World Analogy
Noise is like static on a radio: unpredictable and not part of the underlying signal.
Mathematical Intuition
Noise is often modeled as a random variable with mean zero: ε_t ~ N(0, σ²).
Practical Tip
Smoothing techniques (like moving averages) can help reduce noise before modeling.
Example: Smoothing
    data['smoothed'] = data['value'].rolling(window=3).mean()
Further Reading: Noise
Autocorrelation & Partial Autocorrelation
Story for Kids
If you did well on your spelling test last week, you might do well this week too, because you studied hard. That's autocorrelation—when what happened before helps predict what happens next.
Autocorrelation measures the correlation of a time series with its own past values. Partial autocorrelation measures the correlation at a specific lag, controlling for shorter lags.
Real-World Analogy
Autocorrelation is like an echo: how much does the past "echo" into the present?
Mathematical Intuition
The autocorrelation at lag k: Corr(y_t, y_{t-k}).
Practical Tip
Use ACF and PACF plots to choose ARIMA orders: a PACF that cuts off after lag p suggests an AR(p) term, and an ACF that cuts off after lag q suggests an MA(q) term.
Example: Plotting ACF and PACF
    from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
    plot_acf(data['value'])
    plot_pacf(data['value'])
Further Reading: Autocorrelation
Lag, Windowing, and Feature Engineering
Story for Kids
If you want to guess how many candies you'll eat today, you might look at how many you ate yesterday, or the day before. That's using "lag." If you look at the average over the last 3 days, that's a "window."
Lag features use previous time steps as input for forecasting. Windowing aggregates values over a window. Feature engineering creates new variables to improve model performance.
Real-World Analogy
Lag is like remembering what happened yesterday to predict today.
Mathematical Intuition
Lag features: y_{t-1}, y_{t-2}, ...
Practical Tip
Create lag and rolling window features to help models learn temporal dependencies.
Example: Creating Lag Features
    data['lag_1'] = data['value'].shift(1)
    data['rolling_mean_3'] = data['value'].rolling(window=3).mean()
Further Reading: Feature Engineering
Evaluation Metrics
Story for Kids
Imagine you guess how many candies you'll eat each day, and then check how close your guess was. If you're usually close, you're good at guessing! That's what metrics do—they tell you how good your guesses (forecasts) are.
Common metrics for time series forecasting include MAE (Mean Absolute Error), RMSE (Root Mean Squared Error), and MAPE (Mean Absolute Percentage Error).
Real-World Analogy
Metrics are like a report card for your model: they tell you how close your predictions are to reality.
Mathematical Intuition
MAE: mean(|y - ŷ|), RMSE: sqrt(mean((y - ŷ)²)), MAPE: 100 × mean(|y - ŷ| / |y|), which is undefined when any actual value y is zero.
Practical Tip
Compare multiple metrics to get a full picture of model performance.
Example: Calculating MAE and RMSE
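    from sklearn.metrics import mean_absolute_error, mean_squared_error
    mae = mean_absolute_error(y_true, y_pred)
    # Take the square root of the MSE directly; the squared=False argument
    # is deprecated in recent scikit-learn releases.
    rmse = mean_squared_error(y_true, y_pred) ** 0.5
Further Reading: Forecast Accuracy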
Train/Test Splitting for Time Series
Story for Kids
You practice math problems all week (training), then take a test on Friday (testing). You use what you learned to do well on the test.
Unlike random splits in standard ML, time series splits must respect temporal order. Use the earliest data for training and the most recent for testing.
Real-World Analogy
Like learning from the past to predict the future: you train on history, then test on what comes next.
Mathematical Intuition
Split data chronologically: train = data[:-n], test = data[-n:]
Practical Tip
Never use future data to predict the past! Always split by time, not randomly.
Example: Time-based Split
    train = data.iloc[:-12]
    test = data.iloc[-12:]
Further Reading: Data Partitioning
Data Preprocessing
Story for Kids
Before you bake a cake, you wash your hands, measure the flour, and get everything ready. That's preprocessing—getting your data ready before you use it.
Preprocessing steps include handling missing values, scaling, and transforming data to improve model performance.
Real-World Analogy
Preprocessing is like cleaning and prepping ingredients before cooking a meal.
Mathematical Intuition
Fill missing: data.ffill() (carry the last observed value forward), Scale: (x - μ) / σ
Practical Tip
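Preprocess your data before modeling, and estimate any scaling parameters (μ, σ) on the training period only, so that information from the test period does not leak into the model.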
Example: Filling Missing Values
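    # Forward-fill: carry the last observed value into each gap.
    # (fillna(method='ffill') is deprecated in recent pandas releases.)
    data['value'] = data['value'].ffill()
Further Reading: Data Preprocessing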
White Noise
Story for Kids
Imagine you're listening to a radio that's not tuned to any station. All you hear is "shhhhhh"—that's white noise! It's just random sound, with no pattern at all.
White noise in time series is a sequence of random values with a constant mean and variance, and no autocorrelation. It is used as a baseline for randomness in data.
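As a rough illustration, white noise can be simulated with NumPy and checked against that definition. The Gaussian distribution, the seed, and the helper name `autocorr` are assumptions made for this sketch; white noise does not have to be Gaussian.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
white_noise = rng.normal(loc=0.0, scale=1.0, size=5000)

def autocorr(x, lag):
    """Sample autocorrelation of x at a positive lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

# For white noise we expect a mean near 0, a variance near 1 (here),
# and an autocorrelation near 0 at every non-zero lag.
print("mean:", white_noise.mean())
print("variance:", white_noise.var())
print("lag-1 autocorrelation:", autocorr(white_noise, 1))
```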
Random Walk
Story for Kids
Pretend you're flipping a coin to decide if you take a step forward or backward. Heads, you go forward; tails, you go back. You never know where you'll end up! That's a random walk.
A random walk is a time series where each value is the previous value plus a random step. It is non-stationary and unpredictable in the long run.
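The coin-flip story can be sketched in a few lines of NumPy. The step size of 1 and the seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Coin flip at every step: -1 (step back) or +1 (step forward).
steps = rng.choice([-1, 1], size=1000)

# Each position is the previous position plus the new step.
walk = np.cumsum(steps)

# Taking differences of the walk recovers the random steps.
recovered_steps = np.diff(walk)

print("final position:", walk[-1])
```

Because the differences of a random walk are just the random steps, differencing a random walk gives back a stationary series.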
Differencing
Story for Kids
If you want to know how much you grew each year, you subtract your height last year from your height this year. That's differencing—looking at the change, not the value.
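Differencing helps make a time series stationary by subtracting the previous value from the current value: y'_t = y_t - y_{t-1}. First differencing removes a linear trend; seasonal differencing, which subtracts the value from one season earlier (y_t - y_{t-m} for a season of length m), removes a stable seasonal pattern.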
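A minimal sketch with pandas, using made-up numbers: `diff()` subtracts the previous value, and `diff(4)` subtracts the value four steps earlier, which acts as seasonal differencing for a pattern that repeats every four steps.

```python
import pandas as pd

# Made-up heights (cm), measured once per year: a steady upward trend.
height = pd.Series([100, 106, 112, 118, 124], index=[2019, 2020, 2021, 2022, 2023])
growth = height.diff()  # y_t - y_{t-1}; the first value has no predecessor (NaN)
print(growth.dropna().tolist())  # [6.0, 6.0, 6.0, 6.0]

# Made-up series with a trend (+1 per step) and a pattern repeating every 4 steps.
pattern = [10, 20, 30, 20]
seasonal_series = pd.Series([t + pattern[t % 4] for t in range(12)])
seasonal_diff = seasonal_series.diff(4)  # y_t - y_{t-4}
print(seasonal_diff.dropna().tolist())   # eight values, all 4.0
```

In both cases the differenced series is constant: the trend and the repeating pattern are gone, and only the per-step change remains.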
Forecast Horizon
Story for Kids
If you want to know what you'll get for lunch tomorrow, that's a short forecast. If you want to know what you'll get for lunch next year, that's a long forecast!
The forecast horizon is the length of time into the future for which predictions are made. Short, medium, and long horizons require different modeling strategies.
Backtesting
Story for Kids
It's like practicing for a spelling bee by testing yourself on last year's words to see how well you would have done.
Backtesting is the process of testing a forecasting model on historical data to evaluate its performance before using it for future predictions.
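One common way to backtest is a rolling-origin evaluation: step through history, forecast each point using only the data before it, and average the errors. The sketch below uses a naive "last value" forecast as a stand-in model; the function name `backtest_naive` and the data are hypothetical.

```python
import numpy as np

def backtest_naive(series, initial_train_size):
    """Rolling-origin backtest of a one-step-ahead naive forecast.

    At each step the 'model' sees only series[:t] and predicts series[t]
    as the last observed value. Returns the mean absolute error.
    """
    series = np.asarray(series, dtype=float)
    errors = []
    for t in range(initial_train_size, len(series)):
        history = series[:t]      # past only, never the future
        forecast = history[-1]    # naive forecast: repeat the last value
        errors.append(abs(series[t] - forecast))
    return float(np.mean(errors))

values = [10, 12, 11, 13, 12, 14, 13, 15]  # made-up data
mae = backtest_naive(values, initial_train_size=4)
print("Backtest MAE:", mae)  # 1.5
```

Swapping the naive forecast for a real model keeps the same loop: refit (or update) on `history`, predict the next point, record the error.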
Overfitting
Story for Kids
If you memorize every question on last year's test, you might do great on that test, but not so well on a new one. That's overfitting—learning too much from the past and not being ready for the future.
Overfitting happens when a model learns the noise in the training data instead of the true pattern, resulting in poor predictions on new data.
Underfitting
Story for Kids
If you don't study at all, you won't do well on any test. That's underfitting—your model doesn't learn enough from the data.
Underfitting occurs when a model is too simple to capture the underlying pattern in the data, resulting in poor performance on both training and new data.
Seasonal Decomposition
Story for Kids
It's like taking apart a toy to see all the pieces—trend, seasonality, and noise. You can see what each part does!
Seasonal decomposition separates a time series into trend, seasonal, and residual (noise) components to better understand and model the data.
Smoothing
Story for Kids
If you draw a wiggly line and then try to draw a smoother line that follows the same path, you're smoothing out the bumps.
Smoothing techniques, like moving averages, help reduce noise and reveal the underlying trend in a time series.
Exponential Smoothing
Story for Kids
Imagine you remember what happened yesterday, but you remember last week a little less. Exponential smoothing gives more importance to recent days.
Exponential smoothing assigns exponentially decreasing weights to older observations, making recent data more influential in forecasting.
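Simple exponential smoothing follows the recursion s_t = α·y_t + (1 - α)·s_{t-1}. The sketch below writes it out by hand and compares it with pandas' `ewm(..., adjust=False)`, which applies the same recursion. Starting from s_0 = y_0 and using α = 0.5 are choices made for this example.

```python
import numpy as np
import pandas as pd

def simple_exp_smoothing(values, alpha):
    """Smoothed levels s_t = alpha * y_t + (1 - alpha) * s_{t-1}, with s_0 = y_0."""
    smoothed = [float(values[0])]
    for y in values[1:]:
        smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return np.array(smoothed)

values = [10.0, 12.0, 11.0, 15.0]  # made-up data
levels = simple_exp_smoothing(values, alpha=0.5)
print(levels)  # [10. 11. 11. 13.]

# pandas applies the same recursion when adjust=False.
pandas_levels = pd.Series(values).ewm(alpha=0.5, adjust=False).mean().to_numpy()
print(np.allclose(levels, pandas_levels))  # True
```

A larger α reacts faster to recent observations; a smaller α produces a smoother line.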
Rolling Window
Story for Kids
It's like looking at your last 5 test scores to see how you're doing, instead of just the most recent one.
A rolling window calculates statistics (like mean or sum) over a fixed number of past observations, moving forward one step at a time.
Drift
Story for Kids
If you're walking and you slowly start to go off the path, that's drift. In data, it means the average is slowly changing over time.
Drift refers to a gradual change in the mean level of a time series, often due to external factors or long-term trends.
Structural Break
Story for Kids
If you're riding your bike and suddenly hit a bump, your path changes. In data, a structural break is a sudden change in the pattern.
A structural break is a sudden, significant change in the pattern or behavior of a time series, often caused by external events.
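For the simplest kind of break, a sudden shift in the mean, one illustrative approach is to try every split point and keep the one where two constant-mean segments fit best. This is a toy sketch with a hypothetical helper `find_mean_shift`, not a formal statistical test.

```python
import numpy as np

def find_mean_shift(values, min_segment=2):
    """Index k that best splits values into two constant-mean segments
    (values[:k] and values[k:]) by total squared error."""
    values = np.asarray(values, dtype=float)
    best_index, best_cost = None, np.inf
    for k in range(min_segment, len(values) - min_segment + 1):
        left, right = values[:k], values[k:]
        cost = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if cost < best_cost:
            best_index, best_cost = k, cost
    return best_index

values = [5, 6, 5, 6, 5, 10, 11, 10, 11, 10]  # made-up data: level jumps at index 5
print(find_mean_shift(values))  # 5
```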
Cyclical Patterns
Story for Kids
Some things go up and down, but not on a regular schedule—like how you might get more homework some months than others.
Cyclical patterns are long-term fluctuations in a time series that are not of fixed period, often related to economic or business cycles.
Lead and Lag
Story for Kids
If your friend always tells a joke and you laugh a minute later, your laugh "lags" behind the joke. If you laugh before the joke, that's a "lead"!
Lead and lag refer to the relationship between two time series, where one series may move before (lead) or after (lag) the other.
Seasonal Adjustment
Story for Kids
Imagine you want to know if you're really getting better at basketball, but you always play more in summer. To see your real progress, you have to "adjust" for the summer effect.
Seasonal adjustment is the process of removing seasonal effects from a time series to better analyze trends and cycles.
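A hand-rolled sketch of additive seasonal adjustment on made-up quarterly data: estimate the trend with a centered moving average, average the detrended values for each quarter to get the seasonal effect, then subtract it. The series, the quarter length of 4, and the variable names are assumptions for illustration; libraries such as statsmodels provide more complete implementations.

```python
import numpy as np
import pandas as pd

# Made-up quarterly data: linear trend plus a repeating seasonal effect.
effect = np.array([5.0, -5.0, 0.0, 0.0])
t = np.arange(12)
quarter = t % 4
series = pd.Series(100 + 2 * t + effect[quarter])

# Centered moving average over one full year (2x4 MA) as the trend estimate.
trend = series.rolling(4).mean().rolling(2).mean().shift(-2)

# Average the detrended values per quarter to estimate the seasonal effect.
detrended = series - trend
seasonal_means = detrended.groupby(quarter).mean()
seasonal_means -= seasonal_means.mean()  # force the effects to sum to zero

# Remove the seasonal effect from every observation.
adjusted = series - seasonal_means.to_numpy()[quarter]
print(adjusted.round(2).tolist())
```

On this noise-free example the adjusted series is just the underlying trend, 100 + 2t.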
Cointegration
Story for Kids
If you and your best friend always walk together, even if you wander a little, you never get too far apart. That's cointegration—two things that move together over time.
Cointegration occurs when two or more non-stationary series move together in such a way that their combination is stationary.
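To illustrate the definition, the sketch below simulates two series that share the same random-walk trend. Each series wanders on its own, but the combination y - 2x cancels the shared trend and stays bounded. The coefficient 2 is known here only because the data is simulated; with real data it has to be estimated and tested.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
n = 2000

# A shared random-walk trend drives both series.
common_trend = np.cumsum(rng.normal(size=n))
x = common_trend + rng.normal(scale=0.5, size=n)
y = 2.0 * common_trend + rng.normal(scale=0.5, size=n)

# The trend cancels in this combination, leaving only stationary noise.
spread = y - 2.0 * x

print("std of x:     ", x.std())
print("std of spread:", spread.std())
```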
Granger Causality
Story for Kids
If your dog barks before the mail comes, you might think the barking "causes" the mail to arrive! Granger causality checks if one thing helps predict another.
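Granger causality tests whether past values of one time series improve predictions of another series beyond what that series' own past values already provide. It indicates predictive usefulness, not true cause and effect.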
Impulse Response
Story for Kids
If you drop a pebble in a pond, you see ripples. An impulse response shows how a sudden change affects things over time.
Impulse response measures the effect of a one-time shock to one variable on the future values of a time series system.
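In the simplest case, a single series following an AR(1) process y_t = φ·y_{t-1} + ε_t, the response to a one-time unit shock decays as φ^k. The sketch below traces that decay; the function name and φ = 0.5 are chosen for illustration, and multivariate systems need a model such as a VAR.

```python
import numpy as np

def impulse_response_ar1(phi, horizon):
    """Response of an AR(1) series to a one-time unit shock at step 0."""
    response = np.zeros(horizon + 1)
    response[0] = 1.0                      # the shock itself
    for k in range(1, horizon + 1):
        response[k] = phi * response[k - 1]  # the shock fades by a factor phi each step
    return response

print(impulse_response_ar1(phi=0.5, horizon=4))
# [1.     0.5    0.25   0.125  0.0625]
```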
Forecast Interval / Prediction Interval
Story for Kids
If you guess you'll get 8 out of 10 on your test, but you're not sure, you might say, "I'll probably get between 7 and 9." That's a prediction interval.
A forecast interval is a range of values within which a future observation is expected to fall, with a certain probability.
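A rough sketch of the idea: forecast the next value as the historical mean and build an approximate 95% interval from the spread of past values. This assumes roughly normal, independent errors and ignores uncertainty in the estimated mean, so treat it as an illustration rather than a recipe; the scores are made up.

```python
import numpy as np

scores = np.array([8.0, 7.0, 9.0, 8.0, 7.0, 9.0, 8.0])  # made-up test scores

point_forecast = scores.mean()
spread = scores.std(ddof=1)   # sample standard deviation of past values
z = 1.96                      # ~95% coverage under a normal assumption

lower = point_forecast - z * spread
upper = point_forecast + z * spread
print(f"forecast: {point_forecast:.1f}, 95% interval: [{lower:.2f}, {upper:.2f}]")
# forecast: 8.0, 95% interval: [6.40, 9.60]
```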
Autoregressive Model (AR)
Story for Kids
If you always get a score close to what you got last time, your next score depends on your last one. That's an autoregressive model.
An AR model predicts future values using a linear combination of past values.
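As a minimal sketch, an AR(1) model y_t = c + φ·y_{t-1} + ε_t can be fitted by ordinary least squares: regress each value on the one before it. The simulated data, the true φ = 0.7, and the variable names are assumptions for this example; in practice a library such as statsmodels would normally be used.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
phi_true, n = 0.7, 5000

# Simulate an AR(1) series: each value depends on the previous one plus noise.
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi_true * y[t - 1] + rng.normal()

# Least squares: regress y_t on a constant and y_{t-1}.
X = np.column_stack([np.ones(n - 1), y[:-1]])
target = y[1:]
(intercept, phi_hat), *_ = np.linalg.lstsq(X, target, rcond=None)

# One-step-ahead forecast from the last observed value.
next_value = intercept + phi_hat * y[-1]
print("estimated phi:", phi_hat)
```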
Moving Average Model (MA)
Story for Kids
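If you want to guess your next test score, you might look at how far off your last few guesses were and nudge your new guess to correct for those mistakes. That's the idea behind a moving average model.
An MA model predicts future values using a linear combination of past forecast errors (random shocks). Despite the name, it is not the same as a moving-average smoother, which averages past observed values.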
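A sketch of what an MA(1) process looks like: each value is the current random shock plus a fraction θ of the previous shock, y_t = ε_t + θ·ε_{t-1}. A known property of MA(1) is that the autocorrelation is θ / (1 + θ²) at lag 1 and zero beyond it. The value θ = 0.6 and the helper `autocorr` are assumptions for this example.

```python
import numpy as np

rng = np.random.default_rng(seed=7)
theta, n = 0.6, 5000

# y_t = e_t + theta * e_{t-1}: built from shocks, not from past values of y.
shocks = rng.normal(size=n + 1)
y = shocks[1:] + theta * shocks[:-1]

def autocorr(x, lag):
    """Sample autocorrelation of x at a positive lag."""
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

print("lag-1 autocorrelation:", autocorr(y, 1))  # theory: 0.6 / 1.36, about 0.44
print("lag-2 autocorrelation:", autocorr(y, 2))  # theory: 0
```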
ARMA, ARIMA, SARIMA
Story for Kids
Sometimes you need both your past scores and your past mistakes to guess your next score. That's ARMA. If you also need to "fix" a trend, that's ARIMA. If you need to handle seasons, that's SARIMA.
ARMA combines autoregressive and moving average models. ARIMA adds differencing for trends. SARIMA adds seasonal components.
State Space Models
Story for Kids
Imagine you're trying to guess where a hidden mouse is in a maze, using clues from the sounds it makes. State space models help you guess hidden things from what you can see.
State space models represent a time series as a set of observed variables driven by hidden (latent) state variables. The hidden state is often estimated with the Kalman filter.
Kalman Filter
Story for Kids
If you're following a friend in a crowd and sometimes lose sight of them, you use clues to guess where they are. The Kalman filter helps you keep track, even with noisy clues.
A Kalman filter is an algorithm that estimates the state of a dynamic system from a series of incomplete and noisy measurements.
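A minimal one-dimensional sketch, assuming the hidden state changes only slowly (a random walk with small variance) and is measured directly with noise. The function name, the variances, and the simulated data are all assumptions for illustration; real applications usually have multi-dimensional states and use a library.

```python
import numpy as np

def kalman_filter_1d(measurements, process_var, measurement_var,
                     initial_estimate, initial_var):
    """Scalar Kalman filter for a slowly drifting level observed with noise."""
    estimate, var = initial_estimate, initial_var
    estimates = []
    for z in measurements:
        # Predict: the state may have drifted, so uncertainty grows.
        var = var + process_var
        # Update: blend prediction and measurement, weighted by their uncertainty.
        gain = var / (var + measurement_var)
        estimate = estimate + gain * (z - estimate)
        var = (1 - gain) * var
        estimates.append(estimate)
    return np.array(estimates)

rng = np.random.default_rng(seed=4)
true_position = 5.0
measurements = true_position + rng.normal(scale=2.0, size=200)  # noisy clues

estimates = kalman_filter_1d(
    measurements, process_var=1e-4, measurement_var=4.0,
    initial_estimate=0.0, initial_var=100.0,
)
print("last measurement:", measurements[-1])
print("last estimate:   ", estimates[-1])
```

Even though each measurement is off by about 2 on average, the filtered estimate settles close to the true position.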
Fourier Analysis
Story for Kids
If you hear a song, you can break it into notes. Fourier analysis breaks a time series into cycles, like finding the "notes" in your data.
Fourier analysis decomposes a time series into sinusoidal components of different frequencies.
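A short sketch using NumPy's FFT on a made-up signal with two cycles (periods of 12 and 4 steps). The amplitude at each frequency shows how strong that cycle is, and the strongest one gives the dominant period.

```python
import numpy as np

n = 120
t = np.arange(n)
# Made-up signal: a strong 12-step cycle plus a weaker 4-step cycle.
signal = 3.0 * np.sin(2 * np.pi * t / 12) + 1.0 * np.sin(2 * np.pi * t / 4)

spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(n, d=1.0)        # cycles per time step
amplitudes = 2 * np.abs(spectrum) / n    # amplitude of each sinusoid

# Skip index 0 (the zero frequency, i.e. the overall mean).
dominant = np.argmax(amplitudes[1:]) + 1
dominant_period = 1 / freqs[dominant]
print("dominant period:", round(dominant_period, 2))  # 12.0
```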
Spectral Density
Story for Kids
It's like seeing which notes are loudest in a song. Spectral density shows which cycles are strongest in your data.
Spectral density measures the strength of different frequency components in a time series.
Volatility
Story for Kids
If your scores jump up and down a lot, that's high volatility. If they're always about the same, that's low volatility.
Volatility is the degree of variation in a time series, often measured by the standard deviation.
Heteroskedasticity
Story for Kids
If your test scores are sometimes close together and sometimes spread out, that's heteroskedasticity.
Heteroskedasticity is when the variance of a time series changes over time.
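One quick, informal way to see changing variance is a rolling standard deviation. The simulated series below is calm in its first half and turbulent in its second; the window of 50 and the scales are arbitrary choices for this sketch.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=6)
calm = rng.normal(scale=1.0, size=200)       # low-variance period
turbulent = rng.normal(scale=5.0, size=200)  # high-variance period
series = pd.Series(np.concatenate([calm, turbulent]))

# Standard deviation over the most recent 50 observations.
rolling_std = series.rolling(window=50).std()

print("rolling std at end of calm period:     ", rolling_std.iloc[199])
print("rolling std at end of turbulent period:", rolling_std.iloc[-1])
```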
Unit Root
Story for Kids
If you keep walking and never come back to where you started, you have a unit root! It means your path can wander forever.
A unit root in a time series means the series is non-stationary and can wander without bound. Testing for unit roots helps decide if differencing is needed.
Spurious Regression
Story for Kids
If you notice that ice cream sales and sunburns both go up in summer, you might think eating ice cream causes sunburn! But really, it's just the season. That's a spurious relationship.
Spurious regression happens when two unrelated non-stationary series appear to be related due to trends or other factors.
GARCH
Story for Kids
If some days are very noisy and some days are quiet, GARCH helps you predict when the noise will get bigger or smaller.
GARCH (Generalized Autoregressive Conditional Heteroskedasticity) models time-varying volatility in time series, often used in finance.
ARCH
Story for Kids
If you notice that big jumps in your scores are followed by more big jumps, that's ARCH.
ARCH (Autoregressive Conditional Heteroskedasticity) models periods of high and low volatility in time series data.
Multicollinearity
Story for Kids
If you ask your two best friends for advice and they always say the same thing, it's hard to know who's really helping. That's multicollinearity.
Multicollinearity occurs when two or more predictors in a model are highly correlated, making it hard to separate their effects.
Cross-Correlation
Story for Kids
If you and your friend both clap at the same time, or one after the other, cross-correlation measures how your claps match up.
Cross-correlation measures the similarity between two time series as one is shifted in time relative to the other.
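A sketch of using cross-correlation to find how far one series lags another. Here `y` is simulated as a copy of `x` delayed by three steps plus a little noise; the helper `cross_corr` and all parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=5)
n, true_lag = 500, 3
x = rng.normal(size=n)

# y repeats x three steps later, plus a little noise.
y = np.concatenate([rng.normal(size=true_lag), x[:-true_lag]]) + 0.1 * rng.normal(size=n)

def cross_corr(x, y, lag):
    """Correlation between x[t] and y[t + lag] for lag >= 0."""
    return float(np.corrcoef(x[: len(x) - lag], y[lag:])[0, 1])

# Try several shifts and keep the one where the two series match best.
lags = range(0, 11)
best_lag = max(lags, key=lambda k: cross_corr(x, y, k))
print("y lags x by", best_lag, "steps")
```

In lead/lag terms, `x` leads `y` here: movements in `x` show up in `y` a few steps later.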
Transfer Function Models
Story for Kids
If you turn a faucet, the water doesn't come out instantly—it takes a moment. Transfer function models show how one thing affects another over time.
Transfer function models describe the relationship between an input and output time series, including delays and dynamic effects.
Intervention Analysis
Story for Kids
If your school changes the lunch menu, you might see a sudden change in how many kids buy lunch. That's an intervention!
Intervention analysis examines how a specific event or change affects a time series.
Regime Switching
Story for Kids
If you're sometimes happy and sometimes sad, and you switch between these moods, that's like regime switching in data.
Regime switching models allow a time series to switch between different states or behaviors, such as high and low volatility.
Nonlinear Time Series
Story for Kids
If your path home from school has lots of twists and turns, it's not a straight line. That's nonlinear!
Nonlinear time series models capture relationships that can't be described by straight lines or simple equations.
Long Memory Processes
Story for Kids
If you remember things that happened a long time ago, not just yesterday, your memory is long. Some data "remembers" the distant past too!
Long memory processes have correlations that decay slowly, so distant past values still influence the present.
Panel Data
Story for Kids
If you keep a diary for yourself and all your friends, you have panel data—lots of time series for different people!
Panel data combines time series for multiple subjects, like people, companies, or countries.
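A small sketch of panel data in "long" format with pandas, using made-up names and values. The key practical point is that lag features must be computed within each subject, so one friend's values never leak into another's.

```python
import pandas as pd

# Made-up diary for two friends over three days (long format).
panel = pd.DataFrame(
    {
        "friend": ["Ana"] * 3 + ["Ben"] * 3,
        "day": list(pd.date_range("2024-01-01", periods=3)) * 2,
        "candies": [1, 2, 3, 10, 20, 30],
    }
)

panel = panel.sort_values(["friend", "day"])

# Lag within each friend: the first day of each friend has no previous value.
panel["candies_lag_1"] = panel.groupby("friend")["candies"].shift(1)
print(panel)
```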
Nowcasting
Story for Kids
If you want to know what's happening right now, not just in the future, you're nowcasting!
Nowcasting uses the latest data to estimate what's happening right now, often before official numbers are available.
Forecast Combination
Story for Kids
If you ask all your friends to guess how many candies are in a jar, and then average their guesses, you're combining forecasts!
Forecast combination takes predictions from multiple models and combines them, often improving accuracy.
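A toy sketch with made-up numbers: one forecast overshoots, another undershoots, and their simple average lands closer than either. The gain here comes from the errors partly cancelling; a combination is not guaranteed to beat the best single model.

```python
import numpy as np

actual = np.array([10.0, 12.0, 14.0, 16.0])
forecast_a = np.array([12.0, 14.0, 16.0, 18.0])  # hypothetical model A: overshoots by 2
forecast_b = np.array([9.0, 11.0, 13.0, 15.0])   # hypothetical model B: undershoots by 1

# Simple equal-weight combination.
combined = (forecast_a + forecast_b) / 2

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))

print("MAE A:       ", mae(actual, forecast_a))  # 2.0
print("MAE B:       ", mae(actual, forecast_b))  # 1.0
print("MAE combined:", mae(actual, combined))    # 0.5
```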
Benchmarking
Story for Kids
If you want to know if you're running fast, you compare your time to your best friend's. That's benchmarking!
Benchmarking compares the performance of models or forecasts to a standard or baseline method.
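A common baseline is the naive forecast, which predicts that the next value equals the last observed one. The sketch below compares a hypothetical model's forecasts against that baseline using the ratio of their MAEs; a ratio below 1 means the model beats the baseline. All numbers are made up.

```python
import numpy as np

actual = np.array([10.0, 12.0, 11.0, 13.0, 12.0, 14.0])

# Naive baseline: forecast for step t is the actual value at step t-1.
naive_forecast = actual[:-1]
target = actual[1:]

# Forecasts from some hypothetical model for the same steps.
model_forecast = np.array([11.5, 11.5, 12.5, 12.5, 13.5])

naive_mae = float(np.mean(np.abs(target - naive_forecast)))
model_mae = float(np.mean(np.abs(target - model_forecast)))
relative_mae = model_mae / naive_mae

print("naive MAE:   ", naive_mae)     # 1.6
print("model MAE:   ", model_mae)     # 0.5
print("relative MAE:", relative_mae)  # 0.3125
```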