📰 📈 TIP 1: Introduction to Time Series Analysis with statsmodels
📰 Edition #46
✨ Introduction
Time series analysis uncovers temporal patterns and supports data-driven forecasts. Python's statsmodels library lets you model trend and seasonality and generate forecasts in a few lines of code.
🎯 Objective
Show how to prepare a time series in Python, fit a basic ARIMA model, and interpret its output.
📚 Theoretical Concept
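A time series is a sequence of observations ordered in time. ARIMA(p, d, q) models combine three parts: an autoregressive (AR) part of order p, which regresses the series on its own past values; an integrated (I) part of order d, the number of times the series is differenced to make it stationary; and a moving-average (MA) part of order q, which models the current value using past forecast errors.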
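A minimal sketch of how the AR, I, and MA parts fit together, simulating an ARIMA(1,1,1) series with NumPy. The helper name `simulate_arima_111`, the coefficients `phi` and `theta`, and the seed are illustrative assumptions, not values from this tip.

```python
import numpy as np


def simulate_arima_111(n: int, phi: float, theta: float, seed: int = 0) -> np.ndarray:
    """Simulate n points of an ARIMA(1,1,1) process (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(size=n)  # white-noise shocks
    w = np.zeros(n)           # the differenced (ARMA) series
    for t in range(1, n):
        # AR part: phi * previous value; MA part: theta * previous shock
        w[t] = phi * w[t - 1] + eps[t] + theta * eps[t - 1]
    # I part: integrate once (cumulative sum), which corresponds to d = 1
    return np.cumsum(w)


y = simulate_arima_111(n=200, phi=0.5, theta=-0.3)

# Differencing once undoes the integration step and recovers the ARMA series
w_recovered = np.diff(y)
print(len(y), len(w_recovered))
```

Differencing is the inverse of the cumulative sum, which is why a model with d = 1 is fitted to the changes in the series rather than to its levels.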
💡 Python Example with statsmodels
#!/usr/bin/env python3
"""
TIP 1 – Introduction to Time Series Analysis with statsmodels
"""
import argparse
from typing import Optional

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statsmodels.api as sm


def load_data(path: Optional[str] = None) -> pd.Series:
    """
    Load a time series from a CSV file, or generate synthetic data if no path is provided.
    """
    if path:
        df = pd.read_csv(path, parse_dates=True, index_col=0)
        # Expects a single column of values indexed by month-end dates.
        # The 'ME' (month end) alias requires pandas >= 2.2; older versions use 'M'.
        ts = df.iloc[:, 0].asfreq('ME').ffill()
        print(f"Loaded {len(ts)} records from {path}.")
    else:
        # Generate 3 years of monthly synthetic data (using 'ME' instead of 'M')
        dates = pd.date_range(start='2020-01-01', periods=36, freq='ME')
        np.random.seed(42)
        data = np.random.normal(loc=100, scale=10, size=len(dates)).cumsum()
        ts = pd.Series(data, index=dates, name='value')
        print("Generated synthetic time series for demonstration (36 monthly points).")
    return ts


def fit_arima(ts: pd.Series, order=(1, 1, 1)):
    """
    Fit an ARIMA model to the time series and return the results.
    """
    model = sm.tsa.ARIMA(ts, order=order)
    results = model.fit()
    return results


def plot_diagnostics(results, output_file: str = "diagnostics.png"):
    """
    Generate and save diagnostic plots for the fitted model.
    """
    # statsmodels builds the four diagnostic panels here.
    fig = results.plot_diagnostics(figsize=(10, 8))
    fig.tight_layout()
    fig.suptitle("ARIMA Fit Diagnostic Plots", y=1.02)
    # bbox_inches="tight" keeps the title, which sits above the axes, inside the saved image.
    fig.savefig(output_file, bbox_inches="tight")
    print(f"Diagnostic plots saved to {output_file}.")


def main():
    parser = argparse.ArgumentParser(
        description="TIP 1: Basic ARIMA with statsmodels"
    )
    parser.add_argument(
        "-i", "--input",
        dest="csv_path",
        help="Path to CSV file containing time series (single column)."
    )
    parser.add_argument(
        "-p", "--order",
        nargs=3,
        type=int,
        default=[1, 1, 1],
        metavar=('P', 'D', 'Q'),
        help="ARIMA order: p d q (default: 1 1 1)."
    )
    parser.add_argument(
        "-o", "--output",
        dest="output_file",
        default="diagnostics.png",
        help="Filename for saving diagnostic plots."
    )
    args = parser.parse_args()

    ts = load_data(args.csv_path)
    print("\n--- Series Summary ---")
    print(ts.describe())

    print(f"\nFitting ARIMA{tuple(args.order)} model...")
    results = fit_arima(ts, order=tuple(args.order))
    print("\n--- Model Summary ---")
    print(results.summary())

    plot_diagnostics(results, output_file=args.output_file)
    print("\nDone! Use python tip1_timeseries.py -h for help.")


# Run main() only when the file is executed as a script, not when imported.
if __name__ == "__main__":
    main()
🖥️ How to Use in VSCode
1. Install dependencies
pip install pandas numpy statsmodels matplotlib
2. Run the script
2.1 To use the synthetic series:
python tip1_timeseries.py
2.2 To load your own CSV and set ARIMA order:
python tip1_timeseries.py --input my_data.csv --order 2 1 2 --output diag.png
3. View output and plots
🖼️ Generated Images
🔍 Results Interpretation
🛠️ Practical Applications
Monthly sales forecasting, anomaly detection in financial data, demand planning.
💬 Extra Tip
Run results.plot_diagnostics() to visually assess model fit and residual autocorrelation.
📅 Next Steps
Explore SARIMA for seasonal behavior or include exogenous variables with SARIMAX.
📣 Conclusion & CTA
You’ve just taken your first step into time series with statsmodels! Comment below what series you’d like to model next and share with your network.
🏷️ Hashtags
#TimeSeries #Python #Statsmodels #DataScience #Forecasting
ANNEXES:
1. Model Summary Interpretation
2. MA coefficient (ma.L1 ≈ -0.8760)
3. σ² (sigma2 ≈ 93.6)
4. Information Criteria (AIC = 270.285, BIC = 274.951)
5. Ljung‐Box, Jarque‐Bera, Heteroskedasticity tests
High p-values on all three diagnostic tests are a good sign: none of the tests rejects its null hypothesis, so the residuals are consistent with white noise (no detectable autocorrelation, approximately normal, and homoscedastic). This is not proof that the model is correct, only that these tests found no evidence against it.
2. Diagnostic Plot Overview
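Your four diagnostic panels (saved in diagnostics.png) show the standardized residuals over time, a histogram of the residuals with an estimated density, a normal Q-Q plot, and a correlogram of the residuals.
Because your Ljung-Box test was not significant and neither the Q-Q plot nor the Jarque-Bera test rejected normality, you should see residuals that fluctuate around zero with no obvious pattern, a roughly bell-shaped histogram, Q-Q points lying close to the reference line, and autocorrelations that stay inside the confidence band.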
3. Next Steps & Recommendations
A) Check Stationarity in a Different Way
from statsmodels.tsa.stattools import adfuller, kpss
raw_adf = adfuller(ts, autolag='AIC')
diff_adf = adfuller(ts.diff().dropna(), autolag='AIC')
raw_kpss = kpss(ts, regression='c', nlags='auto')
diff_kpss = kpss(ts.diff().dropna(), regression='c', nlags='auto')
print("Raw ADF p-value: ", raw_adf[1])
print("Diff ADF p-value: ", diff_adf[1])
print("Raw KPSS p-value: ", raw_kpss[1])
print("Diff KPSS p-value:", diff_kpss[1])
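ADF and KPSS test opposite null hypotheses, so their p-values read in opposite directions. Here is a minimal sketch of one common way to combine them; the helper name `classify_stationarity`, the 0.05 threshold, and the example p-values are illustrative assumptions.

```python
def classify_stationarity(adf_p: float, kpss_p: float, alpha: float = 0.05) -> str:
    """Combine ADF and KPSS p-values into a rough verdict (hypothetical helper)."""
    adf_rejects_unit_root = adf_p < alpha        # ADF H0: the series has a unit root
    kpss_rejects_stationarity = kpss_p < alpha   # KPSS H0: the series is stationary

    if adf_rejects_unit_root and not kpss_rejects_stationarity:
        return "stationary"
    if not adf_rejects_unit_root and kpss_rejects_stationarity:
        return "non-stationary"
    if adf_rejects_unit_root and kpss_rejects_stationarity:
        return "inconclusive (possibly difference-stationary)"
    return "inconclusive (possibly trend-stationary)"


# Illustrative p-values, not results from this tip's data
print(classify_stationarity(adf_p=0.60, kpss_p=0.01))   # non-stationary
print(classify_stationarity(adf_p=0.001, kpss_p=0.10))  # stationary
```

Note that statsmodels reports KPSS p-values only within the range 0.01 to 0.10, so a value at either end means "at most 0.01" or "at least 0.10".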
B) Compare Alternative ARIMA Orders
for order in [(1, 1, 0), (0, 1, 1), (2, 1, 1)]:
    m = sm.tsa.ARIMA(ts, order=order).fit()
    print(f"Order={order} → AIC={m.aic:.3f}, BIC={m.bic:.3f}")
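To see what the two criteria measure, here is a worked check of the values reported in the model summary (AIC = 270.285, BIC = 274.951) using the textbook formulas AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L. It assumes k = 3 estimated parameters (ar.L1, ma.L1, sigma2) and n = 35 effective observations (36 points minus one lost to differencing); both are assumptions, not values stated in the summary.

```python
import math

aic_reported = 270.285
bic_reported = 274.951

k = 3       # assumption: ar.L1, ma.L1 and sigma2
n_eff = 35  # assumption: 36 observations minus 1 lost to differencing

# AIC = 2k - 2 ln L  =>  -2 ln L = AIC - 2k
neg2_loglike = aic_reported - 2 * k

# BIC = k ln(n) - 2 ln L
bic_implied = k * math.log(n_eff) + neg2_loglike

print(f"-2 ln L     = {neg2_loglike:.3f}")
print(f"implied BIC = {bic_implied:.3f} (reported: {bic_reported})")
```

Under these assumptions the implied BIC agrees with the reported one to three decimals. Both criteria reward fit through the same likelihood term and differ only in the penalty per parameter: 2 for AIC versus ln(n) for BIC, so BIC favors smaller models once n exceeds about 7.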
C) Try a SARIMA Model if Seasonality Exists
seasonal_order=(1, 1, 1, 12) # (P, D, Q, s) for monthly seasonality
sarima_mod = sm.tsa.statespace.SARIMAX(ts, order=(1,1,1), seasonal_order=seasonal_order).fit()
print(sarima_mod.summary())
sarima_mod.plot_diagnostics(figsize=(10,8))
D) Forecasting & Confidence Intervals
forecast_steps = 12 # next 12 months, for example
fc = results.get_forecast(steps=forecast_steps)
mean_forecast = fc.predicted_mean
conf_int = fc.conf_int(alpha=0.05)
# Plot:
ax = ts.plot(label='Observed', figsize=(10,6))
mean_forecast.plot(ax=ax, label='Forecast', color='red')
ax.fill_between(conf_int.index,
                conf_int.iloc[:, 0],
                conf_int.iloc[:, 1],
                color='pink', alpha=0.3)
ax.legend()
plt.show()
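The bounds returned by conf_int(alpha=0.05) can be read as a normal-approximation interval around the forecast mean. The sketch below shows that arithmetic on made-up numbers; the helper name `normal_interval`, the mean of 3700, and the standard error of 12 are hypothetical, and it assumes the interval is built from a normal distribution.

```python
from statistics import NormalDist


def normal_interval(mean: float, se: float, alpha: float = 0.05) -> tuple:
    """Two-sided (1 - alpha) interval under a normal approximation (hypothetical helper)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return mean - z * se, mean + z * se


# Made-up forecast mean and standard error for one future month
lower, upper = normal_interval(mean=3700.0, se=12.0)
print(f"95% interval: [{lower:.2f}, {upper:.2f}]")
```

Forecast standard errors typically grow with the horizon, which is why the shaded band in the plot widens for later months.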
E) Validate on Real Data
If you switch from synthetic data to a real CSV of actual time‐series observations (e.g., monthly sales), be sure to check: