Stats vs ML
This is, sort of, a reply to this discussion.
Here is the way I look at ML vs stats.
I guess the rule-of-thumb answer to the title question is whether the method you are using also provides confidence intervals for the model parameters and the predicted values. If yes, the method is statistical. Otherwise, it is something else, broadly ML, perhaps.
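To make the rule of thumb concrete, here is a minimal sketch on synthetic data, with statsmodels and scikit-learn as stand-ins for the two cultures: the statistical fit reports confidence intervals for both parameters and predictions, while the ML fit of the very same line returns point estimates only.

```python
# The rule of thumb in code: statsmodels' OLS reports confidence
# intervals; a typical ML fit (scikit-learn) gives point estimates only.
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1.5 * x + 0.5 + rng.normal(scale=0.3, size=200)

# "Statistical": parameters come with standard errors and CIs.
X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()
print(fit.conf_int(alpha=0.05))              # 95% CI for intercept and slope
print(fit.get_prediction(X[:1]).conf_int())  # CI for a predicted value

# "ML": the same line, but no uncertainty attached.
ml = LinearRegression().fit(x.reshape(-1, 1), y)
print(ml.intercept_, ml.coef_)
```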
We start with a sequence of “bars”, as Marcos Lopez de Prado likes to call them; essentially vectors of the exact same structure. In the natural sciences we always start with a sequence, since measurement time introduces a natural ordering.
First assume the bars are not stochastic, as per the problem domain. Then there are two options: does order matter or not? If not, then we have a “function estimation” or “regression” problem in the broad sense. Most of ML, including its linear extreme, falls here, as do Hilbert space methods and the like. If order matters, then we have a dynamic system. The other half of ML fits here, where dynamics is essential, e.g. RNNs or even RL, but so does (nonlinear) dynamical systems theory.
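A quick sketch of the order distinction, on a synthetic AR(1) path: shuffling the observations leaves any static regression problem intact, but it erases the dynamic structure completely.

```python
# Order matters for a dynamic system: shuffling destroys the AR signal.
import numpy as np

rng = np.random.default_rng(1)

# Dynamic system: x(t) = 0.9 * x(t-1) + noise
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.9 * x[t - 1] + rng.normal(scale=0.1)

def ar1_coef(series):
    """Least-squares estimate of the lag-1 coefficient."""
    return np.polyfit(series[:-1], series[1:], 1)[0]

print(ar1_coef(x))                   # ~0.9: order carries the signal
print(ar1_coef(rng.permutation(x)))  # ~0.0: shuffling erases it
```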
Now assume the bars are stochastic. Again, order may or may not matter. If not, then we have a “sample”, by definition. This, again, is a modelling assumption: if we observed some serial dependency, we would treat it as irrelevant. This is the domain of parametric statistical inference, or of direct non-parametric methods like “historical MC”, with or without the bootstrap.
Perhaps this is what colleagues have called “classical statistics”. You have everything here, from ANOVA to MLE, factor models, PCA, copulas etc. The key feature is that we work with a sample and all observations are *assumed* i.i.d. along the time coordinate, while dependency among the bar components is allowed.
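As an illustration of working with a “sample”, here is a minimal bootstrap sketch, with synthetic returns standing in for the bars: once i.i.d. along time is assumed, resampling in any order is legitimate, and it buys you confidence intervals.

```python
# Bootstrap CI for the mean, resampling along time as if order were irrelevant.
import numpy as np

rng = np.random.default_rng(2)
returns = rng.standard_t(df=5, size=1000) * 0.01  # stand-in for observed bars

boot_means = np.array([
    rng.choice(returns, size=returns.size, replace=True).mean()
    for _ in range(5000)
])
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"mean = {returns.mean():.5f}, 95% bootstrap CI = [{lo:.5f}, {hi:.5f}]")
```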
The i.i.d. *assumption* is unrealistic for typical measurement data (the “single path” kind of thing: with one observed path we have, initially, no basis to dismiss any observed serial dependence as random). So we fall into the domain of time series analysis.
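Before abandoning the i.i.d. story one can at least test it. A sketch using two standard statsmodels diagnostics, a Ljung-Box test for serial dependence and an ADF test for a unit root, applied to a synthetic random walk:

```python
# Testing the i.i.d. story: Ljung-Box for serial dependence, ADF for a unit root.
import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(3)
random_walk = np.cumsum(rng.normal(size=1000))

# Increments of a random walk are i.i.d.: Ljung-Box should not reject.
print(acorr_ljungbox(np.diff(random_walk), lags=[10]))
# The level has a unit root: ADF p-value should be large (not rejected).
print(adfuller(random_walk)[1])
```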
The point of time series analysis may be defined as separating the dynamic but non-stochastic structure of the model from i.i.d. innovations, and extracting both somehow. Now we get all the interesting stuff: non-stationarity, linear vs non-linear kernels, endogenous vs exogenous variables. But if the model is “noise-separable”, i.e.
x(t) = F(path(x(t-)), path(noise(t-))) = R(path(x(t-))) + S(path(x(t-)), path(noise(t-)))
where the noise is a (vector) process with independent increments, infinitely divisible in continuous time, then we hit the spot.
The “R” is, so to speak, the “ML” component: it allows you to formulate conditional predictions. The “S” is the “statistical” component: among other things, it allows computing confidence intervals for x(t).
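As a sketch of this split in code (using the `arch` package and synthetic returns as stand-ins), an AR(1) conditional mean plays the role of “R” and a GARCH(1,1) variance the role of “S”: the first gives the conditional prediction, the second the confidence interval around it.

```python
# "R" = AR(1) conditional mean (prediction); "S" = GARCH(1,1) scale (CI).
import numpy as np
from arch import arch_model

rng = np.random.default_rng(4)
x = rng.normal(size=1000)  # stand-in for observed returns

model = arch_model(x, mean="AR", lags=1, vol="Garch", p=1, q=1)
res = model.fit(disp="off")

fc = res.forecast(horizon=1)
mu = fc.mean.iloc[-1, 0]                   # R: conditional prediction of x(t)
sigma = np.sqrt(fc.variance.iloc[-1, 0])   # S: scale of the innovation
print(f"x(t) ~ {mu:.4f} +/- {1.96 * sigma:.4f} (95%)")
```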
This formulation, of course, contains the AR, ARMA, VARMA, VARX linear models. It contains all flavours of “GARCH”, which keep x(t-) inside “S”. If instead of the (vector version)
X(t) = A + B*X(t-1) + C*X(t-2) + …
you have
X(t) = A + F(X(t-1), X(t-2), …)
i.e. nonlinear, an “NVAR”-type model. All this stuff has been around for years; only the liberty to make F a “neural network”, the ultimate non-parametric method, became possible recently.
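A minimal sketch of that last step, with scikit-learn's MLPRegressor as the stand-in neural network F, fitted on a lag matrix exactly as a linear (N)VAR would be (synthetic nonlinear AR(2) data):

```python
# Nonlinear F on lagged values: X(t) = F(X(t-1), X(t-2)), F a neural net.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(5)
x = np.zeros(1500)
for t in range(2, 1500):
    x[t] = np.tanh(0.8 * x[t - 1]) - 0.2 * x[t - 2] + rng.normal(scale=0.1)

lags = np.column_stack([x[1:-1], x[:-2]])  # (X(t-1), X(t-2))
target = x[2:]

F = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
F.fit(lags, target)
print("one-step R^2:", F.score(lags, target))
```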
I guess the conclusion is that ML methods should probably be considered tools usable to solve statistical problems, in the same sense as the totally non-statistical “least squares” method shows up in the maximum-likelihood estimation of a linear statistical model with Gaussian noise.
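A quick numerical check of that analogy, on synthetic data with scipy's generic optimizer as the MLE engine: maximizing the Gaussian likelihood of a linear model recovers exactly the least-squares coefficients.

```python
# Gaussian MLE of a linear model coincides with least squares.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)
x = rng.normal(size=300)
y = 2.0 * x - 1.0 + rng.normal(scale=0.5, size=300)

def neg_log_likelihood(params):
    a, b, log_sigma = params
    sigma = np.exp(log_sigma)
    resid = y - (a + b * x)
    # Gaussian NLL up to an additive constant.
    return 0.5 * np.sum(resid**2) / sigma**2 + y.size * log_sigma

mle = minimize(neg_log_likelihood, x0=[0.0, 0.0, 0.0]).x
ols = np.polyfit(x, y, 1)      # least squares: [slope, intercept]
print(mle[:2], ols[::-1])      # (a, b) vs (intercept, slope): identical
```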
On the practical side, the separable structure has an economic interpretation. “S” is a generalization of the “efficient market”, where the best thing you can do is diversify. Such an interpretation is obvious if S is linear:
x(t) = A + S*noise(t),
which “is” the efficient market for the return x(t).
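A sketch of why diversification is all there is in this “S”-only world: with independent noise components, the variance of an equal-weight portfolio of n assets shrinks like 1/n (synthetic numbers below).

```python
# In x(t) = A + S*noise(t) with independent noise, diversification is the
# only edge: equal-weight portfolio volatility falls like 1/sqrt(n).
import numpy as np

rng = np.random.default_rng(7)
A, sigma = 0.0005, 0.01
for n in (1, 10, 100):
    noise = rng.normal(scale=sigma, size=(20000, n))
    portfolio = A + noise.mean(axis=1)  # equal weight across n assets
    print(n, portfolio.std())           # ~ sigma / sqrt(n)
```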
The “R” part is the potential source of “alpha”, which is not possible in an efficient market. This is what trading strategies try to extract: the ability to predict the next point from the history.
The confusion between stats and ML, mostly in the “equity” crowd, arises because *classically* equity models operated in the efficient-market setting. Markowitz, CAL, CAPM, MPT, BL etc. are pretty much about the relationship between A and S, or about the structure of the noise components. But as long as you don't have the “R” part, it will be at most about building a portfolio, optimal in some sense. Only if you have the “R” part does ML truly become relevant, even in its linear form, and then you may be talking not about portfolio diversification but about individual trading.
Comments:

James Anthonyrajah, Adjunct Professor of Financial Engineering, NYU Tandon School: The problem with ML is that it always offers a solution, even when it's applied to non-stationary time series. Well, one cannot forecast a random walk. I sometimes ask ML enthusiasts if they check their samples for unit roots, and their answer is “we made a lot of money with our models”... Sure, in a bull market one can make money by eyeballing the charts... :)

Quantitative Analyst in transition: If you could provide a quant model of noise in the markets, that would be impressive.