Stats vs ML
This is, sort of, a reply to this discussion.
Here is the way I look at ML vs stats.
I guess the rule-of-thumb answer to the title question is whether the method you are using also provides confidence intervals for the model parameters and the predicted values. If yes, the method is statistical. Otherwise, it is something else, broadly ML, perhaps.
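To make the rule of thumb concrete, here is a minimal sketch on synthetic data, with statsmodels and scikit-learn as stand-ins for the two cultures: the statistical fit reports confidence intervals for both parameters and predictions, while the ML fit of the very same line returns point estimates only.

```python
# The rule of thumb in code: statsmodels' OLS reports confidence
# intervals; a typical ML fit (scikit-learn) gives point estimates only.
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1.5 * x + 0.5 + rng.normal(scale=0.3, size=200)

# "Statistical": parameters come with standard errors and CIs.
X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()
print(fit.conf_int(alpha=0.05))              # 95% CI for intercept and slope
print(fit.get_prediction(X[:1]).conf_int())  # CI for a predicted value

# "ML": the same line, but no uncertainty attached.
ml = LinearRegression().fit(x.reshape(-1, 1), y)
print(ml.intercept_, ml.coef_)
```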
We start with a sequence of “bars”, as Marcos Lopez de Prado likes to call them; essentially vectors of the exact same structure. In the natural sciences we always start with a sequence, since measurement time introduces a natural ordering.
First assume the bars are not stochastic, as per the problem domain. Then there are two options: does order matter or not? If not, then we have a “function estimation” or “regression” problem in the broad sense. Most of ML, including its linear extreme, falls here, as do Hilbert space methods and the like. If order matters, then we have a dynamic system. The other half of ML fits here, where dynamics is essential, e.g. RNNs or even RL, but so does (nonlinear) dynamical systems theory.
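A quick sketch of the order distinction, on a synthetic AR(1) path: shuffling the observations leaves any static regression problem intact, but it erases the dynamic structure completely.

```python
# Order matters for a dynamic system: shuffling destroys the AR signal.
import numpy as np

rng = np.random.default_rng(1)

# Dynamic system: x(t) = 0.9 * x(t-1) + noise
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.9 * x[t - 1] + rng.normal(scale=0.1)

def ar1_coef(series):
    """Least-squares estimate of the lag-1 coefficient."""
    return np.polyfit(series[:-1], series[1:], 1)[0]

print(ar1_coef(x))                   # ~0.9: order carries the signal
print(ar1_coef(rng.permutation(x)))  # ~0.0: shuffling erases it
```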
Now assume the bars are stochastic. Again, order may or may not matter. If not, then we have a “sample”, by definition. This, again, is a modelling assumption: if we observed some serial dependency, we would treat it as irrelevant. This is the domain of parametric statistical inference, or of direct non-parametric methods like “historical MC”, with or without the bootstrap.
Perhaps this is what colleagues have called “classical statistics”. You have everything here, from ANOVA to MLE, factor models, PCA, copulas etc. The key feature is that we work with a sample and all observations are *assumed* i.i.d. along the time coordinate, while dependency among the bar components is allowed.
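As an illustration of working with a “sample”, here is a minimal bootstrap sketch, with synthetic returns standing in for the bars: once i.i.d. along time is assumed, resampling in any order is legitimate, and it buys you confidence intervals.

```python
# Bootstrap CI for the mean, resampling along time as if order were irrelevant.
import numpy as np

rng = np.random.default_rng(2)
returns = rng.standard_t(df=5, size=1000) * 0.01  # stand-in for observed bars

boot_means = np.array([
    rng.choice(returns, size=returns.size, replace=True).mean()
    for _ in range(5000)
])
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"mean = {returns.mean():.5f}, 95% bootstrap CI = [{lo:.5f}, {hi:.5f}]")
```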
The i.i.d. *assumption* is unrealistic for typical measurement data (the “single path” kind of thing: with one observed path we have, initially, no basis to dismiss any observed serial dependence as random). So we fall into the domain of time series analysis.
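Before abandoning the i.i.d. story one can at least test it. A sketch using two standard statsmodels diagnostics, a Ljung-Box test for serial dependence and an ADF test for a unit root, applied to a synthetic random walk:

```python
# Testing the i.i.d. story: Ljung-Box for serial dependence, ADF for a unit root.
import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(3)
random_walk = np.cumsum(rng.normal(size=1000))

# Increments of a random walk are i.i.d.: Ljung-Box should not reject.
print(acorr_ljungbox(np.diff(random_walk), lags=[10]))
# The level has a unit root: ADF p-value should be large (not rejected).
print(adfuller(random_walk)[1])
```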
The point of time series analysis may be defined as separating the dynamic but non-stochastic structure of the model from i.i.d. innovations, and extracting both somehow. Now we get all the interesting stuff: non-stationarity, linear vs non-linear kernels, endogenous vs exogenous variables. But if the model is “noise-separable”, i.e.
x(t) = F(path(x(t-)), path(noise(t-))) = R(path(x(t-))) + S(path(x(t-)), path(noise(t-)))
where the noise is a (vector) process with independent increments, infinitely divisible in continuous time, then we hit the spot.
The “R” is, so to speak, the “ML” component: it allows you to formulate conditional predictions. The “S” is the “statistical” component: among other things, it allows computing confidence intervals for x(t).
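As a sketch of this split in code (using the `arch` package and synthetic returns as stand-ins), an AR(1) conditional mean plays the role of “R” and a GARCH(1,1) variance the role of “S”: the first gives the conditional prediction, the second the confidence interval around it.

```python
# "R" = AR(1) conditional mean (prediction); "S" = GARCH(1,1) scale (CI).
import numpy as np
from arch import arch_model

rng = np.random.default_rng(4)
x = rng.normal(size=1000)  # stand-in for observed returns

model = arch_model(x, mean="AR", lags=1, vol="Garch", p=1, q=1)
res = model.fit(disp="off")

fc = res.forecast(horizon=1)
mu = fc.mean.iloc[-1, 0]                   # R: conditional prediction of x(t)
sigma = np.sqrt(fc.variance.iloc[-1, 0])   # S: scale of the innovation
print(f"x(t) ~ {mu:.4f} +/- {1.96 * sigma:.4f} (95%)")
```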
This formulation, of course, contains the AR, ARMA, VARMA, VARX linear models. It contains all flavours of “GARCH”, which keep x(t-) inside “S”. If instead of the (vector version)
X(t) = A + B*X(t-1) + C*X(t-2) + …
you have
X(t) = A + F(X(t-1), X(t-2), …)
i.e. nonlinear, an “NVAR”-type model. All this stuff has been around for years; only the liberty to make F a “neural network”, the ultimate non-parametric method, became possible recently.
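A minimal sketch of that last step, with scikit-learn's MLPRegressor as the stand-in neural network F, fitted on a lag matrix exactly as a linear (N)VAR would be (synthetic nonlinear AR(2) data):

```python
# Nonlinear F on lagged values: X(t) = F(X(t-1), X(t-2)), F a neural net.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(5)
x = np.zeros(1500)
for t in range(2, 1500):
    x[t] = np.tanh(0.8 * x[t - 1]) - 0.2 * x[t - 2] + rng.normal(scale=0.1)

lags = np.column_stack([x[1:-1], x[:-2]])  # (X(t-1), X(t-2))
target = x[2:]

F = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
F.fit(lags, target)
print("one-step R^2:", F.score(lags, target))
```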
I guess the conclusion is that ML methods should probably be considered tools usable to solve statistical problems, in the same sense as the totally non-statistical “least squares” method shows up in the maximum-likelihood estimation of a linear statistical model with Gaussian noise.
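A quick numerical check of that analogy, on synthetic data with scipy's generic optimizer as the MLE engine: maximizing the Gaussian likelihood of a linear model recovers exactly the least-squares coefficients.

```python
# Gaussian MLE of a linear model coincides with least squares.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)
x = rng.normal(size=300)
y = 2.0 * x - 1.0 + rng.normal(scale=0.5, size=300)

def neg_log_likelihood(params):
    a, b, log_sigma = params
    sigma = np.exp(log_sigma)
    resid = y - (a + b * x)
    # Gaussian NLL up to an additive constant.
    return 0.5 * np.sum(resid**2) / sigma**2 + y.size * log_sigma

mle = minimize(neg_log_likelihood, x0=[0.0, 0.0, 0.0]).x
ols = np.polyfit(x, y, 1)      # least squares: [slope, intercept]
print(mle[:2], ols[::-1])      # (a, b) vs (intercept, slope): identical
```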
On the practical side, the separable structure has an economic interpretation. “S” is a generalization of the “efficient market”, where the best thing you can do is diversify. Such an interpretation is obvious if S is linear:
x(t) = A + S*noise(t),
which “is” the efficient market for the return x(t).
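A sketch of why diversification is all there is in this “S”-only world: with independent noise components, the variance of an equal-weight portfolio of n assets shrinks like 1/n (synthetic numbers below).

```python
# In x(t) = A + S*noise(t) with independent noise, diversification is the
# only edge: equal-weight portfolio volatility falls like 1/sqrt(n).
import numpy as np

rng = np.random.default_rng(7)
A, sigma = 0.0005, 0.01
for n in (1, 10, 100):
    noise = rng.normal(scale=sigma, size=(20000, n))
    portfolio = A + noise.mean(axis=1)  # equal weight across n assets
    print(n, portfolio.std())           # ~ sigma / sqrt(n)
```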
The “R” part is the potential source of “alpha”, which is not possible in an efficient market. This is what trading strategies try to extract: the ability to predict the next point from the history.
The confusion between stats and ML, mostly in the “equity” crowd, arises because *classically* equity models operated in the efficient-market setting. Markowitz, CAL, CAPM, MPT, BL etc. are pretty much about the relationship between A and S, or about the structure of the noise components. But as long as you don't have the “R” part, it will be at most about building a portfolio, optimal in some sense. Only if you have the “R” part does ML truly become relevant, even in its linear form, and then you may be talking not about portfolio diversification but about individual trading.
Comments:

James Anthonyrajah, Adjunct Professor of Financial Engineering, NYU Tandon School: The problem with ML is that it always offers a solution, even when it's applied to non-stationary time series. Well, one cannot forecast a random walk. I sometimes ask ML enthusiasts if they check their samples for unit roots, and their answer is “we made a lot of money with our models”... Sure, in a bull market one can make money by eyeballing the charts... :)

Quantitative Analyst in transition: If you could provide a quant model of noise in the markets, that would be impressive.