Machine learning or econometrics?

In recent years, machine learning has seen fruitful development and undeniable success. Econometrics, the branch of economics that uses statistical methods to describe economic systems, seems to have fallen behind the curve. Many predictive modelers, myself included, were trained in econometrics and now use machine learning more often. In this short post I would like to describe the differences between the two and discuss how we can take advantage of both.

To "explain" or "predict"?

It helps to go back to the fundamental definitions. Galit Shmueli, in her article "To Explain or to Predict?", draws the distinction between explanatory and predictive modeling. "Explanatory modeling" uses statistical methods to test causal relationships. In contrast, "predictive modeling" is the process of applying a statistical model or data mining algorithm to data for the purpose of predicting new or future observations. As Jerome H. Friedman said in his article, "one of the most common uses for data is prediction." If the goal is to test a causal hypothesis, the explanatory purpose matters more than the predictive purpose, and vice versa. Econometrics, since its genesis, has placed more emphasis on hypothesis testing and causal identification. Machine learning, in contrast, aims at prediction and has had impressive success.
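
To make the distinction concrete, here is a minimal sketch in Python that uses one simple linear fit in both ways. The data are simulated, and the true slope of 2.0, the noise level, and the 150/50 split are all assumptions chosen for illustration. The explanatory use fits on all the data and asks whether the slope differs from zero; the predictive use fits on one part and measures the error on the held-out part.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Simulated data (assumption): y = 2.0 * x + Gaussian noise.
n = 200
x = [random.uniform(0, 10) for _ in range(n)]
y = [2.0 * xi + random.gauss(0, 3) for xi in x]


def fit_ols(xs, ys):
    """Return (intercept, slope) of a one-variable least-squares fit."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


# Explanatory use: fit on all observations and test H0: slope = 0.
intercept, slope = fit_ols(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
sigma2 = sum(r * r for r in residuals) / (n - 2)  # residual variance
mean_x = statistics.fmean(x)
se_slope = (sigma2 / sum((xi - mean_x) ** 2 for xi in x)) ** 0.5
t_stat = slope / se_slope

# Predictive use: fit on the first 150 points, score on the last 50.
split = 150
b0, b1 = fit_ols(x[:split], y[:split])
squared_errors = [(yi - (b0 + b1 * xi)) ** 2
                  for xi, yi in zip(x[split:], y[split:])]
rmse = statistics.fmean(squared_errors) ** 0.5

print(f"slope={slope:.2f}, t-statistic={t_stat:.1f}, held-out RMSE={rmse:.2f}")
```

The t-statistic answers the explanatory question (is there a relationship?), while the held-out RMSE answers the predictive one (how far off are forecasts for new observations?). Note that on observational data a significant slope does not by itself establish causality.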

Understand the causality in model design. Pursue precision in model estimation

To communicate effectively with the business, both "business sense" and "model accuracy" should be addressed. "Business sense" usually means the causal relationship between the dependent and explanatory variables. How do we balance the two? At the model design stage, we should communicate the factors that identify causality. At the estimation stage, we should adopt the machine learning methods that achieve the highest prediction accuracy. Balancing the two goals helps ensure the overall success of the project.

What can econometrics learn from machine learning?

Hal Varian, the Chief Economist at Google Inc., asks what an econometrician can learn from machine learning, and what machine learning can learn from econometrics. He lists the machine learning techniques (which he calls "new tricks") for econometricians to adopt, and the econometric perspectives that machine learning can consider. The techniques for econometricians are:

  • train-test-validate to avoid overfitting
  • cross validation
  • nonlinear estimation (trees, forests, SVMs, neural nets, etc.)
  • bootstrap, bagging, boosting
  • variable selection (lasso and friends)
  • model averaging
  • computational Bayesian methods (MCMC)
  • tools for manipulating big data (SQL, NoSQL databases)
  • textual analysis
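
As an illustration of one item on this list, cross validation, here is a minimal sketch in plain Python. The helper names (k_fold_indices, fit_line, fit_mean, cross_val_mse) and the simulated data are hypothetical and chosen only for this example. Each fold is held out once, the model is fit on the rest, and the held-out errors are averaged, which gives an estimate of out-of-sample error that a fit on the full sample cannot provide.

```python
import random
import statistics


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Least-squares line; returns a prediction function."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
             / sum((a - mx) ** 2 for a in xs))
    return lambda x: my + slope * (x - mx)


def fit_mean(xs, ys):
    """Baseline that ignores x and always predicts the training mean."""
    my = statistics.fmean(ys)
    return lambda x: my


def cross_val_mse(fit, xs, ys, k=5):
    """Average held-out mean squared error over k folds."""
    scores = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        errors = [(ys[i] - predict(xs[i])) ** 2 for i in held_out]
        scores.append(statistics.fmean(errors))
    return statistics.fmean(scores)


# Simulated data (assumption): y = 1 + 0.5 * x + noise.
rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [1.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]

mse_line = cross_val_mse(fit_line, xs, ys)
mse_mean = cross_val_mse(fit_mean, xs, ys)
print(f"CV MSE, line: {mse_line:.2f}; CV MSE, mean baseline: {mse_mean:.2f}")
```

The same cross_val_mse function can score any candidate model that follows the fit-then-predict pattern, which is what makes it useful for model selection.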

What can machine learning learn from econometrics?

Hal Varian lists the following topics:

  • causal inference -- response to a treatment [manipulation, intervention]
  • confounding variables
  • natural experiments
  • explicit experiments
  • regression discontinuity
  • difference in differences
  • instrumental variables
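
To illustrate one item on this list, difference in differences, here is a minimal sketch in Python. The numbers are hypothetical and exist only for illustration, and the estimate is valid only under the usual assumption that the treated and control groups would have followed parallel trends without the treatment.

```python
import statistics

# Hypothetical outcomes (say, weekly sales) before and after an intervention.
outcomes = {
    ("treated", "before"): [10.0, 12.0, 11.0],
    ("treated", "after"): [15.0, 17.0, 16.0],
    ("control", "before"): [9.0, 10.0, 11.0],
    ("control", "after"): [11.0, 12.0, 13.0],
}


def diff_in_diff(data):
    """Change in the treated group minus change in the control group."""
    mean = {key: statistics.fmean(values) for key, values in data.items()}
    treated_change = mean[("treated", "after")] - mean[("treated", "before")]
    control_change = mean[("control", "after")] - mean[("control", "before")]
    return treated_change - control_change


effect = diff_in_diff(outcomes)

# A naive comparison of the two groups after the intervention ignores the
# fact that the groups already differed beforehand.
naive = (statistics.fmean(outcomes[("treated", "after")])
         - statistics.fmean(outcomes[("control", "after")]))

print(f"difference in differences: {effect}, naive after-only gap: {naive}")
```

In this example the treated group rises by 5 and the control group by 2, so the estimated effect is 3, while the naive after-only comparison gives 4 because it also picks up the pre-existing gap between the groups.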

Many business management problems need both the identification of causal effects and accurate prediction, and current research has emerged from merging the two schools of thought. This merging trend can plausibly be seen in the increasing number of job articles at the American Economic Association conferences in recent years.

Bladimir carabali Hinestroza

PhD in Demography from Unicamp / Interests in ethnic-racial inequalities, spatial inequalities and demographic analysis.

6mo

Thanks, excellent article.

Vinay Surana

Regional Managing Director Asia Pacific, Middle East and Africa at Allianz Partners

8y

Chris....very well written.. I have not seen many authors give 'Business Sense' the same importance as 'Accuracy'. I agree that both the components are equally important. ML can certainly help enhance the confidence level around the predictability.....

Yuanyuan Liu

Tech Executive, Statistician, AI Disruptor

8y

Interesting article. The potential benefits of ML will be its capability to handle immense unstructured data in a model-free, parameter-free flavor. This is potentially against the core values of econometrics though.

Takeshi Yamaguchi

Analytics Executive on Finance, Economics, Insurance and Risk Management, CPCU/CSPA

8y

However, majority of the data in use are man-made products......A chicken and egg problem? But different time has different popularity?

Chris -- This article also helps non-econometricians get a glimpse of what's under the hood.
