Qu Speaker Series
Modular Machine Learning for Model Validation
An Afternoon with Dr. Joseph Simonian
Autonomous Investing
2020 Copyright QuantUniversity LLC.
Hosted By:
Sri Krishnamurthy, CFA, CAP
sri@quantuniversity.com
www.qu.academy
07/22/2020
Online
https://quspeakerseries2.splashthat.com/
QuantUniversity
• Boston-based Data Science, Quant
Finance and Machine Learning
training and consulting advisory
• Trained more than 1000 students in
Quantitative methods, Data Science
and Big Data Technologies using
MATLAB, Python and R
• Building a platform for AI
and Machine Learning Exploration
and Experimentation
For registration information, go to
https://QuSummerSchool.splashthat.com
• Joseph Simonian is the Founder and CEO of Autonomous Investing Solutions.
Prior to that, Joseph was an Investment Strategist at Acadian Asset
Management. Before joining Acadian, Joseph was the director of
Quantitative Research for the Portfolio Research and Consulting Group at
Natixis Investment Managers. Prior to that, he was a principal research
analyst in the Global Institutional Solutions Group at Fidelity Investments. He
was also previously a vice president at JPMorgan Asset Management and
PIMCO.
• Joseph is currently the co-editor of the Journal of Financial Data Science and
Advisory Board member for the Financial Data Professional Institute.
• Joseph holds a Ph.D. from the University of California, Santa Barbara; an
M.A. from Columbia University; as well as a B.A. from the University of
California, Los Angeles.
Modular Machine Learning for Model
Validation: An Application to The
Fundamental Law of Active Management
Joseph Simonian, PhD
Intro
Implementing model validation through a set of interdependent modules that utilizes both traditional
econometrics and data science techniques can produce robust assessments of the predictive effectiveness of
investment signals in an economically intuitive manner.
The proposed methodology, modular machine learning, also answers a number of practical questions that arise
when applying block time series cross-validation, such as how many folds to use and what block size to use
between folds.
It is possible to reinterpret the Fundamental Law of Active Management (FL) as a model validation framework by
expressing its fundamental concepts, the information coefficient (IC) and breadth (BR), using the formal language of data
science.
We introduce an approach towards model validation which we call modular machine learning (MML) and use it to
build a methodology that can be applied to the evaluation of investment signals within the conceptual scheme
provided by the FL. Our framework is modular in two respects:
(1) It comprises independent computational components, each using the output of another as its input, and
(2) It is characterized by the distinct roles played by traditional econometric and data science methodologies.
In formula (1), σ_IC^2 represents the variance of the IC. Although there are various ways to express the variability of the
IC, in this article we build our framework using the formula in (1), as it extends the original FL in a formally
elegant manner with few underlying assumptions.
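The formula labeled (1) did not survive in this transcript. For orientation only, the block below gives the original FL, followed by one extended form that incorporates IC variance. The second expression is an assumption about the kind of formula meant here, not a confirmed reproduction of (1); the symbols IR (information ratio), IC, and BR follow the usage in the text.

```latex
% Original Fundamental Law: information ratio from skill (IC) and breadth (BR)
\mathrm{IR} = \mathrm{IC}\,\sqrt{\mathrm{BR}}

% Assumed extended form with IC variance (not confirmed as the slide's formula (1))
\mathrm{IR} = \frac{\mathrm{IC}}{\sqrt{\dfrac{1}{\mathrm{BR}} + \sigma_{\mathrm{IC}}^{2}}}
```

Under this assumed form, setting σ_IC^2 to zero recovers the original FL, which is one sense in which such an expression extends it.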
MODULAR MACHINE LEARNING
Model validation using MML is implemented through three distinct modules:
• Sub-sample classification module: An econometric model is used to classify the sub-sets of a time
series into distinct regimes.
• Signal quality module: The regime-specific sub-sets of data from the sub-sample classification module are used
as inputs to block time-series cross-validated regressions, to derive regime-specific measures of signal quality.
In the context of the FL, these would be values for the IC and IC variance.
• Signal diversification module: Using signal quality measurements from the signal quality module as inputs,
clustering is used to determine the level of diversity present in a set of investment signals. In the context of
the FL, the number of clusters is used as the value for breadth (BR).
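The signal diversification idea above, counting clusters of signals and using the count as BR, can be sketched roughly as follows. The clustering rule is a stand-in chosen for brevity (signals are linked when their fold-level quality measurements correlate above a cutoff, and clusters are the connected components); the source does not say which clustering method is used. The function names, the 0.8 cutoff, and all numbers are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def count_signal_clusters(quality, threshold=0.8):
    """Count clusters among signals; the count stands in for BR.

    quality: dict mapping signal name -> list of per-fold quality
    measurements (e.g. fold-level ICs). Two signals are linked when their
    series correlate above `threshold` (a hypothetical cutoff); clusters
    are the connected components of the resulting graph.
    """
    names = list(quality)
    parent = {name: name for name in names}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(names, 2):
        if pearson(quality[a], quality[b]) > threshold:
            parent[find(a)] = find(b)

    return len({find(name) for name in names})


# Made-up fold-level ICs for four hypothetical signals.
fold_ics = {
    "value": [0.05, 0.02, 0.07, 0.01],
    "value_alt": [0.06, 0.03, 0.08, 0.02],  # moves in lockstep with "value"
    "momentum": [0.01, 0.06, 0.00, 0.05],
    "quality": [0.04, 0.01, 0.02, 0.05],
}
breadth = count_signal_clusters(fold_ics)
```

Here the two value signals collapse into one cluster, so four nominal signals yield fewer effective bets. A production version would more likely use an established clustering routine; the point of the sketch is only the mapping from cluster count to BR.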
A defining characteristic of MML is its blending of what Breiman [2001] calls the “data modeling culture”
represented by econometrics, with the “algorithmic modeling culture” represented by data science.
The data modeling culture has hitherto been the dominant culture in traditional statistical practice. It is
primarily concerned with providing information about the relationships that exist between input and response
variables, and it assumes that the data are generated by a specific stochastic process.
In contrast, the algorithmic modeling culture assumes that the relationships between input and response
variables are essentially too complex to uncover, and consequently is primarily concerned with devising
methods to successfully predict responses from inputs.
Data science and traditional econometrics each have unique strengths that can prove valuable to the model
validation process.
Data science is directed towards prediction, testing, and pattern recognition.
In contrast, traditional econometrics has a fairly weak track record with regard to producing effective
forecasting tools, but has been more successful in producing formal frameworks that provide ex-post insight
into the structure of economic data.
Advantages
There are distinct advantages in using an econometrically rigorous foundation upon which to base model
validation, both in terms of methodological clarity and technical utility.
The first advantage is that using regime-switching models to classify time series provides the formal means for
the model validation process to speak to the particular “mental models” of the economic and financial world that
portfolio managers use when building investment strategies.
The second advantage of using regime-switching models for sub-sample classification is that when their output
is used as the input to a cross-validation procedure, the length and number of folds are objectively determined
by the regime-switching model being employed. While the various folds belonging to a particular regime will
almost invariably be of different lengths, they will all be of like kind, representing specific economic and market
states. Accordingly, in the context of the FL, this regime-based cross-validation allows for the derivation of
predictive information ratios (IRs) corresponding to different regimes, which can then be considered alone or
combined through various averaging procedures.
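A minimal sketch of this regime-based fold construction, under stated assumptions: regime labels are taken as given (they would come from a fitted regime-switching model), each contiguous run of one label becomes a fold, and per-regime IRs are then combined by one possible averaging rule, weighting by observation count. The IR expression used is the assumed form IC / sqrt(1/BR + variance of IC), not a confirmed reproduction of the talk's formula; all names and numbers are hypothetical.

```python
from itertools import groupby
from math import sqrt
from statistics import mean, pvariance


def regime_folds(labels):
    """Split a regime label sequence into contiguous runs.

    Returns a dict: regime -> list of (start, stop) index pairs, stop
    exclusive. Each run is one fold; runs differ in length but share a regime.
    """
    folds, start = {}, 0
    for regime, run in groupby(labels):
        length = len(list(run))
        folds.setdefault(regime, []).append((start, start + length))
        start += length
    return folds


def information_ratio(ics, breadth):
    """IR from fold-level ICs, using the ASSUMED form IC / sqrt(1/BR + var(IC))."""
    return mean(ics) / sqrt(1.0 / breadth + pvariance(ics))


# Hypothetical regime labels, e.g. the output of a fitted regime-switching model.
labels = ["calm"] * 3 + ["stress"] * 2 + ["calm"] * 4 + ["stress"] * 1
folds = regime_folds(labels)

# Made-up fold-level ICs, one per fold above (in practice these would come
# from the cross-validated regressions run on each fold).
fold_ics = {"calm": [0.06, 0.04], "stress": [0.01, -0.01]}
breadth = 4  # hypothetical BR

ir_by_regime = {r: information_ratio(ics, breadth) for r, ics in fold_ics.items()}

# One possible averaging procedure: weight each regime by its observation count.
obs = {r: sum(stop - start for start, stop in runs) for r, runs in folds.items()}
combined_ir = sum(ir_by_regime[r] * obs[r] for r in obs) / sum(obs.values())
```

Note that the number and length of folds fall out of the label sequence itself; nothing in the sketch requires the user to pick a fold count.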
The third advantage of using regime-switching models in model validation relates to the block size in
block time-series cross-validation. Time series cross-validation is intended to address the “memory” of the past
that is embedded in data exhibiting chronological dependencies. It differs from standard types of
cross-validation in its requirement that training data temporally precede test data. Block time-series
cross-validation has also been introduced; it is defined by the omission of some observations (blocks) lying
between training and test samples during cross-validation, in order to reinforce the “memoryless” nature of
the operation. When blocking is employed, the question of block size naturally arises, but time-series
cross-validation by itself does not provide a clear-cut way to choose a value. Traditional regime-switching models,
on the other hand, can provide some guidance in this respect, as they specify the order of the
autoregressive process that drives regimes. This order seems to be a natural choice for the
size of the blocks that are inserted between training and test samples.
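The blocking scheme described above can be sketched as a small split generator. This is an illustrative sketch, not the speaker's implementation: the function name, the expanding-window layout, and the example sizes are assumptions, and the gap is simply set to a hypothetical autoregressive order.

```python
def blocked_time_series_splits(n_obs, n_splits, test_size, gap):
    """Yield (train_indices, test_indices) with `gap` observations omitted.

    Training data always precede test data, and the `gap` observations
    immediately before each test window are left out of training.
    """
    for k in range(n_splits):
        test_start = n_obs - (n_splits - k) * test_size
        train_stop = test_start - gap
        if train_stop <= 0:
            raise ValueError("not enough observations for this split")
        train = list(range(train_stop))
        test = list(range(test_start, test_start + test_size))
        yield train, test


# Hypothetical AR order taken from a fitted regime-switching model.
ar_order = 2
splits = list(
    blocked_time_series_splits(n_obs=12, n_splits=3, test_size=2, gap=ar_order)
)
```

With 12 observations, the first split trains on indices 0 to 3, skips 4 and 5, and tests on 6 and 7. scikit-learn's `TimeSeriesSplit` offers a comparable `gap` parameter for readers who prefer a library routine.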
Thanks!
Questions, contact:
Joe Simonian, PhD
Demos, slides and video available on QuAcademy
Go to www.qu.academy
Stay tuned for more speakers in the
Qu Speaker Series this summer!
https://qusummerschool.splashthat.com/
Next week!
Model Validation for
Machine learning models
Thank you!
Sri Krishnamurthy, CFA, CAP
Founder and CEO
QuantUniversity LLC.
srikrishnamurthy
www.QuantUniversity.com
Contact
Information, data and drawings embodied in this presentation are strictly a property of QuantUniversity LLC. and shall not be
distributed or used in any other publication without the prior written consent of QuantUniversity LLC.