MANAGING UNCERTAINTY
IN AI PERFORMANCE
TARGET SETTING
Key methods
• Monte Carlo Simulation
• AI model ‘calibration’
HOW MUCH ACCURACY
DOES YOUR PRODUCT
REQUIRE?
WORKING BACKWARD FROM
PRODUCT REQUIREMENTS
•In classical AI applications, such as CIFAR and MNIST, Kaggle contests
and other online challenges, we are accustomed to hearing about
algorithms with accuracy in the 95–99%+ range
•Frequently, predictive solutions have built upon one another over a
period of years, sometimes decades, with new state-of-the-art models
improving performance by a fraction of a percentage point
FOR MANY APPLICATIONS WITH NEW
DATASETS THE RETURN ON
INVESTMENT AND WAITING PERIOD TO
REACH THE 99TH PERCENTILE IN
ACCURACY IS PROHIBITIVE
• New datasets require long periods of data exploration to identify and eliminate errors in the data and
the data collection process
• Frequently, basic models can reach acceptable baseline levels of accuracy within a time period that is
acceptable for early prototype development
• Not all applications require the highest level of accuracy
• Models with higher and higher levels of accuracy can have diminishing returns, as they take longer
and are more expensive to train, especially when using cloud services such as AWS
• Simple models such as MLP and KNN can be implemented quickly using tools like scikit-learn and can
achieve decent results
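As a minimal sketch of that last point, the snippet below fits a KNN and a small MLP baseline with scikit-learn. It uses the library's bundled `digits` dataset (a small MNIST-like set) as a stand-in, since the deck's actual data is not available; hyperparameters are illustrative defaults, not tuned choices.

```python
# Quick baselines: KNN and a small MLP on the scikit-learn digits dataset.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                    random_state=0).fit(X_train, y_train)

knn_acc = knn.score(X_test, y_test)
mlp_acc = mlp.score(X_test, y_test)
print(f"KNN accuracy: {knn_acc:.3f}")
print(f"MLP accuracy: {mlp_acc:.3f}")
```

A few lines of code like this are often enough to establish whether a baseline clears the accuracy bar the product actually needs.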
RESULTS OF SIMPLE MODELS
APPLIED TO MNIST
HOW CAN YOU DETERMINE
WHEN YOU NEED TO INVEST
IN DEVELOPING MORE
COMPLEX METHODS?
HIGHEST RISK VS. LOWER RISK
APPLICATIONS
•For many applications the risks are clear and the lowest levels of
error possible are desirable
 • Self-driving cars
 • Medical applications
•For many applications, the risk associated with an error is not fatal,
and the costs associated with 99+ percentile accuracy are large. In
some cases, decision boundaries are not clear to human observers
and/or labels (such as appraisal values) are not agreed upon. These
types of applications are frequent when business, financial, or
economic subject matter is the target of a prediction problem, but
can also appear in other low-risk applications such as chatbots,
where occasional errors may not dissuade prospects from converting
to sales.
IDEA – USE MONTE CARLO
SIMULATION
• Use Monte Carlo simulation to model algorithm performance on real data before developing the algorithm
• For example, you can assess the impact of different levels of accuracy on your product's performance before
investing time and money into developing an AI algorithm
PROJECTED PERFORMANCE
SIMULATION
•Select a percentage of known labels, in this hypothetical case “buy”
recommendations for hypothetical stocks with returns above a threshold,
and create an “AI”-selected data set by randomly sampling 1 − p negative
examples to be mislabeled by the hypothetical AI
•Create a model of your product or business performance
•Simulate the performance of the product or business using Monte Carlo
trials. In this case a portfolio of 50 hypothetical stocks was chosen by the AI
and compared to one chosen by a hypothetical human, with some
information, from the same universe of stocks
•The probability distribution of false identifications in feature space can be
specified and tested for distributions with the same mean precision
•In this hypothetical example, an algorithm with 99% accuracy would be a
good target, but one should consider whether or not 80–90% would be sufficient
•One should also consider what level of accuracy is possible in the first place
(for example, by considering variability between human experts, in light of Big Data)
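The steps above can be sketched in a few lines of NumPy. Everything here is a hypothetical assumption standing in for the deck's unpublished setup: the return distribution, the buy threshold, the portfolio size, and the way label errors are injected (flipping each label with probability 1 − accuracy).

```python
# Hypothetical Monte Carlo sketch: how does AI labeling accuracy translate
# into portfolio returns? All parameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
N_STOCKS, N_TRIALS, PORTFOLIO = 1000, 500, 50
THRESHOLD = 0.05  # a "buy" label means true return above this

def simulate(accuracy):
    """Mean portfolio return when the AI labels stocks with given accuracy."""
    results = []
    for _ in range(N_TRIALS):
        true_ret = rng.normal(0.0, 0.1, N_STOCKS)   # hypothetical returns
        is_buy = true_ret > THRESHOLD               # ground-truth labels
        # Flip each label with probability 1 - accuracy to model AI errors.
        flip = rng.random(N_STOCKS) > accuracy
        ai_buy = np.where(flip, ~is_buy, is_buy)
        picks = rng.choice(np.flatnonzero(ai_buy), PORTFOLIO, replace=False)
        results.append(true_ret[picks].mean())
    return float(np.mean(results))

for acc in (0.80, 0.90, 0.99):
    print(f"accuracy {acc:.0%}: mean portfolio return {simulate(acc):+.3f}")
```

Running this shows how steeply (or not) the business metric degrades below 99% accuracy, which is exactly the information needed to set a defensible target.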
INCOMPLETE DATA
•Another question companies frequently face is whether or not the
cost and time required to gather additional data will significantly
improve model performance
•Concept: utilize simulation on existing data to estimate performance
improvements
•Can sub-sample from data to simulate missing data, either in feature
or label space
•Eliminate field entries or entire examples and track degradation of
algorithm performance
•If performance does not decrease significantly, then more data is
unlikely to be helpful
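One minimal way to run this subsampling experiment is a learning curve: train on nested fractions of the existing data and watch test accuracy. The dataset and model below are stand-ins (scikit-learn's `digits` with logistic regression), chosen only to make the sketch self-contained.

```python
# Sketch: estimate whether more data would help by training on nested
# subsamples of the data you already have.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)  # split is shuffled, so slices
                                          # of X_train are random subsamples
scores = {}
for frac in (0.1, 0.25, 0.5, 1.0):
    n = int(frac * len(X_train))
    clf = LogisticRegression(max_iter=2000).fit(X_train[:n], y_train[:n])
    scores[frac] = clf.score(X_test, y_test)
    print(f"{frac:>4.0%} of data -> accuracy {scores[frac]:.3f}")
```

If the curve has already flattened as the fraction approaches 100%, collecting more data of the same kind is unlikely to move the needle.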
NEW DATA
Some population data may be available for target populations at a high level,
but predicting labels for individuals from the population requires data to be gathered and significant
predictive features to be identified. Companies need to decide if the investment is worth it.
For example: should we gather data to predict income in the U.S. or Canada first? One can simulate performance to
determine which country would be more profitable to predict on a per-capita basis, given product or business assumptions.
For example, targeted advertising based on predicted income. Different distribution assumptions can be tested.
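A toy version of that market comparison might look like the following. Every number here is invented: the log-normal income assumption, the median incomes and spreads for the two hypothetical markets, and the product economics (price, and the income level above which someone is treated as a likely buyer).

```python
# Hypothetical sketch: compare two markets using only population-level
# distribution assumptions, before collecting any individual data.
import numpy as np

rng = np.random.default_rng(1)

def revenue_per_capita(median_income, sigma, price=30.0,
                       buy_above=80_000, n=100_000):
    """Monte Carlo estimate of per-capita revenue under a log-normal
    income assumption, if targeting reached everyone above the threshold."""
    incomes = rng.lognormal(mean=np.log(median_income), sigma=sigma, size=n)
    buyer_fraction = (incomes > buy_above).mean()
    return buyer_fraction * price

# Purely illustrative parameters for two hypothetical markets.
rev_a = revenue_per_capita(median_income=46_000, sigma=0.9)
rev_b = revenue_per_capita(median_income=42_000, sigma=0.6)
print(f"Market A per-capita revenue: ${rev_a:.2f}")
print(f"Market B per-capita revenue: ${rev_b:.2f}")
```

Swapping in different distribution assumptions (heavier tails, different medians) is a one-line change, which is what makes this kind of pre-investment simulation cheap.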
ALL OF THIS CAN BE
ACCOMPLISHED BEFORE AN
ALGORITHM IS DEVELOPED OR
DURING EARLY STAGES OF
ALGORITHM DEVELOPMENT
As we can see from the error rates of these simple algorithms on MNIST data,
which can be rapidly prototyped using existing packages, a product prototype can be
built while weighing the added benefits of further development on the dataset we
need to work with, by simulating the performance within our business or product
model.
MODEL CALIBRATION
Guo, C., Pleiss, G., Sun, Y., Weinberger, K. Q. (2017) On Calibration of Modern Neural Networks.
Proceedings of the 34th International Conference on Machine Learning, PMLR 70, pp. 1321–1330
IMPORTANCE OF CALIBRATION
• Useful when decisions need to be made or risks need to be assessed at the level of single predictions
• For example, in human–AI collaboration paradigms in which human assistance is requested for cases where
machine confidence falls below a threshold
• Investors buying single art works require risk assessments on a per-item basis
• Current calibration methods, as reviewed in the referenced article, assess calibration across the whole feature space
• However, there is no reason to assume that an algorithm is equally well calibrated across all subsets of the data
• For example, there have been many cases in which facial recognition and sentiment analysis fail for protected
subgroups
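The per-subgroup concern can be made concrete with expected calibration error (ECE), the binned metric used in the referenced Guo et al. paper, computed separately for each subgroup. The data below is synthetic by construction (group 1 is deliberately made overconfident); in practice you would plug in your model's held-out confidences, correctness indicators, and a subgroup attribute.

```python
# Sketch: expected calibration error (ECE) per subgroup, not just overall.
import numpy as np

def ece(confidences, correct, n_bins=10):
    """Confidence-weighted gap between mean confidence and accuracy
    within each confidence bin (expected calibration error)."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            total += mask.mean() * gap
    return total

rng = np.random.default_rng(0)
n = 10_000
group = rng.integers(0, 2, n)            # synthetic subgroup label
conf = rng.uniform(0.5, 1.0, n)          # model confidences
# Group 0 is well calibrated; group 1's accuracy lags its confidence.
p_correct = np.where(group == 0, conf, conf - 0.15)
correct = rng.random(n) < p_correct

print(f"overall ECE: {ece(conf, correct):.3f}")
for g in (0, 1):
    m = group == g
    print(f"group {g} ECE: {ece(conf[m], correct[m]):.3f}")
```

A respectable overall ECE can hide a badly miscalibrated subgroup, which is exactly the failure mode the bullet above warns about.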
