Confidence in Software Cost Estimation Results based on MMRE and PRED
Presentation for PROMISE 2008
Marcel Korte [email_address]
Dan Port, University of Hawai'i at Manoa, Phone: +1-(808)-956-7494 [email_address]

Table of Contents (13 May 2008)
  • Introduction
  • Approach
  • The Standard Error
  • Bootstrapping
  • The Confidence Intervals
  • Datasets and models used
  • Ex.: Bootstrapped MMREs
  • Accounting for Standard Error
  • How much confidence needed?
  • The Desharnais Problem
  • Conclusion
  • Invitation for collaboration

Introduction
  • Large number of cost estimation research efforts over the last 20+ years
  • Still a lack of confidence in such research results
  • Average overrun of software projects is 30%-40% (Moløkken, Jørgensen)
  • Various studies show inconclusive and/or contradictory results

Approach
  • Software cost estimation research is based on one or more datasets
  • Yet datasets are samples, perhaps significantly biased, often outdated, and of questionable relevance
  • Empirical results based on small datasets are generalized to an entire population without considering the possible inherent error
  • Question: How accurate is my accuracy?
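
The slides do not spell out the accuracy measures being questioned, so the following is a minimal sketch using the conventional definitions: MRE = |actual - estimate| / actual, MMRE is the mean MRE over all projects, and PRED(l) is the fraction of projects whose MRE is at most l. The function names and the effort values are hypothetical and chosen only for illustration.

```python
from statistics import mean


def mre(actual, estimate):
    """Magnitude of relative error for a single project."""
    return abs(actual - estimate) / actual


def mmre(actuals, estimates):
    """Mean magnitude of relative error over all projects (lower is better)."""
    return mean(mre(a, e) for a, e in zip(actuals, estimates))


def pred(actuals, estimates, level=0.25):
    """Fraction of projects whose MRE is at most `level` (higher is better)."""
    errors = [mre(a, e) for a, e in zip(actuals, estimates)]
    return sum(err <= level for err in errors) / len(errors)


# Hypothetical efforts in person-months; not taken from any PROMISE dataset.
actuals = [100.0, 200.0, 50.0, 400.0]
estimates = [110.0, 150.0, 80.0, 400.0]

print("MMRE:", mmre(actuals, estimates))            # about 0.2375
print("PRED(.25):", pred(actuals, estimates, 0.25))
```

Both values are point estimates computed from one sample of projects, which is why the talk asks how much error those estimates themselves carry.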

The Standard Error
  • Widely used in many fields of research and well understood
  • Measure of the error in calculations based on sample population datasets
  • Has not yet been used in the field of software cost estimation
  • Many confusing, inconclusive, or contradictory results can be illuminated by indicating that we cannot "have confidence" in them

Bootstrapping
  • General problem: distribution not known
  • "Computer-intensive" technique similar to the Monte Carlo method
  • Resampling with replacement to "reconstruct" the general population distribution
  • Well-accepted, straightforward approach to approximating the standard error of an estimator
  • We used 15,000 iterations in this study
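
The resampling idea above can be sketched as follows. This is not the authors' code: the function names, the fixed seed, and the MRE values are assumptions made for illustration. Only the iteration count of 15,000 comes from the slide. The bootstrap standard error is taken as the standard deviation of the statistic across the resamples.

```python
import random
from statistics import stdev


def mean_of(values):
    """Plain arithmetic mean; applied to MREs this is the MMRE."""
    return sum(values) / len(values)


def bootstrap_replicates(sample, statistic, iterations=15000, seed=0):
    """Recompute `statistic` on resamples drawn with replacement."""
    rng = random.Random(seed)  # fixed seed so a run can be replicated
    n = len(sample)
    return [statistic(rng.choices(sample, k=n)) for _ in range(iterations)]


def bootstrap_standard_error(sample, statistic=mean_of, iterations=15000, seed=0):
    """Standard deviation of the bootstrap replicates of `statistic`."""
    return stdev(bootstrap_replicates(sample, statistic, iterations, seed))


# Hypothetical per-project MREs, skewed to the right as MREs usually are.
mres = [0.05, 0.10, 0.12, 0.20, 0.25, 0.40, 0.90, 1.80]

print("MMRE:", mean_of(mres))
print("bootstrap standard error of MMRE:", bootstrap_standard_error(mres))
```

Any statistic can be passed in place of `mean_of`, for example a PRED function over the same MREs, because the procedure makes no assumption about the underlying distribution.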

The Confidence Intervals
  • MREs are not normally distributed
  • Underlying distribution is not known
  • The BC-percentile, or "bias corrected", method has been shown effective in approximating confidence intervals for the available distributions
  [Figure: Histogram of bootstrapped MMRE and log-transformed MMRE for model (A), NASA93 dataset]
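
A minimal sketch of a bias-corrected percentile interval follows, assuming the usual formulation: the bias correction z0 is the standard normal quantile of the share of bootstrap replicates that fall below the original estimate, and the interval endpoints are read from the replicate distribution at the shifted percentiles Φ(2·z0 + z_(α/2)) and Φ(2·z0 + z_(1-α/2)). The function names, the seed, and the data are hypothetical, and the slides do not say which implementation was used.

```python
import random
from statistics import NormalDist


def mean_of(values):
    return sum(values) / len(values)


def bc_percentile_interval(sample, statistic=mean_of, confidence=0.95,
                           iterations=15000, seed=0):
    """Bias-corrected (BC) percentile bootstrap confidence interval."""
    rng = random.Random(seed)
    n = len(sample)
    estimate = statistic(sample)
    reps = sorted(statistic(rng.choices(sample, k=n)) for _ in range(iterations))

    # Share of replicates below the original estimate, kept strictly inside
    # (0, 1) so that the normal quantile is defined.
    below = sum(r < estimate for r in reps) / iterations
    below = min(max(below, 1 / iterations), 1 - 1 / iterations)

    std_normal = NormalDist()
    z0 = std_normal.inv_cdf(below)  # bias-correction constant
    alpha = 1 - confidence
    low_p = std_normal.cdf(2 * z0 + std_normal.inv_cdf(alpha / 2))
    high_p = std_normal.cdf(2 * z0 + std_normal.inv_cdf(1 - alpha / 2))

    def quantile(p):
        index = min(max(int(p * iterations), 0), iterations - 1)
        return reps[index]

    return quantile(low_p), quantile(high_p)


# Hypothetical per-project MREs.
mres = [0.05, 0.10, 0.12, 0.20, 0.25, 0.40, 0.90, 1.80]
low, high = bc_percentile_interval(mres)
print("95% BC interval for MMRE:", low, high)
```

When half of the replicates fall below the estimate, z0 is zero and the result reduces to the plain percentile interval. The correction only matters when the bootstrap distribution is shifted relative to the estimate.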

Datasets and models used
  • PROMISE datasets: COCOMO81*, COCOMONASA, NASA93, and Desharnais*
  • Models:
      A: ln_LSR_CAT**
      B: aSb
      C: given_EM
      D: ln_LSR_aSb
      E: ln_LSR_EM
      F: LSR_a+Sb
  * Some errors found and corrected in these datasets
  ** Purely statistical model
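
The slide gives only model names, so the following rests on an interpretation rather than on anything the authors state: it reads "aSb" as the COCOMO-style form effort = a · Size^b and "ln_LSR" as ordinary least-squares regression on log-transformed data. Under that assumption, a model such as D could be calibrated roughly like this. The function name and the data are hypothetical.

```python
import math


def fit_power_law(sizes, efforts):
    """Fit effort = a * size**b by least squares on ln(effort) vs ln(size)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in efforts]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope in log space is the exponent
    a = math.exp(y_bar - b * x_bar)    # intercept in log space is ln(a)
    return a, b


# Hypothetical calibration data: size in KSLOC, effort in person-months.
sizes = [10.0, 20.0, 50.0, 100.0]
efforts = [38.0, 81.0, 220.0, 470.0]

a, b = fit_power_law(sizes, efforts)
print("a =", a, "b =", b)
print("estimate for 30 KSLOC:", a * 30.0 ** b)
```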

Bootstrapped MMRE intervals 1/2
  [Figures: COCOMO81 dataset; COCOMONASA dataset]

Bootstrapped MMRE intervals 2/2
  [Figures: NASA93 dataset; Desharnais dataset*]
  * Note: only D & F used, with FP raw and FP adj

Accounting for Standard Error

  Model ranking based on MMRE, not accounting for Standard Error:

  Rank  COCOMO81  COCOMONASA  NASA93
  1.    A         A           A
  2.    E         E           E
  3.    C         C           C
  4.    B         D           B
  5.    D         B           D

  Model ranking based on MMRE, accounting for Standard Error at 95% confidence level:

  Rank  COCOMO81  COCOMONASA  NASA93
  1.    A         A           A, B, C, D, E
  2.    C, E      E           -
  3.    B, D      B, C, D     -
  4.    -         -           -
  5.    -         -           -
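
One way to arrive at tied ranks like those in the second table is to group models whose confidence intervals overlap. The slides do not describe the exact grouping rule, so the sketch below is an assumption: models are sorted by point MMRE, and a model joins the current rank group when its interval overlaps the interval of that group's best model. The interval values are invented for illustration and are not the study's results.

```python
def rank_with_intervals(intervals):
    """Group models into ranks; overlapping intervals share a rank.

    `intervals` maps a model name to (low, point, high) for its MMRE.
    Lower MMRE is better, so groups are ordered by ascending point estimate.
    """
    ordered = sorted(intervals, key=lambda name: intervals[name][1])
    groups = []
    for name in ordered:
        low = intervals[name][0]
        if groups:
            leader_high = intervals[groups[-1][0]][2]
            if low <= leader_high:      # overlaps the group's best model
                groups[-1].append(name)
                continue
        groups.append([name])
    return groups


# Hypothetical 95% intervals (low, point, high); not taken from the slides.
intervals = {
    "A": (0.30, 0.35, 0.40),
    "B": (0.70, 0.80, 0.95),
    "C": (0.50, 0.55, 0.63),
    "D": (0.72, 0.85, 1.00),
    "E": (0.45, 0.52, 0.60),
}

for rank, group in enumerate(rank_with_intervals(intervals), start=1):
    print(f"{rank}. {', '.join(group)}")
```

Pairwise overlap is a crude significance test. It does not adjust for the number of comparisons being made.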

How much confidence needed?
  [Figure: Bootstrapped PRED(.30) intervals with significant differences (32% confidence level, COCOMONASA dataset)*]
  [Figure: Bootstrapped PRED(.30) intervals (COCOMONASA dataset)]
  * This is a very crude example. There are more refined approaches that account for simultaneous (ANOVA-like) comparisons.
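
The trade-off behind this slide is that lowering the confidence level narrows each interval, so more model differences appear significant. A minimal sketch, assuming the plain percentile method rather than the BC method for brevity, with hypothetical MRE values and function names:

```python
import random


def pred(mres, level=0.30):
    """Fraction of MREs that are at most `level`."""
    return sum(m <= level for m in mres) / len(mres)


def percentile_interval(mres, confidence, level=0.30, iterations=5000, seed=0):
    """Plain percentile bootstrap interval for PRED(level)."""
    rng = random.Random(seed)
    n = len(mres)
    reps = sorted(pred(rng.choices(mres, k=n), level) for _ in range(iterations))
    tail = (1 - confidence) / 2
    low = reps[int(tail * iterations)]
    high = reps[min(int((1 - tail) * iterations), iterations - 1)]
    return low, high


# Hypothetical per-project MREs for one model.
mres = [0.05, 0.10, 0.12, 0.20, 0.25, 0.28, 0.35, 0.40, 0.90, 1.80]

for confidence in (0.95, 0.68, 0.32):
    low, high = percentile_interval(mres, confidence)
    print(f"{confidence:.0%} interval for PRED(.30): [{low:.2f}, {high:.2f}]")
```

The same seed is used at every confidence level, so all three intervals are read from the same set of replicates and differ only in how much of each tail is cut off.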

The Desharnais Problem

  Model ranking not accounting for Standard Error (Desharnais, FP adj) implies contradictory results:

  Rank  MMRE  PRED(.25)
  1.    F     D
  2.    D     F

  Model ranking accounting for Standard Error (Desharnais, FP adj). No confident interpretation is possible based on the Desharnais dataset and models D, F:

  Rank  MMRE  PRED(.25)
  1.    F, D  F, D
  2.    -     -

Conclusions 1/2
  • We applied standard, easily analyzed and replicated statistical methods: Standard Error, Bootstrapping
  • The approach has potential for increasing confidence in research results and cost estimation practice
  • Use of Standard Error can help address:
      - How can we meaningfully interpret intuitively appealing accuracy measure research results?
      - How to make valid statistical inferences (i.e. significant) for results based on comparing PRED or MMRE values
      - Estimating how many data points are needed for confident results

Conclusions 2/2
  • The different behaviors of MMRE and PRED (expanded on in the ESEM 2008 paper)
  • Determination of an adequate sample size for model calibration
  • Understanding how sample size affects model accuracy
  • Can "bad" calibration data be identified?
  • If doing model validation studies using random methods (such as jackknife, holdouts, or bootstrap), how many iterations are needed for stable results?
  • Why are some cost estimation study results contradictory, and how can these be resolved?

Invitation for collaboration
  • ESEM08 paper: "Comparative Studies of the Model Evaluation Criterions MMRE and PRED in Software Cost Estimation Research" (Port, Korte)
  • There is much interesting work still to be done in this area, such as:
      - Standard error studies of non-COCOMO models
      - Refinement of "how much data is enough?" methods
      - Standard error studies of the "deviation" problem (i.e. variance in model parameters) (Menzies, et al.)
      - Validation of model selection when reducing parameters (Menzies, et al.)
      - Applying standard statistical methods for model accuracy (e.g. MSE, least-likelihood estimators)
  • As suggested by Tim Menzies, we are keen to "crowd source" this research, so if this presentation has inspired you in some way, contact Dan Port (dport@hawaii.edu) and let's discuss possible collaborations!

Thank you!
Marcel Korte [email_address]
Dan Port, University of Hawai'i at Manoa, Phone: +1-(808)-956-7494 [email_address]
