Confidence in Software Cost Estimation Results based on MMRE and PRED

Presentation for PROMISE 2008
Marcel Korte, [email_address]
Dan Port, University of Hawai'i at Manoa, Phone: +1-(808)-956-7494, [email_address]

13 May 2008
Table of Contents
- Introduction
- Approach
- The Standard Error
- Bootstrapping
- The Confidence Intervals
- Datasets and models used
- Ex.: Bootstrapped MMREs
- Accounting for Standard Error
- How much confidence needed?
- The Desharnais Problem
- Conclusion
- Invitation for collaboration

Introduction
- A large number of cost estimation research efforts over the last 20+ years
- Still a lack of confidence in such research results
- Average overrun of software projects is 30% - 40% (Moløkken, Jørgensen)
- Various studies show inconclusive and/or contradictory results

Approach
- Software cost estimation research is based on one or more datasets
- Yet datasets are samples, perhaps significantly biased, often outdated, and of questionable relevance
- Empirical results based on small datasets are generalized to an entire population without considering the possible inherent error
- Question: How accurate is my accuracy?
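For reference, the two accuracy measures named in the title are usually defined as below. The slides do not spell them out, so the notation here is an assumption taken from common usage: $y_i$ is the actual effort of project $i$, $\hat{y}_i$ the estimated effort, $n$ the number of projects, and $l$ the PRED threshold (for example 0.25 or 0.30).

```latex
\mathrm{MRE}_i = \frac{|y_i - \hat{y}_i|}{y_i}, \qquad
\mathrm{MMRE} = \frac{1}{n}\sum_{i=1}^{n} \mathrm{MRE}_i, \qquad
\mathrm{PRED}(l) = \frac{1}{n}\,\bigl|\{\, i : \mathrm{MRE}_i \le l \,\}\bigr|
```

Both are statistics of a sample of projects, which is why the question of their sampling error arises.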

The Standard Error
- Widely used in many fields of research and well understood
- A measure of the error in calculations based on datasets sampled from a population
- Has not yet been used in the field of software cost estimation
- Many confusing, inconclusive, or contradictory results can be illuminated by indicating that we cannot “have confidence” in them
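As a point of orientation only: if $\mathrm{MRE}_i$ denotes the magnitude of relative error of project $i$ and MMRE is the mean of the $n$ values, then MMRE is a sample mean, and the textbook estimate of its standard error is the one below. This is a standard formula, not one shown in the slides, which estimate the standard error by bootstrapping instead.

```latex
\widehat{\mathrm{SE}}(\mathrm{MMRE}) = \frac{s_{\mathrm{MRE}}}{\sqrt{n}}, \qquad
s_{\mathrm{MRE}}^{2} = \frac{1}{n-1}\sum_{i=1}^{n}\bigl(\mathrm{MRE}_i - \mathrm{MMRE}\bigr)^{2}
```

The $\sqrt{n}$ in the denominator is the reason small datasets give wide uncertainty around a reported MMRE.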

Bootstrapping
- General problem: the distribution is not known
- “Computer-intensive” technique similar to the Monte Carlo method
- Resampling with replacement to “reconstruct” the general population distribution
- Well-accepted, straightforward approach to approximating the standard error of an estimator
- We used 15,000 iterations in this study
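The resampling idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the seed, and the effort values are all made up, and the example uses 2,000 iterations only to keep it fast (the study used 15,000).

```python
import random
import statistics


def mmre(actual, predicted):
    """Mean magnitude of relative error over paired actual/predicted efforts."""
    return statistics.fmean(abs(a - p) / a for a, p in zip(actual, predicted))


def bootstrap_mmre(actual, predicted, iterations=15000, seed=0):
    """Resample projects with replacement and return one MMRE per resample."""
    rng = random.Random(seed)
    pairs = list(zip(actual, predicted))
    n = len(pairs)
    replicates = []
    for _ in range(iterations):
        # Draw n projects with replacement from the original n projects.
        sample = [pairs[rng.randrange(n)] for _ in range(n)]
        sample_actual, sample_predicted = zip(*sample)
        replicates.append(mmre(sample_actual, sample_predicted))
    return replicates


def standard_error(replicates):
    """Bootstrap standard error: standard deviation of the replicates."""
    return statistics.stdev(replicates)


# Hypothetical effort values, invented purely for illustration.
actual = [120.0, 45.0, 300.0, 80.0, 15.0, 210.0, 60.0, 95.0]
predicted = [100.0, 60.0, 250.0, 85.0, 25.0, 260.0, 40.0, 90.0]

reps = bootstrap_mmre(actual, predicted, iterations=2000)
print("MMRE:", mmre(actual, predicted))
print("bootstrap standard error:", standard_error(reps))
```

Resampling whole projects (actual and predicted together) keeps each estimate paired with its own actual value, which is what makes each replicate a plausible alternative dataset.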

The Confidence Intervals
- MREs are not normally distributed
- The underlying distribution is not known
- The BC-percentile (“bias-corrected”) method has been shown effective in approximating confidence intervals for the available distributions
[Figure: Histogram of bootstrapped MMRE and log-transformed MMRE for model (A), NASA93 dataset]
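A bias-corrected percentile interval can be sketched as follows. This follows the usual textbook form of the BC method (shift the percentile cut-offs by twice the bias-correction constant); whether the study used exactly this variant is an assumption. The MRE values, seed, and function name are hypothetical.

```python
import random
import statistics
from statistics import NormalDist


def bc_percentile_interval(replicates, point_estimate, confidence=0.95):
    """Bias-corrected percentile interval from bootstrap replicates."""
    nd = NormalDist()
    ordered = sorted(replicates)
    n = len(ordered)
    # Fraction of replicates below the original estimate, clamped so that
    # the inverse normal CDF stays finite.
    below = sum(1 for r in ordered if r < point_estimate)
    frac = min(max(below / n, 1 / (n + 1)), n / (n + 1))
    z0 = nd.inv_cdf(frac)  # bias-correction constant
    alpha = 1 - confidence
    # Shifted percentile levels replace the plain alpha/2 and 1 - alpha/2.
    low_level = nd.cdf(2 * z0 + nd.inv_cdf(alpha / 2))
    high_level = nd.cdf(2 * z0 + nd.inv_cdf(1 - alpha / 2))
    low_index = min(n - 1, max(0, int(low_level * n)))
    high_index = min(n - 1, max(0, int(high_level * n)))
    return ordered[low_index], ordered[high_index]


# Hypothetical, right-skewed MRE values, invented purely for illustration.
mres = [0.05, 0.06, 0.17, 0.17, 0.24, 0.33, 0.33, 0.67, 1.10, 0.12]
mmre_hat = statistics.fmean(mres)

rng = random.Random(1)
reps = [statistics.fmean(rng.choices(mres, k=len(mres))) for _ in range(5000)]

low, high = bc_percentile_interval(reps, mmre_hat)
print("MMRE:", mmre_hat, "95% BC interval:", (low, high))
```

When the replicates are centered on the original estimate, `z0` is close to zero and the result reduces to the plain percentile interval.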

Datasets and models used
- PROMISE datasets: COCOMO81*, COCOMONASA, NASA93, and Desharnais*
- Models:
  A: ln_LSR_CAT**
  B: aSb
  C: given_EM
  D: ln_LSR_aSb
  E: ln_LSR_EM
  F: LSR_a+Sb
* Some errors found and corrected in these datasets
** Purely statistical model
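The slides do not define the model names. Reading "ln_LSR_aSb" as a least-squares regression of effort = a * Size^b fitted on log-transformed data is an assumption, and the sketch below illustrates only that reading. The function name and the data are hypothetical.

```python
import math


def fit_power_law(sizes, efforts):
    """Fit effort = a * size**b by ordinary least squares on the logs."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in efforts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                        # slope in log space is the exponent
    a = math.exp(mean_y - b * mean_x)    # intercept in log space is ln(a)
    return a, b


# Synthetic data generated from a = 3.0, b = 1.1 (not from any real dataset).
sizes = [10.0, 25.0, 50.0, 100.0, 200.0]
efforts = [3.0 * s ** 1.1 for s in sizes]

a, b = fit_power_law(sizes, efforts)
print("a =", a, "b =", b)
```

Because the synthetic data lie exactly on the curve, the fit recovers the generating parameters; real project data would scatter around it, and that scatter is what MMRE and PRED summarize.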

Bootstrapped MMRE intervals 1/2
[Figures: COCOMO81 dataset; COCOMONASA dataset]

Bootstrapped MMRE intervals 2/2
[Figures: NASA93 dataset; Desharnais dataset*]
* Note: only D & F used, with FP raw and FP adj

Accounting for Standard Error

Model ranking based on MMRE, not accounting for Standard Error:

Rank  COCOMO81  COCOMONASA  NASA93
1.    A         A           A
2.    E         E           E
3.    C         C           C
4.    B         D           B
5.    D         B           D

Model ranking based on MMRE, accounting for Standard Error at 95% confidence level:

Rank  COCOMO81  COCOMONASA  NASA93
1.    A         A           A, B, C, D, E
2.    C, E      E           -
3.    B, D      B, C, D     -
4.    -         -           -
5.    -         -           -
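One simple way to turn confidence intervals into a ranking with ties is to treat models whose intervals overlap as indistinguishable. The slides do not state the exact grouping rule used, so the rule below (compare each model with the best model of the current group) is an assumption, and the model names and interval values are invented.

```python
def rank_with_ties(intervals):
    """Group models into ranks; overlapping intervals share a rank.

    intervals maps a model name to (point_mmre, low, high).
    Lower MMRE is better, so models are visited in ascending MMRE order.
    """
    ordered = sorted(intervals, key=lambda name: intervals[name][0])
    groups = []
    for name in ordered:
        low = intervals[name][1]
        if groups:
            leader = groups[-1][0]
            leader_high = intervals[leader][2]
            # The newcomer has the larger point estimate, so the intervals
            # overlap exactly when its lower bound reaches the leader's
            # upper bound.
            if low <= leader_high:
                groups[-1].append(name)
                continue
        groups.append([name])
    return groups


# Hypothetical intervals, invented purely for illustration.
example = {
    "M1": (0.30, 0.25, 0.36),
    "M2": (0.45, 0.38, 0.55),
    "M3": (0.47, 0.40, 0.58),
    "M4": (0.70, 0.60, 0.85),
}

groups = rank_with_ties(example)
for rank, members in enumerate(groups, start=1):
    print(f"{rank}. {', '.join(members)}")
```

With these made-up numbers M2 and M3 share rank 2, because neither can be confidently separated from the other. A more careful analysis would test the difference between two models directly rather than rely on interval overlap.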

How much confidence needed?
[Figure: Bootstrapped PRED(.30) intervals (COCOMONASA dataset)]
[Figure: Bootstrapped PRED(.30) intervals with significant differences (32% confidence level, COCOMONASA dataset)*]
* This is a very crude example. There are more refined approaches that account for simultaneous (ANOVA-like) comparisons.
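The trade-off behind this slide is that intervals narrow as the required confidence level drops, so differences between models become "significant" only at lower confidence. The sketch below shows this with a plain percentile interval (not the bias-corrected variant) on hypothetical MRE values. The function names, seed, and data are assumptions for illustration.

```python
import random


def pred(mres, level=0.30):
    """Fraction of projects whose MRE is at most the given level."""
    return sum(1 for m in mres if m <= level) / len(mres)


def percentile_interval(replicates, confidence):
    """Plain percentile interval: cut equal tails off the sorted replicates."""
    ordered = sorted(replicates)
    n = len(ordered)
    tail = (1 - confidence) / 2
    low = ordered[int(tail * n)]
    high = ordered[min(n - 1, int((1 - tail) * n))]
    return low, high


# Hypothetical MRE values, invented purely for illustration.
mres = [0.05, 0.12, 0.18, 0.22, 0.28, 0.31, 0.35, 0.44, 0.52, 0.61, 0.75, 0.90]

rng = random.Random(2)
reps = [pred(rng.choices(mres, k=len(mres))) for _ in range(5000)]

wide = percentile_interval(reps, 0.95)
narrow = percentile_interval(reps, 0.32)
print("PRED(.30):", pred(mres))
print("95% interval:", wide)
print("32% interval:", narrow)
```

The 32% interval always lies inside the 95% interval, since both are cut from the same sorted replicates.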

The Desharnais Problem

Model rankings not accounting for Standard Error (Desharnais, FP adj) imply contradictory results:

Rank  MMRE  PRED(.25)
1.    F     D
2.    D     F

Model ranking accounting for Standard Error (Desharnais, FP adj):

Rank  MMRE  PRED(.25)
1.    F, D  F, D
2.    -     -

No confident interpretation is possible based on the Desharnais dataset and models D, F.

Conclusions 1/2
- We applied standard, easily analyzed and replicated statistical methods: Standard Error, Bootstrapping
- The approach has potential for increasing confidence in research results and cost estimation practice
- Use of Standard Error can help address:
  - How can we meaningfully interpret intuitively appealing accuracy measure research results?
  - How to make valid statistical inferences (i.e. significant) for results based on comparing PRED or MMRE values
  - Estimating how many data points are needed for confident results

Conclusions 2/2
- The different behaviors of MMRE and PRED (expanded on in the ESEM 2008 paper)
- Determination of an adequate sample size for model calibration
- Understanding how sample size affects model accuracy
- Can “bad” calibration data be identified?
- If doing model validation studies using random methods (such as jackknife, holdouts, or bootstrap), how many iterations are needed for stable results?
- Why are some cost estimation study results contradictory, and how can these be resolved?

Invitation for collaboration
- ESEM08 paper: “Comparative Studies of the Model Evaluation Criterions MMRE and PRED in Software Cost Estimation Research” (Port, Korte)
- There is much interesting work still to be done in this area, such as:
  - Standard error studies of non-COCOMO models
  - Refinement of “how much data is enough?” methods
  - Standard error studies of the “deviation” problem (i.e. variance in model parameters) (Menzies, et al)
  - Validation of model selection when reducing parameters (Menzies, et al)
  - Applying standard statistical methods for model accuracy (e.g. MSE, least-likelihood estimators)
- As suggested by Tim Menzies, we are keen to “crowd source” this research, so if this presentation has inspired you in some way, contact Dan Port (dport@hawaii.edu) and let's discuss possible collaborations!

Thank you!
Marcel Korte, [email_address]
Dan Port, University of Hawai'i at Manoa, Phone: +1-(808)-956-7494, [email_address]
