PREDICTING POST-RELEASE DEFECTS USING PRE-RELEASE FIELD TESTING RESULTS

Foutse Khomh, Brian Chan, Ying Zou
Anand Sinha, Dave Dietz
FIELD TESTING CYCLE

Field testing is important to improve the quality of an application before release.
MEAN TIME BETWEEN FAILURES

Mean Time Between Failures (MTBF) is frequently used to gauge the reliability of an application.

Applications with a low MTBF are undesirable since they would have a higher number of defects.
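
To make the definition concrete, here is a minimal Python sketch of MTBF computed from failure timestamps. The function name and data layout are illustrative assumptions, not taken from the slides.

```python
# Minimal sketch: MTBF from failure timestamps recorded as hours since the
# start of testing. The data layout is an assumption for illustration.

def mean_time_between_failures(failure_times):
    """Average gap, in hours, between consecutive failures."""
    ordered = sorted(failure_times)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps) if gaps else float("inf")

# Failures at hours 2, 5.5, 6, and 11 -> gaps 3.5, 0.5, 5.0 -> MTBF = 3.0
print(mean_time_between_failures([2.0, 5.5, 6.0, 11.0]))
```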
AVERAGE USAGE TIME

AVT is the average time that a user actively uses the application.

The AVT can be longer than the period of field testing.

A longer AVT indicates that an application is reliable and that users tend to use the application longer.
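
As a quick illustration, here is a sketch of AVT assuming each user's total active-use time has already been extracted from usage logs; the data layout is hypothetical.

```python
# Minimal sketch: AVT as the mean of per-user active-usage times (hours).

def average_usage_time(usage_hours_by_user):
    return sum(usage_hours_by_user.values()) / len(usage_hours_by_user)

print(average_usage_time({"u1": 40.0, "u2": 55.0, "u3": 25.0}))  # -> 40.0
```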
PROBLEM STATEMENT

MTBF and AVT cannot capture the whole pattern of failure occurrences in the field testing of an application.

[Figure: failure occurrences over time for two applications, A and B.]

The reliability of A and B is very different.
METRICS

We propose three metrics that capture additional patterns of failure occurrences:

  TTFF: the average length of usage time before the occurrence of the first failure,

  FAR: the failure accumulation rating, which gauges the spread of failures to the majority of users, and

  OFR: the overall failure rating, which captures daily rates of failures.
AVERAGE TIME TO FIRST FAILURE (TTFF)

[Chart: % of users reporting failures on each of 14 testing days, Version A.]
AVERAGE TIME TO FIRST FAILURE (TTFF)

[Chart: % of users reporting failures on each of 14 testing days, Versions A and B.]
AVERAGE TIME TO FIRST FAILURE (TTFF)

[Chart: % of users reporting failures on each of 14 testing days, Versions A and B.]

TTFF produces high scores for applications where the majority of users experience the first failure late.
AVERAGE TIME TO FIRST FAILURE (TTFF)

[Chart: same data as the previous slide, annotated with TTFFA = 6.11 and TTFFB = 3.56.]
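
A minimal sketch of TTFF as defined above: the average usage time before each user's first failure. The slides do not say how users who never experience a failure are handled, so this sketch simply skips them.

```python
# Minimal sketch: TTFF = mean usage time (days) before each user's first
# failure. Users with no recorded failure (None) are skipped here; the
# slides do not specify how they are treated.

def time_to_first_failure(first_failure_day_by_user):
    days = [d for d in first_failure_day_by_user.values() if d is not None]
    return sum(days) / len(days)

print(time_to_first_failure({"u1": 6, "u2": 3, "u3": None, "u4": 9}))  # -> 6.0
```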
FAILURE ACCUMULATION RATING (FAR)

[Chart: % of users reporting against the number of unique failures (1-14), Version A.]
FAILURE ACCUMULATION RATING (FAR)

[Chart: % of users reporting against the number of unique failures (1-14), Versions A and B.]
FAILURE ACCUMULATION RATING (FAR)

[Chart: % of users reporting against the number of unique failures, Versions A and B.]

The FAR metric produces high scores for applications where the majority of users report a very low number of failures.
FAILURE ACCUMULATION RATING (FAR)

[Chart: same data as the previous slide, annotated with FARA = 6.97 and FARB = 4.97.]
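
The slides do not give the exact FAR formula, but the charts plot the fraction of users against the number of unique failures they report. Here is a sketch of building that distribution; the FAR score itself aggregates this curve in a way the slides leave unspecified.

```python
# Minimal sketch: the per-user unique-failure distribution plotted on the
# FAR charts (x: number of unique failures, y: fraction of users). The FAR
# score aggregates this curve; its exact formula is not given on the slides.
from collections import Counter

def unique_failure_distribution(unique_failures_by_user):
    n = len(unique_failures_by_user)
    counts = Counter(unique_failures_by_user.values())
    return {k: counts[k] / n for k in sorted(counts)}

print(unique_failure_distribution({"u1": 1, "u2": 1, "u3": 2, "u4": 5}))
# -> {1: 0.5, 2: 0.25, 5: 0.25}
```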
OVERALL FAILURE RATING (OFR)

[Chart: % of users reporting failures on each of 14 testing days, Version A.]
OVERALL FAILURE RATING (OFR)

[Chart: % of users reporting failures on each of 14 testing days, Versions A and B.]
OVERALL FAILURE RATING (OFR)

[Chart: % of users reporting failures on each of 14 testing days, Versions A and B.]

The OFR metric produces high scores for applications with fewer users reporting failures overall.
OVERALL FAILURE RATING (OFR)

[Chart: same data as the previous slide, annotated with OFRA = 0.93 and OFRB = 0.78.]
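
The OFR charts plot the fraction of users reporting failures on each testing day. A sketch of computing those daily rates follows; the single OFR number summarizes them, and again the slides do not spell out the exact aggregation.

```python
# Minimal sketch: daily failure-reporting rates, as plotted on the OFR
# charts. The OFR score summarizes these rates; its exact formula is not
# given on the slides.

def daily_failure_rates(reporting_users_by_day, total_users):
    return {day: len(users) / total_users
            for day, users in sorted(reporting_users_by_day.items())}

print(daily_failure_rates({1: {"u1", "u2"}, 2: {"u1"}, 3: set()}, 10))
# -> {1: 0.2, 2: 0.1, 3: 0.0}
```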
CASE STUDY

We analyze 18 versions of an enterprise software application.

Overall, 2,546 users were involved in the field testing.

The testing period lasted 30 days.
SPEARMAN CORRELATION OF THE METRICS

         TTFF    FAR     OFR     AVT     MTBF
  TTFF    1      0.09   -0.08   -0.31   -0.08
  FAR     0.09   1       0.07    0.33   -0.24
  OFR    -0.08   0.07    1       0.39   -0.54
  AVT    -0.31   0.33    0.39    1      -0.30
  MTBF   -0.08  -0.24   -0.54   -0.30    1
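
A correlation matrix like this one can be reproduced with pandas once per-version metric values are available. The values below are hypothetical placeholders (the study used 18 versions).

```python
import pandas as pd

# Hypothetical per-version metric values (5 versions shown for brevity).
metrics = pd.DataFrame({
    "TTFF": [6.1, 3.6, 5.0, 4.2, 7.3],
    "FAR":  [7.0, 5.0, 6.1, 5.5, 6.8],
    "OFR":  [0.93, 0.78, 0.85, 0.80, 0.90],
    "AVT":  [40.0, 55.0, 48.0, 52.0, 43.0],
    "MTBF": [3.0, 2.1, 2.7, 2.4, 3.2],
})
print(metrics.corr(method="spearman").round(2))
```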
INDEPENDENCE AMONG PROPOSED METRICS

[Chart: PCA loadings of TTFF, FAR, OFR, and MTBF on principal components PC1-PC4.]
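
An independence check of this kind can be run as a principal component analysis on the standardized metric values. A sketch with scikit-learn, using hypothetical data:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical data: rows = versions; columns = TTFF, FAR, OFR, MTBF.
X = np.array([
    [6.1, 7.0, 0.93, 3.0],
    [3.6, 5.0, 0.78, 2.1],
    [5.0, 6.1, 0.85, 2.7],
    [4.2, 5.5, 0.80, 2.4],
    [7.3, 6.8, 0.90, 3.2],
])
pca = PCA().fit(StandardScaler().fit_transform(X))
print(pca.explained_variance_ratio_.round(2))  # variance captured by each PC
print(pca.components_.round(2))                # metric loadings on each PC
```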
PREDICTIVE POWER FOR POST-RELEASE DEFECTS

[Chart: marginal R-squared of each metric (TTFF, FAR, OFR, AVT, MTBF) when predicting post-release defects reported within 6 months, 1 year, and 2 years.]
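
The slides report marginal R-squared values from the authors' models, which are not specified here. As a rough analogue, the explanatory power of a single metric can be gauged with an ordinary one-predictor regression on hypothetical data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: TTFF per version vs. post-release defect count.
ttff = np.array([[6.1], [3.6], [5.0], [4.2], [7.3]])
defects = np.array([12, 30, 18, 25, 9])

model = LinearRegression().fit(ttff, defects)
print(round(model.score(ttff, defects), 2))  # ordinary R^2, not marginal R^2
```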
PRECISION OF PREDICTIONS WITH ALL FIVE METRICS

[Chart: precision (%) of predictions of post-release defects reported within 6 months, 1 year, and 2 years, as a function of the number of testing days (5 to 30).]
CONCLUSION

TTFF, FAR, and OFR complement the traditional MTBF and AVT in predicting the number of post-release defects.

They provide faster predictions of the number of post-release defects, with good precision within just 5 days of a pre-release testing period.

It takes MTBF up to 25 days to predict the number of post-release defects.

