PREDICTING POST-RELEASE DEFECTS USING PRE-RELEASE FIELD TESTING RESULTS

Foutse Khomh, Brian Chan, Ying Zou
Anand Sinha, Dave Dietz
FIELD TESTING CYCLE

Field testing is important to improve the quality of an application before release.
MEAN TIME BETWEEN FAILURES

Mean Time Between Failures (MTBF) is frequently used to gauge the reliability of an application.

Applications with a low MTBF are undesirable since they would have a higher number of defects.
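
To make the definition concrete, here is a minimal Python sketch of MTBF computed from failure timestamps. The function name and data layout are illustrative assumptions, not taken from the slides.

```python
# Minimal sketch: MTBF from failure timestamps recorded as hours since the
# start of testing. The data layout is an assumption for illustration.

def mean_time_between_failures(failure_times):
    """Average gap, in hours, between consecutive failures."""
    ordered = sorted(failure_times)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps) if gaps else float("inf")

# Failures at hours 2, 5.5, 6, and 11 -> gaps 3.5, 0.5, 5.0 -> MTBF = 3.0
print(mean_time_between_failures([2.0, 5.5, 6.0, 11.0]))
```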
AVERAGE USAGE TIME

AVT is the average time that a user actively uses the application.

The AVT can be longer than the period of field testing.

A longer AVT indicates that an application is reliable and that users tend to use the application longer.
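
As a quick illustration, here is a sketch of AVT assuming each user's total active-use time has already been extracted from usage logs; the data layout is hypothetical.

```python
# Minimal sketch: AVT as the mean of per-user active-usage times (hours).

def average_usage_time(usage_hours_by_user):
    return sum(usage_hours_by_user.values()) / len(usage_hours_by_user)

print(average_usage_time({"u1": 40.0, "u2": 55.0, "u3": 25.0}))  # -> 40.0
```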
PROBLEM STATEMENT

MTBF and AVT cannot capture the whole pattern of failure occurrences in the field testing of an application.

[Figure: failure occurrences over time for two applications, A and B.]

The reliability of A and B is very different.
METRICS

We propose three metrics that capture additional patterns of failure occurrences:

  TTFF: the average length of usage time before the occurrence of the first failure,

  FAR: the failure accumulation rating, which gauges the spread of failures to the majority of users, and

  OFR: the overall failure rating, which captures daily rates of failures.
AVERAGE TIME TO FIRST FAILURE (TTFF)

[Chart: % of users reporting failures on each of 14 testing days, Version A.]
AVERAGE TIME TO FIRST FAILURE (TTFF)

[Chart: % of users reporting failures on each of 14 testing days, Versions A and B.]
AVERAGE TIME TO FIRST FAILURE (TTFF)

[Chart: % of users reporting failures on each of 14 testing days, Versions A and B.]

TTFF produces high scores for applications where the majority of users experience the first failure late.
AVERAGE TIME TO FIRST FAILURE (TTFF)

[Chart: same data as the previous slide, annotated with TTFFA = 6.11 and TTFFB = 3.56.]
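
A minimal sketch of TTFF as defined above: the average usage time before each user's first failure. The slides do not say how users who never experience a failure are handled, so this sketch simply skips them.

```python
# Minimal sketch: TTFF = mean usage time (days) before each user's first
# failure. Users with no recorded failure (None) are skipped here; the
# slides do not specify how they are treated.

def time_to_first_failure(first_failure_day_by_user):
    days = [d for d in first_failure_day_by_user.values() if d is not None]
    return sum(days) / len(days)

print(time_to_first_failure({"u1": 6, "u2": 3, "u3": None, "u4": 9}))  # -> 6.0
```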
FAILURE ACCUMULATION RATING (FAR)

[Chart: % of users reporting against the number of unique failures (1-14), Version A.]
FAILURE ACCUMULATION RATING (FAR)

[Chart: % of users reporting against the number of unique failures (1-14), Versions A and B.]
FAILURE ACCUMULATION RATING (FAR)

[Chart: % of users reporting against the number of unique failures, Versions A and B.]

The FAR metric produces high scores for applications where the majority of users report a very low number of failures.
FAILURE ACCUMULATION RATING (FAR)

[Chart: same data as the previous slide, annotated with FARA = 6.97 and FARB = 4.97.]
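
The slides do not give the exact FAR formula, but the charts plot the fraction of users against the number of unique failures they report. Here is a sketch of building that distribution; the FAR score itself aggregates this curve in a way the slides leave unspecified.

```python
# Minimal sketch: the per-user unique-failure distribution plotted on the
# FAR charts (x: number of unique failures, y: fraction of users). The FAR
# score aggregates this curve; its exact formula is not given on the slides.
from collections import Counter

def unique_failure_distribution(unique_failures_by_user):
    n = len(unique_failures_by_user)
    counts = Counter(unique_failures_by_user.values())
    return {k: counts[k] / n for k in sorted(counts)}

print(unique_failure_distribution({"u1": 1, "u2": 1, "u3": 2, "u4": 5}))
# -> {1: 0.5, 2: 0.25, 5: 0.25}
```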
OVERALL FAILURE RATING (OFR)

[Chart: % of users reporting failures on each of 14 testing days, Version A.]
OVERALL FAILURE RATING (OFR)

[Chart: % of users reporting failures on each of 14 testing days, Versions A and B.]
OVERALL FAILURE RATING (OFR)

[Chart: % of users reporting failures on each of 14 testing days, Versions A and B.]

The OFR metric produces high scores for applications with fewer users reporting failures overall.
OVERALL FAILURE RATING (OFR)

[Chart: same data as the previous slide, annotated with OFRA = 0.93 and OFRB = 0.78.]
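
The OFR charts plot the fraction of users reporting failures on each testing day. A sketch of computing those daily rates follows; the single OFR number summarizes them, and again the slides do not spell out the exact aggregation.

```python
# Minimal sketch: daily failure-reporting rates, as plotted on the OFR
# charts. The OFR score summarizes these rates; its exact formula is not
# given on the slides.

def daily_failure_rates(reporting_users_by_day, total_users):
    return {day: len(users) / total_users
            for day, users in sorted(reporting_users_by_day.items())}

print(daily_failure_rates({1: {"u1", "u2"}, 2: {"u1"}, 3: set()}, 10))
# -> {1: 0.2, 2: 0.1, 3: 0.0}
```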
CASE STUDY

We analyze 18 versions of an enterprise software application.

Overall, 2,546 users were involved in the field testing.

The testing period lasted 30 days.
SPEARMAN CORRELATION OF THE METRICS

         TTFF    FAR     OFR     AVT     MTBF
  TTFF    1      0.09   -0.08   -0.31   -0.08
  FAR     0.09   1       0.07    0.33   -0.24
  OFR    -0.08   0.07    1       0.39   -0.54
  AVT    -0.31   0.33    0.39    1      -0.30
  MTBF   -0.08  -0.24   -0.54   -0.30    1
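
A correlation matrix like this one can be reproduced with pandas once per-version metric values are available. The values below are hypothetical placeholders (the study used 18 versions).

```python
import pandas as pd

# Hypothetical per-version metric values (5 versions shown for brevity).
metrics = pd.DataFrame({
    "TTFF": [6.1, 3.6, 5.0, 4.2, 7.3],
    "FAR":  [7.0, 5.0, 6.1, 5.5, 6.8],
    "OFR":  [0.93, 0.78, 0.85, 0.80, 0.90],
    "AVT":  [40.0, 55.0, 48.0, 52.0, 43.0],
    "MTBF": [3.0, 2.1, 2.7, 2.4, 3.2],
})
print(metrics.corr(method="spearman").round(2))
```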
INDEPENDENCE AMONG PROPOSED METRICS

[Chart: PCA loadings of TTFF, FAR, OFR, and MTBF on principal components PC1-PC4.]
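
An independence check of this kind can be run as a principal component analysis on the standardized metric values. A sketch with scikit-learn, using hypothetical data:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical data: rows = versions; columns = TTFF, FAR, OFR, MTBF.
X = np.array([
    [6.1, 7.0, 0.93, 3.0],
    [3.6, 5.0, 0.78, 2.1],
    [5.0, 6.1, 0.85, 2.7],
    [4.2, 5.5, 0.80, 2.4],
    [7.3, 6.8, 0.90, 3.2],
])
pca = PCA().fit(StandardScaler().fit_transform(X))
print(pca.explained_variance_ratio_.round(2))  # variance captured by each PC
print(pca.components_.round(2))                # metric loadings on each PC
```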
PREDICTIVE POWER FOR POST-RELEASE DEFECTS

[Chart: marginal R-squared of each metric (TTFF, FAR, OFR, AVT, MTBF) when predicting post-release defects reported within 6 months, 1 year, and 2 years.]
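
The slides report marginal R-squared values from the authors' models, which are not specified here. As a rough analogue, the explanatory power of a single metric can be gauged with an ordinary one-predictor regression on hypothetical data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: TTFF per version vs. post-release defect count.
ttff = np.array([[6.1], [3.6], [5.0], [4.2], [7.3]])
defects = np.array([12, 30, 18, 25, 9])

model = LinearRegression().fit(ttff, defects)
print(round(model.score(ttff, defects), 2))  # ordinary R^2, not marginal R^2
```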
PRECISION OF PREDICTIONS WITH ALL FIVE METRICS

[Chart: precision (%) of predictions of post-release defects reported within 6 months, 1 year, and 2 years, as a function of the number of testing days (5 to 30).]
CONCLUSION

TTFF, FAR, and OFR complement the traditional MTBF and AVT in predicting the number of post-release defects.

They provide faster predictions of the number of post-release defects, with good precision within just 5 days of a pre-release testing period.

It takes MTBF up to 25 days to predict the number of post-release defects.

