Personalized Defect Prediction

Tian Jiang, University of Waterloo
Lin Tan, University of Waterloo
Sunghun Kim, Hong Kong University of Science and Technology

How to Find Bugs?
• Code Review
• Testing
• Static Analysis
• Dynamic Analysis
• Verification
• Defect Prediction

Defect Prediction

Software History → Predictor → Future Defect

Developers are Different

[Figure: % of buggy changes (y-axis, 0 to 80) for developers A, B, C, D and the average, shown for four syntactic features: Modulo %, FOR, Bitwise OR, CONTINUE. Linux Kernel, 2005-2010.]

Personalized models can improve performance.

Successes in Other Fields
• Google personalized search
• Facebook personalized ad placement

Contributions
• Personalized Change Classification (PCC)
✦ One model for each developer
• Confidence-based Hybrid PCC (PCC+)
✦ Picks predictions with highest confidence
• Evaluate on six C and Java projects
✦ Find up to 155 more bugs by inspecting 20% LOC
✦ Improve F1 by up to 0.08

What is a Change?

Commit: 09a02f...
Author: John Smith
Message: I submitted some code.

Change 1: file1.c (3 lines added, 1 line removed)
Change 2: file2.c (1 line added, 1 line removed)
Change 3: file3.c (2 lines added, 1 line removed)

Each file modified by a commit is one change.
Change-Level: Inspect less code to locate a bug.
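A minimal sketch of how a commit could be split into per-file changes, mirroring the three-file example on this slide. The `Change` record and the `split_commit` helper are hypothetical names introduced for illustration; the diff lines are reduced to their leading `+` or `-` marker, as on the slide.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """One change: the part of a commit that touches a single file."""
    commit_id: str
    author: str
    path: str
    added: int
    removed: int


def split_commit(commit_id, author, file_diffs):
    """Turn one commit into one Change per modified file.

    file_diffs maps a file path to its diff lines, each starting with '+' or '-'.
    """
    changes = []
    for path, lines in file_diffs.items():
        added = sum(1 for line in lines if line.startswith("+"))
        removed = sum(1 for line in lines if line.startswith("-"))
        changes.append(Change(commit_id, author, path, added, removed))
    return changes


# The commit from the slide: three files, hence three changes.
commit_changes = split_commit(
    "09a02f...",
    "John Smith",
    {
        "file1.c": ["+", "+", "+", "-"],
        "file2.c": ["+", "-"],
        "file3.c": ["+", "+", "-"],
    },
)
```

Working at this granularity means a prediction points at one file's diff rather than at the whole commit.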

Change Classification (CC)

Training Phase
1. Label changes as clean or buggy: Software History → Training Instances
2. Extract features: Training Instances → Features
3. Build prediction model: Features + Classification Algorithm → Model

Prediction Phase
4. Predict: apply the Model to Future Instances

Label Clean or Buggy
[Sliwerski et al. ’05]

Revision History

Buggy Change (fixed by a later change)
Commit: 7a3bc...
Message: new feature
fileA.c
+...
+if (i < 128)
+...

Bug-Fixing Change (traced back to the buggy change with git blame)
Commit: 1da57...
Message: I fixed a bug
fileA.c
-if (i < 128)
+if (i <= 128)

A change is bug-fixing if its message contains the keyword “fix”, or the ID of a manually verified bug report [Herzig et al. ’13].
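The labeling step can be sketched as follows. This is a simplified illustration, not the authors' tooling: `is_bug_fixing` and `label_changes` are hypothetical helpers, and the `blame` dictionary is an in-memory stand-in for what `git blame` would report on the lines deleted by a fix.

```python
import re

# Matches "fix", "fixed", "fixes", ... but not words such as "prefix".
FIX_KEYWORD = re.compile(r"\bfix", re.IGNORECASE)


def is_bug_fixing(message, verified_bug_ids=()):
    """True if the message contains the keyword 'fix' or a verified bug report ID."""
    if FIX_KEYWORD.search(message):
        return True
    return any(bug_id in message for bug_id in verified_bug_ids)


def label_changes(history, blame):
    """Return the set of commit IDs labeled buggy.

    history: list of (commit_id, message, deleted_lines) in chronological order.
    blame:   maps a deleted line to the commit ID that introduced it
             (assumption: a stand-in for the output of `git blame`).
    """
    buggy = set()
    for _commit_id, message, deleted_lines in history:
        if not is_bug_fixing(message):
            continue
        for line in deleted_lines:
            if line in blame:
                buggy.add(blame[line])
    return buggy


# The two commits from the slide.
history = [
    ("7a3bc...", "new feature", []),
    ("1da57...", "I fixed a bug", ["if (i < 128)"]),
]
blame = {"if (i < 128)": "7a3bc..."}
buggy_commits = label_changes(history, blame)
```

Every change not traced back from a fix in this way is treated as clean.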

Three Types of Features
• Metadata
• Bag-of-Words
• Characteristic Vector

Characteristic Vector
Count Abstract Syntax Tree (AST) nodes

for (...; ...; ...) {
  for (...; ...; ...) {
    if (...) ...;
  }
}

for: 2
if: 1
while: 0
...
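The counting idea can be illustrated with Python's standard `ast` module. This is only a sketch under a stated substitution: the slide's example is C code, whereas the snippet below parses an equivalent Python loop nest, and the `characteristic_vector` function and the choice of three node types are illustrative assumptions.

```python
import ast

# Node types to count; a real characteristic vector would track many more.
NODE_TYPES = {"for": ast.For, "if": ast.If, "while": ast.While}


def characteristic_vector(source):
    """Count how many AST nodes of each tracked type appear in the source."""
    counts = {name: 0 for name in NODE_TYPES}
    for node in ast.walk(ast.parse(source)):
        for name, node_type in NODE_TYPES.items():
            if isinstance(node, node_type):
                counts[name] += 1
    return counts


# Python equivalent of the slide's nested loops with one conditional.
snippet = """
for i in range(3):
    for j in range(3):
        if i == j:
            pass
"""
vector = characteristic_vector(snippet)
# vector == {"for": 2, "if": 1, "while": 0}
```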

CC: Training
Training Instances → Model

CC: Prediction
Unlabeled Changes → Model → Predicted Changes

PCC: Training
Group Changes by Developer: Training Instances → Dev 1, Dev 2, Dev 3
Training: Dev 1 → Model 1, Dev 2 → Model 2, Dev 3 → Model 3

PCC: Prediction
Choose a Model by Developer: a change by Dev 2 goes to Model 2
Prediction: the chosen model classifies the change
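A minimal sketch of the two PCC steps, grouping changes by developer and then choosing the model by developer. The function names are hypothetical, and `train_prior_model` is a deliberately trivial placeholder learner (it ignores the features), standing in for a real classification algorithm such as a decision tree.

```python
from collections import defaultdict


def train_prior_model(instances):
    """Toy learner (assumption): the model is the fraction of buggy training changes."""
    labels = [is_buggy for _features, is_buggy in instances]
    buggy_rate = sum(labels) / len(labels)
    return lambda features: buggy_rate  # predicted bugginess in [0, 1]


def train_pcc(training_instances, train=train_prior_model):
    """Group (developer, features, is_buggy) triples by developer; train one model each."""
    by_developer = defaultdict(list)
    for developer, features, is_buggy in training_instances:
        by_developer[developer].append((features, is_buggy))
    return {dev: train(instances) for dev, instances in by_developer.items()}


def predict_pcc(models, developer, features, threshold=0.5):
    """Choose the model by developer and classify the change as buggy or clean."""
    bugginess = models[developer](features)
    return bugginess >= threshold


# Made-up training data for two developers.
training = [
    ("Dev 1", {"for": 2}, True),
    ("Dev 1", {"for": 1}, True),
    ("Dev 1", {"for": 0}, False),
    ("Dev 2", {"for": 2}, False),
    ("Dev 2", {"for": 1}, False),
    ("Dev 2", {"for": 0}, True),
]
models = train_pcc(training)
dev1_prediction = predict_pcc(models, "Dev 1", {"for": 2})
dev2_prediction = predict_pcc(models, "Dev 2", {"for": 2})
```

With this toy data the same feature vector is classified differently for the two developers, which is the effect a personalized model is meant to capture. The sketch does not handle a developer with no training data; `models[developer]` would raise `KeyError`.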

PCC+: Prediction
Feed Changes to All Models (CC, PCC) → Combiner → Prediction

Confidence Measure
• Bugginess
✦ Probability of a change being buggy
• Confidence Measure
✦ Comparable measure of confidence
• Select the prediction with the highest confidence.
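The combiner can be sketched as below. The slides do not give the formula for the confidence measure, so the `confidence` function here is an assumption: it takes the distance of the bugginess from the 0.5 decision boundary, which makes a confident "clean" prediction comparable to a confident "buggy" one. The function names and the example bugginess values are illustrative.

```python
def confidence(bugginess):
    """Assumed confidence measure: distance from the 0.5 boundary, scaled to [0, 1]."""
    return abs(bugginess - 0.5) * 2


def combine(bugginess_by_model, threshold=0.5):
    """Select the most confident model; return (model_name, is_buggy)."""
    best = max(bugginess_by_model, key=lambda name: confidence(bugginess_by_model[name]))
    return best, bugginess_by_model[best] >= threshold


# CC is barely on the buggy side; PCC is strongly on the clean side.
chosen, is_buggy = combine({"CC": 0.55, "PCC": 0.10})
```

In this example the combiner follows PCC and predicts clean, because PCC's bugginess of 0.10 lies much further from the boundary than CC's 0.55.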

Research Questions
• RQ1: Do PCC and PCC+ outperform CC?
• RQ2: Does PCC outperform CC in other setups?
✦ Classification algorithms
✦ Sizes of training sets

Two Metrics
• F1-Score
✦ Harmonic mean of precision and recall
• Cost Effectiveness
✦ Relevant in cost-sensitive scenarios
✦ NofB20: Number of Bugs discovered by inspecting top 20% lines of code
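For reference, the F1-score as the harmonic mean of precision and recall can be computed as in this short sketch. The function name and the example counts are made up for illustration and are not results from the study.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall; 0.0 when either is undefined or zero."""
    predicted_buggy = true_positives + false_positives
    actually_buggy = true_positives + false_negatives
    if predicted_buggy == 0 or actually_buggy == 0:
        return 0.0
    precision = true_positives / predicted_buggy
    recall = true_positives / actually_buggy
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative counts: precision = 30/40 = 0.75, recall = 30/50 = 0.6.
score = f1_score(true_positives=30, false_positives=10, false_negatives=20)
```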

Cost Effectiveness

Changes | LOC | Cumulative LOC | Actual
Buggy #1 | 10 | 10% | True Bug
Buggy #2 | 5 | 15% |
Buggy #3 | 4 | 19% | True Bug
Buggy #4 | 8 | 27% | True Bug
Buggy #5 | 12 | |
... | ... | |
Total | 100 | |

NofB20=3
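A sketch of how NofB20 could be computed for the example on this slide. Two assumptions are made to reproduce the slide's result of 3: a change counts as inspected if inspection of it starts before the 20% budget is used up (so Buggy #4, which straddles the boundary, is included), and the changes after Buggy #5 are collapsed into one filler entry of 61 LOC so that the total is 100. The function name `nofb` and the tuple layout are illustrative.

```python
def nofb(ranked_changes, budget_fraction=0.20):
    """Count true bugs among the changes inspected within the LOC budget.

    ranked_changes: list of (loc, is_true_bug) tuples in inspection order.
    A change is inspected if the budget is not yet exhausted when it is reached.
    """
    total_loc = sum(loc for loc, _is_true_bug in ranked_changes)
    budget = budget_fraction * total_loc
    inspected_loc = 0
    bugs_found = 0
    for loc, is_true_bug in ranked_changes:
        if inspected_loc >= budget:
            break
        if is_true_bug:
            bugs_found += 1
        inspected_loc += loc
    return bugs_found


ranked = [
    (10, True),   # Buggy #1, cumulative 10%
    (5, False),   # Buggy #2, cumulative 15%
    (4, True),    # Buggy #3, cumulative 19%
    (8, True),    # Buggy #4, cumulative 27% (crosses the 20% line)
    (12, False),  # Buggy #5 (label assumed; it is never inspected)
    (61, False),  # filler for the remaining changes (assumption)
]
nofb20 = nofb(ranked)
```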

Test Subjects

Projects | Language | LOC | # of Changes
Linux kernel | C | 7.3M | 429K
PostgreSQL | C | 289K | 89K
Xorg | C | 1.1M | 46K
Eclipse | Java | 1.5M | 73K
Lucene* | Java | 828K | 76K
Jackrabbit* | Java | 589K | 61K

* With manually labelled bug report data [Herzig et al. ’13]

PCC/PCC+ vs. CC
Decision Tree, NofB20

Projects | CC | PCC | Delta (PCC) | PCC+ | Delta (PCC+)
Linux | 160 | 179 | +19 | 172 | +12
PostgreSQL | 55 | 210 | +155 | 175 | +120
Xorg | 96 | 159 | +63 | 161 | +65
Eclipse | 116 | 207 | +91 | 200 | +84
Lucene | 177 | 254 | +77 | 257 | +80
Jackrabbit | 411 | 449 | +38 | 459 | +48
Average | - | - | +74 | - | +68

Statistically significant deltas are in bold.

PCC and PCC+ outperform CC.

Different Classification Algorithms
NofB20

Projects | Naive Bayes: CC | PCC | Delta | Logistic Regression: CC | PCC | Delta
Linux | 138 | 147 | +9 | 102 | 137 | +35
PostgreSQL | 89 | 113 | +24 | 46 | 56 | +10
Xorg | 84 | 101 | +17 | 52 | 29 | -23
Eclipse | 65 | 108 | +43 | 54 | 55 | +1
Lucene | 152 | 139 | -13 | 30 | 200 | +170
Jackrabbit | 420 | 414 | -6 | 261 | 370 | +109
Average | - | - | +12 | - | - | +59

Statistically significant deltas are in bold.

Different Training Set Sizes

[Figure: NofB20 (y-axis, 100 to 300) versus Training Set Size Per Developer (x-axis, 10 to 90), with one curve for PCC and one for CC.]

The improvement is also present in other setups.

Related Work
• Kim et al., Classifying software changes: Clean or buggy?, TSE ’08
• Bettenburg et al., Think locally, act globally: Improving defect and effort prediction models, MSR ’12

Conclusions & Future Work
• PCC and PCC+ improve prediction performance.
• The improvement is also present in other setups.
• The personalized approach can be applied to other fields.
✦ Recommendation systems
✦ Vulnerability prediction
✦ Top crashes prediction
