Personalized Defect Prediction

Tian Jiang, University of Waterloo
Lin Tan, University of Waterloo
Sunghun Kim, Hong Kong University of Science and Technology

How to Find Bugs?
• Code Review
• Testing
• Static Analysis
• Dynamic Analysis
• Verification
• Defect Prediction

Defect Prediction
Software History → Predictor → Future Defect

Developers are Different
[Figure: bar chart of the % of buggy changes (y-axis, 0 to 80) involving Modulo %, FOR, Bitwise OR, and CONTINUE, for developers A, B, C, D, and the average. Linux Kernel, 2005-2010.]

Personalized models can improve performance.

Successes in Other Fields
• Google personalized search
• Facebook personalized ad placement

Contributions
• Personalized Change Classification (PCC)
  ✦ One model for each developer
• Confidence-based Hybrid PCC (PCC+)
  ✦ Picks predictions with highest confidence
• Evaluate on six C and Java projects
  ✦ Find up to 155 more bugs by inspecting 20% LOC
  ✦ Improve F1 by up to 0.08

What is a Change?
Commit: 09a02f...
Author: John Smith
Message: I submitted some code.

Change 1: file1.c
+
+
+
-

Change 2: file2.c
+
-

Change 3: file3.c
+
+
-

Change-Level: Inspect less code to locate a bug.
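The commit-to-change split on this slide can be sketched as follows. This is a minimal illustration, assuming a commit is given as a mapping from file names to diff lines; the names `Change` and `split_commit` and this data layout are hypothetical and not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """One change: what a single commit adds or removes in a single file."""
    commit_id: str
    author: str
    path: str
    added: int
    removed: int


def split_commit(commit_id, author, file_diffs):
    """Split one commit into per-file changes.

    `file_diffs` is a hypothetical mapping {path: [diff lines]}, where each
    diff line starts with '+' (added) or '-' (removed).
    """
    changes = []
    for path, lines in file_diffs.items():
        added = sum(1 for line in lines if line.startswith("+"))
        removed = sum(1 for line in lines if line.startswith("-"))
        changes.append(Change(commit_id, author, path, added, removed))
    return changes


# The commit from the slide: three files, hence three changes.
changes = split_commit(
    "09a02f...",
    "John Smith",
    {
        "file1.c": ["+", "+", "+", "-"],
        "file2.c": ["+", "-"],
        "file3.c": ["+", "+", "-"],
    },
)
```

Predicting at this granularity means a flagged change points to one file's diff rather than the whole commit.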

Change Classification (CC)
Training Phase
1. Label changes as clean or buggy: Software History → Training Instances
2. Extract features: Training Instances → Features
3. Build prediction model: Features + Classification Algorithm → Model
Prediction Phase
4. Predict: Future Instances → Model

Label Clean or Buggy [Sliwerski et al. ’05]
Revision History

Buggy Change
Commit: 7a3bc...
Message: new feature
fileA.c
+...
+if (i < 128)
+...

Bug-Fixing Change (the buggy line is fixed by a later change and traced back with git blame)
Commit: 1da57...
Message: I fixed a bug
fileA.c
- if (i < 128)
+if (i <= 128)

A bug-fixing change contains the keyword “fix”, or the ID of a manually verified bug report [Herzig et al. ’13]
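The labeling step can be sketched in two parts: flag bug-fixing commits by keyword, then attribute the lines they delete to the commit that introduced them. This is a simplified illustration, not the exact procedure of the cited work. The `blame` mapping is a hypothetical stand-in for `git blame` output, the commit records are toy data modeled on the slide, and the regular expression is an assumed, word-level version of the "contains fix" rule.

```python
import re

# Assumed keyword rule: matches "fix", "fixes", "fixed", "fixing" as whole words.
FIX_PATTERN = re.compile(r"\bfix(e[sd]|ing)?\b", re.IGNORECASE)


def is_bug_fixing(message):
    """Keyword heuristic: the commit message mentions a fix."""
    return FIX_PATTERN.search(message) is not None


def label_buggy_commits(commits, blame):
    """Return the IDs of commits whose lines were later removed by a fix.

    commits: list of dicts with 'id', 'message', and 'deleted' (a list of
             (path, line_text) pairs that the commit removed).
    blame:   hypothetical stand-in for `git blame`; maps (path, line_text)
             to the ID of the commit that introduced that line.
    """
    buggy = set()
    for commit in commits:
        if not is_bug_fixing(commit["message"]):
            continue
        for deleted_line in commit["deleted"]:
            origin = blame.get(deleted_line)
            if origin is not None:
                buggy.add(origin)
    return buggy


# Toy history modeled on the slide's two commits.
history = [
    {"id": "7a3bc...", "message": "new feature", "deleted": []},
    {"id": "1da57...", "message": "I fixed a bug",
     "deleted": [("fileA.c", "if (i < 128)")]},
]
blame = {("fileA.c", "if (i < 128)"): "7a3bc..."}
buggy = label_buggy_commits(history, blame)
```

Commits that never appear in the returned set would be labeled clean under this sketch.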

Three Types of Features

• Metadata
• Bag-of-Words
• Characteristic Vector

Characteristic Vector
Count Abstract Syntax Tree (AST) nodes

for (...; ...; ...) {
  for (...; ...; ...) {
    if (...) ...;
  }
}

for: 2
if: 1
while: 0
...
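Counting AST nodes can be sketched with Python's standard `ast` module. Note the assumption: the studied projects are written in C and Java, so this example parses a Python analogue of the slide's snippet purely for illustration, and it tracks only the three node types shown on the slide.

```python
import ast
from collections import Counter

# Node types tracked in this sketch, mirroring the slide's for/if/while rows.
TRACKED = {"for": ast.For, "if": ast.If, "while": ast.While}


def characteristic_vector(source):
    """Count how often each tracked AST node type occurs in `source`."""
    counts = Counter({name: 0 for name in TRACKED})
    for node in ast.walk(ast.parse(source)):
        for name, node_type in TRACKED.items():
            if isinstance(node, node_type):
                counts[name] += 1
    return dict(counts)


# Python analogue of the slide's snippet: two nested loops and one `if`.
SNIPPET = """
for i in range(3):
    for j in range(3):
        if i == j:
            pass
"""

vector = characteristic_vector(SNIPPET)
```

For this snippet the vector matches the counts on the slide: two `for` nodes, one `if` node, and no `while` node.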

CC: Training
Training Instances → Model

CC: Prediction
Unlabeled Changes → Model → Predicted Changes

PCC: Training
Group Changes by Developer: Training Instances → Dev 1, Dev 2, Dev 3
Training: Dev 1 → Model 1; Dev 2 → Model 2; Dev 3 → Model 3

PCC: Prediction
Choose a Model by Developer: a change by Dev 2 → Model 2
Prediction: Model 2 → Prediction
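The two PCC phases can be sketched as below. The grouping and per-developer model lookup follow the slides; everything else is an assumption made to keep the example self-contained. In particular, `train_model` is a toy token-frequency classifier standing in for a real learning algorithm, and the token lists are hypothetical features.

```python
from collections import Counter, defaultdict


def train_model(instances):
    """Toy stand-in for a real classifier.

    Learns, for each token, the fraction of training changes containing it
    that were buggy. `instances` is a list of (tokens, is_buggy) pairs.
    """
    seen, buggy = Counter(), Counter()
    for tokens, is_buggy in instances:
        for token in set(tokens):
            seen[token] += 1
            buggy[token] += int(is_buggy)
    return {token: buggy[token] / seen[token] for token in seen}


def bugginess(model, tokens):
    """Average buggy rate of the known tokens; 0.5 if none are known."""
    rates = [model[t] for t in set(tokens) if t in model]
    return sum(rates) / len(rates) if rates else 0.5


def train_pcc(changes):
    """Group changes by developer, then train one model per developer."""
    by_developer = defaultdict(list)
    for developer, tokens, is_buggy in changes:
        by_developer[developer].append((tokens, is_buggy))
    return {dev: train_model(insts) for dev, insts in by_developer.items()}


def predict_pcc(models, developer, tokens, threshold=0.5):
    """Choose the model by developer and predict whether the change is buggy."""
    return bugginess(models[developer], tokens) > threshold


# Hypothetical history: the same construct is risky for dev1 but not for dev2.
history = [
    ("dev1", ["for", "%"], True),
    ("dev1", ["if"], False),
    ("dev2", ["for", "%"], False),
    ("dev2", ["continue"], True),
]
models = train_pcc(history)
dev1_prediction = predict_pcc(models, "dev1", ["for"])
dev2_prediction = predict_pcc(models, "dev2", ["for"])
```

With this toy data the same `for` change is predicted buggy for dev1 and clean for dev2, which is the behavior a single shared model cannot express. A developer with no training history raises a `KeyError` here; how to handle that case is left open.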

PCC+: Prediction
Feed Changes to All Models: each change → CC and PCC → Combiner → Prediction

Confidence Measure
• Bugginess
  ✦ Probability of a change being buggy
• Confidence Measure
  ✦ Comparable measure of confidence
• Select the prediction with the highest confidence.
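A minimal sketch of the selection rule follows. The slides do not give the formula for the confidence measure, so this example assumes one: the distance between a model's bugginess and the decision threshold. The function names and the example probabilities are likewise hypothetical.

```python
def confidence(bugginess, threshold=0.5):
    """Assumed confidence measure: distance of bugginess from the threshold."""
    return abs(bugginess - threshold)


def combine(bugginess_by_model, threshold=0.5):
    """Pick the prediction of the most confident model.

    `bugginess_by_model` maps a model name (e.g. "CC", "PCC") to that model's
    bugginess for the same change. Returns (model name, predicted buggy?).
    """
    best = max(
        bugginess_by_model,
        key=lambda name: confidence(bugginess_by_model[name], threshold),
    )
    return best, bugginess_by_model[best] > threshold


# CC is barely above the threshold, PCC is far below it, so PCC is selected.
choice = combine({"CC": 0.55, "PCC": 0.10})
```

Because every model's output is mapped onto the same scale, the combiner can compare models directly and fall back to whichever one is more certain for a given change.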

Research Questions
• RQ1: Do PCC and PCC+ outperform CC?
• RQ2: Does PCC outperform CC in other setups?
  ✦ Classification algorithms
  ✦ Sizes of training sets

Two Metrics
• F1-Score
  ✦ Harmonic mean of precision and recall
• Cost Effectiveness
  ✦ Relevant in cost-sensitive scenarios
  ✦ NofB20: Number of Bugs discovered by inspecting top 20% lines of code
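The F1-score can be computed from prediction counts as sketched below. The counts in the example are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall, or 0.0 if it is undefined."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 30 buggy changes caught, 10 false alarms, 30 missed.
# Precision is 30/40 = 0.75 and recall is 30/60 = 0.5.
score = f1_score(30, 10, 30)
```

Here the harmonic mean is 2 × 0.75 × 0.5 / 1.25 = 0.6, lower than the arithmetic mean of 0.625 because F1 penalizes the gap between precision and recall.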

Cost Effectiveness
Changes     LOC   Cumulative LOC   True Bug
Buggy #1    10    10%              yes
Buggy #2    5     15%
Buggy #3    4     19%              yes
Buggy #4    8     27%              yes
Buggy #5    12
...         ...
Total       100
NofB20=3
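The NofB20 count from this example can be reproduced with the sketch below. Two points are assumptions inferred from the slide rather than stated on it: changes are inspected in the order the predictor ranks them, and a change still counts when the 20% budget runs out partway through it (otherwise Buggy #4, at 27% cumulative LOC, would not be included in NofB20=3). The final row of the input is a hypothetical placeholder for the elided changes, sized so that the total is 100 LOC.

```python
def nofb(ranked_changes, budget=0.20):
    """Count true bugs reached when inspecting the top `budget` share of LOC.

    ranked_changes: list of (loc, is_true_bug) pairs in inspection order.
    A change counts if inspection starts on it before the budget is used up,
    even if the budget runs out partway through it (assumption).
    """
    total_loc = sum(loc for loc, _ in ranked_changes)
    limit = budget * total_loc
    inspected = 0
    bugs_found = 0
    for loc, is_true_bug in ranked_changes:
        if inspected >= limit:
            break
        inspected += loc
        bugs_found += int(is_true_bug)
    return bugs_found


# Rows from the slide, plus one placeholder row for the elided changes.
ranked = [
    (10, True),    # Buggy #1, true bug
    (5, False),    # Buggy #2
    (4, True),     # Buggy #3, true bug
    (8, True),     # Buggy #4, true bug (inspection starts at 19% LOC)
    (12, False),   # Buggy #5, beyond the budget
    (61, False),   # hypothetical remainder so that the total is 100 LOC
]
nofb20 = nofb(ranked)
```

Under these assumptions the function returns 3 for the slide's data, matching NofB20=3.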

Test Subjects
Projects       Language   LOC     # of Changes
Linux kernel   C          7.3M    429K
PostgreSQL     C          289K    89K
Xorg           C          1.1M    46K
Eclipse        Java       1.5M    73K
Lucene*        Java       828K    76K
Jackrabbit*    Java       589K    61K
* With manually labelled bug report data [Herzig et al. ’13]

PCC/PCC+ vs. CC
Decision Tree, NofB20
Projects     CC    PCC   Delta   PCC+   Delta
Linux        160   179   +19     172    +12
PostgreSQL   55    210   +155    175    +120
Xorg         96    159   +63     161    +65
Eclipse      116   207   +91     200    +84
Lucene       177   254   +77     257    +80
Jackrabbit   411   449   +38     459    +48
Average      -     -     +74     -      +68
Statistically significant deltas are in bold.

PCC and PCC+ outperform CC.

Different Classification Algorithms
NofB20
             Naive Bayes          Logistic Regression
Projects     CC    PCC   Delta    CC    PCC   Delta
Linux        138   147   +9       102   137   +35
PostgreSQL   89    113   +24      46    56    +10
Xorg         84    101   +17      52    29    -23
Eclipse      65    108   +43      54    55    +1
Lucene       152   139   -13      30    200   +170
Jackrabbit   420   414   -6       261   370   +109
Average      -     -     +12      -     -     +59
Statistically significant deltas are in bold.

Different Training Set Sizes
[Figure: plot of NofB20 (y-axis, 100 to 300) against Training Set Size Per Developer (x-axis, 10 to 90) for PCC and CC.]

The improvement also holds in other setups.

Related Work
• Kim et al., Classifying software changes: Clean or buggy?, TSE ’08
• Bettenburg et al., Think locally, act globally: Improving defect and effort prediction models, MSR ’12

Conclusions & Future Work
• PCC and PCC+ improve prediction performance.
• The improvement also holds in other setups.
• The personalized approach can be applied to other fields.
  ✦ Recommendation systems
  ✦ Vulnerability prediction
  ✦ Top crashes prediction
