WCRE11b.ppt

Requirements Traceability for Object Oriented
Systems by Partitioning Source Code

WCRE 2011, Limerick, Ireland

Nasir Ali, Yann-Gaël Guéhéneuc, and Giuliano Antoniol

Requirements Traceability

Requirements traceability is defined as “the
ability to describe and follow the life of a
requirement, in both a forwards and backwards
direction” [Gotel, 1994]

WCRE 2011 2

What’s Requirements Traceability Good For?

Program Comprehension

Discover what code must change to handle a
new requirement

Aid in determining whether a specification is
completely implemented


IR-based Approaches
• Vector Space Model (Antoniol et al. 2002)

• Latent Semantic Indexing (Marcus and Maletic, 2003)

• Jensen-Shannon Divergence (Abadi et al. 2008)

• Latent Dirichlet Allocation (Asuncion, 2010)


Problem in IR-based Approaches
Requirement


Goal
• Reduce manual effort required to verify false-positive links

• Increase F-measure


Coparvo - COde PARtitioning and VOting

1. Partitioning source code

2. Defining experts

3. Link recovery and expert voting


Partitioning Source Code

Class Name

Method Name

Variable Name

Comments
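
The partitioning step above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes Java source and uses crude regular expressions instead of a real parser, and the helper name `partition_source` and the sample class are hypothetical.

```python
import re

def partition_source(java_source: str) -> dict:
    """Split one Java class into four text partitions (hypothetical helper)."""
    # Comments: block comments (/* ... */) and line comments (// ...).
    comment_re = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)
    comments = comment_re.findall(java_source)
    code = comment_re.sub(" ", java_source)

    # Class names: the identifier following the 'class' keyword.
    class_names = re.findall(r"\bclass\s+(\w+)", code)
    # Method names: "<type> <name>(", a crude approximation of a declaration.
    method_names = re.findall(r"\b[\w<>\[\]]+\s+(\w+)\s*\(", code)
    # Variable names: "<type> <name>" followed by '=' or ';'.
    variable_names = re.findall(r"\b[\w<>\[\]]+\s+(\w+)\s*[=;]", code)

    return {
        "class_names": class_names,
        "method_names": method_names,
        "variable_names": variable_names,
        "comments": comments,
    }

# Made-up sample class, used only to exercise the sketch.
SAMPLE = """
/** Sends mail over pop3. */
public class MailClient {
    private String serverName;
    // open the connection
    public void connectToServer(int port) {
        int retryCount = 3;
    }
}
"""

parts = partition_source(SAMPLE)
for name, items in parts.items():
    print(name, items)
```

A real tool would more likely walk a parsed syntax tree, since regular expressions like these also pick up method calls and `return x;` statements.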


Defining Experts

Individual documents: Class Name A, Class Name B, Class Name C, Class Name D
Merged Class Names: Class Name A + Class Name B + Class Name C + Class Name D

The same step is performed for method names, variable names, comments, and requirements


Defining Experts (Cont.)

Similarity of each merged partition to the Merged Requirements (Requirement 1, …, Requirement N):

Merged Class Names      20%
Merged Method Names     70%
Merged Variable Names   40%
Merged Comments         60%


Defining Experts (Cont.)
Method Name 70%
Comments 60%
Variable Names 40%
Class Names 20%

Extreme Cases:
• 5% difference between two experts
• 95% difference between two experts
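
The ranking shown on this slide can be sketched as a sort of the partitions by their percentage. The values below are the illustrative percentages from the slide, and the function name `rank_experts` is hypothetical. The slides do not state the rule that decides which ranked partitions are retained as experts, so the sketch only ranks them and reports the gap between neighbouring experts.

```python
def rank_experts(similarity: dict) -> list:
    """Order partitions from most to least similar to the merged requirements."""
    return sorted(similarity.items(), key=lambda item: item[1], reverse=True)

# Illustrative percentages taken from the slide, not measured values.
slide_similarity = {
    "class names": 20,
    "method names": 70,
    "variable names": 40,
    "comments": 60,
}

ranking = rank_experts(slide_similarity)

# Difference between each expert and the next one in the ranking.
gaps = [
    (higher[0], lower[0], higher[1] - lower[1])
    for higher, lower in zip(ranking, ranking[1:])
]

print(ranking)
print(gaps)
```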


Link Recovery and Expert Voting

Requirements: "Email client must support pop3 integration………."

Experts voting on links between Class A and the requirements:
• Comments of Class A
• Method Names of Class A
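
A minimal sketch of expert voting follows. It rests on assumptions the slides do not confirm: each expert (partition) produces a similarity score for a class-requirement pair, an expert votes for the link when its score exceeds `threshold`, and the link is kept when at least `min_votes` experts vote for it. The function name, both parameters, and the scores are hypothetical.

```python
def vote_on_link(expert_scores: dict, threshold: float = 0.0, min_votes: int = 2) -> bool:
    """Keep a candidate link if enough experts vote for it (assumed rule)."""
    # An expert votes for the link when its own similarity exceeds the threshold.
    votes = sum(1 for score in expert_scores.values() if score > threshold)
    return votes >= min_votes

# Made-up scores for the link between Class A and the pop3 requirement.
both_agree = {"comments": 0.31, "method names": 0.12}
one_agrees = {"comments": 0.31, "method names": 0.0}

print(vote_on_link(both_agree))  # both experts vote, link kept
print(vote_on_link(one_agrees))  # only one expert votes, link discarded
```

Requiring agreement between experts is what lets voting discard false-positive links that a single text representation of the class would have produced.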


Case Studies
• Goal: Investigate the effectiveness of Coparvo in
improving the accuracy of VSM and reducing the
effort required to manually discard false-positive
links

• Quality focus: Ability to recover traceability links
between requirements and source code

• Context: Recovering requirements traceability
links of three open-source programs, Pooka, SIP,
and iTrust


Research Questions
RQ1: How does Coparvo help to find valuable partitions of
source code that help in recovering traceability links?

RQ2: How much does Coparvo help to reduce the effort required
to manually verify recovered traceability links?

RQ3: How does the F-measure value of the traceability links
recovered by Coparvo compare with a traditional VSM-based approach?


Datasets
SIP Communicator: A voice over IP and instant messenger
Pooka: An email client
iTrust: A medical application

                    Pooka     SIP Communicator   iTrust
Version             2.0       1.0                10
Number of Classes   298       1,771              526
Number of Methods   20,868    31,502             3,404
LOC                 244K      487K               19K


IR Quality Measures

$$F = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
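
The F-measure above can be computed from a set of recovered links and an oracle of correct links, using the standard definitions of precision (correct recovered links over all recovered links) and recall (correct recovered links over all oracle links). The sketch below is illustrative: the function name and the link sets are made up.

```python
def precision_recall_f(recovered: set, oracle: set) -> tuple:
    """Return (precision, recall, F-measure) for a set of recovered links."""
    correct = len(recovered & oracle)
    precision = correct / len(recovered) if recovered else 0.0
    recall = correct / len(oracle) if oracle else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Made-up links, written as (requirement, class) pairs.
recovered_links = {("R1", "A"), ("R1", "B"), ("R2", "C"), ("R3", "D")}
oracle_links = {("R1", "A"), ("R2", "C"), ("R2", "E")}

precision, recall, f_measure = precision_recall_f(recovered_links, oracle_links)
print(precision, recall, f_measure)
```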


Source Code Partitions
1. Class name

2. Method name

3. Variable name

4. Comments


Text Preprocessing

• Filter (#43@$)

• Stop words (the, is, an….)

• Stemmer
(attachment, attached -> attach)
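
The three preprocessing steps can be sketched as below. The stop-word list is a tiny illustrative sample, and the suffix-stripping stemmer is a toy stand-in chosen only to reproduce the slide's example; the slides do not say which stemmer was used. All names are hypothetical.

```python
import re

# Tiny illustrative stop-word list; a real list would be much longer.
STOP_WORDS = {"the", "is", "an", "a", "of", "to", "and"}

# Toy suffix list, not a real stemming algorithm.
SUFFIXES = ("ment", "ing", "ed", "s")

def stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text: str) -> list:
    """Filter non-alphabetic characters, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(token) for token in tokens if token not in STOP_WORDS]

print(preprocess("The attachment is attached #43@$"))
```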


Information Retrieval (IR) Methods
• Vector Space Model (VSM)
– Each document, d, is represented by a vector of ranks of
the terms in the vocabulary:
$v_d = [r_d(w_1), r_d(w_2), \ldots, r_d(w_{|V|})]$
– The query is similarly represented by a vector
– The similarity between the query and document is the
cosine of the angle between their respective vectors
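
A minimal sketch of the vector space model described above, assuming TF-IDF as the term weight for $r_d(w)$ (the slide does not name the weighting scheme). The function names and the three tiny documents are made up for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(docs: list) -> list:
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        term_freq = Counter(doc)
        vectors.append(
            {t: term_freq[t] * math.log(n_docs / doc_freq[t]) for t in term_freq}
        )
    return vectors

def cosine(u: dict, v: dict) -> float:
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Made-up, already preprocessed documents: one requirement (the query), two classes.
requirement = ["email", "client", "support", "pop"]
class_a = ["pop", "connect", "mail", "server"]
class_b = ["address", "book", "entry"]

req_vec, a_vec, b_vec = tfidf_vectors([requirement, class_a, class_b])
sim_a = cosine(req_vec, a_vec)
sim_b = cosine(req_vec, b_vec)
print(sim_a, sim_b)
```

Only `class_a` shares a term with the requirement, so it is the only class with a nonzero similarity and the only candidate link.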


Defining Expert
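[Bar chart: values on an axis from 0 to 60 for the CN (class name), MN (method name), VN (variable name), and Cmt (comments) partitions on Pooka, SIP, and iTrust]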


Pooka Results


SIP Comm. Results


iTrust Results


Voting vs. Combination
• Can we use only different combinations
of source code partitions to create
requirements traceability links?

• How much does a combination of source code
partitions improve the F-measure?


Pooka Results


SIP Comm. Results


iTrust Results


Statistical Tests
Non-parametric test – Mann-Whitney test

F-measure    Pooka     SIP Comm.   iTrust
P-value      p<0.01    p<0.01      p<0.01
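
For readers unfamiliar with the test, here is a minimal pure-Python sketch of a two-sided Mann-Whitney U test. It assumes a normal approximation for the p-value with no tie correction of the variance, which is only reasonable as an illustration. The two samples are toy numbers, not values from the study, and the function name is hypothetical.

```python
import math

def mann_whitney_u(x: list, y: list) -> tuple:
    """Return (U, two-sided p-value) using a normal approximation."""
    combined = sorted(
        (value, group) for group, sample in enumerate((x, y)) for value in sample
    )
    # Assign ranks starting at 1, giving tied values their average rank.
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    u = min(u_x, n1 * n2 - u_x)

    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u, p_value

# Toy samples that do not overlap at all.
u_stat, p_value = mann_whitney_u([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
print(u_stat, p_value)
```

In practice a statistics library would normally be used instead, since it handles small samples and ties more carefully.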


Effort Analysis
[Bar chart comparing VSM and Coparvo; vertical axis from 0 to 90,000]


Effort Analysis (F-Measure)
[Bar chart comparing VSM and Coparvo; vertical axis from 0 to 14]


RQ Answers
RQ1: Combinations of partitions or single source-code partitions
sometimes also provide better results than Coparvo

RQ2: Using different sources of information reduces
experts’ effort by up to 83%

RQ3: Partitioning source code and using the partitions as
experts for voting yields better accuracy


Threats to Validity
• External validity:
– We analyzed only three systems
– Different source code sizes

• Construct validity:
– The two researchers built both oracles
– Oracles were validated by two other experts
– The iTrust oracle was developed by its developer(s)

• Conclusion validity: Non-parametric test

• Tool is online at www.factrace.net


Ongoing work
More IR approaches

Empirical study

Threshold


Questions?
