Explaining why methods change together

Angela Lozano, Carlos Noguera, Viviane Jonckers
Vrije Universiteit Brussel, Belgium

Why co-changes?
• Reveal hidden dependencies [Gall, Hajek, Jazayeri ICSM 1998]
• Identify restructuring candidates [Gall, Hajek, Jazayeri ICSM 1998; Girba, Ducasse, Lanza ICSM 2004]
• Predict change propagation [Hassan & Holt ICSM 2004]
• Validate the completeness of a change [Zimmermann, Weisgerber, Diehl, Zeller ICSE 2004]
• Support some change tasks [Robillard, Dagenais JSME 2010]

Explaining co-changes
Find out the reason for the co-change
• Outperform co-changes
• Useful for impact analysis of new entities

Reason: common properties (structural/semantic) of co-changing methods

For instance*
Commit message: "Fixes a bug in getRawMaterial and getManufactoredGoods."
Co-changing methods:
net.sf.freecol.common.model.Goods.getRawMaterial(int)
net.sf.freecol.common.model.Goods.getManufactoredGoods(int)
Reason:
CALLS_METHOD_NAME:goodsType,
LOCAL_VARIABLE_DECLARATION_NAME:good,
METHOD_JAVADOC_MENTIONS:manufactured,
METHOD_JAVADOC_MENTIONS:material,
METHOD_JAVADOC_MENTIONS:raw,
METHOD_JAVADOC_MENTIONS:type
* The typos you see in the examples come from the data collected
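A reason is the set of properties that co-changing methods have in common, which amounts to a set intersection. The following is a minimal Python sketch, assuming each method has already been reduced to a set of KIND:value strings. The function name `common_reason` and the two METHOD_NAME entries are hypothetical; only the shared properties come from the slide.

```python
def common_reason(property_sets):
    """Return the properties shared by every method in a co-change set."""
    sets = [set(s) for s in property_sets]
    if not sets:
        return set()
    return set.intersection(*sets)


# Shared properties as listed on the slide.
shared = {
    "CALLS_METHOD_NAME:goodsType",
    "LOCAL_VARIABLE_DECLARATION_NAME:good",
    "METHOD_JAVADOC_MENTIONS:manufactured",
    "METHOD_JAVADOC_MENTIONS:material",
    "METHOD_JAVADOC_MENTIONS:raw",
    "METHOD_JAVADOC_MENTIONS:type",
}
# Hypothetical extra properties, added so the two methods differ.
get_raw_material = shared | {"METHOD_NAME:getRawMaterial"}
get_manufactored_goods = shared | {"METHOD_NAME:getManufactoredGoods"}

reason = common_reason([get_raw_material, get_manufactored_goods])
print(sorted(reason))
```

Properties that occur in only one of the methods drop out, so what remains is the candidate explanation for the co-change.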

Identifying co-change
[Figure: change matrix. Rows: entities (methods m0 to m9). Columns: time (commit transactions). Labels: "Commit transaction", "Cluster relations".]
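The commit-level part of the matrix above can be sketched as grouping changed methods by commit transaction. This is a minimal sketch, assuming the history is available as (commit id, method) pairs; the data and the name `co_change_sets` are hypothetical. Clustering is left out because the slide does not say how cluster relations are computed. Commits that touch a single method are dropped, since they contain no co-change.

```python
from collections import defaultdict


def co_change_sets(changes):
    """Group changed methods by commit and keep commits with 2+ methods."""
    by_commit = defaultdict(set)
    for commit_id, method in changes:
        by_commit[commit_id].add(method)
    return {c: ms for c, ms in by_commit.items() if len(ms) > 1}


# Hypothetical history: (commit transaction, changed method).
history = [
    ("c1", "m1"), ("c1", "m2"),
    ("c2", "m2"), ("c2", "m3"),
    ("c3", "m4"),  # single-method commit, discarded
]
sets_by_commit = co_change_sets(history)
print(sets_by_commit)
```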

Structural properties
(a.k.a. Syntactic / Explicit)
…
RETURN_TYPE:
METHOD_PARAM_TYPE:
DECLARING_TYPE:
DECLARING_TYPE_EXTENDS:
DECLARING_TYPE_IMPLEMENTS:
LOCAL_VARIABLE_DECLARATION_TYPE:

Semantic properties
(a.k.a. Lexical / Implicit)
…
METHOD_NAME:
CALLS_METHOD_NAME:
LOCAL_VARIABLE_DECLARATION_NAME:
METHOD_PARAM_NAME:
METHOD_JAVADOC_MENTIONS:

What is a good reason?

Coverage: Describes most co-changes
The first research question is analyzed by comparing the coverage of the reasons found.
Coverage relates to the number of commit transactions that have non-empty reasons. We define two types of coverage. Coverage per commit, $CovC$, is the ratio of commits with a non-empty reason, $C_R$, to the total number of commits in the system's history, $C_s$. Coverage per method, $CovM$, is the ratio of methods with a non-empty reason, $M_R$†, to the total number of methods in the system's history, $M_s$.

$CovC = \frac{C_R}{C_s}, \quad CovM = \frac{M_R}{M_s}$

Given that we eliminate commits in which only one method changes, and that commits in which many methods change are unlikely to have a single reason, the coverage of our approach will be low.
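The two coverage ratios, CovC and CovM, can be sketched as follows. This is a minimal sketch, assuming reasons have already been computed per commit as sets of properties. The dictionaries and the name `coverage` are hypothetical, and the sketch assumes the totals Cs and Ms count only the commits and methods passed in.

```python
def coverage(reasons_by_commit, methods_by_commit):
    """Return (CovC, CovM) for a history of co-change commits."""
    if not methods_by_commit:
        return 0.0, 0.0
    # Commits whose reason is non-empty (C_R).
    covered_commits = [c for c in methods_by_commit if reasons_by_commit.get(c)]
    # Methods modified in at least one commit with a non-empty reason (M_R).
    covered_methods = set()
    for c in covered_commits:
        covered_methods |= methods_by_commit[c]
    all_methods = set().union(*methods_by_commit.values())
    cov_c = len(covered_commits) / len(methods_by_commit)
    cov_m = len(covered_methods) / len(all_methods)
    return cov_c, cov_m


# Hypothetical data: four commits, two of them with an empty reason.
methods_by_commit = {
    "c1": {"m1", "m2"},
    "c2": {"m2", "m3"},
    "c3": {"m4", "m5"},
    "c4": {"m5", "m6"},
}
reasons_by_commit = {"c1": {"p"}, "c2": set(), "c3": {"q"}, "c4": set()}
cov_c, cov_m = coverage(reasons_by_commit, methods_by_commit)
print(cov_c, cov_m)
```

In this toy history, two of four commits and four of six methods are covered.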
Idiosyncrasy: Describes only co-changing methods
RQ2: To what extent do the automatically detected reasons describe only the set of co-changing methods? We analyze this question by assessing the discriminating power of reasons.
Good reasons will contain properties that tend to occur only in methods that change together. If the properties found in a reason are also found in methods that did not change together, then those properties were likely found by coincidence and do not represent an explanation for the change. Therefore, we measure the idiosyncrasy $Idios(R_M)$ of a reason $R_M$ as one minus the ratio between the number of methods that are described by $R_M$ by coincidence (i.e., the methods that have properties in common with $R_M$ but do not belong to the set of methods $M$ it describes) and the total number of methods in the system's history, $M_s$.

$Idios(R_M) = 1 - \frac{|\{m \in M_s : R_M \subset D_m\}| - |M|}{|M_s|}$

For example, the idiosyncrasy for the example commit is $1 - \frac{5 - 2}{14895} = 1 - 0.0002 = 0.9998$.

† Methods modified in at least one commit with non-empty reasons.
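The idiosyncrasy arithmetic can be sketched in a few lines. This is a minimal sketch that assumes D_m is the set of properties of method m and that a method matches a reason when it contains all of the reason's properties; both are readings of the formula, not statements from the slides. The method names and the single-property reason are hypothetical, and only the counts 5, 2 and 14895 come from the example.

```python
def idiosyncrasy(reason, described, properties_by_method):
    """1 - (methods matching the reason by coincidence) / (all methods)."""
    matching = {
        m for m, props in properties_by_method.items() if reason <= props
    }
    # Methods that match the reason but are not the ones it describes.
    coincidental = matching - set(described)
    return 1 - len(coincidental) / len(properties_by_method)


# Hypothetical system with 14895 methods, 5 of which match the reason.
reason = {"METHOD_JAVADOC_MENTIONS:raw"}
properties_by_method = {f"m{i}": set() for i in range(14895)}
for name in ("m0", "m1", "m2", "m3", "m4"):
    properties_by_method[name] = set(reason)
described = {"m0", "m1"}  # the 2 methods that actually co-changed

value = idiosyncrasy(reason, described, properties_by_method)
print(round(value, 4))  # 0.9998
```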
Uniqueness: Differs from other reasons
RQ3: […] co-changing methods overlap with each other? We analyze this question by measuring the uniqueness of reasons.
It is also important to know whether reasons are sufficiently different from each other to serve as explanations only for the changes they describe. Thus, we measure the similarity $Sim(R_1, R_2)$ between two reasons as their Jaccard index (i.e., the intersection over the union of their properties). The uniqueness of a reason $R_i$ is the mean difference to the rest of the reasons found in the project (i.e., $R$).

$Unq(R_i) = 1 - \tilde{x}_{R_j \in R \,\wedge\, i \neq j}\left(Sim(R_i, R_j)\right)$

Uniqueness tells us if different co-change relations are likely to be due to different reasons. Let us suppose that there are three methods (m1, m2, and m3) but only two co-change clusters (m1 and m2, m2 and m3). Even though m2 co-changes with m1 and m3, it is likely that its reasons for co-changing with m1 are different from its reasons for co-changing with m3. Therefore we expect the reasons to be unique. For example, the uniqueness for the example commit is 0.985333.
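The uniqueness measure can be sketched with the Jaccard index over property sets. This is a minimal sketch with hypothetical reasons and function names. It aggregates the pairwise similarities with the arithmetic mean, following the prose definition; that is an assumption about what the aggregate in the formula denotes.

```python
from statistics import mean


def jaccard(a, b):
    """Intersection over union of two property sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def uniqueness(index, reasons):
    """One minus the mean similarity of reasons[index] to all other reasons."""
    others = [r for j, r in enumerate(reasons) if j != index]
    if not others:
        return 1.0
    return 1 - mean([jaccard(reasons[index], r) for r in others])


# Hypothetical reasons, each a set of properties.
reasons = [{"a", "b"}, {"b", "c"}, {"d"}]
print(uniqueness(0, reasons))
print(uniqueness(2, reasons))
```

The first reason shares one of three properties with the second and none with the third, so its mean similarity is 1/6; the third reason shares nothing and is fully unique.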
Plausibility: Makes sense
RQ4: To […] for sets of co-changing methods are sound? We analyze this question by manually checking their plausibility.
A 20% random sample is manually compared to the commit message.

Empirical study
Source: CVS repository

Project        Commit transactions   Clusters   Semantic and structural properties
GanttProject   1087                  14         478312 in 4099 methods
Freecol        2701                  280        547394 in 14895 methods

Reasons per commit transaction: CmBoth, CmSm, CmSt
Reasons per cluster: ClBoth, ClSm, ClSt
(Both = both kinds of properties, Sm = semantic only, St = structural only)

Are these good reasons?
Coverage (bad. lower for clusters.)
Idiosyncrasy (good. lower for clusters.)
• Which properties are better?
• Both prop. >> Structural prop. only
• Both prop. > Semantic prop. only

Uniqueness (good. cluster ? commit)
[Figure: UNIQUENESS per set of reasons (cmAll, cmSt, cmSm, clAll, clSt, clSm). Panels: Freecol (axis 0.90 to 0.98) and GanttProject (axis 0.85 to 0.95).]
The reasons tend to be unique (usual similarity < 5% & 10%, worst case < 10% & 20%).
• Commits:
• Both prop. is better. Structural ≃ Semantic
• Clusters:
• Semantic -> Both -> Structural

Plausibility (good. lower for commits)
• Best properties? (depend on the project)
example: CmSm - Freecol

Conclusion
• Automatically finding the reason for co-changes IS POSSIBLE!
• Clusters provide better plausibility
• Commits provide better coverage
• Using both kinds of properties provides better reasons for commits; unclear for clusters

More info:
Angela Lozano
alozano@soft.vub.ac.be