Summarization Techniques for
Code, Changes, and Testing
Sebastiano Panichella
Institut für Informatik
Universität Zürich
panichella@ifi.uzh.ch
http://www.ifi.uzh.ch/seal/people/panichella.html
Outline
I. Source Code Summaries
- Why? Reduce Maintenance Costs.
- How? Using Term-based Text Retrieval (TR) techniques
II. Code Change Summarization
- Generating Commit Messages via Summarization of Source Code Changes
- Automatic Generation of Release Notes
III. Test Cases Summarization
- Generating Human-Readable Test Cases via Source Code Summarization Techniques
- Evaluation involving 30 developers
I. Source Code Summaries
Why? How?
Source Code
Summaries: Why?
Reduce Maintenance Costs…
4
Activities in Software Maintenance:
- Source code comprehension: 50%
- Change testing: 25%
- Change implementation: 10%
- Change planning: 10%
- Change documentation: 5%
Source: Principles of Software Engineering and Design, Zelkowitz, Shaw, Gannon, 1979
Source Code
Summaries: Why?
Reduce Maintenance Costs…
5
Understanding Code…
- Absence of comments in the code (again!!) → not so happy developers
- Comments in the code (again!!) → happy developers
SOLUTION???
6
Source Code
Summaries: How?
Generating Summaries of Source Code:
7
“Automatically generated, short, yet
accurate descriptions of source code entities”.
When Navigating
Java Classes…
8
https://github.com/larsb/atunesplus/blob/master/aTunes/src/main/java/net/sourceforge/atunes/kernel/modules/repository/audio/AudioFile.java
we look at
- Name of the Class
- Attributes
- Methods
- Dependencies between Classes
Questions when Generating
Summaries of Java Classes
9
■ 1) What information to include in the summaries?
■ 2) How much information to include in the
summaries?
■ 3) How to generate and present the summaries?
What information to include
in the summaries?
■ Methods and attributes relevant for the class
■ Class stereotypes [Dragan et al., ICSM’10]
■ Method stereotypes [Dragan et al., ICSM’06]
■ Access-level heuristics
■ Private, protected, package-protected, public
(a minimal member-selection sketch follows below)
10
[ L. Moreno et al. - ASE 2012 -
“JStereoCode: automatically identifying method and class stereotypes in Java code” ]
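To make the access-level heuristic concrete, here is a minimal sketch (assumed names, not JStereoCode or JSummarizer code) that selects the members of a class most likely worth mentioning in a summary: public methods that are not trivial accessors, plus the non-constant attributes that describe the class state.

import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

/** Minimal sketch: pick summary-relevant members of a class via access-level heuristics. */
public class SummaryMemberSelector {

    /** Public methods that are not trivial accessors are candidates for the summary. */
    public static List<Method> relevantMethods(Class<?> clazz) {
        List<Method> relevant = new ArrayList<>();
        for (Method m : clazz.getDeclaredMethods()) {
            boolean isPublic = Modifier.isPublic(m.getModifiers());
            boolean isAccessor = m.getName().startsWith("get")
                    || m.getName().startsWith("set")
                    || m.getName().startsWith("is");
            if (isPublic && !isAccessor) {
                relevant.add(m);
            }
        }
        return relevant;
    }

    /** Non-constant fields hint at the state the class manages. */
    public static List<Field> relevantAttributes(Class<?> clazz) {
        List<Field> relevant = new ArrayList<>();
        for (Field f : clazz.getDeclaredFields()) {
            boolean isConstant = Modifier.isStatic(f.getModifiers())
                    && Modifier.isFinal(f.getModifiers());
            if (!isConstant) {
                relevant.add(f);
            }
        }
        return relevant;
    }
}

A real summarizer such as JSummarizer works on source code rather than reflection and combines this kind of member selection with the class and method stereotypes cited above.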
Example of Important Attributes/Methods
of an Entity Java Class
11
we look at
- Attributes
- Methods
- Dependencies between Classes
An approach for Summarizing
a Java Class (JSummarizer)
12
http://www.cs.wayne.edu/~severe/jsummarizer/
How to present and
generate the summaries?
Other Code Artefacts can
be Summarised as well:
- Packages
- Classes
- Methods
- etc.
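As a hedged illustration of the generation step, the sketch below turns the members selected by the previous snippet into a single English summary sentence; the phrasing template is invented and far simpler than what JSummarizer actually produces.

import java.lang.reflect.Method;
import java.util.List;
import java.util.stream.Collectors;

/** Minimal sketch: template-based natural-language summary of a Java class. */
public class ClassSummaryGenerator {

    /** Builds a one-sentence summary from the class name and its relevant methods. */
    public static String summarize(Class<?> clazz) {
        List<Method> methods = SummaryMemberSelector.relevantMethods(clazz);
        String services = methods.stream()
                .map(m -> splitCamelCase(m.getName()))
                .distinct()
                .collect(Collectors.joining("; "));
        return String.format("%s is a class that allows to: %s.",
                clazz.getSimpleName(), services.isEmpty() ? "hold data" : services);
    }

    /** "addConnection" -> "add connection", mimicking identifier splitting in summarizers. */
    private static String splitCamelCase(String identifier) {
        return identifier.replaceAll("([a-z])([A-Z])", "$1 $2").toLowerCase();
    }
}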
II. Code Change Summarization
Task-Driven Summaries
[ Binkley et al. - ICSM 2013 ]
1) Generating Commit Messages via Summarization of Source Code Changes
2) Automatic Generation of Release Notes
To Improve Commit Quality
To Improve Release Notes Quality
15
Task-Driven Summaries
[ Binkley et al. - ICSM 2013 ]
1) Generating Commit Messages via Summarization of Source Code Changes
2) Automatic Generation of Release Notes
To Improve Commit Quality
To Improve Release Notes Quality
16
Commit Message
Should Describe…
The what: changes implemented during the incremental change
The why: motivation and context behind the changes
17
Commit Message
Should Describe…
The what: changes implemented during the incremental change
The why: motivation and context behind the changes
18
>20% of the commit messages analyzed were discarded because:
- they were empty
- they contained only very short strings or lacked any semantic sense
Maalej and Happel - MSR '10
Input: Java project version i-1 and Java project version i
Pipeline (sketched below): 1. Changes Extractor → 2. Stereotypes Detector → 3. Message Generator
Generating Commit Messages via
Summarization of Source Code Changes
19
https://github.com/SEMERU-WM/ChangeScribe
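Below is a hedged sketch of this three-step pipeline (assumed types and templates, not ChangeScribe's actual implementation): a changes extractor yields the methods changed between version i-1 and version i, a stereotype detector classifies the change set, and a message generator renders a hierarchical message grouped by class.

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Minimal sketch of a ChangeScribe-like pipeline; types and rules are illustrative only. */
public class CommitMessageGenerator {

    /** A changed method, as produced by a (hypothetical) diff-based changes extractor. */
    record MethodChange(String className, String methodName, String kind) {}

    /** Step 2: toy stereotype detection based on the kinds of changes in the set. */
    static String stereotype(List<MethodChange> changes) {
        long added = changes.stream().filter(c -> c.kind().equals("ADD")).count();
        return added > changes.size() / 2 ? "feature-introduction commit" : "modifier commit";
    }

    /** Step 3: render a hierarchical message grouped by class. */
    static String generateMessage(List<MethodChange> changes) {
        Map<String, List<MethodChange>> byClass = new LinkedHashMap<>();
        for (MethodChange c : changes) {
            byClass.computeIfAbsent(c.className(), k -> new ArrayList<>()).add(c);
        }
        StringBuilder msg = new StringBuilder("This is a " + stereotype(changes)
                + ". This change set is mainly composed of:\n");
        int i = 1;
        for (Map.Entry<String, List<MethodChange>> entry : byClass.entrySet()) {
            msg.append(i++).append(". Modifications to ").append(entry.getKey()).append(":\n");
            for (MethodChange c : entry.getValue()) {
                msg.append("   - ").append(c.kind()).append(" in method ")
                   .append(c.methodName()).append("\n");
            }
        }
        return msg.toString();
    }

    public static void main(String[] args) {
        List<MethodChange> changes = List.of(
                new MethodChange("ConnectController.java", "oauth1Callback", "MODIFY"),
                new MethodChange("StubConnectionRepository.java", "findAllConnections", "ADD"));
        System.out.println(generateMessage(changes));
    }
}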
Example:
This is a degenerate modifier commit: this change set is composed of empty, incidental, and abstract methods.
These methods indicate that a new feature is planned. This change set is mainly composed of:
1. Changes to package org.springframework.social.connect.web:
1.1. Modifications to ConnectController.java:
1.1.1. Add try statement at oauth1Callback(String,NativeWebRequest) method
1.1.2. Add catch clause at oauth1Callback(String,NativeWebRequest) method
1.1.3. Add method invocation to method warn of logger object at
oauth1Callback(String,NativeWebRequest) method
1.2. Modifications to ConnectControllerTest.java:
1.2.1. Modify method invocation mockMvc at oauth1Callback() method
1.2.2. Add a functionality to oauth 1 callback exception while fetching access token
2. Changes to package org.springframework.social.connect.web.test:
2.1. Add a ConnectionRepository implementation for stub connection repository. It allows to:
Find all connections;
Find connections;
Find connections to users;
Get connection;
Get primary connection;
Find primary connection;
Add connection;
Update connection;
Remove connections;
Remove connection
[..............]
20
Impact = relative number of methods impacted by a class in the commit;
classes below a chosen impact threshold are omitted from the message (see the filtering sketch after the example below)
Generating Commit Messages via
Summarization of Source Code Changes
This is a degenerate modifier commit: this change set is composed of empty, incidental, and abstract methods. These methods
indicate that a new feature is planned. This change set is mainly composed of:
1. Changes to package org.springframework.social.connect.web:
1.1. Modifications to ConnectController.java:
1.1.1. Add try statement at oauth1Callback(String,NativeWebRequest) method
1.1.2. Add catch clause at oauth1Callback(String,NativeWebRequest) method
1.1.3. Add method invocation to method warn of logger object at oauth1Callback(String,NativeWebRequest)
method
1.2. Modifications to ConnectControllerTest.java:
1.2.1. Modify method invocation mockMvc at oauth1Callback() method
1.2.2. Add a functionality to oauth 1 callback exception while fetching access token
2. Changes to package org.springframework.social.connect.web.test:
2.1. Add a ConnectionRepository implementation for stub connection repository. It allows to:
Find all connections;
Find connections;
Find connections to users;
This is a degenerate modifier commit: this change set is composed of empty, incidental, and abstract methods. These methods
indicate that a new feature is planned. This change set is mainly composed of:
1. Changes to package org.springframework.social.connect.web:
1.1. Modifications to ConnectController.java:
1.1.1. Add try statement at oauth1Callback(String,NativeWebRequest) method
1.1.2. Add catch clause at oauth1Callback(String,NativeWebRequest) method
1.1.3. Add method invocation to method warn of logger object at oauth1Callback(String,NativeWebRequest)
method
1.2. Modifications to ConnectControllerTest.java:
1.2.1. Modify method invocation mockMvc at oauth1Callback() method
1.2.2. Add a functionality to oauth 1 callback exception while fetching access token
2. Changes to package org.springframework.social.connect.web.test:
2.1. Add a ConnectionRepository implementation for stub connection repository. It allows to:
Find all connections;
Find connections;
Find connections to users;
17%
Example:
impact >= 17%
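The impact-based filtering can be pictured with a small sketch (assumed logic, not the tool's actual implementation): for each class in the commit, compute the fraction of all impacted methods that belong to it, and keep only the classes whose fraction reaches the threshold (17% in the example above).

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Minimal sketch of impact-based filtering of classes in a commit summary. */
public class ImpactFilter {

    /** impactedMethodsPerClass: class name -> number of impacted methods in this commit. */
    static List<String> classesAboveThreshold(Map<String, Integer> impactedMethodsPerClass,
                                              double threshold) {
        int total = impactedMethodsPerClass.values().stream().mapToInt(Integer::intValue).sum();
        return impactedMethodsPerClass.entrySet().stream()
                .filter(e -> total > 0 && (double) e.getValue() / total >= threshold)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Integer> impactedMethodsPerClass = Map.of(
                "ConnectController.java", 3,
                "ConnectControllerTest.java", 2,
                "StubConnectionRepository.java", 10);
        // With threshold 0.17, only classes covering at least 17% of the impacted methods remain.
        System.out.println(classesAboveThreshold(impactedMethodsPerClass, 0.17));
    }
}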
Original Message
This is a large modifier commit: this is a commit with many methods and combines
multiple roles. This commit includes changes to internationalization, properties or
configuration files (pom.xml). This change set is mainly composed of:
1. Changes to package retrofit.converter:
1.1. Add a Converter implementation for simple XML converter. It allows to:
Instantiate simple XML converter with serializer;
Process simple XML converter simple XML converter from body;
Convert simple XML converter to body
Referenced by:
SimpleXMLConverterTest class
Message Automatically
Generated
22
III. Test Cases Summarization
Manual Testing vs.
Automatic Testing
24
Manual Testing is still
Dominant in Industry…
Why?
Automatically generated tests do not
improve the ability of developers to detect
faults when compared to manual testing.
Fraser et al.
[Screenshot of paper: “Modeling Readability to Improve Unit Tests” — Ermira Daka, José Campos, Gordon Fraser (University of Sheffield), Jonathan Dorn, Westley Weimer (University of Virginia). The paper proposes a domain-specific model of unit test readability based on human judgements and uses it to augment automated unit test generation; its Figure 1 contrasts two generated tests that exercise the same functionality but differ in appearance and readability.]
[Screenshots of papers: “Does Automated White-Box Test Generation Really Help Software Testers?” — Gordon Fraser, Matt Staats, Phil McMinn, Andrea Arcuri, Frank Padberg — and its extended journal version “Does Automated Unit Test Generation Really Help Software Testers? A Controlled Empirical Study”. In controlled experiments with 49 and 97 subjects respectively, tool support (EvoSuite) clearly improved metrics such as code coverage (up to 300% increase), but produced no measurable improvement in the number of bugs actually found by developers.]
Developers spend up to 50% of their time
in understanding and analyzing the output of
automatic tools.
Fraser et al.
“Professional developers perceive
generated test cases as hard to
understand.”
Daka et al.
25
Example of Test Case
Generated by Evosuite
Test Case Automatically Generated by Evosuite
(for the class apache.commons.Option.Java)
[screenshot of the generated JUnit test, not reproduced in this transcript; an illustrative sketch follows]
26
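Since the screenshot is not part of this transcript, here is a hypothetical sketch of what an EvoSuite-style test for org.apache.commons.cli.Option typically looks like (method name, inputs, and assertions are invented for illustration, not the actual tool output):

import static org.junit.Assert.assertEquals;

import org.apache.commons.cli.Option;
import org.junit.Test;

/** Illustrative EvoSuite-style test; note the uninformative method name. */
public class Option_ESTest {

    @Test(timeout = 4000)
    public void test05() throws Throwable {
        // Create an option and reconfigure its number of arguments.
        Option option0 = new Option("h", "help", false, "print this message");
        option0.setArgs(3);
        assertEquals(3, option0.getArgs());
        assertEquals("h", option0.getOpt());
        assertEquals("help", option0.getLongOpt());
    }
}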
Example of Test Case
Generated by Evosuite
Test Case Automatically
Generated by Evosuite
(for the class apache.commons.Option.Java)
No meaningful names for test methods:
it is difficult to tell, without reading the contents of the
target class, what behavior is under test.
27
Test Case Automatically
Generated by Evosuite
(for the class apache.commons.Option.Java)
28
Example of Test Case
Generated by Evosuite
Test Case Automatically
Generated by Evosuite
(for the class apache.commons.Option.Java)
Our Solution: Automatically Generate
Summaries of Test Cases
29
Our Solution: Automatically Generate
Summaries of Test Cases
Sebastiano Panichella, Annibale Panichella, Moritz Beller, Andy Zaidman, and Harald Gall:
“The impact of test case summaries on bug fixing performance: An empirical investigation” - ICSE 2016.
30
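A hedged sketch of the basic idea (invented templates and helper names, not the actual tool from the ICSE 2016 paper): turn each statement of a generated test into a short natural-language comment describing the object being created and the values being checked.

import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Minimal sketch: derive a natural-language comment for each line of a generated test. */
public class TestSummaryGenerator {

    private static final Pattern NEW_OBJECT =
            Pattern.compile("(\\w+)\\s+(\\w+)\\s*=\\s*new\\s+(\\w+)\\((.*)\\);");
    private static final Pattern ASSERT_EQUALS =
            Pattern.compile("assertEquals\\((.+),\\s*(\\w+)\\.(\\w+)\\(\\)\\);");

    /** Produces one comment per recognized statement; unrecognized lines are skipped. */
    static List<String> summarize(List<String> testStatements) {
        return testStatements.stream()
                .map(TestSummaryGenerator::describe)
                .filter(s -> !s.isEmpty())
                .toList();
    }

    private static String describe(String statement) {
        Matcher m = NEW_OBJECT.matcher(statement.trim());
        if (m.matches()) {
            return "// Creates a " + m.group(3) + " with arguments (" + m.group(4) + ")";
        }
        m = ASSERT_EQUALS.matcher(statement.trim());
        if (m.matches()) {
            return "// Checks that " + m.group(3) + "() of " + m.group(2)
                    + " is equal to " + m.group(1);
        }
        return "";
    }

    public static void main(String[] args) {
        summarize(List.of(
                "Option option0 = new Option(\"h\", \"help\", false, \"print this message\");",
                "assertEquals(\"h\", option0.getOpt());"))
            .forEach(System.out::println);
    }
}

The actual approach evaluated in the paper builds richer summaries from the test and the code it covers, and attaches them as comments to the generated test cases.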
Empirical Study: Evaluating the Usefulness
of the Generated Summaries
Bug fixing task with two treatments:
- WITH comments (generated test case summaries)
- WITHOUT comments
Two groups of 15 developers each, swapping treatments between tasks.
30 Developers:
- 22 Researchers
- 8 Professional Developers
31
Results
30 Developers:
- 22 Researchers
- 8 Professional Developers
32
Future work…
Automatically (re-)documenting test cases…
Automatically optimizing test case readability
by minimizing (generated) code smells
Automatically assigning/generating meaningful
names for test cases