SlideShare a Scribd company logo
ENABLING AND SUPPORTING
THE DEBUGGING
OF FIELD FAILURES
James Clause
Georgia Institute of Technology
Jazz: A Tool for Demand-Driven Structural
Testing
Jonathan Misurda1
, Jim Clause1
, Juliya Reed1
, Bruce R. Childers1
, and Mary
Lou So a2
1
University of Pittsburgh, Pittsburgh PA 15260, USA,
{jmisurda,clausej,juliya,childers}@cs.pitt.edu
2
University of Virginia, Charlottesville VA 22904, USA,
soffa@cs.virginia.edu
Abstract. Software testing to produce reliable and robust software has
become vitally important. Testing is a process by which quality can be
assured through the collection of information about software. While test-
ing can improve software quality, current tools typically are inflexible
and have high overheads, making it a challenge to test large projects.
We describe a new scalable and flexible tool, called Jazz, that uses a
demand-driven structural testing approach. Jazz has a low overhead of
only 17.6% for branch testing.
1 Introduction
In the last several years, the importance of producing high quality and robust
software has become paramount. Testing is an important process to support
quality assurance by gathering information about the software being developed
or modified. It is, in general, extremely labor and resource intensive, accounting
for 50-60% of the total cost of software development [1]. The increased emphasis
on software quality and robustness mandates improved testing methodologies.
To test software, a number of techniques can be applied. One class of tech-
niques is structural testing, which checks that a given coverage criterion is sat-
isfied. For example, branch testing checks that a certain percentage of branches
are executed. Other structural tests include def-use testing in which pairs of
variable definitions and uses are checked for coverage and node testing in which
nodes in a program’s control flow graph are checked.
Unfortunately, structural testing is often hindered by the lack of scalable
and flexible tools. Current tools are not scalable in terms of both time and
memory, limiting the number and scope of the tests that can be applied to large
programs. These tools often modify the software binary to insert instrumentation
for testing. In this case, the tested version of the application is not the same
version that is shipped to customers and errors may remain. Testing tools are
usually inflexible and only implement certain types of testing. For example, many
tools implement branch testing, but do not implement node or def-use testing.
In this paper, we describe a new tool for structural testing, called Jazz, that
addresses these problems. Jazz uses a novel demand-driven technique to apply
ABSTRACT
Producing reliable and robust software has become one
of the most important software development concerns in
recent years. Testing is a process by which software
quality can be assured through the collection of infor-
mation. While testing can improve software reliability,
current tools typically are inflexible and have high over-
heads, making it challenging to test large software
projects. In this paper, we describe a new scalable and
flexible framework for testing programs with a novel
demand-driven approach based on execution paths to
implement test coverage. This technique uses dynamic
instrumentation on the binary code that can be inserted
and removed on-the-fly to keep performance and mem-
ory overheads low. We describe and evaluate implemen-
tations of the framework for branch, node and def-use
testing of Java programs. Experimental results for
branch testing show that our approach has, on average, a
1.6 speed up over static instrumentation and also uses
less memory.
Categories and Subject Descriptors
D.2.5. [Software Engineering]: Testing and Debug-
ging—Testing tools; D.3.3. [Programming Lan-
guages]: Language Constructs and Features—Program
instrumentation, run-time environments
General Terms
Experimentation, Measurement, Verification
Keywords
Testing, Code Coverage, Structural Testing, Demand-
Driven Instrumentation, Java Programming Language
1. INTRODUCTION
In the last several years, the importance of produc-
ing high quality and robust software has become para-
mount [15]. Testing is an important process to support
quality assurance by gathering information about the
behavior of the software being developed or modified. It
is, in general, extremely labor and resource intensive,
accounting for 50-60% of the total cost of software
development [17]. Given the importance of testing, it is
imperative that there are appropriate testing tools and
frameworks. In order to adequately test software, a
number of different testing techniques must be per-
formed. One class of testing techniques used extensively
is structural testing in which properties of the software
code are used to ensure a certain code coverage.Struc-
tural testing techniques include branch testing, node
testing, path testing, and def-use testing [6,7,8,17,19].
Typically, a testing tool targets one type of struc-
tural test, and the software unit is the program, file or
particular methods. In order to apply various structural
testing techniques, different tools must be used. If a tool
for a particular type of structural testing is not available,
the tester would need to either implement it or not use
that testing technique. The tester would also be con-
strained by the region of code to be tested, as deter-
mined by the tool implementor. For example, it may not
be possible for the tester to focus on a particular region
of code, such as a series of loops, complicated condi-
tionals, or particular variables if def-use testing is
desired. The user may want to have higher coverage on
frequently executed regions of code. Users may want to
define their own way of testing. For example, all
branches should be covered 10 times rather than once in
all loops.
In structural testing, instrumentation is placed at
certain code points (probes). Whenever such a program
point is reached, code that performs the function for the
test (payload) is executed. The probes in def-use testing
are dictated by the definitions and uses of variables and
the payload is to mark that a definition or use in a def-
use pair has been covered. Thus for each type of struc-
tural testing, there is a testing “plan”. A test plan is a
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies
are not made or distributed for profit or commercial advantage and
that copies bear this notice and the full citation on the first page. To
copy otherwise, or republish, to post on servers or to redistribute to
lists, requires prior specific permission and/or a fee.
ICSE'05, May 15-21, 2005, St. Louis, Missouri, USA.
Copyright 2005 ACM 1-58113-963-2/05/0005...$5.00.
Demand-Driven Structural Testing with Dynamic
Instrumentation
Jonathan Misurda†, James A. Clause†, Juliya L. Reed†, Bruce R. Childers†, and
Mary Lou Soffa‡
†Department of Computer Science
University of Pittsburgh
Pittsburgh, Pennsylvania 15260
{jmisurda, clausej, juliya, childers}@cs.pitt.edu
‡Department of Computer Science
University of Virginia
Charlottesville, Virginia 22904
soffa@cs.virginia.edu
156
A Technique for Enabling and Supporting Debugging of Field Failures
James Clause and Alessandro Orso
College of Computing
Georgia Institute of Technology
{clause, orso}@cc.gatech.edu
Abstract
It is difficult to fully assess the quality of software in-
house, outside the actual time and context in which it will
execute after deployment. As a result, it is common for
software to manifest field failures, failures that occur on
user machines due to untested behavior. Field failures are
typically difficult to recreate and investigate on developer
platforms, and existing techniques based on crash report-
ing provide only limited support for this task. In this pa-
per, we present a technique for recording, reproducing, and
minimizing failing executions that enables and supports in-
house debugging of field failures. We also present a tool
that implements our technique and an empirical study that
evaluates the technique on a widely used e-mail client.
1. Introduction
Quality-assurance activities, such as software testing and
analysis, are notoriously difficult, expensive, and time-
consuming. As a result, software products are often re-
leased with faults or missing functionality. In fact, real-
world examples of field failures experienced by users be-
cause of untested behaviors (e.g., due to unforeseen us-
ages), are countless. When field failures occur, it is im-
portant for developers to be able to recreate and investigate
them in-house. This pressing need is demonstrated by the
emergence of several crash-reporting systems, such as Mi-
crosoft’s error reporting systems [13] and Apple’s Crash
Reporter [1]. Although these techniques represent a first
important step in addressing the limitations of purely in-
house approaches to quality assurance, they work on lim-
ited data (typically, a snapshot of the execution state) and
can at best identify correlations between a crash report and
data on other known failures.
In this paper, we present a novel technique for reproduc-
ing and investigating field failures that addresses the limita-
tions of existing approaches. Our technique works in three
phases, intuitively illustrated by the scenario in Figure 1. In
the recording phase, while users run the software, the tech-
nique intercepts and logs the interactions between applica-
tion and environment and records portions of the environ-
ment that are relevant to these interactions. If the execution
terminates with a failure, the produced execution recording
is stored for later investigation. In the minimization phase,
using free cycles on the user machines, the technique re-
plays the recorded failing executions with the goal of au-
tomatically eliminating parts of the executions that are not
relevant to the failure. In the replay and debugging phase,
developers can use the technique to replay the minimized
failing executions and investigate the cause of the failures
(e.g., within a debugger). Being able to replay and debug
real field failures can give developers unprecedented insight
into the behavior of their software after deployment and op-
portunities to improve the quality of their software in ways
that were not possible before.
To evaluate our technique, we implemented it in a proto-
type tool, called ADDA (Automated Debugging of Deployed
Applications), and used the tool to perform an empirical
study. The study was performed on PINE [19], a widely-
used e-mail client, and involved the investigation of failures
caused by two real faults in PINE. The results of the study
are promising. Our technique was able to (1) record all ex-
ecutions of PINE (and two other subjects) with a low time
and space overhead, (2) completely replay all recorded exe-
cutions, and (3) perform automated minimization of failing
executions and obtain shorter executions that manifested the
same failures as the original executions. Moreover, we were
able to replay the minimized executions within a debugger,
which shows that they could have actually been used to in-
vestigate the failures.
The contributions of this paper are:
• A novel technique for recording and later replaying exe-
cutions of deployed programs.
• An approach for minimizing failing executions and gen-
erating shorter executions that fail for the same reasons.
• A prototype tool that implements our technique.
• An empirical study that shows the feasibility and effec-
tiveness of the approach.
29th International Conference on Software Engineering (ICSE'07)
0-7695-2828-7/07 $20.00 © 2007
Dytan: A Generic Dynamic Taint Analysis Framework
James Clause, Wanchun Li, and Alessandro Orso
College of Computing
Georgia Institute of Technology
{clause|wli7|orso}@cc.gatech.edu
ABSTRACT
Dynamic taint analysis is gaining momentum. Techniques based
on dynamic tainting have been successfully used in the context of
application security, and now their use is also being explored in dif-
ferent areas, such as program understanding, software testing, and
debugging. Unfortunately, most existing approaches for dynamic
tainting are defined in an ad-hoc manner, which makes it difficult
to extend them, experiment with them, and adapt them to new con-
texts. Moreover, most existing approaches are focused on data-flow
based tainting only and do not consider tainting due to control flow,
which limits their applicability outside the security domain. To
address these limitations and foster experimentation with dynamic
tainting techniques, we defined and developed a general framework
for dynamic tainting that (1) is highly flexible and customizable, (2)
allows for performing both data-flow and control-flow based taint-
ing conservatively, and (3) does not rely on any customized run-
time system. We also present DYTAN, an implementation of our
framework that works on x86 executables, and a set of preliminary
studies that show how DYTAN can be used to implement different
tainting-based approaches with limited effort. In the studies, we
also show that DYTAN can be used on real software, by using FIRE-
FOX as one of our subjects, and illustrate how the specific char-
acteristics of the tainting approach used can affect efficiency and
accuracy of the taint analysis, which further justifies the use of our
framework to experiment with different variants of an approach.
Categories and Subject Descriptors: D.2.5 [Software Engineer-
ing]: Testing and Debugging;
General Terms: Experimentation, Security
Keywords: Dynamic tainting, information flow, general framework
1. INTRODUCTION
Dynamic taint analysis (also known as dynamic information flow
analysis) consists, intuitively, in marking and tracking certain data
in a program at run-time. This type of dynamic analysis is be-
coming increasingly popular. In the context of application secu-
rity, dynamic-tainting approaches have been successfully used to
prevent a wide range of attacks, including buffer overruns (e.g., [8,
17]), format string attacks (e.g., [17, 21]), SQL and command in-
jections (e.g., [7, 19]), and cross-site scripting (e.g., [18]). More
recently, researchers have started to investigate the use of tainting-
based approaches in domains other than security, such as program
understanding, software testing, and debugging (e.g., [11, 13]).
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
ISSTA’07, July 9–12, 2007, London, England, United Kingdom.
Copyright 2007 ACM 978-1-59593-734-6/07/0007 ...$5.00.
Unfortunately, most existing techniques and tools for dynamic
taint analysis are defined in an ad-hoc manner, to target a specific
problem or a small class of problems. It would be difficult to ex-
tend or adapt such techniques and tools so that they can be used in
other contexts. In particular, most existing approaches are focused
on data-flow based tainting only, and do not consider tainting due
to the control flow within an application, which limits their general
applicability. Also, most existing techniques support either a sin-
gle taint marking or a small, fixed number of markings, which is
problematic in applications such as debugging. Finally, almost no
existing technique handles the propagation of taint markings in a
truly conservative way, which may be appropriate for the specific
applications considered, but is problematic in general. Because de-
veloping support for dynamic taint analysis is not only time con-
suming, but also fairly complex, this lack of flexibility and gener-
ality of existing tools and techniques is especially limiting for this
type of dynamic analysis.
To address these limitations and foster experimentation with dy-
namic tainting techniques, in this paper we present a framework for
dynamic taint analysis. We designed the framework to be general
and flexible, so that it allows for implementing different kinds of
techniques based on dynamic taint analysis with little effort. Users
can leverage the framework to quickly develop prototypes for their
techniques, experiment with them, and investigate trade-offs of dif-
ferent alternatives. For a simple example, the framework could be
used to investigate the cost effectiveness of considering different
types of taint propagation for an application.
Our framework has several advantages over existing approaches.
First, it is highly flexible and customizable. It allows for easily
specifying which program data should be tainted and how, how taint
markings should be propagated at run-time, and where and how
taint markings should be checked. Second, it allows for performing
data-flow and both data-flow and control-flow based tainting. Third,
from a more practical standpoint, it works on binaries, does not
need access to source code, and does not rely on any customized
hardware or operating system, which makes it broadly applicable.
We also present DYTAN, an implementation of our framework
that works on x86 binaries, and a set of preliminary studies per-
formed using DYTAN. In the first set of studies, we report on our
experience in using DYTAN to implement two tainting-based ap-
proaches presented in the literature. Although preliminary, our ex-
perience shows that we were able to implement these approaches
completely and with little effort. The second set of studies illus-
trates how the specific characteristics of a tainting approach can
affect efficiency and accuracy of the taint analysis. In particular, we
investigate how ignoring control-flow related propagation and over-
looking some data-flow aspects can lead to unsafety. These results
further justify the usefulness of experimenting with different varia-
tions of dynamic taint analysis and assessing their tradeoffs, which
can be done with limited effort using our framework. The second
set of studies also shows the practical applicability of DYTAN, by
successfully running it on the FIREFOX web browser.
196
Effective Memory Protection Using Dynamic Tainting
James Clause, Ioannis Doudalis, Alessandro Orso, and Milos Prvulovic
College of Computing
Georgia Institute of Technology
{clause|idoud|orso|milos}@cc.gatech.edu
ABSTRACT
Programs written in languages that provide direct access to memory
through pointers often contain memory-related faults, which may
cause non-deterministic failures and even security vulnerabilities.
In this paper, we present a new technique based on dynamic taint-
ing for protecting programs from illegal memory accesses. When
memory is allocated, at runtime, our technique taints both the mem-
ory and the corresponding pointer using the same taint mark. Taint
marks are then suitably propagated while the program executes and
are checked every time a memory address m is accessed through a
pointer p; if the taint marks associated with m and p differ, the ex-
ecution is stopped and the illegal access is reported. To allow for a
low-overhead, hardware-assisted implementation of the approach,
we make several key technical and engineering decisions in the
definition of our technique. In particular, we use a configurable,
low number of reusable taint marks instead of a unique mark for
each area of memory allocated, which reduces the overhead of the
approach without limiting its flexibility and ability to target most
memory-related faults and attacks known to date. We also define
the technique at the binary level, which lets us handle the (very)
common case of applications that use third-party libraries whose
source code is unavailable. To investigate the effectiveness and
practicality of our approach, we implemented it for heap-allocated
memory and performed a preliminary empirical study on a set of
programs. Our results show that (1) our technique can identify a
large class of memory-related faults, even when using only two
unique taint marks, and (2) a hardware-assisted implementation of
the technique could achieve overhead in the single digits.
Categories and Subject Descriptors: D.2.5 [Software Engineering]: Test-
ing and Debugging; C.0 [General]: Hardware/Software Interfaces;
General Terms: Performance, Security
Keywords: Illegal memory accesses, dynamic tainting, hardware support
1. INTRODUCTION
Memory-related faults are a serious problem for languages that
allow direct memory access through pointers. An important class
of memory-related faults are what we call illegal memory accesses.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
ASE’07, November 5–9, 2007, Atlanta, Georgia, USA.
Copyright 2007 ACM 978-1-59593-882-4/07/0011 ...$5.00.
In languages such as C and C++, when memory allocation is re-
quested, a currently-free area of memory m of the specified size
is reserved. After m has been allocated, its initial address can be
assigned to a pointer p, either immediately (e.g., in the case of
heap allocated memory) or at a later time (e.g., when retrieving
and storing the address of a local variable). From that point on,
the only legal accesses to m through a pointer are accesses per-
formed through p or through other pointers derived from p. (In
Section 3, we clearly define what it means to derive a pointer from
another pointer.) All other accesses to m are Illegal Memory Ac-
cesses (IMAs), that is, accesses where a pointer is used to access
memory outside the bounds of the memory area with which it was
originally associated.
IMAs are especially relevant for several reasons. First, they are
caused by typical programming errors, such as array-out-of-bounds
accesses and NULL pointer dereferences, and are thus widespread
and common. Second, they often result in non-deterministic fail-
ures that are hard to identify and diagnose; the specific effects of an
IMA depend on several factors, such as memory layout, that may
vary between executions. Finally, many security concerns such as
viruses, worms, and rootkits use IMAs as their injection vectors.
In this paper, we present a new dynamic technique for protecting
programs against IMAs that is effective against most known types
of illegal accesses. The basic idea behind the technique is to use
dynamic tainting (or dynamic information flow) [8] to keep track
of which memory areas can be accessed through which pointers,
as follows. At runtime, our technique taints both allocated mem-
ory and pointers using taint marks. Dynamic taint propagation, to-
gether with a suitable handling of memory-allocation and deallo-
cation operations, ensures that taint marks are appropriately prop-
agated during execution. Every time the program accesses some
memory through a pointer, our technique checks whether the ac-
cess is legal by comparing the taint mark associated with the mem-
ory and the taint mark associated with the pointer used to access it.
If the marks match, the access is considered legitimate. Otherwise,
the execution is stopped and an IMA is reported.
In defining our approach, our final goal is the development of a
low-overhead, hardware-assisted tool that is practical and can be
used on deployed software. A hardware-assisted tool is a tool that
leverages the benefits of both hardware and software. Typically,
some performance critical aspects are moved to the hardware to
achieve maximum efficiency, while software is used to perform op-
erations that would be too complex to implement in hardware.
There are two main characteristics of our approach that were de-
fined to help achieve our goal of a hardware-assisted implementa-
tion. The first characteristic is that our technique only uses a small,
configurable number of reusable taint marks instead of a unique
mark for each area of memory allocated. Using a low number of
283
Penumbra: Automatically Identifying Failure-Relevant
Inputs Using Dynamic Tainting
James Clause
College of Computing
Georgia Institute of Technology
clause@cc.gatech.edu
Alessandro Orso
College of Computing
Georgia Institute of Technology
orso@cc.gatech.edu
ABSTRACT
Most existing automated debugging techniques focus on re-
ducing the amount of code to be inspected and tend to ig-
nore an important component of software failures: the in-
puts that cause the failure to manifest. In this paper, we
present a new technique based on dynamic tainting for au-
tomatically identifying subsets of a program’s inputs that
are relevant to a failure. The technique (1) marks program
inputs when they enter the application, (2) tracks them as
they propagate during execution, and (3) identifies, for an
observed failure, the subset of inputs that are potentially
relevant for debugging that failure. To investigate feasibil-
ity and usefulness of our technique, we created a prototype
tool, penumbra, and used it to evaluate our technique on
several failures in real programs. Our results are promising,
as they show that penumbra can point developers to inputs
that are actually relevant for investigating a failure and can
be more practical than existing alternative approaches.
Categories and Subject Descriptors
D.2.5 [Software Engineering]: Testing and Debugging
General Terms
Algorithms, Experimentation, Reliability
Keywords
Failure-relevant inputs, automated debugging, dynamic in-
formation flow, dynamic tainting
1. INTRODUCTION
Debugging is known to be a labor-intensive, time-consum-
ing task that can be responsible for a large portion of soft-
ware development and maintenance costs [21,23]. Common
characteristics of modern software, such as increased con-
figurability, larger code bases, and increased input sizes, in-
troduce new challenges for debugging and exacerbate exist-
ing problems. In response, researchers have proposed many
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
ISSTA’09, July 19–23, 2009, Chicago, Illinois, USA.
Copyright 2009 ACM 978-1-60558-338-9/09/07 ...$5.00.
semi- and fully-automated techniques that attempt to re-
duce the cost of debugging (e.g., [8,9,11–13,18,24,25,27]).
The majority of these techniques are code-centric in that
they focus exclusively on one aspect of debugging—trying
to identify the faulty statements responsible for a failure.
Although code-centric approaches can work well in some
cases (e.g., for isolated faults that involve a single state-
ment), they are often inadequate for more complex faults [4].
Faults of omission, for instance, where part of a specification
has not been implemented, are notoriously problematic for
debugging techniques that attempt to identify potentially
faulty statements. The usefulness of code-centric techniques
is also limited in the case of long-running programs and pro-
grams that process large amounts of information; failures in
these types of programs are typically di⌅cult to understand
without considering the data involved in such failures.
To debug failures more e ectively, it is necessary to pro-
vide developers with not only a relevant subset of state-
ments, but also a relevant subset of inputs. There are only
a few existing techniques that attempt to identify relevant
inputs [3, 17, 25], with delta debugging [25] being the most
known of these. Although delta debugging has been shown
to be an e ective technique for automatic debugging, it also
has several drawbacks that may limit its usefulness in prac-
tice. In particular, it requires (1) multiple executions of the
program being debugged, which can involve a long running
time, and (2) complex oracles and setup, which can result
in a large amount of manual e ort [2].
In this paper, we present a novel debugging technique that
addresses many of the limitations of existing approaches.
Our technique can complement code-centric debugging tech-
niques because it focuses on identifying program inputs that
are likely to be relevant for a given failure. It also overcomes
some of the drawbacks of delta debugging because it needs
a single execution to identify failure-relevant inputs and re-
quires minimal manual e ort.
Given an observable faulty behavior and a set of failure-
inducing inputs (i.e., a set of inputs that cause such behav-
ior), our technique automatically identifies failure-relevant
inputs (i.e., a subset of failure-inducing inputs that are ac-
tually relevant for investigating the faulty behavior). Our
approach is based on dynamic tainting. Intuitively, the tech-
nique works by tracking the flow of inputs along data and
control dependences at runtime. When a point of failure
is reached, the tracked information is used to identify and
present to developers the failure-relevant inputs. At this
point, developers can use the identified inputs to investigate
the failure at hand.
LEAKPOINT: Pinpointing the Causes of Memory Leaks
James Clause
College of Computing
Georgia Institute of Technology
clause@cc.gatech.edu
Alessandro Orso
College of Computing
Georgia Institute of Technology
orso@cc.gatech.edu
ABSTRACT
Most existing leak detection techniques for C and C++ applications
only detect the existence of memory leaks. They do not provide
any help for fixing the underlying memory management errors. In
this paper, we present a new technique that not only detects leaks,
but also points developers to the locations where the underlying
errors may be fixed. Our technique tracks pointers to dynamically-
allocated areas of memory and, for each memory area, records sev-
eral pieces of relevant information. This information is used to
identify the locations in an execution where memory leaks occur.
To investigate our technique’s feasibility and usefulness, we devel-
oped a prototype tool called LEAKPOINT and used it to perform
an empirical evaluation. The results of this evaluation show that
LEAKPOINT detects at least as many leaks as existing tools, reports
zero false positives, and, most importantly, can be effective at help-
ing developers fix the underlying memory management errors.
Categories and Subject Descriptors
D.2.5 [Software Engineering]: Testing and Debugging
General Terms
Performance, Reliability
Keywords
Leak detection, Dynamic tainting
1. INTRODUCTION
Memory leaks are a type of unintended memory consumption
that can adversely impact the performance and correctness of an
application. In programs written in languages such as C and C++,
memory is allocated using allocation functions, such as malloc
and new. Allocation functions reserve a currently free area of
memory m and return a pointer p that points to m’s starting ad-
dress. Typically, the program stores and then uses p, or another
This work was supported in part by NSF awards CCF-0725202
and CCF-0541080 to Georgia Tech.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
ICSE ’10, May 2-8 2010, Cape Town, South Africa
Copyright 2010 ACM 978-1-60558-719-6/10/05 ...$10.00.
pointer derived from p, to interact with m. When m is no longer
needed, the program should pass p to a deallocation function (e.g.,
free or delete) to deallocate m. A leak occurs if, due to a
memory management error, m is not deallocated at the appropri-
ate time. There are two types of memory leaks: lost memory and
forgotten memory. Lost memory refers to the situation where m be-
comes unreachable (i.e., the program overwrites or loses p and all
pointers derived from p) without first being deallocated. Forgotten
memory refers to the situation where m remains reachable but is
not deallocated or accessed in the rest of the execution.
Memory leaks are relevant for several reasons. First, they are dif-
ficult to detect. Unlike many other types of failures, memory leaks
do not immediately produce an easily visible symptom (e.g., a crash
or the output of a wrong value); typically, leaks remain unobserved
until they consume a large portion of the memory available to a sys-
tem. Second, leaks have the potential to impact not only the appli-
cation that leaks memory, but also every other application running
on the system; because the overall amount of memory is limited,
as the memory usage of a leaking program increases, less memory
is available to other running applications. Consequently, the per-
formance and correctness of every running application can be im-
pacted by a program that leaks memory. Third, leaks are common,
even in mature applications. For example, in the first half of 2009,
over 100 leaks in the Firefox web-browser were reported [18].
Because of the serious consequences and common occurrence of
memory leaks, researchers have created many static and dynamic
techniques for detecting them (e.g., [1,2,4,7–14,16,17,20–23,25,
27,28]). The adoption of static techniques has been limited by sev-
eral factors, including the lack of scalable, precise heap modeling.
Dynamic techniques are therefore more widely used in practice. In
general, dynamic techniques provide one main piece of informa-
tion: the location in an execution where a leaked area of memory is
allocated. This location is supposed to serve as a starting point for
investigating the leak. However, in many situations, this informa-
tion does not provide any insight on where or how to fix the mem-
ory management error that causes the leak: the allocation location
and the location of the memory management error are typically in
completely different parts of the application’s code.
To address this limitation of existing approaches, we propose
a new memory leak detection technique. Our technique provides
the same information as existing techniques but also identifies the
locations in an execution where leaks occur. In the case of lost
memory, the location is defined as the point in an execution where
the last pointer to an unallocated memory area is lost or overwritten.
In the case of forgotten memory, the location is defined as the last
point in an execution where a pointer to a leaked area of memory
was used (e.g., when it is dereferenced to read or write memory,
passed as a function argument, returned from a function, or used as
Camouflage: Automated Sanitization of Field Data
James Clause
College of Computing
Georgia Institute of Technology
clause@cc.gatech.edu
Alessandro Orso
College of Computing
Georgia Institute of Technology
orso@cc.gatech.edu
ABSTRACT
Privacy and security concerns have adversely a ected the
usefulness of many types of techniques that leverage infor-
mation gathered from deployed applications. To address this
issue, we present a new approach for automatically sanitiz-
ing failure-inducing inputs. Given an input I that causes
a failure f, our technique can generate a sanitized input I
that is di erent from I but still causes f. I can then be sent
to the developers to help them debug f, without revealing
the possibly sensitive information contained in I. We im-
plemented our approach in a prototype tool, camouflage,
and performed an empirical evaluation. In the evaluation,
we applied camouflage to a large set of failure-inducing
inputs for several real applications. The results of the eval-
uation are promising; they show that camouflage is both
practical and e ective at generating sanitized inputs. In par-
ticular, for the inputs that we considered, I and I shared
no sensitive information.
1. INTRODUCTION
Investigating techniques that capture data from deployed
applications to support in-house software engineering tasks
is an increasingly active and successful area of research (e.g.,
[1,3–5,13,14,17,21,22,26,27,29]). However, privacy and se-
curity concerns have prevented widespread adoption of many
of these techniques and, because they rely on user partici-
pation, have ultimately limited their usefulness. Many of
the earlier proposed techniques attempt to sidestep these
concerns by collecting only limited amounts of information
(e.g., stack traces and register dumps [1, 3, 5] or sampled
branch profiles [26,27]) and providing a privacy policy that
specifies how the information will be used (e.g., [2,8]). Be-
cause the types of information collected by these techniques
are unlikely to be sensitive, users are more willing to trust
developers. Moreover, because only a small amount of infor-
mation is collected, it is feasible for users to manually inspect
and sanitize such information before it is sent to developers.
Unfortunately, recent research has shown that the e ec-
tiveness of these techniques increases when they can lever-
age large amounts of detailed information (e.g., complete
execution recordings [4, 14] or path profiles [13, 24]). Since
more detailed information is bound to contain sensitive data,
users will most likely be unwilling to let developers collect
such information. In addition, collecting large amounts of
information would make it infeasible for users to sanitize
the collected information by hand. To address this prob-
lem, some of these techniques suggest using an input mini-
mization approach (e.g., [6, 7, 35]) to reduce the number of
failure-inducing inputs and, hopefully, eliminate some sensi-
tive information. Input-minimization techniques, however,
were not designed to specifically reduce sensitive inputs, so
they can only eliminate sensitive data by chance. In or-
der for techniques that leverage captured field information
to become widely adopted and achieve their full potential,
new approaches for addressing privacy and security concerns
must be developed.
In this paper, we present a novel technique that addresses
privacy and security concerns by sanitizing information cap-
tured from deployed applications. Our technique is designed
to be used in conjunction with an execution capture/replay
technique (e.g., [4, 14]). Given an execution recording that
contains a captured failure-inducing input I = i1, i2, . . . in⇥
and terminates with a failure f, our technique replays the
execution recording and leverages a specialized version of
symbolic-execution to automatically produce I , a sanitized
version of I, such that I (1) still causes f and (2) reveals as
little information about I as possible. A modified execution
recording where I replaces I can then be constructed and
sent to the developers, who can use it to debug f.
It is, in general, impossible to construct I such that it
does not reveal any information about I while still caus-
ing the same failure f. Typically, the execution of f would
depend on the fact that some elements of I have specific
values (e.g., i1 must be 0 for the failing path to be taken).
However, this fact does not prevent the technique from be-
ing useful in practice. In our evaluation, we found that the
information revealed by the sanitized inputs was not sensi-
tive and tended to be structural in nature (e.g., a specific
portion of the input must be surrounded by double quotes).
Conversely, the parts of the inputs that were more likely to
be sensitive (e.g., values contained inside the double quotes)
were not revealed (see Section 4).
To evaluate the e ectiveness of our technique, we imple-
mented it in a prototype tool, called camouflage, and car-
ried out an empirical evaluation of 170 failure-inducing in-
1
CC 05 ICSE 05 ICSE 07 ISSTA 07 ASE 07 ISSTA 09 ICSE 10 Tech Rept
RESEARCH OVERVIEW
Efficient instrumentation
Jazz: A Tool for Demand-Driven Structural
Testing
Jonathan Misurda1
, Jim Clause1
, Juliya Reed1
, Bruce R. Childers1
, and Mary
Lou So a2
1
University of Pittsburgh, Pittsburgh PA 15260, USA,
{jmisurda,clausej,juliya,childers}@cs.pitt.edu
2
University of Virginia, Charlottesville VA 22904, USA,
soffa@cs.virginia.edu
Abstract. Software testing to produce reliable and robust software has
become vitally important. Testing is a process by which quality can be
assured through the collection of information about software. While test-
ing can improve software quality, current tools typically are inflexible
and have high overheads, making it a challenge to test large projects.
We describe a new scalable and flexible tool, called Jazz, that uses a
demand-driven structural testing approach. Jazz has a low overhead of
only 17.6% for branch testing.
1 Introduction
In the last several years, the importance of producing high quality and robust
software has become paramount. Testing is an important process to support
quality assurance by gathering information about the software being developed
or modified. It is, in general, extremely labor and resource intensive, accounting
for 50-60% of the total cost of software development [1]. The increased emphasis
on software quality and robustness mandates improved testing methodologies.
To test software, a number of techniques can be applied. One class of tech-
niques is structural testing, which checks that a given coverage criterion is sat-
isfied. For example, branch testing checks that a certain percentage of branches
are executed. Other structural tests include def-use testing in which pairs of
variable definitions and uses are checked for coverage and node testing in which
nodes in a program’s control flow graph are checked.
Unfortunately, structural testing is often hindered by the lack of scalable
and flexible tools. Current tools are not scalable in terms of both time and
memory, limiting the number and scope of the tests that can be applied to large
programs. These tools often modify the software binary to insert instrumentation
for testing. In this case, the tested version of the application is not the same
version that is shipped to customers and errors may remain. Testing tools are
usually inflexible and only implement certain types of testing. For example, many
tools implement branch testing, but do not implement node or def-use testing.
In this paper, we describe a new tool for structural testing, called Jazz, that
addresses these problems. Jazz uses a novel demand-driven technique to apply
ABSTRACT
Producing reliable and robust software has become one
of the most important software development concerns in
recent years. Testing is a process by which software
quality can be assured through the collection of infor-
mation. While testing can improve software reliability,
current tools typically are inflexible and have high over-
heads, making it challenging to test large software
projects. In this paper, we describe a new scalable and
flexible framework for testing programs with a novel
demand-driven approach based on execution paths to
implement test coverage. This technique uses dynamic
instrumentation on the binary code that can be inserted
and removed on-the-fly to keep performance and mem-
ory overheads low. We describe and evaluate implemen-
tations of the framework for branch, node and def-use
testing of Java programs. Experimental results for
branch testing show that our approach has, on average, a
1.6 speed up over static instrumentation and also uses
less memory.
Categories and Subject Descriptors
D.2.5. [Software Engineering]: Testing and Debug-
ging—Testing tools; D.3.3. [Programming Lan-
guages]: Language Constructs and Features—Program
instrumentation, run-time environments
General Terms
Experimentation, Measurement, Verification
Keywords
Testing, Code Coverage, Structural Testing, Demand-
Driven Instrumentation, Java Programming Language
1. INTRODUCTION
In the last several years, the importance of produc-
ing high quality and robust software has become para-
mount [15]. Testing is an important process to support
quality assurance by gathering information about the
behavior of the software being developed or modified. It
is, in general, extremely labor and resource intensive,
accounting for 50-60% of the total cost of software
development [17]. Given the importance of testing, it is
imperative that there are appropriate testing tools and
frameworks. In order to adequately test software, a
number of different testing techniques must be per-
formed. One class of testing techniques used extensively
is structural testing in which properties of the software
code are used to ensure a certain code coverage.Struc-
tural testing techniques include branch testing, node
testing, path testing, and def-use testing [6,7,8,17,19].
Typically, a testing tool targets one type of struc-
tural test, and the software unit is the program, file or
particular methods. In order to apply various structural
testing techniques, different tools must be used. If a tool
for a particular type of structural testing is not available,
the tester would need to either implement it or not use
that testing technique. The tester would also be con-
strained by the region of code to be tested, as deter-
mined by the tool implementor. For example, it may not
be possible for the tester to focus on a particular region
of code, such as a series of loops, complicated condi-
tionals, or particular variables if def-use testing is
desired. The user may want to have higher coverage on
frequently executed regions of code. Users may want to
define their own way of testing. For example, all
branches should be covered 10 times rather than once in
all loops.
In structural testing, instrumentation is placed at
certain code points (probes). Whenever such a program
point is reached, code that performs the function for the
test (payload) is executed. The probes in def-use testing
are dictated by the definitions and uses of variables and
the payload is to mark that a definition or use in a def-
use pair has been covered. Thus for each type of struc-
tural testing, there is a testing “plan”. A test plan is a
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies
are not made or distributed for profit or commercial advantage and
that copies bear this notice and the full citation on the first page. To
copy otherwise, or republish, to post on servers or to redistribute to
lists, requires prior specific permission and/or a fee.
ICSE'05, May 15-21, 2005, St. Louis, Missouri, USA.
Copyright 2005 ACM 1-58113-963-2/05/0005...$5.00.
Demand-Driven Structural Testing with Dynamic
Instrumentation
Jonathan Misurda†, James A. Clause†, Juliya L. Reed†, Bruce R. Childers†, and
Mary Lou Soffa‡
†Department of Computer Science
University of Pittsburgh
Pittsburgh, Pennsylvania 15260
{jmisurda, clausej, juliya, childers}@cs.pitt.edu
‡Department of Computer Science
University of Virginia
Charlottesville, Virginia 22904
soffa@cs.virginia.edu
156
A Technique for Enabling and Supporting Debugging of Field Failures
James Clause and Alessandro Orso
College of Computing
Georgia Institute of Technology
{clause, orso}@cc.gatech.edu
Abstract
It is difficult to fully assess the quality of software in-
house, outside the actual time and context in which it will
execute after deployment. As a result, it is common for
software to manifest field failures, failures that occur on
user machines due to untested behavior. Field failures are
typically difficult to recreate and investigate on developer
platforms, and existing techniques based on crash report-
ing provide only limited support for this task. In this pa-
per, we present a technique for recording, reproducing, and
minimizing failing executions that enables and supports in-
house debugging of field failures. We also present a tool
that implements our technique and an empirical study that
evaluates the technique on a widely used e-mail client.
1. Introduction
Quality-assurance activities, such as software testing and
analysis, are notoriously difficult, expensive, and time-
consuming. As a result, software products are often re-
leased with faults or missing functionality. In fact, real-
world examples of field failures experienced by users be-
cause of untested behaviors (e.g., due to unforeseen us-
ages), are countless. When field failures occur, it is im-
portant for developers to be able to recreate and investigate
them in-house. This pressing need is demonstrated by the
emergence of several crash-reporting systems, such as Mi-
crosoft’s error reporting systems [13] and Apple’s Crash
Reporter [1]. Although these techniques represent a first
important step in addressing the limitations of purely in-
house approaches to quality assurance, they work on lim-
ited data (typically, a snapshot of the execution state) and
can at best identify correlations between a crash report and
data on other known failures.
In this paper, we present a novel technique for reproduc-
ing and investigating field failures that addresses the limita-
tions of existing approaches. Our technique works in three
phases, intuitively illustrated by the scenario in Figure 1. In
the recording phase, while users run the software, the tech-
nique intercepts and logs the interactions between applica-
tion and environment and records portions of the environ-
ment that are relevant to these interactions. If the execution
terminates with a failure, the produced execution recording
is stored for later investigation. In the minimization phase,
using free cycles on the user machines, the technique re-
plays the recorded failing executions with the goal of au-
tomatically eliminating parts of the executions that are not
relevant to the failure. In the replay and debugging phase,
developers can use the technique to replay the minimized
failing executions and investigate the cause of the failures
(e.g., within a debugger). Being able to replay and debug
real field failures can give developers unprecedented insight
into the behavior of their software after deployment and op-
portunities to improve the quality of their software in ways
that were not possible before.
To evaluate our technique, we implemented it in a proto-
type tool, called ADDA (Automated Debugging of Deployed
Applications), and used the tool to perform an empirical
study. The study was performed on PINE [19], a widely-
used e-mail client, and involved the investigation of failures
caused by two real faults in PINE. The results of the study
are promising. Our technique was able to (1) record all ex-
ecutions of PINE (and two other subjects) with a low time
and space overhead, (2) completely replay all recorded exe-
cutions, and (3) perform automated minimization of failing
executions and obtain shorter executions that manifested the
same failures as the original executions. Moreover, we were
able to replay the minimized executions within a debugger,
which shows that they could have actually been used to in-
vestigate the failures.
The contributions of this paper are:
• A novel technique for recording and later replaying exe-
cutions of deployed programs.
• An approach for minimizing failing executions and gen-
erating shorter executions that fail for the same reasons.
• A prototype tool that implements our technique.
• An empirical study that shows the feasibility and effec-
tiveness of the approach.
29th International Conference on Software Engineering (ICSE'07)
0-7695-2828-7/07 $20.00 © 2007
Dytan: A Generic Dynamic Taint Analysis Framework
James Clause, Wanchun Li, and Alessandro Orso
College of Computing
Georgia Institute of Technology
{clause|wli7|orso}@cc.gatech.edu
ABSTRACT
Dynamic taint analysis is gaining momentum. Techniques based
on dynamic tainting have been successfully used in the context of
application security, and now their use is also being explored in dif-
ferent areas, such as program understanding, software testing, and
debugging. Unfortunately, most existing approaches for dynamic
tainting are defined in an ad-hoc manner, which makes it difficult
to extend them, experiment with them, and adapt them to new con-
texts. Moreover, most existing approaches are focused on data-flow
based tainting only and do not consider tainting due to control flow,
which limits their applicability outside the security domain. To
address these limitations and foster experimentation with dynamic
tainting techniques, we defined and developed a general framework
for dynamic tainting that (1) is highly flexible and customizable, (2)
allows for performing both data-flow and control-flow based taint-
ing conservatively, and (3) does not rely on any customized run-
time system. We also present DYTAN, an implementation of our
framework that works on x86 executables, and a set of preliminary
studies that show how DYTAN can be used to implement different
tainting-based approaches with limited effort. In the studies, we
also show that DYTAN can be used on real software, by using FIRE-
FOX as one of our subjects, and illustrate how the specific char-
acteristics of the tainting approach used can affect efficiency and
accuracy of the taint analysis, which further justifies the use of our
framework to experiment with different variants of an approach.
Categories and Subject Descriptors: D.2.5 [Software Engineer-
ing]: Testing and Debugging;
General Terms: Experimentation, Security
Keywords: Dynamic tainting, information flow, general framework
1. INTRODUCTION
Dynamic taint analysis (also known as dynamic information flow
analysis) consists, intuitively, in marking and tracking certain data
in a program at run-time. This type of dynamic analysis is be-
coming increasingly popular. In the context of application secu-
rity, dynamic-tainting approaches have been successfully used to
prevent a wide range of attacks, including buffer overruns (e.g., [8,
17]), format string attacks (e.g., [17, 21]), SQL and command in-
jections (e.g., [7, 19]), and cross-site scripting (e.g., [18]). More
recently, researchers have started to investigate the use of tainting-
based approaches in domains other than security, such as program
understanding, software testing, and debugging (e.g., [11, 13]).
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
ISSTA’07, July 9–12, 2007, London, England, United Kingdom.
Copyright 2007 ACM 978-1-59593-734-6/07/0007 ...$5.00.
Unfortunately, most existing techniques and tools for dynamic
taint analysis are defined in an ad-hoc manner, to target a specific
problem or a small class of problems. It would be difficult to ex-
tend or adapt such techniques and tools so that they can be used in
other contexts. In particular, most existing approaches are focused
on data-flow based tainting only, and do not consider tainting due
to the control flow within an application, which limits their general
applicability. Also, most existing techniques support either a sin-
gle taint marking or a small, fixed number of markings, which is
problematic in applications such as debugging. Finally, almost no
existing technique handles the propagation of taint markings in a
truly conservative way, which may be appropriate for the specific
applications considered, but is problematic in general. Because de-
veloping support for dynamic taint analysis is not only time con-
suming, but also fairly complex, this lack of flexibility and gener-
ality of existing tools and techniques is especially limiting for this
type of dynamic analysis.
To address these limitations and foster experimentation with dy-
namic tainting techniques, in this paper we present a framework for
dynamic taint analysis. We designed the framework to be general
and flexible, so that it allows for implementing different kinds of
techniques based on dynamic taint analysis with little effort. Users
can leverage the framework to quickly develop prototypes for their
techniques, experiment with them, and investigate trade-offs of dif-
ferent alternatives. For a simple example, the framework could be
used to investigate the cost effectiveness of considering different
types of taint propagation for an application.
Our framework has several advantages over existing approaches.
First, it is highly flexible and customizable. It allows for easily
specifying which program data should be tainted and how, how taint
markings should be propagated at run-time, and where and how
taint markings should be checked. Second, it allows for performing
data-flow and both data-flow and control-flow based tainting. Third,
from a more practical standpoint, it works on binaries, does not
need access to source code, and does not rely on any customized
hardware or operating system, which makes it broadly applicable.
We also present DYTAN, an implementation of our framework
that works on x86 binaries, and a set of preliminary studies per-
formed using DYTAN. In the first set of studies, we report on our
experience in using DYTAN to implement two tainting-based ap-
proaches presented in the literature. Although preliminary, our ex-
perience shows that we were able to implement these approaches
completely and with little effort. The second set of studies illus-
trates how the specific characteristics of a tainting approach can
affect efficiency and accuracy of the taint analysis. In particular, we
investigate how ignoring control-flow related propagation and over-
looking some data-flow aspects can lead to unsafety. These results
further justify the usefulness of experimenting with different varia-
tions of dynamic taint analysis and assessing their tradeoffs, which
can be done with limited effort using our framework. The second
set of studies also shows the practical applicability of DYTAN, by
successfully running it on the FIREFOX web browser.
196
Effective Memory Protection Using Dynamic Tainting
James Clause, Ioannis Doudalis, Alessandro Orso, and Milos Prvulovic
College of Computing
Georgia Institute of Technology
{clause|idoud|orso|milos}@cc.gatech.edu
ABSTRACT
Programs written in languages that provide direct access to memory
through pointers often contain memory-related faults, which may
cause non-deterministic failures and even security vulnerabilities.
In this paper, we present a new technique based on dynamic taint-
ing for protecting programs from illegal memory accesses. When
memory is allocated, at runtime, our technique taints both the mem-
ory and the corresponding pointer using the same taint mark. Taint
marks are then suitably propagated while the program executes and
are checked every time a memory address m is accessed through a
pointer p; if the taint marks associated with m and p differ, the ex-
ecution is stopped and the illegal access is reported. To allow for a
low-overhead, hardware-assisted implementation of the approach,
we make several key technical and engineering decisions in the
definition of our technique. In particular, we use a configurable,
low number of reusable taint marks instead of a unique mark for
each area of memory allocated, which reduces the overhead of the
approach without limiting its flexibility and ability to target most
memory-related faults and attacks known to date. We also define
the technique at the binary level, which lets us handle the (very)
common case of applications that use third-party libraries whose
source code is unavailable. To investigate the effectiveness and
practicality of our approach, we implemented it for heap-allocated
memory and performed a preliminary empirical study on a set of
programs. Our results show that (1) our technique can identify a
large class of memory-related faults, even when using only two
unique taint marks, and (2) a hardware-assisted implementation of
the technique could achieve overhead in the single digits.
Categories and Subject Descriptors: D.2.5 [Software Engineering]: Test-
ing and Debugging; C.0 [General]: Hardware/Software Interfaces;
General Terms: Performance, Security
Keywords: Illegal memory accesses, dynamic tainting, hardware support
1. INTRODUCTION
Memory-related faults are a serious problem for languages that
allow direct memory access through pointers. An important class
of memory-related faults are what we call illegal memory accesses.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
ASE’07, November 5–9, 2007, Atlanta, Georgia, USA.
Copyright 2007 ACM 978-1-59593-882-4/07/0011 ...$5.00.
In languages such as C and C++, when memory allocation is re-
quested, a currently-free area of memory m of the specified size
is reserved. After m has been allocated, its initial address can be
assigned to a pointer p, either immediately (e.g., in the case of
heap allocated memory) or at a later time (e.g., when retrieving
and storing the address of a local variable). From that point on,
the only legal accesses to m through a pointer are accesses per-
formed through p or through other pointers derived from p. (In
Section 3, we clearly define what it means to derive a pointer from
another pointer.) All other accesses to m are Illegal Memory Ac-
cesses (IMAs), that is, accesses where a pointer is used to access
memory outside the bounds of the memory area with which it was
originally associated.
IMAs are especially relevant for several reasons. First, they are
caused by typical programming errors, such as array-out-of-bounds
accesses and NULL pointer dereferences, and are thus widespread
and common. Second, they often result in non-deterministic fail-
ures that are hard to identify and diagnose; the specific effects of an
IMA depend on several factors, such as memory layout, that may
vary between executions. Finally, many security concerns such as
viruses, worms, and rootkits use IMAs as their injection vectors.
In this paper, we present a new dynamic technique for protecting
programs against IMAs that is effective against most known types
of illegal accesses. The basic idea behind the technique is to use
dynamic tainting (or dynamic information flow) [8] to keep track
of which memory areas can be accessed through which pointers,
as follows. At runtime, our technique taints both allocated mem-
ory and pointers using taint marks. Dynamic taint propagation, to-
gether with a suitable handling of memory-allocation and deallo-
cation operations, ensures that taint marks are appropriately prop-
agated during execution. Every time the program accesses some
memory through a pointer, our technique checks whether the ac-
cess is legal by comparing the taint mark associated with the mem-
ory and the taint mark associated with the pointer used to access it.
If the marks match, the access is considered legitimate. Otherwise,
the execution is stopped and an IMA is reported.
In defining our approach, our final goal is the development of a
low-overhead, hardware-assisted tool that is practical and can be
used on deployed software. A hardware-assisted tool is a tool that
leverages the benefits of both hardware and software. Typically,
some performance critical aspects are moved to the hardware to
achieve maximum efficiency, while software is used to perform op-
erations that would be too complex to implement in hardware.
There are two main characteristics of our approach that were de-
fined to help achieve our goal of a hardware-assisted implementa-
tion. The first characteristic is that our technique only uses a small,
configurable number of reusable taint marks instead of a unique
mark for each area of memory allocated. Using a low number of
283
Penumbra: Automatically Identifying Failure-Relevant
Inputs Using Dynamic Tainting
James Clause
College of Computing
Georgia Institute of Technology
clause@cc.gatech.edu
Alessandro Orso
College of Computing
Georgia Institute of Technology
orso@cc.gatech.edu
ABSTRACT
Most existing automated debugging techniques focus on re-
ducing the amount of code to be inspected and tend to ig-
nore an important component of software failures: the in-
puts that cause the failure to manifest. In this paper, we
present a new technique based on dynamic tainting for au-
tomatically identifying subsets of a program’s inputs that
are relevant to a failure. The technique (1) marks program
inputs when they enter the application, (2) tracks them as
they propagate during execution, and (3) identifies, for an
observed failure, the subset of inputs that are potentially
relevant for debugging that failure. To investigate feasibil-
ity and usefulness of our technique, we created a prototype
tool, penumbra, and used it to evaluate our technique on
several failures in real programs. Our results are promising,
as they show that penumbra can point developers to inputs
that are actually relevant for investigating a failure and can
be more practical than existing alternative approaches.
Categories and Subject Descriptors
D.2.5 [Software Engineering]: Testing and Debugging
General Terms
Algorithms, Experimentation, Reliability
Keywords
Failure-relevant inputs, automated debugging, dynamic in-
formation flow, dynamic tainting
1. INTRODUCTION
Debugging is known to be a labor-intensive, time-consum-
ing task that can be responsible for a large portion of soft-
ware development and maintenance costs [21,23]. Common
characteristics of modern software, such as increased con-
figurability, larger code bases, and increased input sizes, in-
troduce new challenges for debugging and exacerbate exist-
ing problems. In response, researchers have proposed many
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
ISSTA’09, July 19–23, 2009, Chicago, Illinois, USA.
Copyright 2009 ACM 978-1-60558-338-9/09/07 ...$5.00.
semi- and fully-automated techniques that attempt to re-
duce the cost of debugging (e.g., [8,9,11–13,18,24,25,27]).
The majority of these techniques are code-centric in that
they focus exclusively on one aspect of debugging—trying
to identify the faulty statements responsible for a failure.
Although code-centric approaches can work well in some
cases (e.g., for isolated faults that involve a single state-
ment), they are often inadequate for more complex faults [4].
Faults of omission, for instance, where part of a specification
has not been implemented, are notoriously problematic for
debugging techniques that attempt to identify potentially
faulty statements. The usefulness of code-centric techniques
is also limited in the case of long-running programs and pro-
grams that process large amounts of information; failures in
these types of programs are typically di⌅cult to understand
without considering the data involved in such failures.
To debug failures more e ectively, it is necessary to pro-
vide developers with not only a relevant subset of state-
ments, but also a relevant subset of inputs. There are only
a few existing techniques that attempt to identify relevant
inputs [3, 17, 25], with delta debugging [25] being the most
known of these. Although delta debugging has been shown
to be an e ective technique for automatic debugging, it also
has several drawbacks that may limit its usefulness in prac-
tice. In particular, it requires (1) multiple executions of the
program being debugged, which can involve a long running
time, and (2) complex oracles and setup, which can result
in a large amount of manual e ort [2].
In this paper, we present a novel debugging technique that
addresses many of the limitations of existing approaches.
Our technique can complement code-centric debugging tech-
niques because it focuses on identifying program inputs that
are likely to be relevant for a given failure. It also overcomes
some of the drawbacks of delta debugging because it needs
a single execution to identify failure-relevant inputs and re-
quires minimal manual e ort.
Given an observable faulty behavior and a set of failure-
inducing inputs (i.e., a set of inputs that cause such behav-
ior), our technique automatically identifies failure-relevant
inputs (i.e., a subset of failure-inducing inputs that are ac-
tually relevant for investigating the faulty behavior). Our
approach is based on dynamic tainting. Intuitively, the tech-
nique works by tracking the flow of inputs along data and
control dependences at runtime. When a point of failure
is reached, the tracked information is used to identify and
present to developers the failure-relevant inputs. At this
point, developers can use the identified inputs to investigate
the failure at hand.
LEAKPOINT: Pinpointing the Causes of Memory Leaks
James Clause
College of Computing
Georgia Institute of Technology
clause@cc.gatech.edu
Alessandro Orso
College of Computing
Georgia Institute of Technology
orso@cc.gatech.edu
ABSTRACT
Most existing leak detection techniques for C and C++ applications
only detect the existence of memory leaks. They do not provide
any help for fixing the underlying memory management errors. In
this paper, we present a new technique that not only detects leaks,
but also points developers to the locations where the underlying
errors may be fixed. Our technique tracks pointers to dynamically-
allocated areas of memory and, for each memory area, records sev-
eral pieces of relevant information. This information is used to
identify the locations in an execution where memory leaks occur.
To investigate our technique’s feasibility and usefulness, we devel-
oped a prototype tool called LEAKPOINT and used it to perform
an empirical evaluation. The results of this evaluation show that
LEAKPOINT detects at least as many leaks as existing tools, reports
zero false positives, and, most importantly, can be effective at help-
ing developers fix the underlying memory management errors.
Categories and Subject Descriptors
D.2.5 [Software Engineering]: Testing and Debugging
General Terms
Performance, Reliability
Keywords
Leak detection, Dynamic tainting
1. INTRODUCTION
Memory leaks are a type of unintended memory consumption
that can adversely impact the performance and correctness of an
application. In programs written in languages such as C and C++,
memory is allocated using allocation functions, such as malloc
and new. Allocation functions reserve a currently free area of
memory m and return a pointer p that points to m’s starting ad-
dress. Typically, the program stores and then uses p, or another
This work was supported in part by NSF awards CCF-0725202
and CCF-0541080 to Georgia Tech.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
ICSE ’10, May 2-8 2010, Cape Town, South Africa
Copyright 2010 ACM 978-1-60558-719-6/10/05 ...$10.00.
pointer derived from p, to interact with m. When m is no longer
needed, the program should pass p to a deallocation function (e.g.,
free or delete) to deallocate m. A leak occurs if, due to a
memory management error, m is not deallocated at the appropri-
ate time. There are two types of memory leaks: lost memory and
forgotten memory. Lost memory refers to the situation where m be-
comes unreachable (i.e., the program overwrites or loses p and all
pointers derived from p) without first being deallocated. Forgotten
memory refers to the situation where m remains reachable but is
not deallocated or accessed in the rest of the execution.
Memory leaks are relevant for several reasons. First, they are dif-
ficult to detect. Unlike many other types of failures, memory leaks
do not immediately produce an easily visible symptom (e.g., a crash
or the output of a wrong value); typically, leaks remain unobserved
until they consume a large portion of the memory available to a sys-
tem. Second, leaks have the potential to impact not only the appli-
cation that leaks memory, but also every other application running
on the system; because the overall amount of memory is limited,
as the memory usage of a leaking program increases, less memory
is available to other running applications. Consequently, the per-
formance and correctness of every running application can be im-
pacted by a program that leaks memory. Third, leaks are common,
even in mature applications. For example, in the first half of 2009,
over 100 leaks in the Firefox web-browser were reported [18].
Because of the serious consequences and common occurrence of
memory leaks, researchers have created many static and dynamic
techniques for detecting them (e.g., [1,2,4,7–14,16,17,20–23,25,
27,28]). The adoption of static techniques has been limited by sev-
eral factors, including the lack of scalable, precise heap modeling.
Dynamic techniques are therefore more widely used in practice. In
general, dynamic techniques provide one main piece of informa-
tion: the location in an execution where a leaked area of memory is
allocated. This location is supposed to serve as a starting point for
investigating the leak. However, in many situations, this informa-
tion does not provide any insight on where or how to fix the mem-
ory management error that causes the leak: the allocation location
and the location of the memory management error are typically in
completely different parts of the application’s code.
To address this limitation of existing approaches, we propose
a new memory leak detection technique. Our technique provides
the same information as existing techniques but also identifies the
locations in an execution where leaks occur. In the case of lost
memory, the location is defined as the point in an execution where
the last pointer to an unallocated memory area is lost or overwritten.
In the case of forgotten memory, the location is defined as the last
point in an execution where a pointer to a leaked area of memory
was used (e.g., when it is dereferenced to read or write memory,
passed as a function argument, returned from a function, or used as
Camouflage: Automated Sanitization of Field Data
James Clause
College of Computing
Georgia Institute of Technology
clause@cc.gatech.edu
Alessandro Orso
College of Computing
Georgia Institute of Technology
orso@cc.gatech.edu
ABSTRACT
Privacy and security concerns have adversely a ected the
usefulness of many types of techniques that leverage infor-
mation gathered from deployed applications. To address this
issue, we present a new approach for automatically sanitiz-
ing failure-inducing inputs. Given an input I that causes
a failure f, our technique can generate a sanitized input I
that is di erent from I but still causes f. I can then be sent
to the developers to help them debug f, without revealing
the possibly sensitive information contained in I. We im-
plemented our approach in a prototype tool, camouflage,
and performed an empirical evaluation. In the evaluation,
we applied camouflage to a large set of failure-inducing
inputs for several real applications. The results of the eval-
uation are promising; they show that camouflage is both
practical and e ective at generating sanitized inputs. In par-
ticular, for the inputs that we considered, I and I shared
no sensitive information.
1. INTRODUCTION
Investigating techniques that capture data from deployed
applications to support in-house software engineering tasks
is an increasingly active and successful area of research (e.g.,
[1,3–5,13,14,17,21,22,26,27,29]). However, privacy and se-
curity concerns have prevented widespread adoption of many
of these techniques and, because they rely on user partici-
pation, have ultimately limited their usefulness. Many of
the earlier proposed techniques attempt to sidestep these
concerns by collecting only limited amounts of information
(e.g., stack traces and register dumps [1, 3, 5] or sampled
branch profiles [26,27]) and providing a privacy policy that
specifies how the information will be used (e.g., [2,8]). Be-
cause the types of information collected by these techniques
are unlikely to be sensitive, users are more willing to trust
developers. Moreover, because only a small amount of infor-
mation is collected, it is feasible for users to manually inspect
and sanitize such information before it is sent to developers.
Unfortunately, recent research has shown that the e ec-
tiveness of these techniques increases when they can lever-
age large amounts of detailed information (e.g., complete
execution recordings [4, 14] or path profiles [13, 24]). Since
more detailed information is bound to contain sensitive data,
users will most likely be unwilling to let developers collect
such information. In addition, collecting large amounts of
information would make it infeasible for users to sanitize
the collected information by hand. To address this prob-
lem, some of these techniques suggest using an input mini-
mization approach (e.g., [6, 7, 35]) to reduce the number of
failure-inducing inputs and, hopefully, eliminate some sensi-
tive information. Input-minimization techniques, however,
were not designed to specifically reduce sensitive inputs, so
they can only eliminate sensitive data by chance. In or-
der for techniques that leverage captured field information
to become widely adopted and achieve their full potential,
new approaches for addressing privacy and security concerns
must be developed.
In this paper, we present a novel technique that addresses
privacy and security concerns by sanitizing information cap-
tured from deployed applications. Our technique is designed
to be used in conjunction with an execution capture/replay
technique (e.g., [4, 14]). Given an execution recording that
contains a captured failure-inducing input I = i1, i2, . . . in⇥
and terminates with a failure f, our technique replays the
execution recording and leverages a specialized version of
symbolic-execution to automatically produce I , a sanitized
version of I, such that I (1) still causes f and (2) reveals as
little information about I as possible. A modified execution
recording where I replaces I can then be constructed and
sent to the developers, who can use it to debug f.
It is, in general, impossible to construct I such that it
does not reveal any information about I while still caus-
ing the same failure f. Typically, the execution of f would
depend on the fact that some elements of I have specific
values (e.g., i1 must be 0 for the failing path to be taken).
However, this fact does not prevent the technique from be-
ing useful in practice. In our evaluation, we found that the
information revealed by the sanitized inputs was not sensi-
tive and tended to be structural in nature (e.g., a specific
portion of the input must be surrounded by double quotes).
Conversely, the parts of the inputs that were more likely to
be sensitive (e.g., values contained inside the double quotes)
were not revealed (see Section 4).
To evaluate the e ectiveness of our technique, we imple-
mented it in a prototype tool, called camouflage, and car-
ried out an empirical evaluation of 170 failure-inducing in-
1
CC 05 ICSE 05 ICSE 07 ISSTA 07 ASE 07 ISSTA 09 ICSE 10 Tech Rept
RESEARCH OVERVIEW
Efficient instrumentation
Jazz: A Tool for Demand-Driven Structural
Testing
Jonathan Misurda1
, Jim Clause1
, Juliya Reed1
, Bruce R. Childers1
, and Mary
Lou So a2
1
University of Pittsburgh, Pittsburgh PA 15260, USA,
{jmisurda,clausej,juliya,childers}@cs.pitt.edu
2
University of Virginia, Charlottesville VA 22904, USA,
soffa@cs.virginia.edu
Abstract. Software testing to produce reliable and robust software has
become vitally important. Testing is a process by which quality can be
assured through the collection of information about software. While test-
ing can improve software quality, current tools typically are inflexible
and have high overheads, making it a challenge to test large projects.
We describe a new scalable and flexible tool, called Jazz, that uses a
demand-driven structural testing approach. Jazz has a low overhead of
only 17.6% for branch testing.
1 Introduction
In the last several years, the importance of producing high quality and robust
software has become paramount. Testing is an important process to support
quality assurance by gathering information about the software being developed
or modified. It is, in general, extremely labor and resource intensive, accounting
for 50-60% of the total cost of software development [1]. The increased emphasis
on software quality and robustness mandates improved testing methodologies.
To test software, a number of techniques can be applied. One class of tech-
niques is structural testing, which checks that a given coverage criterion is sat-
isfied. For example, branch testing checks that a certain percentage of branches
are executed. Other structural tests include def-use testing in which pairs of
variable definitions and uses are checked for coverage and node testing in which
nodes in a program’s control flow graph are checked.
Unfortunately, structural testing is often hindered by the lack of scalable
and flexible tools. Current tools are not scalable in terms of both time and
memory, limiting the number and scope of the tests that can be applied to large
programs. These tools often modify the software binary to insert instrumentation
for testing. In this case, the tested version of the application is not the same
version that is shipped to customers and errors may remain. Testing tools are
usually inflexible and only implement certain types of testing. For example, many
tools implement branch testing, but do not implement node or def-use testing.
In this paper, we describe a new tool for structural testing, called Jazz, that
addresses these problems. Jazz uses a novel demand-driven technique to apply
ABSTRACT
Producing reliable and robust software has become one
of the most important software development concerns in
recent years. Testing is a process by which software
quality can be assured through the collection of infor-
mation. While testing can improve software reliability,
current tools typically are inflexible and have high over-
heads, making it challenging to test large software
projects. In this paper, we describe a new scalable and
flexible framework for testing programs with a novel
demand-driven approach based on execution paths to
implement test coverage. This technique uses dynamic
instrumentation on the binary code that can be inserted
and removed on-the-fly to keep performance and mem-
ory overheads low. We describe and evaluate implemen-
tations of the framework for branch, node and def-use
testing of Java programs. Experimental results for
branch testing show that our approach has, on average, a
1.6 speed up over static instrumentation and also uses
less memory.
Categories and Subject Descriptors
D.2.5. [Software Engineering]: Testing and Debug-
ging—Testing tools; D.3.3. [Programming Lan-
guages]: Language Constructs and Features—Program
instrumentation, run-time environments
General Terms
Experimentation, Measurement, Verification
Keywords
Testing, Code Coverage, Structural Testing, Demand-
Driven Instrumentation, Java Programming Language
1. INTRODUCTION
In the last several years, the importance of produc-
ing high quality and robust software has become para-
mount [15]. Testing is an important process to support
quality assurance by gathering information about the
behavior of the software being developed or modified. It
is, in general, extremely labor and resource intensive,
accounting for 50-60% of the total cost of software
development [17]. Given the importance of testing, it is
imperative that there are appropriate testing tools and
frameworks. In order to adequately test software, a
number of different testing techniques must be per-
formed. One class of testing techniques used extensively
is structural testing in which properties of the software
code are used to ensure a certain code coverage.Struc-
tural testing techniques include branch testing, node
testing, path testing, and def-use testing [6,7,8,17,19].
Typically, a testing tool targets one type of struc-
tural test, and the software unit is the program, file or
particular methods. In order to apply various structural
testing techniques, different tools must be used. If a tool
for a particular type of structural testing is not available,
the tester would need to either implement it or not use
that testing technique. The tester would also be con-
strained by the region of code to be tested, as deter-
mined by the tool implementor. For example, it may not
be possible for the tester to focus on a particular region
of code, such as a series of loops, complicated condi-
tionals, or particular variables if def-use testing is
desired. The user may want to have higher coverage on
frequently executed regions of code. Users may want to
define their own way of testing. For example, all
branches should be covered 10 times rather than once in
all loops.
In structural testing, instrumentation is placed at
certain code points (probes). Whenever such a program
point is reached, code that performs the function for the
test (payload) is executed. The probes in def-use testing
are dictated by the definitions and uses of variables and
the payload is to mark that a definition or use in a def-
use pair has been covered. Thus for each type of struc-
tural testing, there is a testing “plan”. A test plan is a
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies
are not made or distributed for profit or commercial advantage and
that copies bear this notice and the full citation on the first page. To
copy otherwise, or republish, to post on servers or to redistribute to
lists, requires prior specific permission and/or a fee.
ICSE'05, May 15-21, 2005, St. Louis, Missouri, USA.
Copyright 2005 ACM 1-58113-963-2/05/0005...$5.00.
Demand-Driven Structural Testing with Dynamic
Instrumentation
Jonathan Misurda†, James A. Clause†, Juliya L. Reed†, Bruce R. Childers†, and
Mary Lou Soffa‡
†Department of Computer Science
University of Pittsburgh
Pittsburgh, Pennsylvania 15260
{jmisurda, clausej, juliya, childers}@cs.pitt.edu
‡Department of Computer Science
University of Virginia
Charlottesville, Virginia 22904
soffa@cs.virginia.edu
156
A Technique for Enabling and Supporting Debugging of Field Failures
James Clause and Alessandro Orso
College of Computing
Georgia Institute of Technology
{clause, orso}@cc.gatech.edu
Abstract
It is difficult to fully assess the quality of software in-
house, outside the actual time and context in which it will
execute after deployment. As a result, it is common for
software to manifest field failures, failures that occur on
user machines due to untested behavior. Field failures are
typically difficult to recreate and investigate on developer
platforms, and existing techniques based on crash report-
ing provide only limited support for this task. In this pa-
per, we present a technique for recording, reproducing, and
minimizing failing executions that enables and supports in-
house debugging of field failures. We also present a tool
that implements our technique and an empirical study that
evaluates the technique on a widely used e-mail client.
1. Introduction
Quality-assurance activities, such as software testing and
analysis, are notoriously difficult, expensive, and time-
consuming. As a result, software products are often re-
leased with faults or missing functionality. In fact, real-
world examples of field failures experienced by users be-
cause of untested behaviors (e.g., due to unforeseen us-
ages), are countless. When field failures occur, it is im-
portant for developers to be able to recreate and investigate
them in-house. This pressing need is demonstrated by the
emergence of several crash-reporting systems, such as Mi-
crosoft’s error reporting systems [13] and Apple’s Crash
Reporter [1]. Although these techniques represent a first
important step in addressing the limitations of purely in-
house approaches to quality assurance, they work on lim-
ited data (typically, a snapshot of the execution state) and
can at best identify correlations between a crash report and
data on other known failures.
In this paper, we present a novel technique for reproduc-
ing and investigating field failures that addresses the limita-
tions of existing approaches. Our technique works in three
phases, intuitively illustrated by the scenario in Figure 1. In
the recording phase, while users run the software, the tech-
nique intercepts and logs the interactions between applica-
tion and environment and records portions of the environ-
ment that are relevant to these interactions. If the execution
terminates with a failure, the produced execution recording
is stored for later investigation. In the minimization phase,
using free cycles on the user machines, the technique re-
plays the recorded failing executions with the goal of au-
tomatically eliminating parts of the executions that are not
relevant to the failure. In the replay and debugging phase,
developers can use the technique to replay the minimized
failing executions and investigate the cause of the failures
(e.g., within a debugger). Being able to replay and debug
real field failures can give developers unprecedented insight
into the behavior of their software after deployment and op-
portunities to improve the quality of their software in ways
that were not possible before.
To evaluate our technique, we implemented it in a proto-
type tool, called ADDA (Automated Debugging of Deployed
Applications), and used the tool to perform an empirical
study. The study was performed on PINE [19], a widely-
used e-mail client, and involved the investigation of failures
caused by two real faults in PINE. The results of the study
are promising. Our technique was able to (1) record all ex-
ecutions of PINE (and two other subjects) with a low time
and space overhead, (2) completely replay all recorded exe-
cutions, and (3) perform automated minimization of failing
executions and obtain shorter executions that manifested the
same failures as the original executions. Moreover, we were
able to replay the minimized executions within a debugger,
which shows that they could have actually been used to in-
vestigate the failures.
The contributions of this paper are:
• A novel technique for recording and later replaying exe-
cutions of deployed programs.
• An approach for minimizing failing executions and gen-
erating shorter executions that fail for the same reasons.
• A prototype tool that implements our technique.
• An empirical study that shows the feasibility and effec-
tiveness of the approach.
29th International Conference on Software Engineering (ICSE'07)
0-7695-2828-7/07 $20.00 © 2007
Dytan: A Generic Dynamic Taint Analysis Framework
James Clause, Wanchun Li, and Alessandro Orso
College of Computing
Georgia Institute of Technology
{clause|wli7|orso}@cc.gatech.edu
ABSTRACT
Dynamic taint analysis is gaining momentum. Techniques based
on dynamic tainting have been successfully used in the context of
application security, and now their use is also being explored in dif-
ferent areas, such as program understanding, software testing, and
debugging. Unfortunately, most existing approaches for dynamic
tainting are defined in an ad-hoc manner, which makes it difficult
to extend them, experiment with them, and adapt them to new con-
texts. Moreover, most existing approaches are focused on data-flow
based tainting only and do not consider tainting due to control flow,
which limits their applicability outside the security domain. To
address these limitations and foster experimentation with dynamic
tainting techniques, we defined and developed a general framework
for dynamic tainting that (1) is highly flexible and customizable, (2)
allows for performing both data-flow and control-flow based taint-
ing conservatively, and (3) does not rely on any customized run-
time system. We also present DYTAN, an implementation of our
framework that works on x86 executables, and a set of preliminary
studies that show how DYTAN can be used to implement different
tainting-based approaches with limited effort. In the studies, we
also show that DYTAN can be used on real software, by using FIRE-
FOX as one of our subjects, and illustrate how the specific char-
acteristics of the tainting approach used can affect efficiency and
accuracy of the taint analysis, which further justifies the use of our
framework to experiment with different variants of an approach.
Categories and Subject Descriptors: D.2.5 [Software Engineer-
ing]: Testing and Debugging;
General Terms: Experimentation, Security
Keywords: Dynamic tainting, information flow, general framework
1. INTRODUCTION
Dynamic taint analysis (also known as dynamic information flow
analysis) consists, intuitively, in marking and tracking certain data
in a program at run-time. This type of dynamic analysis is be-
coming increasingly popular. In the context of application secu-
rity, dynamic-tainting approaches have been successfully used to
prevent a wide range of attacks, including buffer overruns (e.g., [8,
17]), format string attacks (e.g., [17, 21]), SQL and command in-
jections (e.g., [7, 19]), and cross-site scripting (e.g., [18]). More
recently, researchers have started to investigate the use of tainting-
based approaches in domains other than security, such as program
understanding, software testing, and debugging (e.g., [11, 13]).
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
ISSTA’07, July 9–12, 2007, London, England, United Kingdom.
Copyright 2007 ACM 978-1-59593-734-6/07/0007 ...$5.00.
Unfortunately, most existing techniques and tools for dynamic
taint analysis are defined in an ad-hoc manner, to target a specific
problem or a small class of problems. It would be difficult to ex-
tend or adapt such techniques and tools so that they can be used in
other contexts. In particular, most existing approaches are focused
on data-flow based tainting only, and do not consider tainting due
to the control flow within an application, which limits their general
applicability. Also, most existing techniques support either a sin-
gle taint marking or a small, fixed number of markings, which is
problematic in applications such as debugging. Finally, almost no
existing technique handles the propagation of taint markings in a
truly conservative way, which may be appropriate for the specific
applications considered, but is problematic in general. Because de-
veloping support for dynamic taint analysis is not only time con-
suming, but also fairly complex, this lack of flexibility and gener-
ality of existing tools and techniques is especially limiting for this
type of dynamic analysis.
To address these limitations and foster experimentation with dy-
namic tainting techniques, in this paper we present a framework for
dynamic taint analysis. We designed the framework to be general
and flexible, so that it allows for implementing different kinds of
techniques based on dynamic taint analysis with little effort. Users
can leverage the framework to quickly develop prototypes for their
techniques, experiment with them, and investigate trade-offs of dif-
ferent alternatives. For a simple example, the framework could be
used to investigate the cost effectiveness of considering different
types of taint propagation for an application.
Our framework has several advantages over existing approaches.
First, it is highly flexible and customizable. It allows for easily
specifying which program data should be tainted and how, how taint
markings should be propagated at run-time, and where and how
taint markings should be checked. Second, it allows for performing
data-flow and both data-flow and control-flow based tainting. Third,
from a more practical standpoint, it works on binaries, does not
need access to source code, and does not rely on any customized
hardware or operating system, which makes it broadly applicable.
We also present DYTAN, an implementation of our framework
that works on x86 binaries, and a set of preliminary studies per-
formed using DYTAN. In the first set of studies, we report on our
experience in using DYTAN to implement two tainting-based ap-
proaches presented in the literature. Although preliminary, our ex-
perience shows that we were able to implement these approaches
completely and with little effort. The second set of studies illus-
trates how the specific characteristics of a tainting approach can
affect efficiency and accuracy of the taint analysis. In particular, we
investigate how ignoring control-flow related propagation and over-
looking some data-flow aspects can lead to unsafety. These results
further justify the usefulness of experimenting with different varia-
tions of dynamic taint analysis and assessing their tradeoffs, which
can be done with limited effort using our framework. The second
set of studies also shows the practical applicability of DYTAN, by
successfully running it on the FIREFOX web browser.
196
Effective Memory Protection Using Dynamic Tainting
James Clause, Ioannis Doudalis, Alessandro Orso, and Milos Prvulovic
College of Computing
Georgia Institute of Technology
{clause|idoud|orso|milos}@cc.gatech.edu
ABSTRACT
Programs written in languages that provide direct access to memory
through pointers often contain memory-related faults, which may
cause non-deterministic failures and even security vulnerabilities.
In this paper, we present a new technique based on dynamic taint-
ing for protecting programs from illegal memory accesses. When
memory is allocated, at runtime, our technique taints both the mem-
ory and the corresponding pointer using the same taint mark. Taint
marks are then suitably propagated while the program executes and
are checked every time a memory address m is accessed through a
pointer p; if the taint marks associated with m and p differ, the ex-
ecution is stopped and the illegal access is reported. To allow for a
low-overhead, hardware-assisted implementation of the approach,
we make several key technical and engineering decisions in the
definition of our technique. In particular, we use a configurable,
low number of reusable taint marks instead of a unique mark for
each area of memory allocated, which reduces the overhead of the
approach without limiting its flexibility and ability to target most
memory-related faults and attacks known to date. We also define
the technique at the binary level, which lets us handle the (very)
common case of applications that use third-party libraries whose
source code is unavailable. To investigate the effectiveness and
practicality of our approach, we implemented it for heap-allocated
memory and performed a preliminary empirical study on a set of
programs. Our results show that (1) our technique can identify a
large class of memory-related faults, even when using only two
unique taint marks, and (2) a hardware-assisted implementation of
the technique could achieve overhead in the single digits.
Categories and Subject Descriptors: D.2.5 [Software Engineering]: Test-
ing and Debugging; C.0 [General]: Hardware/Software Interfaces;
General Terms: Performance, Security
Keywords: Illegal memory accesses, dynamic tainting, hardware support
1. INTRODUCTION
Memory-related faults are a serious problem for languages that
allow direct memory access through pointers. An important class
of memory-related faults are what we call illegal memory accesses.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
ASE’07, November 5–9, 2007, Atlanta, Georgia, USA.
Copyright 2007 ACM 978-1-59593-882-4/07/0011 ...$5.00.
In languages such as C and C++, when memory allocation is re-
quested, a currently-free area of memory m of the specified size
is reserved. After m has been allocated, its initial address can be
assigned to a pointer p, either immediately (e.g., in the case of
heap allocated memory) or at a later time (e.g., when retrieving
and storing the address of a local variable). From that point on,
the only legal accesses to m through a pointer are accesses per-
formed through p or through other pointers derived from p. (In
Section 3, we clearly define what it means to derive a pointer from
another pointer.) All other accesses to m are Illegal Memory Ac-
cesses (IMAs), that is, accesses where a pointer is used to access
memory outside the bounds of the memory area with which it was
originally associated.
IMAs are especially relevant for several reasons. First, they are
caused by typical programming errors, such as array-out-of-bounds
accesses and NULL pointer dereferences, and are thus widespread
and common. Second, they often result in non-deterministic fail-
ures that are hard to identify and diagnose; the specific effects of an
IMA depend on several factors, such as memory layout, that may
vary between executions. Finally, many security concerns such as
viruses, worms, and rootkits use IMAs as their injection vectors.
In this paper, we present a new dynamic technique for protecting
programs against IMAs that is effective against most known types
of illegal accesses. The basic idea behind the technique is to use
dynamic tainting (or dynamic information flow) [8] to keep track
of which memory areas can be accessed through which pointers,
as follows. At runtime, our technique taints both allocated mem-
ory and pointers using taint marks. Dynamic taint propagation, to-
gether with a suitable handling of memory-allocation and deallo-
cation operations, ensures that taint marks are appropriately prop-
agated during execution. Every time the program accesses some
memory through a pointer, our technique checks whether the ac-
cess is legal by comparing the taint mark associated with the mem-
ory and the taint mark associated with the pointer used to access it.
If the marks match, the access is considered legitimate. Otherwise,
the execution is stopped and an IMA is reported.
In defining our approach, our final goal is the development of a
low-overhead, hardware-assisted tool that is practical and can be
used on deployed software. A hardware-assisted tool is a tool that
leverages the benefits of both hardware and software. Typically,
some performance critical aspects are moved to the hardware to
achieve maximum efficiency, while software is used to perform op-
erations that would be too complex to implement in hardware.
There are two main characteristics of our approach that were de-
fined to help achieve our goal of a hardware-assisted implementa-
tion. The first characteristic is that our technique only uses a small,
configurable number of reusable taint marks instead of a unique
mark for each area of memory allocated. Using a low number of
283
Penumbra: Automatically Identifying Failure-Relevant
Inputs Using Dynamic Tainting
James Clause
College of Computing
Georgia Institute of Technology
clause@cc.gatech.edu
Alessandro Orso
College of Computing
Georgia Institute of Technology
orso@cc.gatech.edu
ABSTRACT
Most existing automated debugging techniques focus on re-
ducing the amount of code to be inspected and tend to ig-
nore an important component of software failures: the in-
puts that cause the failure to manifest. In this paper, we
present a new technique based on dynamic tainting for au-
tomatically identifying subsets of a program’s inputs that
are relevant to a failure. The technique (1) marks program
inputs when they enter the application, (2) tracks them as
they propagate during execution, and (3) identifies, for an
observed failure, the subset of inputs that are potentially
relevant for debugging that failure. To investigate feasibil-
ity and usefulness of our technique, we created a prototype
tool, penumbra, and used it to evaluate our technique on
several failures in real programs. Our results are promising,
as they show that penumbra can point developers to inputs
that are actually relevant for investigating a failure and can
be more practical than existing alternative approaches.
Categories and Subject Descriptors
D.2.5 [Software Engineering]: Testing and Debugging
General Terms
Algorithms, Experimentation, Reliability
Keywords
Failure-relevant inputs, automated debugging, dynamic in-
formation flow, dynamic tainting
1. INTRODUCTION
Debugging is known to be a labor-intensive, time-consum-
ing task that can be responsible for a large portion of soft-
ware development and maintenance costs [21,23]. Common
characteristics of modern software, such as increased con-
figurability, larger code bases, and increased input sizes, in-
troduce new challenges for debugging and exacerbate exist-
ing problems. In response, researchers have proposed many
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
ISSTA’09, July 19–23, 2009, Chicago, Illinois, USA.
Copyright 2009 ACM 978-1-60558-338-9/09/07 ...$5.00.
semi- and fully-automated techniques that attempt to re-
duce the cost of debugging (e.g., [8,9,11–13,18,24,25,27]).
The majority of these techniques are code-centric in that
they focus exclusively on one aspect of debugging—trying
to identify the faulty statements responsible for a failure.
Although code-centric approaches can work well in some
cases (e.g., for isolated faults that involve a single state-
ment), they are often inadequate for more complex faults [4].
Faults of omission, for instance, where part of a specification
has not been implemented, are notoriously problematic for
debugging techniques that attempt to identify potentially
faulty statements. The usefulness of code-centric techniques
is also limited in the case of long-running programs and pro-
grams that process large amounts of information; failures in
these types of programs are typically di⌅cult to understand
without considering the data involved in such failures.
To debug failures more e ectively, it is necessary to pro-
vide developers with not only a relevant subset of state-
ments, but also a relevant subset of inputs. There are only
a few existing techniques that attempt to identify relevant
inputs [3, 17, 25], with delta debugging [25] being the most
known of these. Although delta debugging has been shown
to be an e ective technique for automatic debugging, it also
has several drawbacks that may limit its usefulness in prac-
tice. In particular, it requires (1) multiple executions of the
program being debugged, which can involve a long running
time, and (2) complex oracles and setup, which can result
in a large amount of manual e ort [2].
In this paper, we present a novel debugging technique that
addresses many of the limitations of existing approaches.
Our technique can complement code-centric debugging tech-
niques because it focuses on identifying program inputs that
are likely to be relevant for a given failure. It also overcomes
some of the drawbacks of delta debugging because it needs
a single execution to identify failure-relevant inputs and re-
quires minimal manual e ort.
Given an observable faulty behavior and a set of failure-
inducing inputs (i.e., a set of inputs that cause such behav-
ior), our technique automatically identifies failure-relevant
inputs (i.e., a subset of failure-inducing inputs that are ac-
tually relevant for investigating the faulty behavior). Our
approach is based on dynamic tainting. Intuitively, the tech-
nique works by tracking the flow of inputs along data and
control dependences at runtime. When a point of failure
is reached, the tracked information is used to identify and
present to developers the failure-relevant inputs. At this
point, developers can use the identified inputs to investigate
the failure at hand.
LEAKPOINT: Pinpointing the Causes of Memory Leaks
James Clause
College of Computing
Georgia Institute of Technology
clause@cc.gatech.edu
Alessandro Orso
College of Computing
Georgia Institute of Technology
orso@cc.gatech.edu
ABSTRACT
Most existing leak detection techniques for C and C++ applications
only detect the existence of memory leaks. They do not provide
any help for fixing the underlying memory management errors. In
this paper, we present a new technique that not only detects leaks,
but also points developers to the locations where the underlying
errors may be fixed. Our technique tracks pointers to dynamically-
allocated areas of memory and, for each memory area, records sev-
eral pieces of relevant information. This information is used to
identify the locations in an execution where memory leaks occur.
To investigate our technique’s feasibility and usefulness, we devel-
oped a prototype tool called LEAKPOINT and used it to perform
an empirical evaluation. The results of this evaluation show that
LEAKPOINT detects at least as many leaks as existing tools, reports
zero false positives, and, most importantly, can be effective at help-
ing developers fix the underlying memory management errors.
Categories and Subject Descriptors
D.2.5 [Software Engineering]: Testing and Debugging
General Terms
Performance, Reliability
Keywords
Leak detection, Dynamic tainting
1. INTRODUCTION
Memory leaks are a type of unintended memory consumption
that can adversely impact the performance and correctness of an
application. In programs written in languages such as C and C++,
memory is allocated using allocation functions, such as malloc
and new. Allocation functions reserve a currently free area of
memory m and return a pointer p that points to m’s starting ad-
dress. Typically, the program stores and then uses p, or another
This work was supported in part by NSF awards CCF-0725202
and CCF-0541080 to Georgia Tech.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
ICSE ’10, May 2-8 2010, Cape Town, South Africa
Copyright 2010 ACM 978-1-60558-719-6/10/05 ...$10.00.
pointer derived from p, to interact with m. When m is no longer
needed, the program should pass p to a deallocation function (e.g.,
free or delete) to deallocate m. A leak occurs if, due to a
memory management error, m is not deallocated at the appropri-
ate time. There are two types of memory leaks: lost memory and
forgotten memory. Lost memory refers to the situation where m be-
comes unreachable (i.e., the program overwrites or loses p and all
pointers derived from p) without first being deallocated. Forgotten
memory refers to the situation where m remains reachable but is
not deallocated or accessed in the rest of the execution.
Memory leaks are relevant for several reasons. First, they are dif-
ficult to detect. Unlike many other types of failures, memory leaks
do not immediately produce an easily visible symptom (e.g., a crash
or the output of a wrong value); typically, leaks remain unobserved
until they consume a large portion of the memory available to a sys-
tem. Second, leaks have the potential to impact not only the appli-
cation that leaks memory, but also every other application running
on the system; because the overall amount of memory is limited,
as the memory usage of a leaking program increases, less memory
is available to other running applications. Consequently, the per-
formance and correctness of every running application can be im-
pacted by a program that leaks memory. Third, leaks are common,
even in mature applications. For example, in the first half of 2009,
over 100 leaks in the Firefox web-browser were reported [18].
Because of the serious consequences and common occurrence of
memory leaks, researchers have created many static and dynamic
techniques for detecting them (e.g., [1,2,4,7–14,16,17,20–23,25,
27,28]). The adoption of static techniques has been limited by sev-
eral factors, including the lack of scalable, precise heap modeling.
Dynamic techniques are therefore more widely used in practice. In
general, dynamic techniques provide one main piece of informa-
tion: the location in an execution where a leaked area of memory is
allocated. This location is supposed to serve as a starting point for
investigating the leak. However, in many situations, this informa-
tion does not provide any insight on where or how to fix the mem-
ory management error that causes the leak: the allocation location
and the location of the memory management error are typically in
completely different parts of the application’s code.
To address this limitation of existing approaches, we propose
a new memory leak detection technique. Our technique provides
the same information as existing techniques but also identifies the
locations in an execution where leaks occur. In the case of lost
memory, the location is defined as the point in an execution where
the last pointer to an unallocated memory area is lost or overwritten.
In the case of forgotten memory, the location is defined as the last
point in an execution where a pointer to a leaked area of memory
was used (e.g., when it is dereferenced to read or write memory,
passed as a function argument, returned from a function, or used as
Camouflage: Automated Sanitization of Field Data
James Clause
College of Computing
Georgia Institute of Technology
clause@cc.gatech.edu
Alessandro Orso
College of Computing
Georgia Institute of Technology
orso@cc.gatech.edu
ABSTRACT
Privacy and security concerns have adversely a ected the
usefulness of many types of techniques that leverage infor-
mation gathered from deployed applications. To address this
issue, we present a new approach for automatically sanitiz-
ing failure-inducing inputs. Given an input I that causes
a failure f, our technique can generate a sanitized input I
that is di erent from I but still causes f. I can then be sent
to the developers to help them debug f, without revealing
the possibly sensitive information contained in I. We im-
plemented our approach in a prototype tool, camouflage,
and performed an empirical evaluation. In the evaluation,
we applied camouflage to a large set of failure-inducing
inputs for several real applications. The results of the eval-
uation are promising; they show that camouflage is both
practical and e ective at generating sanitized inputs. In par-
ticular, for the inputs that we considered, I and I shared
no sensitive information.
1. INTRODUCTION
Investigating techniques that capture data from deployed
applications to support in-house software engineering tasks
is an increasingly active and successful area of research (e.g.,
[1,3–5,13,14,17,21,22,26,27,29]). However, privacy and se-
curity concerns have prevented widespread adoption of many
of these techniques and, because they rely on user partici-
pation, have ultimately limited their usefulness. Many of
the earlier proposed techniques attempt to sidestep these
concerns by collecting only limited amounts of information
(e.g., stack traces and register dumps [1, 3, 5] or sampled
branch profiles [26,27]) and providing a privacy policy that
specifies how the information will be used (e.g., [2,8]). Be-
cause the types of information collected by these techniques
are unlikely to be sensitive, users are more willing to trust
developers. Moreover, because only a small amount of infor-
mation is collected, it is feasible for users to manually inspect
and sanitize such information before it is sent to developers.
Unfortunately, recent research has shown that the e ec-
tiveness of these techniques increases when they can lever-
age large amounts of detailed information (e.g., complete
execution recordings [4, 14] or path profiles [13, 24]). Since
more detailed information is bound to contain sensitive data,
users will most likely be unwilling to let developers collect
such information. In addition, collecting large amounts of
information would make it infeasible for users to sanitize
the collected information by hand. To address this prob-
lem, some of these techniques suggest using an input mini-
mization approach (e.g., [6, 7, 35]) to reduce the number of
failure-inducing inputs and, hopefully, eliminate some sensi-
tive information. Input-minimization techniques, however,
were not designed to specifically reduce sensitive inputs, so
they can only eliminate sensitive data by chance. In or-
der for techniques that leverage captured field information
to become widely adopted and achieve their full potential,
new approaches for addressing privacy and security concerns
must be developed.
In this paper, we present a novel technique that addresses
privacy and security concerns by sanitizing information cap-
tured from deployed applications. Our technique is designed
to be used in conjunction with an execution capture/replay
technique (e.g., [4, 14]). Given an execution recording that
contains a captured failure-inducing input I = i1, i2, . . . in⇥
and terminates with a failure f, our technique replays the
execution recording and leverages a specialized version of
symbolic-execution to automatically produce I , a sanitized
version of I, such that I (1) still causes f and (2) reveals as
little information about I as possible. A modified execution
recording where I replaces I can then be constructed and
sent to the developers, who can use it to debug f.
It is, in general, impossible to construct I such that it
does not reveal any information about I while still caus-
ing the same failure f. Typically, the execution of f would
depend on the fact that some elements of I have specific
values (e.g., i1 must be 0 for the failing path to be taken).
However, this fact does not prevent the technique from be-
ing useful in practice. In our evaluation, we found that the
information revealed by the sanitized inputs was not sensi-
tive and tended to be structural in nature (e.g., a specific
portion of the input must be surrounded by double quotes).
Conversely, the parts of the inputs that were more likely to
be sensitive (e.g., values contained inside the double quotes)
were not revealed (see Section 4).
To evaluate the e ectiveness of our technique, we imple-
mented it in a prototype tool, called camouflage, and car-
ried out an empirical evaluation of 170 failure-inducing in-
1
CC 05 ICSE 05 ICSE 07 ISSTA 07 ASE 07 ISSTA 09 ICSE 10 Tech Rept
Dynamic tainting
based analyses
RESEARCH OVERVIEW
Efficient instrumentation
Jazz: A Tool for Demand-Driven Structural
Testing
Jonathan Misurda1
, Jim Clause1
, Juliya Reed1
, Bruce R. Childers1
, and Mary
Lou So a2
1
University of Pittsburgh, Pittsburgh PA 15260, USA,
{jmisurda,clausej,juliya,childers}@cs.pitt.edu
2
University of Virginia, Charlottesville VA 22904, USA,
soffa@cs.virginia.edu
Abstract. Software testing to produce reliable and robust software has
become vitally important. Testing is a process by which quality can be
assured through the collection of information about software. While test-
ing can improve software quality, current tools typically are inflexible
and have high overheads, making it a challenge to test large projects.
We describe a new scalable and flexible tool, called Jazz, that uses a
demand-driven structural testing approach. Jazz has a low overhead of
only 17.6% for branch testing.
1 Introduction
In the last several years, the importance of producing high quality and robust
software has become paramount. Testing is an important process to support
quality assurance by gathering information about the software being developed
or modified. It is, in general, extremely labor and resource intensive, accounting
for 50-60% of the total cost of software development [1]. The increased emphasis
on software quality and robustness mandates improved testing methodologies.
To test software, a number of techniques can be applied. One class of tech-
niques is structural testing, which checks that a given coverage criterion is sat-
isfied. For example, branch testing checks that a certain percentage of branches
are executed. Other structural tests include def-use testing in which pairs of
variable definitions and uses are checked for coverage and node testing in which
nodes in a program’s control flow graph are checked.
Unfortunately, structural testing is often hindered by the lack of scalable
and flexible tools. Current tools are not scalable in terms of both time and
memory, limiting the number and scope of the tests that can be applied to large
programs. These tools often modify the software binary to insert instrumentation
for testing. In this case, the tested version of the application is not the same
version that is shipped to customers and errors may remain. Testing tools are
usually inflexible and only implement certain types of testing. For example, many
tools implement branch testing, but do not implement node or def-use testing.
In this paper, we describe a new tool for structural testing, called Jazz, that
addresses these problems. Jazz uses a novel demand-driven technique to apply
ABSTRACT
Producing reliable and robust software has become one
of the most important software development concerns in
recent years. Testing is a process by which software
quality can be assured through the collection of infor-
mation. While testing can improve software reliability,
current tools typically are inflexible and have high over-
heads, making it challenging to test large software
projects. In this paper, we describe a new scalable and
flexible framework for testing programs with a novel
demand-driven approach based on execution paths to
implement test coverage. This technique uses dynamic
instrumentation on the binary code that can be inserted
and removed on-the-fly to keep performance and mem-
ory overheads low. We describe and evaluate implemen-
tations of the framework for branch, node and def-use
testing of Java programs. Experimental results for
branch testing show that our approach has, on average, a
1.6 speed up over static instrumentation and also uses
less memory.
Categories and Subject Descriptors
D.2.5. [Software Engineering]: Testing and Debug-
ging—Testing tools; D.3.3. [Programming Lan-
guages]: Language Constructs and Features—Program
instrumentation, run-time environments
General Terms
Experimentation, Measurement, Verification
Keywords
Testing, Code Coverage, Structural Testing, Demand-
Driven Instrumentation, Java Programming Language
1. INTRODUCTION
In the last several years, the importance of produc-
ing high quality and robust software has become para-
mount [15]. Testing is an important process to support
quality assurance by gathering information about the
behavior of the software being developed or modified. It
is, in general, extremely labor and resource intensive,
accounting for 50-60% of the total cost of software
development [17]. Given the importance of testing, it is
imperative that there are appropriate testing tools and
frameworks. In order to adequately test software, a
number of different testing techniques must be per-
formed. One class of testing techniques used extensively
is structural testing in which properties of the software
code are used to ensure a certain code coverage.Struc-
tural testing techniques include branch testing, node
testing, path testing, and def-use testing [6,7,8,17,19].
Typically, a testing tool targets one type of struc-
tural test, and the software unit is the program, file or
particular methods. In order to apply various structural
testing techniques, different tools must be used. If a tool
for a particular type of structural testing is not available,
the tester would need to either implement it or not use
that testing technique. The tester would also be con-
strained by the region of code to be tested, as deter-
mined by the tool implementor. For example, it may not
be possible for the tester to focus on a particular region
of code, such as a series of loops, complicated condi-
tionals, or particular variables if def-use testing is
desired. The user may want to have higher coverage on
frequently executed regions of code. Users may want to
define their own way of testing. For example, all
branches should be covered 10 times rather than once in
all loops.
In structural testing, instrumentation is placed at
certain code points (probes). Whenever such a program
point is reached, code that performs the function for the
test (payload) is executed. The probes in def-use testing
are dictated by the definitions and uses of variables and
the payload is to mark that a definition or use in a def-
use pair has been covered. Thus for each type of struc-
tural testing, there is a testing “plan”. A test plan is a
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies
are not made or distributed for profit or commercial advantage and
that copies bear this notice and the full citation on the first page. To
copy otherwise, or republish, to post on servers or to redistribute to
lists, requires prior specific permission and/or a fee.
ICSE'05, May 15-21, 2005, St. Louis, Missouri, USA.
Copyright 2005 ACM 1-58113-963-2/05/0005...$5.00.
Demand-Driven Structural Testing with Dynamic
Instrumentation
Jonathan Misurda†, James A. Clause†, Juliya L. Reed†, Bruce R. Childers†, and
Mary Lou Soffa‡
†Department of Computer Science
University of Pittsburgh
Pittsburgh, Pennsylvania 15260
{jmisurda, clausej, juliya, childers}@cs.pitt.edu
‡Department of Computer Science
University of Virginia
Charlottesville, Virginia 22904
soffa@cs.virginia.edu
156
A Technique for Enabling and Supporting Debugging of Field Failures
James Clause and Alessandro Orso
College of Computing
Georgia Institute of Technology
{clause, orso}@cc.gatech.edu
Abstract
It is difficult to fully assess the quality of software in-
house, outside the actual time and context in which it will
execute after deployment. As a result, it is common for
software to manifest field failures, failures that occur on
user machines due to untested behavior. Field failures are
typically difficult to recreate and investigate on developer
platforms, and existing techniques based on crash report-
ing provide only limited support for this task. In this pa-
per, we present a technique for recording, reproducing, and
minimizing failing executions that enables and supports in-
house debugging of field failures. We also present a tool
that implements our technique and an empirical study that
evaluates the technique on a widely used e-mail client.
1. Introduction
Quality-assurance activities, such as software testing and
analysis, are notoriously difficult, expensive, and time-
consuming. As a result, software products are often re-
leased with faults or missing functionality. In fact, real-
world examples of field failures experienced by users be-
cause of untested behaviors (e.g., due to unforeseen us-
ages), are countless. When field failures occur, it is im-
portant for developers to be able to recreate and investigate
them in-house. This pressing need is demonstrated by the
emergence of several crash-reporting systems, such as Mi-
crosoft’s error reporting systems [13] and Apple’s Crash
Reporter [1]. Although these techniques represent a first
important step in addressing the limitations of purely in-
house approaches to quality assurance, they work on lim-
ited data (typically, a snapshot of the execution state) and
can at best identify correlations between a crash report and
data on other known failures.
In this paper, we present a novel technique for reproduc-
ing and investigating field failures that addresses the limita-
tions of existing approaches. Our technique works in three
phases, intuitively illustrated by the scenario in Figure 1. In
the recording phase, while users run the software, the tech-
nique intercepts and logs the interactions between applica-
tion and environment and records portions of the environ-
ment that are relevant to these interactions. If the execution
terminates with a failure, the produced execution recording
is stored for later investigation. In the minimization phase,
using free cycles on the user machines, the technique re-
plays the recorded failing executions with the goal of au-
tomatically eliminating parts of the executions that are not
relevant to the failure. In the replay and debugging phase,
developers can use the technique to replay the minimized
failing executions and investigate the cause of the failures
(e.g., within a debugger). Being able to replay and debug
real field failures can give developers unprecedented insight
into the behavior of their software after deployment and op-
portunities to improve the quality of their software in ways
that were not possible before.
To evaluate our technique, we implemented it in a proto-
type tool, called ADDA (Automated Debugging of Deployed
Applications), and used the tool to perform an empirical
study. The study was performed on PINE [19], a widely-
used e-mail client, and involved the investigation of failures
caused by two real faults in PINE. The results of the study
are promising. Our technique was able to (1) record all ex-
ecutions of PINE (and two other subjects) with a low time
and space overhead, (2) completely replay all recorded exe-
cutions, and (3) perform automated minimization of failing
executions and obtain shorter executions that manifested the
same failures as the original executions. Moreover, we were
able to replay the minimized executions within a debugger,
which shows that they could have actually been used to in-
vestigate the failures.
The contributions of this paper are:
• A novel technique for recording and later replaying exe-
cutions of deployed programs.
• An approach for minimizing failing executions and gen-
erating shorter executions that fail for the same reasons.
• A prototype tool that implements our technique.
• An empirical study that shows the feasibility and effec-
tiveness of the approach.
29th International Conference on Software Engineering (ICSE'07)
0-7695-2828-7/07 $20.00 © 2007
Dytan: A Generic Dynamic Taint Analysis Framework
James Clause, Wanchun Li, and Alessandro Orso
College of Computing
Georgia Institute of Technology
{clause|wli7|orso}@cc.gatech.edu
ABSTRACT
Dynamic taint analysis is gaining momentum. Techniques based
on dynamic tainting have been successfully used in the context of
application security, and now their use is also being explored in dif-
ferent areas, such as program understanding, software testing, and
debugging. Unfortunately, most existing approaches for dynamic
tainting are defined in an ad-hoc manner, which makes it difficult
to extend them, experiment with them, and adapt them to new con-
texts. Moreover, most existing approaches are focused on data-flow
based tainting only and do not consider tainting due to control flow,
which limits their applicability outside the security domain. To
address these limitations and foster experimentation with dynamic
tainting techniques, we defined and developed a general framework
for dynamic tainting that (1) is highly flexible and customizable, (2)
allows for performing both data-flow and control-flow based taint-
ing conservatively, and (3) does not rely on any customized run-
time system. We also present DYTAN, an implementation of our
framework that works on x86 executables, and a set of preliminary
studies that show how DYTAN can be used to implement different
tainting-based approaches with limited effort. In the studies, we
also show that DYTAN can be used on real software, by using FIRE-
FOX as one of our subjects, and illustrate how the specific char-
acteristics of the tainting approach used can affect efficiency and
accuracy of the taint analysis, which further justifies the use of our
framework to experiment with different variants of an approach.
Categories and Subject Descriptors: D.2.5 [Software Engineer-
ing]: Testing and Debugging;
General Terms: Experimentation, Security
Keywords: Dynamic tainting, information flow, general framework
1. INTRODUCTION
Dynamic taint analysis (also known as dynamic information flow
analysis) consists, intuitively, in marking and tracking certain data
in a program at run-time. This type of dynamic analysis is be-
coming increasingly popular. In the context of application secu-
rity, dynamic-tainting approaches have been successfully used to
prevent a wide range of attacks, including buffer overruns (e.g., [8,
17]), format string attacks (e.g., [17, 21]), SQL and command in-
jections (e.g., [7, 19]), and cross-site scripting (e.g., [18]). More
recently, researchers have started to investigate the use of tainting-
based approaches in domains other than security, such as program
understanding, software testing, and debugging (e.g., [11, 13]).
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
ISSTA’07, July 9–12, 2007, London, England, United Kingdom.
Copyright 2007 ACM 978-1-59593-734-6/07/0007 ...$5.00.
Unfortunately, most existing techniques and tools for dynamic
taint analysis are defined in an ad-hoc manner, to target a specific
problem or a small class of problems. It would be difficult to ex-
tend or adapt such techniques and tools so that they can be used in
other contexts. In particular, most existing approaches are focused
on data-flow based tainting only, and do not consider tainting due
to the control flow within an application, which limits their general
applicability. Also, most existing techniques support either a sin-
gle taint marking or a small, fixed number of markings, which is
problematic in applications such as debugging. Finally, almost no
existing technique handles the propagation of taint markings in a
truly conservative way, which may be appropriate for the specific
applications considered, but is problematic in general. Because de-
veloping support for dynamic taint analysis is not only time con-
suming, but also fairly complex, this lack of flexibility and gener-
ality of existing tools and techniques is especially limiting for this
type of dynamic analysis.
To address these limitations and foster experimentation with dy-
namic tainting techniques, in this paper we present a framework for
dynamic taint analysis. We designed the framework to be general
and flexible, so that it allows for implementing different kinds of
techniques based on dynamic taint analysis with little effort. Users
can leverage the framework to quickly develop prototypes for their
techniques, experiment with them, and investigate trade-offs of dif-
ferent alternatives. For a simple example, the framework could be
used to investigate the cost effectiveness of considering different
types of taint propagation for an application.
Our framework has several advantages over existing approaches.
First, it is highly flexible and customizable. It allows for easily
specifying which program data should be tainted and how, how taint
markings should be propagated at run-time, and where and how
taint markings should be checked. Second, it allows for performing
data-flow and both data-flow and control-flow based tainting. Third,
from a more practical standpoint, it works on binaries, does not
need access to source code, and does not rely on any customized
hardware or operating system, which makes it broadly applicable.
We also present DYTAN, an implementation of our framework
that works on x86 binaries, and a set of preliminary studies per-
formed using DYTAN. In the first set of studies, we report on our
experience in using DYTAN to implement two tainting-based ap-
proaches presented in the literature. Although preliminary, our ex-
perience shows that we were able to implement these approaches
completely and with little effort. The second set of studies illus-
trates how the specific characteristics of a tainting approach can
affect efficiency and accuracy of the taint analysis. In particular, we
investigate how ignoring control-flow related propagation and over-
looking some data-flow aspects can lead to unsafety. These results
further justify the usefulness of experimenting with different varia-
tions of dynamic taint analysis and assessing their tradeoffs, which
can be done with limited effort using our framework. The second
set of studies also shows the practical applicability of DYTAN, by
successfully running it on the FIREFOX web browser.
196
Effective Memory Protection Using Dynamic Tainting
James Clause, Ioannis Doudalis, Alessandro Orso, and Milos Prvulovic
College of Computing
Georgia Institute of Technology
{clause|idoud|orso|milos}@cc.gatech.edu
ABSTRACT
Programs written in languages that provide direct access to memory
through pointers often contain memory-related faults, which may
cause non-deterministic failures and even security vulnerabilities.
In this paper, we present a new technique based on dynamic taint-
ing for protecting programs from illegal memory accesses. When
memory is allocated, at runtime, our technique taints both the mem-
ory and the corresponding pointer using the same taint mark. Taint
marks are then suitably propagated while the program executes and
are checked every time a memory address m is accessed through a
pointer p; if the taint marks associated with m and p differ, the ex-
ecution is stopped and the illegal access is reported. To allow for a
low-overhead, hardware-assisted implementation of the approach,
we make several key technical and engineering decisions in the
definition of our technique. In particular, we use a configurable,
low number of reusable taint marks instead of a unique mark for
each area of memory allocated, which reduces the overhead of the
approach without limiting its flexibility and ability to target most
memory-related faults and attacks known to date. We also define
the technique at the binary level, which lets us handle the (very)
common case of applications that use third-party libraries whose
source code is unavailable. To investigate the effectiveness and
practicality of our approach, we implemented it for heap-allocated
memory and performed a preliminary empirical study on a set of
programs. Our results show that (1) our technique can identify a
large class of memory-related faults, even when using only two
unique taint marks, and (2) a hardware-assisted implementation of
the technique could achieve overhead in the single digits.
Categories and Subject Descriptors: D.2.5 [Software Engineering]: Test-
ing and Debugging; C.0 [General]: Hardware/Software Interfaces;
General Terms: Performance, Security
Keywords: Illegal memory accesses, dynamic tainting, hardware support
1. INTRODUCTION
Memory-related faults are a serious problem for languages that
allow direct memory access through pointers. An important class
of memory-related faults are what we call illegal memory accesses.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
ASE’07, November 5–9, 2007, Atlanta, Georgia, USA.
Copyright 2007 ACM 978-1-59593-882-4/07/0011 ...$5.00.
In languages such as C and C++, when memory allocation is re-
quested, a currently-free area of memory m of the specified size
is reserved. After m has been allocated, its initial address can be
assigned to a pointer p, either immediately (e.g., in the case of
heap allocated memory) or at a later time (e.g., when retrieving
and storing the address of a local variable). From that point on,
the only legal accesses to m through a pointer are accesses per-
formed through p or through other pointers derived from p. (In
Section 3, we clearly define what it means to derive a pointer from
another pointer.) All other accesses to m are Illegal Memory Ac-
cesses (IMAs), that is, accesses where a pointer is used to access
memory outside the bounds of the memory area with which it was
originally associated.
IMAs are especially relevant for several reasons. First, they are
caused by typical programming errors, such as array-out-of-bounds
accesses and NULL pointer dereferences, and are thus widespread
and common. Second, they often result in non-deterministic fail-
ures that are hard to identify and diagnose; the specific effects of an
IMA depend on several factors, such as memory layout, that may
vary between executions. Finally, many security concerns such as
viruses, worms, and rootkits use IMAs as their injection vectors.
In this paper, we present a new dynamic technique for protecting
programs against IMAs that is effective against most known types
of illegal accesses. The basic idea behind the technique is to use
dynamic tainting (or dynamic information flow) [8] to keep track
of which memory areas can be accessed through which pointers,
as follows. At runtime, our technique taints both allocated mem-
ory and pointers using taint marks. Dynamic taint propagation, to-
gether with a suitable handling of memory-allocation and deallo-
cation operations, ensures that taint marks are appropriately prop-
agated during execution. Every time the program accesses some
memory through a pointer, our technique checks whether the ac-
cess is legal by comparing the taint mark associated with the mem-
ory and the taint mark associated with the pointer used to access it.
If the marks match, the access is considered legitimate. Otherwise,
the execution is stopped and an IMA is reported.
In defining our approach, our final goal is the development of a
low-overhead, hardware-assisted tool that is practical and can be
used on deployed software. A hardware-assisted tool is a tool that
leverages the benefits of both hardware and software. Typically,
some performance critical aspects are moved to the hardware to
achieve maximum efficiency, while software is used to perform op-
erations that would be too complex to implement in hardware.
There are two main characteristics of our approach that were de-
fined to help achieve our goal of a hardware-assisted implementa-
tion. The first characteristic is that our technique only uses a small,
configurable number of reusable taint marks instead of a unique
mark for each area of memory allocated. Using a low number of
283
Penumbra: Automatically Identifying Failure-Relevant
Inputs Using Dynamic Tainting
James Clause
College of Computing
Georgia Institute of Technology
clause@cc.gatech.edu
Alessandro Orso
College of Computing
Georgia Institute of Technology
orso@cc.gatech.edu
ABSTRACT
Most existing automated debugging techniques focus on re-
ducing the amount of code to be inspected and tend to ig-
nore an important component of software failures: the in-
puts that cause the failure to manifest. In this paper, we
present a new technique based on dynamic tainting for au-
tomatically identifying subsets of a program’s inputs that
are relevant to a failure. The technique (1) marks program
inputs when they enter the application, (2) tracks them as
they propagate during execution, and (3) identifies, for an
observed failure, the subset of inputs that are potentially
relevant for debugging that failure. To investigate feasibil-
ity and usefulness of our technique, we created a prototype
tool, penumbra, and used it to evaluate our technique on
several failures in real programs. Our results are promising,
as they show that penumbra can point developers to inputs
that are actually relevant for investigating a failure and can
be more practical than existing alternative approaches.
Categories and Subject Descriptors
D.2.5 [Software Engineering]: Testing and Debugging
General Terms
Algorithms, Experimentation, Reliability
Keywords
Failure-relevant inputs, automated debugging, dynamic in-
formation flow, dynamic tainting
1. INTRODUCTION
Debugging is known to be a labor-intensive, time-consum-
ing task that can be responsible for a large portion of soft-
ware development and maintenance costs [21,23]. Common
characteristics of modern software, such as increased con-
figurability, larger code bases, and increased input sizes, in-
troduce new challenges for debugging and exacerbate exist-
ing problems. In response, researchers have proposed many
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
ISSTA’09, July 19–23, 2009, Chicago, Illinois, USA.
Copyright 2009 ACM 978-1-60558-338-9/09/07 ...$5.00.
semi- and fully-automated techniques that attempt to re-
duce the cost of debugging (e.g., [8,9,11–13,18,24,25,27]).
The majority of these techniques are code-centric in that
they focus exclusively on one aspect of debugging—trying
to identify the faulty statements responsible for a failure.
Although code-centric approaches can work well in some
cases (e.g., for isolated faults that involve a single state-
ment), they are often inadequate for more complex faults [4].
Faults of omission, for instance, where part of a specification
has not been implemented, are notoriously problematic for
debugging techniques that attempt to identify potentially
faulty statements. The usefulness of code-centric techniques
is also limited in the case of long-running programs and pro-
grams that process large amounts of information; failures in
these types of programs are typically di⌅cult to understand
without considering the data involved in such failures.
To debug failures more e ectively, it is necessary to pro-
vide developers with not only a relevant subset of state-
ments, but also a relevant subset of inputs. There are only
a few existing techniques that attempt to identify relevant
inputs [3, 17, 25], with delta debugging [25] being the most
known of these. Although delta debugging has been shown
to be an e ective technique for automatic debugging, it also
has several drawbacks that may limit its usefulness in prac-
tice. In particular, it requires (1) multiple executions of the
program being debugged, which can involve a long running
time, and (2) complex oracles and setup, which can result
in a large amount of manual e ort [2].
In this paper, we present a novel debugging technique that
addresses many of the limitations of existing approaches.
Our technique can complement code-centric debugging tech-
niques because it focuses on identifying program inputs that
are likely to be relevant for a given failure. It also overcomes
some of the drawbacks of delta debugging because it needs
a single execution to identify failure-relevant inputs and re-
quires minimal manual e ort.
Given an observable faulty behavior and a set of failure-
inducing inputs (i.e., a set of inputs that cause such behav-
ior), our technique automatically identifies failure-relevant
inputs (i.e., a subset of failure-inducing inputs that are ac-
tually relevant for investigating the faulty behavior). Our
approach is based on dynamic tainting. Intuitively, the tech-
nique works by tracking the flow of inputs along data and
control dependences at runtime. When a point of failure
is reached, the tracked information is used to identify and
present to developers the failure-relevant inputs. At this
point, developers can use the identified inputs to investigate
the failure at hand.
LEAKPOINT: Pinpointing the Causes of Memory Leaks
James Clause
College of Computing
Georgia Institute of Technology
clause@cc.gatech.edu
Alessandro Orso
College of Computing
Georgia Institute of Technology
orso@cc.gatech.edu
ABSTRACT
Most existing leak detection techniques for C and C++ applications
only detect the existence of memory leaks. They do not provide
any help for fixing the underlying memory management errors. In
this paper, we present a new technique that not only detects leaks,
but also points developers to the locations where the underlying
errors may be fixed. Our technique tracks pointers to dynamically-
allocated areas of memory and, for each memory area, records sev-
eral pieces of relevant information. This information is used to
identify the locations in an execution where memory leaks occur.
To investigate our technique’s feasibility and usefulness, we devel-
oped a prototype tool called LEAKPOINT and used it to perform
an empirical evaluation. The results of this evaluation show that
LEAKPOINT detects at least as many leaks as existing tools, reports
zero false positives, and, most importantly, can be effective at help-
ing developers fix the underlying memory management errors.
Categories and Subject Descriptors
D.2.5 [Software Engineering]: Testing and Debugging
General Terms
Performance, Reliability
Keywords
Leak detection, Dynamic tainting
1. INTRODUCTION
Memory leaks are a type of unintended memory consumption
that can adversely impact the performance and correctness of an
application. In programs written in languages such as C and C++,
memory is allocated using allocation functions, such as malloc
and new. Allocation functions reserve a currently free area of
memory m and return a pointer p that points to m’s starting ad-
dress. Typically, the program stores and then uses p, or another
This work was supported in part by NSF awards CCF-0725202
and CCF-0541080 to Georgia Tech.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
ICSE ’10, May 2-8 2010, Cape Town, South Africa
Copyright 2010 ACM 978-1-60558-719-6/10/05 ...$10.00.
pointer derived from p, to interact with m. When m is no longer
needed, the program should pass p to a deallocation function (e.g.,
free or delete) to deallocate m. A leak occurs if, due to a
memory management error, m is not deallocated at the appropri-
ate time. There are two types of memory leaks: lost memory and
forgotten memory. Lost memory refers to the situation where m be-
comes unreachable (i.e., the program overwrites or loses p and all
pointers derived from p) without first being deallocated. Forgotten
memory refers to the situation where m remains reachable but is
not deallocated or accessed in the rest of the execution.
Memory leaks are relevant for several reasons. First, they are dif-
ficult to detect. Unlike many other types of failures, memory leaks
do not immediately produce an easily visible symptom (e.g., a crash
or the output of a wrong value); typically, leaks remain unobserved
until they consume a large portion of the memory available to a sys-
tem. Second, leaks have the potential to impact not only the appli-
cation that leaks memory, but also every other application running
on the system; because the overall amount of memory is limited,
as the memory usage of a leaking program increases, less memory
is available to other running applications. Consequently, the per-
formance and correctness of every running application can be im-
pacted by a program that leaks memory. Third, leaks are common,
even in mature applications. For example, in the first half of 2009,
over 100 leaks in the Firefox web-browser were reported [18].
Because of the serious consequences and common occurrence of
memory leaks, researchers have created many static and dynamic
techniques for detecting them (e.g., [1,2,4,7–14,16,17,20–23,25,
27,28]). The adoption of static techniques has been limited by sev-
eral factors, including the lack of scalable, precise heap modeling.
Dynamic techniques are therefore more widely used in practice. In
general, dynamic techniques provide one main piece of informa-
tion: the location in an execution where a leaked area of memory is
allocated. This location is supposed to serve as a starting point for
investigating the leak. However, in many situations, this informa-
tion does not provide any insight on where or how to fix the mem-
ory management error that causes the leak: the allocation location
and the location of the memory management error are typically in
completely different parts of the application’s code.
To address this limitation of existing approaches, we propose
a new memory leak detection technique. Our technique provides
the same information as existing techniques but also identifies the
locations in an execution where leaks occur. In the case of lost
memory, the location is defined as the point in an execution where
the last pointer to an unallocated memory area is lost or overwritten.
In the case of forgotten memory, the location is defined as the last
point in an execution where a pointer to a leaked area of memory
was used (e.g., when it is dereferenced to read or write memory,
passed as a function argument, returned from a function, or used as
Camouflage: Automated Sanitization of Field Data
James Clause
College of Computing
Georgia Institute of Technology
clause@cc.gatech.edu
Alessandro Orso
College of Computing
Georgia Institute of Technology
orso@cc.gatech.edu
ABSTRACT
Privacy and security concerns have adversely a ected the
usefulness of many types of techniques that leverage infor-
mation gathered from deployed applications. To address this
issue, we present a new approach for automatically sanitiz-
ing failure-inducing inputs. Given an input I that causes
a failure f, our technique can generate a sanitized input I
that is di erent from I but still causes f. I can then be sent
to the developers to help them debug f, without revealing
the possibly sensitive information contained in I. We im-
plemented our approach in a prototype tool, camouflage,
and performed an empirical evaluation. In the evaluation,
we applied camouflage to a large set of failure-inducing
inputs for several real applications. The results of the eval-
uation are promising; they show that camouflage is both
practical and e ective at generating sanitized inputs. In par-
ticular, for the inputs that we considered, I and I shared
no sensitive information.
1. INTRODUCTION
Investigating techniques that capture data from deployed
applications to support in-house software engineering tasks
is an increasingly active and successful area of research (e.g.,
[1,3–5,13,14,17,21,22,26,27,29]). However, privacy and se-
curity concerns have prevented widespread adoption of many
of these techniques and, because they rely on user partici-
pation, have ultimately limited their usefulness. Many of
the earlier proposed techniques attempt to sidestep these
concerns by collecting only limited amounts of information
(e.g., stack traces and register dumps [1, 3, 5] or sampled
branch profiles [26,27]) and providing a privacy policy that
specifies how the information will be used (e.g., [2,8]). Be-
cause the types of information collected by these techniques
are unlikely to be sensitive, users are more willing to trust
developers. Moreover, because only a small amount of infor-
mation is collected, it is feasible for users to manually inspect
and sanitize such information before it is sent to developers.
Unfortunately, recent research has shown that the e ec-
tiveness of these techniques increases when they can lever-
age large amounts of detailed information (e.g., complete
execution recordings [4, 14] or path profiles [13, 24]). Since
more detailed information is bound to contain sensitive data,
users will most likely be unwilling to let developers collect
such information. In addition, collecting large amounts of
information would make it infeasible for users to sanitize
the collected information by hand. To address this prob-
lem, some of these techniques suggest using an input mini-
mization approach (e.g., [6, 7, 35]) to reduce the number of
failure-inducing inputs and, hopefully, eliminate some sensi-
tive information. Input-minimization techniques, however,
were not designed to specifically reduce sensitive inputs, so
they can only eliminate sensitive data by chance. In or-
der for techniques that leverage captured field information
to become widely adopted and achieve their full potential,
new approaches for addressing privacy and security concerns
must be developed.
In this paper, we present a novel technique that addresses
privacy and security concerns by sanitizing information cap-
tured from deployed applications. Our technique is designed
to be used in conjunction with an execution capture/replay
technique (e.g., [4, 14]). Given an execution recording that
contains a captured failure-inducing input I = i1, i2, . . . in⇥
and terminates with a failure f, our technique replays the
execution recording and leverages a specialized version of
symbolic-execution to automatically produce I , a sanitized
version of I, such that I (1) still causes f and (2) reveals as
little information about I as possible. A modified execution
recording where I replaces I can then be constructed and
sent to the developers, who can use it to debug f.
It is, in general, impossible to construct I such that it
does not reveal any information about I while still caus-
ing the same failure f. Typically, the execution of f would
depend on the fact that some elements of I have specific
values (e.g., i1 must be 0 for the failing path to be taken).
However, this fact does not prevent the technique from be-
ing useful in practice. In our evaluation, we found that the
information revealed by the sanitized inputs was not sensi-
tive and tended to be structural in nature (e.g., a specific
portion of the input must be surrounded by double quotes).
Conversely, the parts of the inputs that were more likely to
be sensitive (e.g., values contained inside the double quotes)
were not revealed (see Section 4).
To evaluate the e ectiveness of our technique, we imple-
mented it in a prototype tool, called camouflage, and car-
ried out an empirical evaluation of 170 failure-inducing in-
1
CC 05 ICSE 05 ICSE 07 ISSTA 07 ASE 07 ISSTA 09 ICSE 10 Tech Rept
Dynamic tainting
based analyses
Enabling more
efficient debugging
RESEARCH OVERVIEW
Efficient instrumentation
Jazz: A Tool for Demand-Driven Structural
Testing
Jonathan Misurda1
, Jim Clause1
, Juliya Reed1
, Bruce R. Childers1
, and Mary
Lou So a2
1
University of Pittsburgh, Pittsburgh PA 15260, USA,
{jmisurda,clausej,juliya,childers}@cs.pitt.edu
2
University of Virginia, Charlottesville VA 22904, USA,
soffa@cs.virginia.edu
Abstract. Software testing to produce reliable and robust software has
become vitally important. Testing is a process by which quality can be
assured through the collection of information about software. While test-
ing can improve software quality, current tools typically are inflexible
and have high overheads, making it a challenge to test large projects.
We describe a new scalable and flexible tool, called Jazz, that uses a
demand-driven structural testing approach. Jazz has a low overhead of
only 17.6% for branch testing.
1 Introduction
In the last several years, the importance of producing high quality and robust
software has become paramount. Testing is an important process to support
quality assurance by gathering information about the software being developed
or modified. It is, in general, extremely labor and resource intensive, accounting
for 50-60% of the total cost of software development [1]. The increased emphasis
on software quality and robustness mandates improved testing methodologies.
To test software, a number of techniques can be applied. One class of tech-
niques is structural testing, which checks that a given coverage criterion is sat-
isfied. For example, branch testing checks that a certain percentage of branches
are executed. Other structural tests include def-use testing in which pairs of
variable definitions and uses are checked for coverage and node testing in which
nodes in a program’s control flow graph are checked.
Unfortunately, structural testing is often hindered by the lack of scalable
and flexible tools. Current tools are not scalable in terms of both time and
memory, limiting the number and scope of the tests that can be applied to large
programs. These tools often modify the software binary to insert instrumentation
for testing. In this case, the tested version of the application is not the same
version that is shipped to customers and errors may remain. Testing tools are
usually inflexible and only implement certain types of testing. For example, many
tools implement branch testing, but do not implement node or def-use testing.
In this paper, we describe a new tool for structural testing, called Jazz, that
addresses these problems. Jazz uses a novel demand-driven technique to apply
ABSTRACT
Producing reliable and robust software has become one
of the most important software development concerns in
recent years. Testing is a process by which software
quality can be assured through the collection of infor-
mation. While testing can improve software reliability,
current tools typically are inflexible and have high over-
heads, making it challenging to test large software
projects. In this paper, we describe a new scalable and
flexible framework for testing programs with a novel
demand-driven approach based on execution paths to
implement test coverage. This technique uses dynamic
instrumentation on the binary code that can be inserted
and removed on-the-fly to keep performance and mem-
ory overheads low. We describe and evaluate implemen-
tations of the framework for branch, node and def-use
testing of Java programs. Experimental results for
branch testing show that our approach has, on average, a
1.6 speed up over static instrumentation and also uses
less memory.
Categories and Subject Descriptors
D.2.5. [Software Engineering]: Testing and Debug-
ging—Testing tools; D.3.3. [Programming Lan-
guages]: Language Constructs and Features—Program
instrumentation, run-time environments
General Terms
Experimentation, Measurement, Verification
Keywords
Testing, Code Coverage, Structural Testing, Demand-
Driven Instrumentation, Java Programming Language
1. INTRODUCTION
In the last several years, the importance of produc-
ing high quality and robust software has become para-
mount [15]. Testing is an important process to support
quality assurance by gathering information about the
behavior of the software being developed or modified. It
is, in general, extremely labor and resource intensive,
accounting for 50-60% of the total cost of software
development [17]. Given the importance of testing, it is
imperative that there are appropriate testing tools and
frameworks. In order to adequately test software, a
number of different testing techniques must be per-
formed. One class of testing techniques used extensively
is structural testing in which properties of the software
code are used to ensure a certain code coverage.Struc-
tural testing techniques include branch testing, node
testing, path testing, and def-use testing [6,7,8,17,19].
Typically, a testing tool targets one type of struc-
tural test, and the software unit is the program, file or
particular methods. In order to apply various structural
testing techniques, different tools must be used. If a tool
for a particular type of structural testing is not available,
the tester would need to either implement it or not use
that testing technique. The tester would also be con-
strained by the region of code to be tested, as deter-
mined by the tool implementor. For example, it may not
be possible for the tester to focus on a particular region
of code, such as a series of loops, complicated condi-
tionals, or particular variables if def-use testing is
desired. The user may want to have higher coverage on
frequently executed regions of code. Users may want to
define their own way of testing. For example, all
branches should be covered 10 times rather than once in
all loops.
In structural testing, instrumentation is placed at
certain code points (probes). Whenever such a program
point is reached, code that performs the function for the
test (payload) is executed. The probes in def-use testing
are dictated by the definitions and uses of variables and
the payload is to mark that a definition or use in a def-
use pair has been covered. Thus for each type of struc-
tural testing, there is a testing “plan”. A test plan is a
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies
are not made or distributed for profit or commercial advantage and
that copies bear this notice and the full citation on the first page. To
copy otherwise, or republish, to post on servers or to redistribute to
lists, requires prior specific permission and/or a fee.
ICSE'05, May 15-21, 2005, St. Louis, Missouri, USA.
Copyright 2005 ACM 1-58113-963-2/05/0005...$5.00.
Demand-Driven Structural Testing with Dynamic
Instrumentation
Jonathan Misurda†, James A. Clause†, Juliya L. Reed†, Bruce R. Childers†, and
Mary Lou Soffa‡
†Department of Computer Science
University of Pittsburgh
Pittsburgh, Pennsylvania 15260
{jmisurda, clausej, juliya, childers}@cs.pitt.edu
‡Department of Computer Science
University of Virginia
Charlottesville, Virginia 22904
soffa@cs.virginia.edu
156
A Technique for Enabling and Supporting Debugging of Field Failures
James Clause and Alessandro Orso
College of Computing
Georgia Institute of Technology
{clause, orso}@cc.gatech.edu
Abstract
It is difficult to fully assess the quality of software in-
house, outside the actual time and context in which it will
execute after deployment. As a result, it is common for
software to manifest field failures, failures that occur on
user machines due to untested behavior. Field failures are
typically difficult to recreate and investigate on developer
platforms, and existing techniques based on crash report-
ing provide only limited support for this task. In this pa-
per, we present a technique for recording, reproducing, and
minimizing failing executions that enables and supports in-
house debugging of field failures. We also present a tool
that implements our technique and an empirical study that
evaluates the technique on a widely used e-mail client.
1. Introduction
Quality-assurance activities, such as software testing and
analysis, are notoriously difficult, expensive, and time-
consuming. As a result, software products are often re-
leased with faults or missing functionality. In fact, real-
world examples of field failures experienced by users be-
cause of untested behaviors (e.g., due to unforeseen us-
ages), are countless. When field failures occur, it is im-
portant for developers to be able to recreate and investigate
them in-house. This pressing need is demonstrated by the
emergence of several crash-reporting systems, such as Mi-
crosoft’s error reporting systems [13] and Apple’s Crash
Reporter [1]. Although these techniques represent a first
important step in addressing the limitations of purely in-
house approaches to quality assurance, they work on lim-
ited data (typically, a snapshot of the execution state) and
can at best identify correlations between a crash report and
data on other known failures.
In this paper, we present a novel technique for reproduc-
ing and investigating field failures that addresses the limita-
tions of existing approaches. Our technique works in three
phases, intuitively illustrated by the scenario in Figure 1. In
the recording phase, while users run the software, the tech-
nique intercepts and logs the interactions between applica-
tion and environment and records portions of the environ-
ment that are relevant to these interactions. If the execution
terminates with a failure, the produced execution recording
is stored for later investigation. In the minimization phase,
using free cycles on the user machines, the technique re-
plays the recorded failing executions with the goal of au-
tomatically eliminating parts of the executions that are not
relevant to the failure. In the replay and debugging phase,
developers can use the technique to replay the minimized
failing executions and investigate the cause of the failures
(e.g., within a debugger). Being able to replay and debug
real field failures can give developers unprecedented insight
into the behavior of their software after deployment and op-
portunities to improve the quality of their software in ways
that were not possible before.
To evaluate our technique, we implemented it in a proto-
type tool, called ADDA (Automated Debugging of Deployed
Applications), and used the tool to perform an empirical
study. The study was performed on PINE [19], a widely-
used e-mail client, and involved the investigation of failures
caused by two real faults in PINE. The results of the study
are promising. Our technique was able to (1) record all ex-
ecutions of PINE (and two other subjects) with a low time
and space overhead, (2) completely replay all recorded exe-
cutions, and (3) perform automated minimization of failing
executions and obtain shorter executions that manifested the
same failures as the original executions. Moreover, we were
able to replay the minimized executions within a debugger,
which shows that they could have actually been used to in-
vestigate the failures.
The contributions of this paper are:
• A novel technique for recording and later replaying exe-
cutions of deployed programs.
• An approach for minimizing failing executions and gen-
erating shorter executions that fail for the same reasons.
• A prototype tool that implements our technique.
• An empirical study that shows the feasibility and effec-
tiveness of the approach.
29th International Conference on Software Engineering (ICSE'07)
0-7695-2828-7/07 $20.00 © 2007
Dytan: A Generic Dynamic Taint Analysis Framework
James Clause, Wanchun Li, and Alessandro Orso
College of Computing
Georgia Institute of Technology
{clause|wli7|orso}@cc.gatech.edu
ABSTRACT
Dynamic taint analysis is gaining momentum. Techniques based
on dynamic tainting have been successfully used in the context of
application security, and now their use is also being explored in dif-
ferent areas, such as program understanding, software testing, and
debugging. Unfortunately, most existing approaches for dynamic
tainting are defined in an ad-hoc manner, which makes it difficult
to extend them, experiment with them, and adapt them to new con-
texts. Moreover, most existing approaches are focused on data-flow
based tainting only and do not consider tainting due to control flow,
which limits their applicability outside the security domain. To
address these limitations and foster experimentation with dynamic
tainting techniques, we defined and developed a general framework
for dynamic tainting that (1) is highly flexible and customizable, (2)
allows for performing both data-flow and control-flow based taint-
ing conservatively, and (3) does not rely on any customized run-
time system. We also present DYTAN, an implementation of our
framework that works on x86 executables, and a set of preliminary
studies that show how DYTAN can be used to implement different
tainting-based approaches with limited effort. In the studies, we
also show that DYTAN can be used on real software, by using FIRE-
FOX as one of our subjects, and illustrate how the specific char-
acteristics of the tainting approach used can affect efficiency and
accuracy of the taint analysis, which further justifies the use of our
framework to experiment with different variants of an approach.
Categories and Subject Descriptors: D.2.5 [Software Engineer-
ing]: Testing and Debugging;
General Terms: Experimentation, Security
Keywords: Dynamic tainting, information flow, general framework
1. INTRODUCTION
Dynamic taint analysis (also known as dynamic information flow
analysis) consists, intuitively, in marking and tracking certain data
in a program at run-time. This type of dynamic analysis is be-
coming increasingly popular. In the context of application secu-
rity, dynamic-tainting approaches have been successfully used to
prevent a wide range of attacks, including buffer overruns (e.g., [8,
17]), format string attacks (e.g., [17, 21]), SQL and command in-
jections (e.g., [7, 19]), and cross-site scripting (e.g., [18]). More
recently, researchers have started to investigate the use of tainting-
based approaches in domains other than security, such as program
understanding, software testing, and debugging (e.g., [11, 13]).
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
ISSTA’07, July 9–12, 2007, London, England, United Kingdom.
Copyright 2007 ACM 978-1-59593-734-6/07/0007 ...$5.00.
Unfortunately, most existing techniques and tools for dynamic
taint analysis are defined in an ad-hoc manner, to target a specific
problem or a small class of problems. It would be difficult to ex-
tend or adapt such techniques and tools so that they can be used in
other contexts. In particular, most existing approaches are focused
on data-flow based tainting only, and do not consider tainting due
to the control flow within an application, which limits their general
applicability. Also, most existing techniques support either a sin-
gle taint marking or a small, fixed number of markings, which is
problematic in applications such as debugging. Finally, almost no
existing technique handles the propagation of taint markings in a
truly conservative way, which may be appropriate for the specific
applications considered, but is problematic in general. Because de-
veloping support for dynamic taint analysis is not only time con-
suming, but also fairly complex, this lack of flexibility and gener-
ality of existing tools and techniques is especially limiting for this
type of dynamic analysis.
To address these limitations and foster experimentation with dy-
namic tainting techniques, in this paper we present a framework for
dynamic taint analysis. We designed the framework to be general
and flexible, so that it allows for implementing different kinds of
techniques based on dynamic taint analysis with little effort. Users
can leverage the framework to quickly develop prototypes for their
techniques, experiment with them, and investigate trade-offs of dif-
ferent alternatives. For a simple example, the framework could be
used to investigate the cost effectiveness of considering different
types of taint propagation for an application.
Our framework has several advantages over existing approaches.
First, it is highly flexible and customizable. It allows for easily
specifying which program data should be tainted and how, how taint
markings should be propagated at run-time, and where and how
taint markings should be checked. Second, it allows for performing
data-flow and both data-flow and control-flow based tainting. Third,
from a more practical standpoint, it works on binaries, does not
need access to source code, and does not rely on any customized
hardware or operating system, which makes it broadly applicable.
We also present DYTAN, an implementation of our framework
that works on x86 binaries, and a set of preliminary studies per-
formed using DYTAN. In the first set of studies, we report on our
experience in using DYTAN to implement two tainting-based ap-
proaches presented in the literature. Although preliminary, our ex-
perience shows that we were able to implement these approaches
completely and with little effort. The second set of studies illus-
trates how the specific characteristics of a tainting approach can
affect efficiency and accuracy of the taint analysis. In particular, we
investigate how ignoring control-flow related propagation and over-
looking some data-flow aspects can lead to unsafety. These results
further justify the usefulness of experimenting with different varia-
tions of dynamic taint analysis and assessing their tradeoffs, which
can be done with limited effort using our framework. The second
set of studies also shows the practical applicability of DYTAN, by
successfully running it on the FIREFOX web browser.
196
Effective Memory Protection Using Dynamic Tainting
James Clause, Ioannis Doudalis, Alessandro Orso, and Milos Prvulovic
College of Computing
Georgia Institute of Technology
{clause|idoud|orso|milos}@cc.gatech.edu
ABSTRACT
Programs written in languages that provide direct access to memory
through pointers often contain memory-related faults, which may
cause non-deterministic failures and even security vulnerabilities.
In this paper, we present a new technique based on dynamic taint-
ing for protecting programs from illegal memory accesses. When
memory is allocated, at runtime, our technique taints both the mem-
ory and the corresponding pointer using the same taint mark. Taint
marks are then suitably propagated while the program executes and
are checked every time a memory address m is accessed through a
pointer p; if the taint marks associated with m and p differ, the ex-
ecution is stopped and the illegal access is reported. To allow for a
low-overhead, hardware-assisted implementation of the approach,
we make several key technical and engineering decisions in the
definition of our technique. In particular, we use a configurable,
low number of reusable taint marks instead of a unique mark for
each area of memory allocated, which reduces the overhead of the
approach without limiting its flexibility and ability to target most
memory-related faults and attacks known to date. We also define
the technique at the binary level, which lets us handle the (very)
common case of applications that use third-party libraries whose
source code is unavailable. To investigate the effectiveness and
practicality of our approach, we implemented it for heap-allocated
memory and performed a preliminary empirical study on a set of
programs. Our results show that (1) our technique can identify a
large class of memory-related faults, even when using only two
unique taint marks, and (2) a hardware-assisted implementation of
the technique could achieve overhead in the single digits.
Categories and Subject Descriptors: D.2.5 [Software Engineering]: Test-
ing and Debugging; C.0 [General]: Hardware/Software Interfaces;
General Terms: Performance, Security
Keywords: Illegal memory accesses, dynamic tainting, hardware support
1. INTRODUCTION
Memory-related faults are a serious problem for languages that
allow direct memory access through pointers. An important class
of memory-related faults are what we call illegal memory accesses.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
ASE’07, November 5–9, 2007, Atlanta, Georgia, USA.
Copyright 2007 ACM 978-1-59593-882-4/07/0011 ...$5.00.
In languages such as C and C++, when memory allocation is re-
quested, a currently-free area of memory m of the specified size
is reserved. After m has been allocated, its initial address can be
assigned to a pointer p, either immediately (e.g., in the case of
heap allocated memory) or at a later time (e.g., when retrieving
and storing the address of a local variable). From that point on,
the only legal accesses to m through a pointer are accesses per-
formed through p or through other pointers derived from p. (In
Section 3, we clearly define what it means to derive a pointer from
another pointer.) All other accesses to m are Illegal Memory Ac-
cesses (IMAs), that is, accesses where a pointer is used to access
memory outside the bounds of the memory area with which it was
originally associated.
IMAs are especially relevant for several reasons. First, they are
caused by typical programming errors, such as array-out-of-bounds
accesses and NULL pointer dereferences, and are thus widespread
and common. Second, they often result in non-deterministic fail-
ures that are hard to identify and diagnose; the specific effects of an
IMA depend on several factors, such as memory layout, that may
vary between executions. Finally, many security concerns such as
viruses, worms, and rootkits use IMAs as their injection vectors.
In this paper, we present a new dynamic technique for protecting
programs against IMAs that is effective against most known types
of illegal accesses. The basic idea behind the technique is to use
dynamic tainting (or dynamic information flow) [8] to keep track
of which memory areas can be accessed through which pointers,
as follows. At runtime, our technique taints both allocated mem-
ory and pointers using taint marks. Dynamic taint propagation, to-
gether with a suitable handling of memory-allocation and deallo-
cation operations, ensures that taint marks are appropriately prop-
agated during execution. Every time the program accesses some
memory through a pointer, our technique checks whether the ac-
cess is legal by comparing the taint mark associated with the mem-
ory and the taint mark associated with the pointer used to access it.
If the marks match, the access is considered legitimate. Otherwise,
the execution is stopped and an IMA is reported.
In defining our approach, our final goal is the development of a
low-overhead, hardware-assisted tool that is practical and can be
used on deployed software. A hardware-assisted tool is a tool that
leverages the benefits of both hardware and software. Typically,
some performance critical aspects are moved to the hardware to
achieve maximum efficiency, while software is used to perform op-
erations that would be too complex to implement in hardware.
There are two main characteristics of our approach that were de-
fined to help achieve our goal of a hardware-assisted implementa-
tion. The first characteristic is that our technique only uses a small,
configurable number of reusable taint marks instead of a unique
mark for each area of memory allocated. Using a low number of
283
Penumbra: Automatically Identifying Failure-Relevant
Inputs Using Dynamic Tainting
James Clause
College of Computing
Georgia Institute of Technology
clause@cc.gatech.edu
Alessandro Orso
College of Computing
Georgia Institute of Technology
orso@cc.gatech.edu
ABSTRACT
Most existing automated debugging techniques focus on re-
ducing the amount of code to be inspected and tend to ig-
nore an important component of software failures: the in-
puts that cause the failure to manifest. In this paper, we
present a new technique based on dynamic tainting for au-
tomatically identifying subsets of a program’s inputs that
are relevant to a failure. The technique (1) marks program
inputs when they enter the application, (2) tracks them as
they propagate during execution, and (3) identifies, for an
observed failure, the subset of inputs that are potentially
relevant for debugging that failure. To investigate feasibil-
ity and usefulness of our technique, we created a prototype
tool, penumbra, and used it to evaluate our technique on
several failures in real programs. Our results are promising,
as they show that penumbra can point developers to inputs
that are actually relevant for investigating a failure and can
be more practical than existing alternative approaches.
Categories and Subject Descriptors
D.2.5 [Software Engineering]: Testing and Debugging
General Terms
Algorithms, Experimentation, Reliability
Keywords
Failure-relevant inputs, automated debugging, dynamic in-
formation flow, dynamic tainting
1. INTRODUCTION
Debugging is known to be a labor-intensive, time-consum-
ing task that can be responsible for a large portion of soft-
ware development and maintenance costs [21,23]. Common
characteristics of modern software, such as increased con-
figurability, larger code bases, and increased input sizes, in-
troduce new challenges for debugging and exacerbate exist-
ing problems. In response, researchers have proposed many
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
ISSTA’09, July 19–23, 2009, Chicago, Illinois, USA.
Copyright 2009 ACM 978-1-60558-338-9/09/07 ...$5.00.
semi- and fully-automated techniques that attempt to re-
duce the cost of debugging (e.g., [8,9,11–13,18,24,25,27]).
The majority of these techniques are code-centric in that
they focus exclusively on one aspect of debugging—trying
to identify the faulty statements responsible for a failure.
Although code-centric approaches can work well in some
cases (e.g., for isolated faults that involve a single state-
ment), they are often inadequate for more complex faults [4].
Faults of omission, for instance, where part of a specification
has not been implemented, are notoriously problematic for
debugging techniques that attempt to identify potentially
faulty statements. The usefulness of code-centric techniques
is also limited in the case of long-running programs and pro-
grams that process large amounts of information; failures in
these types of programs are typically di⌅cult to understand
without considering the data involved in such failures.
To debug failures more e ectively, it is necessary to pro-
vide developers with not only a relevant subset of state-
ments, but also a relevant subset of inputs. There are only
a few existing techniques that attempt to identify relevant
inputs [3, 17, 25], with delta debugging [25] being the most
known of these. Although delta debugging has been shown
to be an e ective technique for automatic debugging, it also
has several drawbacks that may limit its usefulness in prac-
tice. In particular, it requires (1) multiple executions of the
program being debugged, which can involve a long running
time, and (2) complex oracles and setup, which can result
in a large amount of manual e ort [2].
In this paper, we present a novel debugging technique that
addresses many of the limitations of existing approaches.
Our technique can complement code-centric debugging tech-
niques because it focuses on identifying program inputs that
are likely to be relevant for a given failure. It also overcomes
some of the drawbacks of delta debugging because it needs
a single execution to identify failure-relevant inputs and re-
quires minimal manual e ort.
Given an observable faulty behavior and a set of failure-
inducing inputs (i.e., a set of inputs that cause such behav-
ior), our technique automatically identifies failure-relevant
inputs (i.e., a subset of failure-inducing inputs that are ac-
tually relevant for investigating the faulty behavior). Our
approach is based on dynamic tainting. Intuitively, the tech-
nique works by tracking the flow of inputs along data and
control dependences at runtime. When a point of failure
is reached, the tracked information is used to identify and
present to developers the failure-relevant inputs. At this
point, developers can use the identified inputs to investigate
the failure at hand.
LEAKPOINT: Pinpointing the Causes of Memory Leaks
James Clause
College of Computing
Georgia Institute of Technology
clause@cc.gatech.edu
Alessandro Orso
College of Computing
Georgia Institute of Technology
orso@cc.gatech.edu
ABSTRACT
Most existing leak detection techniques for C and C++ applications
only detect the existence of memory leaks. They do not provide
any help for fixing the underlying memory management errors. In
this paper, we present a new technique that not only detects leaks,
but also points developers to the locations where the underlying
errors may be fixed. Our technique tracks pointers to dynamically-
allocated areas of memory and, for each memory area, records sev-
eral pieces of relevant information. This information is used to
identify the locations in an execution where memory leaks occur.
To investigate our technique’s feasibility and usefulness, we devel-
oped a prototype tool called LEAKPOINT and used it to perform
an empirical evaluation. The results of this evaluation show that
LEAKPOINT detects at least as many leaks as existing tools, reports
zero false positives, and, most importantly, can be effective at help-
ing developers fix the underlying memory management errors.
Categories and Subject Descriptors
D.2.5 [Software Engineering]: Testing and Debugging
General Terms
Performance, Reliability
Keywords
Leak detection, Dynamic tainting
1. INTRODUCTION
Memory leaks are a type of unintended memory consumption
that can adversely impact the performance and correctness of an
application. In programs written in languages such as C and C++,
memory is allocated using allocation functions, such as malloc
and new. Allocation functions reserve a currently free area of
memory m and return a pointer p that points to m’s starting ad-
dress. Typically, the program stores and then uses p, or another
This work was supported in part by NSF awards CCF-0725202
and CCF-0541080 to Georgia Tech.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
ICSE ’10, May 2-8 2010, Cape Town, South Africa
Copyright 2010 ACM 978-1-60558-719-6/10/05 ...$10.00.
pointer derived from p, to interact with m. When m is no longer
needed, the program should pass p to a deallocation function (e.g.,
free or delete) to deallocate m. A leak occurs if, due to a
memory management error, m is not deallocated at the appropri-
ate time. There are two types of memory leaks: lost memory and
forgotten memory. Lost memory refers to the situation where m be-
comes unreachable (i.e., the program overwrites or loses p and all
pointers derived from p) without first being deallocated. Forgotten
memory refers to the situation where m remains reachable but is
not deallocated or accessed in the rest of the execution.
Memory leaks are relevant for several reasons. First, they are dif-
ficult to detect. Unlike many other types of failures, memory leaks
do not immediately produce an easily visible symptom (e.g., a crash
or the output of a wrong value); typically, leaks remain unobserved
until they consume a large portion of the memory available to a sys-
tem. Second, leaks have the potential to impact not only the appli-
cation that leaks memory, but also every other application running
on the system; because the overall amount of memory is limited,
as the memory usage of a leaking program increases, less memory
is available to other running applications. Consequently, the per-
formance and correctness of every running application can be im-
pacted by a program that leaks memory. Third, leaks are common,
even in mature applications. For example, in the first half of 2009,
over 100 leaks in the Firefox web-browser were reported [18].
Because of the serious consequences and common occurrence of
memory leaks, researchers have created many static and dynamic
techniques for detecting them (e.g., [1,2,4,7–14,16,17,20–23,25,
27,28]). The adoption of static techniques has been limited by sev-
eral factors, including the lack of scalable, precise heap modeling.
Dynamic techniques are therefore more widely used in practice. In
general, dynamic techniques provide one main piece of informa-
tion: the location in an execution where a leaked area of memory is
allocated. This location is supposed to serve as a starting point for
investigating the leak. However, in many situations, this informa-
tion does not provide any insight on where or how to fix the mem-
ory management error that causes the leak: the allocation location
and the location of the memory management error are typically in
completely different parts of the application’s code.
To address this limitation of existing approaches, we propose
a new memory leak detection technique. Our technique provides
the same information as existing techniques but also identifies the
locations in an execution where leaks occur. In the case of lost
memory, the location is defined as the point in an execution where
the last pointer to an unallocated memory area is lost or overwritten.
In the case of forgotten memory, the location is defined as the last
point in an execution where a pointer to a leaked area of memory
was used (e.g., when it is dereferenced to read or write memory,
passed as a function argument, returned from a function, or used as
Camouflage: Automated Sanitization of Field Data
James Clause
College of Computing
Georgia Institute of Technology
clause@cc.gatech.edu
Alessandro Orso
College of Computing
Georgia Institute of Technology
orso@cc.gatech.edu
ABSTRACT
Privacy and security concerns have adversely a ected the
usefulness of many types of techniques that leverage infor-
mation gathered from deployed applications. To address this
issue, we present a new approach for automatically sanitiz-
ing failure-inducing inputs. Given an input I that causes
a failure f, our technique can generate a sanitized input I
that is di erent from I but still causes f. I can then be sent
to the developers to help them debug f, without revealing
the possibly sensitive information contained in I. We im-
plemented our approach in a prototype tool, camouflage,
and performed an empirical evaluation. In the evaluation,
we applied camouflage to a large set of failure-inducing
inputs for several real applications. The results of the eval-
uation are promising; they show that camouflage is both
practical and e ective at generating sanitized inputs. In par-
ticular, for the inputs that we considered, I and I shared
no sensitive information.
1. INTRODUCTION
Investigating techniques that capture data from deployed
applications to support in-house software engineering tasks
is an increasingly active and successful area of research (e.g.,
[1,3–5,13,14,17,21,22,26,27,29]). However, privacy and se-
curity concerns have prevented widespread adoption of many
of these techniques and, because they rely on user partici-
pation, have ultimately limited their usefulness. Many of
the earlier proposed techniques attempt to sidestep these
concerns by collecting only limited amounts of information
(e.g., stack traces and register dumps [1, 3, 5] or sampled
branch profiles [26,27]) and providing a privacy policy that
specifies how the information will be used (e.g., [2,8]). Be-
cause the types of information collected by these techniques
are unlikely to be sensitive, users are more willing to trust
developers. Moreover, because only a small amount of infor-
mation is collected, it is feasible for users to manually inspect
and sanitize such information before it is sent to developers.
Unfortunately, recent research has shown that the e ec-
tiveness of these techniques increases when they can lever-
age large amounts of detailed information (e.g., complete
execution recordings [4, 14] or path profiles [13, 24]). Since
more detailed information is bound to contain sensitive data,
users will most likely be unwilling to let developers collect
such information. In addition, collecting large amounts of
information would make it infeasible for users to sanitize
the collected information by hand. To address this prob-
lem, some of these techniques suggest using an input mini-
mization approach (e.g., [6, 7, 35]) to reduce the number of
failure-inducing inputs and, hopefully, eliminate some sensi-
tive information. Input-minimization techniques, however,
were not designed to specifically reduce sensitive inputs, so
they can only eliminate sensitive data by chance. In or-
der for techniques that leverage captured field information
to become widely adopted and achieve their full potential,
new approaches for addressing privacy and security concerns
must be developed.
In this paper, we present a novel technique that addresses
privacy and security concerns by sanitizing information cap-
tured from deployed applications. Our technique is designed
to be used in conjunction with an execution capture/replay
technique (e.g., [4, 14]). Given an execution recording that
contains a captured failure-inducing input I = i1, i2, . . . in⇥
and terminates with a failure f, our technique replays the
execution recording and leverages a specialized version of
symbolic-execution to automatically produce I , a sanitized
version of I, such that I (1) still causes f and (2) reveals as
little information about I as possible. A modified execution
recording where I replaces I can then be constructed and
sent to the developers, who can use it to debug f.
It is, in general, impossible to construct I such that it
does not reveal any information about I while still caus-
ing the same failure f. Typically, the execution of f would
depend on the fact that some elements of I have specific
values (e.g., i1 must be 0 for the failing path to be taken).
However, this fact does not prevent the technique from be-
ing useful in practice. In our evaluation, we found that the
information revealed by the sanitized inputs was not sensi-
tive and tended to be structural in nature (e.g., a specific
portion of the input must be surrounded by double quotes).
Conversely, the parts of the inputs that were more likely to
be sensitive (e.g., values contained inside the double quotes)
were not revealed (see Section 4).
To evaluate the e ectiveness of our technique, we imple-
mented it in a prototype tool, called camouflage, and car-
ried out an empirical evaluation of 170 failure-inducing in-
1
CC 05 ICSE 05 ISSTA 07 ASE 07 ISSTA 09 ICSE 10 Tech ReptICSE 07
Dynamic tainting
based analyses
Enabling more
efficient debugging
RESEARCH OVERVIEW
OVERALL PICTURE
OVERALL PICTURE
OVERALL PICTURE
OVERALL PICTURE
OVERALL PICTURE
Field failures:Anomalous behavior (or crashes)
of deployed software that occur on user machines
• Difficult to debug
• Relevant to users
CURRENT PRACTICE
CURRENT PRACTICE
Ask the user
CURRENT PRACTICE
Ask the user
I frobbed the
thingummy like the guy
told me. Then I spun
the doodad
widdershins and a little
thinger popped up and
it just stopped working.
CURRENT PRACTICE
Ask the user
I opened my web
browser.
Specifically, I clicked on
the dock icon. It
bounced twice before
crashing.
Please help.
CURRENT PRACTICE
Gather static information
CURRENT PRACTICE
Gather static information
Difficult to reproduce
the failure
CURRENT PRACTICE
Gather static information
Difficult to reproduce
the failure
Locations only
correlated with the
failure
Liblit et al. 03
Tucek et al. 07
Chilimbi et al. 09
...
OUR SOLUTION
OUR SOLUTION
Record
failing executions
in the field
OUR SOLUTION
Record
failing executions
in the field
Replay
failing executions
in house
+
OUR SOLUTION
Record
failing executions
in the field
Replay
failing executions
in house
Debug
field failures
effectively
+
In the fieldIn house
USAGE SCENARIO
✘Replay / Debug
Develop Record
Captured
failure
In the fieldIn house
USAGE SCENARIO
✘Replay / Debug
Develop Record
Captured
failure
Oracle
In the fieldIn house
USAGE SCENARIO
✘Replay / Debug
Develop Record
Captured
failure
PRACTICALITY ISSUES
✘
345
PRACTICALITY ISSUES
Large in size
✘
345345
PRACTICALITY ISSUES
Large in size Contain sensitive
information
✘
345345
PRACTICALITY ISSUES
Large in size Contain sensitive
information
Minimize
✘ ✘
345345
PRACTICALITY ISSUES
Large in size Contain sensitive
information
Minimize Anonymize
✘ ✘
In the fieldIn house
Replay / Debug
Develop Record
Captured
failure
MinimizeAnonymize
USAGE SCENARIO
✘
✘
✘
OUTLINE
Recording / Replaying
Minimization
Anonymization
COMING
SOON
✘
OUTLINE
Recording / Replaying
Minimization
Anonymization
Future work
COMING
SOON
✘
OUTLINE
Recording / Replaying
Minimization
Anonymization
Future work
COMING
SOON
✘
OUTLINE
Recording / Replaying
Minimization
Anonymization
Future work
EXISTING RECORD/REPLAY
APPROACHES
EXISTING RECORD/REPLAY
APPROACHES
Chen et al. 01, King et al. 05
Narayanasamy et al. 05,
Netzer and Weaver 94,
Srinivasan et al. 04,VMWare
Exactly replay everything
EXISTING RECORD/REPLAY
APPROACHES
Not amenable to
minimization or
anonymization
Unacceptable
runtime overhead
Chen et al. 01, King et al. 05
Narayanasamy et al. 05,
Netzer and Weaver 94,
Srinivasan et al. 04,VMWare
Exactly replay everything
Not amenable to
minimization or
anonymization
Unacceptable
runtime overhead
Record low-level events
• numerous
• high interdependence
RECORD & REPLAY
Record high-level events
• fewer in number
• low interdependence
Amenable to
minimization or
anonymization
Acceptable
runtime overhead
RECORD & REPLAY
ENVIRONMENT INTERACTIONS
ENVIRONMENT INTERACTIONSStreams
ENVIRONMENT INTERACTIONSStreams
Files
ENVIRONMENT INTERACTIONSStreams
Files
Interaction Events:
FILE — interaction with a file
POLL — checks for availability of data on a stream
PULL — read data from a stream
Environment data (streams):
Event log:
Environment data (files):
Environment data (streams):
Event log:
Environment data (files):
Environment data (streams):
Event log:
Environment data (files):
FILE foo.1
Environment data (streams):
Event log:
Environment data (files):
FILE foo.1
foo.1
Environment data (streams):
Event log:
Environment data (files):
FILE foo.1
foo.1
Environment data (streams):
Event log:
Environment data (files):
FILE foo.1
foo.1
POLL KEYBOARD NOK
Environment data (streams):
Event log:
Environment data (files):
FILE foo.1
foo.1
POLL KEYBOARD NOK
Environment data (streams):
Event log:
Environment data (files):
FILE foo.1
foo.1
KEYBOARD: {5680}
POLL KEYBOARD OK
POLL KEYBOARD NOK
Environment data (streams):
Event log:
Environment data (files):
FILE foo.1
foo.1
KEYBOARD: {5680}
POLL KEYBOARD OK
POLL KEYBOARD NOK
Environment data (streams):
Event log:
Environment data (files):
FILE foo.1
foo.1
KEYBOARD: {5680}hello
POLL KEYBOARD OK
PULL KEYBOARD 5
POLL KEYBOARD NOK
Environment data (streams):
Event log:
Environment data (files):
FILE foo.1
foo.1
KEYBOARD: {5680}hello
POLL KEYBOARD OK
PULL KEYBOARD 5
POLL KEYBOARD NOK
Environment data (streams):
Event log:
Environment data (files):
FILE foo.1
foo.1
KEYBOARD: {5680}hello
POLL KEYBOARD OK
PULL KEYBOARD 5
POLL KEYBOARD NOK
POLL NETWORK OK
NETWORK: {3405}
Environment data (streams):
Event log:
Environment data (files):
FILE foo.1
foo.1
KEYBOARD: {5680}hello
POLL KEYBOARD OK
PULL KEYBOARD 5
POLL KEYBOARD NOK
POLL NETWORK OK
NETWORK: {3405}
❙
Environment data (streams):
Event log:
Environment data (files):
FILE foo.1
foo.1
KEYBOARD: {5680}hello
POLL KEYBOARD OK
PULL KEYBOARD 5
POLL KEYBOARD NOK
POLL NETWORK OK
NETWORK: {3405}
❙
Environment data (files):
Event log:
Environment data (streams):
KEYBOARD: {5680}hello ❙ {4056}c ❙ {300}...
NETWORK: {3405}<html><body>... ❙ {202}...
FILE foo.1
POLL KEYBOARD NOK
POLL KEYBOARD OK
PULL KEYBOARD 5
POLL NETWORK OK
PULL NETWORK 1024
FILE bar.1
POLL NETWORK NOK
POLL NETWORK OK
FILE foo.2
...
PULL NETWORK 1024
FILE foo.2
POLL KEYBOARD NOK
...
foo.1 foo.2 bar.1
Environment data (files):
Event log:
Environment data (streams):
KEYBOARD: {5680}hello ❙ {4056}c ❙ {300}...
NETWORK: {3405}<html><body>... ❙ {202}...
FILE foo.1
POLL KEYBOARD NOK
POLL KEYBOARD OK
PULL KEYBOARD 5
POLL NETWORK OK
PULL NETWORK 1024
FILE bar.1
POLL NETWORK NOK
POLL NETWORK OK
FILE foo.2
...
PULL NETWORK 1024
FILE foo.2
POLL KEYBOARD NOK
...
foo.1 foo.2 bar.1
Environment data (files):
Event log:
Environment data (streams):
KEYBOARD: {5680}hello ❙ {4056}c ❙ {300}...
NETWORK: {3405}<html><body>... ❙ {202}...
FILE foo.1
POLL KEYBOARD NOK
POLL KEYBOARD OK
PULL KEYBOARD 5
POLL NETWORK OK
PULL NETWORK 1024
FILE bar.1
POLL NETWORK NOK
POLL NETWORK OK
FILE foo.2
...
PULL NETWORK 1024
FILE foo.2
POLL KEYBOARD NOK
...
foo.1 foo.2 bar.1
Environment data (files):
Event log:
Environment data (streams):
KEYBOARD: {5680}hello ❙ {4056}c ❙ {300}...
NETWORK: {3405}<html><body>... ❙ {202}...
FILE foo.1
POLL KEYBOARD NOK
POLL KEYBOARD OK
PULL KEYBOARD 5
POLL NETWORK OK
PULL NETWORK 1024
FILE bar.1
POLL NETWORK NOK
POLL NETWORK OK
FILE foo.2
...
PULL NETWORK 1024
FILE foo.2
POLL KEYBOARD NOK
...
foo.1 foo.2 bar.1
Environment data (files):
Event log:
Environment data (streams):
KEYBOARD: {5680}hello ❙ {4056}c ❙ {300}...
NETWORK: {3405}<html><body>... ❙ {202}...
FILE foo.1
POLL KEYBOARD NOK
POLL KEYBOARD OK
PULL KEYBOARD 5
POLL NETWORK OK
PULL NETWORK 1024
FILE bar.1
POLL NETWORK NOK
POLL NETWORK OK
FILE foo.2
...
PULL NETWORK 1024
FILE foo.2
POLL KEYBOARD NOK
...
foo.1 foo.2 bar.1
Environment data (files):
Event log:
Environment data (streams):
KEYBOARD: {5680}hello ❙ {4056}c ❙ {300}...
NETWORK: {3405}<html><body>... ❙ {202}...
FILE foo.1
POLL KEYBOARD NOK
POLL KEYBOARD OK
PULL KEYBOARD 5
POLL NETWORK OK
PULL NETWORK 1024
FILE bar.1
POLL NETWORK NOK
POLL NETWORK OK
FILE foo.2
...
PULL NETWORK 1024
FILE foo.2
POLL KEYBOARD NOK
...
foo.1 foo.2 bar.1
✔
Environment data (files):
Event log:
Environment data (streams):
KEYBOARD: {5680}hello ❙ {4056}c ❙ {300}...
NETWORK: {3405}<html><body>... ❙ {202}...
FILE foo.1
POLL KEYBOARD NOK
POLL KEYBOARD OK
PULL KEYBOARD 5
POLL NETWORK OK
PULL NETWORK 1024
FILE bar.1
POLL NETWORK NOK
POLL NETWORK OK
FILE foo.2
...
PULL NETWORK 1024
FILE foo.2
POLL KEYBOARD NOK
...
foo.1 foo.2 bar.1
✔
Environment data (files):
Event log:
Environment data (streams):
KEYBOARD: {5680}hello ❙ {4056}c ❙ {300}...
NETWORK: {3405}<html><body>... ❙ {202}...
FILE foo.1
POLL KEYBOARD NOK
POLL KEYBOARD OK
PULL KEYBOARD 5
POLL NETWORK OK
PULL NETWORK 1024
FILE bar.1
POLL NETWORK NOK
POLL NETWORK OK
FILE foo.2
...
PULL NETWORK 1024
FILE foo.2
POLL KEYBOARD NOK
...
foo.1 foo.2 bar.1
✔
✔
✔
✔
✔
✔
✔
✔
✔
✔
✔
✔
✔
EVALUATION
Acceptable
runtime overhead
Amenable to
minimization or
anonymization
EVALUATION
Prototype implementation:
• maps libc function calls to
interaction events
Subjects:
• several cpu intensive
applications (e.g., bzip, gcc)
Results:
• negligible overheads
• (i.e. less than 10%)
• data size is acceptable
• application dependent
COMING
SOON
OUTLINE
✘
Recording / Replaying
Minimization
Anonymization
Future work
✘ 345
Large in size
MINIMIZATION✘
Goal: focus developer effort
MINIMIZATION✘
24:15
MINIMIZATION✘
Time
minimization 2:5524:15
MINIMIZATION✘
✂
Data
minimization 2:55Time
minimization 2:5524:15
MINIMIZATION✘
✂
Data
minimization 2:55Time
minimization 2:5524:15
Oracle Oracle
TIME MINIMIZATION
Event log:
Environment data (streams):
KEYBOARD: {5680}hello ❙ {4056}c ❙ {300}...
NETWORK: {3405}<html><body>... ❙ {202}...
FILE foo.1
POLL KEYBOARD NOK
POLL KEYBOARD OK
PULL KEYBOARD 5
POLL NETWORK OK
PULL NETWORK 1024
FILE bar.1
POLL NETWORK NOK
POLL NETWORK OK
FILE foo.2
...
PULL NETWORK 1024
FILE foo.2
POLL KEYBOARD NOK
...
TIME MINIMIZATION
Event log:
Environment data (streams):
KEYBOARD: {5680}hello ❙ {4056}c ❙ {300}...
NETWORK: {3405}<html><body>... ❙ {202}...
FILE foo.1
POLL KEYBOARD NOK
POLL KEYBOARD OK
PULL KEYBOARD 5
POLL NETWORK OK
PULL NETWORK 1024
FILE bar.1
POLL NETWORK NOK
POLL NETWORK OK
FILE foo.2
...
PULL NETWORK 1024
FILE foo.2
POLL KEYBOARD NOK
...
Remove idle time
TIME MINIMIZATION
Event log:
Environment data (streams):
KEYBOARD: {5680}hello ❙ {4056}c ❙ {300}...
NETWORK: {3405}<html><body>... ❙ {202}...
FILE foo.1
POLL KEYBOARD NOK
POLL KEYBOARD OK
PULL KEYBOARD 5
POLL NETWORK OK
PULL NETWORK 1024
FILE bar.1
POLL NETWORK NOK
POLL NETWORK OK
FILE foo.2
...
PULL NETWORK 1024
FILE foo.2
POLL KEYBOARD NOK
...
Remove idle time
TIME MINIMIZATION
Event log:
Environment data (streams):
KEYBOARD: {5680}hello ❙ {4056}c ❙ {300}...
NETWORK: {3405}<html><body>... ❙ {202}...
FILE foo.1
POLL KEYBOARD NOK
POLL KEYBOARD OK
PULL KEYBOARD 5
POLL NETWORK OK
PULL NETWORK 1024
FILE bar.1
POLL NETWORK NOK
POLL NETWORK OK
FILE foo.2
...
PULL NETWORK 1024
FILE foo.2
POLL KEYBOARD NOK
...
Remove idle time
Remove delays
TIME MINIMIZATION
Event log:
Environment data (streams):
KEYBOARD: {5680}hello ❙ {4056}c ❙ {300}...
NETWORK: {3405}<html><body>... ❙ {202}...
FILE foo.1
POLL KEYBOARD NOK
POLL KEYBOARD OK
PULL KEYBOARD 5
POLL NETWORK OK
PULL NETWORK 1024
FILE bar.1
POLL NETWORK NOK
POLL NETWORK OK
FILE foo.2
...
PULL NETWORK 1024
FILE foo.2
POLL KEYBOARD NOK
...
Remove idle time
Remove delays
DATA MINIMIZATION
Environment data (files):
foo.1 foo.2 bar.1
Whole entities
Chunks
Atoms
DATA MINIMIZATION
Environment data (files):
foo.2 bar.1
Lorem ipsum dolor sit
amet, consetetur
sadipscing elitr,sed
diam nonumy eirmod
tempor invidunt ut
labore et dolore magna aliquyam
erat, sed diam voluptua. At
vero
eos et accusam et justo duo
dolores et ea rebum. Stet clita
kasd gubergren, no sea takimata
sanctus est Lorem ipsum dolor
sit amet. Lorem ipsum dolor sit
amet, consetetur
sadipscing elitr, sed diam
nonumy eirmod tempor invidunt
ut labore et dolore magna
aliquyam erat, sed diam
voluptua. At vero eos et
Whole entities
Chunks
Atoms
DATA MINIMIZATION
Environment data (files):
foo.2 bar.1
Lorem ipsum dolor sit
amet, consetetur
sadipscing elitr,sed
diam nonumy eirmod
tempor invidunt ut
labore et dolore magna aliquyam
erat, sed diam voluptua. At
vero
eos et accusam et justo duo
dolores et ea rebum. Stet clita
kasd gubergren, no sea takimata
sanctus est Lorem ipsum dolor
sit amet. Lorem ipsum dolor sit
amet, consetetur
sadipscing elitr, sed diam
nonumy eirmod tempor invidunt
ut labore et dolore magna
aliquyam erat, sed diam
voluptua. At vero eos et
Whole entities
Chunks
Atoms
DATA MINIMIZATION
Environment data (files):
foo.2 bar.1
Whole entities
Chunks
Atoms
DATA MINIMIZATION
Environment data (files):
bar.1
Lorem ipsum dolor sit
amet, consetetur
sadipscing elitr,sed
diam nonumy eirmod
tempor invidunt ut
labore et dolore magna aliquyam
erat, sed diam voluptua. At
vero
eos et accusam et justo duo
dolores et ea rebum. Stet clita
kasd gubergren, no sea takimata
sanctus est Lorem ipsum dolor
sit amet. Lorem ipsum dolor sit
amet, consetetur
sadipscing elitr, sed diam
nonumy eirmod tempor invidunt
ut labore et dolore magna
aliquyam erat, sed diam
voluptua. At vero eos et
Whole entities
Chunks
Atoms
DATA MINIMIZATION
Environment data (files):
bar.1
Lorem ipsum dolor sit
amet, consetetur
sadipscing elitr,sed
diam nonumy eirmod
tempor invidunt ut
labore et dolore magna aliquyam
erat, sed diam voluptua. At
vero
eos et accusam et justo duo
dolores et ea rebum. Stet clita
kasd gubergren, no sea takimata
sanctus est Lorem ipsum dolor
sit amet. Lorem ipsum dolor sit
amet, consetetur
sadipscing elitr, sed diam
nonumy eirmod tempor invidunt
ut labore et dolore magna
aliquyam erat, sed diam
voluptua. At vero eos et
Whole entities
Chunks
Atoms
DATA MINIMIZATION
Environment data (files):
bar.1
Lorem ipsum dolor sit
amet, consetetur
sadipscing elitr,sed
diam nonumy eirmod
tempor invidunt ut
labore et dolore magna aliquyam
erat, sed diam voluptua. At
vero
eos et accusam et justo duo
dolores et ea rebum. Stet clita
kasd gubergren, no sea takimata
sanctus est Lorem ipsum dolor
sit amet. Lorem ipsum dolor sit
amet, consetetur
sadipscing elitr, sed diam
nonumy eirmod tempor invidunt
ut labore et dolore magna
aliquyam erat, sed diam
voluptua. At vero eos et
Whole entities
Chunks
Atoms
DATA MINIMIZATION
Environment data (files):
bar.1
Lorem ipsum dolor sit
amet, consetetur
sadipscing elitr,sed
diam nonumy eirmod
tempor invidunt ut
labore et dolore magna aliquyam
erat, sed diam voluptua. At
vero
eos et accusam et justo duo
dolores et ea rebum. Stet clita
kasd gubergren, no sea takimata
sanctus est Lorem ipsum dolor
sit amet. Lorem ipsum dolor sit
amet, consetetur
sadipscing elitr, sed diam
nonumy eirmod tempor invidunt
ut labore et dolore magna
aliquyam erat, sed diam
voluptua. At vero eos et
Whole entities
Chunks
Atoms
DATA MINIMIZATION
Environment data (files):
bar.1
sadipscing elitr, sed diam
nonumy eirmod tempor invidunt
ut labore et dolore magna
aliquyam erat, sed diam
voluptua. At vero eos et
Whole entities
Chunks
Atoms
DATA MINIMIZATION
Environment data (files):
bar.1
sadipscing elitr, sed diam
nonumy eirmod tempor invidunt
ut labore et dolore magna
aliquyam erat, sed diam
voluptua. At vero eos et
Whole entities
Chunks
Atoms
DATA MINIMIZATION
Environment data (files):
bar.1
Whole entities
Chunks
Atoms
sadipscing elitr,
eirmod invidunt
ut labore dolore magna
erat,
voluptua.
DATA MINIMIZATION
Environment data (files):
bar.1
Whole entities
Chunks
Atoms
sadipscing elitr,
eirmod invidunt
ut labore dolore magna
erat,
voluptua.
foo.2
DATA MINIMIZATION
Environment data (files):
Whole entities
Chunks
Atoms
sadipscing elitr,
eirmod invidunt
ut labore dolore magna
erat,
voluptua.
foo.2
EVALUATION
Can the technique produce, in a reasonable amount
of time, minimized executions that can be used to
debug the original failure?
EVALUATION
Can the technique produce, in a reasonable amount
of time, minimized executions that can be used to
debug the original failure?
Pine email and news client
• two real field failures
• 20 failing executions, 10 per failure
EVALUATION
Can the technique produce, in a reasonable amount
of time, minimized executions that can be used to
debug the original failure?
Pine email and news client
• two real field failures
• 20 failing executions, 10 per failure
Minimized executions generated by
• randomly generating interaction scripts
• manually performing the scripts (while recording)
• minimizing the captured executions
RESULTS
Header-color fault Address book fault
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
# entities streams size files size
Averagevalueafterminimization
RESULTS
Header-color fault Address book fault
Results are likely to be conservative; recorded executions
only contain the minimal amount of data needed to perform an action.
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
# entities streams size files size
Averagevalueafterminimization
RESULTS
Header-color fault Address book fault
Results are likely to be conservative; recorded executions
only contain the minimal amount of data needed to perform an action.
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
# entities streams size files size
Averagevalueafterminimization
Inputs can be minimized in a reasonable
amount of time (less then 75 minutes)
RESULTS
HEADER COLOR FAULT
RESULTS
HEADER COLOR FAULT
1. color is enabled
2. one or more colors are added
3. all colors are removed
Crash when:
RESULTS
HEADER COLOR FAULT
1. color is enabled
2. one or more colors are added
3. all colors are removed
Crash when:
Recorded execution:
34
files and streams
≈800kb
Minimized execution:
1
stream
4
files
≈72kb
(partial)
RESULTS
HEADER COLOR FAULT
1. color is enabled
2. one or more colors are added
3. all colors are removed
Crash when:
Recorded execution:
34
files and streams
≈800kb
COMING
SOON
OUTLINE
✘
Recording / Replaying
Minimization
Anonymization
Future work
ANONYMIZATION
345
Contain sensitive
information
✘
Goal: help address privacy concerns
Sensitive
input (I)
that causes F
Input domain
ANONYMIZATION
Sensitive
input (I)
that causes F
Input domain
Inputs that
cause F
ANONYMIZATION
Sensitive
input (I)
that causes F
Input domain
Inputs that
cause F
ANONYMIZATION
Anonymized
input (I’)
that also
causes F
Inputs that satisfy
F’s path condition Sensitive
input (I)
that causes F
Input domain
Inputs that
cause F
ANONYMIZATION
Anonymized
input (I’)
that also
causes F
PATH CONDITION GENERATION
Path condition: set of constraints on a program’s
inputs that encode the conditions necessary for a
specific path to be executed.
boolean foo(int x, int y, int z) {
if(x <= 5) {
int a = x * 2;
if(y + a > 10) {
if(z == 0) {
return true;
}
}
}
return false;
}
PATH CONDITION GENERATION
boolean foo(int x, int y, int z) {
if(x <= 5) {
int a = x * 2;
if(y + a > 10) {
if(z == 0) {
return true;
}
}
}
return false;
}
PATH CONDITION GENERATION
boolean foo(int x, int y, int z) {
if(x <= 5) {
int a = x * 2;
if(y + a > 10) {
if(z == 0) {
return true;
}
}
}
return false;
}
PATH CONDITION GENERATION
5 3 0
boolean foo(int x, int y, int z) {
if(x <= 5) {
int a = x * 2;
if(y + a > 10) {
if(z == 0) {
return true;
}
}
}
return false;
}
PATH CONDITION GENERATION
5 3 0
(sensitive)
Path Condition:
Symbolic State:
boolean foo(int x, int y, int z) {
if(x <= 5) {
int a = x * 2;
if(y + a > 10) {
if(z == 0) {
return true;
}
}
}
return false;
}
PATH CONDITION GENERATION
5 3 0
(sensitive)
Path Condition:
Symbolic State:
boolean foo(int x, int y, int z) {
if(x <= 5) {
int a = x * 2;
if(y + a > 10) {
if(z == 0) {
return true;
}
}
}
return false;
}
PATH CONDITION GENERATION
5 3 0
x→i1
y→i2
z→i3
(sensitive)
Path Condition:
Symbolic State:
boolean foo(int x, int y, int z) {
if(x <= 5) {
int a = x * 2;
if(y + a > 10) {
if(z == 0) {
return true;
}
}
}
return false;
}
PATH CONDITION GENERATION
5 3 0
x→i1
y→i2
z→i3
(sensitive)
Path Condition:
i1 <= 5
Symbolic State:
boolean foo(int x, int y, int z) {
if(x <= 5) {
int a = x * 2;
if(y + a > 10) {
if(z == 0) {
return true;
}
}
}
return false;
}
PATH CONDITION GENERATION
5 3 0
x→i1
y→i2
z→i3
(sensitive)
Path Condition:
i1 <= 5
Symbolic State:
boolean foo(int x, int y, int z) {
if(x <= 5) {
int a = x * 2;
if(y + a > 10) {
if(z == 0) {
return true;
}
}
}
return false;
}
PATH CONDITION GENERATION
5 3 0
x→i1
y→i2
z→i3
(sensitive)
Path Condition:
i1 <= 5
Symbolic State:
a→i1*2
boolean foo(int x, int y, int z) {
if(x <= 5) {
int a = x * 2;
if(y + a > 10) {
if(z == 0) {
return true;
}
}
}
return false;
}
PATH CONDITION GENERATION
5 3 0
x→i1
y→i2
z→i3
(sensitive)
Path Condition:
i1 <= 5
Symbolic State:
a→i1*2
boolean foo(int x, int y, int z) {
if(x <= 5) {
int a = x * 2;
if(y + a > 10) {
if(z == 0) {
return true;
}
}
}
return false;
}
PATH CONDITION GENERATION
5 3 0
x→i1
y→i2
z→i3
(sensitive)
Path Condition:
i1 <= 5
Symbolic State:
a→i1*2
boolean foo(int x, int y, int z) {
if(x <= 5) {
int a = x * 2;
if(y + a > 10) {
if(z == 0) {
return true;
}
}
}
return false;
}
PATH CONDITION GENERATION
5 3 0
x→i1
y→i2
z→i3
∧ i2+i1*2 > 10
(sensitive)
Path Condition:
i1 <= 5
Symbolic State:
a→i1*2
boolean foo(int x, int y, int z) {
if(x <= 5) {
int a = x * 2;
if(y + a > 10) {
if(z == 0) {
return true;
}
}
}
return false;
}
PATH CONDITION GENERATION
5 3 0
x→i1
y→i2
z→i3
∧ i2+i1*2 > 10
(sensitive)
Path Condition:
i1 <= 5
Symbolic State:
a→i1*2
boolean foo(int x, int y, int z) {
if(x <= 5) {
int a = x * 2;
if(y + a > 10) {
if(z == 0) {
return true;
}
}
}
return false;
}
PATH CONDITION GENERATION
5 3 0
x→i1
y→i2
z→i3
∧ i2+i1*2 > 10
∧ i3 == 0
(sensitive)
CHOOSING ANONYMIZED
INPUTS
Path Condition:
i1 <= 5
∧ i2+i1*2 > 10
∧ i3 == 0
Constraint
Solver
CHOOSING ANONYMIZED
INPUTS
Path Condition:
i1 <= 5
∧ i2+i1*2 > 10
∧ i3 == 0
Constraint
Solver
CHOOSING ANONYMIZED
INPUTS
Path Condition:
i1 <= 5
∧ i2+i1*2 > 10
∧ i3 == 0
Constraint
Solver
CHOOSING ANONYMIZED
INPUTS
Path Condition:
i1 <= 5
∧ i2+i1*2 > 10
∧ i3 == 0 i1 == 5
i2 == 3
i3 == 0
Constraint
Solver
CHOOSING ANONYMIZED
INPUTS
Path Condition:
i1 <= 5
∧ i2+i1*2 > 10
∧ i3 == 0 i1 == 5
i2 == 3
i3 == 0
boolean foo(int x, int y, int z) {
if(x <= 5) {
int a = x * 2;
if(y + a > 10) {
if(z == 0) {
return true;
}
}
}
return false;
}
5 3 0
Constraint
Solver
CHOOSING ANONYMIZED
INPUTS
Path Condition:
i1 <= 5
∧ i2+i1*2 > 10
∧ i3 == 0
Constraint
Solver
CHOOSING ANONYMIZED
INPUTS
Path Condition:
i1 <= 5
∧ i2+i1*2 > 10
∧ i3 == 0
Input Constraints:
i1 != 5
∧ i2 != 3
∧ i3 != 0
Constraint
Solver
CHOOSING ANONYMIZED
INPUTS
Path Condition:
i1 <= 5
∧ i2+i1*2 > 10
∧ i3 == 0
Input Constraints:
i1 != 5
∧ i2 != 3
∧ i3 != 0
(breakable)
Constraint
Solver
CHOOSING ANONYMIZED
INPUTS
Path Condition:
i1 <= 5
∧ i2+i1*2 > 10
∧ i3 == 0
Input Constraints:
i1 != 5
∧ i2 != 3
∧ i3 != 0
(breakable)
Constraint
Solver
CHOOSING ANONYMIZED
INPUTS
Path Condition:
i1 <= 5
∧ i2+i1*2 > 10
∧ i3 == 0
Input Constraints:
i1 != 5
∧ i2 != 3
∧ i3 != 0
i1 == 4
i2 == 10
i3 == 0
(breakable)
PATH CONDITION RELAXATION
Sensitive
input (I)
that causes F
Input domain
PATH CONDITION RELAXATION
Sensitive
input (I)
that causes F
Input domain
PATH CONDITION RELAXATION
Sensitive
input (I)
that causes F
Input domain
PATH CONDITION RELAXATION
Sensitive
input (I)
that causes F
Input domain
PATH CONDITION RELAXATION
Sensitive
input (I)
that causes F
Input domain
PATH CONDITION RELAXATION
1. Array inequalities 3. Multi-clause conditionals
2. Switch statements 4. Array reads
PATH CONDITION RELAXATION
1. Array inequalities 3. Multi-clause conditionals
2. Switch statements 4. Array reads
PATH CONDITION RELAXATION
1. Array inequalities 3. Multi-clause conditionals
2. Switch statements 4. Array reads
x.equals(y);
PATH CONDITION RELAXATION
1. Array inequalities 3. Multi-clause conditionals
2. Switch statements 4. Array reads
x.equals(y);
// x = “abc”
// y = “abd”
PATH CONDITION RELAXATION
1. Array inequalities 3. Multi-clause conditionals
2. Switch statements 4. Array reads
x.equals(y);
Traditional:
x0 == y0
∧ x1 == y1
∧ x2 != y2
// x = “abc”
// y = “abd”
PATH CONDITION RELAXATION
1. Array inequalities 3. Multi-clause conditionals
2. Switch statements 4. Array reads
x.equals(y);
Traditional:
x0 == y0
∧ x1 == y1
∧ x2 != y2
Relaxed:
x0 != y0
∨ x1 != y1
∨ x2 != y2
// x = “abc”
// y = “abd”
PATH CONDITION RELAXATION
1. Array inequalities 3. Multi-clause conditionals
2. Switch statements 4. Array reads
PATH CONDITION RELAXATION
1. Array inequalities 3. Multi-clause conditionals
2. Switch statements 4. Array reads
PATH CONDITION RELAXATION
1. Array inequalities 3. Multi-clause conditionals
2. Switch statements 4. Array reads
switch(x) {
case 1:
...
break;
case 3:
case 5:
...
break;
default:
...
}
PATH CONDITION RELAXATION
1. Array inequalities 3. Multi-clause conditionals
2. Switch statements 4. Array reads
switch(x) {
case 1:
...
break;
case 3:
case 5:
...
break;
default:
...
}
// x = 5
PATH CONDITION RELAXATION
1. Array inequalities 3. Multi-clause conditionals
2. Switch statements 4. Array reads
switch(x) {
case 1:
...
break;
case 3:
case 5:
...
break;
default:
...
}
Traditional:
x == 5
// x = 5
PATH CONDITION RELAXATION
1. Array inequalities 3. Multi-clause conditionals
2. Switch statements 4. Array reads
switch(x) {
case 1:
...
break;
case 3:
case 5:
...
break;
default:
...
}
Traditional:
x == 5
Relaxed:
x == 5
∨ x == 3
// x = 5
PATH CONDITION RELAXATION
1. Array inequalities 3. Multi-clause conditionals
2. Switch statements 4. Array reads
switch(x) {
case 1:
...
break;
case 3:
case 5:
...
break;
default:
...
}
Traditional:
x == 5
Relaxed:
x == 5
∨ x == 3
// x = 10
PATH CONDITION RELAXATION
1. Array inequalities 3. Multi-clause conditionals
2. Switch statements 4. Array reads
switch(x) {
case 1:
...
break;
case 3:
case 5:
...
break;
default:
...
}
Traditional:
Relaxed:
x == 5
∨ x == 3
// x = 10
x == 10
PATH CONDITION RELAXATION
1. Array inequalities 3. Multi-clause conditionals
2. Switch statements 4. Array reads
switch(x) {
case 1:
...
break;
case 3:
case 5:
...
break;
default:
...
}
Traditional:
Relaxed:
// x = 10
x == 10
x != 1
∧ x != 3
∧ x != 5
EVALUATION
Feasibility
Can the approach generate, in a
reasonable amount of time, anonymized
inputs that reproduce the failure?
Strength
How much information about the
original inputs is revealed?
Effectiveness
Are the anonymized inputs safe to send
to developers?
SUBJECTS
• Columba: 1 fault
• htmlparser: 1 fault
• Printtokens: 2 faults
• NanoXML: 16 faults
(20 faults, total)
SUBJECTS
• Columba: 1 fault
• htmlparser: 1 fault
• Printtokens: 2 faults
• NanoXML: 16 faults
Select sensitive failure-inducing inputs
• manually generated or included with subject
• several 100 bytes to 5MB in size
(20 faults, total)
SUBJECTS
• Columba: 1 fault
• htmlparser: 1 fault
• Printtokens: 2 faults
• NanoXML: 16 faults
Select sensitive failure-inducing inputs
• manually generated or included with subject
• several 100 bytes to 5MB in size
(Assume all of each input is potentially sensitive)
(20 faults, total)
RQ1: FEASIBILITY
0
150
300
450
600
ExecutionTime(s)
0
5
10
15
20
columba
htmlparser
printtokens1
printtokens2
nanoxml1
nanoxml2
nanoxml3
nanoxml4
nanoxml5
nanoxml6
nanoxml7
nanoxml8
nanoxml9
nanoxml10
nanoxml11
nanoxml12
nanoxml13
nanoxml14
nanoxml15
nanoxml16
SolverTime(s)
RQ1: FEASIBILITY
0
150
300
450
600
ExecutionTime(s)
0
5
10
15
20
columba
htmlparser
printtokens1
printtokens2
nanoxml1
nanoxml2
nanoxml3
nanoxml4
nanoxml5
nanoxml6
nanoxml7
nanoxml8
nanoxml9
nanoxml10
nanoxml11
nanoxml12
nanoxml13
nanoxml14
nanoxml15
nanoxml16
SolverTime(s)
Inputs can be anonymized in a reasonable
amount of time (easily done overnight)
Average % Bits Revealed Average % Residue
RQ2: STRENGTH
Average % Bits Revealed Average % Residue
RQ2: STRENGTH
Measures how many inputs
that satisfy the path
condition
Little
information revealed
Average % Bits Revealed Average % Residue
RQ2: STRENGTH
Measures how many inputs
that satisfy the path
condition
Lots of
information revealed
Average % Bits Revealed Average % Residue
RQ2: STRENGTH
Measures how many inputs
that satisfy the path
condition
Measures how much of the
anonymized input is identical
to the original input
AAAAAA
secret
AAAAAA
...
AAAAAA
BBBBBB
secret
BBBBBB
...
BBBBBB
I’
Lots of
information revealed
I
Average % Bits Revealed Average % Residue
RQ2: STRENGTH
Measures how many inputs
that satisfy the path
condition
Measures how much of the
anonymized input is identical
to the original input
AAAAAA
secret
AAAAAA
...
AAAAAA
BBBBBB
secret
BBBBBB
...
BBBBBB
I’
Lots of
information revealed
I
RQ2: STRENGTH
0
25
50
75
100
0
25
50
75
100
columba
htmlparser
printtokens1
printtokens2
nanoxml1
nanoxml2
nanoxml3
nanoxml4
nanoxml5
nanoxml6
nanoxml7
nanoxml8
nanoxml9
nanoxml10
nanoxml11
nanoxml12
nanoxml13
nenoxml14
nanoxml15
nanoxml16
Average
%BitsRevealed
Average
%Residue
RQ2: STRENGTH
0
25
50
75
100
0
25
50
75
100
columba
htmlparser
printtokens1
printtokens2
nanoxml1
nanoxml2
nanoxml3
nanoxml4
nanoxml5
nanoxml6
nanoxml7
nanoxml8
nanoxml9
nanoxml10
nanoxml11
nanoxml12
nanoxml13
nenoxml14
nanoxml15
nanoxml16
Average
%BitsRevealed
Average
%Residue
Anonymized inputs reveal, on average, between
60% (worst case) and 2% (best case) of the
information in the original inputs
RQ3: EFFECTIVENESS
NANOXML
<!DOCTYPE Foo [
   <!ELEMENT Foo (ns:Bar)>
   <!ATTLIST Foo
       xmlns CDATA #FIXED 'http://nanoxml.n3.net/bar'
       a     CDATA #REQUIRED>
   <!ELEMENT ns:Bar (Blah)>
   <!ATTLIST ns:Bar
       xmlns:ns CDATA #FIXED 'http://nanoxml.n3.net/bar'>
   <!ELEMENT Blah EMPTY>
   <!ATTLIST Blah
       x    CDATA #REQUIRED
       ns:x CDATA #REQUIRED>
]>
<!-- comment -->
<Foo a='very' b='secret' c='stuff'>vaz
   <ns:Bar>
       <Blah x="1" ns:x="2"/>
   </ns:Bar>
</Foo>
RQ3: EFFECTIVENESS
NANOXML
<!DOCTYPE [
   <! >
   <!ATTLIST
        #FIXED ' '
        >
   <!E >
   <!ATTLIST
        #FIXED ' '>
   <!E >
   <!ATTLIST
        #
        : # >
]>
<!-- -->
< =' ' =' ' =' '>
   < : >
       < =" " : =" "/>
   </ :
Wayne,Bartley,Bartley,Wayne,wbartly@acp.com,,
Ronald,Kahle,Kahle,Ron,ron.kahle@kahle.com,,
Wilma,Lavelle,Lavelle,Wilma,,lavelle678@aol.com,
Jesse,Hammonds,Hammonds,Jesse,,hamj34@comcast.com,
Amy,Uhl,Uhl,Amy,uhla@corp1.com,uhla@gmail.com,
Hazel,Miracle,Miracle,Hazel,hazel.miracle@corp2.com,,
Roxanne,Nealy,Nealy,Roxie,,roxie.nearly@gmail.com,
Heather,Kane,Kane,Heather,kaneh@corp2.com,,
Rosa,Stovall,Stovall,Rosa,,sstoval@aol.com,
Peter,Hyden,Hyden,Pete,,peteh1989@velocity.net,
Jeffrey,Wesson,Wesson,Jeff,jwesson@corp4.com,,
Virginia,Mendoza,Mendoza,Ginny,gmendoza@corp4.com,,
Richard,Robledo,Robledo,Ralph,ralphrobledo@corp1.com,,
Edward,Blanding,Blanding,Ed,,eblanding@gmail.com,
Sean,Pulliam,Pulliam,Sean,spulliam@corp2.com,,
Steven,Kocher,Kocher,Steve,kocher@kocher.com,,
Tony,Whitlock,Whitlock,Tony,,tw14567@aol.com,
Frank,Earl,Earl,Frankie,,,
Shelly,Riojas,Riojas,Shelly,srojas@corp6.com,,
RQ3: EFFECTIVENESS
COLUMBA
, , , , ,,
, , , , ,,
, , , ,, ,
, , , ,, ,
, , , , , ,
, , , , ,,
, , , ,, ,
, , , , ,,
, , , ,, ,
, , , ,, ,
, , , , ,,
, , , , ,,
, , , , ,,
, , , ,, ,
, , , , ,,
, , , , ,,
, , , ,, ,
RQ3: EFFECTIVENESS
COLUMBA
, , , , ,,
, , , , ,,
, , , ,, ,
, , , ,, ,
, , , , , ,
, , , , ,,
, , , ,, ,
, , , , ,,
, , , ,, ,
, , , ,, ,
, , , , ,,
, , , , ,,
, , , , ,,
, , , ,, ,
, , , , ,,
, , , , ,,
, , , ,, ,
RQ3: EFFECTIVENESS
HTMLPARSER
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/
xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>james clause @ gatech | home</title>
<style type="text/css" media="screen" title="">
<!--/*--><![CDATA[<!--*/
body {
margin: 0px;
...
/*]]>*/-->
</style>
</head>
<body>
...
</body>
RQ3: EFFECTIVENESS
HTMLPARSER
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/
xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>james clause @ gatech | home</title>
<style type="text/css" media="screen" title="">
<!--/*--><![CDATA[<!--*/
body {
margin: 0px;
...
/*]]>*/-->
</style>
</head>
<body>
...
</body>
RQ3: EFFECTIVENESS
HTMLPARSER
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/
xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>james clause @ gatech | home</title>
<style type="text/css" media="screen" title="">
<!--/*--><![CDATA[<!--*/
body {
margin: 0px;
...
/*]]>*/-->
</style>
</head>
<body>
...
</body>
The portions of the inputs that remain after
anonymization tend to be structural in nature and
therefore are safe to send to developers
COMING
SOON
OUTLINE
✘
Recording / Replaying
Minimization
Anonymization
Future work
COMING
SOON-ISH
FUTURE WORK
COMING
SOON
COMING
SOON-ISH
FUTURE WORK
COMING
SOON
COMING
SOON-ISH
FUTURE WORK
COMING
SOON
Improved minimization
COMING
SOON-ISH
FUTURE WORK
COMING
SOON
Improved minimization
Leverage passing executions
COMING
SOON-ISH
FUTURE WORK
COMING
SOON
Improved minimization
Leverage passing executions
Debugging for developers
Foo
512B
Bar
1KB
Baz
1.5GB
IMPROVED MINIMIZATION
LEVERAGE DYNAMICTAINTING
1 Taint inputs
Foo
512B
Bar
1KB
Baz
1.5GB
IMPROVED MINIMIZATION
LEVERAGE DYNAMICTAINTING
1 Taint inputs
Foo
512B
Bar
1KB
Baz
1.5GB
IMPROVED MINIMIZATION
LEVERAGE DYNAMICTAINTING
1
2
3
4
5
6
7
8
9
0
1 Taint inputs
2 Propagate
taint marks
Foo
512B
Bar
1KB
Baz
1.5GB
IMPROVED MINIMIZATION
LEVERAGE DYNAMICTAINTING
1
2
3
4
5
6
7
8
9
0
1 Taint inputs
2 Propagate
taint marks
Foo
512B
Bar
1KB
Baz
1.5GB
foo: 512 ... bar: 1024 ... baz: 150... total: 150...
IMPROVED MINIMIZATION
LEVERAGE DYNAMICTAINTING
1
2
3
4
5
6
7
8
9
0
1 Taint inputs
2 Propagate
taint marks
3 Identify
relevant inputs
Foo
512B
Bar
1KB
Baz
1.5GB
foo: 512 ... bar: 1024 ... baz: 150... total: 150...
IMPROVED MINIMIZATION
LEVERAGE DYNAMICTAINTING
1
2
3
4
5
6
7
8
9
0
1 Taint inputs
2 Propagate
taint marks
3 Identify
relevant inputs
Foo
512B
Bar
1KB
Baz
1.5GB
foo: 512 ... bar: 1024 ... baz: 150... total: 150...
IMPROVED MINIMIZATION
LEVERAGE DYNAMICTAINTING
1
2
3
4
5
6
7
8
9
0
In the fieldIn house
LEVERAGE PASSING EXECUTIONS
Replay / Debug
Develop Record
✘
MinimizeSanitize
✘
In the fieldIn house
LEVERAGE PASSING EXECUTIONS
Replay / Debug
Develop Record
✘
MinimizeSanitize
✘
✔
In the fieldIn house
LEVERAGE PASSING EXECUTIONS
Replay / Debug
Develop Record
✘
MinimizeSanitize
✘
✔
LEVERAGE PASSING EXECUTIONS
✔
LEVERAGE PASSING EXECUTIONS
✔
“Fuzz” to create failing executions
LEVERAGE PASSING EXECUTIONS
✔
“Fuzz” to create failing executions
Augment in-house test suites
LEVERAGE PASSING EXECUTIONS
✔
“Fuzz” to create failing executions
Augment in-house test suites
Guide in house testing
DEBUGGING FOR DEVELOPERS
Most debugging tools are:
By us,
for us
DEBUGGING FOR DEVELOPERS
Most debugging tools are:
Limited industrial impact
By us,
for us
DEBUGGING FOR DEVELOPERS
Most debugging tools are:
Limited industrial impact
With developers,
for developers
DEBUGGING FOR DEVELOPERS
Most debugging tools are:
With developers,
for developers
Lots of industrial impact
DEBUGGING FOR DEVELOPERS
Most debugging tools are:
With developers,
for developers
Lots of industrial impact ?
Efficient instrumentation
Jazz: A Tool for Demand-Driven Structural
Testing
Jonathan Misurda1
, Jim Clause1
, Juliya Reed1
, Bruce R. Childers1
, and Mary
Lou So a2
1
University of Pittsburgh, Pittsburgh PA 15260, USA,
{jmisurda,clausej,juliya,childers}@cs.pitt.edu
2
University of Virginia, Charlottesville VA 22904, USA,
soffa@cs.virginia.edu
Abstract. Software testing to produce reliable and robust software has
become vitally important. Testing is a process by which quality can be
assured through the collection of information about software. While test-
ing can improve software quality, current tools typically are inflexible
and have high overheads, making it a challenge to test large projects.
We describe a new scalable and flexible tool, called Jazz, that uses a
demand-driven structural testing approach. Jazz has a low overhead of
only 17.6% for branch testing.
1 Introduction
In the last several years, the importance of producing high quality and robust
software has become paramount. Testing is an important process to support
quality assurance by gathering information about the software being developed
or modified. It is, in general, extremely labor and resource intensive, accounting
for 50-60% of the total cost of software development [1]. The increased emphasis
on software quality and robustness mandates improved testing methodologies.
To test software, a number of techniques can be applied. One class of tech-
niques is structural testing, which checks that a given coverage criterion is sat-
isfied. For example, branch testing checks that a certain percentage of branches
are executed. Other structural tests include def-use testing in which pairs of
variable definitions and uses are checked for coverage and node testing in which
nodes in a program’s control flow graph are checked.
Unfortunately, structural testing is often hindered by the lack of scalable
and flexible tools. Current tools are not scalable in terms of both time and
memory, limiting the number and scope of the tests that can be applied to large
programs. These tools often modify the software binary to insert instrumentation
for testing. In this case, the tested version of the application is not the same
version that is shipped to customers and errors may remain. Testing tools are
usually inflexible and only implement certain types of testing. For example, many
tools implement branch testing, but do not implement node or def-use testing.
In this paper, we describe a new tool for structural testing, called Jazz, that
addresses these problems. Jazz uses a novel demand-driven technique to apply
ABSTRACT
Producing reliable and robust software has become one
of the most important software development concerns in
recent years. Testing is a process by which software
quality can be assured through the collection of infor-
mation. While testing can improve software reliability,
current tools typically are inflexible and have high over-
heads, making it challenging to test large software
projects. In this paper, we describe a new scalable and
flexible framework for testing programs with a novel
demand-driven approach based on execution paths to
implement test coverage. This technique uses dynamic
instrumentation on the binary code that can be inserted
and removed on-the-fly to keep performance and mem-
ory overheads low. We describe and evaluate implemen-
tations of the framework for branch, node and def-use
testing of Java programs. Experimental results for
branch testing show that our approach has, on average, a
1.6 speed up over static instrumentation and also uses
less memory.
Categories and Subject Descriptors
D.2.5. [Software Engineering]: Testing and Debug-
ging—Testing tools; D.3.3. [Programming Lan-
guages]: Language Constructs and Features—Program
instrumentation, run-time environments
General Terms
Experimentation, Measurement, Verification
Keywords
Testing, Code Coverage, Structural Testing, Demand-
Driven Instrumentation, Java Programming Language
1. INTRODUCTION
In the last several years, the importance of produc-
ing high quality and robust software has become para-
mount [15]. Testing is an important process to support
quality assurance by gathering information about the
behavior of the software being developed or modified. It
is, in general, extremely labor and resource intensive,
accounting for 50-60% of the total cost of software
development [17]. Given the importance of testing, it is
imperative that there are appropriate testing tools and
frameworks. In order to adequately test software, a
number of different testing techniques must be per-
formed. One class of testing techniques used extensively
is structural testing in which properties of the software
code are used to ensure a certain code coverage.Struc-
tural testing techniques include branch testing, node
testing, path testing, and def-use testing [6,7,8,17,19].
Typically, a testing tool targets one type of struc-
tural test, and the software unit is the program, file or
particular methods. In order to apply various structural
testing techniques, different tools must be used. If a tool
for a particular type of structural testing is not available,
the tester would need to either implement it or not use
that testing technique. The tester would also be con-
strained by the region of code to be tested, as deter-
mined by the tool implementor. For example, it may not
be possible for the tester to focus on a particular region
of code, such as a series of loops, complicated condi-
tionals, or particular variables if def-use testing is
desired. The user may want to have higher coverage on
frequently executed regions of code. Users may want to
define their own way of testing. For example, all
branches should be covered 10 times rather than once in
all loops.
In structural testing, instrumentation is placed at
certain code points (probes). Whenever such a program
point is reached, code that performs the function for the
test (payload) is executed. The probes in def-use testing
are dictated by the definitions and uses of variables and
the payload is to mark that a definition or use in a def-
use pair has been covered. Thus for each type of struc-
tural testing, there is a testing “plan”. A test plan is a
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies
are not made or distributed for profit or commercial advantage and
that copies bear this notice and the full citation on the first page. To
copy otherwise, or republish, to post on servers or to redistribute to
lists, requires prior specific permission and/or a fee.
ICSE'05, May 15-21, 2005, St. Louis, Missouri, USA.
Copyright 2005 ACM 1-58113-963-2/05/0005...$5.00.
Demand-Driven Structural Testing with Dynamic
Instrumentation
Jonathan Misurda†, James A. Clause†, Juliya L. Reed†, Bruce R. Childers†, and
Mary Lou Soffa‡
†Department of Computer Science
University of Pittsburgh
Pittsburgh, Pennsylvania 15260
{jmisurda, clausej, juliya, childers}@cs.pitt.edu
‡Department of Computer Science
University of Virginia
Charlottesville, Virginia 22904
soffa@cs.virginia.edu
156
A Technique for Enabling and Supporting Debugging of Field Failures
James Clause and Alessandro Orso
College of Computing
Georgia Institute of Technology
{clause, orso}@cc.gatech.edu
Abstract
It is difficult to fully assess the quality of software in-
house, outside the actual time and context in which it will
execute after deployment. As a result, it is common for
software to manifest field failures, failures that occur on
user machines due to untested behavior. Field failures are
typically difficult to recreate and investigate on developer
platforms, and existing techniques based on crash report-
ing provide only limited support for this task. In this pa-
per, we present a technique for recording, reproducing, and
minimizing failing executions that enables and supports in-
house debugging of field failures. We also present a tool
that implements our technique and an empirical study that
evaluates the technique on a widely used e-mail client.
1. Introduction
Quality-assurance activities, such as software testing and
analysis, are notoriously difficult, expensive, and time-
consuming. As a result, software products are often re-
leased with faults or missing functionality. In fact, real-
world examples of field failures experienced by users be-
cause of untested behaviors (e.g., due to unforeseen us-
ages), are countless. When field failures occur, it is im-
portant for developers to be able to recreate and investigate
them in-house. This pressing need is demonstrated by the
emergence of several crash-reporting systems, such as Mi-
crosoft’s error reporting systems [13] and Apple’s Crash
Reporter [1]. Although these techniques represent a first
important step in addressing the limitations of purely in-
house approaches to quality assurance, they work on lim-
ited data (typically, a snapshot of the execution state) and
can at best identify correlations between a crash report and
data on other known failures.
In this paper, we present a novel technique for reproduc-
ing and investigating field failures that addresses the limita-
tions of existing approaches. Our technique works in three
phases, intuitively illustrated by the scenario in Figure 1. In
the recording phase, while users run the software, the tech-
nique intercepts and logs the interactions between applica-
tion and environment and records portions of the environ-
ment that are relevant to these interactions. If the execution
terminates with a failure, the produced execution recording
is stored for later investigation. In the minimization phase,
using free cycles on the user machines, the technique re-
plays the recorded failing executions with the goal of au-
tomatically eliminating parts of the executions that are not
relevant to the failure. In the replay and debugging phase,
developers can use the technique to replay the minimized
failing executions and investigate the cause of the failures
(e.g., within a debugger). Being able to replay and debug
real field failures can give developers unprecedented insight
into the behavior of their software after deployment and op-
portunities to improve the quality of their software in ways
that were not possible before.
To evaluate our technique, we implemented it in a proto-
type tool, called ADDA (Automated Debugging of Deployed
Applications), and used the tool to perform an empirical
study. The study was performed on PINE [19], a widely-
used e-mail client, and involved the investigation of failures
caused by two real faults in PINE. The results of the study
are promising. Our technique was able to (1) record all ex-
ecutions of PINE (and two other subjects) with a low time
and space overhead, (2) completely replay all recorded exe-
cutions, and (3) perform automated minimization of failing
executions and obtain shorter executions that manifested the
same failures as the original executions. Moreover, we were
able to replay the minimized executions within a debugger,
which shows that they could have actually been used to in-
vestigate the failures.
The contributions of this paper are:
• A novel technique for recording and later replaying exe-
cutions of deployed programs.
• An approach for minimizing failing executions and gen-
erating shorter executions that fail for the same reasons.
• A prototype tool that implements our technique.
• An empirical study that shows the feasibility and effec-
tiveness of the approach.
29th International Conference on Software Engineering (ICSE'07)
0-7695-2828-7/07 $20.00 © 2007
Dytan: A Generic Dynamic Taint Analysis Framework
James Clause, Wanchun Li, and Alessandro Orso
College of Computing
Georgia Institute of Technology
{clause|wli7|orso}@cc.gatech.edu
ABSTRACT
Dynamic taint analysis is gaining momentum. Techniques based
on dynamic tainting have been successfully used in the context of
application security, and now their use is also being explored in dif-
ferent areas, such as program understanding, software testing, and
debugging. Unfortunately, most existing approaches for dynamic
tainting are defined in an ad-hoc manner, which makes it difficult
to extend them, experiment with them, and adapt them to new con-
texts. Moreover, most existing approaches are focused on data-flow
based tainting only and do not consider tainting due to control flow,
which limits their applicability outside the security domain. To
address these limitations and foster experimentation with dynamic
tainting techniques, we defined and developed a general framework
for dynamic tainting that (1) is highly flexible and customizable, (2)
allows for performing both data-flow and control-flow based taint-
ing conservatively, and (3) does not rely on any customized run-
time system. We also present DYTAN, an implementation of our
framework that works on x86 executables, and a set of preliminary
studies that show how DYTAN can be used to implement different
tainting-based approaches with limited effort. In the studies, we
also show that DYTAN can be used on real software, by using FIRE-
FOX as one of our subjects, and illustrate how the specific char-
acteristics of the tainting approach used can affect efficiency and
accuracy of the taint analysis, which further justifies the use of our
framework to experiment with different variants of an approach.
Categories and Subject Descriptors: D.2.5 [Software Engineer-
ing]: Testing and Debugging;
General Terms: Experimentation, Security
Keywords: Dynamic tainting, information flow, general framework
1. INTRODUCTION
Dynamic taint analysis (also known as dynamic information flow
analysis) consists, intuitively, in marking and tracking certain data
in a program at run-time. This type of dynamic analysis is be-
coming increasingly popular. In the context of application secu-
rity, dynamic-tainting approaches have been successfully used to
prevent a wide range of attacks, including buffer overruns (e.g., [8,
17]), format string attacks (e.g., [17, 21]), SQL and command in-
jections (e.g., [7, 19]), and cross-site scripting (e.g., [18]). More
recently, researchers have started to investigate the use of tainting-
based approaches in domains other than security, such as program
understanding, software testing, and debugging (e.g., [11, 13]).
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
ISSTA’07, July 9–12, 2007, London, England, United Kingdom.
Copyright 2007 ACM 978-1-59593-734-6/07/0007 ...$5.00.
Unfortunately, most existing techniques and tools for dynamic
taint analysis are defined in an ad-hoc manner, to target a specific
problem or a small class of problems. It would be difficult to ex-
tend or adapt such techniques and tools so that they can be used in
other contexts. In particular, most existing approaches are focused
on data-flow based tainting only, and do not consider tainting due
to the control flow within an application, which limits their general
applicability. Also, most existing techniques support either a sin-
gle taint marking or a small, fixed number of markings, which is
problematic in applications such as debugging. Finally, almost no
existing technique handles the propagation of taint markings in a
truly conservative way, which may be appropriate for the specific
applications considered, but is problematic in general. Because de-
veloping support for dynamic taint analysis is not only time con-
suming, but also fairly complex, this lack of flexibility and gener-
ality of existing tools and techniques is especially limiting for this
type of dynamic analysis.
To address these limitations and foster experimentation with dy-
namic tainting techniques, in this paper we present a framework for
dynamic taint analysis. We designed the framework to be general
and flexible, so that it allows for implementing different kinds of
techniques based on dynamic taint analysis with little effort. Users
can leverage the framework to quickly develop prototypes for their
techniques, experiment with them, and investigate trade-offs of dif-
ferent alternatives. For a simple example, the framework could be
used to investigate the cost effectiveness of considering different
types of taint propagation for an application.
Our framework has several advantages over existing approaches.
First, it is highly flexible and customizable. It allows for easily
specifying which program data should be tainted and how, how taint
markings should be propagated at run-time, and where and how
taint markings should be checked. Second, it allows for performing
data-flow and both data-flow and control-flow based tainting. Third,
from a more practical standpoint, it works on binaries, does not
need access to source code, and does not rely on any customized
hardware or operating system, which makes it broadly applicable.
We also present DYTAN, an implementation of our framework
that works on x86 binaries, and a set of preliminary studies per-
formed using DYTAN. In the first set of studies, we report on our
experience in using DYTAN to implement two tainting-based ap-
proaches presented in the literature. Although preliminary, our ex-
perience shows that we were able to implement these approaches
completely and with little effort. The second set of studies illus-
trates how the specific characteristics of a tainting approach can
affect efficiency and accuracy of the taint analysis. In particular, we
investigate how ignoring control-flow related propagation and over-
looking some data-flow aspects can lead to unsafety. These results
further justify the usefulness of experimenting with different varia-
tions of dynamic taint analysis and assessing their tradeoffs, which
can be done with limited effort using our framework. The second
set of studies also shows the practical applicability of DYTAN, by
successfully running it on the FIREFOX web browser.
196
Effective Memory Protection Using Dynamic Tainting
James Clause, Ioannis Doudalis, Alessandro Orso, and Milos Prvulovic
College of Computing
Georgia Institute of Technology
{clause|idoud|orso|milos}@cc.gatech.edu
ABSTRACT
Programs written in languages that provide direct access to memory
through pointers often contain memory-related faults, which may
cause non-deterministic failures and even security vulnerabilities.
In this paper, we present a new technique based on dynamic taint-
ing for protecting programs from illegal memory accesses. When
memory is allocated, at runtime, our technique taints both the mem-
ory and the corresponding pointer using the same taint mark. Taint
marks are then suitably propagated while the program executes and
are checked every time a memory address m is accessed through a
pointer p; if the taint marks associated with m and p differ, the ex-
ecution is stopped and the illegal access is reported. To allow for a
low-overhead, hardware-assisted implementation of the approach,
we make several key technical and engineering decisions in the
definition of our technique. In particular, we use a configurable,
low number of reusable taint marks instead of a unique mark for
each area of memory allocated, which reduces the overhead of the
approach without limiting its flexibility and ability to target most
memory-related faults and attacks known to date. We also define
the technique at the binary level, which lets us handle the (very)
common case of applications that use third-party libraries whose
source code is unavailable. To investigate the effectiveness and
practicality of our approach, we implemented it for heap-allocated
memory and performed a preliminary empirical study on a set of
programs. Our results show that (1) our technique can identify a
large class of memory-related faults, even when using only two
unique taint marks, and (2) a hardware-assisted implementation of
the technique could achieve overhead in the single digits.
Categories and Subject Descriptors: D.2.5 [Software Engineering]: Test-
ing and Debugging; C.0 [General]: Hardware/Software Interfaces;
General Terms: Performance, Security
Keywords: Illegal memory accesses, dynamic tainting, hardware support
1. INTRODUCTION
Memory-related faults are a serious problem for languages that
allow direct memory access through pointers. An important class
of memory-related faults are what we call illegal memory accesses.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
ASE’07, November 5–9, 2007, Atlanta, Georgia, USA.
Copyright 2007 ACM 978-1-59593-882-4/07/0011 ...$5.00.
In languages such as C and C++, when memory allocation is re-
quested, a currently-free area of memory m of the specified size
is reserved. After m has been allocated, its initial address can be
assigned to a pointer p, either immediately (e.g., in the case of
heap allocated memory) or at a later time (e.g., when retrieving
and storing the address of a local variable). From that point on,
the only legal accesses to m through a pointer are accesses per-
formed through p or through other pointers derived from p. (In
Section 3, we clearly define what it means to derive a pointer from
another pointer.) All other accesses to m are Illegal Memory Ac-
cesses (IMAs), that is, accesses where a pointer is used to access
memory outside the bounds of the memory area with which it was
originally associated.
IMAs are especially relevant for several reasons. First, they are
caused by typical programming errors, such as array-out-of-bounds
accesses and NULL pointer dereferences, and are thus widespread
and common. Second, they often result in non-deterministic fail-
ures that are hard to identify and diagnose; the specific effects of an
IMA depend on several factors, such as memory layout, that may
vary between executions. Finally, many security concerns such as
viruses, worms, and rootkits use IMAs as their injection vectors.
In this paper, we present a new dynamic technique for protecting
programs against IMAs that is effective against most known types
of illegal accesses. The basic idea behind the technique is to use
dynamic tainting (or dynamic information flow) [8] to keep track
of which memory areas can be accessed through which pointers,
as follows. At runtime, our technique taints both allocated mem-
ory and pointers using taint marks. Dynamic taint propagation, to-
gether with a suitable handling of memory-allocation and deallo-
cation operations, ensures that taint marks are appropriately prop-
agated during execution. Every time the program accesses some
memory through a pointer, our technique checks whether the ac-
cess is legal by comparing the taint mark associated with the mem-
ory and the taint mark associated with the pointer used to access it.
If the marks match, the access is considered legitimate. Otherwise,
the execution is stopped and an IMA is reported.
In defining our approach, our final goal is the development of a
low-overhead, hardware-assisted tool that is practical and can be
used on deployed software. A hardware-assisted tool is a tool that
leverages the benefits of both hardware and software. Typically,
some performance critical aspects are moved to the hardware to
achieve maximum efficiency, while software is used to perform op-
erations that would be too complex to implement in hardware.
There are two main characteristics of our approach that were de-
fined to help achieve our goal of a hardware-assisted implementa-
tion. The first characteristic is that our technique only uses a small,
configurable number of reusable taint marks instead of a unique
mark for each area of memory allocated. Using a low number of
283
Penumbra: Automatically Identifying Failure-Relevant
Inputs Using Dynamic Tainting
James Clause
College of Computing
Georgia Institute of Technology
clause@cc.gatech.edu
Alessandro Orso
College of Computing
Georgia Institute of Technology
orso@cc.gatech.edu
ABSTRACT
Most existing automated debugging techniques focus on re-
ducing the amount of code to be inspected and tend to ig-
nore an important component of software failures: the in-
puts that cause the failure to manifest. In this paper, we
present a new technique based on dynamic tainting for au-
tomatically identifying subsets of a program’s inputs that
are relevant to a failure. The technique (1) marks program
inputs when they enter the application, (2) tracks them as
they propagate during execution, and (3) identifies, for an
observed failure, the subset of inputs that are potentially
relevant for debugging that failure. To investigate feasibil-
ity and usefulness of our technique, we created a prototype
tool, penumbra, and used it to evaluate our technique on
several failures in real programs. Our results are promising,
as they show that penumbra can point developers to inputs
that are actually relevant for investigating a failure and can
be more practical than existing alternative approaches.
Categories and Subject Descriptors
D.2.5 [Software Engineering]: Testing and Debugging
General Terms
Algorithms, Experimentation, Reliability
Keywords
Failure-relevant inputs, automated debugging, dynamic in-
formation flow, dynamic tainting
1. INTRODUCTION
Debugging is known to be a labor-intensive, time-consum-
ing task that can be responsible for a large portion of soft-
ware development and maintenance costs [21,23]. Common
characteristics of modern software, such as increased con-
figurability, larger code bases, and increased input sizes, in-
troduce new challenges for debugging and exacerbate exist-
ing problems. In response, researchers have proposed many
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
ISSTA’09, July 19–23, 2009, Chicago, Illinois, USA.
Copyright 2009 ACM 978-1-60558-338-9/09/07 ...$5.00.
semi- and fully-automated techniques that attempt to re-
duce the cost of debugging (e.g., [8,9,11–13,18,24,25,27]).
The majority of these techniques are code-centric in that
they focus exclusively on one aspect of debugging—trying
to identify the faulty statements responsible for a failure.
Although code-centric approaches can work well in some
cases (e.g., for isolated faults that involve a single state-
ment), they are often inadequate for more complex faults [4].
Faults of omission, for instance, where part of a specification
has not been implemented, are notoriously problematic for
debugging techniques that attempt to identify potentially
faulty statements. The usefulness of code-centric techniques
is also limited in the case of long-running programs and pro-
grams that process large amounts of information; failures in
these types of programs are typically di⌅cult to understand
without considering the data involved in such failures.
To debug failures more e ectively, it is necessary to pro-
vide developers with not only a relevant subset of state-
ments, but also a relevant subset of inputs. There are only
a few existing techniques that attempt to identify relevant
inputs [3, 17, 25], with delta debugging [25] being the most
known of these. Although delta debugging has been shown
to be an e ective technique for automatic debugging, it also
has several drawbacks that may limit its usefulness in prac-
tice. In particular, it requires (1) multiple executions of the
program being debugged, which can involve a long running
time, and (2) complex oracles and setup, which can result
in a large amount of manual e ort [2].
In this paper, we present a novel debugging technique that
addresses many of the limitations of existing approaches.
Our technique can complement code-centric debugging tech-
niques because it focuses on identifying program inputs that
are likely to be relevant for a given failure. It also overcomes
some of the drawbacks of delta debugging because it needs
a single execution to identify failure-relevant inputs and re-
quires minimal manual e ort.
Given an observable faulty behavior and a set of failure-
inducing inputs (i.e., a set of inputs that cause such behav-
ior), our technique automatically identifies failure-relevant
inputs (i.e., a subset of failure-inducing inputs that are ac-
tually relevant for investigating the faulty behavior). Our
approach is based on dynamic tainting. Intuitively, the tech-
nique works by tracking the flow of inputs along data and
control dependences at runtime. When a point of failure
is reached, the tracked information is used to identify and
present to developers the failure-relevant inputs. At this
point, developers can use the identified inputs to investigate
the failure at hand.
LEAKPOINT: Pinpointing the Causes of Memory Leaks
James Clause
College of Computing
Georgia Institute of Technology
clause@cc.gatech.edu
Alessandro Orso
College of Computing
Georgia Institute of Technology
orso@cc.gatech.edu
ABSTRACT
Most existing leak detection techniques for C and C++ applications
only detect the existence of memory leaks. They do not provide
any help for fixing the underlying memory management errors. In
this paper, we present a new technique that not only detects leaks,
but also points developers to the locations where the underlying
errors may be fixed. Our technique tracks pointers to dynamically-
allocated areas of memory and, for each memory area, records sev-
eral pieces of relevant information. This information is used to
identify the locations in an execution where memory leaks occur.
To investigate our technique’s feasibility and usefulness, we devel-
oped a prototype tool called LEAKPOINT and used it to perform
an empirical evaluation. The results of this evaluation show that
LEAKPOINT detects at least as many leaks as existing tools, reports
zero false positives, and, most importantly, can be effective at help-
ing developers fix the underlying memory management errors.
Categories and Subject Descriptors
D.2.5 [Software Engineering]: Testing and Debugging
General Terms
Performance, Reliability
Keywords
Leak detection, Dynamic tainting
1. INTRODUCTION
Memory leaks are a type of unintended memory consumption
that can adversely impact the performance and correctness of an
application. In programs written in languages such as C and C++,
memory is allocated using allocation functions, such as malloc
and new. Allocation functions reserve a currently free area of
memory m and return a pointer p that points to m’s starting ad-
dress. Typically, the program stores and then uses p, or another
This work was supported in part by NSF awards CCF-0725202
and CCF-0541080 to Georgia Tech.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
ICSE ’10, May 2-8 2010, Cape Town, South Africa
Copyright 2010 ACM 978-1-60558-719-6/10/05 ...$10.00.
pointer derived from p, to interact with m. When m is no longer
needed, the program should pass p to a deallocation function (e.g.,
free or delete) to deallocate m. A leak occurs if, due to a
memory management error, m is not deallocated at the appropri-
ate time. There are two types of memory leaks: lost memory and
forgotten memory. Lost memory refers to the situation where m be-
comes unreachable (i.e., the program overwrites or loses p and all
pointers derived from p) without first being deallocated. Forgotten
memory refers to the situation where m remains reachable but is
not deallocated or accessed in the rest of the execution.
Memory leaks are relevant for several reasons. First, they are dif-
ficult to detect. Unlike many other types of failures, memory leaks
do not immediately produce an easily visible symptom (e.g., a crash
or the output of a wrong value); typically, leaks remain unobserved
until they consume a large portion of the memory available to a sys-
tem. Second, leaks have the potential to impact not only the appli-
cation that leaks memory, but also every other application running
on the system; because the overall amount of memory is limited,
as the memory usage of a leaking program increases, less memory
is available to other running applications. Consequently, the per-
formance and correctness of every running application can be im-
pacted by a program that leaks memory. Third, leaks are common,
even in mature applications. For example, in the first half of 2009,
over 100 leaks in the Firefox web-browser were reported [18].
Because of the serious consequences and common occurrence of
memory leaks, researchers have created many static and dynamic
techniques for detecting them (e.g., [1,2,4,7–14,16,17,20–23,25,
27,28]). The adoption of static techniques has been limited by sev-
eral factors, including the lack of scalable, precise heap modeling.
Dynamic techniques are therefore more widely used in practice. In
general, dynamic techniques provide one main piece of informa-
tion: the location in an execution where a leaked area of memory is
allocated. This location is supposed to serve as a starting point for
investigating the leak. However, in many situations, this informa-
tion does not provide any insight on where or how to fix the mem-
ory management error that causes the leak: the allocation location
and the location of the memory management error are typically in
completely different parts of the application’s code.
To address this limitation of existing approaches, we propose
a new memory leak detection technique. Our technique provides
the same information as existing techniques but also identifies the
locations in an execution where leaks occur. In the case of lost
memory, the location is defined as the point in an execution where
the last pointer to an unallocated memory area is lost or overwritten.
In the case of forgotten memory, the location is defined as the last
point in an execution where a pointer to a leaked area of memory
was used (e.g., when it is dereferenced to read or write memory,
passed as a function argument, returned from a function, or used as
Camouflage: Automated Sanitization of Field Data
James Clause
College of Computing
Georgia Institute of Technology
clause@cc.gatech.edu
Alessandro Orso
College of Computing
Georgia Institute of Technology
orso@cc.gatech.edu
ABSTRACT
Privacy and security concerns have adversely a ected the
usefulness of many types of techniques that leverage infor-
mation gathered from deployed applications. To address this
issue, we present a new approach for automatically sanitiz-
ing failure-inducing inputs. Given an input I that causes
a failure f, our technique can generate a sanitized input I
that is di erent from I but still causes f. I can then be sent
to the developers to help them debug f, without revealing
the possibly sensitive information contained in I. We im-
plemented our approach in a prototype tool, camouflage,
and performed an empirical evaluation. In the evaluation,
we applied camouflage to a large set of failure-inducing
inputs for several real applications. The results of the eval-
uation are promising; they show that camouflage is both
practical and e ective at generating sanitized inputs. In par-
ticular, for the inputs that we considered, I and I shared
no sensitive information.
1. INTRODUCTION
Investigating techniques that capture data from deployed
applications to support in-house software engineering tasks
is an increasingly active and successful area of research (e.g.,
[1,3–5,13,14,17,21,22,26,27,29]). However, privacy and se-
curity concerns have prevented widespread adoption of many
of these techniques and, because they rely on user partici-
pation, have ultimately limited their usefulness. Many of
the earlier proposed techniques attempt to sidestep these
concerns by collecting only limited amounts of information
(e.g., stack traces and register dumps [1, 3, 5] or sampled
branch profiles [26,27]) and providing a privacy policy that
specifies how the information will be used (e.g., [2,8]). Be-
cause the types of information collected by these techniques
are unlikely to be sensitive, users are more willing to trust
developers. Moreover, because only a small amount of infor-
mation is collected, it is feasible for users to manually inspect
and sanitize such information before it is sent to developers.
Unfortunately, recent research has shown that the e ec-
tiveness of these techniques increases when they can lever-
age large amounts of detailed information (e.g., complete
execution recordings [4, 14] or path profiles [13, 24]). Since
more detailed information is bound to contain sensitive data,
users will most likely be unwilling to let developers collect
such information. In addition, collecting large amounts of
information would make it infeasible for users to sanitize
the collected information by hand. To address this prob-
lem, some of these techniques suggest using an input mini-
mization approach (e.g., [6, 7, 35]) to reduce the number of
failure-inducing inputs and, hopefully, eliminate some sensi-
tive information. Input-minimization techniques, however,
were not designed to specifically reduce sensitive inputs, so
they can only eliminate sensitive data by chance. In or-
der for techniques that leverage captured field information
to become widely adopted and achieve their full potential,
new approaches for addressing privacy and security concerns
must be developed.
In this paper, we present a novel technique that addresses
privacy and security concerns by sanitizing information cap-
tured from deployed applications. Our technique is designed
to be used in conjunction with an execution capture/replay
technique (e.g., [4, 14]). Given an execution recording that
contains a captured failure-inducing input I = i1, i2, . . . in⇥
and terminates with a failure f, our technique replays the
execution recording and leverages a specialized version of
symbolic-execution to automatically produce I , a sanitized
version of I, such that I (1) still causes f and (2) reveals as
little information about I as possible. A modified execution
recording where I replaces I can then be constructed and
sent to the developers, who can use it to debug f.
It is, in general, impossible to construct I such that it
does not reveal any information about I while still caus-
ing the same failure f. Typically, the execution of f would
depend on the fact that some elements of I have specific
values (e.g., i1 must be 0 for the failing path to be taken).
However, this fact does not prevent the technique from be-
ing useful in practice. In our evaluation, we found that the
information revealed by the sanitized inputs was not sensi-
tive and tended to be structural in nature (e.g., a specific
portion of the input must be surrounded by double quotes).
Conversely, the parts of the inputs that were more likely to
be sensitive (e.g., values contained inside the double quotes)
were not revealed (see Section 4).
To evaluate the e ectiveness of our technique, we imple-
mented it in a prototype tool, called camouflage, and car-
ried out an empirical evaluation of 170 failure-inducing in-
1
CC 05 ICSE 05 ISSTA 07 ASE 07 ISSTA 09 ICSE 10 Tech ReptICSE 07
Dynamic tainting
based analyses
Enabling more
efficient debugging
QUESTIONS?

More Related Content

PDF
Software testing
PDF
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
PDF
Testing Experience Magazine Vol.14 June 2011
PDF
Importance of Testing in SDLC
DOC
Assessing System Readiness
DOCX
Varalakhmi_Suresh_1
PDF
Configuration Navigation Analysis Model for Regression Test Case Prioritization
PDF
EVALUATION AND STUDY OF SOFTWARE DEGRADATION IN THE EVOLUTION OF SIX VERSIONS...
Software testing
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
Testing Experience Magazine Vol.14 June 2011
Importance of Testing in SDLC
Assessing System Readiness
Varalakhmi_Suresh_1
Configuration Navigation Analysis Model for Regression Test Case Prioritization
EVALUATION AND STUDY OF SOFTWARE DEGRADATION IN THE EVOLUTION OF SIX VERSIONS...

What's hot (12)

DOC
5WCSQ(CFP) - Quality Improvement by the Real-Time Detection of the Problems
PPT
A survey of software testing
PDF
Prioritizing Test Cases for Regression Testing A Model Based Approach
PDF
Essential Test Management and Planning
PDF
Chapter 3 - Performance Testing in the Software Lifecycle
PDF
@#$@#$@#$"""@#$@#$"""
PPTX
Basics of software testing
PDF
Essential Test Management and Planning
PDF
Software Quality Assurance
PDF
Chapter 2 - Test Management
PDF
AN APPROACH FOR TEST CASE PRIORITIZATION BASED UPON VARYING REQUIREMENTS
DOC
Abstract.doc
5WCSQ(CFP) - Quality Improvement by the Real-Time Detection of the Problems
A survey of software testing
Prioritizing Test Cases for Regression Testing A Model Based Approach
Essential Test Management and Planning
Chapter 3 - Performance Testing in the Software Lifecycle
@#$@#$@#$"""@#$@#$"""
Basics of software testing
Essential Test Management and Planning
Software Quality Assurance
Chapter 2 - Test Management
AN APPROACH FOR TEST CASE PRIORITIZATION BASED UPON VARYING REQUIREMENTS
Abstract.doc
Ad

Viewers also liked (20)

PDF
A Technique for Enabling and Supporting Debugging of Field Failures (ICSE 2007)
PDF
Efficient Dependency Detection for Safe Java Test Acceleration
PDF
Debugging LAMP Apps on Linux/UNIX Using Open Source Tools - Jess Portnot - OS...
ODP
Prezentace
PPTX
Bella2010 Misc
PDF
New Tools for Your Teaching
PDF
Camouflage: Automated Anonymization of Field Data (ICSE 2011)
ODP
Prezentace
PPT
Tweet for Business
PDF
Enabling and Supporting the Debugging of Software Failures (PhD Defense)
PDF
Taint-based Dynamic Analysis (CoC Research Day 2009)
PDF
Initial Explorations on Design Pattern Energy Usage (GREENS 12)
PDF
Penumbra: Automatically Identifying Failure-Relevant Inputs (ISSTA 2009)
PPTX
Bella2010 misc
PDF
Investigating the Impacts of Web Servers on Web Application Energy Usage (GRE...
PDF
On image intensities, eigenfaces and LDA
DOCX
Economy of india and china
PDF
Dytan: A Generic Dynamic Taint Analysis Framework (ISSTA 2007)
PDF
Advanced Dynamic Analysis for Leak Detection (Apple Internship 2008)
ODP
A Technique for Enabling and Supporting Debugging of Field Failures (ICSE 2007)
Efficient Dependency Detection for Safe Java Test Acceleration
Debugging LAMP Apps on Linux/UNIX Using Open Source Tools - Jess Portnot - OS...
Prezentace
Bella2010 Misc
New Tools for Your Teaching
Camouflage: Automated Anonymization of Field Data (ICSE 2011)
Prezentace
Tweet for Business
Enabling and Supporting the Debugging of Software Failures (PhD Defense)
Taint-based Dynamic Analysis (CoC Research Day 2009)
Initial Explorations on Design Pattern Energy Usage (GREENS 12)
Penumbra: Automatically Identifying Failure-Relevant Inputs (ISSTA 2009)
Bella2010 misc
Investigating the Impacts of Web Servers on Web Application Energy Usage (GRE...
On image intensities, eigenfaces and LDA
Economy of india and china
Dytan: A Generic Dynamic Taint Analysis Framework (ISSTA 2007)
Advanced Dynamic Analysis for Leak Detection (Apple Internship 2008)
Ad

Similar to Enabling and Supporting the Debugging of Field Failures (Job Talk) (20)

PDF
Demand-Driven Structural Testing with Dynamic Instrumentation (ICSE 2005)
PDF
Essential Spectrumbased Fault Localization Xiaoyuan Xie Baowen Xu
DOC
Testing survey by_directions
PDF
Too many files
PDF
Software testing techniques - www.testersforum.com
PPTX
Testing Technique
PDF
Software CrashLocator: Locating the Faulty Functions by Analyzing the Crash S...
PDF
TRANSFORMING SOFTWARE REQUIREMENTS INTO TEST CASES VIA MODEL TRANSFORMATION
PPT
4.3_Unit Testing.ppt gfdfghhhhhhhhhhhhh
PDF
Welcome to International Journal of Engineering Research and Development (IJERD)
PDF
QUALITY METRICS OF TEST SUITES IN TESTDRIVEN DESIGNED APPLICATIONS
PDF
Staging's channles are being tested
PDF
Harry Potter 7-2 3D tonight!!! http://4rd.ca/aaaj6w
PDF
Go to all channels so that I may test your stats tom
PDF
Software testing techniques
PDF
Content to all channels
PDF
Slideshare - Many files
PDF
Slideshare removal with caption
PPTX
Sta unit 2(abimanyu)
PDF
Comprehensive Testing Tool for Automatic Test Suite Generation, Prioritizatio...
Demand-Driven Structural Testing with Dynamic Instrumentation (ICSE 2005)
Essential Spectrumbased Fault Localization Xiaoyuan Xie Baowen Xu
Testing survey by_directions
Too many files
Software testing techniques - www.testersforum.com
Testing Technique
Software CrashLocator: Locating the Faulty Functions by Analyzing the Crash S...
TRANSFORMING SOFTWARE REQUIREMENTS INTO TEST CASES VIA MODEL TRANSFORMATION
4.3_Unit Testing.ppt gfdfghhhhhhhhhhhhh
Welcome to International Journal of Engineering Research and Development (IJERD)
QUALITY METRICS OF TEST SUITES IN TESTDRIVEN DESIGNED APPLICATIONS
Staging's channles are being tested
Harry Potter 7-2 3D tonight!!! http://4rd.ca/aaaj6w
Go to all channels so that I may test your stats tom
Software testing techniques
Content to all channels
Slideshare - Many files
Slideshare removal with caption
Sta unit 2(abimanyu)
Comprehensive Testing Tool for Automatic Test Suite Generation, Prioritizatio...

Recently uploaded (20)

PDF
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
PDF
Agricultural_Statistics_at_a_Glance_2022_0.pdf
PDF
Per capita expenditure prediction using model stacking based on satellite ima...
PDF
Approach and Philosophy of On baking technology
PDF
Diabetes mellitus diagnosis method based random forest with bat algorithm
PDF
The Rise and Fall of 3GPP – Time for a Sabbatical?
PPTX
PA Analog/Digital System: The Backbone of Modern Surveillance and Communication
PDF
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
PDF
Building Integrated photovoltaic BIPV_UPV.pdf
PDF
KodekX | Application Modernization Development
PPT
“AI and Expert System Decision Support & Business Intelligence Systems”
PDF
CIFDAQ's Market Insight: SEC Turns Pro Crypto
PDF
Chapter 3 Spatial Domain Image Processing.pdf
PDF
Dropbox Q2 2025 Financial Results & Investor Presentation
PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
PDF
Review of recent advances in non-invasive hemoglobin estimation
PPTX
Cloud computing and distributed systems.
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PPTX
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
PPTX
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
Agricultural_Statistics_at_a_Glance_2022_0.pdf
Per capita expenditure prediction using model stacking based on satellite ima...
Approach and Philosophy of On baking technology
Diabetes mellitus diagnosis method based random forest with bat algorithm
The Rise and Fall of 3GPP – Time for a Sabbatical?
PA Analog/Digital System: The Backbone of Modern Surveillance and Communication
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
Building Integrated photovoltaic BIPV_UPV.pdf
KodekX | Application Modernization Development
“AI and Expert System Decision Support & Business Intelligence Systems”
CIFDAQ's Market Insight: SEC Turns Pro Crypto
Chapter 3 Spatial Domain Image Processing.pdf
Dropbox Q2 2025 Financial Results & Investor Presentation
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
Review of recent advances in non-invasive hemoglobin estimation
Cloud computing and distributed systems.
Advanced methodologies resolving dimensionality complications for autism neur...
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy

Enabling and Supporting the Debugging of Field Failures (Job Talk)

  • 1. ENABLING AND SUPPORTING THE DEBUGGING OF FIELD FAILURES James Clause Georgia Institute of Technology
  • 2. Jazz: A Tool for Demand-Driven Structural Testing Jonathan Misurda1 , Jim Clause1 , Juliya Reed1 , Bruce R. Childers1 , and Mary Lou So a2 1 University of Pittsburgh, Pittsburgh PA 15260, USA, {jmisurda,clausej,juliya,childers}@cs.pitt.edu 2 University of Virginia, Charlottesville VA 22904, USA, soffa@cs.virginia.edu Abstract. Software testing to produce reliable and robust software has become vitally important. Testing is a process by which quality can be assured through the collection of information about software. While test- ing can improve software quality, current tools typically are inflexible and have high overheads, making it a challenge to test large projects. We describe a new scalable and flexible tool, called Jazz, that uses a demand-driven structural testing approach. Jazz has a low overhead of only 17.6% for branch testing. 1 Introduction In the last several years, the importance of producing high quality and robust software has become paramount. Testing is an important process to support quality assurance by gathering information about the software being developed or modified. It is, in general, extremely labor and resource intensive, accounting for 50-60% of the total cost of software development [1]. The increased emphasis on software quality and robustness mandates improved testing methodologies. To test software, a number of techniques can be applied. One class of tech- niques is structural testing, which checks that a given coverage criterion is sat- isfied. For example, branch testing checks that a certain percentage of branches are executed. Other structural tests include def-use testing in which pairs of variable definitions and uses are checked for coverage and node testing in which nodes in a program’s control flow graph are checked. Unfortunately, structural testing is often hindered by the lack of scalable and flexible tools. Current tools are not scalable in terms of both time and memory, limiting the number and scope of the tests that can be applied to large programs. These tools often modify the software binary to insert instrumentation for testing. In this case, the tested version of the application is not the same version that is shipped to customers and errors may remain. Testing tools are usually inflexible and only implement certain types of testing. For example, many tools implement branch testing, but do not implement node or def-use testing. In this paper, we describe a new tool for structural testing, called Jazz, that addresses these problems. Jazz uses a novel demand-driven technique to apply ABSTRACT Producing reliable and robust software has become one of the most important software development concerns in recent years. Testing is a process by which software quality can be assured through the collection of infor- mation. While testing can improve software reliability, current tools typically are inflexible and have high over- heads, making it challenging to test large software projects. In this paper, we describe a new scalable and flexible framework for testing programs with a novel demand-driven approach based on execution paths to implement test coverage. This technique uses dynamic instrumentation on the binary code that can be inserted and removed on-the-fly to keep performance and mem- ory overheads low. We describe and evaluate implemen- tations of the framework for branch, node and def-use testing of Java programs. Experimental results for branch testing show that our approach has, on average, a 1.6 speed up over static instrumentation and also uses less memory. Categories and Subject Descriptors D.2.5. [Software Engineering]: Testing and Debug- ging—Testing tools; D.3.3. [Programming Lan- guages]: Language Constructs and Features—Program instrumentation, run-time environments General Terms Experimentation, Measurement, Verification Keywords Testing, Code Coverage, Structural Testing, Demand- Driven Instrumentation, Java Programming Language 1. INTRODUCTION In the last several years, the importance of produc- ing high quality and robust software has become para- mount [15]. Testing is an important process to support quality assurance by gathering information about the behavior of the software being developed or modified. It is, in general, extremely labor and resource intensive, accounting for 50-60% of the total cost of software development [17]. Given the importance of testing, it is imperative that there are appropriate testing tools and frameworks. In order to adequately test software, a number of different testing techniques must be per- formed. One class of testing techniques used extensively is structural testing in which properties of the software code are used to ensure a certain code coverage.Struc- tural testing techniques include branch testing, node testing, path testing, and def-use testing [6,7,8,17,19]. Typically, a testing tool targets one type of struc- tural test, and the software unit is the program, file or particular methods. In order to apply various structural testing techniques, different tools must be used. If a tool for a particular type of structural testing is not available, the tester would need to either implement it or not use that testing technique. The tester would also be con- strained by the region of code to be tested, as deter- mined by the tool implementor. For example, it may not be possible for the tester to focus on a particular region of code, such as a series of loops, complicated condi- tionals, or particular variables if def-use testing is desired. The user may want to have higher coverage on frequently executed regions of code. Users may want to define their own way of testing. For example, all branches should be covered 10 times rather than once in all loops. In structural testing, instrumentation is placed at certain code points (probes). Whenever such a program point is reached, code that performs the function for the test (payload) is executed. The probes in def-use testing are dictated by the definitions and uses of variables and the payload is to mark that a definition or use in a def- use pair has been covered. Thus for each type of struc- tural testing, there is a testing “plan”. A test plan is a Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ICSE'05, May 15-21, 2005, St. Louis, Missouri, USA. Copyright 2005 ACM 1-58113-963-2/05/0005...$5.00. Demand-Driven Structural Testing with Dynamic Instrumentation Jonathan Misurda†, James A. Clause†, Juliya L. Reed†, Bruce R. Childers†, and Mary Lou Soffa‡ †Department of Computer Science University of Pittsburgh Pittsburgh, Pennsylvania 15260 {jmisurda, clausej, juliya, childers}@cs.pitt.edu ‡Department of Computer Science University of Virginia Charlottesville, Virginia 22904 soffa@cs.virginia.edu 156 A Technique for Enabling and Supporting Debugging of Field Failures James Clause and Alessandro Orso College of Computing Georgia Institute of Technology {clause, orso}@cc.gatech.edu Abstract It is difficult to fully assess the quality of software in- house, outside the actual time and context in which it will execute after deployment. As a result, it is common for software to manifest field failures, failures that occur on user machines due to untested behavior. Field failures are typically difficult to recreate and investigate on developer platforms, and existing techniques based on crash report- ing provide only limited support for this task. In this pa- per, we present a technique for recording, reproducing, and minimizing failing executions that enables and supports in- house debugging of field failures. We also present a tool that implements our technique and an empirical study that evaluates the technique on a widely used e-mail client. 1. Introduction Quality-assurance activities, such as software testing and analysis, are notoriously difficult, expensive, and time- consuming. As a result, software products are often re- leased with faults or missing functionality. In fact, real- world examples of field failures experienced by users be- cause of untested behaviors (e.g., due to unforeseen us- ages), are countless. When field failures occur, it is im- portant for developers to be able to recreate and investigate them in-house. This pressing need is demonstrated by the emergence of several crash-reporting systems, such as Mi- crosoft’s error reporting systems [13] and Apple’s Crash Reporter [1]. Although these techniques represent a first important step in addressing the limitations of purely in- house approaches to quality assurance, they work on lim- ited data (typically, a snapshot of the execution state) and can at best identify correlations between a crash report and data on other known failures. In this paper, we present a novel technique for reproduc- ing and investigating field failures that addresses the limita- tions of existing approaches. Our technique works in three phases, intuitively illustrated by the scenario in Figure 1. In the recording phase, while users run the software, the tech- nique intercepts and logs the interactions between applica- tion and environment and records portions of the environ- ment that are relevant to these interactions. If the execution terminates with a failure, the produced execution recording is stored for later investigation. In the minimization phase, using free cycles on the user machines, the technique re- plays the recorded failing executions with the goal of au- tomatically eliminating parts of the executions that are not relevant to the failure. In the replay and debugging phase, developers can use the technique to replay the minimized failing executions and investigate the cause of the failures (e.g., within a debugger). Being able to replay and debug real field failures can give developers unprecedented insight into the behavior of their software after deployment and op- portunities to improve the quality of their software in ways that were not possible before. To evaluate our technique, we implemented it in a proto- type tool, called ADDA (Automated Debugging of Deployed Applications), and used the tool to perform an empirical study. The study was performed on PINE [19], a widely- used e-mail client, and involved the investigation of failures caused by two real faults in PINE. The results of the study are promising. Our technique was able to (1) record all ex- ecutions of PINE (and two other subjects) with a low time and space overhead, (2) completely replay all recorded exe- cutions, and (3) perform automated minimization of failing executions and obtain shorter executions that manifested the same failures as the original executions. Moreover, we were able to replay the minimized executions within a debugger, which shows that they could have actually been used to in- vestigate the failures. The contributions of this paper are: • A novel technique for recording and later replaying exe- cutions of deployed programs. • An approach for minimizing failing executions and gen- erating shorter executions that fail for the same reasons. • A prototype tool that implements our technique. • An empirical study that shows the feasibility and effec- tiveness of the approach. 29th International Conference on Software Engineering (ICSE'07) 0-7695-2828-7/07 $20.00 © 2007 Dytan: A Generic Dynamic Taint Analysis Framework James Clause, Wanchun Li, and Alessandro Orso College of Computing Georgia Institute of Technology {clause|wli7|orso}@cc.gatech.edu ABSTRACT Dynamic taint analysis is gaining momentum. Techniques based on dynamic tainting have been successfully used in the context of application security, and now their use is also being explored in dif- ferent areas, such as program understanding, software testing, and debugging. Unfortunately, most existing approaches for dynamic tainting are defined in an ad-hoc manner, which makes it difficult to extend them, experiment with them, and adapt them to new con- texts. Moreover, most existing approaches are focused on data-flow based tainting only and do not consider tainting due to control flow, which limits their applicability outside the security domain. To address these limitations and foster experimentation with dynamic tainting techniques, we defined and developed a general framework for dynamic tainting that (1) is highly flexible and customizable, (2) allows for performing both data-flow and control-flow based taint- ing conservatively, and (3) does not rely on any customized run- time system. We also present DYTAN, an implementation of our framework that works on x86 executables, and a set of preliminary studies that show how DYTAN can be used to implement different tainting-based approaches with limited effort. In the studies, we also show that DYTAN can be used on real software, by using FIRE- FOX as one of our subjects, and illustrate how the specific char- acteristics of the tainting approach used can affect efficiency and accuracy of the taint analysis, which further justifies the use of our framework to experiment with different variants of an approach. Categories and Subject Descriptors: D.2.5 [Software Engineer- ing]: Testing and Debugging; General Terms: Experimentation, Security Keywords: Dynamic tainting, information flow, general framework 1. INTRODUCTION Dynamic taint analysis (also known as dynamic information flow analysis) consists, intuitively, in marking and tracking certain data in a program at run-time. This type of dynamic analysis is be- coming increasingly popular. In the context of application secu- rity, dynamic-tainting approaches have been successfully used to prevent a wide range of attacks, including buffer overruns (e.g., [8, 17]), format string attacks (e.g., [17, 21]), SQL and command in- jections (e.g., [7, 19]), and cross-site scripting (e.g., [18]). More recently, researchers have started to investigate the use of tainting- based approaches in domains other than security, such as program understanding, software testing, and debugging (e.g., [11, 13]). Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ISSTA’07, July 9–12, 2007, London, England, United Kingdom. Copyright 2007 ACM 978-1-59593-734-6/07/0007 ...$5.00. Unfortunately, most existing techniques and tools for dynamic taint analysis are defined in an ad-hoc manner, to target a specific problem or a small class of problems. It would be difficult to ex- tend or adapt such techniques and tools so that they can be used in other contexts. In particular, most existing approaches are focused on data-flow based tainting only, and do not consider tainting due to the control flow within an application, which limits their general applicability. Also, most existing techniques support either a sin- gle taint marking or a small, fixed number of markings, which is problematic in applications such as debugging. Finally, almost no existing technique handles the propagation of taint markings in a truly conservative way, which may be appropriate for the specific applications considered, but is problematic in general. Because de- veloping support for dynamic taint analysis is not only time con- suming, but also fairly complex, this lack of flexibility and gener- ality of existing tools and techniques is especially limiting for this type of dynamic analysis. To address these limitations and foster experimentation with dy- namic tainting techniques, in this paper we present a framework for dynamic taint analysis. We designed the framework to be general and flexible, so that it allows for implementing different kinds of techniques based on dynamic taint analysis with little effort. Users can leverage the framework to quickly develop prototypes for their techniques, experiment with them, and investigate trade-offs of dif- ferent alternatives. For a simple example, the framework could be used to investigate the cost effectiveness of considering different types of taint propagation for an application. Our framework has several advantages over existing approaches. First, it is highly flexible and customizable. It allows for easily specifying which program data should be tainted and how, how taint markings should be propagated at run-time, and where and how taint markings should be checked. Second, it allows for performing data-flow and both data-flow and control-flow based tainting. Third, from a more practical standpoint, it works on binaries, does not need access to source code, and does not rely on any customized hardware or operating system, which makes it broadly applicable. We also present DYTAN, an implementation of our framework that works on x86 binaries, and a set of preliminary studies per- formed using DYTAN. In the first set of studies, we report on our experience in using DYTAN to implement two tainting-based ap- proaches presented in the literature. Although preliminary, our ex- perience shows that we were able to implement these approaches completely and with little effort. The second set of studies illus- trates how the specific characteristics of a tainting approach can affect efficiency and accuracy of the taint analysis. In particular, we investigate how ignoring control-flow related propagation and over- looking some data-flow aspects can lead to unsafety. These results further justify the usefulness of experimenting with different varia- tions of dynamic taint analysis and assessing their tradeoffs, which can be done with limited effort using our framework. The second set of studies also shows the practical applicability of DYTAN, by successfully running it on the FIREFOX web browser. 196 Effective Memory Protection Using Dynamic Tainting James Clause, Ioannis Doudalis, Alessandro Orso, and Milos Prvulovic College of Computing Georgia Institute of Technology {clause|idoud|orso|milos}@cc.gatech.edu ABSTRACT Programs written in languages that provide direct access to memory through pointers often contain memory-related faults, which may cause non-deterministic failures and even security vulnerabilities. In this paper, we present a new technique based on dynamic taint- ing for protecting programs from illegal memory accesses. When memory is allocated, at runtime, our technique taints both the mem- ory and the corresponding pointer using the same taint mark. Taint marks are then suitably propagated while the program executes and are checked every time a memory address m is accessed through a pointer p; if the taint marks associated with m and p differ, the ex- ecution is stopped and the illegal access is reported. To allow for a low-overhead, hardware-assisted implementation of the approach, we make several key technical and engineering decisions in the definition of our technique. In particular, we use a configurable, low number of reusable taint marks instead of a unique mark for each area of memory allocated, which reduces the overhead of the approach without limiting its flexibility and ability to target most memory-related faults and attacks known to date. We also define the technique at the binary level, which lets us handle the (very) common case of applications that use third-party libraries whose source code is unavailable. To investigate the effectiveness and practicality of our approach, we implemented it for heap-allocated memory and performed a preliminary empirical study on a set of programs. Our results show that (1) our technique can identify a large class of memory-related faults, even when using only two unique taint marks, and (2) a hardware-assisted implementation of the technique could achieve overhead in the single digits. Categories and Subject Descriptors: D.2.5 [Software Engineering]: Test- ing and Debugging; C.0 [General]: Hardware/Software Interfaces; General Terms: Performance, Security Keywords: Illegal memory accesses, dynamic tainting, hardware support 1. INTRODUCTION Memory-related faults are a serious problem for languages that allow direct memory access through pointers. An important class of memory-related faults are what we call illegal memory accesses. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ASE’07, November 5–9, 2007, Atlanta, Georgia, USA. Copyright 2007 ACM 978-1-59593-882-4/07/0011 ...$5.00. In languages such as C and C++, when memory allocation is re- quested, a currently-free area of memory m of the specified size is reserved. After m has been allocated, its initial address can be assigned to a pointer p, either immediately (e.g., in the case of heap allocated memory) or at a later time (e.g., when retrieving and storing the address of a local variable). From that point on, the only legal accesses to m through a pointer are accesses per- formed through p or through other pointers derived from p. (In Section 3, we clearly define what it means to derive a pointer from another pointer.) All other accesses to m are Illegal Memory Ac- cesses (IMAs), that is, accesses where a pointer is used to access memory outside the bounds of the memory area with which it was originally associated. IMAs are especially relevant for several reasons. First, they are caused by typical programming errors, such as array-out-of-bounds accesses and NULL pointer dereferences, and are thus widespread and common. Second, they often result in non-deterministic fail- ures that are hard to identify and diagnose; the specific effects of an IMA depend on several factors, such as memory layout, that may vary between executions. Finally, many security concerns such as viruses, worms, and rootkits use IMAs as their injection vectors. In this paper, we present a new dynamic technique for protecting programs against IMAs that is effective against most known types of illegal accesses. The basic idea behind the technique is to use dynamic tainting (or dynamic information flow) [8] to keep track of which memory areas can be accessed through which pointers, as follows. At runtime, our technique taints both allocated mem- ory and pointers using taint marks. Dynamic taint propagation, to- gether with a suitable handling of memory-allocation and deallo- cation operations, ensures that taint marks are appropriately prop- agated during execution. Every time the program accesses some memory through a pointer, our technique checks whether the ac- cess is legal by comparing the taint mark associated with the mem- ory and the taint mark associated with the pointer used to access it. If the marks match, the access is considered legitimate. Otherwise, the execution is stopped and an IMA is reported. In defining our approach, our final goal is the development of a low-overhead, hardware-assisted tool that is practical and can be used on deployed software. A hardware-assisted tool is a tool that leverages the benefits of both hardware and software. Typically, some performance critical aspects are moved to the hardware to achieve maximum efficiency, while software is used to perform op- erations that would be too complex to implement in hardware. There are two main characteristics of our approach that were de- fined to help achieve our goal of a hardware-assisted implementa- tion. The first characteristic is that our technique only uses a small, configurable number of reusable taint marks instead of a unique mark for each area of memory allocated. Using a low number of 283 Penumbra: Automatically Identifying Failure-Relevant Inputs Using Dynamic Tainting James Clause College of Computing Georgia Institute of Technology clause@cc.gatech.edu Alessandro Orso College of Computing Georgia Institute of Technology orso@cc.gatech.edu ABSTRACT Most existing automated debugging techniques focus on re- ducing the amount of code to be inspected and tend to ig- nore an important component of software failures: the in- puts that cause the failure to manifest. In this paper, we present a new technique based on dynamic tainting for au- tomatically identifying subsets of a program’s inputs that are relevant to a failure. The technique (1) marks program inputs when they enter the application, (2) tracks them as they propagate during execution, and (3) identifies, for an observed failure, the subset of inputs that are potentially relevant for debugging that failure. To investigate feasibil- ity and usefulness of our technique, we created a prototype tool, penumbra, and used it to evaluate our technique on several failures in real programs. Our results are promising, as they show that penumbra can point developers to inputs that are actually relevant for investigating a failure and can be more practical than existing alternative approaches. Categories and Subject Descriptors D.2.5 [Software Engineering]: Testing and Debugging General Terms Algorithms, Experimentation, Reliability Keywords Failure-relevant inputs, automated debugging, dynamic in- formation flow, dynamic tainting 1. INTRODUCTION Debugging is known to be a labor-intensive, time-consum- ing task that can be responsible for a large portion of soft- ware development and maintenance costs [21,23]. Common characteristics of modern software, such as increased con- figurability, larger code bases, and increased input sizes, in- troduce new challenges for debugging and exacerbate exist- ing problems. In response, researchers have proposed many Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ISSTA’09, July 19–23, 2009, Chicago, Illinois, USA. Copyright 2009 ACM 978-1-60558-338-9/09/07 ...$5.00. semi- and fully-automated techniques that attempt to re- duce the cost of debugging (e.g., [8,9,11–13,18,24,25,27]). The majority of these techniques are code-centric in that they focus exclusively on one aspect of debugging—trying to identify the faulty statements responsible for a failure. Although code-centric approaches can work well in some cases (e.g., for isolated faults that involve a single state- ment), they are often inadequate for more complex faults [4]. Faults of omission, for instance, where part of a specification has not been implemented, are notoriously problematic for debugging techniques that attempt to identify potentially faulty statements. The usefulness of code-centric techniques is also limited in the case of long-running programs and pro- grams that process large amounts of information; failures in these types of programs are typically di⌅cult to understand without considering the data involved in such failures. To debug failures more e ectively, it is necessary to pro- vide developers with not only a relevant subset of state- ments, but also a relevant subset of inputs. There are only a few existing techniques that attempt to identify relevant inputs [3, 17, 25], with delta debugging [25] being the most known of these. Although delta debugging has been shown to be an e ective technique for automatic debugging, it also has several drawbacks that may limit its usefulness in prac- tice. In particular, it requires (1) multiple executions of the program being debugged, which can involve a long running time, and (2) complex oracles and setup, which can result in a large amount of manual e ort [2]. In this paper, we present a novel debugging technique that addresses many of the limitations of existing approaches. Our technique can complement code-centric debugging tech- niques because it focuses on identifying program inputs that are likely to be relevant for a given failure. It also overcomes some of the drawbacks of delta debugging because it needs a single execution to identify failure-relevant inputs and re- quires minimal manual e ort. Given an observable faulty behavior and a set of failure- inducing inputs (i.e., a set of inputs that cause such behav- ior), our technique automatically identifies failure-relevant inputs (i.e., a subset of failure-inducing inputs that are ac- tually relevant for investigating the faulty behavior). Our approach is based on dynamic tainting. Intuitively, the tech- nique works by tracking the flow of inputs along data and control dependences at runtime. When a point of failure is reached, the tracked information is used to identify and present to developers the failure-relevant inputs. At this point, developers can use the identified inputs to investigate the failure at hand. LEAKPOINT: Pinpointing the Causes of Memory Leaks James Clause College of Computing Georgia Institute of Technology clause@cc.gatech.edu Alessandro Orso College of Computing Georgia Institute of Technology orso@cc.gatech.edu ABSTRACT Most existing leak detection techniques for C and C++ applications only detect the existence of memory leaks. They do not provide any help for fixing the underlying memory management errors. In this paper, we present a new technique that not only detects leaks, but also points developers to the locations where the underlying errors may be fixed. Our technique tracks pointers to dynamically- allocated areas of memory and, for each memory area, records sev- eral pieces of relevant information. This information is used to identify the locations in an execution where memory leaks occur. To investigate our technique’s feasibility and usefulness, we devel- oped a prototype tool called LEAKPOINT and used it to perform an empirical evaluation. The results of this evaluation show that LEAKPOINT detects at least as many leaks as existing tools, reports zero false positives, and, most importantly, can be effective at help- ing developers fix the underlying memory management errors. Categories and Subject Descriptors D.2.5 [Software Engineering]: Testing and Debugging General Terms Performance, Reliability Keywords Leak detection, Dynamic tainting 1. INTRODUCTION Memory leaks are a type of unintended memory consumption that can adversely impact the performance and correctness of an application. In programs written in languages such as C and C++, memory is allocated using allocation functions, such as malloc and new. Allocation functions reserve a currently free area of memory m and return a pointer p that points to m’s starting ad- dress. Typically, the program stores and then uses p, or another This work was supported in part by NSF awards CCF-0725202 and CCF-0541080 to Georgia Tech. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ICSE ’10, May 2-8 2010, Cape Town, South Africa Copyright 2010 ACM 978-1-60558-719-6/10/05 ...$10.00. pointer derived from p, to interact with m. When m is no longer needed, the program should pass p to a deallocation function (e.g., free or delete) to deallocate m. A leak occurs if, due to a memory management error, m is not deallocated at the appropri- ate time. There are two types of memory leaks: lost memory and forgotten memory. Lost memory refers to the situation where m be- comes unreachable (i.e., the program overwrites or loses p and all pointers derived from p) without first being deallocated. Forgotten memory refers to the situation where m remains reachable but is not deallocated or accessed in the rest of the execution. Memory leaks are relevant for several reasons. First, they are dif- ficult to detect. Unlike many other types of failures, memory leaks do not immediately produce an easily visible symptom (e.g., a crash or the output of a wrong value); typically, leaks remain unobserved until they consume a large portion of the memory available to a sys- tem. Second, leaks have the potential to impact not only the appli- cation that leaks memory, but also every other application running on the system; because the overall amount of memory is limited, as the memory usage of a leaking program increases, less memory is available to other running applications. Consequently, the per- formance and correctness of every running application can be im- pacted by a program that leaks memory. Third, leaks are common, even in mature applications. For example, in the first half of 2009, over 100 leaks in the Firefox web-browser were reported [18]. Because of the serious consequences and common occurrence of memory leaks, researchers have created many static and dynamic techniques for detecting them (e.g., [1,2,4,7–14,16,17,20–23,25, 27,28]). The adoption of static techniques has been limited by sev- eral factors, including the lack of scalable, precise heap modeling. Dynamic techniques are therefore more widely used in practice. In general, dynamic techniques provide one main piece of informa- tion: the location in an execution where a leaked area of memory is allocated. This location is supposed to serve as a starting point for investigating the leak. However, in many situations, this informa- tion does not provide any insight on where or how to fix the mem- ory management error that causes the leak: the allocation location and the location of the memory management error are typically in completely different parts of the application’s code. To address this limitation of existing approaches, we propose a new memory leak detection technique. Our technique provides the same information as existing techniques but also identifies the locations in an execution where leaks occur. In the case of lost memory, the location is defined as the point in an execution where the last pointer to an unallocated memory area is lost or overwritten. In the case of forgotten memory, the location is defined as the last point in an execution where a pointer to a leaked area of memory was used (e.g., when it is dereferenced to read or write memory, passed as a function argument, returned from a function, or used as Camouflage: Automated Sanitization of Field Data James Clause College of Computing Georgia Institute of Technology clause@cc.gatech.edu Alessandro Orso College of Computing Georgia Institute of Technology orso@cc.gatech.edu ABSTRACT Privacy and security concerns have adversely a ected the usefulness of many types of techniques that leverage infor- mation gathered from deployed applications. To address this issue, we present a new approach for automatically sanitiz- ing failure-inducing inputs. Given an input I that causes a failure f, our technique can generate a sanitized input I that is di erent from I but still causes f. I can then be sent to the developers to help them debug f, without revealing the possibly sensitive information contained in I. We im- plemented our approach in a prototype tool, camouflage, and performed an empirical evaluation. In the evaluation, we applied camouflage to a large set of failure-inducing inputs for several real applications. The results of the eval- uation are promising; they show that camouflage is both practical and e ective at generating sanitized inputs. In par- ticular, for the inputs that we considered, I and I shared no sensitive information. 1. INTRODUCTION Investigating techniques that capture data from deployed applications to support in-house software engineering tasks is an increasingly active and successful area of research (e.g., [1,3–5,13,14,17,21,22,26,27,29]). However, privacy and se- curity concerns have prevented widespread adoption of many of these techniques and, because they rely on user partici- pation, have ultimately limited their usefulness. Many of the earlier proposed techniques attempt to sidestep these concerns by collecting only limited amounts of information (e.g., stack traces and register dumps [1, 3, 5] or sampled branch profiles [26,27]) and providing a privacy policy that specifies how the information will be used (e.g., [2,8]). Be- cause the types of information collected by these techniques are unlikely to be sensitive, users are more willing to trust developers. Moreover, because only a small amount of infor- mation is collected, it is feasible for users to manually inspect and sanitize such information before it is sent to developers. Unfortunately, recent research has shown that the e ec- tiveness of these techniques increases when they can lever- age large amounts of detailed information (e.g., complete execution recordings [4, 14] or path profiles [13, 24]). Since more detailed information is bound to contain sensitive data, users will most likely be unwilling to let developers collect such information. In addition, collecting large amounts of information would make it infeasible for users to sanitize the collected information by hand. To address this prob- lem, some of these techniques suggest using an input mini- mization approach (e.g., [6, 7, 35]) to reduce the number of failure-inducing inputs and, hopefully, eliminate some sensi- tive information. Input-minimization techniques, however, were not designed to specifically reduce sensitive inputs, so they can only eliminate sensitive data by chance. In or- der for techniques that leverage captured field information to become widely adopted and achieve their full potential, new approaches for addressing privacy and security concerns must be developed. In this paper, we present a novel technique that addresses privacy and security concerns by sanitizing information cap- tured from deployed applications. Our technique is designed to be used in conjunction with an execution capture/replay technique (e.g., [4, 14]). Given an execution recording that contains a captured failure-inducing input I = i1, i2, . . . in⇥ and terminates with a failure f, our technique replays the execution recording and leverages a specialized version of symbolic-execution to automatically produce I , a sanitized version of I, such that I (1) still causes f and (2) reveals as little information about I as possible. A modified execution recording where I replaces I can then be constructed and sent to the developers, who can use it to debug f. It is, in general, impossible to construct I such that it does not reveal any information about I while still caus- ing the same failure f. Typically, the execution of f would depend on the fact that some elements of I have specific values (e.g., i1 must be 0 for the failing path to be taken). However, this fact does not prevent the technique from be- ing useful in practice. In our evaluation, we found that the information revealed by the sanitized inputs was not sensi- tive and tended to be structural in nature (e.g., a specific portion of the input must be surrounded by double quotes). Conversely, the parts of the inputs that were more likely to be sensitive (e.g., values contained inside the double quotes) were not revealed (see Section 4). To evaluate the e ectiveness of our technique, we imple- mented it in a prototype tool, called camouflage, and car- ried out an empirical evaluation of 170 failure-inducing in- 1 CC 05 ICSE 05 ICSE 07 ISSTA 07 ASE 07 ISSTA 09 ICSE 10 Tech Rept RESEARCH OVERVIEW
  • 3. Efficient instrumentation Jazz: A Tool for Demand-Driven Structural Testing Jonathan Misurda1 , Jim Clause1 , Juliya Reed1 , Bruce R. Childers1 , and Mary Lou So a2 1 University of Pittsburgh, Pittsburgh PA 15260, USA, {jmisurda,clausej,juliya,childers}@cs.pitt.edu 2 University of Virginia, Charlottesville VA 22904, USA, soffa@cs.virginia.edu Abstract. Software testing to produce reliable and robust software has become vitally important. Testing is a process by which quality can be assured through the collection of information about software. While test- ing can improve software quality, current tools typically are inflexible and have high overheads, making it a challenge to test large projects. We describe a new scalable and flexible tool, called Jazz, that uses a demand-driven structural testing approach. Jazz has a low overhead of only 17.6% for branch testing. 1 Introduction In the last several years, the importance of producing high quality and robust software has become paramount. Testing is an important process to support quality assurance by gathering information about the software being developed or modified. It is, in general, extremely labor and resource intensive, accounting for 50-60% of the total cost of software development [1]. The increased emphasis on software quality and robustness mandates improved testing methodologies. To test software, a number of techniques can be applied. One class of tech- niques is structural testing, which checks that a given coverage criterion is sat- isfied. For example, branch testing checks that a certain percentage of branches are executed. Other structural tests include def-use testing in which pairs of variable definitions and uses are checked for coverage and node testing in which nodes in a program’s control flow graph are checked. Unfortunately, structural testing is often hindered by the lack of scalable and flexible tools. Current tools are not scalable in terms of both time and memory, limiting the number and scope of the tests that can be applied to large programs. These tools often modify the software binary to insert instrumentation for testing. In this case, the tested version of the application is not the same version that is shipped to customers and errors may remain. Testing tools are usually inflexible and only implement certain types of testing. For example, many tools implement branch testing, but do not implement node or def-use testing. In this paper, we describe a new tool for structural testing, called Jazz, that addresses these problems. Jazz uses a novel demand-driven technique to apply ABSTRACT Producing reliable and robust software has become one of the most important software development concerns in recent years. Testing is a process by which software quality can be assured through the collection of infor- mation. While testing can improve software reliability, current tools typically are inflexible and have high over- heads, making it challenging to test large software projects. In this paper, we describe a new scalable and flexible framework for testing programs with a novel demand-driven approach based on execution paths to implement test coverage. This technique uses dynamic instrumentation on the binary code that can be inserted and removed on-the-fly to keep performance and mem- ory overheads low. We describe and evaluate implemen- tations of the framework for branch, node and def-use testing of Java programs. Experimental results for branch testing show that our approach has, on average, a 1.6 speed up over static instrumentation and also uses less memory. Categories and Subject Descriptors D.2.5. [Software Engineering]: Testing and Debug- ging—Testing tools; D.3.3. [Programming Lan- guages]: Language Constructs and Features—Program instrumentation, run-time environments General Terms Experimentation, Measurement, Verification Keywords Testing, Code Coverage, Structural Testing, Demand- Driven Instrumentation, Java Programming Language 1. INTRODUCTION In the last several years, the importance of produc- ing high quality and robust software has become para- mount [15]. Testing is an important process to support quality assurance by gathering information about the behavior of the software being developed or modified. It is, in general, extremely labor and resource intensive, accounting for 50-60% of the total cost of software development [17]. Given the importance of testing, it is imperative that there are appropriate testing tools and frameworks. In order to adequately test software, a number of different testing techniques must be per- formed. One class of testing techniques used extensively is structural testing in which properties of the software code are used to ensure a certain code coverage.Struc- tural testing techniques include branch testing, node testing, path testing, and def-use testing [6,7,8,17,19]. Typically, a testing tool targets one type of struc- tural test, and the software unit is the program, file or particular methods. In order to apply various structural testing techniques, different tools must be used. If a tool for a particular type of structural testing is not available, the tester would need to either implement it or not use that testing technique. The tester would also be con- strained by the region of code to be tested, as deter- mined by the tool implementor. For example, it may not be possible for the tester to focus on a particular region of code, such as a series of loops, complicated condi- tionals, or particular variables if def-use testing is desired. The user may want to have higher coverage on frequently executed regions of code. Users may want to define their own way of testing. For example, all branches should be covered 10 times rather than once in all loops. In structural testing, instrumentation is placed at certain code points (probes). Whenever such a program point is reached, code that performs the function for the test (payload) is executed. The probes in def-use testing are dictated by the definitions and uses of variables and the payload is to mark that a definition or use in a def- use pair has been covered. Thus for each type of struc- tural testing, there is a testing “plan”. A test plan is a Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ICSE'05, May 15-21, 2005, St. Louis, Missouri, USA. Copyright 2005 ACM 1-58113-963-2/05/0005...$5.00. Demand-Driven Structural Testing with Dynamic Instrumentation Jonathan Misurda†, James A. Clause†, Juliya L. Reed†, Bruce R. Childers†, and Mary Lou Soffa‡ †Department of Computer Science University of Pittsburgh Pittsburgh, Pennsylvania 15260 {jmisurda, clausej, juliya, childers}@cs.pitt.edu ‡Department of Computer Science University of Virginia Charlottesville, Virginia 22904 soffa@cs.virginia.edu 156 A Technique for Enabling and Supporting Debugging of Field Failures James Clause and Alessandro Orso College of Computing Georgia Institute of Technology {clause, orso}@cc.gatech.edu Abstract It is difficult to fully assess the quality of software in- house, outside the actual time and context in which it will execute after deployment. As a result, it is common for software to manifest field failures, failures that occur on user machines due to untested behavior. Field failures are typically difficult to recreate and investigate on developer platforms, and existing techniques based on crash report- ing provide only limited support for this task. In this pa- per, we present a technique for recording, reproducing, and minimizing failing executions that enables and supports in- house debugging of field failures. We also present a tool that implements our technique and an empirical study that evaluates the technique on a widely used e-mail client. 1. Introduction Quality-assurance activities, such as software testing and analysis, are notoriously difficult, expensive, and time- consuming. As a result, software products are often re- leased with faults or missing functionality. In fact, real- world examples of field failures experienced by users be- cause of untested behaviors (e.g., due to unforeseen us- ages), are countless. When field failures occur, it is im- portant for developers to be able to recreate and investigate them in-house. This pressing need is demonstrated by the emergence of several crash-reporting systems, such as Mi- crosoft’s error reporting systems [13] and Apple’s Crash Reporter [1]. Although these techniques represent a first important step in addressing the limitations of purely in- house approaches to quality assurance, they work on lim- ited data (typically, a snapshot of the execution state) and can at best identify correlations between a crash report and data on other known failures. In this paper, we present a novel technique for reproduc- ing and investigating field failures that addresses the limita- tions of existing approaches. Our technique works in three phases, intuitively illustrated by the scenario in Figure 1. In the recording phase, while users run the software, the tech- nique intercepts and logs the interactions between applica- tion and environment and records portions of the environ- ment that are relevant to these interactions. If the execution terminates with a failure, the produced execution recording is stored for later investigation. In the minimization phase, using free cycles on the user machines, the technique re- plays the recorded failing executions with the goal of au- tomatically eliminating parts of the executions that are not relevant to the failure. In the replay and debugging phase, developers can use the technique to replay the minimized failing executions and investigate the cause of the failures (e.g., within a debugger). Being able to replay and debug real field failures can give developers unprecedented insight into the behavior of their software after deployment and op- portunities to improve the quality of their software in ways that were not possible before. To evaluate our technique, we implemented it in a proto- type tool, called ADDA (Automated Debugging of Deployed Applications), and used the tool to perform an empirical study. The study was performed on PINE [19], a widely- used e-mail client, and involved the investigation of failures caused by two real faults in PINE. The results of the study are promising. Our technique was able to (1) record all ex- ecutions of PINE (and two other subjects) with a low time and space overhead, (2) completely replay all recorded exe- cutions, and (3) perform automated minimization of failing executions and obtain shorter executions that manifested the same failures as the original executions. Moreover, we were able to replay the minimized executions within a debugger, which shows that they could have actually been used to in- vestigate the failures. The contributions of this paper are: • A novel technique for recording and later replaying exe- cutions of deployed programs. • An approach for minimizing failing executions and gen- erating shorter executions that fail for the same reasons. • A prototype tool that implements our technique. • An empirical study that shows the feasibility and effec- tiveness of the approach. 29th International Conference on Software Engineering (ICSE'07) 0-7695-2828-7/07 $20.00 © 2007 Dytan: A Generic Dynamic Taint Analysis Framework James Clause, Wanchun Li, and Alessandro Orso College of Computing Georgia Institute of Technology {clause|wli7|orso}@cc.gatech.edu ABSTRACT Dynamic taint analysis is gaining momentum. Techniques based on dynamic tainting have been successfully used in the context of application security, and now their use is also being explored in dif- ferent areas, such as program understanding, software testing, and debugging. Unfortunately, most existing approaches for dynamic tainting are defined in an ad-hoc manner, which makes it difficult to extend them, experiment with them, and adapt them to new con- texts. Moreover, most existing approaches are focused on data-flow based tainting only and do not consider tainting due to control flow, which limits their applicability outside the security domain. To address these limitations and foster experimentation with dynamic tainting techniques, we defined and developed a general framework for dynamic tainting that (1) is highly flexible and customizable, (2) allows for performing both data-flow and control-flow based taint- ing conservatively, and (3) does not rely on any customized run- time system. We also present DYTAN, an implementation of our framework that works on x86 executables, and a set of preliminary studies that show how DYTAN can be used to implement different tainting-based approaches with limited effort. In the studies, we also show that DYTAN can be used on real software, by using FIRE- FOX as one of our subjects, and illustrate how the specific char- acteristics of the tainting approach used can affect efficiency and accuracy of the taint analysis, which further justifies the use of our framework to experiment with different variants of an approach. Categories and Subject Descriptors: D.2.5 [Software Engineer- ing]: Testing and Debugging; General Terms: Experimentation, Security Keywords: Dynamic tainting, information flow, general framework 1. INTRODUCTION Dynamic taint analysis (also known as dynamic information flow analysis) consists, intuitively, in marking and tracking certain data in a program at run-time. This type of dynamic analysis is be- coming increasingly popular. In the context of application secu- rity, dynamic-tainting approaches have been successfully used to prevent a wide range of attacks, including buffer overruns (e.g., [8, 17]), format string attacks (e.g., [17, 21]), SQL and command in- jections (e.g., [7, 19]), and cross-site scripting (e.g., [18]). More recently, researchers have started to investigate the use of tainting- based approaches in domains other than security, such as program understanding, software testing, and debugging (e.g., [11, 13]). Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ISSTA’07, July 9–12, 2007, London, England, United Kingdom. Copyright 2007 ACM 978-1-59593-734-6/07/0007 ...$5.00. Unfortunately, most existing techniques and tools for dynamic taint analysis are defined in an ad-hoc manner, to target a specific problem or a small class of problems. It would be difficult to ex- tend or adapt such techniques and tools so that they can be used in other contexts. In particular, most existing approaches are focused on data-flow based tainting only, and do not consider tainting due to the control flow within an application, which limits their general applicability. Also, most existing techniques support either a sin- gle taint marking or a small, fixed number of markings, which is problematic in applications such as debugging. Finally, almost no existing technique handles the propagation of taint markings in a truly conservative way, which may be appropriate for the specific applications considered, but is problematic in general. Because de- veloping support for dynamic taint analysis is not only time con- suming, but also fairly complex, this lack of flexibility and gener- ality of existing tools and techniques is especially limiting for this type of dynamic analysis. To address these limitations and foster experimentation with dy- namic tainting techniques, in this paper we present a framework for dynamic taint analysis. We designed the framework to be general and flexible, so that it allows for implementing different kinds of techniques based on dynamic taint analysis with little effort. Users can leverage the framework to quickly develop prototypes for their techniques, experiment with them, and investigate trade-offs of dif- ferent alternatives. For a simple example, the framework could be used to investigate the cost effectiveness of considering different types of taint propagation for an application. Our framework has several advantages over existing approaches. First, it is highly flexible and customizable. It allows for easily specifying which program data should be tainted and how, how taint markings should be propagated at run-time, and where and how taint markings should be checked. Second, it allows for performing data-flow and both data-flow and control-flow based tainting. Third, from a more practical standpoint, it works on binaries, does not need access to source code, and does not rely on any customized hardware or operating system, which makes it broadly applicable. We also present DYTAN, an implementation of our framework that works on x86 binaries, and a set of preliminary studies per- formed using DYTAN. In the first set of studies, we report on our experience in using DYTAN to implement two tainting-based ap- proaches presented in the literature. Although preliminary, our ex- perience shows that we were able to implement these approaches completely and with little effort. The second set of studies illus- trates how the specific characteristics of a tainting approach can affect efficiency and accuracy of the taint analysis. In particular, we investigate how ignoring control-flow related propagation and over- looking some data-flow aspects can lead to unsafety. These results further justify the usefulness of experimenting with different varia- tions of dynamic taint analysis and assessing their tradeoffs, which can be done with limited effort using our framework. The second set of studies also shows the practical applicability of DYTAN, by successfully running it on the FIREFOX web browser. 196 Effective Memory Protection Using Dynamic Tainting James Clause, Ioannis Doudalis, Alessandro Orso, and Milos Prvulovic College of Computing Georgia Institute of Technology {clause|idoud|orso|milos}@cc.gatech.edu ABSTRACT Programs written in languages that provide direct access to memory through pointers often contain memory-related faults, which may cause non-deterministic failures and even security vulnerabilities. In this paper, we present a new technique based on dynamic taint- ing for protecting programs from illegal memory accesses. When memory is allocated, at runtime, our technique taints both the mem- ory and the corresponding pointer using the same taint mark. Taint marks are then suitably propagated while the program executes and are checked every time a memory address m is accessed through a pointer p; if the taint marks associated with m and p differ, the ex- ecution is stopped and the illegal access is reported. To allow for a low-overhead, hardware-assisted implementation of the approach, we make several key technical and engineering decisions in the definition of our technique. In particular, we use a configurable, low number of reusable taint marks instead of a unique mark for each area of memory allocated, which reduces the overhead of the approach without limiting its flexibility and ability to target most memory-related faults and attacks known to date. We also define the technique at the binary level, which lets us handle the (very) common case of applications that use third-party libraries whose source code is unavailable. To investigate the effectiveness and practicality of our approach, we implemented it for heap-allocated memory and performed a preliminary empirical study on a set of programs. Our results show that (1) our technique can identify a large class of memory-related faults, even when using only two unique taint marks, and (2) a hardware-assisted implementation of the technique could achieve overhead in the single digits. Categories and Subject Descriptors: D.2.5 [Software Engineering]: Test- ing and Debugging; C.0 [General]: Hardware/Software Interfaces; General Terms: Performance, Security Keywords: Illegal memory accesses, dynamic tainting, hardware support 1. INTRODUCTION Memory-related faults are a serious problem for languages that allow direct memory access through pointers. An important class of memory-related faults are what we call illegal memory accesses. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ASE’07, November 5–9, 2007, Atlanta, Georgia, USA. Copyright 2007 ACM 978-1-59593-882-4/07/0011 ...$5.00. In languages such as C and C++, when memory allocation is re- quested, a currently-free area of memory m of the specified size is reserved. After m has been allocated, its initial address can be assigned to a pointer p, either immediately (e.g., in the case of heap allocated memory) or at a later time (e.g., when retrieving and storing the address of a local variable). From that point on, the only legal accesses to m through a pointer are accesses per- formed through p or through other pointers derived from p. (In Section 3, we clearly define what it means to derive a pointer from another pointer.) All other accesses to m are Illegal Memory Ac- cesses (IMAs), that is, accesses where a pointer is used to access memory outside the bounds of the memory area with which it was originally associated. IMAs are especially relevant for several reasons. First, they are caused by typical programming errors, such as array-out-of-bounds accesses and NULL pointer dereferences, and are thus widespread and common. Second, they often result in non-deterministic fail- ures that are hard to identify and diagnose; the specific effects of an IMA depend on several factors, such as memory layout, that may vary between executions. Finally, many security concerns such as viruses, worms, and rootkits use IMAs as their injection vectors. In this paper, we present a new dynamic technique for protecting programs against IMAs that is effective against most known types of illegal accesses. The basic idea behind the technique is to use dynamic tainting (or dynamic information flow) [8] to keep track of which memory areas can be accessed through which pointers, as follows. At runtime, our technique taints both allocated mem- ory and pointers using taint marks. Dynamic taint propagation, to- gether with a suitable handling of memory-allocation and deallo- cation operations, ensures that taint marks are appropriately prop- agated during execution. Every time the program accesses some memory through a pointer, our technique checks whether the ac- cess is legal by comparing the taint mark associated with the mem- ory and the taint mark associated with the pointer used to access it. If the marks match, the access is considered legitimate. Otherwise, the execution is stopped and an IMA is reported. In defining our approach, our final goal is the development of a low-overhead, hardware-assisted tool that is practical and can be used on deployed software. A hardware-assisted tool is a tool that leverages the benefits of both hardware and software. Typically, some performance critical aspects are moved to the hardware to achieve maximum efficiency, while software is used to perform op- erations that would be too complex to implement in hardware. There are two main characteristics of our approach that were de- fined to help achieve our goal of a hardware-assisted implementa- tion. The first characteristic is that our technique only uses a small, configurable number of reusable taint marks instead of a unique mark for each area of memory allocated. Using a low number of 283 Penumbra: Automatically Identifying Failure-Relevant Inputs Using Dynamic Tainting James Clause College of Computing Georgia Institute of Technology clause@cc.gatech.edu Alessandro Orso College of Computing Georgia Institute of Technology orso@cc.gatech.edu ABSTRACT Most existing automated debugging techniques focus on re- ducing the amount of code to be inspected and tend to ig- nore an important component of software failures: the in- puts that cause the failure to manifest. In this paper, we present a new technique based on dynamic tainting for au- tomatically identifying subsets of a program’s inputs that are relevant to a failure. The technique (1) marks program inputs when they enter the application, (2) tracks them as they propagate during execution, and (3) identifies, for an observed failure, the subset of inputs that are potentially relevant for debugging that failure. To investigate feasibil- ity and usefulness of our technique, we created a prototype tool, penumbra, and used it to evaluate our technique on several failures in real programs. Our results are promising, as they show that penumbra can point developers to inputs that are actually relevant for investigating a failure and can be more practical than existing alternative approaches. Categories and Subject Descriptors D.2.5 [Software Engineering]: Testing and Debugging General Terms Algorithms, Experimentation, Reliability Keywords Failure-relevant inputs, automated debugging, dynamic in- formation flow, dynamic tainting 1. INTRODUCTION Debugging is known to be a labor-intensive, time-consum- ing task that can be responsible for a large portion of soft- ware development and maintenance costs [21,23]. Common characteristics of modern software, such as increased con- figurability, larger code bases, and increased input sizes, in- troduce new challenges for debugging and exacerbate exist- ing problems. In response, researchers have proposed many Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ISSTA’09, July 19–23, 2009, Chicago, Illinois, USA. Copyright 2009 ACM 978-1-60558-338-9/09/07 ...$5.00. semi- and fully-automated techniques that attempt to re- duce the cost of debugging (e.g., [8,9,11–13,18,24,25,27]). The majority of these techniques are code-centric in that they focus exclusively on one aspect of debugging—trying to identify the faulty statements responsible for a failure. Although code-centric approaches can work well in some cases (e.g., for isolated faults that involve a single state- ment), they are often inadequate for more complex faults [4]. Faults of omission, for instance, where part of a specification has not been implemented, are notoriously problematic for debugging techniques that attempt to identify potentially faulty statements. The usefulness of code-centric techniques is also limited in the case of long-running programs and pro- grams that process large amounts of information; failures in these types of programs are typically di⌅cult to understand without considering the data involved in such failures. To debug failures more e ectively, it is necessary to pro- vide developers with not only a relevant subset of state- ments, but also a relevant subset of inputs. There are only a few existing techniques that attempt to identify relevant inputs [3, 17, 25], with delta debugging [25] being the most known of these. Although delta debugging has been shown to be an e ective technique for automatic debugging, it also has several drawbacks that may limit its usefulness in prac- tice. In particular, it requires (1) multiple executions of the program being debugged, which can involve a long running time, and (2) complex oracles and setup, which can result in a large amount of manual e ort [2]. In this paper, we present a novel debugging technique that addresses many of the limitations of existing approaches. Our technique can complement code-centric debugging tech- niques because it focuses on identifying program inputs that are likely to be relevant for a given failure. It also overcomes some of the drawbacks of delta debugging because it needs a single execution to identify failure-relevant inputs and re- quires minimal manual e ort. Given an observable faulty behavior and a set of failure- inducing inputs (i.e., a set of inputs that cause such behav- ior), our technique automatically identifies failure-relevant inputs (i.e., a subset of failure-inducing inputs that are ac- tually relevant for investigating the faulty behavior). Our approach is based on dynamic tainting. Intuitively, the tech- nique works by tracking the flow of inputs along data and control dependences at runtime. When a point of failure is reached, the tracked information is used to identify and present to developers the failure-relevant inputs. At this point, developers can use the identified inputs to investigate the failure at hand. LEAKPOINT: Pinpointing the Causes of Memory Leaks James Clause College of Computing Georgia Institute of Technology clause@cc.gatech.edu Alessandro Orso College of Computing Georgia Institute of Technology orso@cc.gatech.edu ABSTRACT Most existing leak detection techniques for C and C++ applications only detect the existence of memory leaks. They do not provide any help for fixing the underlying memory management errors. In this paper, we present a new technique that not only detects leaks, but also points developers to the locations where the underlying errors may be fixed. Our technique tracks pointers to dynamically- allocated areas of memory and, for each memory area, records sev- eral pieces of relevant information. This information is used to identify the locations in an execution where memory leaks occur. To investigate our technique’s feasibility and usefulness, we devel- oped a prototype tool called LEAKPOINT and used it to perform an empirical evaluation. The results of this evaluation show that LEAKPOINT detects at least as many leaks as existing tools, reports zero false positives, and, most importantly, can be effective at help- ing developers fix the underlying memory management errors. Categories and Subject Descriptors D.2.5 [Software Engineering]: Testing and Debugging General Terms Performance, Reliability Keywords Leak detection, Dynamic tainting 1. INTRODUCTION Memory leaks are a type of unintended memory consumption that can adversely impact the performance and correctness of an application. In programs written in languages such as C and C++, memory is allocated using allocation functions, such as malloc and new. Allocation functions reserve a currently free area of memory m and return a pointer p that points to m’s starting ad- dress. Typically, the program stores and then uses p, or another This work was supported in part by NSF awards CCF-0725202 and CCF-0541080 to Georgia Tech. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ICSE ’10, May 2-8 2010, Cape Town, South Africa Copyright 2010 ACM 978-1-60558-719-6/10/05 ...$10.00. pointer derived from p, to interact with m. When m is no longer needed, the program should pass p to a deallocation function (e.g., free or delete) to deallocate m. A leak occurs if, due to a memory management error, m is not deallocated at the appropri- ate time. There are two types of memory leaks: lost memory and forgotten memory. Lost memory refers to the situation where m be- comes unreachable (i.e., the program overwrites or loses p and all pointers derived from p) without first being deallocated. Forgotten memory refers to the situation where m remains reachable but is not deallocated or accessed in the rest of the execution. Memory leaks are relevant for several reasons. First, they are dif- ficult to detect. Unlike many other types of failures, memory leaks do not immediately produce an easily visible symptom (e.g., a crash or the output of a wrong value); typically, leaks remain unobserved until they consume a large portion of the memory available to a sys- tem. Second, leaks have the potential to impact not only the appli- cation that leaks memory, but also every other application running on the system; because the overall amount of memory is limited, as the memory usage of a leaking program increases, less memory is available to other running applications. Consequently, the per- formance and correctness of every running application can be im- pacted by a program that leaks memory. Third, leaks are common, even in mature applications. For example, in the first half of 2009, over 100 leaks in the Firefox web-browser were reported [18]. Because of the serious consequences and common occurrence of memory leaks, researchers have created many static and dynamic techniques for detecting them (e.g., [1,2,4,7–14,16,17,20–23,25, 27,28]). The adoption of static techniques has been limited by sev- eral factors, including the lack of scalable, precise heap modeling. Dynamic techniques are therefore more widely used in practice. In general, dynamic techniques provide one main piece of informa- tion: the location in an execution where a leaked area of memory is allocated. This location is supposed to serve as a starting point for investigating the leak. However, in many situations, this informa- tion does not provide any insight on where or how to fix the mem- ory management error that causes the leak: the allocation location and the location of the memory management error are typically in completely different parts of the application’s code. To address this limitation of existing approaches, we propose a new memory leak detection technique. Our technique provides the same information as existing techniques but also identifies the locations in an execution where leaks occur. In the case of lost memory, the location is defined as the point in an execution where the last pointer to an unallocated memory area is lost or overwritten. In the case of forgotten memory, the location is defined as the last point in an execution where a pointer to a leaked area of memory was used (e.g., when it is dereferenced to read or write memory, passed as a function argument, returned from a function, or used as Camouflage: Automated Sanitization of Field Data James Clause College of Computing Georgia Institute of Technology clause@cc.gatech.edu Alessandro Orso College of Computing Georgia Institute of Technology orso@cc.gatech.edu ABSTRACT Privacy and security concerns have adversely a ected the usefulness of many types of techniques that leverage infor- mation gathered from deployed applications. To address this issue, we present a new approach for automatically sanitiz- ing failure-inducing inputs. Given an input I that causes a failure f, our technique can generate a sanitized input I that is di erent from I but still causes f. I can then be sent to the developers to help them debug f, without revealing the possibly sensitive information contained in I. We im- plemented our approach in a prototype tool, camouflage, and performed an empirical evaluation. In the evaluation, we applied camouflage to a large set of failure-inducing inputs for several real applications. The results of the eval- uation are promising; they show that camouflage is both practical and e ective at generating sanitized inputs. In par- ticular, for the inputs that we considered, I and I shared no sensitive information. 1. INTRODUCTION Investigating techniques that capture data from deployed applications to support in-house software engineering tasks is an increasingly active and successful area of research (e.g., [1,3–5,13,14,17,21,22,26,27,29]). However, privacy and se- curity concerns have prevented widespread adoption of many of these techniques and, because they rely on user partici- pation, have ultimately limited their usefulness. Many of the earlier proposed techniques attempt to sidestep these concerns by collecting only limited amounts of information (e.g., stack traces and register dumps [1, 3, 5] or sampled branch profiles [26,27]) and providing a privacy policy that specifies how the information will be used (e.g., [2,8]). Be- cause the types of information collected by these techniques are unlikely to be sensitive, users are more willing to trust developers. Moreover, because only a small amount of infor- mation is collected, it is feasible for users to manually inspect and sanitize such information before it is sent to developers. Unfortunately, recent research has shown that the e ec- tiveness of these techniques increases when they can lever- age large amounts of detailed information (e.g., complete execution recordings [4, 14] or path profiles [13, 24]). Since more detailed information is bound to contain sensitive data, users will most likely be unwilling to let developers collect such information. In addition, collecting large amounts of information would make it infeasible for users to sanitize the collected information by hand. To address this prob- lem, some of these techniques suggest using an input mini- mization approach (e.g., [6, 7, 35]) to reduce the number of failure-inducing inputs and, hopefully, eliminate some sensi- tive information. Input-minimization techniques, however, were not designed to specifically reduce sensitive inputs, so they can only eliminate sensitive data by chance. In or- der for techniques that leverage captured field information to become widely adopted and achieve their full potential, new approaches for addressing privacy and security concerns must be developed. In this paper, we present a novel technique that addresses privacy and security concerns by sanitizing information cap- tured from deployed applications. Our technique is designed to be used in conjunction with an execution capture/replay technique (e.g., [4, 14]). Given an execution recording that contains a captured failure-inducing input I = i1, i2, . . . in⇥ and terminates with a failure f, our technique replays the execution recording and leverages a specialized version of symbolic-execution to automatically produce I , a sanitized version of I, such that I (1) still causes f and (2) reveals as little information about I as possible. A modified execution recording where I replaces I can then be constructed and sent to the developers, who can use it to debug f. It is, in general, impossible to construct I such that it does not reveal any information about I while still caus- ing the same failure f. Typically, the execution of f would depend on the fact that some elements of I have specific values (e.g., i1 must be 0 for the failing path to be taken). However, this fact does not prevent the technique from be- ing useful in practice. In our evaluation, we found that the information revealed by the sanitized inputs was not sensi- tive and tended to be structural in nature (e.g., a specific portion of the input must be surrounded by double quotes). Conversely, the parts of the inputs that were more likely to be sensitive (e.g., values contained inside the double quotes) were not revealed (see Section 4). To evaluate the e ectiveness of our technique, we imple- mented it in a prototype tool, called camouflage, and car- ried out an empirical evaluation of 170 failure-inducing in- 1 CC 05 ICSE 05 ICSE 07 ISSTA 07 ASE 07 ISSTA 09 ICSE 10 Tech Rept RESEARCH OVERVIEW
  • 4. Efficient instrumentation Jazz: A Tool for Demand-Driven Structural Testing Jonathan Misurda1 , Jim Clause1 , Juliya Reed1 , Bruce R. Childers1 , and Mary Lou So a2 1 University of Pittsburgh, Pittsburgh PA 15260, USA, {jmisurda,clausej,juliya,childers}@cs.pitt.edu 2 University of Virginia, Charlottesville VA 22904, USA, soffa@cs.virginia.edu Abstract. Software testing to produce reliable and robust software has become vitally important. Testing is a process by which quality can be assured through the collection of information about software. While test- ing can improve software quality, current tools typically are inflexible and have high overheads, making it a challenge to test large projects. We describe a new scalable and flexible tool, called Jazz, that uses a demand-driven structural testing approach. Jazz has a low overhead of only 17.6% for branch testing. 1 Introduction In the last several years, the importance of producing high quality and robust software has become paramount. Testing is an important process to support quality assurance by gathering information about the software being developed or modified. It is, in general, extremely labor and resource intensive, accounting for 50-60% of the total cost of software development [1]. The increased emphasis on software quality and robustness mandates improved testing methodologies. To test software, a number of techniques can be applied. One class of tech- niques is structural testing, which checks that a given coverage criterion is sat- isfied. For example, branch testing checks that a certain percentage of branches are executed. Other structural tests include def-use testing in which pairs of variable definitions and uses are checked for coverage and node testing in which nodes in a program’s control flow graph are checked. Unfortunately, structural testing is often hindered by the lack of scalable and flexible tools. Current tools are not scalable in terms of both time and memory, limiting the number and scope of the tests that can be applied to large programs. These tools often modify the software binary to insert instrumentation for testing. In this case, the tested version of the application is not the same version that is shipped to customers and errors may remain. Testing tools are usually inflexible and only implement certain types of testing. For example, many tools implement branch testing, but do not implement node or def-use testing. In this paper, we describe a new tool for structural testing, called Jazz, that addresses these problems. Jazz uses a novel demand-driven technique to apply ABSTRACT Producing reliable and robust software has become one of the most important software development concerns in recent years. Testing is a process by which software quality can be assured through the collection of infor- mation. While testing can improve software reliability, current tools typically are inflexible and have high over- heads, making it challenging to test large software projects. In this paper, we describe a new scalable and flexible framework for testing programs with a novel demand-driven approach based on execution paths to implement test coverage. This technique uses dynamic instrumentation on the binary code that can be inserted and removed on-the-fly to keep performance and mem- ory overheads low. We describe and evaluate implemen- tations of the framework for branch, node and def-use testing of Java programs. Experimental results for branch testing show that our approach has, on average, a 1.6 speed up over static instrumentation and also uses less memory. Categories and Subject Descriptors D.2.5. [Software Engineering]: Testing and Debug- ging—Testing tools; D.3.3. [Programming Lan- guages]: Language Constructs and Features—Program instrumentation, run-time environments General Terms Experimentation, Measurement, Verification Keywords Testing, Code Coverage, Structural Testing, Demand- Driven Instrumentation, Java Programming Language 1. INTRODUCTION In the last several years, the importance of produc- ing high quality and robust software has become para- mount [15]. Testing is an important process to support quality assurance by gathering information about the behavior of the software being developed or modified. It is, in general, extremely labor and resource intensive, accounting for 50-60% of the total cost of software development [17]. Given the importance of testing, it is imperative that there are appropriate testing tools and frameworks. In order to adequately test software, a number of different testing techniques must be per- formed. One class of testing techniques used extensively is structural testing in which properties of the software code are used to ensure a certain code coverage.Struc- tural testing techniques include branch testing, node testing, path testing, and def-use testing [6,7,8,17,19]. Typically, a testing tool targets one type of struc- tural test, and the software unit is the program, file or particular methods. In order to apply various structural testing techniques, different tools must be used. If a tool for a particular type of structural testing is not available, the tester would need to either implement it or not use that testing technique. The tester would also be con- strained by the region of code to be tested, as deter- mined by the tool implementor. For example, it may not be possible for the tester to focus on a particular region of code, such as a series of loops, complicated condi- tionals, or particular variables if def-use testing is desired. The user may want to have higher coverage on frequently executed regions of code. Users may want to define their own way of testing. For example, all branches should be covered 10 times rather than once in all loops. In structural testing, instrumentation is placed at certain code points (probes). Whenever such a program point is reached, code that performs the function for the test (payload) is executed. The probes in def-use testing are dictated by the definitions and uses of variables and the payload is to mark that a definition or use in a def- use pair has been covered. Thus for each type of struc- tural testing, there is a testing “plan”. A test plan is a Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ICSE'05, May 15-21, 2005, St. Louis, Missouri, USA. Copyright 2005 ACM 1-58113-963-2/05/0005...$5.00. Demand-Driven Structural Testing with Dynamic Instrumentation Jonathan Misurda†, James A. Clause†, Juliya L. Reed†, Bruce R. Childers†, and Mary Lou Soffa‡ †Department of Computer Science University of Pittsburgh Pittsburgh, Pennsylvania 15260 {jmisurda, clausej, juliya, childers}@cs.pitt.edu ‡Department of Computer Science University of Virginia Charlottesville, Virginia 22904 soffa@cs.virginia.edu 156 A Technique for Enabling and Supporting Debugging of Field Failures James Clause and Alessandro Orso College of Computing Georgia Institute of Technology {clause, orso}@cc.gatech.edu Abstract It is difficult to fully assess the quality of software in- house, outside the actual time and context in which it will execute after deployment. As a result, it is common for software to manifest field failures, failures that occur on user machines due to untested behavior. Field failures are typically difficult to recreate and investigate on developer platforms, and existing techniques based on crash report- ing provide only limited support for this task. In this pa- per, we present a technique for recording, reproducing, and minimizing failing executions that enables and supports in- house debugging of field failures. We also present a tool that implements our technique and an empirical study that evaluates the technique on a widely used e-mail client. 1. Introduction Quality-assurance activities, such as software testing and analysis, are notoriously difficult, expensive, and time- consuming. As a result, software products are often re- leased with faults or missing functionality. In fact, real- world examples of field failures experienced by users be- cause of untested behaviors (e.g., due to unforeseen us- ages), are countless. When field failures occur, it is im- portant for developers to be able to recreate and investigate them in-house. This pressing need is demonstrated by the emergence of several crash-reporting systems, such as Mi- crosoft’s error reporting systems [13] and Apple’s Crash Reporter [1]. Although these techniques represent a first important step in addressing the limitations of purely in- house approaches to quality assurance, they work on lim- ited data (typically, a snapshot of the execution state) and can at best identify correlations between a crash report and data on other known failures. In this paper, we present a novel technique for reproduc- ing and investigating field failures that addresses the limita- tions of existing approaches. Our technique works in three phases, intuitively illustrated by the scenario in Figure 1. In the recording phase, while users run the software, the tech- nique intercepts and logs the interactions between applica- tion and environment and records portions of the environ- ment that are relevant to these interactions. If the execution terminates with a failure, the produced execution recording is stored for later investigation. In the minimization phase, using free cycles on the user machines, the technique re- plays the recorded failing executions with the goal of au- tomatically eliminating parts of the executions that are not relevant to the failure. In the replay and debugging phase, developers can use the technique to replay the minimized failing executions and investigate the cause of the failures (e.g., within a debugger). Being able to replay and debug real field failures can give developers unprecedented insight into the behavior of their software after deployment and op- portunities to improve the quality of their software in ways that were not possible before. To evaluate our technique, we implemented it in a proto- type tool, called ADDA (Automated Debugging of Deployed Applications), and used the tool to perform an empirical study. The study was performed on PINE [19], a widely- used e-mail client, and involved the investigation of failures caused by two real faults in PINE. The results of the study are promising. Our technique was able to (1) record all ex- ecutions of PINE (and two other subjects) with a low time and space overhead, (2) completely replay all recorded exe- cutions, and (3) perform automated minimization of failing executions and obtain shorter executions that manifested the same failures as the original executions. Moreover, we were able to replay the minimized executions within a debugger, which shows that they could have actually been used to in- vestigate the failures. The contributions of this paper are: • A novel technique for recording and later replaying exe- cutions of deployed programs. • An approach for minimizing failing executions and gen- erating shorter executions that fail for the same reasons. • A prototype tool that implements our technique. • An empirical study that shows the feasibility and effec- tiveness of the approach. 29th International Conference on Software Engineering (ICSE'07) 0-7695-2828-7/07 $20.00 © 2007 Dytan: A Generic Dynamic Taint Analysis Framework James Clause, Wanchun Li, and Alessandro Orso College of Computing Georgia Institute of Technology {clause|wli7|orso}@cc.gatech.edu ABSTRACT Dynamic taint analysis is gaining momentum. Techniques based on dynamic tainting have been successfully used in the context of application security, and now their use is also being explored in dif- ferent areas, such as program understanding, software testing, and debugging. Unfortunately, most existing approaches for dynamic tainting are defined in an ad-hoc manner, which makes it difficult to extend them, experiment with them, and adapt them to new con- texts. Moreover, most existing approaches are focused on data-flow based tainting only and do not consider tainting due to control flow, which limits their applicability outside the security domain. To address these limitations and foster experimentation with dynamic tainting techniques, we defined and developed a general framework for dynamic tainting that (1) is highly flexible and customizable, (2) allows for performing both data-flow and control-flow based taint- ing conservatively, and (3) does not rely on any customized run- time system. We also present DYTAN, an implementation of our framework that works on x86 executables, and a set of preliminary studies that show how DYTAN can be used to implement different tainting-based approaches with limited effort. In the studies, we also show that DYTAN can be used on real software, by using FIRE- FOX as one of our subjects, and illustrate how the specific char- acteristics of the tainting approach used can affect efficiency and accuracy of the taint analysis, which further justifies the use of our framework to experiment with different variants of an approach. Categories and Subject Descriptors: D.2.5 [Software Engineer- ing]: Testing and Debugging; General Terms: Experimentation, Security Keywords: Dynamic tainting, information flow, general framework 1. INTRODUCTION Dynamic taint analysis (also known as dynamic information flow analysis) consists, intuitively, in marking and tracking certain data in a program at run-time. This type of dynamic analysis is be- coming increasingly popular. In the context of application secu- rity, dynamic-tainting approaches have been successfully used to prevent a wide range of attacks, including buffer overruns (e.g., [8, 17]), format string attacks (e.g., [17, 21]), SQL and command in- jections (e.g., [7, 19]), and cross-site scripting (e.g., [18]). More recently, researchers have started to investigate the use of tainting- based approaches in domains other than security, such as program understanding, software testing, and debugging (e.g., [11, 13]). Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ISSTA’07, July 9–12, 2007, London, England, United Kingdom. Copyright 2007 ACM 978-1-59593-734-6/07/0007 ...$5.00. Unfortunately, most existing techniques and tools for dynamic taint analysis are defined in an ad-hoc manner, to target a specific problem or a small class of problems. It would be difficult to ex- tend or adapt such techniques and tools so that they can be used in other contexts. In particular, most existing approaches are focused on data-flow based tainting only, and do not consider tainting due to the control flow within an application, which limits their general applicability. Also, most existing techniques support either a sin- gle taint marking or a small, fixed number of markings, which is problematic in applications such as debugging. Finally, almost no existing technique handles the propagation of taint markings in a truly conservative way, which may be appropriate for the specific applications considered, but is problematic in general. Because de- veloping support for dynamic taint analysis is not only time con- suming, but also fairly complex, this lack of flexibility and gener- ality of existing tools and techniques is especially limiting for this type of dynamic analysis. To address these limitations and foster experimentation with dy- namic tainting techniques, in this paper we present a framework for dynamic taint analysis. We designed the framework to be general and flexible, so that it allows for implementing different kinds of techniques based on dynamic taint analysis with little effort. Users can leverage the framework to quickly develop prototypes for their techniques, experiment with them, and investigate trade-offs of dif- ferent alternatives. For a simple example, the framework could be used to investigate the cost effectiveness of considering different types of taint propagation for an application. Our framework has several advantages over existing approaches. First, it is highly flexible and customizable. It allows for easily specifying which program data should be tainted and how, how taint markings should be propagated at run-time, and where and how taint markings should be checked. Second, it allows for performing data-flow and both data-flow and control-flow based tainting. Third, from a more practical standpoint, it works on binaries, does not need access to source code, and does not rely on any customized hardware or operating system, which makes it broadly applicable. We also present DYTAN, an implementation of our framework that works on x86 binaries, and a set of preliminary studies per- formed using DYTAN. In the first set of studies, we report on our experience in using DYTAN to implement two tainting-based ap- proaches presented in the literature. Although preliminary, our ex- perience shows that we were able to implement these approaches completely and with little effort. The second set of studies illus- trates how the specific characteristics of a tainting approach can affect efficiency and accuracy of the taint analysis. In particular, we investigate how ignoring control-flow related propagation and over- looking some data-flow aspects can lead to unsafety. These results further justify the usefulness of experimenting with different varia- tions of dynamic taint analysis and assessing their tradeoffs, which can be done with limited effort using our framework. The second set of studies also shows the practical applicability of DYTAN, by successfully running it on the FIREFOX web browser. 196 Effective Memory Protection Using Dynamic Tainting James Clause, Ioannis Doudalis, Alessandro Orso, and Milos Prvulovic College of Computing Georgia Institute of Technology {clause|idoud|orso|milos}@cc.gatech.edu ABSTRACT Programs written in languages that provide direct access to memory through pointers often contain memory-related faults, which may cause non-deterministic failures and even security vulnerabilities. In this paper, we present a new technique based on dynamic taint- ing for protecting programs from illegal memory accesses. When memory is allocated, at runtime, our technique taints both the mem- ory and the corresponding pointer using the same taint mark. Taint marks are then suitably propagated while the program executes and are checked every time a memory address m is accessed through a pointer p; if the taint marks associated with m and p differ, the ex- ecution is stopped and the illegal access is reported. To allow for a low-overhead, hardware-assisted implementation of the approach, we make several key technical and engineering decisions in the definition of our technique. In particular, we use a configurable, low number of reusable taint marks instead of a unique mark for each area of memory allocated, which reduces the overhead of the approach without limiting its flexibility and ability to target most memory-related faults and attacks known to date. We also define the technique at the binary level, which lets us handle the (very) common case of applications that use third-party libraries whose source code is unavailable. To investigate the effectiveness and practicality of our approach, we implemented it for heap-allocated memory and performed a preliminary empirical study on a set of programs. Our results show that (1) our technique can identify a large class of memory-related faults, even when using only two unique taint marks, and (2) a hardware-assisted implementation of the technique could achieve overhead in the single digits. Categories and Subject Descriptors: D.2.5 [Software Engineering]: Test- ing and Debugging; C.0 [General]: Hardware/Software Interfaces; General Terms: Performance, Security Keywords: Illegal memory accesses, dynamic tainting, hardware support 1. INTRODUCTION Memory-related faults are a serious problem for languages that allow direct memory access through pointers. An important class of memory-related faults are what we call illegal memory accesses. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ASE’07, November 5–9, 2007, Atlanta, Georgia, USA. Copyright 2007 ACM 978-1-59593-882-4/07/0011 ...$5.00. In languages such as C and C++, when memory allocation is re- quested, a currently-free area of memory m of the specified size is reserved. After m has been allocated, its initial address can be assigned to a pointer p, either immediately (e.g., in the case of heap allocated memory) or at a later time (e.g., when retrieving and storing the address of a local variable). From that point on, the only legal accesses to m through a pointer are accesses per- formed through p or through other pointers derived from p. (In Section 3, we clearly define what it means to derive a pointer from another pointer.) All other accesses to m are Illegal Memory Ac- cesses (IMAs), that is, accesses where a pointer is used to access memory outside the bounds of the memory area with which it was originally associated. IMAs are especially relevant for several reasons. First, they are caused by typical programming errors, such as array-out-of-bounds accesses and NULL pointer dereferences, and are thus widespread and common. Second, they often result in non-deterministic fail- ures that are hard to identify and diagnose; the specific effects of an IMA depend on several factors, such as memory layout, that may vary between executions. Finally, many security concerns such as viruses, worms, and rootkits use IMAs as their injection vectors. In this paper, we present a new dynamic technique for protecting programs against IMAs that is effective against most known types of illegal accesses. The basic idea behind the technique is to use dynamic tainting (or dynamic information flow) [8] to keep track of which memory areas can be accessed through which pointers, as follows. At runtime, our technique taints both allocated mem- ory and pointers using taint marks. Dynamic taint propagation, to- gether with a suitable handling of memory-allocation and deallo- cation operations, ensures that taint marks are appropriately prop- agated during execution. Every time the program accesses some memory through a pointer, our technique checks whether the ac- cess is legal by comparing the taint mark associated with the mem- ory and the taint mark associated with the pointer used to access it. If the marks match, the access is considered legitimate. Otherwise, the execution is stopped and an IMA is reported. In defining our approach, our final goal is the development of a low-overhead, hardware-assisted tool that is practical and can be used on deployed software. A hardware-assisted tool is a tool that leverages the benefits of both hardware and software. Typically, some performance critical aspects are moved to the hardware to achieve maximum efficiency, while software is used to perform op- erations that would be too complex to implement in hardware. There are two main characteristics of our approach that were de- fined to help achieve our goal of a hardware-assisted implementa- tion. The first characteristic is that our technique only uses a small, configurable number of reusable taint marks instead of a unique mark for each area of memory allocated. Using a low number of 283 Penumbra: Automatically Identifying Failure-Relevant Inputs Using Dynamic Tainting James Clause College of Computing Georgia Institute of Technology clause@cc.gatech.edu Alessandro Orso College of Computing Georgia Institute of Technology orso@cc.gatech.edu ABSTRACT Most existing automated debugging techniques focus on re- ducing the amount of code to be inspected and tend to ig- nore an important component of software failures: the in- puts that cause the failure to manifest. In this paper, we present a new technique based on dynamic tainting for au- tomatically identifying subsets of a program’s inputs that are relevant to a failure. The technique (1) marks program inputs when they enter the application, (2) tracks them as they propagate during execution, and (3) identifies, for an observed failure, the subset of inputs that are potentially relevant for debugging that failure. To investigate feasibil- ity and usefulness of our technique, we created a prototype tool, penumbra, and used it to evaluate our technique on several failures in real programs. Our results are promising, as they show that penumbra can point developers to inputs that are actually relevant for investigating a failure and can be more practical than existing alternative approaches. Categories and Subject Descriptors D.2.5 [Software Engineering]: Testing and Debugging General Terms Algorithms, Experimentation, Reliability Keywords Failure-relevant inputs, automated debugging, dynamic in- formation flow, dynamic tainting 1. INTRODUCTION Debugging is known to be a labor-intensive, time-consum- ing task that can be responsible for a large portion of soft- ware development and maintenance costs [21,23]. Common characteristics of modern software, such as increased con- figurability, larger code bases, and increased input sizes, in- troduce new challenges for debugging and exacerbate exist- ing problems. In response, researchers have proposed many Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ISSTA’09, July 19–23, 2009, Chicago, Illinois, USA. Copyright 2009 ACM 978-1-60558-338-9/09/07 ...$5.00. semi- and fully-automated techniques that attempt to re- duce the cost of debugging (e.g., [8,9,11–13,18,24,25,27]). The majority of these techniques are code-centric in that they focus exclusively on one aspect of debugging—trying to identify the faulty statements responsible for a failure. Although code-centric approaches can work well in some cases (e.g., for isolated faults that involve a single state- ment), they are often inadequate for more complex faults [4]. Faults of omission, for instance, where part of a specification has not been implemented, are notoriously problematic for debugging techniques that attempt to identify potentially faulty statements. The usefulness of code-centric techniques is also limited in the case of long-running programs and pro- grams that process large amounts of information; failures in these types of programs are typically di⌅cult to understand without considering the data involved in such failures. To debug failures more e ectively, it is necessary to pro- vide developers with not only a relevant subset of state- ments, but also a relevant subset of inputs. There are only a few existing techniques that attempt to identify relevant inputs [3, 17, 25], with delta debugging [25] being the most known of these. Although delta debugging has been shown to be an e ective technique for automatic debugging, it also has several drawbacks that may limit its usefulness in prac- tice. In particular, it requires (1) multiple executions of the program being debugged, which can involve a long running time, and (2) complex oracles and setup, which can result in a large amount of manual e ort [2]. In this paper, we present a novel debugging technique that addresses many of the limitations of existing approaches. Our technique can complement code-centric debugging tech- niques because it focuses on identifying program inputs that are likely to be relevant for a given failure. It also overcomes some of the drawbacks of delta debugging because it needs a single execution to identify failure-relevant inputs and re- quires minimal manual e ort. Given an observable faulty behavior and a set of failure- inducing inputs (i.e., a set of inputs that cause such behav- ior), our technique automatically identifies failure-relevant inputs (i.e., a subset of failure-inducing inputs that are ac- tually relevant for investigating the faulty behavior). Our approach is based on dynamic tainting. Intuitively, the tech- nique works by tracking the flow of inputs along data and control dependences at runtime. When a point of failure is reached, the tracked information is used to identify and present to developers the failure-relevant inputs. At this point, developers can use the identified inputs to investigate the failure at hand. LEAKPOINT: Pinpointing the Causes of Memory Leaks James Clause College of Computing Georgia Institute of Technology clause@cc.gatech.edu Alessandro Orso College of Computing Georgia Institute of Technology orso@cc.gatech.edu ABSTRACT Most existing leak detection techniques for C and C++ applications only detect the existence of memory leaks. They do not provide any help for fixing the underlying memory management errors. In this paper, we present a new technique that not only detects leaks, but also points developers to the locations where the underlying errors may be fixed. Our technique tracks pointers to dynamically- allocated areas of memory and, for each memory area, records sev- eral pieces of relevant information. This information is used to identify the locations in an execution where memory leaks occur. To investigate our technique’s feasibility and usefulness, we devel- oped a prototype tool called LEAKPOINT and used it to perform an empirical evaluation. The results of this evaluation show that LEAKPOINT detects at least as many leaks as existing tools, reports zero false positives, and, most importantly, can be effective at help- ing developers fix the underlying memory management errors. Categories and Subject Descriptors D.2.5 [Software Engineering]: Testing and Debugging General Terms Performance, Reliability Keywords Leak detection, Dynamic tainting 1. INTRODUCTION Memory leaks are a type of unintended memory consumption that can adversely impact the performance and correctness of an application. In programs written in languages such as C and C++, memory is allocated using allocation functions, such as malloc and new. Allocation functions reserve a currently free area of memory m and return a pointer p that points to m’s starting ad- dress. Typically, the program stores and then uses p, or another This work was supported in part by NSF awards CCF-0725202 and CCF-0541080 to Georgia Tech. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ICSE ’10, May 2-8 2010, Cape Town, South Africa Copyright 2010 ACM 978-1-60558-719-6/10/05 ...$10.00. pointer derived from p, to interact with m. When m is no longer needed, the program should pass p to a deallocation function (e.g., free or delete) to deallocate m. A leak occurs if, due to a memory management error, m is not deallocated at the appropri- ate time. There are two types of memory leaks: lost memory and forgotten memory. Lost memory refers to the situation where m be- comes unreachable (i.e., the program overwrites or loses p and all pointers derived from p) without first being deallocated. Forgotten memory refers to the situation where m remains reachable but is not deallocated or accessed in the rest of the execution. Memory leaks are relevant for several reasons. First, they are dif- ficult to detect. Unlike many other types of failures, memory leaks do not immediately produce an easily visible symptom (e.g., a crash or the output of a wrong value); typically, leaks remain unobserved until they consume a large portion of the memory available to a sys- tem. Second, leaks have the potential to impact not only the appli- cation that leaks memory, but also every other application running on the system; because the overall amount of memory is limited, as the memory usage of a leaking program increases, less memory is available to other running applications. Consequently, the per- formance and correctness of every running application can be im- pacted by a program that leaks memory. Third, leaks are common, even in mature applications. For example, in the first half of 2009, over 100 leaks in the Firefox web-browser were reported [18]. Because of the serious consequences and common occurrence of memory leaks, researchers have created many static and dynamic techniques for detecting them (e.g., [1,2,4,7–14,16,17,20–23,25, 27,28]). The adoption of static techniques has been limited by sev- eral factors, including the lack of scalable, precise heap modeling. Dynamic techniques are therefore more widely used in practice. In general, dynamic techniques provide one main piece of informa- tion: the location in an execution where a leaked area of memory is allocated. This location is supposed to serve as a starting point for investigating the leak. However, in many situations, this informa- tion does not provide any insight on where or how to fix the mem- ory management error that causes the leak: the allocation location and the location of the memory management error are typically in completely different parts of the application’s code. To address this limitation of existing approaches, we propose a new memory leak detection technique. Our technique provides the same information as existing techniques but also identifies the locations in an execution where leaks occur. In the case of lost memory, the location is defined as the point in an execution where the last pointer to an unallocated memory area is lost or overwritten. In the case of forgotten memory, the location is defined as the last point in an execution where a pointer to a leaked area of memory was used (e.g., when it is dereferenced to read or write memory, passed as a function argument, returned from a function, or used as Camouflage: Automated Sanitization of Field Data James Clause College of Computing Georgia Institute of Technology clause@cc.gatech.edu Alessandro Orso College of Computing Georgia Institute of Technology orso@cc.gatech.edu ABSTRACT Privacy and security concerns have adversely a ected the usefulness of many types of techniques that leverage infor- mation gathered from deployed applications. To address this issue, we present a new approach for automatically sanitiz- ing failure-inducing inputs. Given an input I that causes a failure f, our technique can generate a sanitized input I that is di erent from I but still causes f. I can then be sent to the developers to help them debug f, without revealing the possibly sensitive information contained in I. We im- plemented our approach in a prototype tool, camouflage, and performed an empirical evaluation. In the evaluation, we applied camouflage to a large set of failure-inducing inputs for several real applications. The results of the eval- uation are promising; they show that camouflage is both practical and e ective at generating sanitized inputs. In par- ticular, for the inputs that we considered, I and I shared no sensitive information. 1. INTRODUCTION Investigating techniques that capture data from deployed applications to support in-house software engineering tasks is an increasingly active and successful area of research (e.g., [1,3–5,13,14,17,21,22,26,27,29]). However, privacy and se- curity concerns have prevented widespread adoption of many of these techniques and, because they rely on user partici- pation, have ultimately limited their usefulness. Many of the earlier proposed techniques attempt to sidestep these concerns by collecting only limited amounts of information (e.g., stack traces and register dumps [1, 3, 5] or sampled branch profiles [26,27]) and providing a privacy policy that specifies how the information will be used (e.g., [2,8]). Be- cause the types of information collected by these techniques are unlikely to be sensitive, users are more willing to trust developers. Moreover, because only a small amount of infor- mation is collected, it is feasible for users to manually inspect and sanitize such information before it is sent to developers. Unfortunately, recent research has shown that the e ec- tiveness of these techniques increases when they can lever- age large amounts of detailed information (e.g., complete execution recordings [4, 14] or path profiles [13, 24]). Since more detailed information is bound to contain sensitive data, users will most likely be unwilling to let developers collect such information. In addition, collecting large amounts of information would make it infeasible for users to sanitize the collected information by hand. To address this prob- lem, some of these techniques suggest using an input mini- mization approach (e.g., [6, 7, 35]) to reduce the number of failure-inducing inputs and, hopefully, eliminate some sensi- tive information. Input-minimization techniques, however, were not designed to specifically reduce sensitive inputs, so they can only eliminate sensitive data by chance. In or- der for techniques that leverage captured field information to become widely adopted and achieve their full potential, new approaches for addressing privacy and security concerns must be developed. In this paper, we present a novel technique that addresses privacy and security concerns by sanitizing information cap- tured from deployed applications. Our technique is designed to be used in conjunction with an execution capture/replay technique (e.g., [4, 14]). Given an execution recording that contains a captured failure-inducing input I = i1, i2, . . . in⇥ and terminates with a failure f, our technique replays the execution recording and leverages a specialized version of symbolic-execution to automatically produce I , a sanitized version of I, such that I (1) still causes f and (2) reveals as little information about I as possible. A modified execution recording where I replaces I can then be constructed and sent to the developers, who can use it to debug f. It is, in general, impossible to construct I such that it does not reveal any information about I while still caus- ing the same failure f. Typically, the execution of f would depend on the fact that some elements of I have specific values (e.g., i1 must be 0 for the failing path to be taken). However, this fact does not prevent the technique from be- ing useful in practice. In our evaluation, we found that the information revealed by the sanitized inputs was not sensi- tive and tended to be structural in nature (e.g., a specific portion of the input must be surrounded by double quotes). Conversely, the parts of the inputs that were more likely to be sensitive (e.g., values contained inside the double quotes) were not revealed (see Section 4). To evaluate the e ectiveness of our technique, we imple- mented it in a prototype tool, called camouflage, and car- ried out an empirical evaluation of 170 failure-inducing in- 1 CC 05 ICSE 05 ICSE 07 ISSTA 07 ASE 07 ISSTA 09 ICSE 10 Tech Rept Dynamic tainting based analyses RESEARCH OVERVIEW
  • 5. Efficient instrumentation Jazz: A Tool for Demand-Driven Structural Testing Jonathan Misurda1 , Jim Clause1 , Juliya Reed1 , Bruce R. Childers1 , and Mary Lou So a2 1 University of Pittsburgh, Pittsburgh PA 15260, USA, {jmisurda,clausej,juliya,childers}@cs.pitt.edu 2 University of Virginia, Charlottesville VA 22904, USA, soffa@cs.virginia.edu Abstract. Software testing to produce reliable and robust software has become vitally important. Testing is a process by which quality can be assured through the collection of information about software. While test- ing can improve software quality, current tools typically are inflexible and have high overheads, making it a challenge to test large projects. We describe a new scalable and flexible tool, called Jazz, that uses a demand-driven structural testing approach. Jazz has a low overhead of only 17.6% for branch testing. 1 Introduction In the last several years, the importance of producing high quality and robust software has become paramount. Testing is an important process to support quality assurance by gathering information about the software being developed or modified. It is, in general, extremely labor and resource intensive, accounting for 50-60% of the total cost of software development [1]. The increased emphasis on software quality and robustness mandates improved testing methodologies. To test software, a number of techniques can be applied. One class of tech- niques is structural testing, which checks that a given coverage criterion is sat- isfied. For example, branch testing checks that a certain percentage of branches are executed. Other structural tests include def-use testing in which pairs of variable definitions and uses are checked for coverage and node testing in which nodes in a program’s control flow graph are checked. Unfortunately, structural testing is often hindered by the lack of scalable and flexible tools. Current tools are not scalable in terms of both time and memory, limiting the number and scope of the tests that can be applied to large programs. These tools often modify the software binary to insert instrumentation for testing. In this case, the tested version of the application is not the same version that is shipped to customers and errors may remain. Testing tools are usually inflexible and only implement certain types of testing. For example, many tools implement branch testing, but do not implement node or def-use testing. In this paper, we describe a new tool for structural testing, called Jazz, that addresses these problems. Jazz uses a novel demand-driven technique to apply ABSTRACT Producing reliable and robust software has become one of the most important software development concerns in recent years. Testing is a process by which software quality can be assured through the collection of infor- mation. While testing can improve software reliability, current tools typically are inflexible and have high over- heads, making it challenging to test large software projects. In this paper, we describe a new scalable and flexible framework for testing programs with a novel demand-driven approach based on execution paths to implement test coverage. This technique uses dynamic instrumentation on the binary code that can be inserted and removed on-the-fly to keep performance and mem- ory overheads low. We describe and evaluate implemen- tations of the framework for branch, node and def-use testing of Java programs. Experimental results for branch testing show that our approach has, on average, a 1.6 speed up over static instrumentation and also uses less memory. Categories and Subject Descriptors D.2.5. [Software Engineering]: Testing and Debug- ging—Testing tools; D.3.3. [Programming Lan- guages]: Language Constructs and Features—Program instrumentation, run-time environments General Terms Experimentation, Measurement, Verification Keywords Testing, Code Coverage, Structural Testing, Demand- Driven Instrumentation, Java Programming Language 1. INTRODUCTION In the last several years, the importance of produc- ing high quality and robust software has become para- mount [15]. Testing is an important process to support quality assurance by gathering information about the behavior of the software being developed or modified. It is, in general, extremely labor and resource intensive, accounting for 50-60% of the total cost of software development [17]. Given the importance of testing, it is imperative that there are appropriate testing tools and frameworks. In order to adequately test software, a number of different testing techniques must be per- formed. One class of testing techniques used extensively is structural testing in which properties of the software code are used to ensure a certain code coverage.Struc- tural testing techniques include branch testing, node testing, path testing, and def-use testing [6,7,8,17,19]. Typically, a testing tool targets one type of struc- tural test, and the software unit is the program, file or particular methods. In order to apply various structural testing techniques, different tools must be used. If a tool for a particular type of structural testing is not available, the tester would need to either implement it or not use that testing technique. The tester would also be con- strained by the region of code to be tested, as deter- mined by the tool implementor. For example, it may not be possible for the tester to focus on a particular region of code, such as a series of loops, complicated condi- tionals, or particular variables if def-use testing is desired. The user may want to have higher coverage on frequently executed regions of code. Users may want to define their own way of testing. For example, all branches should be covered 10 times rather than once in all loops. In structural testing, instrumentation is placed at certain code points (probes). Whenever such a program point is reached, code that performs the function for the test (payload) is executed. The probes in def-use testing are dictated by the definitions and uses of variables and the payload is to mark that a definition or use in a def- use pair has been covered. Thus for each type of struc- tural testing, there is a testing “plan”. A test plan is a Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ICSE'05, May 15-21, 2005, St. Louis, Missouri, USA. Copyright 2005 ACM 1-58113-963-2/05/0005...$5.00. Demand-Driven Structural Testing with Dynamic Instrumentation Jonathan Misurda†, James A. Clause†, Juliya L. Reed†, Bruce R. Childers†, and Mary Lou Soffa‡ †Department of Computer Science University of Pittsburgh Pittsburgh, Pennsylvania 15260 {jmisurda, clausej, juliya, childers}@cs.pitt.edu ‡Department of Computer Science University of Virginia Charlottesville, Virginia 22904 soffa@cs.virginia.edu 156 A Technique for Enabling and Supporting Debugging of Field Failures James Clause and Alessandro Orso College of Computing Georgia Institute of Technology {clause, orso}@cc.gatech.edu Abstract It is difficult to fully assess the quality of software in- house, outside the actual time and context in which it will execute after deployment. As a result, it is common for software to manifest field failures, failures that occur on user machines due to untested behavior. Field failures are typically difficult to recreate and investigate on developer platforms, and existing techniques based on crash report- ing provide only limited support for this task. In this pa- per, we present a technique for recording, reproducing, and minimizing failing executions that enables and supports in- house debugging of field failures. We also present a tool that implements our technique and an empirical study that evaluates the technique on a widely used e-mail client. 1. Introduction Quality-assurance activities, such as software testing and analysis, are notoriously difficult, expensive, and time- consuming. As a result, software products are often re- leased with faults or missing functionality. In fact, real- world examples of field failures experienced by users be- cause of untested behaviors (e.g., due to unforeseen us- ages), are countless. When field failures occur, it is im- portant for developers to be able to recreate and investigate them in-house. This pressing need is demonstrated by the emergence of several crash-reporting systems, such as Mi- crosoft’s error reporting systems [13] and Apple’s Crash Reporter [1]. Although these techniques represent a first important step in addressing the limitations of purely in- house approaches to quality assurance, they work on lim- ited data (typically, a snapshot of the execution state) and can at best identify correlations between a crash report and data on other known failures. In this paper, we present a novel technique for reproduc- ing and investigating field failures that addresses the limita- tions of existing approaches. Our technique works in three phases, intuitively illustrated by the scenario in Figure 1. In the recording phase, while users run the software, the tech- nique intercepts and logs the interactions between applica- tion and environment and records portions of the environ- ment that are relevant to these interactions. If the execution terminates with a failure, the produced execution recording is stored for later investigation. In the minimization phase, using free cycles on the user machines, the technique re- plays the recorded failing executions with the goal of au- tomatically eliminating parts of the executions that are not relevant to the failure. In the replay and debugging phase, developers can use the technique to replay the minimized failing executions and investigate the cause of the failures (e.g., within a debugger). Being able to replay and debug real field failures can give developers unprecedented insight into the behavior of their software after deployment and op- portunities to improve the quality of their software in ways that were not possible before. To evaluate our technique, we implemented it in a proto- type tool, called ADDA (Automated Debugging of Deployed Applications), and used the tool to perform an empirical study. The study was performed on PINE [19], a widely- used e-mail client, and involved the investigation of failures caused by two real faults in PINE. The results of the study are promising. Our technique was able to (1) record all ex- ecutions of PINE (and two other subjects) with a low time and space overhead, (2) completely replay all recorded exe- cutions, and (3) perform automated minimization of failing executions and obtain shorter executions that manifested the same failures as the original executions. Moreover, we were able to replay the minimized executions within a debugger, which shows that they could have actually been used to in- vestigate the failures. The contributions of this paper are: • A novel technique for recording and later replaying exe- cutions of deployed programs. • An approach for minimizing failing executions and gen- erating shorter executions that fail for the same reasons. • A prototype tool that implements our technique. • An empirical study that shows the feasibility and effec- tiveness of the approach. 29th International Conference on Software Engineering (ICSE'07) 0-7695-2828-7/07 $20.00 © 2007 Dytan: A Generic Dynamic Taint Analysis Framework James Clause, Wanchun Li, and Alessandro Orso College of Computing Georgia Institute of Technology {clause|wli7|orso}@cc.gatech.edu ABSTRACT Dynamic taint analysis is gaining momentum. Techniques based on dynamic tainting have been successfully used in the context of application security, and now their use is also being explored in dif- ferent areas, such as program understanding, software testing, and debugging. Unfortunately, most existing approaches for dynamic tainting are defined in an ad-hoc manner, which makes it difficult to extend them, experiment with them, and adapt them to new con- texts. Moreover, most existing approaches are focused on data-flow based tainting only and do not consider tainting due to control flow, which limits their applicability outside the security domain. To address these limitations and foster experimentation with dynamic tainting techniques, we defined and developed a general framework for dynamic tainting that (1) is highly flexible and customizable, (2) allows for performing both data-flow and control-flow based taint- ing conservatively, and (3) does not rely on any customized run- time system. We also present DYTAN, an implementation of our framework that works on x86 executables, and a set of preliminary studies that show how DYTAN can be used to implement different tainting-based approaches with limited effort. In the studies, we also show that DYTAN can be used on real software, by using FIRE- FOX as one of our subjects, and illustrate how the specific char- acteristics of the tainting approach used can affect efficiency and accuracy of the taint analysis, which further justifies the use of our framework to experiment with different variants of an approach. Categories and Subject Descriptors: D.2.5 [Software Engineer- ing]: Testing and Debugging; General Terms: Experimentation, Security Keywords: Dynamic tainting, information flow, general framework 1. INTRODUCTION Dynamic taint analysis (also known as dynamic information flow analysis) consists, intuitively, in marking and tracking certain data in a program at run-time. This type of dynamic analysis is be- coming increasingly popular. In the context of application secu- rity, dynamic-tainting approaches have been successfully used to prevent a wide range of attacks, including buffer overruns (e.g., [8, 17]), format string attacks (e.g., [17, 21]), SQL and command in- jections (e.g., [7, 19]), and cross-site scripting (e.g., [18]). More recently, researchers have started to investigate the use of tainting- based approaches in domains other than security, such as program understanding, software testing, and debugging (e.g., [11, 13]). Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ISSTA’07, July 9–12, 2007, London, England, United Kingdom. Copyright 2007 ACM 978-1-59593-734-6/07/0007 ...$5.00. Unfortunately, most existing techniques and tools for dynamic taint analysis are defined in an ad-hoc manner, to target a specific problem or a small class of problems. It would be difficult to ex- tend or adapt such techniques and tools so that they can be used in other contexts. In particular, most existing approaches are focused on data-flow based tainting only, and do not consider tainting due to the control flow within an application, which limits their general applicability. Also, most existing techniques support either a sin- gle taint marking or a small, fixed number of markings, which is problematic in applications such as debugging. Finally, almost no existing technique handles the propagation of taint markings in a truly conservative way, which may be appropriate for the specific applications considered, but is problematic in general. Because de- veloping support for dynamic taint analysis is not only time con- suming, but also fairly complex, this lack of flexibility and gener- ality of existing tools and techniques is especially limiting for this type of dynamic analysis. To address these limitations and foster experimentation with dy- namic tainting techniques, in this paper we present a framework for dynamic taint analysis. We designed the framework to be general and flexible, so that it allows for implementing different kinds of techniques based on dynamic taint analysis with little effort. Users can leverage the framework to quickly develop prototypes for their techniques, experiment with them, and investigate trade-offs of dif- ferent alternatives. For a simple example, the framework could be used to investigate the cost effectiveness of considering different types of taint propagation for an application. Our framework has several advantages over existing approaches. First, it is highly flexible and customizable. It allows for easily specifying which program data should be tainted and how, how taint markings should be propagated at run-time, and where and how taint markings should be checked. Second, it allows for performing data-flow and both data-flow and control-flow based tainting. Third, from a more practical standpoint, it works on binaries, does not need access to source code, and does not rely on any customized hardware or operating system, which makes it broadly applicable. We also present DYTAN, an implementation of our framework that works on x86 binaries, and a set of preliminary studies per- formed using DYTAN. In the first set of studies, we report on our experience in using DYTAN to implement two tainting-based ap- proaches presented in the literature. Although preliminary, our ex- perience shows that we were able to implement these approaches completely and with little effort. The second set of studies illus- trates how the specific characteristics of a tainting approach can affect efficiency and accuracy of the taint analysis. In particular, we investigate how ignoring control-flow related propagation and over- looking some data-flow aspects can lead to unsafety. These results further justify the usefulness of experimenting with different varia- tions of dynamic taint analysis and assessing their tradeoffs, which can be done with limited effort using our framework. The second set of studies also shows the practical applicability of DYTAN, by successfully running it on the FIREFOX web browser. 196 Effective Memory Protection Using Dynamic Tainting James Clause, Ioannis Doudalis, Alessandro Orso, and Milos Prvulovic College of Computing Georgia Institute of Technology {clause|idoud|orso|milos}@cc.gatech.edu ABSTRACT Programs written in languages that provide direct access to memory through pointers often contain memory-related faults, which may cause non-deterministic failures and even security vulnerabilities. In this paper, we present a new technique based on dynamic taint- ing for protecting programs from illegal memory accesses. When memory is allocated, at runtime, our technique taints both the mem- ory and the corresponding pointer using the same taint mark. Taint marks are then suitably propagated while the program executes and are checked every time a memory address m is accessed through a pointer p; if the taint marks associated with m and p differ, the ex- ecution is stopped and the illegal access is reported. To allow for a low-overhead, hardware-assisted implementation of the approach, we make several key technical and engineering decisions in the definition of our technique. In particular, we use a configurable, low number of reusable taint marks instead of a unique mark for each area of memory allocated, which reduces the overhead of the approach without limiting its flexibility and ability to target most memory-related faults and attacks known to date. We also define the technique at the binary level, which lets us handle the (very) common case of applications that use third-party libraries whose source code is unavailable. To investigate the effectiveness and practicality of our approach, we implemented it for heap-allocated memory and performed a preliminary empirical study on a set of programs. Our results show that (1) our technique can identify a large class of memory-related faults, even when using only two unique taint marks, and (2) a hardware-assisted implementation of the technique could achieve overhead in the single digits. Categories and Subject Descriptors: D.2.5 [Software Engineering]: Test- ing and Debugging; C.0 [General]: Hardware/Software Interfaces; General Terms: Performance, Security Keywords: Illegal memory accesses, dynamic tainting, hardware support 1. INTRODUCTION Memory-related faults are a serious problem for languages that allow direct memory access through pointers. An important class of memory-related faults are what we call illegal memory accesses. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ASE’07, November 5–9, 2007, Atlanta, Georgia, USA. Copyright 2007 ACM 978-1-59593-882-4/07/0011 ...$5.00. In languages such as C and C++, when memory allocation is re- quested, a currently-free area of memory m of the specified size is reserved. After m has been allocated, its initial address can be assigned to a pointer p, either immediately (e.g., in the case of heap allocated memory) or at a later time (e.g., when retrieving and storing the address of a local variable). From that point on, the only legal accesses to m through a pointer are accesses per- formed through p or through other pointers derived from p. (In Section 3, we clearly define what it means to derive a pointer from another pointer.) All other accesses to m are Illegal Memory Ac- cesses (IMAs), that is, accesses where a pointer is used to access memory outside the bounds of the memory area with which it was originally associated. IMAs are especially relevant for several reasons. First, they are caused by typical programming errors, such as array-out-of-bounds accesses and NULL pointer dereferences, and are thus widespread and common. Second, they often result in non-deterministic fail- ures that are hard to identify and diagnose; the specific effects of an IMA depend on several factors, such as memory layout, that may vary between executions. Finally, many security concerns such as viruses, worms, and rootkits use IMAs as their injection vectors. In this paper, we present a new dynamic technique for protecting programs against IMAs that is effective against most known types of illegal accesses. The basic idea behind the technique is to use dynamic tainting (or dynamic information flow) [8] to keep track of which memory areas can be accessed through which pointers, as follows. At runtime, our technique taints both allocated mem- ory and pointers using taint marks. Dynamic taint propagation, to- gether with a suitable handling of memory-allocation and deallo- cation operations, ensures that taint marks are appropriately prop- agated during execution. Every time the program accesses some memory through a pointer, our technique checks whether the ac- cess is legal by comparing the taint mark associated with the mem- ory and the taint mark associated with the pointer used to access it. If the marks match, the access is considered legitimate. Otherwise, the execution is stopped and an IMA is reported. In defining our approach, our final goal is the development of a low-overhead, hardware-assisted tool that is practical and can be used on deployed software. A hardware-assisted tool is a tool that leverages the benefits of both hardware and software. Typically, some performance critical aspects are moved to the hardware to achieve maximum efficiency, while software is used to perform op- erations that would be too complex to implement in hardware. There are two main characteristics of our approach that were de- fined to help achieve our goal of a hardware-assisted implementa- tion. The first characteristic is that our technique only uses a small, configurable number of reusable taint marks instead of a unique mark for each area of memory allocated. Using a low number of 283 Penumbra: Automatically Identifying Failure-Relevant Inputs Using Dynamic Tainting James Clause College of Computing Georgia Institute of Technology clause@cc.gatech.edu Alessandro Orso College of Computing Georgia Institute of Technology orso@cc.gatech.edu ABSTRACT Most existing automated debugging techniques focus on re- ducing the amount of code to be inspected and tend to ig- nore an important component of software failures: the in- puts that cause the failure to manifest. In this paper, we present a new technique based on dynamic tainting for au- tomatically identifying subsets of a program’s inputs that are relevant to a failure. The technique (1) marks program inputs when they enter the application, (2) tracks them as they propagate during execution, and (3) identifies, for an observed failure, the subset of inputs that are potentially relevant for debugging that failure. To investigate feasibil- ity and usefulness of our technique, we created a prototype tool, penumbra, and used it to evaluate our technique on several failures in real programs. Our results are promising, as they show that penumbra can point developers to inputs that are actually relevant for investigating a failure and can be more practical than existing alternative approaches. Categories and Subject Descriptors D.2.5 [Software Engineering]: Testing and Debugging General Terms Algorithms, Experimentation, Reliability Keywords Failure-relevant inputs, automated debugging, dynamic in- formation flow, dynamic tainting 1. INTRODUCTION Debugging is known to be a labor-intensive, time-consum- ing task that can be responsible for a large portion of soft- ware development and maintenance costs [21,23]. Common characteristics of modern software, such as increased con- figurability, larger code bases, and increased input sizes, in- troduce new challenges for debugging and exacerbate exist- ing problems. In response, researchers have proposed many Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ISSTA’09, July 19–23, 2009, Chicago, Illinois, USA. Copyright 2009 ACM 978-1-60558-338-9/09/07 ...$5.00. semi- and fully-automated techniques that attempt to re- duce the cost of debugging (e.g., [8,9,11–13,18,24,25,27]). The majority of these techniques are code-centric in that they focus exclusively on one aspect of debugging—trying to identify the faulty statements responsible for a failure. Although code-centric approaches can work well in some cases (e.g., for isolated faults that involve a single state- ment), they are often inadequate for more complex faults [4]. Faults of omission, for instance, where part of a specification has not been implemented, are notoriously problematic for debugging techniques that attempt to identify potentially faulty statements. The usefulness of code-centric techniques is also limited in the case of long-running programs and pro- grams that process large amounts of information; failures in these types of programs are typically di⌅cult to understand without considering the data involved in such failures. To debug failures more e ectively, it is necessary to pro- vide developers with not only a relevant subset of state- ments, but also a relevant subset of inputs. There are only a few existing techniques that attempt to identify relevant inputs [3, 17, 25], with delta debugging [25] being the most known of these. Although delta debugging has been shown to be an e ective technique for automatic debugging, it also has several drawbacks that may limit its usefulness in prac- tice. In particular, it requires (1) multiple executions of the program being debugged, which can involve a long running time, and (2) complex oracles and setup, which can result in a large amount of manual e ort [2]. In this paper, we present a novel debugging technique that addresses many of the limitations of existing approaches. Our technique can complement code-centric debugging tech- niques because it focuses on identifying program inputs that are likely to be relevant for a given failure. It also overcomes some of the drawbacks of delta debugging because it needs a single execution to identify failure-relevant inputs and re- quires minimal manual e ort. Given an observable faulty behavior and a set of failure- inducing inputs (i.e., a set of inputs that cause such behav- ior), our technique automatically identifies failure-relevant inputs (i.e., a subset of failure-inducing inputs that are ac- tually relevant for investigating the faulty behavior). Our approach is based on dynamic tainting. Intuitively, the tech- nique works by tracking the flow of inputs along data and control dependences at runtime. When a point of failure is reached, the tracked information is used to identify and present to developers the failure-relevant inputs. At this point, developers can use the identified inputs to investigate the failure at hand. LEAKPOINT: Pinpointing the Causes of Memory Leaks James Clause College of Computing Georgia Institute of Technology clause@cc.gatech.edu Alessandro Orso College of Computing Georgia Institute of Technology orso@cc.gatech.edu ABSTRACT Most existing leak detection techniques for C and C++ applications only detect the existence of memory leaks. They do not provide any help for fixing the underlying memory management errors. In this paper, we present a new technique that not only detects leaks, but also points developers to the locations where the underlying errors may be fixed. Our technique tracks pointers to dynamically- allocated areas of memory and, for each memory area, records sev- eral pieces of relevant information. This information is used to identify the locations in an execution where memory leaks occur. To investigate our technique’s feasibility and usefulness, we devel- oped a prototype tool called LEAKPOINT and used it to perform an empirical evaluation. The results of this evaluation show that LEAKPOINT detects at least as many leaks as existing tools, reports zero false positives, and, most importantly, can be effective at help- ing developers fix the underlying memory management errors. Categories and Subject Descriptors D.2.5 [Software Engineering]: Testing and Debugging General Terms Performance, Reliability Keywords Leak detection, Dynamic tainting 1. INTRODUCTION Memory leaks are a type of unintended memory consumption that can adversely impact the performance and correctness of an application. In programs written in languages such as C and C++, memory is allocated using allocation functions, such as malloc and new. Allocation functions reserve a currently free area of memory m and return a pointer p that points to m’s starting ad- dress. Typically, the program stores and then uses p, or another This work was supported in part by NSF awards CCF-0725202 and CCF-0541080 to Georgia Tech. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ICSE ’10, May 2-8 2010, Cape Town, South Africa Copyright 2010 ACM 978-1-60558-719-6/10/05 ...$10.00. pointer derived from p, to interact with m. When m is no longer needed, the program should pass p to a deallocation function (e.g., free or delete) to deallocate m. A leak occurs if, due to a memory management error, m is not deallocated at the appropri- ate time. There are two types of memory leaks: lost memory and forgotten memory. Lost memory refers to the situation where m be- comes unreachable (i.e., the program overwrites or loses p and all pointers derived from p) without first being deallocated. Forgotten memory refers to the situation where m remains reachable but is not deallocated or accessed in the rest of the execution. Memory leaks are relevant for several reasons. First, they are dif- ficult to detect. Unlike many other types of failures, memory leaks do not immediately produce an easily visible symptom (e.g., a crash or the output of a wrong value); typically, leaks remain unobserved until they consume a large portion of the memory available to a sys- tem. Second, leaks have the potential to impact not only the appli- cation that leaks memory, but also every other application running on the system; because the overall amount of memory is limited, as the memory usage of a leaking program increases, less memory is available to other running applications. Consequently, the per- formance and correctness of every running application can be im- pacted by a program that leaks memory. Third, leaks are common, even in mature applications. For example, in the first half of 2009, over 100 leaks in the Firefox web-browser were reported [18]. Because of the serious consequences and common occurrence of memory leaks, researchers have created many static and dynamic techniques for detecting them (e.g., [1,2,4,7–14,16,17,20–23,25, 27,28]). The adoption of static techniques has been limited by sev- eral factors, including the lack of scalable, precise heap modeling. Dynamic techniques are therefore more widely used in practice. In general, dynamic techniques provide one main piece of informa- tion: the location in an execution where a leaked area of memory is allocated. This location is supposed to serve as a starting point for investigating the leak. However, in many situations, this informa- tion does not provide any insight on where or how to fix the mem- ory management error that causes the leak: the allocation location and the location of the memory management error are typically in completely different parts of the application’s code. To address this limitation of existing approaches, we propose a new memory leak detection technique. Our technique provides the same information as existing techniques but also identifies the locations in an execution where leaks occur. In the case of lost memory, the location is defined as the point in an execution where the last pointer to an unallocated memory area is lost or overwritten. In the case of forgotten memory, the location is defined as the last point in an execution where a pointer to a leaked area of memory was used (e.g., when it is dereferenced to read or write memory, passed as a function argument, returned from a function, or used as Camouflage: Automated Sanitization of Field Data James Clause College of Computing Georgia Institute of Technology clause@cc.gatech.edu Alessandro Orso College of Computing Georgia Institute of Technology orso@cc.gatech.edu ABSTRACT Privacy and security concerns have adversely a ected the usefulness of many types of techniques that leverage infor- mation gathered from deployed applications. To address this issue, we present a new approach for automatically sanitiz- ing failure-inducing inputs. Given an input I that causes a failure f, our technique can generate a sanitized input I that is di erent from I but still causes f. I can then be sent to the developers to help them debug f, without revealing the possibly sensitive information contained in I. We im- plemented our approach in a prototype tool, camouflage, and performed an empirical evaluation. In the evaluation, we applied camouflage to a large set of failure-inducing inputs for several real applications. The results of the eval- uation are promising; they show that camouflage is both practical and e ective at generating sanitized inputs. In par- ticular, for the inputs that we considered, I and I shared no sensitive information. 1. INTRODUCTION Investigating techniques that capture data from deployed applications to support in-house software engineering tasks is an increasingly active and successful area of research (e.g., [1,3–5,13,14,17,21,22,26,27,29]). However, privacy and se- curity concerns have prevented widespread adoption of many of these techniques and, because they rely on user partici- pation, have ultimately limited their usefulness. Many of the earlier proposed techniques attempt to sidestep these concerns by collecting only limited amounts of information (e.g., stack traces and register dumps [1, 3, 5] or sampled branch profiles [26,27]) and providing a privacy policy that specifies how the information will be used (e.g., [2,8]). Be- cause the types of information collected by these techniques are unlikely to be sensitive, users are more willing to trust developers. Moreover, because only a small amount of infor- mation is collected, it is feasible for users to manually inspect and sanitize such information before it is sent to developers. Unfortunately, recent research has shown that the e ec- tiveness of these techniques increases when they can lever- age large amounts of detailed information (e.g., complete execution recordings [4, 14] or path profiles [13, 24]). Since more detailed information is bound to contain sensitive data, users will most likely be unwilling to let developers collect such information. In addition, collecting large amounts of information would make it infeasible for users to sanitize the collected information by hand. To address this prob- lem, some of these techniques suggest using an input mini- mization approach (e.g., [6, 7, 35]) to reduce the number of failure-inducing inputs and, hopefully, eliminate some sensi- tive information. Input-minimization techniques, however, were not designed to specifically reduce sensitive inputs, so they can only eliminate sensitive data by chance. In or- der for techniques that leverage captured field information to become widely adopted and achieve their full potential, new approaches for addressing privacy and security concerns must be developed. In this paper, we present a novel technique that addresses privacy and security concerns by sanitizing information cap- tured from deployed applications. Our technique is designed to be used in conjunction with an execution capture/replay technique (e.g., [4, 14]). Given an execution recording that contains a captured failure-inducing input I = i1, i2, . . . in⇥ and terminates with a failure f, our technique replays the execution recording and leverages a specialized version of symbolic-execution to automatically produce I , a sanitized version of I, such that I (1) still causes f and (2) reveals as little information about I as possible. A modified execution recording where I replaces I can then be constructed and sent to the developers, who can use it to debug f. It is, in general, impossible to construct I such that it does not reveal any information about I while still caus- ing the same failure f. Typically, the execution of f would depend on the fact that some elements of I have specific values (e.g., i1 must be 0 for the failing path to be taken). However, this fact does not prevent the technique from be- ing useful in practice. In our evaluation, we found that the information revealed by the sanitized inputs was not sensi- tive and tended to be structural in nature (e.g., a specific portion of the input must be surrounded by double quotes). Conversely, the parts of the inputs that were more likely to be sensitive (e.g., values contained inside the double quotes) were not revealed (see Section 4). To evaluate the e ectiveness of our technique, we imple- mented it in a prototype tool, called camouflage, and car- ried out an empirical evaluation of 170 failure-inducing in- 1 CC 05 ICSE 05 ICSE 07 ISSTA 07 ASE 07 ISSTA 09 ICSE 10 Tech Rept Dynamic tainting based analyses Enabling more efficient debugging RESEARCH OVERVIEW
  • 6. Efficient instrumentation Jazz: A Tool for Demand-Driven Structural Testing Jonathan Misurda1 , Jim Clause1 , Juliya Reed1 , Bruce R. Childers1 , and Mary Lou So a2 1 University of Pittsburgh, Pittsburgh PA 15260, USA, {jmisurda,clausej,juliya,childers}@cs.pitt.edu 2 University of Virginia, Charlottesville VA 22904, USA, soffa@cs.virginia.edu Abstract. Software testing to produce reliable and robust software has become vitally important. Testing is a process by which quality can be assured through the collection of information about software. While test- ing can improve software quality, current tools typically are inflexible and have high overheads, making it a challenge to test large projects. We describe a new scalable and flexible tool, called Jazz, that uses a demand-driven structural testing approach. Jazz has a low overhead of only 17.6% for branch testing. 1 Introduction In the last several years, the importance of producing high quality and robust software has become paramount. Testing is an important process to support quality assurance by gathering information about the software being developed or modified. It is, in general, extremely labor and resource intensive, accounting for 50-60% of the total cost of software development [1]. The increased emphasis on software quality and robustness mandates improved testing methodologies. To test software, a number of techniques can be applied. One class of tech- niques is structural testing, which checks that a given coverage criterion is sat- isfied. For example, branch testing checks that a certain percentage of branches are executed. Other structural tests include def-use testing in which pairs of variable definitions and uses are checked for coverage and node testing in which nodes in a program’s control flow graph are checked. Unfortunately, structural testing is often hindered by the lack of scalable and flexible tools. Current tools are not scalable in terms of both time and memory, limiting the number and scope of the tests that can be applied to large programs. These tools often modify the software binary to insert instrumentation for testing. In this case, the tested version of the application is not the same version that is shipped to customers and errors may remain. Testing tools are usually inflexible and only implement certain types of testing. For example, many tools implement branch testing, but do not implement node or def-use testing. In this paper, we describe a new tool for structural testing, called Jazz, that addresses these problems. Jazz uses a novel demand-driven technique to apply ABSTRACT Producing reliable and robust software has become one of the most important software development concerns in recent years. Testing is a process by which software quality can be assured through the collection of infor- mation. While testing can improve software reliability, current tools typically are inflexible and have high over- heads, making it challenging to test large software projects. In this paper, we describe a new scalable and flexible framework for testing programs with a novel demand-driven approach based on execution paths to implement test coverage. This technique uses dynamic instrumentation on the binary code that can be inserted and removed on-the-fly to keep performance and mem- ory overheads low. We describe and evaluate implemen- tations of the framework for branch, node and def-use testing of Java programs. Experimental results for branch testing show that our approach has, on average, a 1.6 speed up over static instrumentation and also uses less memory. Categories and Subject Descriptors D.2.5. [Software Engineering]: Testing and Debug- ging—Testing tools; D.3.3. [Programming Lan- guages]: Language Constructs and Features—Program instrumentation, run-time environments General Terms Experimentation, Measurement, Verification Keywords Testing, Code Coverage, Structural Testing, Demand- Driven Instrumentation, Java Programming Language 1. INTRODUCTION In the last several years, the importance of produc- ing high quality and robust software has become para- mount [15]. Testing is an important process to support quality assurance by gathering information about the behavior of the software being developed or modified. It is, in general, extremely labor and resource intensive, accounting for 50-60% of the total cost of software development [17]. Given the importance of testing, it is imperative that there are appropriate testing tools and frameworks. In order to adequately test software, a number of different testing techniques must be per- formed. One class of testing techniques used extensively is structural testing in which properties of the software code are used to ensure a certain code coverage.Struc- tural testing techniques include branch testing, node testing, path testing, and def-use testing [6,7,8,17,19]. Typically, a testing tool targets one type of struc- tural test, and the software unit is the program, file or particular methods. In order to apply various structural testing techniques, different tools must be used. If a tool for a particular type of structural testing is not available, the tester would need to either implement it or not use that testing technique. The tester would also be con- strained by the region of code to be tested, as deter- mined by the tool implementor. For example, it may not be possible for the tester to focus on a particular region of code, such as a series of loops, complicated condi- tionals, or particular variables if def-use testing is desired. The user may want to have higher coverage on frequently executed regions of code. Users may want to define their own way of testing. For example, all branches should be covered 10 times rather than once in all loops. In structural testing, instrumentation is placed at certain code points (probes). Whenever such a program point is reached, code that performs the function for the test (payload) is executed. The probes in def-use testing are dictated by the definitions and uses of variables and the payload is to mark that a definition or use in a def- use pair has been covered. Thus for each type of struc- tural testing, there is a testing “plan”. A test plan is a Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ICSE'05, May 15-21, 2005, St. Louis, Missouri, USA. Copyright 2005 ACM 1-58113-963-2/05/0005...$5.00. Demand-Driven Structural Testing with Dynamic Instrumentation Jonathan Misurda†, James A. Clause†, Juliya L. Reed†, Bruce R. Childers†, and Mary Lou Soffa‡ †Department of Computer Science University of Pittsburgh Pittsburgh, Pennsylvania 15260 {jmisurda, clausej, juliya, childers}@cs.pitt.edu ‡Department of Computer Science University of Virginia Charlottesville, Virginia 22904 soffa@cs.virginia.edu 156 A Technique for Enabling and Supporting Debugging of Field Failures James Clause and Alessandro Orso College of Computing Georgia Institute of Technology {clause, orso}@cc.gatech.edu Abstract It is difficult to fully assess the quality of software in- house, outside the actual time and context in which it will execute after deployment. As a result, it is common for software to manifest field failures, failures that occur on user machines due to untested behavior. Field failures are typically difficult to recreate and investigate on developer platforms, and existing techniques based on crash report- ing provide only limited support for this task. In this pa- per, we present a technique for recording, reproducing, and minimizing failing executions that enables and supports in- house debugging of field failures. We also present a tool that implements our technique and an empirical study that evaluates the technique on a widely used e-mail client. 1. Introduction Quality-assurance activities, such as software testing and analysis, are notoriously difficult, expensive, and time- consuming. As a result, software products are often re- leased with faults or missing functionality. In fact, real- world examples of field failures experienced by users be- cause of untested behaviors (e.g., due to unforeseen us- ages), are countless. When field failures occur, it is im- portant for developers to be able to recreate and investigate them in-house. This pressing need is demonstrated by the emergence of several crash-reporting systems, such as Mi- crosoft’s error reporting systems [13] and Apple’s Crash Reporter [1]. Although these techniques represent a first important step in addressing the limitations of purely in- house approaches to quality assurance, they work on lim- ited data (typically, a snapshot of the execution state) and can at best identify correlations between a crash report and data on other known failures. In this paper, we present a novel technique for reproduc- ing and investigating field failures that addresses the limita- tions of existing approaches. Our technique works in three phases, intuitively illustrated by the scenario in Figure 1. In the recording phase, while users run the software, the tech- nique intercepts and logs the interactions between applica- tion and environment and records portions of the environ- ment that are relevant to these interactions. If the execution terminates with a failure, the produced execution recording is stored for later investigation. In the minimization phase, using free cycles on the user machines, the technique re- plays the recorded failing executions with the goal of au- tomatically eliminating parts of the executions that are not relevant to the failure. In the replay and debugging phase, developers can use the technique to replay the minimized failing executions and investigate the cause of the failures (e.g., within a debugger). Being able to replay and debug real field failures can give developers unprecedented insight into the behavior of their software after deployment and op- portunities to improve the quality of their software in ways that were not possible before. To evaluate our technique, we implemented it in a proto- type tool, called ADDA (Automated Debugging of Deployed Applications), and used the tool to perform an empirical study. The study was performed on PINE [19], a widely- used e-mail client, and involved the investigation of failures caused by two real faults in PINE. The results of the study are promising. Our technique was able to (1) record all ex- ecutions of PINE (and two other subjects) with a low time and space overhead, (2) completely replay all recorded exe- cutions, and (3) perform automated minimization of failing executions and obtain shorter executions that manifested the same failures as the original executions. Moreover, we were able to replay the minimized executions within a debugger, which shows that they could have actually been used to in- vestigate the failures. The contributions of this paper are: • A novel technique for recording and later replaying exe- cutions of deployed programs. • An approach for minimizing failing executions and gen- erating shorter executions that fail for the same reasons. • A prototype tool that implements our technique. • An empirical study that shows the feasibility and effec- tiveness of the approach. 29th International Conference on Software Engineering (ICSE'07) 0-7695-2828-7/07 $20.00 © 2007 Dytan: A Generic Dynamic Taint Analysis Framework James Clause, Wanchun Li, and Alessandro Orso College of Computing Georgia Institute of Technology {clause|wli7|orso}@cc.gatech.edu ABSTRACT Dynamic taint analysis is gaining momentum. Techniques based on dynamic tainting have been successfully used in the context of application security, and now their use is also being explored in dif- ferent areas, such as program understanding, software testing, and debugging. Unfortunately, most existing approaches for dynamic tainting are defined in an ad-hoc manner, which makes it difficult to extend them, experiment with them, and adapt them to new con- texts. Moreover, most existing approaches are focused on data-flow based tainting only and do not consider tainting due to control flow, which limits their applicability outside the security domain. To address these limitations and foster experimentation with dynamic tainting techniques, we defined and developed a general framework for dynamic tainting that (1) is highly flexible and customizable, (2) allows for performing both data-flow and control-flow based taint- ing conservatively, and (3) does not rely on any customized run- time system. We also present DYTAN, an implementation of our framework that works on x86 executables, and a set of preliminary studies that show how DYTAN can be used to implement different tainting-based approaches with limited effort. In the studies, we also show that DYTAN can be used on real software, by using FIRE- FOX as one of our subjects, and illustrate how the specific char- acteristics of the tainting approach used can affect efficiency and accuracy of the taint analysis, which further justifies the use of our framework to experiment with different variants of an approach. Categories and Subject Descriptors: D.2.5 [Software Engineer- ing]: Testing and Debugging; General Terms: Experimentation, Security Keywords: Dynamic tainting, information flow, general framework 1. INTRODUCTION Dynamic taint analysis (also known as dynamic information flow analysis) consists, intuitively, in marking and tracking certain data in a program at run-time. This type of dynamic analysis is be- coming increasingly popular. In the context of application secu- rity, dynamic-tainting approaches have been successfully used to prevent a wide range of attacks, including buffer overruns (e.g., [8, 17]), format string attacks (e.g., [17, 21]), SQL and command in- jections (e.g., [7, 19]), and cross-site scripting (e.g., [18]). More recently, researchers have started to investigate the use of tainting- based approaches in domains other than security, such as program understanding, software testing, and debugging (e.g., [11, 13]). Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ISSTA’07, July 9–12, 2007, London, England, United Kingdom. Copyright 2007 ACM 978-1-59593-734-6/07/0007 ...$5.00. Unfortunately, most existing techniques and tools for dynamic taint analysis are defined in an ad-hoc manner, to target a specific problem or a small class of problems. It would be difficult to ex- tend or adapt such techniques and tools so that they can be used in other contexts. In particular, most existing approaches are focused on data-flow based tainting only, and do not consider tainting due to the control flow within an application, which limits their general applicability. Also, most existing techniques support either a sin- gle taint marking or a small, fixed number of markings, which is problematic in applications such as debugging. Finally, almost no existing technique handles the propagation of taint markings in a truly conservative way, which may be appropriate for the specific applications considered, but is problematic in general. Because de- veloping support for dynamic taint analysis is not only time con- suming, but also fairly complex, this lack of flexibility and gener- ality of existing tools and techniques is especially limiting for this type of dynamic analysis. To address these limitations and foster experimentation with dy- namic tainting techniques, in this paper we present a framework for dynamic taint analysis. We designed the framework to be general and flexible, so that it allows for implementing different kinds of techniques based on dynamic taint analysis with little effort. Users can leverage the framework to quickly develop prototypes for their techniques, experiment with them, and investigate trade-offs of dif- ferent alternatives. For a simple example, the framework could be used to investigate the cost effectiveness of considering different types of taint propagation for an application. Our framework has several advantages over existing approaches. First, it is highly flexible and customizable. It allows for easily specifying which program data should be tainted and how, how taint markings should be propagated at run-time, and where and how taint markings should be checked. Second, it allows for performing data-flow and both data-flow and control-flow based tainting. Third, from a more practical standpoint, it works on binaries, does not need access to source code, and does not rely on any customized hardware or operating system, which makes it broadly applicable. We also present DYTAN, an implementation of our framework that works on x86 binaries, and a set of preliminary studies per- formed using DYTAN. In the first set of studies, we report on our experience in using DYTAN to implement two tainting-based ap- proaches presented in the literature. Although preliminary, our ex- perience shows that we were able to implement these approaches completely and with little effort. The second set of studies illus- trates how the specific characteristics of a tainting approach can affect efficiency and accuracy of the taint analysis. In particular, we investigate how ignoring control-flow related propagation and over- looking some data-flow aspects can lead to unsafety. These results further justify the usefulness of experimenting with different varia- tions of dynamic taint analysis and assessing their tradeoffs, which can be done with limited effort using our framework. The second set of studies also shows the practical applicability of DYTAN, by successfully running it on the FIREFOX web browser. 196 Effective Memory Protection Using Dynamic Tainting James Clause, Ioannis Doudalis, Alessandro Orso, and Milos Prvulovic College of Computing Georgia Institute of Technology {clause|idoud|orso|milos}@cc.gatech.edu ABSTRACT Programs written in languages that provide direct access to memory through pointers often contain memory-related faults, which may cause non-deterministic failures and even security vulnerabilities. In this paper, we present a new technique based on dynamic taint- ing for protecting programs from illegal memory accesses. When memory is allocated, at runtime, our technique taints both the mem- ory and the corresponding pointer using the same taint mark. Taint marks are then suitably propagated while the program executes and are checked every time a memory address m is accessed through a pointer p; if the taint marks associated with m and p differ, the ex- ecution is stopped and the illegal access is reported. To allow for a low-overhead, hardware-assisted implementation of the approach, we make several key technical and engineering decisions in the definition of our technique. In particular, we use a configurable, low number of reusable taint marks instead of a unique mark for each area of memory allocated, which reduces the overhead of the approach without limiting its flexibility and ability to target most memory-related faults and attacks known to date. We also define the technique at the binary level, which lets us handle the (very) common case of applications that use third-party libraries whose source code is unavailable. To investigate the effectiveness and practicality of our approach, we implemented it for heap-allocated memory and performed a preliminary empirical study on a set of programs. Our results show that (1) our technique can identify a large class of memory-related faults, even when using only two unique taint marks, and (2) a hardware-assisted implementation of the technique could achieve overhead in the single digits. Categories and Subject Descriptors: D.2.5 [Software Engineering]: Test- ing and Debugging; C.0 [General]: Hardware/Software Interfaces; General Terms: Performance, Security Keywords: Illegal memory accesses, dynamic tainting, hardware support 1. INTRODUCTION Memory-related faults are a serious problem for languages that allow direct memory access through pointers. An important class of memory-related faults are what we call illegal memory accesses. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ASE’07, November 5–9, 2007, Atlanta, Georgia, USA. Copyright 2007 ACM 978-1-59593-882-4/07/0011 ...$5.00. In languages such as C and C++, when memory allocation is re- quested, a currently-free area of memory m of the specified size is reserved. After m has been allocated, its initial address can be assigned to a pointer p, either immediately (e.g., in the case of heap allocated memory) or at a later time (e.g., when retrieving and storing the address of a local variable). From that point on, the only legal accesses to m through a pointer are accesses per- formed through p or through other pointers derived from p. (In Section 3, we clearly define what it means to derive a pointer from another pointer.) All other accesses to m are Illegal Memory Ac- cesses (IMAs), that is, accesses where a pointer is used to access memory outside the bounds of the memory area with which it was originally associated. IMAs are especially relevant for several reasons. First, they are caused by typical programming errors, such as array-out-of-bounds accesses and NULL pointer dereferences, and are thus widespread and common. Second, they often result in non-deterministic fail- ures that are hard to identify and diagnose; the specific effects of an IMA depend on several factors, such as memory layout, that may vary between executions. Finally, many security concerns such as viruses, worms, and rootkits use IMAs as their injection vectors. In this paper, we present a new dynamic technique for protecting programs against IMAs that is effective against most known types of illegal accesses. The basic idea behind the technique is to use dynamic tainting (or dynamic information flow) [8] to keep track of which memory areas can be accessed through which pointers, as follows. At runtime, our technique taints both allocated mem- ory and pointers using taint marks. Dynamic taint propagation, to- gether with a suitable handling of memory-allocation and deallo- cation operations, ensures that taint marks are appropriately prop- agated during execution. Every time the program accesses some memory through a pointer, our technique checks whether the ac- cess is legal by comparing the taint mark associated with the mem- ory and the taint mark associated with the pointer used to access it. If the marks match, the access is considered legitimate. Otherwise, the execution is stopped and an IMA is reported. In defining our approach, our final goal is the development of a low-overhead, hardware-assisted tool that is practical and can be used on deployed software. A hardware-assisted tool is a tool that leverages the benefits of both hardware and software. Typically, some performance critical aspects are moved to the hardware to achieve maximum efficiency, while software is used to perform op- erations that would be too complex to implement in hardware. There are two main characteristics of our approach that were de- fined to help achieve our goal of a hardware-assisted implementa- tion. The first characteristic is that our technique only uses a small, configurable number of reusable taint marks instead of a unique mark for each area of memory allocated. Using a low number of 283 Penumbra: Automatically Identifying Failure-Relevant Inputs Using Dynamic Tainting James Clause College of Computing Georgia Institute of Technology clause@cc.gatech.edu Alessandro Orso College of Computing Georgia Institute of Technology orso@cc.gatech.edu ABSTRACT Most existing automated debugging techniques focus on re- ducing the amount of code to be inspected and tend to ig- nore an important component of software failures: the in- puts that cause the failure to manifest. In this paper, we present a new technique based on dynamic tainting for au- tomatically identifying subsets of a program’s inputs that are relevant to a failure. The technique (1) marks program inputs when they enter the application, (2) tracks them as they propagate during execution, and (3) identifies, for an observed failure, the subset of inputs that are potentially relevant for debugging that failure. To investigate feasibil- ity and usefulness of our technique, we created a prototype tool, penumbra, and used it to evaluate our technique on several failures in real programs. Our results are promising, as they show that penumbra can point developers to inputs that are actually relevant for investigating a failure and can be more practical than existing alternative approaches. Categories and Subject Descriptors D.2.5 [Software Engineering]: Testing and Debugging General Terms Algorithms, Experimentation, Reliability Keywords Failure-relevant inputs, automated debugging, dynamic in- formation flow, dynamic tainting 1. INTRODUCTION Debugging is known to be a labor-intensive, time-consum- ing task that can be responsible for a large portion of soft- ware development and maintenance costs [21,23]. Common characteristics of modern software, such as increased con- figurability, larger code bases, and increased input sizes, in- troduce new challenges for debugging and exacerbate exist- ing problems. In response, researchers have proposed many Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ISSTA’09, July 19–23, 2009, Chicago, Illinois, USA. Copyright 2009 ACM 978-1-60558-338-9/09/07 ...$5.00. semi- and fully-automated techniques that attempt to re- duce the cost of debugging (e.g., [8,9,11–13,18,24,25,27]). The majority of these techniques are code-centric in that they focus exclusively on one aspect of debugging—trying to identify the faulty statements responsible for a failure. Although code-centric approaches can work well in some cases (e.g., for isolated faults that involve a single state- ment), they are often inadequate for more complex faults [4]. Faults of omission, for instance, where part of a specification has not been implemented, are notoriously problematic for debugging techniques that attempt to identify potentially faulty statements. The usefulness of code-centric techniques is also limited in the case of long-running programs and pro- grams that process large amounts of information; failures in these types of programs are typically di⌅cult to understand without considering the data involved in such failures. To debug failures more e ectively, it is necessary to pro- vide developers with not only a relevant subset of state- ments, but also a relevant subset of inputs. There are only a few existing techniques that attempt to identify relevant inputs [3, 17, 25], with delta debugging [25] being the most known of these. Although delta debugging has been shown to be an e ective technique for automatic debugging, it also has several drawbacks that may limit its usefulness in prac- tice. In particular, it requires (1) multiple executions of the program being debugged, which can involve a long running time, and (2) complex oracles and setup, which can result in a large amount of manual e ort [2]. In this paper, we present a novel debugging technique that addresses many of the limitations of existing approaches. Our technique can complement code-centric debugging tech- niques because it focuses on identifying program inputs that are likely to be relevant for a given failure. It also overcomes some of the drawbacks of delta debugging because it needs a single execution to identify failure-relevant inputs and re- quires minimal manual e ort. Given an observable faulty behavior and a set of failure- inducing inputs (i.e., a set of inputs that cause such behav- ior), our technique automatically identifies failure-relevant inputs (i.e., a subset of failure-inducing inputs that are ac- tually relevant for investigating the faulty behavior). Our approach is based on dynamic tainting. Intuitively, the tech- nique works by tracking the flow of inputs along data and control dependences at runtime. When a point of failure is reached, the tracked information is used to identify and present to developers the failure-relevant inputs. At this point, developers can use the identified inputs to investigate the failure at hand. LEAKPOINT: Pinpointing the Causes of Memory Leaks James Clause College of Computing Georgia Institute of Technology clause@cc.gatech.edu Alessandro Orso College of Computing Georgia Institute of Technology orso@cc.gatech.edu ABSTRACT Most existing leak detection techniques for C and C++ applications only detect the existence of memory leaks. They do not provide any help for fixing the underlying memory management errors. In this paper, we present a new technique that not only detects leaks, but also points developers to the locations where the underlying errors may be fixed. Our technique tracks pointers to dynamically- allocated areas of memory and, for each memory area, records sev- eral pieces of relevant information. This information is used to identify the locations in an execution where memory leaks occur. To investigate our technique’s feasibility and usefulness, we devel- oped a prototype tool called LEAKPOINT and used it to perform an empirical evaluation. The results of this evaluation show that LEAKPOINT detects at least as many leaks as existing tools, reports zero false positives, and, most importantly, can be effective at help- ing developers fix the underlying memory management errors. Categories and Subject Descriptors D.2.5 [Software Engineering]: Testing and Debugging General Terms Performance, Reliability Keywords Leak detection, Dynamic tainting 1. INTRODUCTION Memory leaks are a type of unintended memory consumption that can adversely impact the performance and correctness of an application. In programs written in languages such as C and C++, memory is allocated using allocation functions, such as malloc and new. Allocation functions reserve a currently free area of memory m and return a pointer p that points to m’s starting ad- dress. Typically, the program stores and then uses p, or another This work was supported in part by NSF awards CCF-0725202 and CCF-0541080 to Georgia Tech. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ICSE ’10, May 2-8 2010, Cape Town, South Africa Copyright 2010 ACM 978-1-60558-719-6/10/05 ...$10.00. pointer derived from p, to interact with m. When m is no longer needed, the program should pass p to a deallocation function (e.g., free or delete) to deallocate m. A leak occurs if, due to a memory management error, m is not deallocated at the appropri- ate time. There are two types of memory leaks: lost memory and forgotten memory. Lost memory refers to the situation where m be- comes unreachable (i.e., the program overwrites or loses p and all pointers derived from p) without first being deallocated. Forgotten memory refers to the situation where m remains reachable but is not deallocated or accessed in the rest of the execution. Memory leaks are relevant for several reasons. First, they are dif- ficult to detect. Unlike many other types of failures, memory leaks do not immediately produce an easily visible symptom (e.g., a crash or the output of a wrong value); typically, leaks remain unobserved until they consume a large portion of the memory available to a sys- tem. Second, leaks have the potential to impact not only the appli- cation that leaks memory, but also every other application running on the system; because the overall amount of memory is limited, as the memory usage of a leaking program increases, less memory is available to other running applications. Consequently, the per- formance and correctness of every running application can be im- pacted by a program that leaks memory. Third, leaks are common, even in mature applications. For example, in the first half of 2009, over 100 leaks in the Firefox web-browser were reported [18]. Because of the serious consequences and common occurrence of memory leaks, researchers have created many static and dynamic techniques for detecting them (e.g., [1,2,4,7–14,16,17,20–23,25, 27,28]). The adoption of static techniques has been limited by sev- eral factors, including the lack of scalable, precise heap modeling. Dynamic techniques are therefore more widely used in practice. In general, dynamic techniques provide one main piece of informa- tion: the location in an execution where a leaked area of memory is allocated. This location is supposed to serve as a starting point for investigating the leak. However, in many situations, this informa- tion does not provide any insight on where or how to fix the mem- ory management error that causes the leak: the allocation location and the location of the memory management error are typically in completely different parts of the application’s code. To address this limitation of existing approaches, we propose a new memory leak detection technique. Our technique provides the same information as existing techniques but also identifies the locations in an execution where leaks occur. In the case of lost memory, the location is defined as the point in an execution where the last pointer to an unallocated memory area is lost or overwritten. In the case of forgotten memory, the location is defined as the last point in an execution where a pointer to a leaked area of memory was used (e.g., when it is dereferenced to read or write memory, passed as a function argument, returned from a function, or used as Camouflage: Automated Sanitization of Field Data James Clause College of Computing Georgia Institute of Technology clause@cc.gatech.edu Alessandro Orso College of Computing Georgia Institute of Technology orso@cc.gatech.edu ABSTRACT Privacy and security concerns have adversely a ected the usefulness of many types of techniques that leverage infor- mation gathered from deployed applications. To address this issue, we present a new approach for automatically sanitiz- ing failure-inducing inputs. Given an input I that causes a failure f, our technique can generate a sanitized input I that is di erent from I but still causes f. I can then be sent to the developers to help them debug f, without revealing the possibly sensitive information contained in I. We im- plemented our approach in a prototype tool, camouflage, and performed an empirical evaluation. In the evaluation, we applied camouflage to a large set of failure-inducing inputs for several real applications. The results of the eval- uation are promising; they show that camouflage is both practical and e ective at generating sanitized inputs. In par- ticular, for the inputs that we considered, I and I shared no sensitive information. 1. INTRODUCTION Investigating techniques that capture data from deployed applications to support in-house software engineering tasks is an increasingly active and successful area of research (e.g., [1,3–5,13,14,17,21,22,26,27,29]). However, privacy and se- curity concerns have prevented widespread adoption of many of these techniques and, because they rely on user partici- pation, have ultimately limited their usefulness. Many of the earlier proposed techniques attempt to sidestep these concerns by collecting only limited amounts of information (e.g., stack traces and register dumps [1, 3, 5] or sampled branch profiles [26,27]) and providing a privacy policy that specifies how the information will be used (e.g., [2,8]). Be- cause the types of information collected by these techniques are unlikely to be sensitive, users are more willing to trust developers. Moreover, because only a small amount of infor- mation is collected, it is feasible for users to manually inspect and sanitize such information before it is sent to developers. Unfortunately, recent research has shown that the e ec- tiveness of these techniques increases when they can lever- age large amounts of detailed information (e.g., complete execution recordings [4, 14] or path profiles [13, 24]). Since more detailed information is bound to contain sensitive data, users will most likely be unwilling to let developers collect such information. In addition, collecting large amounts of information would make it infeasible for users to sanitize the collected information by hand. To address this prob- lem, some of these techniques suggest using an input mini- mization approach (e.g., [6, 7, 35]) to reduce the number of failure-inducing inputs and, hopefully, eliminate some sensi- tive information. Input-minimization techniques, however, were not designed to specifically reduce sensitive inputs, so they can only eliminate sensitive data by chance. In or- der for techniques that leverage captured field information to become widely adopted and achieve their full potential, new approaches for addressing privacy and security concerns must be developed. In this paper, we present a novel technique that addresses privacy and security concerns by sanitizing information cap- tured from deployed applications. Our technique is designed to be used in conjunction with an execution capture/replay technique (e.g., [4, 14]). Given an execution recording that contains a captured failure-inducing input I = i1, i2, . . . in⇥ and terminates with a failure f, our technique replays the execution recording and leverages a specialized version of symbolic-execution to automatically produce I , a sanitized version of I, such that I (1) still causes f and (2) reveals as little information about I as possible. A modified execution recording where I replaces I can then be constructed and sent to the developers, who can use it to debug f. It is, in general, impossible to construct I such that it does not reveal any information about I while still caus- ing the same failure f. Typically, the execution of f would depend on the fact that some elements of I have specific values (e.g., i1 must be 0 for the failing path to be taken). However, this fact does not prevent the technique from be- ing useful in practice. In our evaluation, we found that the information revealed by the sanitized inputs was not sensi- tive and tended to be structural in nature (e.g., a specific portion of the input must be surrounded by double quotes). Conversely, the parts of the inputs that were more likely to be sensitive (e.g., values contained inside the double quotes) were not revealed (see Section 4). To evaluate the e ectiveness of our technique, we imple- mented it in a prototype tool, called camouflage, and car- ried out an empirical evaluation of 170 failure-inducing in- 1 CC 05 ICSE 05 ISSTA 07 ASE 07 ISSTA 09 ICSE 10 Tech ReptICSE 07 Dynamic tainting based analyses Enabling more efficient debugging RESEARCH OVERVIEW
  • 11. OVERALL PICTURE Field failures:Anomalous behavior (or crashes) of deployed software that occur on user machines • Difficult to debug • Relevant to users
  • 14. CURRENT PRACTICE Ask the user I frobbed the thingummy like the guy told me. Then I spun the doodad widdershins and a little thinger popped up and it just stopped working.
  • 15. CURRENT PRACTICE Ask the user I opened my web browser. Specifically, I clicked on the dock icon. It bounced twice before crashing. Please help.
  • 17. CURRENT PRACTICE Gather static information Difficult to reproduce the failure
  • 18. CURRENT PRACTICE Gather static information Difficult to reproduce the failure Locations only correlated with the failure Liblit et al. 03 Tucek et al. 07 Chilimbi et al. 09 ...
  • 21. OUR SOLUTION Record failing executions in the field Replay failing executions in house +
  • 22. OUR SOLUTION Record failing executions in the field Replay failing executions in house Debug field failures effectively +
  • 23. In the fieldIn house USAGE SCENARIO ✘Replay / Debug Develop Record Captured failure
  • 24. In the fieldIn house USAGE SCENARIO ✘Replay / Debug Develop Record Captured failure Oracle
  • 25. In the fieldIn house USAGE SCENARIO ✘Replay / Debug Develop Record Captured failure
  • 28. 345345 PRACTICALITY ISSUES Large in size Contain sensitive information ✘
  • 29. 345345 PRACTICALITY ISSUES Large in size Contain sensitive information Minimize ✘ ✘
  • 30. 345345 PRACTICALITY ISSUES Large in size Contain sensitive information Minimize Anonymize ✘ ✘
  • 31. In the fieldIn house Replay / Debug Develop Record Captured failure MinimizeAnonymize USAGE SCENARIO ✘ ✘
  • 37. EXISTING RECORD/REPLAY APPROACHES Chen et al. 01, King et al. 05 Narayanasamy et al. 05, Netzer and Weaver 94, Srinivasan et al. 04,VMWare Exactly replay everything
  • 38. EXISTING RECORD/REPLAY APPROACHES Not amenable to minimization or anonymization Unacceptable runtime overhead Chen et al. 01, King et al. 05 Narayanasamy et al. 05, Netzer and Weaver 94, Srinivasan et al. 04,VMWare Exactly replay everything
  • 39. Not amenable to minimization or anonymization Unacceptable runtime overhead Record low-level events • numerous • high interdependence RECORD & REPLAY
  • 40. Record high-level events • fewer in number • low interdependence Amenable to minimization or anonymization Acceptable runtime overhead RECORD & REPLAY
  • 44. ENVIRONMENT INTERACTIONSStreams Files Interaction Events: FILE — interaction with a file POLL — checks for availability of data on a stream PULL — read data from a stream
  • 45. Environment data (streams): Event log: Environment data (files):
  • 46. Environment data (streams): Event log: Environment data (files):
  • 47. Environment data (streams): Event log: Environment data (files): FILE foo.1
  • 48. Environment data (streams): Event log: Environment data (files): FILE foo.1 foo.1
  • 49. Environment data (streams): Event log: Environment data (files): FILE foo.1 foo.1
  • 50. Environment data (streams): Event log: Environment data (files): FILE foo.1 foo.1 POLL KEYBOARD NOK
  • 51. Environment data (streams): Event log: Environment data (files): FILE foo.1 foo.1 POLL KEYBOARD NOK
  • 52. Environment data (streams): Event log: Environment data (files): FILE foo.1 foo.1 KEYBOARD: {5680} POLL KEYBOARD OK POLL KEYBOARD NOK
  • 53. Environment data (streams): Event log: Environment data (files): FILE foo.1 foo.1 KEYBOARD: {5680} POLL KEYBOARD OK POLL KEYBOARD NOK
  • 54. Environment data (streams): Event log: Environment data (files): FILE foo.1 foo.1 KEYBOARD: {5680}hello POLL KEYBOARD OK PULL KEYBOARD 5 POLL KEYBOARD NOK
  • 55. Environment data (streams): Event log: Environment data (files): FILE foo.1 foo.1 KEYBOARD: {5680}hello POLL KEYBOARD OK PULL KEYBOARD 5 POLL KEYBOARD NOK
  • 56. Environment data (streams): Event log: Environment data (files): FILE foo.1 foo.1 KEYBOARD: {5680}hello POLL KEYBOARD OK PULL KEYBOARD 5 POLL KEYBOARD NOK POLL NETWORK OK NETWORK: {3405}
  • 57. Environment data (streams): Event log: Environment data (files): FILE foo.1 foo.1 KEYBOARD: {5680}hello POLL KEYBOARD OK PULL KEYBOARD 5 POLL KEYBOARD NOK POLL NETWORK OK NETWORK: {3405} ❙
  • 58. Environment data (streams): Event log: Environment data (files): FILE foo.1 foo.1 KEYBOARD: {5680}hello POLL KEYBOARD OK PULL KEYBOARD 5 POLL KEYBOARD NOK POLL NETWORK OK NETWORK: {3405} ❙
  • 59. Environment data (files): Event log: Environment data (streams): KEYBOARD: {5680}hello ❙ {4056}c ❙ {300}... NETWORK: {3405}<html><body>... ❙ {202}... FILE foo.1 POLL KEYBOARD NOK POLL KEYBOARD OK PULL KEYBOARD 5 POLL NETWORK OK PULL NETWORK 1024 FILE bar.1 POLL NETWORK NOK POLL NETWORK OK FILE foo.2 ... PULL NETWORK 1024 FILE foo.2 POLL KEYBOARD NOK ... foo.1 foo.2 bar.1
  • 60. Environment data (files): Event log: Environment data (streams): KEYBOARD: {5680}hello ❙ {4056}c ❙ {300}... NETWORK: {3405}<html><body>... ❙ {202}... FILE foo.1 POLL KEYBOARD NOK POLL KEYBOARD OK PULL KEYBOARD 5 POLL NETWORK OK PULL NETWORK 1024 FILE bar.1 POLL NETWORK NOK POLL NETWORK OK FILE foo.2 ... PULL NETWORK 1024 FILE foo.2 POLL KEYBOARD NOK ... foo.1 foo.2 bar.1
  • 61. Environment data (files): Event log: Environment data (streams): KEYBOARD: {5680}hello ❙ {4056}c ❙ {300}... NETWORK: {3405}<html><body>... ❙ {202}... FILE foo.1 POLL KEYBOARD NOK POLL KEYBOARD OK PULL KEYBOARD 5 POLL NETWORK OK PULL NETWORK 1024 FILE bar.1 POLL NETWORK NOK POLL NETWORK OK FILE foo.2 ... PULL NETWORK 1024 FILE foo.2 POLL KEYBOARD NOK ... foo.1 foo.2 bar.1
  • 62. Environment data (files): Event log: Environment data (streams): KEYBOARD: {5680}hello ❙ {4056}c ❙ {300}... NETWORK: {3405}<html><body>... ❙ {202}... FILE foo.1 POLL KEYBOARD NOK POLL KEYBOARD OK PULL KEYBOARD 5 POLL NETWORK OK PULL NETWORK 1024 FILE bar.1 POLL NETWORK NOK POLL NETWORK OK FILE foo.2 ... PULL NETWORK 1024 FILE foo.2 POLL KEYBOARD NOK ... foo.1 foo.2 bar.1
  • 63. Environment data (files): Event log: Environment data (streams): KEYBOARD: {5680}hello ❙ {4056}c ❙ {300}... NETWORK: {3405}<html><body>... ❙ {202}... FILE foo.1 POLL KEYBOARD NOK POLL KEYBOARD OK PULL KEYBOARD 5 POLL NETWORK OK PULL NETWORK 1024 FILE bar.1 POLL NETWORK NOK POLL NETWORK OK FILE foo.2 ... PULL NETWORK 1024 FILE foo.2 POLL KEYBOARD NOK ... foo.1 foo.2 bar.1
  • 64. Environment data (files): Event log: Environment data (streams): KEYBOARD: {5680}hello ❙ {4056}c ❙ {300}... NETWORK: {3405}<html><body>... ❙ {202}... FILE foo.1 POLL KEYBOARD NOK POLL KEYBOARD OK PULL KEYBOARD 5 POLL NETWORK OK PULL NETWORK 1024 FILE bar.1 POLL NETWORK NOK POLL NETWORK OK FILE foo.2 ... PULL NETWORK 1024 FILE foo.2 POLL KEYBOARD NOK ... foo.1 foo.2 bar.1 ✔
  • 65. Environment data (files): Event log: Environment data (streams): KEYBOARD: {5680}hello ❙ {4056}c ❙ {300}... NETWORK: {3405}<html><body>... ❙ {202}... FILE foo.1 POLL KEYBOARD NOK POLL KEYBOARD OK PULL KEYBOARD 5 POLL NETWORK OK PULL NETWORK 1024 FILE bar.1 POLL NETWORK NOK POLL NETWORK OK FILE foo.2 ... PULL NETWORK 1024 FILE foo.2 POLL KEYBOARD NOK ... foo.1 foo.2 bar.1 ✔
  • 66. Environment data (files): Event log: Environment data (streams): KEYBOARD: {5680}hello ❙ {4056}c ❙ {300}... NETWORK: {3405}<html><body>... ❙ {202}... FILE foo.1 POLL KEYBOARD NOK POLL KEYBOARD OK PULL KEYBOARD 5 POLL NETWORK OK PULL NETWORK 1024 FILE bar.1 POLL NETWORK NOK POLL NETWORK OK FILE foo.2 ... PULL NETWORK 1024 FILE foo.2 POLL KEYBOARD NOK ... foo.1 foo.2 bar.1 ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔
  • 68. EVALUATION Prototype implementation: • maps libc function calls to interaction events Subjects: • several cpu intensive applications (e.g., bzip, gcc) Results: • negligible overheads • (i.e. less than 10%) • data size is acceptable • application dependent
  • 70. ✘ 345 Large in size MINIMIZATION✘ Goal: focus developer effort
  • 75. TIME MINIMIZATION Event log: Environment data (streams): KEYBOARD: {5680}hello ❙ {4056}c ❙ {300}... NETWORK: {3405}<html><body>... ❙ {202}... FILE foo.1 POLL KEYBOARD NOK POLL KEYBOARD OK PULL KEYBOARD 5 POLL NETWORK OK PULL NETWORK 1024 FILE bar.1 POLL NETWORK NOK POLL NETWORK OK FILE foo.2 ... PULL NETWORK 1024 FILE foo.2 POLL KEYBOARD NOK ...
  • 76. TIME MINIMIZATION Event log: Environment data (streams): KEYBOARD: {5680}hello ❙ {4056}c ❙ {300}... NETWORK: {3405}<html><body>... ❙ {202}... FILE foo.1 POLL KEYBOARD NOK POLL KEYBOARD OK PULL KEYBOARD 5 POLL NETWORK OK PULL NETWORK 1024 FILE bar.1 POLL NETWORK NOK POLL NETWORK OK FILE foo.2 ... PULL NETWORK 1024 FILE foo.2 POLL KEYBOARD NOK ... Remove idle time
  • 77. TIME MINIMIZATION Event log: Environment data (streams): KEYBOARD: {5680}hello ❙ {4056}c ❙ {300}... NETWORK: {3405}<html><body>... ❙ {202}... FILE foo.1 POLL KEYBOARD NOK POLL KEYBOARD OK PULL KEYBOARD 5 POLL NETWORK OK PULL NETWORK 1024 FILE bar.1 POLL NETWORK NOK POLL NETWORK OK FILE foo.2 ... PULL NETWORK 1024 FILE foo.2 POLL KEYBOARD NOK ... Remove idle time
  • 78. TIME MINIMIZATION Event log: Environment data (streams): KEYBOARD: {5680}hello ❙ {4056}c ❙ {300}... NETWORK: {3405}<html><body>... ❙ {202}... FILE foo.1 POLL KEYBOARD NOK POLL KEYBOARD OK PULL KEYBOARD 5 POLL NETWORK OK PULL NETWORK 1024 FILE bar.1 POLL NETWORK NOK POLL NETWORK OK FILE foo.2 ... PULL NETWORK 1024 FILE foo.2 POLL KEYBOARD NOK ... Remove idle time Remove delays
  • 79. TIME MINIMIZATION Event log: Environment data (streams): KEYBOARD: {5680}hello ❙ {4056}c ❙ {300}... NETWORK: {3405}<html><body>... ❙ {202}... FILE foo.1 POLL KEYBOARD NOK POLL KEYBOARD OK PULL KEYBOARD 5 POLL NETWORK OK PULL NETWORK 1024 FILE bar.1 POLL NETWORK NOK POLL NETWORK OK FILE foo.2 ... PULL NETWORK 1024 FILE foo.2 POLL KEYBOARD NOK ... Remove idle time Remove delays
  • 80. DATA MINIMIZATION Environment data (files): foo.1 foo.2 bar.1 Whole entities Chunks Atoms
  • 81. DATA MINIMIZATION Environment data (files): foo.2 bar.1 Lorem ipsum dolor sit amet, consetetur sadipscing elitr,sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et Whole entities Chunks Atoms
  • 82. DATA MINIMIZATION Environment data (files): foo.2 bar.1 Lorem ipsum dolor sit amet, consetetur sadipscing elitr,sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et Whole entities Chunks Atoms
  • 83. DATA MINIMIZATION Environment data (files): foo.2 bar.1 Whole entities Chunks Atoms
  • 84. DATA MINIMIZATION Environment data (files): bar.1 Lorem ipsum dolor sit amet, consetetur sadipscing elitr,sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et Whole entities Chunks Atoms
  • 85. DATA MINIMIZATION Environment data (files): bar.1 Lorem ipsum dolor sit amet, consetetur sadipscing elitr,sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et Whole entities Chunks Atoms
  • 86. DATA MINIMIZATION Environment data (files): bar.1 Lorem ipsum dolor sit amet, consetetur sadipscing elitr,sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et Whole entities Chunks Atoms
  • 87. DATA MINIMIZATION Environment data (files): bar.1 Lorem ipsum dolor sit amet, consetetur sadipscing elitr,sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et Whole entities Chunks Atoms
  • 88. DATA MINIMIZATION Environment data (files): bar.1 sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et Whole entities Chunks Atoms
  • 89. DATA MINIMIZATION Environment data (files): bar.1 sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et Whole entities Chunks Atoms
  • 90. DATA MINIMIZATION Environment data (files): bar.1 Whole entities Chunks Atoms sadipscing elitr, eirmod invidunt ut labore dolore magna erat, voluptua.
  • 91. DATA MINIMIZATION Environment data (files): bar.1 Whole entities Chunks Atoms sadipscing elitr, eirmod invidunt ut labore dolore magna erat, voluptua. foo.2
  • 92. DATA MINIMIZATION Environment data (files): Whole entities Chunks Atoms sadipscing elitr, eirmod invidunt ut labore dolore magna erat, voluptua. foo.2
  • 93. EVALUATION Can the technique produce, in a reasonable amount of time, minimized executions that can be used to debug the original failure?
  • 94. EVALUATION Can the technique produce, in a reasonable amount of time, minimized executions that can be used to debug the original failure? Pine email and news client • two real field failures • 20 failing executions, 10 per failure
  • 95. EVALUATION Can the technique produce, in a reasonable amount of time, minimized executions that can be used to debug the original failure? Pine email and news client • two real field failures • 20 failing executions, 10 per failure Minimized executions generated by • randomly generating interaction scripts • manually performing the scripts (while recording) • minimizing the captured executions
  • 96. RESULTS Header-color fault Address book fault 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% # entities streams size files size Averagevalueafterminimization
  • 97. RESULTS Header-color fault Address book fault Results are likely to be conservative; recorded executions only contain the minimal amount of data needed to perform an action. 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% # entities streams size files size Averagevalueafterminimization
  • 98. RESULTS Header-color fault Address book fault Results are likely to be conservative; recorded executions only contain the minimal amount of data needed to perform an action. 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% # entities streams size files size Averagevalueafterminimization Inputs can be minimized in a reasonable amount of time (less then 75 minutes)
  • 100. RESULTS HEADER COLOR FAULT 1. color is enabled 2. one or more colors are added 3. all colors are removed Crash when:
  • 101. RESULTS HEADER COLOR FAULT 1. color is enabled 2. one or more colors are added 3. all colors are removed Crash when: Recorded execution: 34 files and streams ≈800kb
  • 102. Minimized execution: 1 stream 4 files ≈72kb (partial) RESULTS HEADER COLOR FAULT 1. color is enabled 2. one or more colors are added 3. all colors are removed Crash when: Recorded execution: 34 files and streams ≈800kb
  • 105. Sensitive input (I) that causes F Input domain ANONYMIZATION
  • 106. Sensitive input (I) that causes F Input domain Inputs that cause F ANONYMIZATION
  • 107. Sensitive input (I) that causes F Input domain Inputs that cause F ANONYMIZATION Anonymized input (I’) that also causes F
  • 108. Inputs that satisfy F’s path condition Sensitive input (I) that causes F Input domain Inputs that cause F ANONYMIZATION Anonymized input (I’) that also causes F
  • 109. PATH CONDITION GENERATION Path condition: set of constraints on a program’s inputs that encode the conditions necessary for a specific path to be executed.
  • 110. boolean foo(int x, int y, int z) { if(x <= 5) { int a = x * 2; if(y + a > 10) { if(z == 0) { return true; } } } return false; } PATH CONDITION GENERATION
  • 111. boolean foo(int x, int y, int z) { if(x <= 5) { int a = x * 2; if(y + a > 10) { if(z == 0) { return true; } } } return false; } PATH CONDITION GENERATION
  • 112. boolean foo(int x, int y, int z) { if(x <= 5) { int a = x * 2; if(y + a > 10) { if(z == 0) { return true; } } } return false; } PATH CONDITION GENERATION 5 3 0
  • 113. boolean foo(int x, int y, int z) { if(x <= 5) { int a = x * 2; if(y + a > 10) { if(z == 0) { return true; } } } return false; } PATH CONDITION GENERATION 5 3 0 (sensitive)
  • 114. Path Condition: Symbolic State: boolean foo(int x, int y, int z) { if(x <= 5) { int a = x * 2; if(y + a > 10) { if(z == 0) { return true; } } } return false; } PATH CONDITION GENERATION 5 3 0 (sensitive)
  • 115. Path Condition: Symbolic State: boolean foo(int x, int y, int z) { if(x <= 5) { int a = x * 2; if(y + a > 10) { if(z == 0) { return true; } } } return false; } PATH CONDITION GENERATION 5 3 0 x→i1 y→i2 z→i3 (sensitive)
  • 116. Path Condition: Symbolic State: boolean foo(int x, int y, int z) { if(x <= 5) { int a = x * 2; if(y + a > 10) { if(z == 0) { return true; } } } return false; } PATH CONDITION GENERATION 5 3 0 x→i1 y→i2 z→i3 (sensitive)
  • 117. Path Condition: i1 <= 5 Symbolic State: boolean foo(int x, int y, int z) { if(x <= 5) { int a = x * 2; if(y + a > 10) { if(z == 0) { return true; } } } return false; } PATH CONDITION GENERATION 5 3 0 x→i1 y→i2 z→i3 (sensitive)
  • 118. Path Condition: i1 <= 5 Symbolic State: boolean foo(int x, int y, int z) { if(x <= 5) { int a = x * 2; if(y + a > 10) { if(z == 0) { return true; } } } return false; } PATH CONDITION GENERATION 5 3 0 x→i1 y→i2 z→i3 (sensitive)
  • 119. Path Condition: i1 <= 5 Symbolic State: a→i1*2 boolean foo(int x, int y, int z) { if(x <= 5) { int a = x * 2; if(y + a > 10) { if(z == 0) { return true; } } } return false; } PATH CONDITION GENERATION 5 3 0 x→i1 y→i2 z→i3 (sensitive)
  • 120. Path Condition: i1 <= 5 Symbolic State: a→i1*2 boolean foo(int x, int y, int z) { if(x <= 5) { int a = x * 2; if(y + a > 10) { if(z == 0) { return true; } } } return false; } PATH CONDITION GENERATION 5 3 0 x→i1 y→i2 z→i3 (sensitive)
  • 121. Path Condition: i1 <= 5 Symbolic State: a→i1*2 boolean foo(int x, int y, int z) { if(x <= 5) { int a = x * 2; if(y + a > 10) { if(z == 0) { return true; } } } return false; } PATH CONDITION GENERATION 5 3 0 x→i1 y→i2 z→i3 ∧ i2+i1*2 > 10 (sensitive)
  • 122. Path Condition: i1 <= 5 Symbolic State: a→i1*2 boolean foo(int x, int y, int z) { if(x <= 5) { int a = x * 2; if(y + a > 10) { if(z == 0) { return true; } } } return false; } PATH CONDITION GENERATION 5 3 0 x→i1 y→i2 z→i3 ∧ i2+i1*2 > 10 (sensitive)
  • 123. Path Condition: i1 <= 5 Symbolic State: a→i1*2 boolean foo(int x, int y, int z) { if(x <= 5) { int a = x * 2; if(y + a > 10) { if(z == 0) { return true; } } } return false; } PATH CONDITION GENERATION 5 3 0 x→i1 y→i2 z→i3 ∧ i2+i1*2 > 10 ∧ i3 == 0 (sensitive)
  • 124. CHOOSING ANONYMIZED INPUTS Path Condition: i1 <= 5 ∧ i2+i1*2 > 10 ∧ i3 == 0
  • 127. Constraint Solver CHOOSING ANONYMIZED INPUTS Path Condition: i1 <= 5 ∧ i2+i1*2 > 10 ∧ i3 == 0 i1 == 5 i2 == 3 i3 == 0
  • 128. Constraint Solver CHOOSING ANONYMIZED INPUTS Path Condition: i1 <= 5 ∧ i2+i1*2 > 10 ∧ i3 == 0 i1 == 5 i2 == 3 i3 == 0 boolean foo(int x, int y, int z) { if(x <= 5) { int a = x * 2; if(y + a > 10) { if(z == 0) { return true; } } } return false; } 5 3 0
  • 130. Constraint Solver CHOOSING ANONYMIZED INPUTS Path Condition: i1 <= 5 ∧ i2+i1*2 > 10 ∧ i3 == 0 Input Constraints: i1 != 5 ∧ i2 != 3 ∧ i3 != 0
  • 131. Constraint Solver CHOOSING ANONYMIZED INPUTS Path Condition: i1 <= 5 ∧ i2+i1*2 > 10 ∧ i3 == 0 Input Constraints: i1 != 5 ∧ i2 != 3 ∧ i3 != 0 (breakable)
  • 132. Constraint Solver CHOOSING ANONYMIZED INPUTS Path Condition: i1 <= 5 ∧ i2+i1*2 > 10 ∧ i3 == 0 Input Constraints: i1 != 5 ∧ i2 != 3 ∧ i3 != 0 (breakable)
  • 133. Constraint Solver CHOOSING ANONYMIZED INPUTS Path Condition: i1 <= 5 ∧ i2+i1*2 > 10 ∧ i3 == 0 Input Constraints: i1 != 5 ∧ i2 != 3 ∧ i3 != 0 i1 == 4 i2 == 10 i3 == 0 (breakable)
  • 134. PATH CONDITION RELAXATION Sensitive input (I) that causes F Input domain
  • 135. PATH CONDITION RELAXATION Sensitive input (I) that causes F Input domain
  • 136. PATH CONDITION RELAXATION Sensitive input (I) that causes F Input domain
  • 137. PATH CONDITION RELAXATION Sensitive input (I) that causes F Input domain
  • 138. PATH CONDITION RELAXATION Sensitive input (I) that causes F Input domain
  • 139. PATH CONDITION RELAXATION 1. Array inequalities 3. Multi-clause conditionals 2. Switch statements 4. Array reads
  • 140. PATH CONDITION RELAXATION 1. Array inequalities 3. Multi-clause conditionals 2. Switch statements 4. Array reads
  • 141. PATH CONDITION RELAXATION 1. Array inequalities 3. Multi-clause conditionals 2. Switch statements 4. Array reads x.equals(y);
  • 142. PATH CONDITION RELAXATION 1. Array inequalities 3. Multi-clause conditionals 2. Switch statements 4. Array reads x.equals(y); // x = “abc” // y = “abd”
  • 143. PATH CONDITION RELAXATION 1. Array inequalities 3. Multi-clause conditionals 2. Switch statements 4. Array reads x.equals(y); Traditional: x0 == y0 ∧ x1 == y1 ∧ x2 != y2 // x = “abc” // y = “abd”
  • 144. PATH CONDITION RELAXATION 1. Array inequalities 3. Multi-clause conditionals 2. Switch statements 4. Array reads x.equals(y); Traditional: x0 == y0 ∧ x1 == y1 ∧ x2 != y2 Relaxed: x0 != y0 ∨ x1 != y1 ∨ x2 != y2 // x = “abc” // y = “abd”
  • 145. PATH CONDITION RELAXATION 1. Array inequalities 3. Multi-clause conditionals 2. Switch statements 4. Array reads
  • 146. PATH CONDITION RELAXATION 1. Array inequalities 3. Multi-clause conditionals 2. Switch statements 4. Array reads
  • 147. PATH CONDITION RELAXATION 1. Array inequalities 3. Multi-clause conditionals 2. Switch statements 4. Array reads switch(x) { case 1: ... break; case 3: case 5: ... break; default: ... }
  • 148. PATH CONDITION RELAXATION 1. Array inequalities 3. Multi-clause conditionals 2. Switch statements 4. Array reads switch(x) { case 1: ... break; case 3: case 5: ... break; default: ... } // x = 5
  • 149. PATH CONDITION RELAXATION 1. Array inequalities 3. Multi-clause conditionals 2. Switch statements 4. Array reads switch(x) { case 1: ... break; case 3: case 5: ... break; default: ... } Traditional: x == 5 // x = 5
  • 150. PATH CONDITION RELAXATION 1. Array inequalities 3. Multi-clause conditionals 2. Switch statements 4. Array reads switch(x) { case 1: ... break; case 3: case 5: ... break; default: ... } Traditional: x == 5 Relaxed: x == 5 ∨ x == 3 // x = 5
  • 151. PATH CONDITION RELAXATION 1. Array inequalities 3. Multi-clause conditionals 2. Switch statements 4. Array reads switch(x) { case 1: ... break; case 3: case 5: ... break; default: ... } Traditional: x == 5 Relaxed: x == 5 ∨ x == 3 // x = 10
  • 152. PATH CONDITION RELAXATION 1. Array inequalities 3. Multi-clause conditionals 2. Switch statements 4. Array reads switch(x) { case 1: ... break; case 3: case 5: ... break; default: ... } Traditional: Relaxed: x == 5 ∨ x == 3 // x = 10 x == 10
  • 153. PATH CONDITION RELAXATION 1. Array inequalities 3. Multi-clause conditionals 2. Switch statements 4. Array reads switch(x) { case 1: ... break; case 3: case 5: ... break; default: ... } Traditional: Relaxed: // x = 10 x == 10 x != 1 ∧ x != 3 ∧ x != 5
  • 154. EVALUATION Feasibility Can the approach generate, in a reasonable amount of time, anonymized inputs that reproduce the failure? Strength How much information about the original inputs is revealed? Effectiveness Are the anonymized inputs safe to send to developers?
  • 155. SUBJECTS • Columba: 1 fault • htmlparser: 1 fault • Printtokens: 2 faults • NanoXML: 16 faults (20 faults, total)
  • 156. SUBJECTS • Columba: 1 fault • htmlparser: 1 fault • Printtokens: 2 faults • NanoXML: 16 faults Select sensitive failure-inducing inputs • manually generated or included with subject • several 100 bytes to 5MB in size (20 faults, total)
  • 157. SUBJECTS • Columba: 1 fault • htmlparser: 1 fault • Printtokens: 2 faults • NanoXML: 16 faults Select sensitive failure-inducing inputs • manually generated or included with subject • several 100 bytes to 5MB in size (Assume all of each input is potentially sensitive) (20 faults, total)
  • 160. Average % Bits Revealed Average % Residue RQ2: STRENGTH
  • 161. Average % Bits Revealed Average % Residue RQ2: STRENGTH Measures how many inputs that satisfy the path condition Little information revealed
  • 162. Average % Bits Revealed Average % Residue RQ2: STRENGTH Measures how many inputs that satisfy the path condition Lots of information revealed
  • 163. Average % Bits Revealed Average % Residue RQ2: STRENGTH Measures how many inputs that satisfy the path condition Measures how much of the anonymized input is identical to the original input AAAAAA secret AAAAAA ... AAAAAA BBBBBB secret BBBBBB ... BBBBBB I’ Lots of information revealed I
  • 164. Average % Bits Revealed Average % Residue RQ2: STRENGTH Measures how many inputs that satisfy the path condition Measures how much of the anonymized input is identical to the original input AAAAAA secret AAAAAA ... AAAAAA BBBBBB secret BBBBBB ... BBBBBB I’ Lots of information revealed I
  • 167. RQ3: EFFECTIVENESS NANOXML <!DOCTYPE Foo [    <!ELEMENT Foo (ns:Bar)>    <!ATTLIST Foo        xmlns CDATA #FIXED 'http://nanoxml.n3.net/bar'        a     CDATA #REQUIRED>    <!ELEMENT ns:Bar (Blah)>    <!ATTLIST ns:Bar        xmlns:ns CDATA #FIXED 'http://nanoxml.n3.net/bar'>    <!ELEMENT Blah EMPTY>    <!ATTLIST Blah        x    CDATA #REQUIRED        ns:x CDATA #REQUIRED> ]> <!-- comment --> <Foo a='very' b='secret' c='stuff'>vaz    <ns:Bar>        <Blah x="1" ns:x="2"/>    </ns:Bar> </Foo>
  • 168. RQ3: EFFECTIVENESS NANOXML <!DOCTYPE [    <! >    <!ATTLIST         #FIXED ' '         >    <!E >    <!ATTLIST         #FIXED ' '>    <!E >    <!ATTLIST         #         : # > ]> <!-- --> < =' ' =' ' =' '>    < : >        < =" " : =" "/>    </ :
  • 169. Wayne,Bartley,Bartley,Wayne,wbartly@acp.com,, Ronald,Kahle,Kahle,Ron,ron.kahle@kahle.com,, Wilma,Lavelle,Lavelle,Wilma,,lavelle678@aol.com, Jesse,Hammonds,Hammonds,Jesse,,hamj34@comcast.com, Amy,Uhl,Uhl,Amy,uhla@corp1.com,uhla@gmail.com, Hazel,Miracle,Miracle,Hazel,hazel.miracle@corp2.com,, Roxanne,Nealy,Nealy,Roxie,,roxie.nearly@gmail.com, Heather,Kane,Kane,Heather,kaneh@corp2.com,, Rosa,Stovall,Stovall,Rosa,,sstoval@aol.com, Peter,Hyden,Hyden,Pete,,peteh1989@velocity.net, Jeffrey,Wesson,Wesson,Jeff,jwesson@corp4.com,, Virginia,Mendoza,Mendoza,Ginny,gmendoza@corp4.com,, Richard,Robledo,Robledo,Ralph,ralphrobledo@corp1.com,, Edward,Blanding,Blanding,Ed,,eblanding@gmail.com, Sean,Pulliam,Pulliam,Sean,spulliam@corp2.com,, Steven,Kocher,Kocher,Steve,kocher@kocher.com,, Tony,Whitlock,Whitlock,Tony,,tw14567@aol.com, Frank,Earl,Earl,Frankie,,, Shelly,Riojas,Riojas,Shelly,srojas@corp6.com,, RQ3: EFFECTIVENESS COLUMBA , , , , ,, , , , , ,, , , , ,, , , , , ,, , , , , , , , , , , , ,, , , , ,, , , , , , ,, , , , ,, , , , , ,, , , , , , ,, , , , , ,, , , , , ,, , , , ,, , , , , , ,, , , , , ,, , , , ,, ,
  • 170. RQ3: EFFECTIVENESS COLUMBA , , , , ,, , , , , ,, , , , ,, , , , , ,, , , , , , , , , , , , ,, , , , ,, , , , , , ,, , , , ,, , , , , ,, , , , , , ,, , , , , ,, , , , , ,, , , , ,, , , , , , ,, , , , , ,, , , , ,, ,
  • 171. RQ3: EFFECTIVENESS HTMLPARSER <?xml version="1.0" encoding="UTF-8" ?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/ xhtml11/DTD/xhtml11.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"> <head> <title>james clause @ gatech | home</title> <style type="text/css" media="screen" title=""> <!--/*--><![CDATA[<!--*/ body { margin: 0px; ... /*]]>*/--> </style> </head> <body> ... </body>
  • 172. RQ3: EFFECTIVENESS HTMLPARSER <?xml version="1.0" encoding="UTF-8" ?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/ xhtml11/DTD/xhtml11.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"> <head> <title>james clause @ gatech | home</title> <style type="text/css" media="screen" title=""> <!--/*--><![CDATA[<!--*/ body { margin: 0px; ... /*]]>*/--> </style> </head> <body> ... </body>
  • 173. RQ3: EFFECTIVENESS HTMLPARSER <?xml version="1.0" encoding="UTF-8" ?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/ xhtml11/DTD/xhtml11.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"> <head> <title>james clause @ gatech | home</title> <style type="text/css" media="screen" title=""> <!--/*--><![CDATA[<!--*/ body { margin: 0px; ... /*]]>*/--> </style> </head> <body> ... </body> The portions of the inputs that remain after anonymization tend to be structural in nature and therefore are safe to send to developers
  • 179. COMING SOON-ISH FUTURE WORK COMING SOON Improved minimization Leverage passing executions Debugging for developers
  • 181. 1 Taint inputs Foo 512B Bar 1KB Baz 1.5GB IMPROVED MINIMIZATION LEVERAGE DYNAMICTAINTING
  • 182. 1 Taint inputs Foo 512B Bar 1KB Baz 1.5GB IMPROVED MINIMIZATION LEVERAGE DYNAMICTAINTING 1 2 3 4 5 6 7 8 9 0
  • 183. 1 Taint inputs 2 Propagate taint marks Foo 512B Bar 1KB Baz 1.5GB IMPROVED MINIMIZATION LEVERAGE DYNAMICTAINTING 1 2 3 4 5 6 7 8 9 0
  • 184. 1 Taint inputs 2 Propagate taint marks Foo 512B Bar 1KB Baz 1.5GB foo: 512 ... bar: 1024 ... baz: 150... total: 150... IMPROVED MINIMIZATION LEVERAGE DYNAMICTAINTING 1 2 3 4 5 6 7 8 9 0
  • 185. 1 Taint inputs 2 Propagate taint marks 3 Identify relevant inputs Foo 512B Bar 1KB Baz 1.5GB foo: 512 ... bar: 1024 ... baz: 150... total: 150... IMPROVED MINIMIZATION LEVERAGE DYNAMICTAINTING 1 2 3 4 5 6 7 8 9 0
  • 186. 1 Taint inputs 2 Propagate taint marks 3 Identify relevant inputs Foo 512B Bar 1KB Baz 1.5GB foo: 512 ... bar: 1024 ... baz: 150... total: 150... IMPROVED MINIMIZATION LEVERAGE DYNAMICTAINTING 1 2 3 4 5 6 7 8 9 0
  • 187. In the fieldIn house LEVERAGE PASSING EXECUTIONS Replay / Debug Develop Record ✘ MinimizeSanitize ✘
  • 188. In the fieldIn house LEVERAGE PASSING EXECUTIONS Replay / Debug Develop Record ✘ MinimizeSanitize ✘ ✔
  • 189. In the fieldIn house LEVERAGE PASSING EXECUTIONS Replay / Debug Develop Record ✘ MinimizeSanitize ✘ ✔
  • 191. LEVERAGE PASSING EXECUTIONS ✔ “Fuzz” to create failing executions
  • 192. LEVERAGE PASSING EXECUTIONS ✔ “Fuzz” to create failing executions Augment in-house test suites
  • 193. LEVERAGE PASSING EXECUTIONS ✔ “Fuzz” to create failing executions Augment in-house test suites Guide in house testing
  • 194. DEBUGGING FOR DEVELOPERS Most debugging tools are: By us, for us
  • 195. DEBUGGING FOR DEVELOPERS Most debugging tools are: Limited industrial impact By us, for us
  • 196. DEBUGGING FOR DEVELOPERS Most debugging tools are: Limited industrial impact With developers, for developers
  • 197. DEBUGGING FOR DEVELOPERS Most debugging tools are: With developers, for developers Lots of industrial impact
  • 198. DEBUGGING FOR DEVELOPERS Most debugging tools are: With developers, for developers Lots of industrial impact ?
  • 199. Efficient instrumentation Jazz: A Tool for Demand-Driven Structural Testing Jonathan Misurda1 , Jim Clause1 , Juliya Reed1 , Bruce R. Childers1 , and Mary Lou So a2 1 University of Pittsburgh, Pittsburgh PA 15260, USA, {jmisurda,clausej,juliya,childers}@cs.pitt.edu 2 University of Virginia, Charlottesville VA 22904, USA, soffa@cs.virginia.edu Abstract. Software testing to produce reliable and robust software has become vitally important. Testing is a process by which quality can be assured through the collection of information about software. While test- ing can improve software quality, current tools typically are inflexible and have high overheads, making it a challenge to test large projects. We describe a new scalable and flexible tool, called Jazz, that uses a demand-driven structural testing approach. Jazz has a low overhead of only 17.6% for branch testing. 1 Introduction In the last several years, the importance of producing high quality and robust software has become paramount. Testing is an important process to support quality assurance by gathering information about the software being developed or modified. It is, in general, extremely labor and resource intensive, accounting for 50-60% of the total cost of software development [1]. The increased emphasis on software quality and robustness mandates improved testing methodologies. To test software, a number of techniques can be applied. One class of tech- niques is structural testing, which checks that a given coverage criterion is sat- isfied. For example, branch testing checks that a certain percentage of branches are executed. Other structural tests include def-use testing in which pairs of variable definitions and uses are checked for coverage and node testing in which nodes in a program’s control flow graph are checked. Unfortunately, structural testing is often hindered by the lack of scalable and flexible tools. Current tools are not scalable in terms of both time and memory, limiting the number and scope of the tests that can be applied to large programs. These tools often modify the software binary to insert instrumentation for testing. In this case, the tested version of the application is not the same version that is shipped to customers and errors may remain. Testing tools are usually inflexible and only implement certain types of testing. For example, many tools implement branch testing, but do not implement node or def-use testing. In this paper, we describe a new tool for structural testing, called Jazz, that addresses these problems. Jazz uses a novel demand-driven technique to apply ABSTRACT Producing reliable and robust software has become one of the most important software development concerns in recent years. Testing is a process by which software quality can be assured through the collection of infor- mation. While testing can improve software reliability, current tools typically are inflexible and have high over- heads, making it challenging to test large software projects. In this paper, we describe a new scalable and flexible framework for testing programs with a novel demand-driven approach based on execution paths to implement test coverage. This technique uses dynamic instrumentation on the binary code that can be inserted and removed on-the-fly to keep performance and mem- ory overheads low. We describe and evaluate implemen- tations of the framework for branch, node and def-use testing of Java programs. Experimental results for branch testing show that our approach has, on average, a 1.6 speed up over static instrumentation and also uses less memory. Categories and Subject Descriptors D.2.5. [Software Engineering]: Testing and Debug- ging—Testing tools; D.3.3. [Programming Lan- guages]: Language Constructs and Features—Program instrumentation, run-time environments General Terms Experimentation, Measurement, Verification Keywords Testing, Code Coverage, Structural Testing, Demand- Driven Instrumentation, Java Programming Language 1. INTRODUCTION In the last several years, the importance of produc- ing high quality and robust software has become para- mount [15]. Testing is an important process to support quality assurance by gathering information about the behavior of the software being developed or modified. It is, in general, extremely labor and resource intensive, accounting for 50-60% of the total cost of software development [17]. Given the importance of testing, it is imperative that there are appropriate testing tools and frameworks. In order to adequately test software, a number of different testing techniques must be per- formed. One class of testing techniques used extensively is structural testing in which properties of the software code are used to ensure a certain code coverage.Struc- tural testing techniques include branch testing, node testing, path testing, and def-use testing [6,7,8,17,19]. Typically, a testing tool targets one type of struc- tural test, and the software unit is the program, file or particular methods. In order to apply various structural testing techniques, different tools must be used. If a tool for a particular type of structural testing is not available, the tester would need to either implement it or not use that testing technique. The tester would also be con- strained by the region of code to be tested, as deter- mined by the tool implementor. For example, it may not be possible for the tester to focus on a particular region of code, such as a series of loops, complicated condi- tionals, or particular variables if def-use testing is desired. The user may want to have higher coverage on frequently executed regions of code. Users may want to define their own way of testing. For example, all branches should be covered 10 times rather than once in all loops. In structural testing, instrumentation is placed at certain code points (probes). Whenever such a program point is reached, code that performs the function for the test (payload) is executed. The probes in def-use testing are dictated by the definitions and uses of variables and the payload is to mark that a definition or use in a def- use pair has been covered. Thus for each type of struc- tural testing, there is a testing “plan”. A test plan is a Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ICSE'05, May 15-21, 2005, St. Louis, Missouri, USA. Copyright 2005 ACM 1-58113-963-2/05/0005...$5.00. Demand-Driven Structural Testing with Dynamic Instrumentation Jonathan Misurda†, James A. Clause†, Juliya L. Reed†, Bruce R. Childers†, and Mary Lou Soffa‡ †Department of Computer Science University of Pittsburgh Pittsburgh, Pennsylvania 15260 {jmisurda, clausej, juliya, childers}@cs.pitt.edu ‡Department of Computer Science University of Virginia Charlottesville, Virginia 22904 soffa@cs.virginia.edu 156 A Technique for Enabling and Supporting Debugging of Field Failures James Clause and Alessandro Orso College of Computing Georgia Institute of Technology {clause, orso}@cc.gatech.edu Abstract It is difficult to fully assess the quality of software in- house, outside the actual time and context in which it will execute after deployment. As a result, it is common for software to manifest field failures, failures that occur on user machines due to untested behavior. Field failures are typically difficult to recreate and investigate on developer platforms, and existing techniques based on crash report- ing provide only limited support for this task. In this pa- per, we present a technique for recording, reproducing, and minimizing failing executions that enables and supports in- house debugging of field failures. We also present a tool that implements our technique and an empirical study that evaluates the technique on a widely used e-mail client. 1. Introduction Quality-assurance activities, such as software testing and analysis, are notoriously difficult, expensive, and time- consuming. As a result, software products are often re- leased with faults or missing functionality. In fact, real- world examples of field failures experienced by users be- cause of untested behaviors (e.g., due to unforeseen us- ages), are countless. When field failures occur, it is im- portant for developers to be able to recreate and investigate them in-house. This pressing need is demonstrated by the emergence of several crash-reporting systems, such as Mi- crosoft’s error reporting systems [13] and Apple’s Crash Reporter [1]. Although these techniques represent a first important step in addressing the limitations of purely in- house approaches to quality assurance, they work on lim- ited data (typically, a snapshot of the execution state) and can at best identify correlations between a crash report and data on other known failures. In this paper, we present a novel technique for reproduc- ing and investigating field failures that addresses the limita- tions of existing approaches. Our technique works in three phases, intuitively illustrated by the scenario in Figure 1. In the recording phase, while users run the software, the tech- nique intercepts and logs the interactions between applica- tion and environment and records portions of the environ- ment that are relevant to these interactions. If the execution terminates with a failure, the produced execution recording is stored for later investigation. In the minimization phase, using free cycles on the user machines, the technique re- plays the recorded failing executions with the goal of au- tomatically eliminating parts of the executions that are not relevant to the failure. In the replay and debugging phase, developers can use the technique to replay the minimized failing executions and investigate the cause of the failures (e.g., within a debugger). Being able to replay and debug real field failures can give developers unprecedented insight into the behavior of their software after deployment and op- portunities to improve the quality of their software in ways that were not possible before. To evaluate our technique, we implemented it in a proto- type tool, called ADDA (Automated Debugging of Deployed Applications), and used the tool to perform an empirical study. The study was performed on PINE [19], a widely- used e-mail client, and involved the investigation of failures caused by two real faults in PINE. The results of the study are promising. Our technique was able to (1) record all ex- ecutions of PINE (and two other subjects) with a low time and space overhead, (2) completely replay all recorded exe- cutions, and (3) perform automated minimization of failing executions and obtain shorter executions that manifested the same failures as the original executions. Moreover, we were able to replay the minimized executions within a debugger, which shows that they could have actually been used to in- vestigate the failures. The contributions of this paper are: • A novel technique for recording and later replaying exe- cutions of deployed programs. • An approach for minimizing failing executions and gen- erating shorter executions that fail for the same reasons. • A prototype tool that implements our technique. • An empirical study that shows the feasibility and effec- tiveness of the approach. 29th International Conference on Software Engineering (ICSE'07) 0-7695-2828-7/07 $20.00 © 2007 Dytan: A Generic Dynamic Taint Analysis Framework James Clause, Wanchun Li, and Alessandro Orso College of Computing Georgia Institute of Technology {clause|wli7|orso}@cc.gatech.edu ABSTRACT Dynamic taint analysis is gaining momentum. Techniques based on dynamic tainting have been successfully used in the context of application security, and now their use is also being explored in dif- ferent areas, such as program understanding, software testing, and debugging. Unfortunately, most existing approaches for dynamic tainting are defined in an ad-hoc manner, which makes it difficult to extend them, experiment with them, and adapt them to new con- texts. Moreover, most existing approaches are focused on data-flow based tainting only and do not consider tainting due to control flow, which limits their applicability outside the security domain. To address these limitations and foster experimentation with dynamic tainting techniques, we defined and developed a general framework for dynamic tainting that (1) is highly flexible and customizable, (2) allows for performing both data-flow and control-flow based taint- ing conservatively, and (3) does not rely on any customized run- time system. We also present DYTAN, an implementation of our framework that works on x86 executables, and a set of preliminary studies that show how DYTAN can be used to implement different tainting-based approaches with limited effort. In the studies, we also show that DYTAN can be used on real software, by using FIRE- FOX as one of our subjects, and illustrate how the specific char- acteristics of the tainting approach used can affect efficiency and accuracy of the taint analysis, which further justifies the use of our framework to experiment with different variants of an approach. Categories and Subject Descriptors: D.2.5 [Software Engineer- ing]: Testing and Debugging; General Terms: Experimentation, Security Keywords: Dynamic tainting, information flow, general framework 1. INTRODUCTION Dynamic taint analysis (also known as dynamic information flow analysis) consists, intuitively, in marking and tracking certain data in a program at run-time. This type of dynamic analysis is be- coming increasingly popular. In the context of application secu- rity, dynamic-tainting approaches have been successfully used to prevent a wide range of attacks, including buffer overruns (e.g., [8, 17]), format string attacks (e.g., [17, 21]), SQL and command in- jections (e.g., [7, 19]), and cross-site scripting (e.g., [18]). More recently, researchers have started to investigate the use of tainting- based approaches in domains other than security, such as program understanding, software testing, and debugging (e.g., [11, 13]). Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ISSTA’07, July 9–12, 2007, London, England, United Kingdom. Copyright 2007 ACM 978-1-59593-734-6/07/0007 ...$5.00. Unfortunately, most existing techniques and tools for dynamic taint analysis are defined in an ad-hoc manner, to target a specific problem or a small class of problems. It would be difficult to ex- tend or adapt such techniques and tools so that they can be used in other contexts. In particular, most existing approaches are focused on data-flow based tainting only, and do not consider tainting due to the control flow within an application, which limits their general applicability. Also, most existing techniques support either a sin- gle taint marking or a small, fixed number of markings, which is problematic in applications such as debugging. Finally, almost no existing technique handles the propagation of taint markings in a truly conservative way, which may be appropriate for the specific applications considered, but is problematic in general. Because de- veloping support for dynamic taint analysis is not only time con- suming, but also fairly complex, this lack of flexibility and gener- ality of existing tools and techniques is especially limiting for this type of dynamic analysis. To address these limitations and foster experimentation with dy- namic tainting techniques, in this paper we present a framework for dynamic taint analysis. We designed the framework to be general and flexible, so that it allows for implementing different kinds of techniques based on dynamic taint analysis with little effort. Users can leverage the framework to quickly develop prototypes for their techniques, experiment with them, and investigate trade-offs of dif- ferent alternatives. For a simple example, the framework could be used to investigate the cost effectiveness of considering different types of taint propagation for an application. Our framework has several advantages over existing approaches. First, it is highly flexible and customizable. It allows for easily specifying which program data should be tainted and how, how taint markings should be propagated at run-time, and where and how taint markings should be checked. Second, it allows for performing data-flow and both data-flow and control-flow based tainting. Third, from a more practical standpoint, it works on binaries, does not need access to source code, and does not rely on any customized hardware or operating system, which makes it broadly applicable. We also present DYTAN, an implementation of our framework that works on x86 binaries, and a set of preliminary studies per- formed using DYTAN. In the first set of studies, we report on our experience in using DYTAN to implement two tainting-based ap- proaches presented in the literature. Although preliminary, our ex- perience shows that we were able to implement these approaches completely and with little effort. The second set of studies illus- trates how the specific characteristics of a tainting approach can affect efficiency and accuracy of the taint analysis. In particular, we investigate how ignoring control-flow related propagation and over- looking some data-flow aspects can lead to unsafety. These results further justify the usefulness of experimenting with different varia- tions of dynamic taint analysis and assessing their tradeoffs, which can be done with limited effort using our framework. The second set of studies also shows the practical applicability of DYTAN, by successfully running it on the FIREFOX web browser. 196 Effective Memory Protection Using Dynamic Tainting James Clause, Ioannis Doudalis, Alessandro Orso, and Milos Prvulovic College of Computing Georgia Institute of Technology {clause|idoud|orso|milos}@cc.gatech.edu ABSTRACT Programs written in languages that provide direct access to memory through pointers often contain memory-related faults, which may cause non-deterministic failures and even security vulnerabilities. In this paper, we present a new technique based on dynamic taint- ing for protecting programs from illegal memory accesses. When memory is allocated, at runtime, our technique taints both the mem- ory and the corresponding pointer using the same taint mark. Taint marks are then suitably propagated while the program executes and are checked every time a memory address m is accessed through a pointer p; if the taint marks associated with m and p differ, the ex- ecution is stopped and the illegal access is reported. To allow for a low-overhead, hardware-assisted implementation of the approach, we make several key technical and engineering decisions in the definition of our technique. In particular, we use a configurable, low number of reusable taint marks instead of a unique mark for each area of memory allocated, which reduces the overhead of the approach without limiting its flexibility and ability to target most memory-related faults and attacks known to date. We also define the technique at the binary level, which lets us handle the (very) common case of applications that use third-party libraries whose source code is unavailable. To investigate the effectiveness and practicality of our approach, we implemented it for heap-allocated memory and performed a preliminary empirical study on a set of programs. Our results show that (1) our technique can identify a large class of memory-related faults, even when using only two unique taint marks, and (2) a hardware-assisted implementation of the technique could achieve overhead in the single digits. Categories and Subject Descriptors: D.2.5 [Software Engineering]: Test- ing and Debugging; C.0 [General]: Hardware/Software Interfaces; General Terms: Performance, Security Keywords: Illegal memory accesses, dynamic tainting, hardware support 1. INTRODUCTION Memory-related faults are a serious problem for languages that allow direct memory access through pointers. An important class of memory-related faults are what we call illegal memory accesses. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ASE’07, November 5–9, 2007, Atlanta, Georgia, USA. Copyright 2007 ACM 978-1-59593-882-4/07/0011 ...$5.00. In languages such as C and C++, when memory allocation is re- quested, a currently-free area of memory m of the specified size is reserved. After m has been allocated, its initial address can be assigned to a pointer p, either immediately (e.g., in the case of heap allocated memory) or at a later time (e.g., when retrieving and storing the address of a local variable). From that point on, the only legal accesses to m through a pointer are accesses per- formed through p or through other pointers derived from p. (In Section 3, we clearly define what it means to derive a pointer from another pointer.) All other accesses to m are Illegal Memory Ac- cesses (IMAs), that is, accesses where a pointer is used to access memory outside the bounds of the memory area with which it was originally associated. IMAs are especially relevant for several reasons. First, they are caused by typical programming errors, such as array-out-of-bounds accesses and NULL pointer dereferences, and are thus widespread and common. Second, they often result in non-deterministic fail- ures that are hard to identify and diagnose; the specific effects of an IMA depend on several factors, such as memory layout, that may vary between executions. Finally, many security concerns such as viruses, worms, and rootkits use IMAs as their injection vectors. In this paper, we present a new dynamic technique for protecting programs against IMAs that is effective against most known types of illegal accesses. The basic idea behind the technique is to use dynamic tainting (or dynamic information flow) [8] to keep track of which memory areas can be accessed through which pointers, as follows. At runtime, our technique taints both allocated mem- ory and pointers using taint marks. Dynamic taint propagation, to- gether with a suitable handling of memory-allocation and deallo- cation operations, ensures that taint marks are appropriately prop- agated during execution. Every time the program accesses some memory through a pointer, our technique checks whether the ac- cess is legal by comparing the taint mark associated with the mem- ory and the taint mark associated with the pointer used to access it. If the marks match, the access is considered legitimate. Otherwise, the execution is stopped and an IMA is reported. In defining our approach, our final goal is the development of a low-overhead, hardware-assisted tool that is practical and can be used on deployed software. A hardware-assisted tool is a tool that leverages the benefits of both hardware and software. Typically, some performance critical aspects are moved to the hardware to achieve maximum efficiency, while software is used to perform op- erations that would be too complex to implement in hardware. There are two main characteristics of our approach that were de- fined to help achieve our goal of a hardware-assisted implementa- tion. The first characteristic is that our technique only uses a small, configurable number of reusable taint marks instead of a unique mark for each area of memory allocated. Using a low number of 283 Penumbra: Automatically Identifying Failure-Relevant Inputs Using Dynamic Tainting James Clause College of Computing Georgia Institute of Technology clause@cc.gatech.edu Alessandro Orso College of Computing Georgia Institute of Technology orso@cc.gatech.edu ABSTRACT Most existing automated debugging techniques focus on re- ducing the amount of code to be inspected and tend to ig- nore an important component of software failures: the in- puts that cause the failure to manifest. In this paper, we present a new technique based on dynamic tainting for au- tomatically identifying subsets of a program’s inputs that are relevant to a failure. The technique (1) marks program inputs when they enter the application, (2) tracks them as they propagate during execution, and (3) identifies, for an observed failure, the subset of inputs that are potentially relevant for debugging that failure. To investigate feasibil- ity and usefulness of our technique, we created a prototype tool, penumbra, and used it to evaluate our technique on several failures in real programs. Our results are promising, as they show that penumbra can point developers to inputs that are actually relevant for investigating a failure and can be more practical than existing alternative approaches. Categories and Subject Descriptors D.2.5 [Software Engineering]: Testing and Debugging General Terms Algorithms, Experimentation, Reliability Keywords Failure-relevant inputs, automated debugging, dynamic in- formation flow, dynamic tainting 1. INTRODUCTION Debugging is known to be a labor-intensive, time-consum- ing task that can be responsible for a large portion of soft- ware development and maintenance costs [21,23]. Common characteristics of modern software, such as increased con- figurability, larger code bases, and increased input sizes, in- troduce new challenges for debugging and exacerbate exist- ing problems. In response, researchers have proposed many Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ISSTA’09, July 19–23, 2009, Chicago, Illinois, USA. Copyright 2009 ACM 978-1-60558-338-9/09/07 ...$5.00. semi- and fully-automated techniques that attempt to re- duce the cost of debugging (e.g., [8,9,11–13,18,24,25,27]). The majority of these techniques are code-centric in that they focus exclusively on one aspect of debugging—trying to identify the faulty statements responsible for a failure. Although code-centric approaches can work well in some cases (e.g., for isolated faults that involve a single state- ment), they are often inadequate for more complex faults [4]. Faults of omission, for instance, where part of a specification has not been implemented, are notoriously problematic for debugging techniques that attempt to identify potentially faulty statements. The usefulness of code-centric techniques is also limited in the case of long-running programs and pro- grams that process large amounts of information; failures in these types of programs are typically di⌅cult to understand without considering the data involved in such failures. To debug failures more e ectively, it is necessary to pro- vide developers with not only a relevant subset of state- ments, but also a relevant subset of inputs. There are only a few existing techniques that attempt to identify relevant inputs [3, 17, 25], with delta debugging [25] being the most known of these. Although delta debugging has been shown to be an e ective technique for automatic debugging, it also has several drawbacks that may limit its usefulness in prac- tice. In particular, it requires (1) multiple executions of the program being debugged, which can involve a long running time, and (2) complex oracles and setup, which can result in a large amount of manual e ort [2]. In this paper, we present a novel debugging technique that addresses many of the limitations of existing approaches. Our technique can complement code-centric debugging tech- niques because it focuses on identifying program inputs that are likely to be relevant for a given failure. It also overcomes some of the drawbacks of delta debugging because it needs a single execution to identify failure-relevant inputs and re- quires minimal manual e ort. Given an observable faulty behavior and a set of failure- inducing inputs (i.e., a set of inputs that cause such behav- ior), our technique automatically identifies failure-relevant inputs (i.e., a subset of failure-inducing inputs that are ac- tually relevant for investigating the faulty behavior). Our approach is based on dynamic tainting. Intuitively, the tech- nique works by tracking the flow of inputs along data and control dependences at runtime. When a point of failure is reached, the tracked information is used to identify and present to developers the failure-relevant inputs. At this point, developers can use the identified inputs to investigate the failure at hand. LEAKPOINT: Pinpointing the Causes of Memory Leaks James Clause College of Computing Georgia Institute of Technology clause@cc.gatech.edu Alessandro Orso College of Computing Georgia Institute of Technology orso@cc.gatech.edu ABSTRACT Most existing leak detection techniques for C and C++ applications only detect the existence of memory leaks. They do not provide any help for fixing the underlying memory management errors. In this paper, we present a new technique that not only detects leaks, but also points developers to the locations where the underlying errors may be fixed. Our technique tracks pointers to dynamically- allocated areas of memory and, for each memory area, records sev- eral pieces of relevant information. This information is used to identify the locations in an execution where memory leaks occur. To investigate our technique’s feasibility and usefulness, we devel- oped a prototype tool called LEAKPOINT and used it to perform an empirical evaluation. The results of this evaluation show that LEAKPOINT detects at least as many leaks as existing tools, reports zero false positives, and, most importantly, can be effective at help- ing developers fix the underlying memory management errors. Categories and Subject Descriptors D.2.5 [Software Engineering]: Testing and Debugging General Terms Performance, Reliability Keywords Leak detection, Dynamic tainting 1. INTRODUCTION Memory leaks are a type of unintended memory consumption that can adversely impact the performance and correctness of an application. In programs written in languages such as C and C++, memory is allocated using allocation functions, such as malloc and new. Allocation functions reserve a currently free area of memory m and return a pointer p that points to m’s starting ad- dress. Typically, the program stores and then uses p, or another This work was supported in part by NSF awards CCF-0725202 and CCF-0541080 to Georgia Tech. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ICSE ’10, May 2-8 2010, Cape Town, South Africa Copyright 2010 ACM 978-1-60558-719-6/10/05 ...$10.00. pointer derived from p, to interact with m. When m is no longer needed, the program should pass p to a deallocation function (e.g., free or delete) to deallocate m. A leak occurs if, due to a memory management error, m is not deallocated at the appropri- ate time. There are two types of memory leaks: lost memory and forgotten memory. Lost memory refers to the situation where m be- comes unreachable (i.e., the program overwrites or loses p and all pointers derived from p) without first being deallocated. Forgotten memory refers to the situation where m remains reachable but is not deallocated or accessed in the rest of the execution. Memory leaks are relevant for several reasons. First, they are dif- ficult to detect. Unlike many other types of failures, memory leaks do not immediately produce an easily visible symptom (e.g., a crash or the output of a wrong value); typically, leaks remain unobserved until they consume a large portion of the memory available to a sys- tem. Second, leaks have the potential to impact not only the appli- cation that leaks memory, but also every other application running on the system; because the overall amount of memory is limited, as the memory usage of a leaking program increases, less memory is available to other running applications. Consequently, the per- formance and correctness of every running application can be im- pacted by a program that leaks memory. Third, leaks are common, even in mature applications. For example, in the first half of 2009, over 100 leaks in the Firefox web-browser were reported [18]. Because of the serious consequences and common occurrence of memory leaks, researchers have created many static and dynamic techniques for detecting them (e.g., [1,2,4,7–14,16,17,20–23,25, 27,28]). The adoption of static techniques has been limited by sev- eral factors, including the lack of scalable, precise heap modeling. Dynamic techniques are therefore more widely used in practice. In general, dynamic techniques provide one main piece of informa- tion: the location in an execution where a leaked area of memory is allocated. This location is supposed to serve as a starting point for investigating the leak. However, in many situations, this informa- tion does not provide any insight on where or how to fix the mem- ory management error that causes the leak: the allocation location and the location of the memory management error are typically in completely different parts of the application’s code. To address this limitation of existing approaches, we propose a new memory leak detection technique. Our technique provides the same information as existing techniques but also identifies the locations in an execution where leaks occur. In the case of lost memory, the location is defined as the point in an execution where the last pointer to an unallocated memory area is lost or overwritten. In the case of forgotten memory, the location is defined as the last point in an execution where a pointer to a leaked area of memory was used (e.g., when it is dereferenced to read or write memory, passed as a function argument, returned from a function, or used as Camouflage: Automated Sanitization of Field Data James Clause College of Computing Georgia Institute of Technology clause@cc.gatech.edu Alessandro Orso College of Computing Georgia Institute of Technology orso@cc.gatech.edu ABSTRACT Privacy and security concerns have adversely a ected the usefulness of many types of techniques that leverage infor- mation gathered from deployed applications. To address this issue, we present a new approach for automatically sanitiz- ing failure-inducing inputs. Given an input I that causes a failure f, our technique can generate a sanitized input I that is di erent from I but still causes f. I can then be sent to the developers to help them debug f, without revealing the possibly sensitive information contained in I. We im- plemented our approach in a prototype tool, camouflage, and performed an empirical evaluation. In the evaluation, we applied camouflage to a large set of failure-inducing inputs for several real applications. The results of the eval- uation are promising; they show that camouflage is both practical and e ective at generating sanitized inputs. In par- ticular, for the inputs that we considered, I and I shared no sensitive information. 1. INTRODUCTION Investigating techniques that capture data from deployed applications to support in-house software engineering tasks is an increasingly active and successful area of research (e.g., [1,3–5,13,14,17,21,22,26,27,29]). However, privacy and se- curity concerns have prevented widespread adoption of many of these techniques and, because they rely on user partici- pation, have ultimately limited their usefulness. Many of the earlier proposed techniques attempt to sidestep these concerns by collecting only limited amounts of information (e.g., stack traces and register dumps [1, 3, 5] or sampled branch profiles [26,27]) and providing a privacy policy that specifies how the information will be used (e.g., [2,8]). Be- cause the types of information collected by these techniques are unlikely to be sensitive, users are more willing to trust developers. Moreover, because only a small amount of infor- mation is collected, it is feasible for users to manually inspect and sanitize such information before it is sent to developers. Unfortunately, recent research has shown that the e ec- tiveness of these techniques increases when they can lever- age large amounts of detailed information (e.g., complete execution recordings [4, 14] or path profiles [13, 24]). Since more detailed information is bound to contain sensitive data, users will most likely be unwilling to let developers collect such information. In addition, collecting large amounts of information would make it infeasible for users to sanitize the collected information by hand. To address this prob- lem, some of these techniques suggest using an input mini- mization approach (e.g., [6, 7, 35]) to reduce the number of failure-inducing inputs and, hopefully, eliminate some sensi- tive information. Input-minimization techniques, however, were not designed to specifically reduce sensitive inputs, so they can only eliminate sensitive data by chance. In or- der for techniques that leverage captured field information to become widely adopted and achieve their full potential, new approaches for addressing privacy and security concerns must be developed. In this paper, we present a novel technique that addresses privacy and security concerns by sanitizing information cap- tured from deployed applications. Our technique is designed to be used in conjunction with an execution capture/replay technique (e.g., [4, 14]). Given an execution recording that contains a captured failure-inducing input I = i1, i2, . . . in⇥ and terminates with a failure f, our technique replays the execution recording and leverages a specialized version of symbolic-execution to automatically produce I , a sanitized version of I, such that I (1) still causes f and (2) reveals as little information about I as possible. A modified execution recording where I replaces I can then be constructed and sent to the developers, who can use it to debug f. It is, in general, impossible to construct I such that it does not reveal any information about I while still caus- ing the same failure f. Typically, the execution of f would depend on the fact that some elements of I have specific values (e.g., i1 must be 0 for the failing path to be taken). However, this fact does not prevent the technique from be- ing useful in practice. In our evaluation, we found that the information revealed by the sanitized inputs was not sensi- tive and tended to be structural in nature (e.g., a specific portion of the input must be surrounded by double quotes). Conversely, the parts of the inputs that were more likely to be sensitive (e.g., values contained inside the double quotes) were not revealed (see Section 4). To evaluate the e ectiveness of our technique, we imple- mented it in a prototype tool, called camouflage, and car- ried out an empirical evaluation of 170 failure-inducing in- 1 CC 05 ICSE 05 ISSTA 07 ASE 07 ISSTA 09 ICSE 10 Tech ReptICSE 07 Dynamic tainting based analyses Enabling more efficient debugging QUESTIONS?