PEST: Programs to Evaluate Software Testing Tools and Techniques
James R. Lyle
National Institute of Standards
and Technology
100 Bureau Drive Stop 8970
Gaithersburg, MD 20899-8970
(301) 975-3270
jlyle@nist.gov
Mary T. Laamanen
National Institute of Standards
and Technology
100 Bureau Drive Stop 8970
Gaithersburg, MD 20899-8970
(301) 975-3260
mary.laamanen@nist.gov
Neva M. Carlson
National Institute of Standards
and Technology
100 Bureau Drive Stop 8970
Gaithersburg, MD 20899-8970
(301) 975-3296
neva.carlson@nist.gov
Abstract
PEST is a collection of reference materials for the empirical
evaluation and comparison of software testing techniques.
Often the publication of a new testing technique or strategy
includes a theoretical analysis and an ad hoc empirical eval-
uation. Because each researcher usually uses a different set
of programs for an empirical evaluation, there is little basis
for comparison between different techniques.
The project objective is to develop and make available to
software testing researchers and tool vendors a set of refer-
ence materials for the empirical evaluation and comparison
of software testing techniques. This set of reference materi-
als includes a diverse suite of program modules that can be
the subject of a testing technique, testing support tools and
examples of using PEST.
Each module contains a program specification, a correct
implementation in C that can be used as an oracle and several
faulty C implementations, each seeded with a single fault
from a commonly available fault taxonomy. The programs
are designed such that a common test harness can be used
to execute each faulty variant over test data for comparison
against the oracle. This allows for the computation of met-
rics to compare the relative effectiveness of test data gener-
ated by different techniques.
Keywords
software testing, software metrics
1 Introduction
PEST is a collection of reference materials for the empirical
evaluation and comparison of software testing techniques.
Often the publication of a new testing technique or strategy
includes a theoretical analysis and an ad hoc empirical evalu-
ation. Because each researcher usually uses a different set of
programs for an empirical evaluation, there is little basis for
comparison between different techniques. A common suite
of faulty programs would remedy this problem.
The project objective is to develop and make available to
software testing researchers and tool vendors a set of refer-
ence materials for the empirical evaluation and comparison
of software testing techniques. This set of reference materi-
als includes a diverse suite of program modules that can be
the subject of a testing technique, testing support tools and
examples of using PEST. The PEST materials can be used to
evaluate and compare both white box and black box testing
techniques. In addition, the PEST materials can be used by
testing tool vendors and as a supplement to class materials
in academic courses or a professional training environment.
This short paper describes the PEST reference materials.
The heart of the collection is a suite of modules, correspond-
ing to a single programming project. Each module contains
a program specification, a correct (at least we do our best to
make it so) implementation in C that can be used as an ora-
cle and several faulty C implementations, each seeded with
a single fault from a commonly available fault taxonomy[1].
The programs are designed such that a common test harness
can be used to execute each faulty variant over test data for
comparison against the oracle. This allows statistics to be
collected for the computation of metrics to compare the rel-
ative effectiveness of test data generated by different tech-
niques.
PEST currently contains materials created at NIST; how-
ever we expect to expand the collection with contributions
from the testing community. Contributions can be in many
forms, such as modules in other languages (Java, C++), ex-
perience reports, alternate fault taxonomies, or descriptions
of metrics that can be used to compare testing techniques.
This paper also includes examples of how the PEST ma-
terials can be used to compare testing techniques, investigate
a given technique or be used in teaching about testing.
2 Reference Materials
The PEST reference materials are divided into four cate-
gories:
1. Testable modules
2. Support tools
3. Usage examples
4. Miscellaneous items
2.1 Modules
Each testable module is a simple programming project lo-
cated in its own directory. Each directory contains a specifi-
cation file (either plain text, Word, or LaTeX format) describ-
ing what the program should do, a main control program
file and several faulty program version files. The control pro-
gram, oracle, and the faulty versions are structured so that it
is easy to verify if the input data has caused the fault to be
revealed. The control program calls a verify function to
compare the results of the oracle with the faulty version. The
control program has the following structure:
get_input(...){...}   /* read one test case                    */
oracle(...){...}      /* correct reference implementation      */
verify(...){...}      /* compare faulty and oracle results     */
main(...){...
    get_input(...);       /* fetch the test data                   */
    bug(...);             /* faulty variant, linked in separately  */
    oracle(...);          /* correct computation                   */
    return verify(...);}  /* report the comparison result          */
Each faulty version is a different implementation of the bug
function in a separate source file. The file names are keyed
to the category of the fault from Beizer’s fault taxonomy.
This makes it possible to characterize the effectiveness of
test cases against particular fault classes.
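As a purely hypothetical illustration (the module, file name, prototype,
and seeded fault below are invented for this sketch and are not taken
from the PEST distribution), a variant file for an expression-operator
fault (class 32221) in a small median-of-three module might look like
this:

/* bug32221_v1.c -- hypothetical faulty variant for a
 * median-of-three module.  The file name encodes Beizer
 * class 32221 (expression operator). */
int bug(int a, int b, int c)
{
    /* FAULT: the seeded defect replaces && with || in the
     * first condition, so b is sometimes returned even when
     * it is not the median. */
    if ((a <= b || b <= c) || (c <= b && b <= a))
        return b;
    if ((b <= a && a <= c) || (c <= a && a <= b))
        return a;
    return c;
}

Linking this file in place of another variant changes only the bug
function; the control program and oracle stay the same.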
To execute a set of test cases on a particular faulty vari-
ant, the variant is compiled and then linked with the control
program. The control program begins execution through the
main procedure. The main procedure gets the test data,
calls both the faulty and the correct versions and returns the
results of a comparison to the invoking environment for tab-
ulation. For some modules, the oracle can be eliminated if
the verify function can determine the correctness of the
computation from the input and output. An example of this
might be a numerical analysis routine that can be checked by
substitution of results into an inverse function that returns the
original input.
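For instance, a minimal sketch of such a verify function for a
hypothetical square-root module (the function name, tolerance, and
pass/fail convention are assumptions of this sketch, not part of PEST)
could square the computed result and compare it with the original
input:

#include <math.h>

/* Hypothetical verify() for a square-root module: instead of
 * consulting an oracle, apply the inverse function (squaring)
 * and accept the result if it reproduces the input within a
 * small relative tolerance.  Returns 0 when the result looks
 * correct and 1 when the fault appears to be revealed. */
int verify(double input, double result)
{
    double back = result * result;
    double tol = 1e-9 * (input > 1.0 ? input : 1.0);
    return fabs(back - input) <= tol ? 0 : 1;
}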
Modules are being developed from a variety of applica-
tion areas, including system utility software, text processing,
simulation, financial and games.
2.2 Fault Taxonomy
PEST uses an adapted version of Beizer’s fault taxonomy[1]
as a guide for creating faulty variants. Beizer's taxonomy covers
more than just code bugs; it is a detailed classification scheme
for why a program has a problem, ranging from misunderstood
requirements to using the wrong hardware for a test. That scope is
too general as a guide for inserting faults, so we use items from
only four of Beizer's nine categories: functionality, structural,
data, and integration. We ignore the requirements, standards,
system architecture, test execution, and other categories.
2.3 Support Tools
PEST currently has several support tools: a test harness that
can be used to run and tally results for multiple data sets
over all the variants of a module, tailored random number
generators and a module browser.
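A minimal sketch of such a harness is shown below; it assumes each
faulty variant has been linked into its own executable and each data
set is a file fed to the variant on standard input. These conventions,
and the simplified handling of system()'s return value, are choices
made for this sketch rather than details of the actual PEST harness.

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical harness: run every variant executable named in
 * argv[2..] on the test-data file argv[1] and tally how many
 * variants are revealed (a nonzero exit status from the control
 * program's verify step). */
int main(int argc, char *argv[])
{
    char cmd[1024];
    int revealed = 0;

    if (argc < 3) {
        fprintf(stderr, "usage: %s datafile variant...\n", argv[0]);
        return 2;
    }
    for (int i = 2; i < argc; i++) {
        snprintf(cmd, sizeof cmd, "%s < %s", argv[i], argv[1]);
        if (system(cmd) != 0) {          /* variant exited nonzero */
            printf("revealed: %s\n", argv[i]);
            revealed++;
        }
    }
    printf("%d of %d variants revealed by %s\n",
           revealed, argc - 2, argv[1]);
    return 0;
}

Invoking it as, say, harness data01.txt ./bug231_v1 ./bug231_v2 would
report which of those two variants the data set exposed (the file
names here are again hypothetical).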
The PEST browser is a graphical user interface (GUI)
tool written as a Java Swing application for browsing the
contents of the database. The tool allows the user to view
the files stored in PEST by navigating through the particu-
lar bug taxonomy. The mutant program files are categorized
according to bug classes contained in the chosen taxonomy.
If a specific program is selected then the user is able to si-
multaneously compare code between the oracle and mutant
version. Changes between the two programs are marked by
color highlights.
3 Examples
This section describes some example uses of PEST to com-
pare testing techniques and teach about testing. While both
the PEST module selected and the testing techniques used
are trivial, the examples demonstrate how more elaborate
PEST modules can be used to compare more sophisticated
testing techniques.
3.1 Triangle Module
This example could be used with students to illustrate basic
testing principles. The testing task is to apply several testing
strategies to the specification of the classic triangle
classification program adapted from Myers[2].
The procedure is given a character string con-
taining three integers separated by spaces, tabs
or one sign character (plus or minus) optionally
preceded by spaces or tabs. An integer contains
the digits 0-9 with an optional plus (+) or minus
(-) sign. The procedure returns an error code if
the input is not three integers. The three values
are interpreted as representing the lengths of the
sides of a triangle. The procedure returns a code
indicating whether the triangle is scalene, isosce-
les, equilateral or not a triangle.
The triangle module has 17 faulty variants over 9 fault classes
as in the following table:
Faults in Triangle
Code   Description             N
231    Missing case            4
232    Extra case              1
3128   Control flow predicate  1
3141   Loop initial value      2
3142   Loop terminal value     1
3143   Loop increment          2
32221  Expression operator     4
32222  Expression parentheses  1
32223  Expression sign         1
Code and Description give the fault category from Beizer's
taxonomy; N is the number of faulty versions of the triangle
program created for that fault category.
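For reference, a hedged sketch of the classification logic the oracle
might use once the three integers have been parsed from the input
string (the function name, return codes, and the omission of the
string-parsing step are assumptions of this sketch, not the actual
PEST oracle):

/* Hypothetical classification core for the triangle module.
 * Assumes a, b, and c have already been parsed from the input
 * string.  Invented return codes: 0 = not a triangle,
 * 1 = scalene, 2 = isosceles, 3 = equilateral. */
int classify(long a, long b, long c)
{
    if (a <= 0 || b <= 0 || c <= 0)
        return 0;                      /* sides must be positive  */
    if (a + b <= c || a + c <= b || b + c <= a)
        return 0;                      /* triangle inequality     */
    if (a == b && b == c)
        return 3;                      /* equilateral             */
    if (a == b || b == c || a == c)
        return 2;                      /* isosceles               */
    return 1;                          /* scalene                 */
}

Each seeded variant differs from such an oracle in exactly one place,
for example an operator swapped in one of the conditions above for a
class 32221 fault.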
3.2 Testing with Random Numbers
The first strategy investigated used 250 random triples uni-
formly distributed from 0 up to 100. The first 35 triples trig-
gered faults in 8 of the 17 faulty versions. After 145 triples,
two more faults were found. No more faults were found by
the remaining triples.
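The shape of this strategy can be sketched as follows (the use of the
C library rand() and the tallying printout are choices made for this
illustration, not PEST's tailored generators):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Sketch of the first strategy: draw 250 triples uniformly from
 * 0..100 and tally how many form valid triangles and how many
 * are equilateral, illustrating how rarely random triples
 * exercise the triangle-handling code. */
int main(void)
{
    const int n = 250, max = 100;
    int valid = 0, equilateral = 0;
    srand((unsigned)time(NULL));
    for (int i = 0; i < n; i++) {
        int a = rand() % (max + 1);
        int b = rand() % (max + 1);
        int c = rand() % (max + 1);
        if (a > 0 && b > 0 && c > 0 &&
            a + b > c && a + c > b && b + c > a) {
            valid++;
            if (a == b && b == c)
                equilateral++;
        }
    }
    printf("%d of %d triples were triangles, %d equilateral\n",
           valid, n, equilateral);
    return 0;
}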
After examining the random data it was observed that most triples
were not valid triangles. This should have been expected if the
probability of generating valid triangles had been considered: with
three sides drawn uniformly from 0 to 100 (101 possible values each),
the chance that the second and third sides both equal the first is
about (1/101)^2, so a random equilateral triangle occurs only about
once in 10,000 triples. To generate data with more valid triangles,
a second strategy reduced the range of the random distribution to
0 up to 10. From that distribution we should expect about two
equilateral triangles in 250 triples, and we actually got three.
The results of the second strategy were a little better; we found
more faults with fewer test cases: 11 faults after 34 random
triples.
3.3 Testing with Random Triangles
A third strategy generated random triangles from each class rather
than triples of random numbers. First, a class was selected with
equal probability from equilateral, isosceles, scalene, or not a
triangle; then random side lengths of up to 100 were generated in
the required relationship.
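A rough sketch of class-based generation (the class encoding,
rejection sampling, and minimum side length of 1 are all choices of
this sketch rather than details of PEST's generator):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical class-based generator: pick one of the four
 * classes with equal probability, then emit side lengths of at
 * most `max` in the required relationship. */
static int rnd(int max) { return 1 + rand() % max; }    /* 1..max */

int main(void)
{
    const int max = 100;
    srand((unsigned)time(NULL));
    for (int i = 0; i < 10; i++) {
        int a, b, c;
        switch (rand() % 4) {
        case 0:                              /* equilateral */
            a = b = c = rnd(max);
            break;
        case 1:                              /* isosceles */
            a = b = 1 + rnd(max - 1);        /* at least 2 */
            do { c = rnd(max); } while (c == a || c >= a + b);
            break;
        case 2:                              /* scalene, valid triangle */
            do {
                a = rnd(max); b = rnd(max); c = rnd(max);
            } while (a == b || b == c || a == c ||
                     a + b <= c || a + c <= b || b + c <= a);
            break;
        default:                             /* not a triangle */
            do {
                a = rnd(max); b = rnd(max); c = rnd(max);
            } while (a + b > c && a + c > b && b + c > a);
            break;
        }
        printf("%d %d %d\n", a, b, c);
    }
    return 0;
}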
This strategy found 11 faults after 10 random triangles,
another reduction in effort.
The reduction of maximum side length in the second strat-
egy had been helpful, so the final strategy was to reduce
the side length to 10 for the random triangles. This strat-
egy found one more fault, but 67 triangles were needed to
detect the last fault.
3.4 Comparison to Myers
As a comparison, we also generated test data based on Myers[2].
The following table gives the results for each faulty version
against each testing strategy. R100 and R10 are the random number
triples, T100 and T10 are the random triangles, and M is the data
generated from Myers. Except for Myers, 50 data sets were generated
and tried; the value in the table indicates how many of those data
sets triggered the fault. The single Myers data set contained 29
triples.
Fault Class vs Strategy
Code      R100  R10  T100  T10  M
231 v1       0   44    50   50  1
231 v2      44   50    50   47  1
231 v3      49   49    50   49  1
231 v4      50   50    50   50  1
232 v1       0    0     0   36  1
3128 v1     22   48    50   46  1
3141 v1      0    0     0    0  1
3141 v2      0    0     0    0  1
3142 v1      0    0     0    0  1
3143 v1      0    0     0    0  1
3143 v2      0    0     0    0  0
32221 v1    50   50    50   50  1
32221 v2    50   49    49   50  1
32221 v3    49   50    50   50  1
32221 v4    49   50    50   50  1
32222 v1    22   47    50   47  1
32223 v1    50   50    50   50  1
The faults in versions 231 v1 and 232 v1 show a fundamental weakness
of random testing: a fault may require a relationship among several
data components that is unlikely to be generated at random. An
equilateral triangle is required to trigger 231 v1, which had a
chance of about 1 in 10,000 under R100. The fault in 232 v1 required
a right triangle to trigger, another low-likelihood event.
3.5 Summary
The triangle module is useful for demonstrating how PEST
can be used to investigate testing techniques, but is not very
useful for an actual investigation. A more typical module
(although still small at 500 lines and 300 statements) is the
linker module. It is a subsystem of an automatic makefile
generator.
From the description of a set of object files, the
linker module identifies for each main procedure
the subset of object files that unambiguously de-
fine procedures referenced (by an unbroken chain
of references back to the main procedure's ob-
ject file) from within this subset of object files.
The linker module also identifies procedures ref-
erenced within this subset of object files but de-
fined more than once or not at all.
Other modules in development include a text editor, a
finance application, a simulator for a simple CPU and com-
ponents from game software.
4 Contributions
There are a number of areas where contributions to PEST
would be useful.
1. An improved taxonomy of software faults that also considers
object-oriented programming would be helpful.
2. Additional modules, especially modules written in Java, would
allow PEST to be used for object-oriented program testing.
Modules need to strike a balance between being
large enough to yield useful results and small enough
to be manageable. A module that takes several months
to test would not be very useful.
3. The metrics used in the triangle example are very sim-
ple. More sophisticated metrics that take into account
multiple modules would be very useful.
5 Conclusions
As PEST grows over time through contributions from the testing
community, it will become a valuable testing resource for
researchers, tool vendors, and educators.
The Information Technology Laboratory (ITL) at NIST
responds to industry and user needs for objective, neutral
tests for information technology. ITL works with indus-
try, research and government organizations to develop and
demonstrate tests, test methods, reference data, proof of con-
cept implementations and other infrastructural technologies.
Tools developed by ITL provide impartial ways of measur-
ing information technology products so that developers and
users can evaluate how products perform and assess their
quality based on objective criteria.
References
[1] B. Beizer. Software Testing Techniques. Van Nostrand
Reinhold International Company Limited, New York,
second edition, 1990.
[2] G. J. Myers. The Art of Software Testing. Wiley-
Interscience, New York, 1979.