Integrating a Behavior-Driven Development
Tool into Perl’s Testing Ecosystem
Peter Sergeant
Kellogg College
University of Oxford
Trinity Term 2016
A dissertation submitted for the
MSc in Software Engineering
Abstract
Cucumber is a suite of software written in Ruby for facilitating Behavior-Driven Devel-
opment by allowing testing fixtures and assertions to be organized into Features and
Scenarios, written in a subset of natural language. Cucumber has been ported to many
languages including Perl and Python.
This dissertation starts by examining and contrasting the architectures of the testing
ecosystems in Perl, Python, and Ruby – from creating basic test assertions to produc-
ing parse-able test-run summaries – and in particular the differences in approach for
facilitating interoperability between testing libraries.
Cucumber-style testing is investigated through this lens – individual features such as
tags, step definitions and command-line tooling are explained and linked back to a more
general hierarchy suggested in the first section.
Finally, the design and implementation of Test::BDD::Cucumber — which shares primary
authorship with this dissertation — is detailed, along with reflection on that design and
implementation.
Acknowledgements
Thank-you to my tutor — Professor Jeremy Gibbons — for keeping the faith for two years,
and for the occasional 24 hour turn-around of drafts. Additionally to my long-suffering
wife who has many times found herself left alone to explore during our holidays while I sat
in hotel rooms finishing this document, and whose own academic achievements inspired
me to undertake this MSc. Finally to the other contributors to the Test::BDD::Cucumber
project, especially Erik Huelsmann who has contributed both code and gentle pressure
to improve the code base.
Contents
1 Introduction 1
1.1 Motivation of this dissertation . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Cucumber and the Platform . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Objectives and Expected Contribution . . . . . . . . . . . . . . . . . . . 2
2 A Model for Testing in Perl, Python, and Ruby 3
2.1 The Anatomy of a Simple Test Assertion . . . . . . . . . . . . . . . . . . 3
2.1.1 Perl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1.2 Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1.3 Ruby . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.1.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 Creating Extended Assertions . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2.1 Perl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2.2 Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2.3 Ruby . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.3 A Model For Test Suites and Test Harnesses . . . . . . . . . . . . . . . . 17
2.3.1 Predicates and Test Assertions . . . . . . . . . . . . . . . . . . . 17
2.3.2 Sequencing Test-Assertions and Control Flow . . . . . . . . . . . 18
2.3.3 Modeling Meta Test-Assertion Control Flow . . . . . . . . . . . . 20
2.4 Decisions for Test Library Implementors . . . . . . . . . . . . . . . . . . 21
3 The Cucumber Model 23
3.1 A very High-Level Overview of Cucumber . . . . . . . . . . . . . . . . . 23
3.2 Organization of Assertions . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.2.1 Steps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.2.2 Scenarios . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2.3 Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.2.4 Tags . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.3 Test Data and Fixtures . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.3.1 Parameterizable Step Definitions . . . . . . . . . . . . . . . . . . 27
3.3.2 Step Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.3.3 Outlines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.3.4 Background Sections as Transformers . . . . . . . . . . . . . . . . 28
3.4 Reporting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.5 Implementation Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.5.1 Integrating with Test Assertion Providers . . . . . . . . . . . . . 31
3.5.2 World . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.5.3 Running the Test Suite . . . . . . . . . . . . . . . . . . . . . . . . 32
4 Implementing Perl’s Test::BDD::Cucumber 33
4.1 An Exculpatory Note on the Code Ahead . . . . . . . . . . . . . . . . . 33
4.2 Step Definitions with Test::Builder . . . . . . . . . . . . . . . . . . . . . 33
4.2.1 Why Integrate with Test::Builder? . . . . . . . . . . . . . . . . . 33
4.2.2 A Meta Test-Assertion for Step Definitions . . . . . . . . . . . . . 34
4.3 Handling Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.3.1 What’s Needed . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.3.2 Foldable Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.3.3 Control Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.4 Data Provision, Fixtures, and the Environment . . . . . . . . . . . . . . 38
4.4.1 Test::BDD::Cucumber::StepContext . . . . . . . . . . . . . . . . . 39
4.4.2 The Stash . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.4.3 Easy Access . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.5 Output Harnesses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.5.1 Test::BDD::Cucumber::Harness::TestBuilder . . . . . . . . . . . . 40
5 Reflection 42
5.1 Comparing Perl, Python and Ruby . . . . . . . . . . . . . . . . . . . . . 42
5.2 The Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
5.2.1 The Choice of Haskell . . . . . . . . . . . . . . . . . . . . . . . . 43
5.2.2 Generating Formatted Reports . . . . . . . . . . . . . . . . . . . 44
5.2.3 The Extension Model . . . . . . . . . . . . . . . . . . . . . . . . 44
5.2.4 Further Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.3 Test::BDD::Cucumber . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.3.1 A Brief History . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.3.2 Further Work Planned . . . . . . . . . . . . . . . . . . . . . . . . 45
5.3.3 Reflections on the Development Process . . . . . . . . . . . . . . 46
5.4 Summary of Work Complete . . . . . . . . . . . . . . . . . . . . . . . . . 46
A A Simple Haskell Testing Library 47
A.1 Completing the Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
A.1.1 A Monadic ResultEnv . . . . . . . . . . . . . . . . . . . . . . . . 47
A.1.2 The Test Harness . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
A.2 Adding Assertions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
A.3 Outputting TAP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Bibliography 53
1 Introduction
This dissertation will explore the Behavior-Driven Development tool called Cucumber,
and examine the challenges and considerations experienced when writing an entirely new
implementation of it in Perl, tightly integrated with Perl’s extensive testing ecosystem.
The implementation whose development is explored (Test::BDD::Cucumber) was devel-
oped during attendance of the MSc in Software Engineering, and is currently being used in
a range of applications, from testing warehouse automation robots for NET-A-PORTER,
to coordinating hard-drive tests for Seagate, to providing a basis for a major open-source
ERP’s acceptance tests.
1.1 Motivation of this dissertation
Cucumber describes a suite of Ruby software which allows software tests to be defined
in a subset of natural language, and then allows those natural-language tests to be ex-
ecuted. While Cucumber refers to the specific Ruby implementation for running tests,
and Gherkin refers to the natural language subset used, common usage favors the word
Cucumber or the phrase “Cucumber-style testing” to describe any testing performed in
this style.
This dissertation will use the term Cucumber to describe the general style of testing sug-
gested by Cucumber, RCucumber when talking about the specific Ruby implementation
of Cucumber, and Test::BDD::Cucumber when talking about the Perl implementation
whose design and implementation decisions form a part of this dissertation.
Cucumber promises that it will “encourage closer collaboration, helping teams keep the
business goal in mind at all times”1. It achieves this by defining an extensible natural
language subset to organize software tests. An example of a test case used later in this
dissertation is:
Scenario: Combining Key Presses on the Display
When I press 1
And I press 2
Then the display should show 12
Much of the value of tests written using Cucumber comes from the act of collaboratively
creating descriptions of expected behavior (Wynne and Hellesøy 2012). Producing a
natural language description of a feature which a product manager agrees is a faithful
description, a developer believes has enough information to be used as the starting point
for development, and that a tester believes forms a good basis for a test helps to make
sure the test strikes the right balance between describing what is being tested, and how
it’s being tested.

1. https://cucumber.io
1.2 Cucumber and the Platform
Perl already has an especially well-developed testing ecosystem. Perl has literally thou-
sands of test-related software packages on the Comprehensive Perl Archive Network
(CPAN)2, the vast majority of which inter-operate nicely through the Test Anything
Protocol (TAP). These packages cover almost every conceivable testing paradigm, from
unit-testing to automated specification-based testing that mirrors Haskell’s QuickCheck
package.
The original Cucumber suite targeted Ruby, and Ruby’s testing ecosystem has some
interesting and significant differences from Perl’s. In order to create a well-integrated
implementation of Cucumber for Perl, these differences and their implications should be
considered.
1.3 Objectives and Expected Contribution
In Chapter 2, differences between Perl and Ruby’s testing ecosystems and architecture
are considered, as is the ecosystem and architecture of a similar popular language, Python.
A general model for describing the differences — specifically in composition and collation
of test assertions — is shown, and a set of considerations for implementers of testing
libraries is suggested.
Chapter 3 then examines RCucumber through that model and the set of considerations
from Chapter 2.
The lessons from this are used to illustrate the reasoning behind design decisions taken
during the development of Test::BDD::Cucumber in Chapter 4, and some of the more
interesting aspects of the implementation of it are illustrated.
Finally, the general applicability and utility of the model, and the use of Test::BDD::Cucumber
since its development and release is considered and reflected upon in Chapter 5.
2. https://metacpan.org
2 A Model for Testing in Perl, Python, and Ruby
A programmer moving between Perl, Python, and Ruby is unlikely to run into too many
conceptual challenges. There’s new syntax to learn, and there are a few wrinkles: someone
new to Perl will have to get used to adding strange characters to the beginning of their
variable names, someone new to Python will need to study the scoping rules (Ascher
and Lutz 1999), and someone new to Ruby will probably spend some time trying to
understand monkey patching and eigenclasses (Flanagan and Matsumoto 2008), but
the similarities vastly outweigh the differences.
One interesting variation between the three languages comes from their automated
software-testing ecosystems. None has built-in primitives for software testing in the
language itself1, but each provides at least one library in the core distribution for
performing automated software testing, and each has a rich set of externally provided
libraries.
However, the approaches taken vary not just in implementation but the philosophy of how
testing should be approached. There’s variance inside each language’s suite of approaches
too.
Comparisons of the approaches taken are hard to find online (an early draft of this article
published on a blog leads Google’s results), so this chapter dives into the differences in
some detail.
This is achieved by examining the approach taken at various levels of testing:
• The anatomy of a simple test assertion in each language
• Extending assertions to provide improved diagnostics
• Reporting the status of a test script to users or computers, and a general model for
assertions
The lessons are then summarized, and a range of considerations for implementing software
testing libraries are described, for use in the subsequent chapter on RCucumber.
2.1 The Anatomy of a Simple Test Assertion
The term assertion in a computing context has a pedigree stretching back to Turing
(Turing 1949), Goldstine and von Neumann (Goldstine and Von Neumann 1948). An
assertion consists at least of “Boolean-valued expressions … to characterize invalid pro-
gram execution states” and a “runtime response that is invoked if the logical expression
is violated upon evaluation” (Clarke and Rosenblum 2006).
1. Although Python has built-in assertions.
Finding a good definition for a test assertion is a little more challenging. Wikipedia2 has
a plausible definition of a test assertion as being “an assertion, along with some form of
identifier” albeit with uninspiring references to back it up.
Kent Beck’s seminal “Simple Smalltalk Testing: With Patterns” (Beck 1994) which begat
SUnit3, which begat xUnit4, doesn’t mention assertions at all. Instead, he talks about
Test Case methods on a testing object which raise catchable exceptions via the testing
object.
“Unit Test Frameworks” (Hamill 2004), which covers xUnit-style testing but with a focus
on Java and JUnit uses the term “test assert” to describe this:
Test conditions are checked with test assert methods. Test asserts result in
test success or failure. If the assert fails, the test method returns immediately.
If it succeeds, the execution of the test method continues.
In both cases, the result of assertion failure is the raising of an exception, although this
need not be the case:
some assertion capabilities abort immediately, some report the violation and
then continue execution, and some either abort or continue based on the type
of the assertion (Clarke and Rosenblum 2006)
Essentially then, failed assertions can be treated as reportable events or exceptional events.
One might speculate that these differences in approaches to assertions themselves — re-
portable vs exceptional — might be expressed in differences between language approaches
to test assertion, and start to consider how this might in turn affect suggested practices
for test design.
2.1.1 Perl
By convention, test assertions written in Perl are reportable. On success or failure they
emit the string ok # or not ok # (where # is a test number) to STDOUT and this output
is intended to be easily aggregated and parsed by the software running the test cases.
These basic strings form the basis of TAP — the Test Anything Protocol. TAP is “a sim-
ple text-based interface between testing modules in a test harness”5, and both consumers
and emitters of it exist for many programming languages (including Python and Ruby).
The simplest possible valid Perl test assertion then is:
printf( "%s %d - %s\n",
    ( $test_condition ? 'ok' : 'not ok' ),
    $test_count++,
    "Test description"
);
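A full TAP stream as consumed by a harness also carries a plan line declaring how many assertions to expect; a minimal hand-written stream for two assertions might look like the following (illustrative only):

1..2
ok 1 - first assertion holds
not ok 2 - second assertion holds

The plan may appear up front, or — when a library such as Test::More is told the test is complete via done_testing() — after the final assertion.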
Test::More — a very popular testing library that introduces basic test assertions and is
used by about 80% of all released Perl modules6 — describes its purpose as:
to print out either “ok #” or “not ok #” depending on if a given [test assertion]
succeeded or failed

2. https://en.wikipedia.org/wiki/Test_assertion
3. https://en.wikipedia.org/wiki/SUnit
4. https://en.wikipedia.org/wiki/XUnit
5. https://testanything.org/
6. https://en.wikipedia.org/wiki/Test::More
Test::More provides ok() as its basic unit of assertion:
ok( $test_condition, "Test description" );
and while one could use ok() to test string equality:
ok( $actual eq $expected, "String matches $expected" );
Test::More builds on ok() to provide tests which “produce better diagnostics on failure
… [and that] know what the test was and why it failed”. is() can be used to rewrite the
assertion:
is( $actual, $expected, "String matches $expected" );
and returns detailed diagnostic output, also in TAP format:
not ok 1 - String matches Bar
#   Failed test 'String matches Bar'
#   at -e line 1.
#          got: 'Foo'
#     expected: 'Bar'
Test::More ships with a variety of these extended equality assertions.
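Among these are like(), which checks a value against a regular expression, and is_deeply(), which recursively compares data structures; an illustrative sketch (the variable names here are invented for the example):

like( $actual, qr/^Bar/, "String starts with Bar" );
is_deeply( \%got_config, \%expected_config, "Configuration matches" );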
However, the Test::More documentation has not been entirely honest with us. A quick
look at its source code7 shows nary a print statement, but instead a reliance on what
appears to be a singleton object of the class Test::Builder, to which ok() and other test
assertions delegate. Test::Builder is described as8:
a single, unified backend for any test library to use. This means two test
libraries which both use Test::Builder can be used together in the same pro-
gram
Which is an intriguing claim to be re-examined when looking at how assertions are ex-
tended. The following code:
ok( 0, "Should be 1" );
ok( 1, "Should also be 1" );
will correctly flag that the first assertion failed, but then will continue to test the second
one.
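Run under Test::More, those two assertions emit TAP along the following lines (the file name and line number reported will vary with the script containing them):

not ok 1 - Should be 1
#   Failed test 'Should be 1'
#   at example.t line 3.
ok 2 - Should also be 1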
2.1.2 Python
Python provides a built-in assert statement whose arguments closely resemble the afore-
mentioned “assertion, along with some form of identifier”:
assert test_condition, "Test description"
When an assertion is false, an exception of type AssertionError is raised associated with
the string Test description. Uncaught, this will cause the Python test to exit with a
non-zero status and a stack-trace printed to STDERR, like any other exception.
7. https://metacpan.org/source/EXODIST/Test-Simple-1.302047/lib/Test/More.pm
8. https://metacpan.org/pod/Test::Builder
AssertionError has taken on a special signifying value for Python testing tools — vir-
tually all of them will provide test assertions which raise AssertionErrors, and will
attempt to catch AssertionErrors having evaluated an assertion, transmuting them to
failures rather than exceptions. The unittest documentation provides9 a clear description
of this distinction:
If the test fails, an exception will be raised, and unittest will identify the test
case as a failure. Any other exceptions will be treated as errors. This helps
you identify where the problem is: failures are caused by incorrect results —
a 5 where you expected a 6. Errors are caused by incorrect code — e.g., a
TypeError caused by an incorrect function call.
Which is foreshadowed by Kent Beck’s (Beck 1994) explanation:
A failure is an anticipated problem. When you write tests, you check for
expected results. If you get a different answer, that is a failure. An error is
more catastrophic, an error condition you didn’t check for.
Like Test::More’s ok(), there’s no default useful debugging information provided by
assert other than what you provide. One can explicitly add it to the assert statement’s
name, which is evaluated at run-time:
assert expected == actual, "%s != %s" % (repr(expected), repr(actual))
However, evidence that assert was not really meant for software testing starts to emerge
as one digs deeper. assert statements are stripped out when running the code in
production mode, and there’s no built-in mechanism for seeing how many times assert
was called — code runs that are devoid of any assert statement are indistinguishable
from code in which every assert statement passed. This makes it, by itself, an
unsuitable building block for testing tools, despite its initial promise — an assertion
borne out by the approaches taken by Python’s testing libraries:
PyTest — a “mature full-featured Python testing tool”10 — deals with this by essentially
turning assert into a macro. Testing libraries imported by code running under PyTest
will have their assert statements rewritten on the fly to produce testing code capable of
useful diagnostics and instrumentation. Running:
assert expected == actual, "Test description"
using python directly gives us a simple error message:
Traceback (most recent call last):
  File "assert_test.py", line 4, in <module>
    assert expected == actual, "Test description"
AssertionError: Test description
where running the same code under py.test gives us proper diagnostic output:
======================= ERRORS ========================
___________ ERROR collecting assert_test.py ___________
assert_test.py:4: in <module>
    assert expected == actual, "Test description"
E   AssertionError: Test description
E   assert 'Foo' == 'Bar'
=============== 1 error in 0.01 seconds ===============

9. https://docs.python.org/2/library/unittest.html
10. http://www.pytest.org
unittest, an SUnit descendant that’s bundled as part of Python’s core library, provides
its own functions that directly raise AssertionErrors. The basic unit is assertTrue:
self.assertTrue( testCondition )
unittest’s test assertions are meant to be run inside xUnit-style Test Cases which are run
inside try/except blocks which will catch AssertionErrors. Amongst others, unittest
provides an assertEqual:
self.assertEqual( expected, actual )
However, the test name has had to be removed in order to get a useful diagnostic message
(this default functionality can be changed):
Traceback (most recent call last):
  File "test_fizzbuzz.py", line 11, in test_basics
    self.assertEqual( expected, actual )
AssertionError: 'Foo' != 'Bar'
Much as Perl unifies around TAP and Test::Builder, Python’s test tools unify around
the raising of AssertionErrors. This means they raise exceptional test assertions, rather
than reportable ones. A practical difference between these approaches is that a single
test assertion failure in Python will cause other test assertions in the same try/except
scope not to be run — and in fact, not even acknowledged. Running:
assert 0, "Should be 1"
assert 1, "Should also be 1"
gives us no information at all about the second assertion. This is consistent with Hamill
(Hamill 2004):
Since a test method should only test a single behavior, in most cases, each
test method should only contain one test assert.
2.1.3 Ruby
Ruby’s testing libraries have no unification point, other than that each uses exceptional
rather than reporting test assertions.
This can perhaps best be illustrated by examining a library like wrong11, which aims to
provide test assertions to be used inside tests targeting the three most common testing
libraries: RSpec12, minitest13 and Test::Unit14.
wrong provides only one main test assertion, assert, which accepts a block expected
to evaluate to true; if it doesn’t, a Wrong::Assert::AssertionFailedError exception is
raised. This is a subclass of RuntimeError, which is “raised when an invalid operation is
attempted”15, which in turn is a subclass of StandardError, which in turn is a subclass
of Exception, the base exception class.
11. https://github.com/sconover/wrong — wrong is not a particularly popular testing library, but will serve a purpose in showing how the popular ones work.
12. http://rspec.info/
13. https://github.com/seattlerb/minitest
14. https://github.com/test-unit/test-unit
15. http://ruby-doc.org/core-2.1.5/RuntimeError.html
Exception has some useful methods for an exception class — a way of getting a stack trace
(backtrace), a descriptive message (message), and a way of contextually re-raising excep-
tions (cause). RuntimeError and StandardError don’t specialize the class in anything
but name, and thus simply provide meta-descriptions of exceptions.
Figure 2.1: Ruby testing library exception class hierarchy
Ruby requires one to raise exceptions that are descended from Exception16. Exceptions
that inherit from StandardError are intended to be ones that are to some degree expected,
and thus caught and acted upon. As a result, rescue — Ruby’s catch mechanism — will
catch these exceptions by default17.
Those that don’t inherit from StandardError have to be explicitly matched, either by
naming them directly in the rescue statement, or by specifying that you want to catch
absolutely any type of exception raised.
Thus wrong considers its exceptions to be expected errors that should be caught. How-
ever, wrong also knows how to raise exceptions for the three major/most-popular testing
libraries.
When being used with RSpec — “a behavior driven development framework”18
— wrong’s
16
https://robots.thoughtbot.com/rescue-standarderror-not-exception
17
http://ruby-doc.org/core-2.1.5/StandardError.html
18
http://rspec.info/
9
RSpec adapter will raise exceptions of type RSpec::Expectations::ExpectationNotMetError,
which inherit directly from Exception.
No structured diagnostic data is included in the exception — the diagnostics have been
serialized to a string by the time they’re raised and are used as the Exception’s msg
attribute. No additional helper methods are included directly in the exception class,
either.
When wrong is used with minitest — “a small and incredibly fast unit testing frame-
work”19 — it uses an adapter to raise instances of MiniTest::Assertion. Like RSpec’s
exception class, this also inherits directly from Exception, meaning it won’t get caught
by a default use of rescue. While it does include a couple of helper methods, these are
simply convenience methods for formatting the stack trace, and any diagnostic data is
only available serialized into a string.
Finally, wrong will raise Test::Unit::AssertionFailedError exceptions when used with
Test::Unit — a “unit testing framework … based on xUnit principles”20. These inherit
from StandardError, so are philosophically “expected” exceptions. Of more interest is
the availability of public attributes attached to these exceptions — expected and actual
values are retained, as is the programmer’s description of the exception. This allows for
some more exciting possibilities for those who might wish to extend it — for example
by adding diagnostic information showing a line-by-line diff, language localization, or
type-sensitive presentations.
In any case, as per Python’s use of raised exceptions to signal test failure, some form of
catcher is needed around any block containing these assertions. As raising an Exception
will terminate execution of the current scope until it’s caught or rescued, assertions after
the first failure in any given try/catch scope will not be run.
Ruby’s testing libraries’ assertions lack any form of built-in unification — as wrong shows,
to write software integrating with several of them requires you to write explicit adapters
for them. As there’s no built-in assertion exception class (like there is in Python), different
testing libraries have chosen different base classes to inherit from, and thus there’s no
general way of distinguishing a test failure raised via a test assertion from any other
runtime exception, short of having a hard-coded list of exception-class names
from various libraries.
2.1.4 Summary
The effects of these differences will become more apparent as the ways of extending the
test assertions provided are examined. In summary:
• Perl communicates the result of test assertions by pushing formatted text out to
a file-handle — perhaps appropriately for the Practical Extraction and Report
Language21. However, in practice this is managed by a singleton instance of
Test::Builder which the vast majority of Perl testing libraries make use of. Perl’s
test assertions report their results, rather than raising exceptions.
19. http://docs.seattlerb.org/minitest/
20. https://github.com/test-unit/test-unit
21. Even if the acronym has been retro-fitted.
• Python’s testing infrastructure appears to have evolved from overloading the use
of a more generic assert function which provides runtime assertions and raises
exceptions of the class AssertionError on failure. The use of a shared class be-
tween testing libraries for signaling test failure would seem to allow a fair amount
of interoperability. Python’s test assertions are then all exceptional, rather than
reporting.
• Ruby has several mutually incompatible libraries for performing testing, each of
which essentially raises a subclass of Exception on test failure, but each of which
has its own ideas about how that exception should look, leading to an ecosystem
where auxiliary testing modules need to have adapters to work with each specific
test system. Again, Ruby’s test assertions are exceptional, rather than reporting.
2.2 Creating Extended Assertions
The illustration of the design and architecture of each language’s testing tools continues
with examining how to add customized and extended test assertions for each language.
This section will start with the relatively straightforward task of checking a hash (Ruby
and Perl’s name for an associative array22) for the presence of a given key — with diag-
nostics returned to the user on failure — but will then complicate that task by examining
how one would go about generalizing the extended assertion to work across different
testing libraries.
2.2.1 Perl
Perl — like Python and Ruby — has a built-in feature for checking if a key exists in a
hash, which returns a true or false value: exists $hash{$key}. Perl’s basic test assertion
Test::More::ok(), accepts a boolean input, so the two can be simply combined:
ok( exists $hash{$key}, "Key $key exists in hash" );
Ideally one would make life easier for developers writing tests by showing more diagnostic
information to whomever is investigating the failed test assertion — perhaps a list of
available keys.
Perl’s test assertions report, rather than raise exceptions, which means they can addi-
tionally return a value to the caller. Test::More::ok() helpfully returns the value of the
predicate under test, and that can be used to decide whether to print diagnostics using
Test::More::diag():
unless ( ok( exists $hash{$key}, "Key $key exists in hash" ) ) {
    diag( "Hash contained keys: " . join ', ', sort keys %hash );
}
This prints the more helpful TAP output:
not ok 1 - Key Waldo exists in hash
#   Failed test 'Key Waldo exists in hash'
#   at waldo.t line 16.
# Hash contained keys: Bar, Baz, Foo
where diagnostics are separated from test results using a # — reflecting Perl’s commenting
style.

22. From the term hash table, which is evocative of an obvious implementation detail.
This is easily packaged up into a reusable function (which also returns the predicate
value):
sub test_hash_has_key {
    my ( $class, $hash, $key ) = @_;
    if ( ok( exists $hash->{$key}, "Key $key exists in hash" ) ) {
        return 1;
    } else {
        diag( "Hash contained keys: " . join ', ', sort keys %$hash );
        return 0;
    }
}
The use of the Test::More function ok() adds a layer of unneeded indirection; it’s possible
instead to talk directly to the Test::Builder singleton that Test::More and virtually every
other Perl testing library uses under the hood:
use base 'Test::Builder::Module';

sub test_hash_has_key {
    my ( $class, $hash, $key ) = @_;

    # Get the Test::Builder singleton
    my $builder = $class->builder;

    # Run the test, and save its pass/fail state in $status
    my $status = $builder->ok(
        exists $hash->{$key},
        "Key $key exists in hash"
    );

    # Print diagnostics if it failed
    unless ( $status ) {
        $builder->diag(
            "Hash contained keys: " . join ', ', sort keys %$hash );
    }

    # Pass back the test/fail status to the caller
    return $status;
}
The result is very simple code, but also very flexible code: it can be used almost anywhere
in the Perl testing ecosystem. It reports the status of the test assertion on both success
and failure, and adds diagnostics on failure, in a way that will integrate with no additional
work with test suites built on:
• Test::Class, Perl’s xUnit work-alike
• Test::Expectation, Perl’s RSpec clone
• Test::WWW::Mechanize, a testing library that drives a web browser
• Test::BDD::Cucumber, Perl’s Cucumber port (which will be examined in Chapter
4)
• And almost without exception, every other one of Perl’s many testing libraries
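To illustrate how such an extended assertion is consumed, the following sketch assumes the function above has been packaged into a hypothetical module named My::Test::Hash:

use Test::More;
use My::Test::Hash;    # hypothetical package providing test_hash_has_key()

my %config = ( Foo => 1, Bar => 2, Baz => 3 );

# Reports ok/not ok through the shared Test::Builder singleton
My::Test::Hash->test_hash_has_key( \%config, 'Waldo' );

done_testing();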
2.2.2 Python
Python has a simple and particularly readable structure for testing for key membership
of a dict (Python-esque for a hash). One can signal failure of a test assertion in Python
in a portable way that will be caught and understood by the majority of testing libraries;
simply raise an AssertionError:
if key not in d:
    raise AssertionError("Key %s does not exist in dict" % repr(key))
Extension with diagnostic information is slightly more complicated, as there’s no standard
way to do it across different libraries (or even with most libraries). One can manually
write to STDERR, and hope for the best:
if key not in d:
    keydesc = ", ".join(d.keys())
    sys.stderr.write("Dict contained keys: %s" % keydesc)
    raise AssertionError("Key %s does not exist in dict" % repr(key))
Text to STDERR is explicitly summarized when the test is run using PyTest, but other-
wise has nothing to distinguish it as relating to the test rather than any other expected
diagnostic output.
One could instead add the diagnostics to the message in the raised AssertionError.
At the point where the AssertionError is being raised, there’s already a problem, so
diagnostics are likely appropriate:
if key not in d:
    keydesc = ", ".join(d.keys())
    raise AssertionError(
        "Key %s does not exist in dict; extant keys: %s" %
        (repr(key), keydesc))
There are two problems with this approach:
Firstly, explicitly raising an AssertionError means that no actual test assertion is being
exercised — there is no positive assertion path. The parent library is unable to detect
that a test assertion has run, and so can’t keep statistics (such as counting the number
of assertions run or printing descriptions of successful assertions), can’t make use of any
special behavior encoded in its test assertions (such as coverage tracing), nor can it assign
benchmarking results to given assertions.
This first problem can be solved by moving the predicate itself (key not in d) into a
library-provided test assertion. But this has to be done separately and differently for
every testing library to be integrated with, much as wrong in the Ruby section did:
# unittest
self.assertTrue( key in d, "Key %s exists in dict" % repr(key) )
or
# PyTest
assert key in d, "Key %s exists in dict" % repr(key)
This degrades the apparent unity of Python testing libraries — code can only be re-used
between them if the positive assertion path is ignored. More on this momentarily.
The second problem concerns having conflated diagnostic information and the assertion
identifier. The earlier Wikipedia-derived definition for a test assertion — “an assertion,
along with some form of identifier” — hints that it would be useful to identify and track
assertions. Overloading the identifier to include diagnostics isn’t just philosophically
ugly on account of its mixing of concerns, it hampers the ability to reliably identify an
assertion. No longer can a continuous deployment tool running the tests keep statistics
on assertions that frequently fail, or accurately benchmark the amount of time taken to
get to an assertion over time.
One could attempt to solve both problems by relying on the knowledge that test assertions
from all libraries will raise the same kind of catchable exception. For example, by running
the appropriate test assertion, and intercepting the raised exception to add diagnostics:
try:
    msg = "Key %s exists in dict" % repr(key)
    if 'unittest_object' in vars():
        unittest_object.assertTrue( key in d, msg )
    else:
        assert key in d, msg
except AssertionError:
    keydesc = ", ".join(d.keys())
    sys.stderr.write("Dict contained keys: %s" % keydesc)
    raise
except:
    raise
The lack of a defined method for communicating diagnostics means there will always be
an unpalatable choice between pushing diagnostics as unstructured text to the potentially
noisy STDERR, or overloading the test assertion identifier.
xUnit-based testing libraries (like unittest) work around both issues by making the
smallest-reportable unit the Test Case (Hamill 2004) — a named block that can po-
tentially contain several test assertions — rather than the actual test assertions they
provide. The test assertions would be collated into one named meta test-assertion:
def test_waldo_found(self):
    d = self.fixture_dict

    # A real unittest assertion
    self.assertTrue( len(d) > 0, "dict has some items" )

    # Manually raising a failure
    if "Waldo" not in d:
        keydesc = ", ".join(d.keys())
        raise AssertionError(
            "Key 'Waldo' does not exist in dict; extant keys: %s" %
            keydesc)
If the method representing the Test Case is treated as a meta test-assertion for reporting —
rather than recording the status of the test assertions it’s made up of — then a positive
test assertion path is regained (the Test Case was run and did not fail), as is a test
assertion identity separate from diagnostics information (the method name of the Test
Case vs the message in the raised AssertionError).
Essentially, if:
• one trusts that the developer using the test assertion is treating it as a small part
of a bigger meta test-assertion; and
• the bigger meta test-assertion has a stable and high-quality identifier; and
• the bigger meta test-assertion not failing is recorded and treated as an assertive
success
then one can fall back to the already-seen solution of explicitly raising an AssertionError
and overloading its test name with diagnostics:
if key not in d:
    keydesc = ", ".join(d.keys())
    raise AssertionError(
        "Key %s does not exist in dict; extant keys: %s" %
        (repr(key), keydesc))
Python presents a unified mechanism for representing test assertion failure23, but there
is no unified mechanism for representing assertion success, and thus no mechanism for
specifying more generally that a test assertion took place. Lack of a specific diagnostic
channel for assertions to use means the author of an extended diagnostic test assertion
will need to think carefully about how to provide this information.
In practice though, the majority of Python testing is done using unittest (or tools based
on it) — which uses the Test Case pattern above — or via PyTest24 — whose default is
to look for testing classes with test_ methods, and thus also implement the Test Case
pattern. Assuming one sticks to tools using this pattern, the pitfalls and distinctions
above are — literally — academic only, and only of interest to those trying to understand
the implementation details.
2.2.3 Ruby
Unlike either Perl or Python, Ruby’s testing tools have not coalesced around any shared
approach. The approach of wrong has been examined in Section 2.1.3 — a library with
adapters that allow a given test assertion function to raise an exception of the appropriate
type for the testing library being used.25
To then compare Ruby to Perl and Python by developing a specific extended diagnostic
assertion seems unrewarding. However, both wrong and Test::Unit have interesting takes
on how their assertion diagnostics are raised, so this section will look at them in more
detail.

23. Or, more accurately, Python’s testing tools have unified around treating the built-in AssertionError exception as such.
24. Or both together.
25. Entirely anecdotally — while researching this topic — there seems to be a prevalent sentiment that people only use RSpec or minitest, or derived tools, and that once one had settled on one, one was expected to work inside the ecosystem of that particular tool only.
Specifically not mentioned here are minitest, which takes the same approach as Python’s
unittest, and RSpec, a Behavior Driven Development tool which will be looked at in a
little more detail at the same time as Cucumber.
wrong
Every other test assertion library looked at so far provides a method for asserting truth,
a method for asserting equality with some diagnostic capabilities, and a set of other
extended diagnostic test assertions.
wrong provides only a single method — assert {block} — which accepts a block of code
expected to be a predicate expression. When the block evaluates to true, the code moves
on. When the block evaluates to false, a more in-depth process is kicked off.
The assert method determines which file on the file-system it’s in, and what line number,
and that file is then opened and the block is located and statically parsed!26
The boolean-
returning expression in the block is then split into sub-expressions (if they exist), and the
boolean value of each is shown. For example, and from wrong’s documentation:
x = 7; y = 10; assert { x == 7 && y == 11 }
==>
Expected ((x == 7) and (y == 11)), but
    (x == 7) is true
        x is 7
    (y == 11) is false
        y is 10
wrong’s documentation explicitly discourages adding identifier names to test assertions
created with assert on the basis that the predicate itself should be sufficient documen-
tation:
if your assertion code isn’t self-explanatory, then that’s a hint that you might
need to do some refactoring until it is.
In the example above, x == 7 && y == 11 is expected to act both as the identifier and
the assertion.
On failure, and in raising its own exception class, wrong merges the stringified predi-
cate that acts as an identifier into its diagnostics for the failure. This approach extends
to its design of adapters for other exception classes too. While Test::Unit’s exception
class (examined next) supports a distinction between these, wrong assumes that every
exception class it has an adapter for will likewise accept only a single string
containing both diagnostics and assertion identifier.
Test::Unit
Test::Unit is an occasionally-bundled-with-Ruby27
xUnit-derivative, which provides an
assert_equal() test assertion. Like the other xUnit descendants (such as unittest above),
it requires test assertions to be used in named Test Case blocks, which it uses to identify
tests.

26. https://github.com/sconover/wrong
27. http://www.slideshare.net/kou/rubykaigi-2015
By default:
def test_simple
  actual = "Bar"
  assert_equal( "Foo", actual, "It's a Foo" )
end
will die, but will interestingly not conflate diagnostics and identifiers:
Failure: test_simple(TUSimple)
TU_Simple.rb:8:in `test_simple'
     5:
     6: def test_simple
     7:   actual = "Bar"
  => 8:   assert_equal("Foo", actual, "It's a Foo" )
     9: end
    10:
    11: end
It's a Foo
<"Foo"> expected but was
<"Bar">
The enclosing test case is what’s marked as failed (Failure: test_simple(TUSimple)),
and the test assertion’s name is presented separately (It’s a Foo) to the diagnostic
message (<"Foo"> expected but was <"Bar">) and stack trace.
Indeed, Test::Unit raises exceptions of the class Test::Unit::AssertionFailedError,
which has explicit attributes supporting an expected value, an actual value, and a mes-
sage, separately28.
This seems like a best of both worlds approach for test assertions that result in exceptions
— passing the diagnostic information back to the test harness distinct from both the test
assertion name, and distinct from the wider containing test name. Test::Unit — via
plugins — is able to support output from its tests that make use of this distinction,
including a TAP output module.
2.2.4 Summary
The differences between reporting test assertions and exceptional test assertions have
started to become more clear as the examination continues, and the concept of a meta
test-assertion (such as an xUnit-style Test Case) has been introduced.
Both reporting and exceptional test assertions seem to have advantages and disadvan-
tages.
Perl’s reporting-assertion-based approach means that a failed test assertion doesn’t derail
sibling assertions from being run — one can run a lengthy series of assertions in series, and
a single failing one near the beginning won’t stop further potentially useful diagnostics
on other facets of the code from being generated.
28. https://github.com/test-unit/test-unit/blob/master/lib/test/unit/assertion-failed-error.rb
A desire to see the outcome of several facets of the code under test may incentivize users
of exceptional test assertions to organize their tests into smaller units that individually
test these facets. This starts to resemble the xUnit ideal that “a test method should only
test one behavior … when there is more than one condition to test, then a test fixture
should be set up, and each condition placed in a separate test method” (Hamill 2004).
This gentle pressure from the tooling to design tests around small units is not there in
Perl (and presumably other languages using reporting-assertion-based approaches), and
— anecdotally — this can often lead to tests in Perl being written in a long, meandering
style that mixes test assertions directly into fixture code with no clear separation.
In Ruby and Python, the combination of this pressure and the lack of a defined diagnostics
channel has apparently led to the xUnit style being the default — named blocks that
enclose a small number of assertions are considered to be the tests that are run, not the
individual assertions. In that context, these blocks are the smallest unit identified by test
harnesses, and the messages contained in raised exceptions are seen solely as diagnostic
information. These blocks have been referred to as meta test-assertions so far in this
chapter.
2.3 A Model For Test Suites and Test Harnesses
2.3.1 Predicates and Test Assertions
This chapter has so far described test assertions as operating on predicates – expressions
that will evaluate to true or false. In the examples seen so far, a False result can also
include diagnostic information.
data Result = Pass
            | Fail Diagnostics
            deriving (Show)

type Predicate e = e -> Result
Given this dissertation deals with dynamic languages, side-effects such as exceptions or
mutation of the environment may occur:
• Data set up to be operated upon by tests — fixtures — may be altered as part of
the evaluation of the predicate, as might other values in the environment
• The evaluation of the predicate itself may be unable to be completed, and a runtime
exception must be raised
The model then needs to be able to describe a result that passes or fails (Result), the
new environment that exists after evaluation, and whether or not that evaluation caused
an exception (Left e or Right e1):
data ResultEnv e e1 = ResultEnv Result (Either e e1)
                      deriving (Show)
The assertion itself can be modelled as a function that maps from one environment to
another, with a result:
type Assertion e e1 = e -> ResultEnv e e1
and this made into a test assertion with the addition of an identifier:
18
data TestAssertion e e1 = TestAssertion Identifier (Assertion e e1)
2.3.2 Sequencing Test-Assertions and Control Flow
A developer, a tester, or a continuous delivery tool is ultimately interested in test asser-
tions to signal whether a piece of software performs its tasks correctly, and perhaps what
remedial actions need to be taken.
Any indication that the software does not (Fail Diagnostics) or performs in a way
that was unexpected (a Left value being returned) is likely to lead the test interpreter to
conclude that the end state of the whole sequence of assertions is that of failure, and
take appropriate action. The result of previous evaluations needs to be considered in
subsequent ones, and thus a way of combining results is needed:
defResult = Pass

addResult (Fail d) (Fail d') = Fail $ mappend d d'
addResult (Fail d) _         = Fail d
addResult _        x         = x

instance Monoid Result where
    mempty  = defResult
    mappend = addResult
More generally, a way is also needed of providing the environment left by the last eval-
uation to the next one, and for stopping on failure — essentially a way of binding them
together:
instance Monad (ResultEnv e) where
    return e = ResultEnv Pass (Right e)
    (ResultEnv r (Left x))  >>= _ = ResultEnv r (Left x)
    (ResultEnv _ (Right x)) >>= f = ResultEnv r o
        where (ResultEnv r o) = f x
A failure is expected to be the zero value for combining results — any sequence of results
with a failure in it will be a failure. This behavior is implemented in all implementations
covered so far. However other behaviors related to collections of test assertions and
meta test-assertions differ both between testing libraries and indeed inside the libraries
themselves.
The following will be used to illustrate combination choices:
eg = ( idTestGroupX, [
         ( idTestGroupX1, [tX1a, tX1b] ),
         ( idTestGroupX2, [tX2a, tX2b] ),
         ( idTestGroupX3, [] ) ] )
Continuation after a failure
Two classes of test assertion have been seen so far — reportable test-assertions and excep-
tional test-assertions. Using reportable test-assertions like those built with Test::Builder’s
ok() method, a test assertion failing will not prevent subsequent sister test assertions from
running:
19
ok( 0, "This fails" );
ok( 1, "This is evaluated anyway" );
A test assertion built with unittest’s assertTrue will — by virtue of raising an ex-
ception (albeit of a special type) — prevent sibling test assertions from running:
self.assertTrue( False, "This fails" )
self.assertTrue( True, "This is never evaluated" )
If tX1a fails in the illustrative example, one might reasonably expect tX1b to only be run
if the test assertions are reportable. However, when test assertions are collated into meta
test-assertions provided by the testing library, behavior may change in this regard.
If Test Group X1 as a whole is marked as a failure (due to the failure of tX1a), whether or
not Test Group X2 is evaluated depends on the type of the meta test-assertion.
While the basic test assertions for unittest are exceptional, the Test Methods that they’re
collected into are reportable, as are the Test Classes that those are collected into.
This logic could be placed in assertions themselves, but this limits flexibility. Instead,
the model has a function that accepts a description of desired behavior (Exceptional or
Reportable) and adjusts results appropriately:
transmute :: AssertionType -> ResultEnv e e -> ResultEnv e e
-- For all AssertionTypes
--
-- Passes are passed through
transmute _ r@(ResultEnv Pass (Right x)) = r
-- Exceptions that are passes become failures
transmute _ r@(ResultEnv Pass (Left x)) =
    ResultEnv (Fail failedToCompile) (Left x)
-- Exceptional
--
-- A failure in Exception mode becomes an exception
transmute Exceptional (ResultEnv (Fail d) (Right x)) =
    ResultEnv (Fail d) (Left x)
-- An exception in Exceptional mode is passed through
transmute Exceptional r@(ResultEnv (Fail d) (Left x)) = r
-- Reportable
--
-- A failure in Reportable mode is passed through
transmute Reportable r@(ResultEnv (Fail d) (Right x)) = r
-- An exception in Reportable mode is passed through
transmute Reportable r@(ResultEnv (Fail d) (Left x)) = r
Catching Exceptions
Related to whether test assertions are reportable or exceptional — how should unexpected
exceptions be handled? Should an exception cause sibling test assertions to be skipped
and control handed back to the enclosing meta test-assertion? While the overall Result
shape of the enclosing meta test-assertion will not be changed, the environment may be
mutated further, and further diagnostics from failures and stack traces may be added.
For all libraries examined so far, exceptions at the level of test assertions will cause siblings
to be skipped. However, each library has at least one meta test-assertion construct that
will catch exceptions, and continue to run sibling meta test-assertions.
These meta test-assertions essentially treat exceptions as failures, a behavior which can
be added to the transmute function:
-- Catchable
--
-- A failure in Catchable mode is passed through
transmute Catchable r@(ResultEnv (Fail d) (Right x)) = r
-- An exception in Catchable mode is changed to a fail
transmute Catchable r@(ResultEnv (Fail d) (Left x)) =
    ResultEnv (Fail $ mappend d recovering) (Right x)
Assertion types are then:
data AssertionType = Exceptional | Reportable | Catchable
Empty Sequences
Another aspect to consider is the behavior of Test Group X3, an empty sequence of
assertions. Does Test Group X3 pass because no test assertion failures were recorded, or
does it fail as no test assertion successes were recorded?
Test::Builder’s reportable test-assertions require positive proof that tests passed, or the
tests are marked as failing:
# Subtest: Test Group X3
    1..0
    # No tests run!
not ok 1 - No tests run for subtest "Test Group X3"
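That output can be reproduced with a minimal script using Test::More's subtest meta test-assertion; a sketch:

use Test::More;

subtest "Test Group X3" => sub {
    # deliberately contains no assertions at all
};

done_testing();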
Where unittest’s exceptional test-assertions consider empty Test Methods and Test
Classes to be passing, due to the absence of failure.
This property of meta test-assertions will also need to be recorded in the model — a
function that accepts the desired behavior and returns a predicate yielding an appropriate
ResultEnv:
data EmptyBehavior = Succeeds | Fails
emptyAssertion :: EmptyBehavior -> e -> ResultEnv e e
emptyAssertion Succeeds e = ResultEnv Pass (Right e)
emptyAssertion Fails e = ResultEnv (Fail dEmpty) (Right e)
2.3.3 Modeling Meta Test-Assertion Control Flow
A generalized meta test-assertion model must allow the attributes in the previous section
to be recorded alongside a sequence of test assertions or meta test-assertions that comprise
it.
In the model the attributes are stored as fields in a Configuration data type:
data Configuration = Configuration {
    assertionType :: AssertionType,
    emptyBehavior :: EmptyBehavior
}
Collections of test assertions can then be described as meta test-assertions, with an
identifier and with the desired sequencing configuration:
data MetaTestAssertion e e1 =
      Single (TestAssertion e e1)
    | Sequence Configuration Identifier
               [MetaTestAssertion e e1]
MetaTestAssertion allows all test groupings seen so far to be modeled and their behavior
described — consider for example the xUnit groups as implemented by unittest:
A test method is a Python block containing a sequence of assertX test assertions; a
failure raises an exception, stopping further execution of test assertions in that block. An
empty block is a pass:
testMethod = Sequence Configuration {
    assertionType = Exceptional,
    emptyBehavior = Succeeds }
A test class contains many test methods, but if any fail then the test class should continue,
as it should in the case of an exception, making them Catchable:
testClass = Sequence Configuration {
    assertionType = Catchable,
    emptyBehavior = Succeeds }
A test suite container for test classes which requires that at least one test class exists
completes the xUnit hierarchy:
testSuite = Sequence Configuration {
    assertionType = Catchable,
    emptyBehavior = Fails }
and the illustrative example — now with information about sequencing embedded – can
be written as:
s = testSuite idTestSuite [
        testClass idTestClass [
            testMethod idTestMethodX1 [Single tX1a, Single tX1b],
            testMethod idTestMethodX2 [Single tX2a, Single tX2b],
            testMethod idTestMethodX3 [] ] ]
2.4 Decisions for Test Library Implementors
This chapter has considered a number of different facets of the Perl, Python and Ruby
testing infrastructures. These considerations are particularly relevant to those looking to
implement a testing library targeting one of those languages, and searching for lessons
from an existent one.
Integration with Existing Testing Infrastructure and Ecosystem
What level of integration with the existing testing infrastructure and ecosystem should
be aimed for, and what language-specific considerations are there?
A developer targeting Perl will need to be mindful of Perl’s reporting-based test assertions,
and will wish to fully exploit the existing Test::Builder infrastructure. A developer
targeting Python would want to make sure their library understood and reified exceptions
inheriting from AssertionError, and a Ruby developer might well wish to simply choose
a single existing Ruby library to build upon.
Meta Test-Assertions and Other Platform Norms
Consideration should also be given to the target platform’s existing ideas of meta test-
assertions — are there organizational structures and norms (such as Test::Builder’s sub-
tests) that should be respected and utilized so as to be optimally familiar to experienced
developers for that platform?
Are tests found and run according to certain conventions? For example, Ruby developers
may well expect their test suites to be runnable via a rake task, where Perl developers
would expect tests to be findable and runnable via an entry point in the ./t directory of
their project.
Creation and Organization of Test Data and Fixtures
Are there conventions for setting up and providing test data and fixtures to blocks con-
taining test assertions? Most xUnit descendants will have a Test Fixture class with setUp
and tearDown methods, and access to those via a Test Caller class (Hamill 2004). Is
there a testing context where this (and other test run contextual) data is held, and how
do assertions access it?
Reporting and Collation
How should success and failure of test assertions be captured and reported upon? For
example, a Perl developer would expect any form of test run against their code-base
(regardless of what language the test itself was implemented in) to output TAP, and
would expect the hosting continuous delivery tool either to understand TAP or to accept a
format to which TAP can easily be converted.
3 The Cucumber Model
This chapter will examine both the implementation of RCucumber, and the structure of
Gherkin, the language in which Cucumber tests are defined, and link both back to the
model constructed in Chapter 2. That examination and the lessons from it will be used
to explain choices made in the implementation of Test::BDD::Cucumber in Chapter 4.
In the last chapter, four major questions for designers of testing libraries were raised:
• What level of integration with the existing testing infrastructure and ecosystem of
the host language is provided?
• What meta test-assertions are provided or suggested, and how are the component
(single or meta) test assertions inside those selected and composed?
• What features are provided or suggested for creation and organization of test data
and fixtures?
• How are results reported upon for human and software audiences?
This chapter will examine those questions for the Cucumber implementation in Ruby
(RCucumber), the original and reference implementation.
3.1 A very High-Level Overview of Cucumber
Cucumber’s promise is that
it has been designed specifically to ensure the acceptance tests can easily be
read—and written—by anyone on the team. (Wynne and Hellesøy 2012).
Software features are described in a way that must satisfy the customer that the correct
behavior is being described, but also written sufficiently tersely and specifically that a
developer implementing test code believes they can literally organize their test assertions
around that description.
Consider the following Cucumber feature description of a simple hand-held calculator:
1 Feature: Basic Functionality
2
3 Background:
4 Given a fresh calculator
5 Then the display should be blank
6
7 Scenario: First Key Press on the Display
8 When I press 1
9 Then the display should show 1
10
11 Scenario: Combining Key Presses on the Display
12 When I press 1
13 And I press 2
14 Then the display should show 12
15
16 @addition
17 Scenario: Basic Addition
18 When I key
19 | 1 |
20 | + |
21 | 2 |
22 | = |
23 Then the display should show 3
24
25 @addition
26 Scenario Outline: Addition examples
27 When I press <left>
28 And I press +
29 Then the display should show <left>
30 When I press <right>
31 But I press =
32 Then the display should show <result>
33 Examples:
34 | left | right | result |
35 | 2 | 3 | 5 |
36 | 3 | 5 | 8 |
37 | 5 | 8 | 13 |
The description of the feature should serve as documentation of the intended behavior of
the software, and also as a meta test-assertion which can be run, and its results reported
upon and analyzed to verify the software behaves as expected.
The mechanism by which this is achieved – through the lens of the meta assertion model
— forms the basis of this chapter.
3.2 Organization of Assertions
3.2.1 Steps
A clue to where the test assertions live in the presented feature description is given by
the presence of the word “should”. The lines beginning with Given, When, and Then[1] are
called Steps and are mapped by RCucumber to tester-defined code blocks called step
definitions.
A mapping for Then the display should be blank might be:
Then('the display should be blank') do
expect( @calc.display ).to eql('')
end
which uses RSpec’s test assertion expect to build a test assertion.
[1] but also And and But, which are stand-ins for the conjunction starting the previous line
A step definition is a block of code that can include zero or more test assertions created with
a host testing library, with the addition of a parameterizable (see Section 3.3.1) lookup
key. A step is a line of text — or descriptive identifier — that can be mapped to that
step definition. A step, together with its accompanying step definition, forms the most basic
meta test-assertion in a Cucumber test-suite. The conjunction used (Given/When/Then)
serves only as a key to the lookup process itself — no semantic value is conveyed by it.
Given that the block referenced is written in Ruby, an imperative language with side-
effects, test assertions in the block are generally run in the order they’re written in, and
may mutate global state — this mutative property is examined in more detail in Section 2.3.1.
Ruby has exceptional test assertions, and RCucumber makes no effort to check the host-
ing test-library for evidence of positively run test assertions. This implies that (and is
implemented such that) an empty step definition, or one containing code but no assertions,
is considered a pass.
In fact, if the above mapping is rewritten to omit the test assertion:
Then('the display should be blank') do
end
then Cucumber’s output remains identical, and the test results are summarized as:
6 scenarios (6 undefined)
37 steps (25 undefined, 12 passed)
0m0.029s
This is subtly different from the case where a step definition is simply undefined. On
encountering a step that can’t be mapped to a step definition, Cucumber registers a
named type of failure, TODO, distinguished from regular failures only via presentation
(and thus forming part of the Diagnostics in the model).
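In the model’s terms, such an undefined step can be sketched as an ordinary failing assertion whose Diagnostics carry the distinguishing label — a sketch only, assuming (as the code in Appendix A does) that Diagnostics is string-like:
-- A stand-in assertion for a step with no matching step definition: a plain
-- Fail whose diagnostics carry the TODO marker, leaving the environment untouched.
undefinedStep :: e -> ResultEnv e e
undefinedStep e = ResultEnv (Fail "TODO: no matching step definition") (Right e)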
One final point: as Ruby’s test assertions are exceptional rather than reporting based,
any subsequent test assertions inside a step definition after a failing one aren’t run, and
are ignored. This is also true of test assertions which raise runtime exceptions. The
step definition as a meta test-assertion thus exhibits failure and exception behavior that can be
modeled as:
step = Sequence Configuration {
assertionType = Exceptional,
emptyBehavior = Succeeds }
Mapping a step to a step definition is slightly more involved, and examined in Section
3.3.
3.2.2 Scenarios
The next level up in the meta test-assertion hierarchy are Scenarios, such as those defined
on lines 7, 11, and 17. Scenarios have names, and are a meta test-assertion over steps.
If a step inside a scenario fails, sibling steps will not be executed — they are marked
by Cucumber as skipped, another type of failure distinguished only by presentation. A
scenario with no steps is considered to be passing. Therefore the definition is the same
as for step:
scenario = Sequence Configuration {
assertionType = Exceptional,
emptyBehavior = Succeeds }
Scenarios can be templated and parameterized, and the Background on line 3 of the example
above closely resembles a Scenario — these issues are all addressed in Section 3.3.
3.2.3 Features
Scenarios are combined into a file that defines a single feature. When a Feature contains
either a failing or an exception-raising Scenario, it continues to run its other Scenarios.
An empty feature is counted and reported-upon as a pass, so:
feature = Sequence Configuration {
assertionType = Catchable,
emptyBehavior = Succeeds }
Features exist individually as single .feature files on the file-system, generally in a hi-
erarchy under a single directory. Running cucumber on an entirely empty directory will
complain that certain helper directories it is expecting are missing, but as long as those
are there, then a directory simply without any .feature files is considered to pass:
directory = Sequence Configuration {
assertionType = Catchable,
emptyBehavior = Succeeds }
3.2.4 Tags
The example included also has a tag on lines 16 and 25: @addition. Tags are annotations
on scenarios and features that allow them to be filtered. For example, a developer may
have a directory full of features to implement, but only be interested in running and
reading reports on the pass or fail state of those she knows to be under active development.
Those features and scenarios can be annotated as — for example — work in progress
(@wip)[2], and RCucumber asked to run just those.
In the model detailed so far, these tags form part of the test assertion and meta test-
assertion identifiers. These tags are used to transform one meta test-assertion into one
with fewer enclosed meta test-assertions[3].
Performing this transformation based on a selection of desirable or undesirable tags requires
a function that, given a description of that selection and a meta test-assertion, returns a
value describing whether it should be kept:
type TagSpec = Identifier -> Bool
which can be fed into a filtering function:
[2] @wip itself has no special meaning to RCucumber, but has widespread conventional use in the Cucumber community as the primary tag for toggling whether a test should be run
[3] Data.Witherable would seem to be the closest Haskell description of this operation generally: https://hackage.haskell.org/package/witherable
mtaIdentifier :: MetaTestAssertion e e1 -> Identifier
mtaIdentifier (Single (TestAssertion i _)) = i
mtaIdentifier (Sequence _ i _) = i
select :: TagSpec -> MetaTestAssertion e e1 -> MetaTestAssertion e e1
select _ s@(Single t) = s
select _ s@(Sequence _ _ []) = s
select t (Sequence c i xs) =
Sequence c i $ filter (t . mtaIdentifier) xs
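As a usage sketch, a TagSpec selecting the @addition scenarios from the example feature might look like the following; hasTag is a hypothetical helper, since the model keeps Identifier opaque and does not prescribe how tag annotations are stored inside it:
-- Hypothetical: reads tag annotations out of an Identifier.
hasTag :: String -> Identifier -> Bool
hasTag = undefined
onlyAddition :: TagSpec
onlyAddition = hasTag "@addition"
-- Keep only the @addition scenarios directly enclosed by a feature:
additionOnly :: MetaTestAssertion e e1 -> MetaTestAssertion e e1
additionOnly = select onlyAddition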
3.3 Test Data and Fixtures
RCucumber supports and explicitly encourages (Wynne and Hellesøy 2012) re-use of step
definitions: the steps on lines 8, 12, and 13 all map to the same step definition. They are
not, however, necessarily the same step as their identifier may be altered via annotations
such as tags, and other fixtures described in this section.
These data-providing annotations are explicitly part of the identifier for a test, and thus
the identifiers described so far also constitute a type of fixture, of which the descriptive
name for a test forms part. An assertion with provided fixture data can thus be:
type AssertionWithFixture e e1 = Identifier -> e -> ResultEnv e e1
which can be initialized with the descriptive name and the rest of the fixture-constituting
identity:
initialize :: AssertionWithFixture e e1 -> Description ->
Identifier -> TestAssertion e e1
initialize a n f = TestAssertion i (a f)
where i = addDescription f n
3.3.1 Parameterizable Step Definitions
The most fundamental mechanism for providing data to test assertions run by Cucumber
is in steps that target parameterizable step definitions via regular expressions.
The sample feature (in Section 3.1) has steps:
Then the display should show 1
and
Then the display should show 12
which both target the same step definition:
Then(/^the display should show (\d+)$/) do |number|
expect( @calc.display ).to eql(number)
end
Although the matching and dispatch of the step definition occur at run time, the
definition of the step has occurred in the static feature description, so the data is effectively
compile-time, making it part of the fixture-encapsulating identifier.
In the model, step definitions that receive this data are then of type AssertionWithFixture,
and are turned into TestAssertions by the code that performs the lookup and matching:
match :: Regex -> AssertionWithFixture e e1 -> TestAssertion e e1
lookup :: (Monoid e1) => Regex -> [AssertionWithFixture e e1]
-> TestAssertion e e1
lookup r as = first $ fmap (match r) as
where first [] = notFoundAssertion
first ms = head ms
3.3.2 Step Data
Steps can also have structured data associated with them, as per line 18. This data can
either be an array, a hash, or a block of multi-line text — the example on line 18 shows
the array form. Associated step data is passed as the last argument to a step definition
being run:
When ("I key") do |data_table|
# ... do something with the data_table object
end
This fixture data is fixed at compile time (although the copy passed to the step definition
is mutable inside the step definition only), and it’s the responsibility of the feature parser
to ensure it is placed into the identifier.
3.3.3 Outlines
Line 26 contains a Scenario Outline, which bears a strong resemblance to a normal
scenario, only with arrow-bracket placeholders and a list of Examples at the end.
The pipe-delimited table in the Examples section is parsed, and the scenario is then
repeated for each data row in the table — the scenario shown becomes three scenarios,
run sequentially, and with the placeholders replaced with the data in the appropriate
column.
The data provided becomes both part of the step identifier and its name — it produces
scenarios equivalent to having simply repeated the Scenario Outline three times with the
data from each row.
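That expansion can be sketched in the model’s terms as follows; ExampleRow and substituteRow are illustrative assumptions, with substituteRow standing in for whatever rewrites the placeholders in a template scenario’s identifiers using one row of the Examples table:
-- One Examples row, as column-name/value pairs.
type ExampleRow = [(String, String)]
-- Hypothetical: rewrite the placeholders in a template scenario's identifiers
-- (and therefore its name and fixture data) using one row.
substituteRow :: ExampleRow -> MetaTestAssertion e e1 -> MetaTestAssertion e e1
substituteRow = undefined
-- A Scenario Outline is then the template repeated once per row.
expandOutline :: MetaTestAssertion e e1 -> [ExampleRow] -> [MetaTestAssertion e e1]
expandOutline template = map (`substituteRow` template)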
3.3.4 Background Sections as Transformers
The final mechanism to cover in terms of creating and organizing test data is the Background
section, as per line 3.
A single Background section is allowed per Feature, and it describes steps that must be
run at the beginning of every scenario, similar to xUnit’s concept of a setup method.
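In the model’s terms, a Background can be treated as a transformer that prepends its steps to every scenario in the feature — a sketch only, assuming both the Background’s steps and the scenario are expressed as the model’s Sequence values:
-- Prepend the Background's steps to a scenario's own steps; single test
-- assertions are left untouched. Purely illustrative.
withBackground :: [MetaTestAssertion e e1] -> MetaTestAssertion e e1 -> MetaTestAssertion e e1
withBackground background (Sequence c i steps) = Sequence c i (background ++ steps)
withBackground _ other = other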
3.4 Reporting
RCucumber’s output uses color very effectively to illustrate test progression and result
statuses. It also has formatters for a number of other output formats, such as JSON.
Figure 3.1: Cucumber Colorized Output
An output format in RCucumber is a string, but in the model can be any type. To
model the generation of the output, a Report wraps the current state of output and the
ResultEnv:
data Report t e e1 = Report t (ResultEnv e e1)
Simple functor-like helpers can be implemented to make dealing with the enclosed
ResultEnv easier:
reportResultMap :: (ResultEnv e e1 -> ResultEnv e2 e3)
-> Report t e e1 -> Report t e2 e3
reportResultMap g (Report t f) = Report t (g f)
runReportAssertion :: (a -> ResultEnv e e1) ->
Report t e a -> Report t e e1
runReportAssertion a = reportResultMap (>>= a)
RCucumber’s reporting capabilities can be extended using a built-in formatting extension
mechanism, built around an event stream[4]. Developers can provide an instantiated object
to Cucumber to use as the formatter, which should implement methods that receive
information about the meta test-assertions currently being run.
[4] https://github.com/cucumber/cucumber/wiki/Custom-Formatters
Consider a simplified[5] selection of those run for the Scenario meta test-assertion:
before_scenario
tag_name
scenario_name
before_steps
...
after_steps
after_scenario
The events split into two types — those run before and after evaluation of the feature. The
further separation into individual events beyond that is simply a convenience mechanism:
specific parts of a formatter can be extended from a default implementation — changing
how tags are rendered, for example — without the subclass also needing to re-implement
the other before events. But a more general model that simply had before and after events
would be able to implement the events given if desired. An even more general model that
offloaded meta test-assertion discrimination to the formatter itself would also work.
The model generalizes this formatting extension concept even further to an Extension.
Extensions receive a Start or End Event, a MetaTestAssertion, and an incoming Report.
data Event = Start | End
type Extension t e = Event -> MetaTestAssertion e e ->
Report t e e -> Report t e e
Extensions can mutate the report on ingress or egress. Sequences of extensions are
composed, and applied to the Report:
applyExtensions :: Event -> [Extension t e] -> MetaTestAssertion e e ->
Report t e e -> Report t e e
applyExtensions event extensions metaTA = combined
where order Start = reverse extensions
order End = extensions
combined = foldr (.) id $
map (\x -> x event metaTA) (order event)
and the sequence of extensions passed to whichever function is coordinating the evaluation
of meta test-assertions. A very simple extension that shows a meta test-assertion
is starting or ending might be:
simple :: Extension String e
simple event mta (Report t e) =
Report (t ++ "[" ++ verb event ++ ": "
++ name mta ++ "]" ) e
where verb Start = "Starting"
verb End = "Ending"
Producing more complicated output and an implementation of a test runner that makes
use of extensions is discussed in Appendix A.
[5] before_scenario is in fact before_feature_element, and some extra tag-related events have been removed
3.5 Implementation Details
3.5.1 Integrating with Test Assertion Providers
By default, step definitions in RCucumber are written to use RSpec. RSpec provides a
rich set of exception-raising test assertions.
RCucumber makes no attempt to distinguish between exceptions raised by test assertions,
and those raised more generally. The step definition evaluation code is approximately[6]:
def execute(*args)
# Run the step definition code with any arguments
@block.call(*args)
# Instantiate a new Core::Test::Result::Passed object
passed
# Catch any exceptions that occurred in this scope
rescue Exception => exception
# Instantiate a new Core::Test::Result::Failed object
failed(exception)
end
All of the Ruby testing libraries covered create exceptional test assertions without any
common basis, which would make it very hard to inspect and meaningfully reason about
those exceptions without a per-library adapter.
At the same time, the approach of conflating exceptions raised by test assertions into
the same category as all exceptions allows for a great deal of flexibility — there is no
restriction on which testing libraries you can use with RCucumber, as long as the test
assertions are exceptional.
3.5.2 World
The recommended approach for integrating with non-RSpec test assertion providers is to
consume their methods into an instance of World.[7][8]
World is a blank class, an instance of which is instantiated before every scenario. The
step definition is then executed as if it were a method of the World class, using
instance_exec, a method Ruby provides on its base class.
Tests that create test data at runtime are able to assign it to instance variables of
the scenario’s encapsulating World instance, and thus share data between different steps in
a scenario.
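In the model’s vocabulary the World corresponds to the environment e threaded through each ResultEnv: data stored on the World by one step is simply part of the environment the next step receives. A sketch, using a hypothetical association-list World:
-- A hypothetical World: the shared, per-scenario environment.
type World = [(String, String)]
-- A step definition that records a value for later steps to read, passing as it does so.
rememberDisplay :: String -> World -> ResultEnv World World
rememberDisplay value world = ResultEnv Pass (Right (("display", value) : world))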
[6] Comments have been added. See https://goo.gl/jHjadr for the actual code on GitHub at time of writing
[7] https://github.com/cucumber/cucumber/wiki/Using-MiniTest
[8] https://github.com/cucumber/cucumber/wiki/Using-Test::Unit
3.5.3 Running the Test Suite
RCucumber ships with an executable, cucumber, which by default will search for a
features directory with a step_definitions sub-folder, and run the .feature files con-
tained inside.
The default output is highly colorized (as per the figure in Section 3.1), and provides a
very user-friendly output format, which is the recommended way of running the test suite
during development[9].
For developers using Rake — “a make-like build utility for Ruby”[10] — to test and build
their projects, an integration is provided. This allows for easy running of the RCucumber-based
tests alongside tests written with other tools.
[9] https://github.com/cucumber/cucumber/wiki/Using-Rake
[10] https://github.com/ruby/rake
4 Implementing Perl’s Test::BDD::Cucumber
The previous chapter examined RCucumber through the lens of the test-assertion model,
adding to the model as new concepts were encountered. RCucumber’s method for
integrating into Ruby’s testing libraries and more generally into Ruby projects was shown,
the meta test-assertions of steps, scenarios, and features were detailed, creation and
organization of test data and fixtures were explained in terms of the model, and collation
and reporting of results was described.
This chapter looks at the implementation details of the Perl implementation of Cucumber,
Test::BDD::Cucumber, and describes the reasoning in developing it, as well as relating it
back to the behavior of RCucumber and the test-assertion model.
4.1 An Exculpatory Note on the Code Ahead
Test::BDD::Cucumber is a relatively large code-base, with many parts written in a style
that Perl developers would describe as “high magic” — code that makes use of some of
Perl’s more powerful (and unusual) features. It exists primarily as a tool for professional
developers to use in their day-to-day work, and as such many aspects of the code-base
have been written with practicality as their primary concern. It also represents a work
that has changed over time, and vestigial stubs of previously implemented behaviors can
still be observed.
The code included in this chapter consists of simplified and re-ordered snippets, and an
effort has been made to make it as accessible as possible to
non-Perl programmers, while still illustrating the most important concepts. The entire
code-base in its original form is available on GitHub[1] and on the CPAN[2].
4.2 Step Definitions with Test::Builder
4.2.1 Why Integrate with Test::Builder?
As with virtually every Perl testing library, Test::BDD::Cucumber builds on top of
Test::Builder. Test::Builder provides a singleton to allow testing libraries a unified in-
terface to reporting the results of individual test assertions, creating meta test-assertions,
and a standard test harness interface which outputs TAP by default.
A developer experienced writing tests with Perl will probably have a good feel for how
Test::Builder-based tests will work and some favorite associated testing libraries that
[1] https://github.com/pjlsergeant/test-bdd-cucumber-perl
[2] https://metacpan.org/pod/Test::BDD::Cucumber
they like using — for example, Test::WWW::Mechanize for tests which interact with web
servers.
To leverage both developers’ existing experience and knowledge, and also the very wide
range of testing libraries available on CPAN, integrating as tightly as practicable with
Test::Builder was desirable.
However, Cucumber also has its own conventions to be followed: an existing set of meta
test-assertions, syntax-highlighted output for developers to quickly see the status of their
test implementations, and a JSON output understood by several tools.
Choices needed to be made concerning how much to compromise between providing a
familiar and easily integrated environment for Perl developers and hewing closely to
the original RCucumber.
4.2.2 A Meta Test-Assertion for Step Definitions
Subtests
Test::Builder has a generic subtest meta test-assertion which supports arbitrarily deep
trees of meta test-assertions. Subtests are introduced by passing a code-reference to
Test::Builder->subtest:
$Test->subtest( "Parent", sub { ok( 1, "Passing assertion" ) } );
which by itself will produce:
# Subtest: Parent
ok 1 - Passing assertion
1..1
ok 1 - Parent
1..1
This may appear initially to be a good candidate on which to have built the Cucumber
step-definition meta test-assertion. However, subtests have a fixed configuration, which
matches that of a test script written in Perl containing a list of test assertions:
subtest = Sequence Configuration {
assertionType = Reportable,
emptyBehavior = Fails }
This differs from Cucumber’s step definitions by treating empty blocks as failures, and
by the enclosed test assertions being reportable and not exceptional.
Treating empty blocks as failures is simply not compatible with the Cucumber model.
Code in step definitions is used to set up fixtures and test data with no requirement for
test assertions to be run. Some way of changing this behavior was required.
Test::Builder also outputs the result of reportable test assertions to STDOUT — as well as
any diagnostics — as it encounters them, by default. Should a colorized output similar to
RCucumber’s be required, this needs instead to be captured and passed to whatever code
is making formatting and reporting decisions. Subtests provide no inbuilt mechanism for
doing this.
Finally, subtests don’t provide any mechanism for either catching exceptions, or for
converting captured exceptions into failures. Subtests then were considered not to
be the solution without significant changes to their behavior in the development of
Test::BDD::Cucumber.
An Entirely New Testing Context
The Test::Builder documentation says:
you only run one test per program [which] shares such global information as
the test counter and where test output is going
This is suggestive that the Test::Builder singleton itself is a meta test-assertion. It
has predefined behavior for no tests having been run (failure), and operates over test
assertions implemented using its test methods.
Running each step definition against its own Test::Builder ‘singleton’ seemed to have
some benefits. “Where test output is going” is a function of the Test::Builder object,
and thus configurable. Test::Builder can be put in a passing state even when empty by
preceding any other test assertions with a call to its pass() method. The Test::Builder
singleton stores information about how many tests passed, failed, or were skipped, and
so the result of a single step definition can be queried and reported on programmatically.
This didn’t solve the issue with test assertions being reportable and not exceptional. But
that didn’t appear to be a key feature of Cucumber, and so the trade-off was made to
keep that in the Perl style of being reportable — this also meant that no extra magic
was required to change the behavior of Test::Builder-based libraries, which helped keep
the implementation simple.
All that was required was finding a way to spoof the Test::Builder singleton to a clean
instance before each step definition was run.
Spoofing the Test::Builder singleton
Creating a new clean Test::Builder object to use as the singleton was simple —
Test::Builder provides a method for just this:
1 my $proxy_object = Test::Builder->create();
Test::Builder also provides reports in TAP of test assertion executions. By default these
are written to STDOUT, but can be intercepted:
2 my $output;
3 $proxy_object->output( \$output ); # Equivalent to ‘Report‘
4 $proxy_object->failure_output( \$output ); # Equivalent to ‘Diagnostics‘
This Test::Builder instance also needs to pass by default, so a test assertion that’s guar-
anteed to pass is run first:
5 $proxy_object->ok( 1,
6 "Starting to execute step: " . $step_text );
Well-implemented libraries that integrate with Test::Builder will retrieve the single-
ton when needed by calling Test::Builder->new — a subroutine called new in the
Test::Builder name-space. Perl provides a keyword local that temporarily substitutes
the value of a global variable within the enclosing scope and within any code called from it.
By pointing Test::Builder::new[3] at a subroutine that instead returns the proxy
object, the aim of diverting calls to the Test::Builder singleton to the one used to
collect results from the step definition was achieved:
7 local *Test::Builder::new = sub { return $proxy_object };
The step definition could now be run:
8 # ‘eval‘ will catch any exceptions, and place their value in $@
9 eval { $step_definition->($context) };
and any exceptions caught turned into failures against the Test::Builder object with a
diagnostic description:
10 if ($@) {
11 # Fail a test assertion called ”Test compiled”
12 $proxy_object->ok( 0, ”Test compiled” );
13
14 # Add the exception details as diagnostics
15 $proxy_object->diag($@);
16 }
The result of the step definition can then be examined and encapsulated in order to pass
it on to the scenario meta test-assertion:
17 my $test_builder_status = $proxy_object->history;
18 my $cucumber_step_status =
19 $test_builder_status->test_was_successful ?
20 ( $test_builder_status->todo_count ? 'pending' : 'passing' ) :
21 'failing';
22
23 $result = Test::BDD::Cucumber::Model::Result->new({
24 result => $cucumber_step_status,
25 output => $output
26 });
4.3 Handling Results
4.3.1 What’s Needed
As foreshadowed by the appearance of a Result Model class in the preceding code section,
passing the results of steps to enclosing meta test-assertions required some considera-
tion. For every meta test-assertion considered in this dissertation (and in the model: see
Chapter 3), the failure of a constituent test assertion or meta test-assertion causes the
meta test-assertion to be marked as a failure. However, the status of preceding meta
test-assertions can also determine whether or not a meta test-assertion gets evaluated in
the first place.
Test::BDD::Cucumber needed a way of emulating the foldable results from meta test-
assertions described in the model, and allowing results from one meta test-assertion to
affect whether or not other meta test-assertions are run at all.
[3] The -> is a Perl-ism that passes the name-space to the left of it into the function (in that name-space) on the right of it, and is one of the quirks of Perl’s “bolted-on” object orientation implementation
4.3.2 Foldable Results
In the model, a Result is a set of Pass and Fail Diagnostics. Assuming that
Diagnostics describes a set with both an identity element and the ability to append
items from that set, then Result can also be a set with an identity element
Pass (defResult) and an appending function (addResult). More simply, assuming
Diagnostics is a monoid, then so is Result (these definitions are covered in Section
2.3.2).
Test::BDD::Cucumber::Model::Result attempts to model Result in Perl. It begins with
a list of possible states that a result can hold:
enum 'StepStatus', [qw( passing failing pending undefined )];
This contains two new statuses not seen yet: pending (equivalent to skipped) and
undefined (equivalent to TODO). These are types of failure with extra meta-information
added to them, which a sufficiently advanced Diagnostics data type could handle.
Diagnostics themselves are collated in an output string attribute.
The class contains a constructor to allow it to be instantiated from a sequence of existing
result objects:
1 sub from_children {
2 my $class = shift; # Perl OO boilerplate
3 my ( @children ) = @_; # The results from which to build this one
A list of statuses is built up in %results, which starts with a default passing state, and
a string in which to concatenate the diagnostics is declared:
4 my %results = ( passing => 1 );
5 my $output;
The result of each child result is added to %results, and any diagnostics it contains are
appended:
6 for my $child (@children) {
7 $results{ $child->result }++;
8 $output .= $child->output . "\n";
9 }
10 $output .= "\n";
Finally, the result types are evaluated in terms of precedence — the presence of a result
of that type causes the new class to be instantiated with that result:
11 for my $status (qw( failing undefined pending passing )) {
12 if ( $results{$status} ) {
13 return $class->new(
14 {
15 result => $status,
16 output => $output
17 }
18 );
19 }
20 }
4.3.3 Control Flow
The results of test assertions and meta test-assertions don’t simply affect the overall result,
they also affect whether or not other test assertions and meta test-assertions are run.
Given a result with an environment (ResultEnv e e1) and a new assertion which also produces
a result (e -> ResultEnv e e1), whether or not to continue depends on whether the environment
encapsulated in that result is in a normal state (Right e1) or an exceptional state (Left
e). The monad that implements this has already been covered in Section 2.3.2.
Feature directories and features both share the same configuration for sequentially eval-
uating meta test-assertions:
c = Configuration {
assertionType = Catchable,
emptyBehavior = Succeeds }
and this configuration is used to ensure that the result of evaluating a feature or scenario
can’t leave the result in an exceptional state (Left e) via transmute (see: Section 2.3.2).
As a result, Test::BDD::Cucumber doesn’t need to implement any special control flow
logic for these, simply the foldable result model already covered.
However, step-definition failures are treated as exceptional, so in running a scenario, step
definitions occurring after a failing one must not be run. A real Perl exception would be
tricky to handle, as reporting events should still be generated — it is simply the assertion that should
not be run.
While running a scenario, Test::BDD::Cucumber keeps track of whether the scenario has failed
using a boolean attribute on the scenario runtime called short_circuit. This is set
having examined the result:
# If it didn’t pass, short-circuit the rest
unless ( $result->result eq 'passing' ) {
$outline_stash->{'short_circuit'} = 1;
}
and is used to set the result of a step definition without running it in the step dispatch
code:
# Short-circuit rather than running if needs be
return $self->skip_step( $context, 'pending',
"Short-circuited from previous tests", 0 )
if $short_circuit;
4.4 Data Provision, Fixtures, and the Environment
Cucumber uses a World object to allow step definitions in a scenario to share data and
certain fixtures. “Compile-time” fixture data previously described as residing in the
identifier is passed in as arguments to that step definition. No method for introspecting
the step definition itself (or any of the enclosing meta test-assertion hierarchy) is provided.
4.4.1 Test::BDD::Cucumber::StepContext
Test::BDD::Cucumber takes a slightly different approach: “data made available to step
definitions” (Wynne and Hellesøy 2012) is provided via a Test::BDD::Cucumber::StepContext
object, passed as the first argument to a step definition.
The step context firstly contains references to allow for introspection. A link to the ob-
ject that defines the step definition is available via step(), to the object that defined the
enclosing scenario as scenario(), and to the object that defined the feature as feature()
— this also allows access to the tags defined for the scenario and feature. The conjunc-
tion used to declare the step (Given/When/Then) is available via the misleadingly named
verb(), and the rest of the step line as text().
Access to identity information is available via data() for data tables, and matches() for
data that was extracted from the step text as part of the step-definition lookup process.
4.4.2 The Stash
The context also provides access to a hash via stash(). This hash is meant to serve a
similar purpose to Cucumber’s World object — a place to store data created during the
test run.
The stash itself is a hash containing two hashes — feature and scenario:
$stash = {
feature => {},
scenario => {},
};
The scenario is reset at the start of every scenario, making it appropriate for storing
data created during a scenario run, whereas feature is reset at the beginning of every new
feature — should a feature require data to be persisted between scenarios (such as a
computationally expensive fixture) it can be stored here.
4.4.3 Easy Access
To prevent a developer from needing to write the relatively verbose:
sub {
# Read the context from the first argument
my $context = $_[0];
my $stash = $context->stash;
my $value = $stash->{'scenario'}->{'foo'};
to access stash variables, step definitions are able to access two globally defined methods in
the Test::BDD::Cucumber::StepFile class: S and C. Before the step definition is executed,
the definitions of these methods are changed to provide access to the stash and the step
context respectively:
local *Test::BDD::Cucumber::StepFile::S = sub {
return $context->stash->{'scenario'};
};
local *Test::BDD::Cucumber::StepFile::C = sub {
return $context;
};
allowing for a far more concise access method:
sub {
my $value = S->{'foo'};
Finally, Perl developers are used to accessing regular expression matches using the Perl
special variables $1 to $n. Rather than access matches via:
my $first_match = C->matches->[0]
the regular expression is re-matched against the step text immediately before the
step definition is executed, so that a more natural style of:
my $first_match = $1;
can be used.
4.5 Output Harnesses
Test::BDD::Cucumber calls its formatting mechanisms harnesses, on the basis that the
output will be consumed by a testing harness. An abstract implementation is provided
which names methods for each meta test-assertion (feature, feature_done, etc).
While the model provides a very generic formatter, Test::BDD::Cucumber takes a similar
approach to RCucumber itself by naming each of the formatting events separately. Again,
this allows a developer extending a formatter to not need to re-implement formatting for
every event simply to re-implement formatting for one.
Additionally, where the model has a single data type to model the meta test-assertion identifier
and associated fixture data, Test::BDD::Cucumber provides slightly different objects to
model each. Test::BDD::Cucumber::Model::Feature for example contains information
about the document that contains the feature, but no place to hold step data, and
Test::BDD::Cucumber::Model::Step is unable to hold tag-related data as steps don’t
support tags in the Cucumber model.
4.5.1 Test::BDD::Cucumber::Harness::TestBuilder
Both the most interesting and most important harness is the one for exercising a
Test::Builder singleton.
Beyond executing the step definitions, the Test::Builder harness is the only part of the
code-base that knows about Test::Builder — the code between the step definitions and
this harness is completely agnostic about Test::Builder. There’s no linkage at all between
the step definitions’ usage of the class to evaluate the status of a step definition and the
use of it to communicate with the outside world:
As explored in Section 4.2.2, a new instance of Test::Builder is instantiated for every step
definition, its state after execution of the step definition examined, and then it’s simply
discarded. The harness then talks to the “real” Test::Builder singleton — if there
is one — to communicate its results.
The description of TAP as outputting solely to STDOUT (see: Section 2.1.1) was a sim-
plification. It supports diagnostics both via note(), which outputs #-prefixed diagnostics
to STDOUT, and diag(), which outputs #-prefixed diagnostics to STDERR. note() is
intended for diagnostic information useful to a developer performing debugging, whereas
diag() is intended for diagnostic information relating to unexpected or undesired test
execution — for example further information on test failure. prove, Test::Builder’s test
runner, outputs only the latter by default, but will output both in verbose mode.
Scenario and feature names are recorded using note() — #-prefixed diagnostics to STD-
OUT — and so their names are only printed in verbose mode:
# Scenario: MD5 longer data
A passing step is marked as such to Test::Builder using its pass() method: pass($step_name);.
This causes Test::Builder to record that the test assertion has passed. Descriptions of pass-
ing tests are also suppressed by prove unless in verbose mode — in verbose mode,
Test::BDD::Cucumber presents them with their step text:
ok 142 - Given a usable "Digest" class
ok 143 - Given a Digest MD5 object
ok 144 - When I've added "foo bar baz" to the object
A failing step is recorded as such using the fail() method, fail($step_name);. Failing
test assertions are always output by prove, along with the location of the assertion:
not ok 145 - Then the hex output is
"75ad9f578e43b863590fae52d5d19ce6"
# Failed test ' Then the hex output is
"75ad9f578e43b863590fae52d5d19ce6"'
# at TestBuilder.pm line 87.
# in step at examples/tagged-digest/features/basic.feature line 39.
However, as the step was evaluated using the per-step Test::Builder object, its out-
put is also available, and this is then marked as a diagnostic to Test::Builder: diag(
$result->output );. This causes its transcript to always be output by prove:
This transcript starts with the always-passing test assertion that allows for correct be-
havior on step definitions without assertions:
# ok 1 - Starting to execute step: the hex output is
"75ad9f578e43b863590fae52d5d19ce6"
before continuing to show the failing test assertion, along with its own failure diagnostics:
# not ok 2
#
# # Failed test at digest/features/step_definitions/basic_steps.pl line 34.
# # got: '75ad9f578e43b863590fae52d5d19ce6ZZZ'
# # expected: '75ad9f578e43b863590fae52d5d19ce6'
# 1..2
The TAP output of the step definition’s Test::Builder object is thus folded into the output
of the “real” Test::Builder object. This folding allows the whole test suite written with
Cucumber to be executed and evaluated by prove or other TAP-compatible test runners,
recapturing the benefits of communicating test status with TAP (see: Section
2.1.1), such as easy conversion to jUnit XML format.
5 Reflection
5.1 Comparing Perl, Python and Ruby
Discounting the time spent in previous development of Test::BDD::Cucumber, the amount
of time spent on examining and understanding the testing ecosystems of each of the
three languages dwarfed time spent on other parts of the dissertation — good existing
comparisons seemed not to exist, and searching for them now brings up early drafts of
the first chapter released as blog posts as the top result.
Anecdotally, making this more difficult was that the majority of developers for any given
language apparently don’t understand the inner workings of their testing ecosystems.
Turning to former colleagues and friends with considerable experience and insight into
their respective languages of choice often revealed an attitude of simply fitting together
their test suites from tools without consideration to how they worked.
Having spent many years working almost exclusively with Perl, the apparent lack of
reporting test assertions in Python and Ruby was surprising, but made more sense with
further understanding of the xUnit model of treating a small named block around test
assertions as being the most granular reportable unit of tests. Experience suggests that a
weakness of many Perl test suites encountered is that they are written purely sequentially,
as long, meandering lists of unstructured assertions.
The extra discipline of wrapping these into test methods, necessitated by exception-raising
test assertions, would seem attractive, and experimenting with implementing exceptional
test assertions in Perl would be an interesting follow-up.
Perl’s unification around TAP as a reporting method still appears very useful. The ability
to express how many tests one expects to be run will help to catch testing code that isn’t
run, and would help make up for a lack of a positive assertion path in libraries with
exceptional test assertions.
No other work on generalized hierarchies of test assertions being folded into multiple
levels of meta test-assertions was discovered while writing this dissertation. The ability
to map the test assertion groupings provided by many different testing approaches to
configurations describing the expected behavior of their included assertions seems useful,
and it would be very interesting to perform an exhaustive search of the literature to find
other descriptions of this concept. Inexperience with research may well have been a factor.
Finding consistent names for testing concepts also proved challenging. The starting
assumption was that simple, concise, and unambiguous descriptions of “test assertions”
and “test harnesses” would be forthcoming. It was surprising to find that the usage of
these terms appears to have grown organically with no formalization.
Hamill (Hamill 2004) — for example — uses the term “Test Asserts” exactly twice, the
fuller of the two descriptions being:
Test conditions are checked with test assert methods. Test asserts result in
test success or failure. If the assert fails, the test method returns immediately.
and “test harness” is mentioned a single time with the term remaining undefined. The
Cucumber Book (Wynne and Hellesøy 2012) simply refers to “assertions” without defini-
tion. Several hours spent searching for papers to define the terms well and using Google
Books to find solid definitions in text books was also unproductive — an outcome that the
obscure references on Wikipedia for the terms might have predicted. Again, inexperience
with research rather than a lack of literature may have been the critical factor.
That said, a citeable reference of how the terms came about and how they’re used in
various ways would be very useful. The more general “Historical Perspective on Runtime
Assertion Checking in Software Development” (Clarke and Rosenblum 2006) was both
fascinating and highly informative, and writing a research piece in the same vein looking
at modern test terminology and the different concepts that various terms refer to would
be an interesting avenue for further work on this topic.
5.2 The Model
5.2.1 The Choice of Haskell
The choice to develop the model in Haskell rather than a formal specification language
like Z-Notation was made on the basis of comparative familiarity with Haskell over formal
specification languages.
However, its utility and comparative flexibility have exceeded expectations, and also what
would have been possible with Z-Notation — indeed it is used to build a very simple
Haskell testing library in Appendix A, whose complexity has been limited primarily by the
time available and perhaps a lack of experience developing in Haskell, rather than by any
perceived inherent limitation of the model.
A prelude exists for the model, which isn’t included. This defines types whose definition
was intended to be ambiguous, for fear of being tied to any particular design choices. For
example, the identifier type in this prelude exists as:
type Identifier = ((),String,())
which has all the monoidal properties of a simple String type, but also the inability to be
used directly as such without wrapping functions to access the simple test name inside
it:
name (Single (TestAssertion ((),i,()) _)) = i
name (Sequence _ ((),i,()) _) = i
This property stopped it from being used as — or being treated in the model as — simply
a name or a literal string, thus retaining the ability to describe it as a combination
of a name and compile-time fixtures. As a result, the model — despite being able to be
used as the basis for actual software — could perhaps still be translated to Z-Notation
without needing to pull in definitions of many more general Haskell data types.
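For concreteness, the prelude-style helpers referenced elsewhere — makeID (used in Appendix A) and addDescription (used in Section 3.3) — might be written against this tuple shape. These are sketches and assumptions rather than the prelude’s actual definitions, and they additionally assume Description is a String synonym:
makeID :: String -> Identifier
makeID n = ((), n, ())
addDescription :: Identifier -> Description -> Identifier
addDescription ((), n, ()) d = ((), n ++ " " ++ d, ())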
5.2.2 Generating Formatted Reports
5.2.3 The Extension Model
The extension model shown in Section 3.4 is significantly more powerful than that of
Test::BDD::Cucumber. Test::BDD::Cucumber currently has two separate event models
— one specifically for output harnesses that produce formatted transcripts, and a more
general one for events to mutate data — setting up fixtures for example.
The initial version of the Haskell model was written in the same style — a distinct
formatter type which could only affect the Report transcript. While attempting to utilize
that type, it became clear that a more general model was available. The extension
system implementation allows easy re-dispatch from a formatter to a second formatter.
The TestBuilder type described in Section A.3 and exercised via the extension model
could be easily sequenced in front of another formatter that wished to use the data model
but output — for example — TAP embedded in syntax-highlighted HTML code.
While the transmute function seems sufficiently fundamental to sequencing tests that
it is hard-coded into the test runner, it too could easily be recreated as an extension. This
provides a useful mechanism for anyone wishing to use the model to build a testing
library with its own test-sequencing logic.
The ability to wrap a core software function with an ingress and egress filter would seem
to be an example of the middleware pattern[1].
5.2.4 Further Work
Meta test-assertion configurations have been built to cover those provided by three real
testing libraries: xUnit style, Test::Builder style, and Cucumber style. The flexibility of
the configuration approach may allow this to be true for most commonly used assertion-
based test libraries, although attempting to write a more complicated and complete
testing library in Haskell using the model might well reveal weaknesses to be addressed in
the model.
A more thorough understanding of monads and monad transformers may have led to a
slightly different model being produced. Quite how the test assertions described could
be extended or embedded to allow access to IO and the use of some existing monads —
such as the State Monad, a contender for modeling the environment — is unclear to this
author, but presumably quite straight-forward given sufficient time to understand them.
Overall Haskell appears to have been an excellent choice for the model, especially as
implementations to prove the model was working were achievable, rather than resorting
to simply the type checking Z-Notation could have provided.
5.3 Test::BDD::Cucumber
5.3.1 A Brief History
Test::BDD::Cucumber was originally written not for any business need, but because
attempting to integrate Cucumber with the Test::Builder model seemed like an interesting challenge.
[1] https://en.wikipedia.org/wiki/Middleware
Having written an original proof of concept in early July 2011, further development
seemed rewarding, and an initial release was made in September 2011.
Development of the core, and the majority of the further work has been performed by
the author. Release rights were briefly shared with another developer who applied some
outstanding patches and released a version to the CPAN. Major contributions from other
developers include the localization system for allowing features to be written in non-
English languages, an extension system for allowing hooks to be run on certain processing
events, and a JSON output formatter.
Test::BDD::Cucumber has a variety of users. Seagate’s Portsmouth office use it to orga-
nize quality control tests on hard drives, the anonymity-focused operating system Tails[2] use it
for some testing and pushed for changes so that it could be distributed within Debian’s
package system, and LedgerSMB, a large and complicated open-source ERP system, use
features both to serve as documentation of their legacy system and for organizing Selenium
tests. It is included in the packaging systems of many Unix-based systems, and it
has been the subject of a Linux Magazine article[3].
5.3.2 Further Work Planned
Parser Improvements
Test::BDD::Cucumber still uses a simple and inelegant parser implemented using regular
expressions, which has been developed organically over time. While it can parse all the
feature files in the RCucumber testing suite, it will also parse several incorrect constructs
in feature files, which has led to its own test suite, for example, containing tests written
in invalid Gherkin.
The general suite of Cucumber implementations contains its own grammar for a parser
(also developed by the Cucumber team) called Berp, which outputs parsers for Gherkin
written in a variety of languages. I have completed initial work (which has subsequently
been accepted by the Cucumber team) to automatically generate a Perl-based parser,
and this Gherkin parser has also been released to the CPAN[4].
Further work is required to unify the AST output by the Perl Gherkin parser with the data
model used by Test::BDD::Cucumber, and thought is needed on migrating users who’ve
written features compatible with Test::BDD::Cucumber but disallowed by the official
parser. However, the benefits of using the official parsing grammar to make the project as
close to RCucumber as possible for users coming to Test::BDD::Cucumber from Cucumber
implementations in other languages may be real enough that this work is planned.
Unification of Formatters and Extensions
While writing this dissertation, and on discovering that formatters are simply a type of
extension, the desire to unify Test::BDD::Cucumber’s formatting and extension systems
has developed.
Discussion with the contributor of the existing extension system at a recent technical
conference has shown that he too is very keen for this unification, and that it would solve
[2] https://tails.boum.org/
[3] http://www.linux-magazine.com/Issues/2014/161/Perl-Cucumber
[4] https://metacpan.org/release/Gherkin
business needs he has, so this work is likely to be prioritized as a joint development effort.
5.3.3 Reflections on the Development Process
The original development occurred both as an individual effort, and without any reference
to existing Cucumber implementations. Early versions often worked significantly differ-
ently from RCucumber, but the behavior has been changed to be closer to RCucumber
over time.
There may have been benefits to this approach. The use of the stash object instead
of a World object feels closer to an idiomatic Perl approach, as does the provision of
context to a step via localized methods. Implementing these in the RCucumber way may
have produced a solution that felt less like Perl. The realization that Ruby treats the
assertions inside steps as exceptional rather than reporting may have led to an effort to
reproduce this in the Perl version, and would have probably been realized as an inelegant
and non-Perl step definition implementation.
On the other hand, breaking changes have been introduced into subsequent Test::BDD::Cucumber
releases as a result of having been at first insufficiently familiar with RCucumber. Background
sections were originally understood to be per-Feature setup, rather than per-Scenario,
and changing this behavior led to breaking tests for users of the library.
5.4 Summary of Work Complete
Test::BDD::Cucumber is a significant piece of software engineering work, and was devel-
oped and extended during enrollment in the MSc course, often with reference to principles
taught on the course. However, it was not specifically developed for this dissertation.
The first chapter required researching and understanding the testing ecosystems of two
unfamiliar languages, Ruby and Python. These ecosystems were unfamiliar both in their
realized states, and in the general philosophy of exceptional test assertions. The de-
velopment of the meta test-assertion paradigm and accompanying Haskell model was
performed solely for this dissertation.
The model was then expanded during research of the also largely unfamiliar RCucumber
code-base. Concepts such as the World object, Ruby’s use of Rake as a build tool, and the
very mechanism by which RCucumber provides data to step definitions were unknown,
and had to be researched to write the appropriate chapter. The model as presented was
built (and rebuilt many times) from scratch as an attempt was made to understand a
generalized concept of both test suites and Cucumber’s workings.
The understanding that Test::BDD::Cucumber’s formatters are specific applications of
the extension model, and constitute middleware, arose during the writing of this dis-
sertation, and led to useful discussion with other developers on Test::BDD::Cucumber
that will hopefully lead to a rich seam of improvements for the Test::BDD::Cucumber
code-base. The creation of a Perl back-end to Cucumber’s Berp parser occurred during
the two-year period set aside for this dissertation.
Finally, the Haskell model also serves as a potential basis for testing library implementa-
tions in the future. An example is given in the immediately following Appendix.
A A Simple Haskell Testing Library
The model created for this dissertation went somewhat further than expected in terms
of functionality, and in testing the model, a nascent Haskell testing library was built.
This appendix shows how the model can be used to build a simple TAP-outputting
testing library in Haskell.
A.1 Completing the Model
A.1.1 A Monadic ResultEnv
The instance of ResultEnv e as a monad has already been shown in Section 2.3.2. How-
ever, this relies on ResultEnv e having been declared an instance of Functor and Applica-
tive.
ResultEnv e e1 encapsulates a result with an exceptional (e) or normal (e1) environment,
and this environment can have functions conditionally applied to it (if it’s a normal value)
via fmap if ResultEnv e is a functor:
instance Functor (ResultEnv e) where
fmap g (ResultEnv r (Left s)) = ResultEnv r (Left s)
fmap g (ResultEnv r (Right s)) = ResultEnv r (Right (g s))
Declaring it as an instance of Applicative allows for encapsulated normal values to be
passed into a function that accepts several arguments:
instance Applicative (ResultEnv e) where
    pure e = ResultEnv Pass (Right e)
    ResultEnv r (Left x)  <*> _               = ResultEnv r (Left x)
    ResultEnv r (Right x) <*> ResultEnv r' o  = ResultEnv r' (fmap x o)
The monadic definition allows predicates to be combined using >>=, for example:
result = startResultEnv >>= predicate1 >>= predicate2
A.1.2 The Test Harness
A meta test-assertion need to be evaluated as a whole, and as a result of any enclosed
meta test-assertions, with the result wrapped in extensions, and from a starting Report
t e. This is done using run:
run :: (Monoid e) => [Extension t e] ->
MetaTestAssertion e e -> Report t e e -> Report t e e
run extensions metaTA report =
    -- Pass the incoming Report through the extensions on the way in,
    -- run the assertions to get a new Report, and pass that report
    -- back through the extensions on the way out
    extend End $ assertionResult metaTA $ extend Start report
  where
    extend event = applyExtensions event extensions metaTA

    -- A single assertion can be run on the enclosed ResultEnv
    assertionResult (Single (TestAssertion _ assertion)) =
        -- Defined in Section 3.4
        runReportAssertion assertion
    -- An empty list of assertions must use the appropriate
    -- Meta-Test-Assertion defined behavior
    assertionResult (Sequence c _ []) =
        run extensions (Single (TestAssertion (makeID "Empty Assertion")
            -- Defined in Section 2.3.2
            (emptyAssertion (emptyBehavior c))))
    -- A list of assertions are combined and their results
    -- placed into a list that is then summarized
    assertionResult (Sequence c _ xs) =
        summarize . combine c extensions xs

    -- Run assertions sequentially, saving each result
    combine c extensions [] start     = [start]
    combine c extensions (x:xs) start = runit : combine c extensions xs runit
      where runit      = transmute' $ run extensions x start
            -- reportResultMap is defined in Section 3.4
            transmute' = reportResultMap (transmute (assertionType c))

    -- Concatenate the enclosed results using the monoidal definition
    -- for results
    summarize :: (Monoid e) => [Report t e e] -> Report t e e
    summarize xs = Report transcript (ResultEnv passFail e)
      where passFail = mconcat $ map (\(Report _ (ResultEnv i _)) -> i) xs
            (Report transcript (ResultEnv _ e)) = last xs
A.2 Adding Assertions
A simple non-mutating assertion for equality makes use of the applicative instance for
ResultEnv, by creating a ResultEnv e Bool using <*>, and then using that ResultEnv to
return a copy of the original ResultEnv with the pass/fail status potentially changed:
assertEqual :: (Eq e, Show e) => e -> ResultEnv e e -> ResultEnv e e
assertEqual e r@(ResultEnv _ i) = resultFrom $ (==) <$> pure e <*> r
  where
    -- A passing assertion is the identity element for `ResultEnv`
    resultFrom (ResultEnv _ (Right True)) = r
    -- A failure doesn't affect the environment, but changes
    -- the enclosed `Result` value
    resultFrom (ResultEnv _ (Right False)) =
        ResultEnv (Fail ("/= " ++ show e)) i
This can be exercised by creating a starting environment using pure:
environment :: ResultEnv Integer Integer
environment = pure 6
a predicate to see if the current environment is 5:
isFive = assertEqual 5
and applying that predicate to the environment itself:
testResult = isFive environment
giving a result of:
ResultEnv (Fail "/= 5") (Right 6)
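A passing assertion, by contrast, returns the original ResultEnv unchanged, since resultFrom
simply returns r when the comparison holds; assertEqual 6 environment therefore gives:
ResultEnv Pass (Right 6)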
However, exercising the model more fully requires a predicate that mutates the environ-
ment after performing its check. In this case, a predicate that tests for evenness, and
then decrements the environment by one, wrapped in a test assertion and then a meta
test-assertion:
assertEvenAndDecrement :: Integer -> ResultEnv Integer Integer
assertEvenAndDecrement x
    | x < 1           = ResultEnv (Fail "Exceptionally small input") (Left x)
    | (x `mod` 2) > 0 = ResultEnv (Fail (show x ++ " is odd")) (Right (x - 1))
    | otherwise       = ResultEnv Pass (Right (x - 1))

testAssertEvenAndDecrement =
    TestAssertion (makeID "Env is even") assertEvenAndDecrement

metaTestAssertEvenAndDecrement =
    Single testAssertEvenAndDecrement
A test suite using xUnit-style meta test-assertions can be built up from this:
fiveAssertions = replicate 5 metaTestAssertEvenAndDecrement

suite = testSuite (makeID "Test Suite") [
    testClass (makeID "Test Class") [
        testMethod (makeID "Test X1") fiveAssertions,
        testMethod (makeID "Test X2") fiveAssertions,
        testMethod (makeID "Test X3") [] ] ]
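The testSuite, testClass and testMethod helpers are defined in the main body of the model
rather than reproduced here. As a rough sketch only, assuming the Sequence constructor from
Section 2.3.3 takes a control value, an identifier, and the enclosed meta test-assertions, and
with suiteControl, classControl and methodControl standing in as hypothetical control values:
-- Sketch only: Sequence's exact fields and the control values are
-- assumptions, not the dissertation's actual definitions
testSuite  ident children = Sequence suiteControl  ident children
testClass  ident children = Sequence classControl  ident children
testMethod ident children = Sequence methodControl ident children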
A.3 Outputting TAP
To see the results of the test suite run, an output format is needed. TAP would be
appropriate for this.
TAP requires the maintenance of a test count at several levels, as well as a final transcript.
TestBuilder provides this:
data TestBuilder = TestBuilder ([Integer], String)
and an instance of Monoid is declared for it, describing how to combine results from a
given meta test-assertion run into a previous one:
instance Monoid TestBuilder where
    mempty = TestBuilder ([0], "")
    mappend (TestBuilder (d, t)) (TestBuilder (d', t')) =
        TestBuilder (zipWith (+) d d', mappend t t')
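Note that on newer GHC versions, where Semigroup is a superclass of Monoid (GHC 8.4 and
later), a Semigroup instance is also required; a minimal sketch that keeps the definition above
working, with mappend then defaulting to (<>), would be:
instance Semigroup TestBuilder where
    -- Combine the per-level counts pairwise and append the transcripts
    TestBuilder (d, t) <> TestBuilder (d', t') =
        TestBuilder (zipWith (+) d d', t <> t')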
along with a method for easily adding a completed meta test-assertion to the count:
-- Discard the count for this level, and increment the count
-- for the penultimate one, to keep track of tests run in
-- the parent meta test-assertion
incrementCount :: [Integer] -> [Integer]
incrementCount i = gparentCount ++ [1 + last parentCount]
  where parentCount  = init i
        gparentCount = init parentCount
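For example, incrementCount [0, 0, 0, 5] evaluates to [0, 0, 1]: the count of five assertions
completed at the current level is discarded, and the parent level’s count is incremented from
zero to one.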
At the beginning of a meta test-assertion run, all that’s needed is to add to the depth of
the count:
-- Add to the depth of MTAs kept track of
startMTA _ _ (TestBuilder (i, e)) = TestBuilder (i ++ [0], e)
and at completion, a TAP line to add to the transcript is assembled:
endMTA description result (TestBuilder (i, e)) =
TestBuilder (newCount, e ++ output)
where output =
indent ++ okNotOK result ++ show testNumber ++
” - ” ++ description ++ debug result ++ ”n”
newCount = incrementCount i -- Incremented parent’s count
testNumber = last newCount -- Which number of the parent this is
depth = length newCount - 1 -- How many MTA’s deep
indent = replicate (depth * 4) ’ ’
okNotOK (ResultEnv Pass _) = ”ok ”
okNotOK (ResultEnv (Fail _) _) = ”not ok ”
debug r@(ResultEnv Pass _) = ””
--debug r@(ResultEnv Pass _) = ” # ” ++ debugEnvironment r
debug r@(ResultEnv (Fail d) l) = ”n” ++ indent ++ ”# ” ++
debugEnvironment r ++ diagnostics d l
debugEnvironment = show
diagnostics d l =
(”n” ++ indent ++ ”# ” ++ d) ++ skipped l
51
skipped (Left _) = ” EXCEPTION!”
skipped (Right _) = ””
These functions are then turned into an extension that can be given to the test runner:
tapOutput :: (Show e) => Extension TestBuilder e
tapOutput event assertion (Report transcript resultEnv) =
    Report
        (showResult event)
        (clearDiagnostics event resultEnv)
  where
    name' = name assertion

    showResult Start = startMTA name' resultEnv transcript
    showResult End   = endMTA name' resultEnv transcript

    clearDiagnostics Start re                   = re
    clearDiagnostics End (ResultEnv Pass q)     = ResultEnv Pass q
    clearDiagnostics End (ResultEnv (Fail _) q) =
        ResultEnv (Fail emptyDiag) q
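A sketch of how these pieces might be wired together follows; the names initialReport and
finalReport are illustrative, the Report constructor is assumed to pair a transcript with a
ResultEnv as in summarize above, the starting environment of 5 is chosen to match the sample
output below, and run’s (Monoid e) constraint is assumed to be satisfied for the Integer
environment (for instance via a local instance in the toy library):
-- Sketch only: names and the Report constructor shape are assumptions
initialReport :: Report TestBuilder Integer Integer
initialReport = Report mempty (pure 5)

finalReport :: Report TestBuilder Integer Integer
finalReport = run [tapOutput] suite initialReport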
And TAP can be generated for the test run:
            not ok 1 - Env is even
            # ResultEnv (Fail "5 is odd") (Right 4)
            # 5 is odd
            not ok 2 - Env is even
            # ResultEnv (Fail "") (Left 4)
            # EXCEPTION!
            not ok 3 - Env is even
            # ResultEnv (Fail "") (Left 4)
            # EXCEPTION!
            not ok 4 - Env is even
            # ResultEnv (Fail "") (Left 4)
            # EXCEPTION!
            not ok 5 - Env is even
            # ResultEnv (Fail "") (Left 4)
            # EXCEPTION!
        not ok 1 - Test X1
        # ResultEnv (Fail "") (Left 4)
        # EXCEPTION!
            ok 1 - Env is even
            not ok 2 - Env is even
            # ResultEnv (Fail "3 is odd") (Right 2)
            # 3 is odd
            not ok 3 - Env is even
            # ResultEnv (Fail "") (Left 2)
            # EXCEPTION!
            not ok 4 - Env is even
            # ResultEnv (Fail "") (Left 2)
            # EXCEPTION!
            not ok 5 - Env is even
            # ResultEnv (Fail "") (Left 2)
            # EXCEPTION!
        not ok 2 - Test X2
        # ResultEnv (Fail "") (Left 2)
        # EXCEPTION!
            ok 1 - Empty Assertion
        ok 3 - Test X3
    not ok 1 - Test Class
    # ResultEnv (Fail "Recovered Left->RightRecovered Left->Right") (Right 2)
    # Recovered Left->RightRecovered Left->Right
not ok 1 - Test Suite
# ResultEnv (Fail "") (Right 2)
fin
  • 22. 17 A desire to see the outcome of several facets of the code under test may incentivize users of exceptional test assertions to organize their tests into smaller units that individually test these facets. This starts to resemble the xUnit ideal that “a test method should only test one behavior … when there is more than one condition to test, then a test fixture should be set up, and each condition placed in a separate test method” (Hamill 2004). This gentle pressure from the tooling to design tests around small units is not there in Perl (and presumably other languages using reporting-assertion-based approaches), and — anecdotally — this can often lead to tests in Perl being written in a long, meandering style that mixes test assertions directly into fixture code with no clear separation. In Ruby and Python, the combination of this pressure and the lack of a defined diagnostics channel has apparently led to the xUnit style being the default — named blocks that enclose a small number of assertions are considered to be the tests that are run, not the individual assertions. In that context, these blocks are the smallest unit identified by test harnesses, and the messages contained in raised exceptions are seen solely as diagnostic information. These blocks have been referred to as meta test-assertions so far in this chapter. 2.3 A Model For Test Suites and Test Harnesses 2.3.1 Predicates and Test Assertions This chapter has so far described test assertions as operating on predicates – expressions that will evaluate to true or false. In the examples seen so far, a False result can also include diagnostic information. data Result = Pass | Fail Diagnostics deriving (Show) type Predicate e = e -> Result Given this dissertation deals with dynamic languages, side-effects such as exceptions or mutation of the environment may occur: • Data set up to be operated upon by tests — fixtures — may be altered as part of the evaluation of the predicate, as might other values in the environment • The evaluation of the predicate itself may be unable to be completed, and a runtime exception must be raised The model then needs to be able to describe a result that passes or fails (Result), the new environment that exists after evaluation, and whether or not that evaluation caused an exception (Left e or Right e1): data ResultEnv e e1 = ResultEnv Result (Either e e1) deriving (Show) The assertion itself can be modelled as a function that maps from one environment to another, with a result: type Assertion e e1 = e -> ResultEnv e e1 and this made into a test assertion with the addition of an identifier:
data TestAssertion e e1 = TestAssertion Identifier (Assertion e e1)

2.3.2 Sequencing Test-Assertions and Control Flow

A developer, a tester, or a continuous delivery tool is ultimately interested in test assertions to signal whether a piece of software performs its tasks correctly, and perhaps what remedial actions need to be taken. Any indication that the software does not (Fail Diagnostics), or performs in a way that was unexpected (a Left value being returned), is likely to lead the test interpreter to conclude that the end state of the whole sequence of assertions is one of failure, and to take appropriate action.

The result of previous evaluations needs to be considered in subsequent ones, and thus a way of combining results is needed:

defResult = Pass

addResult (Fail d) (Fail d') = Fail $ mappend d d'
addResult (Fail d) _         = Fail d
addResult _        x         = x

instance Monoid Result where
    mempty  = defResult
    mappend = addResult

More generally, a way is also needed of providing the environment left by the last evaluation to the next one, and for stopping on failure — essentially a way of binding them together:

instance Monad (ResultEnv e) where
    return e = ResultEnv Pass (Right e)
    (ResultEnv r (Left x))  >>= _ = ResultEnv r (Left x)
    (ResultEnv _ (Right x)) >>= f = ResultEnv r o
        where (ResultEnv r o) = f x

A failure is expected to be the zero value for combining results — any sequence of results with a failure in it will be a failure. This behavior is implemented in all implementations covered so far. However, other behaviors related to collections of test assertions and meta test-assertions differ both between testing libraries and indeed inside the libraries themselves. The following will be used to illustrate combination choices:

eg = ( idTestGroupX,
       [ ( idTestGroupX1, [tX1a, tX1b] ),
         ( idTestGroupX2, [tX2a, tX2b] ),
         ( idTestGroupX3, [] ) ] )

Continuation after a failure

Two classes of test assertion have been seen so far — reportable test-assertions and exceptional test-assertions. Using reportable test-assertions like those built with Test::Builder's ok() method, a test assertion failing will not prevent subsequent sister test assertions from running:
ok( 0, "This fails" );
ok( 1, "This is evaluated anyway" );

A test assertion built with unittest's assertTrue will — through virtue of raising an exception (albeit of a special type) — prevent sibling test assertions from running:

self.assertTrue( False, "This fails" )
self.assertTrue( True, "This is never evaluated" )

If tX1a fails in the illustrative example, one might reasonably expect tX1b to only be run if the test assertions are reportable. However, when test assertions are collated into meta test-assertions provided by the testing library, behavior may change in this regard. If Test Group X1 as a whole is marked as a failure (due to the failure of tX1a), whether or not Test Group X2 is evaluated depends on the type of the meta test-assertion. While the basic test assertions for unittest are exceptional, the Test Methods that they're collected into are reportable, as are the Test Classes that those are collected into.

This logic could be placed in assertions themselves, but this limits flexibility. Instead, the model has a function that accepts a description of desired behavior (Exceptional or Reportable) and adjusts results appropriately:

transmute :: AssertionType -> ResultEnv e e -> ResultEnv e e

-- For all AssertionTypes
--
-- Passes are passed through
transmute _ r@(ResultEnv Pass (Right x)) = r
-- Exceptions that are passes become failures
transmute _ r@(ResultEnv Pass (Left x)) =
    ResultEnv (Fail failedToCompile) (Left x)

-- Exceptional
--
-- A failure in Exceptional mode becomes an exception
transmute Exceptional (ResultEnv (Fail d) (Right x)) = ResultEnv (Fail d) (Left x)
-- An exception in Exceptional mode is passed through
transmute Exceptional r@(ResultEnv (Fail d) (Left x)) = r

-- Reportable
--
-- A failure in Reportable mode is passed through
transmute Reportable r@(ResultEnv (Fail d) (Right x)) = r
-- An exception in Reportable mode is passed through
transmute Reportable r@(ResultEnv (Fail d) (Left x)) = r

Catching Exceptions

Related to whether test assertions are reportable or exceptional — how should unexpected exceptions be handled? Should an exception cause sibling test assertions to be skipped and control handed back to the enclosing meta test-assertion? While the overall Result
  • 25. 20 shape of the enclosing meta test-assertion will not be changed, the environment may be mutated further, and further diagnostics from failures and stack traces may be added. For all libraries examined so far, exceptions at the level of test assertions will cause siblings to be skipped. However, each library has at least one meta test-assertion construct that will catch exceptions, and continue to run sibling meta test-assertions. These meta test-assertions essentially treat exceptions as failures, a behavior which can be added to the transmute function: -- Catchable -- -- A failure in Catchable mode is passed through transmute Catchable r@(ResultEnv (Fail d) (Right x)) = r -- An exception in Catchable mode is changed to a fail transmute Catchable r@(ResultEnv (Fail d) (Left x)) = ResultEnv (Fail $ mappend d recovering) (Right x) Assertions types are then: data AssertionType = Exceptional | Reportable | Catchable Empty Sequences Another aspect to consider is the behavior of Test Group X3, an empty sequence of assertions. Does Test Group X3 pass because no test assertion failures were recorded, or does it fail as no test assertion successes were recorded? Test::Builder’s reportable test-assertions require positive proof that tests passed, or the tests are marked as failing: # Subtest: Test Group X3 1..0 # No tests run! not ok 1 - No tests run for subtest ”Test Group X3” Where unittest’s exceptional test-assertions consider empty Test Methods and Test Classes to be passing, due to the absence of failure. This property of meta test-assertions will also need to be recorded in the model — a function that accepts the desired behavior and returns a predicate yielding an appropriate ResultEnv: data EmptyBehavior = Succeeds | Fails emptyAssertion :: EmptyBehavior -> e -> ResultEnv e e emptyAssertion Succeeds e = ResultEnv Pass (Right e) emptyAssertion Fails e = ResultEnv (Fail dEmpty) (Right e) 2.3.3 Modeling Meta Test-Assertion Control Flow A generalized meta test-assertion model must allow the attributes in the previous section to be recorded alongside a sequence of test assertions or meta test-assertions that comprise it.
  • 26. 21 In the model the attributes are stored as fields in a Configuration data type: data Configuration = Configuration { assertionType :: AssertionType, emptyBehavior :: EmptyBehavior } Collections of test assertions can then be described as meta test-assertions, with an identifier and with the desired sequencing configuration: data MetaTestAssertion e e1 = Single (TestAssertion e e1) | Sequence Configuration Identifier [MetaTestAssertion e e1] MetaTestAssertion allows all test groupings seen so far to be modeled and their behavior described — consider for example the xUnit groups as implemented by unittest: A test method is a Python block containing a sequence of assertX test assertions; a failure raises an exception, stopping further execution of test assertions in that block. An empty block is a pass: testMethod = Sequence Configuration { assertionType = Exceptional, emptyBehavior = Succeeds } A test class contains many test methods, but if any fail then the test class should continue, as it should in the case of an exception, making them Catchable: testClass = Sequence Configuration { assertionType = Catchable, emptyBehavior = Succeeds } A test suite container for test classes which required that at least one test class exists completes the xUnit hierarchy: testSuite = Sequence Configuration { assertionType = Catchable, emptyBehavior = Fails } and the illustrative example — now with information about sequencing embedded – can be written as: s = testSuite idTestSuite [ testClass idTestClass [ testMethod idTestMethodX1 [Single tX1a, Single tX1b], testMethod idTestMethodX2 [Single tX2a, Single tX2b], testMethod idTestMethodX3 [] ] ] 2.4 Decisions for Test Library Implementors This chapter has considered a number of different facets of the Perl, Python and Ruby testing infrastructures. These considerations are particularly relevant to those looking to implement a testing library targeting one of those languages, and searching for lessons from an existent one.
Integration with Existing Testing Infrastructure and Ecosystem

What level of integration with the existing testing infrastructure and ecosystem should be aimed for, and what language-specific considerations are there? A developer targeting Perl will need to be mindful of Perl's reporting-based test assertions, and will wish to fully exploit the existing Test::Builder infrastructure. A developer targeting Python would want to make sure their library understood and reified exceptions inheriting from AssertionError, and a Ruby developer might well wish to simply choose a single existing Ruby library to build upon.

Meta Test-Assertions and Other Platform Norms

Consideration should also be given to the target platform's existing ideas of meta test-assertions — are there organizational structures and norms (such as Test::Builder's subtests) that should be respected and utilized so as to be optimally familiar to experienced developers for that platform? Are tests found and run according to certain conventions? For example, Ruby developers may well expect their test suites to be runnable via a rake task, where Perl developers would expect tests to be findable and runnable via an entry point in the ./t directory of their project.

Creation and Organization of Test Data and Fixtures

Are there conventions for setting up and providing test data and fixtures to blocks containing test assertions? Most xUnit descendants will have a Test Fixture class with setUp and tearDown methods, and access to those via a Test Caller class (Hamill 2004). Is there a testing context where this (and other test run contextual) data is held, and how do assertions access it?

Reporting and Collation

How should success and failure of test assertions be captured and reported upon? For example, a Perl developer would expect any form of test run against their code-base (regardless of what language the test itself was implemented in) to output TAP, and would either expect a hosting continuous delivery tool to understand TAP or a format to which TAP could easily be converted.
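To close the chapter, the sketch below shows how these decisions tie back to the model's configurations. It is a deliberately simplified, self-contained Haskell rendering: Diagnostics is fixed to [String], Identifier to String, and leaves carry pre-computed Results rather than Assertions over an environment, so the names run, testMethod and subtest here are illustrative rather than part of the dissertation's prelude.

type Identifier  = String
type Diagnostics = [String]

data Result = Pass | Fail Diagnostics deriving Show

instance Semigroup Result where
  Fail d <> Fail d' = Fail (d <> d')
  Fail d <> _       = Fail d
  _      <> x       = x

instance Monoid Result where
  mempty = Pass

data AssertionType = Exceptional | Reportable | Catchable
data EmptyBehavior = Succeeds | Fails

data Configuration = Configuration
  { assertionType :: AssertionType
  , emptyBehavior :: EmptyBehavior
  }

data MetaTestAssertion
  = Single Identifier Result
  | Sequence Configuration Identifier [MetaTestAssertion]

run :: MetaTestAssertion -> Result
run (Single _ r)      = r
run (Sequence c i []) = case emptyBehavior c of
  Succeeds -> Pass
  Fails    -> Fail ["no assertions run in " ++ i]
run (Sequence c _ xs) = case assertionType c of
  Exceptional -> mconcat (stopAfterFailure (map run xs))
  _           -> mconcat (map run xs)
  where
    stopAfterFailure []         = []
    stopAfterFailure (r : rest) = r : case r of
      Fail _ -> []               -- later siblings are skipped
      Pass   -> stopAfterFailure rest

testMethod, subtest :: Identifier -> [MetaTestAssertion] -> MetaTestAssertion
testMethod = Sequence (Configuration Exceptional Succeeds)
subtest    = Sequence (Configuration Reportable  Fails)

main :: IO ()
main = do
  print (run (testMethod "test_empty" []))      -- Pass: empty xUnit Test Method
  print (run (subtest "Test Group X3" []))      -- Fail: empty subtest
  print (run (testMethod "test_mixed"
    [Single "a" (Fail ["a failed"]), Single "b" (Fail ["b failed"])]))
  -- Fail ["a failed"]: Exceptional sequencing stops at the first failure

Running main prints a pass for the empty test method, a failure for the empty subtest, and a single failure for the Exceptional sequence, mirroring the empty-sequence and continuation behaviors described in this chapter.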
3 The Cucumber Model

This chapter will examine both the implementation of RCucumber, and the structure of Gherkin, the language in which Cucumber tests are defined, and link both back to the model constructed in Chapter 2. That examination and the lessons from it will be used to explain choices made in the implementation of Test::BDD::Cucumber in Chapter 4.

In the last chapter, four major questions for designers of testing libraries were raised:

• What level of integration with the existing testing infrastructure and ecosystem of the host language is provided?
• What meta test-assertions are provided or suggested, and how are the component (single or meta) test assertions inside those selected and composed?
• What features are provided or suggested for creation and organization of test data and fixtures?
• How are results reported upon for human and software audiences?

This chapter will examine those questions for the Cucumber implementation in Ruby (RCucumber), the original and reference implementation.

3.1 A very High-Level Overview of Cucumber

Cucumber's promise is that it has been designed specifically to ensure the acceptance tests can easily be read — and written — by anyone on the team (Wynne and Hellesøy 2012). Software features are described in a way that must satisfy the customer that the correct behavior is being described, but also written sufficiently tersely and specifically that a developer implementing test code believes they can literally organize their test assertions around that description. Consider the following Cucumber feature description of a simple hand-held calculator:

 1  Feature: Basic Functionality
 2
 3    Background:
 4      Given a fresh calculator
 5      Then the display should be blank
 6
 7    Scenario: First Key Press on the Display
 8      When I press 1
 9      Then the display should show 1
10
11    Scenario: Combining Key Presses on the Display
12      When I press 1
13      And I press 2
14      Then the display should show 12
15
16    @addition
17    Scenario: Basic Addition
18      When I key
19        | 1 |
20        | + |
21        | 2 |
22        | = |
23      Then the display should show 3
24
25    @addition
26    Scenario Outline: Addition examples
27      When I press <left>
28      And I press +
29      Then the display should show <left>
30      When I press <right>
31      But I press =
32      Then the display should show <result>
33      Examples:
34        | left | right | result |
35        | 2    | 3     | 5      |
36        | 3    | 5     | 8      |
37        | 5    | 8     | 13     |

The description of the feature should serve as documentation of the intended behavior of the software, and also as a meta test-assertion which can be run, and its results reported upon and analyzed to verify the software behaves as expected. The mechanism by which this is achieved — through the lens of the meta assertion model — forms the basis of this chapter.

3.2 Organization of Assertions

3.2.1 Steps

A clue to where the test assertions live in the presented feature description is given by the presence of the word "should". The lines beginning with Given, When, and Then1 are called Steps and are mapped by RCucumber to tester-defined code blocks called step definitions. A mapping for Then the display should be blank might be:

Then('the display should be blank') do
  expect( @calc.display ).to eql('')
end

1 but also And and But, which are stand-ins for the conjunction starting the previous line
which uses RSpec's test assertion expect to build a test assertion.

A step definition is a block of code that can include 0 or more test assertions created with a host testing library, with the addition of a parameterizable (see Section 3.3.1) lookup key. A step is a line of text — or descriptive identifier — that can be mapped to that step definition. A step, then, with its accompanying step definition, forms the most basic meta test-assertion in a Cucumber test-suite. The conjunction used (Given/When/Then) is used only as a key to the lookup process itself — no semantic value is conveyed by it.

Given that the block referenced is written in Ruby, an imperative language with side-effects, test assertions in the block are generally run in the order they're written in, and may mutate global state — this mutative property is examined in more detail in Section 2.3.1.

Ruby has exceptional test assertions, and RCucumber makes no effort to check the hosting test-library for evidence of positively run test assertions. This implies that (and is implemented such that) an empty step definition, or one containing code but no assertions, is considered a pass. In fact, if the above mapping is rewritten to omit the test assertion:

Then('the display should be blank') do
end

then Cucumber's output remains identical, and the test results are summarized as:

6 scenarios (6 undefined)
37 steps (25 undefined, 12 passed)
0m0.029s

This is subtly different from the case where a step definition is simply undefined. On encountering a step that can't be mapped to a step definition, Cucumber registers a named type of failure, TODO, distinguished from regular failures only via presentation (and thus forming part of the Diagnostics in the model).

One final point: as Ruby's test assertions are exceptional rather than reporting based, any subsequent test assertions inside a step definition after a failing one aren't run, and are ignored. This is also true of test assertions which raise runtime exceptions. The step definition as a meta test-assertion then exhibits failure and exception behavior that can be modeled as:

step = Sequence Configuration {
    assertionType = Exceptional,
    emptyBehavior = Succeeds }

Mapping a step to a step definition is slightly more involved, and is examined in Section 3.3.

3.2.2 Scenarios

The next level up in the meta test-assertion hierarchy is Scenarios, such as those defined on lines 7, 11, and 17. Scenarios have names, and are a meta test-assertion over steps. If a step inside a scenario fails, sibling steps will not be executed — they are marked by Cucumber as skipped, another type of failure distinguished only by presentation. A
  • 31. 26 scenario with no steps is considered to be passing. Therefore the definition is the same as for step: scenario = Sequence Configuration { assertionType = Exceptional, emptyBehavior = Succeeds } Scenarios can be templated and parameterized and Background on line 3 of the example above closely resembles a Scenario — these issues are all addressed in Section 3.3. 3.2.3 Features Scenarios are combined into a file that defines a single feature. When a Feature contains either a failing or an exception raising Scenario it continues to run its other Scenarios. An empty feature is counted and reported-upon as a pass, so: feature = Sequence Configuration { assertionType = Catchable, emptyBehavior = Succeeds } Features exist individually as single .feature files on the file-system, generally in a hi- erarchy under a single directory. Running cucumber on an entirely empty directory will complain that certain helper directories its expecting are missing, but as long as those are there, then a directory simply without any .feature files is considered to pass: directory = Sequence Configuration { assertionType = Catchable, emptyBehavior = Succeeds } 3.2.4 Tags The example included also has a tag on lines 16 and 25: @addition. Tags are annotations on scenarios and features that allow them to be filtered. For example, a developer may have a directory full of features to implement, but only be interested in running and reading reports on the pass or fail state of those she knows to be under active development. Those features and scenarios can be annotated as — for example — work in progress (@wip)2 , and RCucumber asked to run just those. In the model detailed so far, these tags form part of the test assertion and meta test- assertion identifiers. These tags are used to transform one meta test-assertion into one with fewer enclosed meta test-assertions3 . To perform this transformation based on a selection of tags desirable or undesirable requires a function to take a description of that selection, a meta test-assertion, and can return a value describing if it should be kept: type TagSpec = Identifier -> Bool which can be fed into a filtering function: 2 @wip itself has no special meaning to RCucumber, but has widespread conventional use in the Cu- cumber community as the primary tag for toggling whether a test should be run 3 Data.Witherable would seem to be the closest Haskell description of this operation generally: https://hackage.haskell.org/package/witherable
mtaIdentifier :: MetaTestAssertion e e1 -> Identifier
mtaIdentifier (Single (TestAssertion i _)) = i
mtaIdentifier (Sequence _ i _)             = i

select :: TagSpec -> MetaTestAssertion e e1 -> MetaTestAssertion e e1
select _ s@(Single t)        = s
select _ s@(Sequence _ _ []) = s
select t (Sequence c i xs)   = Sequence c i $ filter (t . mtaIdentifier) xs

3.3 Test Data and Fixtures

RCucumber supports and explicitly encourages (Wynne and Hellesøy 2012) re-use of step definitions: the steps on lines 8, 12, and 13 all map to the same step definition. They are not, however, necessarily the same step, as their identifier may be altered via annotations such as tags, and other fixtures described in this section. These data-providing annotations are explicitly part of the identifier for a test, and thus the identifiers described so far also constitute a type of fixture, of which the descriptive name for a test forms part. An assertion with provided fixture data can thus be:

type AssertionWithFixture e e1 = Identifier -> e -> ResultEnv e e1

which can be initialized with the descriptive name and the rest of the fixture-constituting identity:

initialize :: AssertionWithFixture e e1 -> Description -> Identifier -> TestAssertion e e1
initialize a n f = TestAssertion i (a f)
    where i = addDescription f n

3.3.1 Parameterizable Step Definitions

The most fundamental mechanism for providing data to test assertions run by Cucumber is in steps that target parameterizable step definitions via regular expressions. The sample feature (in Section 3.1) has steps:

Then the display should show 1

and

Then the display should show 12

which both target the same step definition:

Then(/^the display should show (\d+)$/) do |number|
  expect( @calc.display ).to eql(number)
end

Although the matching and dispatch of the step definition occurs at run time, the definition of the step has occurred in the static feature description, so the data is compile-time, making it part of the fixture-encapsulating identifier.
  • 33. 28 In the model, step definitions that receive this data are then of type AssertionWithFixture, and are turned into TestAssertions by the code that performs the lookup and matching: match :: Regex -> AssertionWithFixture e e1 -> TestAssertion e e1 lookup :: (Monoid e1) => Regex -> [AssertionWithFixture e e1] -> TestAssertion e e1 lookup r as = first $ fmap (match r) as where first [] = notFoundAssertion first ms = head ms 3.3.2 Step Data Steps can also have structured data associated with them, as per line 18. This data can either be an array, a hash, or a block of multi-line text — the example on line 18 shows the array form. Associated step data is passed as the last argument to a step definition being run: When (”I key”) do |data_table| # ... do something with the data_table object end This fixture data is fixed at compile time (although the copy passed to the step definition is mutable inside the step definition only), and it’s the responsibility of the feature parser to ensure it is placed into the identifier. 3.3.3 Outlines Line 26 contains a Scenario Outline, which bears a strong resemblance to a normal scenario, only with arrow-bracket placeholders and a list of Examples at the end. The pipe-delimited table in the Examples section is parsed, and the scenario is then repeated for each data row in the table — the scenario shown becomes three scenarios, run sequentially, and with the placeholders replaced with the data in the appropriate column. The data provided becomes both part of the step identifier, and it’s name — it produces scenarios equivalent to having simply repeated the Scenario Outline three times with the data from each row. 3.3.4 Background Sections as Transformers The final mechanism to cover in terms of creating and organizing test data are Background scenarios, as per line 3. A single Background section is allowed per Feature, and it describes steps that must be run at the beginning of every scenario, similar to xUnit’s concept of a setup method.
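One way to make the "transformer" reading of a Background concrete is the following minimal sketch, which reduces steps to plain strings rather than the model's meta test-assertion types; applyBackground is an illustrative name, not something defined by RCucumber or by the model.

type Step     = String
type Scenario = [Step]

-- Prepend the background's steps to every scenario in the feature
applyBackground :: [Step] -> [Scenario] -> [Scenario]
applyBackground background = map (background ++)

main :: IO ()
main = mapM_ print $ applyBackground
  ["Given a fresh calculator", "Then the display should be blank"]
  [ ["When I press 1", "Then the display should show 1"]
  , ["When I press 1", "And I press 2", "Then the display should show 12"]
  ]

Each printed scenario begins with the two background steps, which is exactly the behavior the calculator feature in Section 3.1 relies upon.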
  • 34. 29 3.4 Reporting RCucumber’s output uses color very effectively to illustrate test progression and result statuses. However, it also has other formatters for a number of other formats, such as JSON. Figure 3.1: Cucumber Colorized Output An output format in RCucumber is a string, but in the model can be any type. To model the generation of the output, a Report wraps the current state of output and the ResultEnv: data Report t e e1 = Report t (ResultEnv e e1) Simple functor-like helpers can be implemented to make dealing with the enclosed ResultEnv easier: reportResultMap :: (ResultEnv e e1 -> ResultEnv e2 e3) -> Report t e e1 -> Report t e2 e3 reportResultMap g (Report t f) = Report t (g f) runReportAssertion :: (a -> ResultEnv e e1) -> Report t e a -> Report t e e1 runReportAssertion a = reportResultMap (>>= a) RCucumber’s reporting capabilities can be extended using a built-in formatting extension mechanism, built around an event stream4 . Developers can provide an instantiated object to Cucumber to use as the formatter, which should implement methods that receive information about the meta test-assertions currently being run. 4 https://github.com/cucumber/cucumber/wiki/Custom-Formatters
  • 35. 30 Consider a simplified5 selection of those run for the Scenario meta test-assertion: before_scenario tag_name scenario_name before_steps ... after_steps after_scenario The events split into two types — those run before and after evaluation of the feature. The further separation into individual events beyond that is simply a convenience mechanism: specific parts of a formatter can be extended from a default implementation — changing how tags are rendered, for example — without the subclass also needing to re-implement the other before events. But a more general model that simply had before and after events would be able to implement the events given if desired. An even more general model that offloaded meta test-assertion discrimination to the formatter itself would also work. The model generalizes this formatting extension concept even further to an Extension. Extensions receive a Start or End Event, a MetaTestAssertion, and an incoming Report. data Event = Start | End type Extension t e = Event -> MetaTestAssertion e e -> Report t e e -> Report t e e Extensions can mutate the report on ingress or egress. Sequences of extensions are composed, and applied to the Report: applyExtensions :: Event -> [Extension t e] -> MetaTestAssertion e e -> Report t e e -> Report t e e applyExtensions event extensions metaTA = combined where order Start = reverse extensions order End = extensions combined = foldr (.) id $ map (x -> x event metaTA) (order event) and the sequence of extensions passed to whichever function is coordinating the evaluation of meta test-assertions. A very simple extension that simply shows a meta test-assertion is being started or ending might be: simple :: Extension String e simple event mta (Report t e) = Report (t ++ ”[” ++ verb event ++ ”: ” ++ name mta ++ ”]” ) e where verb Start = ”Starting” verb End = ”Ending” Producing more complicated output and an implementation of a test runner that makes use of extensions is discussed in Appendix A. 5 before_scenario is in fact before_feature_element, and some extra tag-related events have been removed
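As an illustration of the wrapping behavior described above, here is a self-contained sketch of the extension idea with the types stripped right down: the report is a plain String transcript and the meta test-assertion is reduced to its name, so it shows the mechanism rather than the model's actual Report and MetaTestAssertion types.

data Event = Start | End

type Name       = String
type Transcript = String
type Extension  = Event -> Name -> Transcript -> Transcript

-- A variant of the "simple" formatter from the text: append a line noting
-- the event and the name of the meta test-assertion
simple :: Extension
simple event name t = t ++ "[" ++ verb event ++ ": " ++ name ++ "]\n"
  where
    verb Start = "Starting"
    verb End   = "Ending"

-- Compose a list of extensions; reversing on Start means the first
-- extension in the list acts as the outermost wrapper
applyExtensions :: Event -> [Extension] -> Name -> Transcript -> Transcript
applyExtensions event extensions name =
  foldr (.) id (map (\x -> x event name) (order event))
  where
    order Start = reverse extensions
    order End   = extensions

main :: IO ()
main = do
  let name   = "Scenario: Basic Addition"
      before = applyExtensions Start [simple] name ""
      after  = applyExtensions End   [simple] name before
  putStr after
  -- [Starting: Scenario: Basic Addition]
  -- [Ending: Scenario: Basic Addition]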
  • 36. 31 3.5 Implementation Details 3.5.1 Integrating with Test Assertion Providers By default, step definitions in RCucumber are written to use RSpec. RSpec provides a rich set of exception-raising test assertions. RCucumber makes no attempt to distinguish between exceptions raised by test assertions, and those raised more generally. The step definition evaluation code is approximately:6 def execute(*args) # Run the step definition code with any arguments @block.call(*args) # Instantiate a new Core::Test::Result::Passed object passed # Catch any exceptions that occurred in this scope rescue Exception => exception # Instantiate a new Core::Test::Result::Failed object failed(exception) end All Ruby testing libraries (covered) create exceptional test assertions without any com- mon basis, which would make it very hard without a per-library adapter to inspect and meaningfully reason about those exceptions. At the same time, the approach of conflating exceptions raised by test assertions into the same category as all exceptions allows for a great deal of flexibility — there is no restriction on which testing libraries you can use with RCucumber, as long as the test assertions are exceptional. 3.5.2 World The recommended approach for integrating with non-RSpec test assertion providers is to consume their methods into an instance of World.78 World is a blank class, an instance of which is instantiated before every scenario. The step definition is then executed as if it were a method of the World class, using Ruby’s built-in method on the base class instance_exec. Tests that create test data at runtime are able to assign these to instance variables of the scenario’s encapsulating World class, and thus share data between different steps in a scenario. 6 Comments have been added. See https://goo.gl/jHjadr for the actual code on GitHub at time of writing 7 https://github.com/cucumber/cucumber/wiki/Using-MiniTest 8 https://github.com/cucumber/cucumber/wiki/Using-Test::Unit
  • 37. 32 3.5.3 Running the Test Suite RCucumber ships with an executable, cucumber, which by default will search for a features directory with a step_definitions sub-folder, and run the .feature files con- tained inside. The default output is highly colorized (as per the figure in Section 3.1), and provides a very user-friendly output format, which is the recommended way of running the test suite during development9 . For developers using Rake — “a make-like build utility for Ruby”10 — to test and build their projects, an integration is provided. This allows for easy running of the RCucumber- based tests along side tests written with other tools. 9 https://github.com/cucumber/cucumber/wiki/Using-Rake 10 https://github.com/ruby/rake
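Before turning to the Perl implementation, the meta test-assertion hierarchy described in this chapter (step, scenario, feature, and feature directory) can be written down in the same style as the xUnit example that closed Chapter 2. The sketch below is a simplified, self-contained rendering in which steps are reduced to named leaves and the step-level configuration is elided, so it illustrates the shape of the hierarchy rather than reproducing the prelude's definitions.

type Identifier = String

data AssertionType = Exceptional | Reportable | Catchable deriving Show
data EmptyBehavior = Succeeds | Fails                     deriving Show

data Configuration = Configuration
  { assertionType :: AssertionType
  , emptyBehavior :: EmptyBehavior
  } deriving Show

data MetaTestAssertion
  = Single Identifier          -- a step, with its definition's assertions elided
  | Sequence Configuration Identifier [MetaTestAssertion]
  deriving Show

scenarioC, featureC, directoryC :: Configuration
scenarioC  = Configuration Exceptional Succeeds
featureC   = Configuration Catchable   Succeeds
directoryC = Configuration Catchable   Succeeds

calculator :: MetaTestAssertion
calculator =
  Sequence directoryC "features"
    [ Sequence featureC "Basic Functionality"
        [ Sequence scenarioC "First Key Press on the Display"
            [ Single "When I press 1"
            , Single "Then the display should show 1"
            ]
        , Sequence scenarioC "Basic Addition" []   -- empty, so it passes
        ]
    ]

main :: IO ()
main = print calculator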
  • 38. 4 Implementing Perl’s Test::BDD::Cucumber The previous chapter examined RCucumber through the lens of the test-assertion model, and adding to the model as new concepts were encountered. RCucumber’s method for integrating into Ruby’s testing libraries and more generally into Ruby projects was shown, the meta test-assertions of steps, scenarios, and features were detailed, creation and organization of test data and fixtures were explained in terms of the model, and collation and reporting of results was described. This chapter looks at the implementation details of the Perl implementation of Cucumber, Test::BDD::Cucumber, and describes the reasoning in developing it, as well as relating it back to the behavior of RCucumber and the test-assertion model. 4.1 An Exculpatory Note on the Code Ahead Test::BDD::Cucumber is a relatively large code-base, with many parts written in a style that Perl developers would describe as “high magic” — code that makes use of some of Perl’s more powerful (and unusual) features. It exists primarily as a tool for professional developers to use in their day-to-day work, and as such many aspects of the code-base have been written with practicality as their primary concern. It also represents a work that has changed over time, and vestigial stubs of previously implemented behaviors can still be observed. The code included in this chapter consists of snippets of simplified, and re-ordered code, and an effort has been made to it so as to try and make it as accessible as possible to non-Perl programmers, while still illustrating the most important concepts. The entire code-base in its original form is available on GitHub1 and on the CPAN2 . 4.2 Step Definitions with Test::Builder 4.2.1 Why Integrate with Test::Builder? As with virtually every Perl testing library, Test::BDD::Cucumber builds on top of Test::Builder. Test::Builder provides a singleton to allow testing libraries a unified in- terface to reporting the results of individual test assertions, creating meta test-assertions, and a standard test harness interface which outputs TAP by default. A developer experienced writing tests with Perl will probably have a good feel for how Test::Builder-based tests will work and some favorite associated testing libraries that 1 https://github.com/pjlsergeant/test-bdd-cucumber-perl 2 https://metacpan.org/pod/Test::BDD::Cucumber 33
they like using — for example, Test::WWW::Mechanize for tests which interact with web servers. To leverage both developers' existing experience and knowledge, and also the very wide range of testing libraries available on CPAN, integrating as tightly as practicable with Test::Builder was desirable.

However, Cucumber also has its own conventions to be followed: an existing set of meta test-assertions, syntax-highlighted output for developers to quickly see the status of their test implementations, and a JSON output understood by several tools. Choices needed to be made concerning how much to compromise between providing a familiar and easily integrated environment for Perl developers and how closely to hew to the original RCucumber.

4.2.2 A Meta Test-Assertion for Step Definitions

Subtests

Test::Builder has a generic subtest meta test-assertion which supports arbitrarily deep trees of meta test-assertions. Subtests are introduced by passing a code-reference to Test::Builder->subtest:

$Test->subtest( "Parent", sub {
    ok( 1, "Passing assertion" )
} );

which by itself will produce:

    # Subtest: Parent
    ok 1 - Passing assertion
    1..1
ok 1 - Parent
1..1

This may appear initially to be a good candidate on which to have built the Cucumber step-definition meta test-assertion. However, subtests have a fixed configuration, which matches that of a test script written in Perl containing a list of test assertions:

subtest = Sequence Configuration {
    assertionType = Reportable,
    emptyBehavior = Fails }

This differs from Cucumber's step definitions by treating empty blocks as failures, and by the enclosed test assertions being reportable and not exceptional. Empty blocks as failures is simply not a compatible concept with the Cucumber model. Code in step definitions is used to set up fixtures and test data with no requirement for test assertions to be run. Some way of changing this behavior was required.

Test::Builder also outputs the result of reportable test assertions to STDOUT — as well as any diagnostics — as it encounters them, by default. Should a colorized output similar to RCucumber's be required, this needs instead to be captured and passed to whatever code is making formatting and reporting decisions. Subtests provide no inbuilt mechanism for doing this.

Finally, subtests don't provide any mechanism for either catching exceptions, or for converting captured exceptions into failures. Subtests then were considered not to
  • 40. 35 be the solution without significant changes to their behavior in the development of Test::BDD::Cucumber. An Entirely New Testing Context The Test::Builder documentation says: you only run one test per program [which] shares such global information as the test counter and where test output is going This itself is suggestive that the Test::Builder singleton itself is a meta test-assertion. It has predefined behavior for no tests having been run (failure), and operates over test assertions implemented using its test methods. Running each step definition against its own Test::Builder ‘singleton’ seemed to have some benefits. “Where test output is going” is a function of the Test::Builder object, and thus configurable. Test::Builder can be put in a passing state even when empty by preceding any other test assertions with a call to its pass() method. The Test::Builder singleton stores information about how many tests passed, failed, or were skipped, and so the result of a single step definition can be queried can reported on programmatically. This didn’t solve the issue with test assertions being reportable and not exceptional. But that didn’t appear to be a key feature of Cucumber, and so the trade-off was made to keep that in the Perl style of being reportable — this also meant that no extra magic was required to change the behavior of Test::Builder-based libraries, which helped keep the implementation simple. All that was required was finding a way to spoof the Test::Builder singleton to a clean instance before each step definition was run. Spoofing the Test::Builder singleton Creating a new clean Test::Builder object to use as the singleton was simple — Test::Builder provides a method for just this: 1 my $proxy_object = Test::Builder->create(); Test::Builder also provides reports in TAPs of test assertion executions. By default these are written to STDOUT, but can be intercepted: 2 my $output; 3 $proxy_object->output( $output ); # Equivalent to ‘Report‘ 4 $proxy_object->failure_output( $output ); # Equivalent to ‘Diagnostics‘ This Test::Builder instance also needs to pass by default, so a test assertion that’s guar- anteed to pass is run first: 5 $proxy_object->ok( 1, 6 ”Starting to execute step: ” . $step_text ); Well-implemented libraries that integrate with Test::Builder will retrieve the single- ton when needed by calling Test::Builder->new — a subroutine called new in the Test::Builder name-space. Perl provides a keyword local that essentially performs variable substitution inside a lexical scope, and all child lexical scopes.
  • 41. 36 By pointing the Test::Builder::new3 at a subroutine that instead returns the proxy object, the aim of diverting calls to the Test::Builder singleton to the one used to collect results from the step definition was achieved: 7 local *Test::Builder::new = sub { return $proxy_object }; The step definition could now be run: 8 # ‘eval‘ will catch any exceptions, and place their value in $@ 9 eval { $step_definition->($context) }; and any exceptions caught turned into failures against the Test::Builder object with a diagnostic description: 10 if ($@) { 11 # Fail a test assertion called ”Test compiled” 12 $proxy_object->ok( 0, ”Test compiled” ); 13 14 # Add the exception details as diagnostics 15 $proxy_object->diag($@); 16 } The result of the step definition can then be examined and encapsulated in order to pass them on to the scenario meta test-assertion: 17 my $test_builder_status = $proxy_object->history; 18 my $cucumber_step_status = 19 $test_builder_status->test_was_successful ? 20 ( $history->todo_count ? ’pending’ : ’passing’ ) : 21 ’failing’; 22 23 $result = Test::BDD::Cucumber::Model::Result->new({ 24 result => $cucumber_step_status, 25 output => $output 26 }); 4.3 Handling Results 4.3.1 What’s Needed As foreshadowed by the appearance of a Result Model class in the preceding code section, passing the results of steps to enclosing meta test-assertions required some considera- tion. For every meta test-assertion considered in this dissertation (and in the model: see Chapter 3), the failure of a constituent test assertion or meta test-assertion causes the meta test-assertion to be marked as a failure. However, the status of preceding meta test-assertions can also determine whether or not a meta test-assertion gets evaluated in the first place. Test::BDD::Cucumber needed a way of emulating the foldable results from meta test- assertions described in the model, and allowing results from one meta test-assertion to affect whether or not other meta test-assertions are run at all. 3 The -> is a Perl-ism that passes the name-space to the left of it into the function (in that name-space) on the right of it, and is one of the quirks of Perl’s “bolted-on” object orientation implementation
  • 42. 37 4.3.2 Foldable Results In the model, a Result is a set of Pass and Fail Diagnostics. Assuming that Diagnostics describes a set with both an identity element and the ability to append items from that set, then Result result can also be a set with an identity element Pass (defResult) and an appending function (addResult). More simply, assuming Diagnostics is a monoid, then so is Result (these definitions are covered in Section 2.3.2). Test::BDD::Cucumber::Model::Result attempts to model Result in Perl. It begins with a list of possible states that a result can hold: enum ’StepStatus’, [qw( passing failing pending undefined )]; This contains two new statuses not seen yet: pending (equivalent to skipped) and undefined (equivalent to TODO). These are types of failure with extra meta-information added to them, which a sufficiently advanced Diagnostics data type could handle. Diagnostics themselves are collated in an output string attribute. The class contains a constructor to allow it to be instantiated from a sequence of existing result objects: 1 sub from_children { 2 my $class = shift; # Perl OO boilerplate 3 my ( @children ) = @_; # The results from which to build this one A list of statuses is built up in %results, which starts with a default passing state, and a string in which to concatenate the diagnostics is declared: 4 my %results = ( passing => 1 ); 5 my $output; The result of each child result is added to %results, and any diagnostics it contains are appended: 6 for my $child (@children) { 7 $results{ $child->result }++; 8 $output .= $child->output . ”n”; 9 } 10 $output .= ”n”; Finally, the result types are evaluated in terms of precedence — the presence of a result of that type causes the new class to be instantiated with that result: 11 for my $status (qw( failing undefined pending passing )) { 12 if ( $results{$status} ) { 13 return $class->new( 14 { 15 result => $status, 16 output => $output 17 } 18 ); 19 } 20 }
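For comparison, this is what the folding looks like on the model side, in a self-contained sketch that fixes Diagnostics to [String] (the dissertation leaves the type abstract): the parent's result is simply mconcat of its children's results. Note that from_children additionally distinguishes the pending and undefined flavours of failure, which the two-constructor Result shown here does not.

type Diagnostics = [String]

data Result = Pass | Fail Diagnostics deriving Show

instance Semigroup Result where
  Fail d <> Fail d' = Fail (d <> d')   -- failures accumulate diagnostics
  Fail d <> _       = Fail d
  _      <> x       = x

instance Monoid Result where
  mempty = Pass                        -- Pass is the identity element

main :: IO ()
main = do
  print (mconcat [Pass, Pass, Pass])
  -- Pass
  print (mconcat [Pass, Fail ["step 2 failed"], Pass, Fail ["step 4 failed"]])
  -- Fail ["step 2 failed","step 4 failed"]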
4.3.3 Control Flow

The results of test assertions and meta test-assertions don't simply affect the overall result; they also affect whether or not other test assertions and meta test-assertions are run. Given a result with an environment Result e e1 and a new assertion which also produces a result (e -> Result e e1), whether or not to continue depends on whether the environment encapsulated in that result is in a normal state (Right e1) or an exceptional state (Left e). The monad that implements this has already been covered in Section 2.3.2.

Feature directories and features both share the same configuration for sequentially evaluating meta test-assertions:

c = Configuration {
    assertionType = Catchable,
    emptyBehavior = Succeeds }

and this configuration is used to ensure that the result of evaluating a feature or scenario can't leave the result in an exceptional state (Left e) via transmute (see: Section 2.3.2). As a result, Test::BDD::Cucumber doesn't need to implement any special control flow logic for these, simply the foldable result model already covered.

However, step definition failures are treated as exceptional, so in running a scenario, step definitions occurring after a failing one must not be run. A real Perl exception would be tricky to handle, as reporting events should still be generated; it is simply that the assertion should not be run. While running a scenario, Test::BDD::Cucumber keeps track of whether the scenario has failed using a boolean attribute on the scenario runtime called short_circuit. This is set having examined the result:

# If it didn't pass, short-circuit the rest
unless ( $result->result eq 'passing' ) {
    $outline_stash->{'short_circuit'} = 1;
}

and is used to set the result of a step definition without running it in the step dispatch code:

# Short-circuit rather than running if needs be
return $self->skip_step( $context, 'pending',
    "Short-circuited from previous tests", 0 )
    if $short_circuit;

4.4 Data Provision, Fixtures, and the Environment

Cucumber uses a World object to allow step definitions in a scenario to share data and certain fixtures. "Compile-time" fixture data previously described as residing in the identifier is passed in as arguments to that step definition. No method for introspecting the step definition itself (or any of the enclosing meta test-assertion hierarchy) is provided.
  • 44. 39 4.4.1 Test::BDD::Cucumber::StepContext Test::BDD::Cucumber takes a slightly different approach: “data made available to step definitions”(Wynne and Hellesøy 2012) is provided via a Test::BDD::Cucumber::StepContext object, passed as the first argument to a step definition. The step context firstly contains references to allow for introspection. A link to the ob- ject that defines the step definition is available via step(), to the object that defined the enclosing scenario as scenario(), and to the object that defined the feature as feature() — this also allows access to the tags defined for the scenario and feature. The conjunc- tion used to declare the step (Given/When/Then) is available via the misleadingly named verb(), and the rest of the step line as text(). Access to identity information is available via data() for data tables, and matches for data that was extracted from the step text as part of the step-definition lookup process. 4.4.2 The Stash The context also provides access to a hash via stash(). This hash is meant to serve a similar purpose to Cucumber’s World object — a place to store data created during the test run. The stash itself is a hash containing two hashes — feature and scenario: $stash = { feature => {}, scenario => {}, }; The scenario is reset at the start of every scenario, making it appropriate for storing data created during a scenario run, where feature is reset at the beginning of every new feature — should a feature require data to be persisted between scenarios (such as a computationally expensive fixture) it can be stored here. 4.4.3 Easy Access To prevent a developer from needing to write the relatively verbose: sub { # Read the context from the first argument my $context = $_[0]; my $stash = $context->stash; my $value = $stash->{’scenario’}->{’foo’}; to access stash variables, step definitions are able to access two globally defined methods in the Test::BDD::Cucumber::StepFile class: S and C. Before the step definition is executed, the definitions of these methods are changed to provide access to the stash and the step context respectively: local *Test::BDD::Cucumber::StepFile::S = sub { return $context->stash->{’scenario’};
  • 45. 40 }; local *Test::BDD::Cucumber::StepFile::C = sub { return $context; }; allowing for a far more concise access method: sub { my $value = S->{’scenario’}->{’foo’}; Finally, Perl developers are used to accessing regular expression matches using the Perl special variables $1 to $n. Rather than access matches via: my $first_match = C->matches->[0] the regular expression is re-matched again against the step text immediately before the step definition is executed so that a more natural style of: my $first_match = $1; can be used. 4.5 Output Harnesses Test::BDD::Cucumber calls its formatting mechanisms harnesses, on the basis that the output will be consumed by a testing harness. An abstract implementation is provided which names methods for each meta test-assertion (feature, feature_done, etc). While the model provides a very generic formatter, Test::BDD::Cucumber takes a similar approach to RCucumber itself by naming each of the formatting events separately. Again, this allows a developer extending a formatter to not need to re-implement formatting for every event simply to re-implement formatting for one. Additionally, where the model has a single data type to model the meta test-identifier and associated fixture data, Test::BDD::Cucumber provides slightly different objects to model each. Test::BDD::Cucumber::Model::Feature for example contains information about the document that contains the feature, but no place to hold step data, and Test::BDD::Cucumber::Model::Step is unable to hold tag-related data as steps don’t support tags in the Cucumber model. 4.5.1 Test::BDD::Cucumber::Harness::TestBuilder Both the most interesting and most important harness is the one for exercising a Test::Builder singleton. Beyond executing the step definitions, the Test::Builder harness is the only part of the code-base that knows about Test::Builder — the code between the step definitions and this harness is completely agnostic about Test::Builder. There’s no linkage at all between the step definitions’ usage of the class to evaluate the status of a step definition and the use of it to communicate with the outside world: As explored in Section 4.2.2, a new instance of Test::Builder is instantiated for every step definition, its state after execution of the step definition examined, and then it’s simply discarded. The harness communicates with the “real” Test::Builder singleton — if there is one — to communicate its results.
  • 46. 41 The description of TAP as outputting solely to STDOUT (see: Section 2.1.1) was a sim- plification. It supports diagnostics both via note(), which outputs #-prefixed diagnostics to STDOUT, and diag(), which outputs #-prefixed diagnostics to STDERR. note() is intended for diagnostics information useful for a developer performing debugging, where diag() is intended for diagnostic information relating to unexpected or undesired test execution — for example further information on test failure. prove, Test::Builder’s test runner, outputs only the latter by default, but will output both in verbose mode. Scenario and feature names are recorded using note() — #-prefixed diagnostics to STD- OUT — and so their names are only printed in verbose mode: # Scenario: MD5 longer data A passing step is marked as such to Test::Builder using its pass() method: pass($step_name);. This causes Test::Builder to record the test assertion has passed. Descriptions of pass- ing tests are also suppressed by prove unless in verbose mode — in verbose mode, Test::BDD::Cucumber presents them with their step text: ok 142 - Given a usable ”Digest” class ok 143 - Given a Digest MD5 object ok 144 - When I’ve added ”foo bar baz” to the object A failing step is recorded as such using the fail() method, fail($step_name);. Failing test assertions are always output by prove, along with the location of the assertion: not ok 145 - Then the hex output is ”75ad9f578e43b863590fae52d5d19ce6” # Failed test ’ Then the hex output is ”75ad9f578e43b863590fae52d5d19ce6”’ # at TestBuilder.pm line 87. # in step at examples/tagged-digest/features/basic.feature line 39. However, as the step was evaluated using the per-step Test::Builder object, their out- put is also available, and this is then marked as a diagnostic to Test::Builder: diag( $result->output );. This causes its transcript to always be output by prove: This transcript starts with the always-passing test assertion that allows for correct be- havior on step definitions without assertions: # ok 1 - Starting to execute step: the hex output is ”75ad9f578e43b863590fae52d5d19ce6” before continuing to show the failing test assertion, along with its own failure diagnostics: # not ok 2 # # # Failed test at digest/features/step_definitions/basic_steps.pl line 34. # # got: ’75ad9f578e43b863590fae52d5d19ce6ZZZ’ # # expected: ’75ad9f578e43b863590fae52d5d19ce6’ # 1..2 The TAP output of the step definition’s Test::Builder object is thus folded into the output of the “real” Test::Builder object. This folding allows the whole test suite written with Cucumber to be executed and evaluated by prove or other TAP-compatible test runners, and recapturing the benefits used by communicating test status with TAP (see: Section 2.1.1), such as easy conversion to jUnit XML format.
5 Reflection

5.1 Comparing Perl, Python and Ruby

Discounting the time spent in previous development of Test::BDD::Cucumber, the amount of time spent on examining and understanding the testing ecosystems of each of the three languages dwarfed time spent on other parts of the dissertation — good existing comparisons seemed not to exist, and searching for them now brings up early drafts of the first chapter released as blog posts as the top result.

Anecdotally, making this more difficult was that the majority of developers for any given language apparently don't understand the inner workings of their testing ecosystems. Turning to former colleagues and friends with considerable experience and insight into their respective languages of choice often revealed an attitude of simply fitting together their test suites from tools without consideration of how they worked.

Having spent many years working almost exclusively with Perl, the apparent lack of reporting test assertions in Python and Ruby was surprising, but made more sense with further understanding of the xUnit model of treating a small named block around test assertions as being the most granular reportable unit of tests. Experience suggests that a weakness in many Perl test suites encountered is that they are written purely sequentially, as long meandering lists of unstructured assertions. The extra discipline of wrapping these into test methods, necessitated by exception-raising test assertions, would seem attractive, and experimenting with implementing exceptional test assertions in Perl would be an interesting follow-up experiment.

Perl's unification around TAP as a reporting method still appears very useful. The ability to express how many tests one expects to be run will help to catch testing code that isn't run, and would help make up for a lack of a positive assertion path in libraries with exceptional test assertions.

No other work on generalized hierarchies of test assertions being folded into multiple levels of meta test-assertions was discovered while writing this dissertation. The ability to map the test assertion groupings provided by many different testing approaches to configurations describing the expected behavior of their included assertions seems useful, and it would be very interesting to perform an exhaustive search of the literature to find other descriptions of this concept. Inexperience with research may well have been a factor.

Finding consistent names for testing concepts also proved challenging. The starting assumption was that simple, concise, and unambiguous descriptions of "test assertions" and "test harnesses" would be forthcoming. It was surprising to find that the usage of these terms appears to have grown organically with no formalization.
Hamill (Hamill 2004), for example, uses the term "Test Asserts" exactly twice, the fuller of the two descriptions being:

    Test conditions are checked with test assert methods. Test asserts result in test success or failure. If the assert fails, the test method returns immediately.

and "test harness" is mentioned a single time, with the term remaining undefined. The Cucumber Book (Wynne and Hellesøy 2012) simply refers to "assertions" without definition. Several hours spent searching for papers that define the terms well, and using Google Books to find solid definitions in textbooks, were also unproductive, an outcome that the obscure references on Wikipedia for the terms might have predicted. Again, inexperience with research, rather than a lack of literature, may have been the critical factor. That said, a citeable reference for how the terms came about and how they're used in various ways would be very useful.

The more general "Historical Perspective on Runtime Assertion Checking in Software Development" (Clarke and Rosenblum 2006) was both fascinating and highly informative, and writing a research piece in the same vein looking at modern test terminology and the different concepts that various terms refer to would be an interesting avenue for further work on this topic.

5.2 The Model

5.2.1 The Choice of Haskell

The choice to develop the model in Haskell rather than a formal specification language like Z-Notation was made on the basis of comparative familiarity with Haskell over formal specification languages. However, its utility and comparative flexibility have exceeded expectations, and also what would have been possible with Z-Notation. Indeed, it is used to build a very simple Haskell testing library in Appendix A, whose complexity has been limited primarily by the time available and perhaps by a lack of experience developing in Haskell, rather than by any perceived inherent limitation of the model.

A prelude exists for the model, which isn't included here. It defines types whose definitions were intentionally left ambiguous, for fear of being tied to any particular design choice. For example, the identifier type in this prelude exists as:

type Identifier = ((),String,())

which has all the monadic properties of a simple String type, but cannot be used directly as one without wrapping functions to access the simple test name inside it:

name (Single (TestAssertion ((),i,()) _)) = i
name (Sequence _ ((),i,()) _) = i

This property stopped it from being used, or being treated in the model, as simply a name or a literal string, and thus retained the ability to later describe it as a combination of a name and compile-time fixtures. As a result the model, despite being usable as the basis for actual software, could perhaps still be translated to Z-Notation without needing to pull in definitions of many more general Haskell data types.
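For illustration, the wrapping functions mentioned above could look something like the following sketch. makeID is the constructor name used later in Appendix A, and this definition is an assumption consistent with the Identifier type shown rather than the prelude's actual code; idName is a hypothetical accessor introduced only for this example:

-- Construct an Identifier from a plain test name; the unit fields
-- stand in for the compile-time fixtures a richer design could carry.
makeID :: String -> Identifier
makeID s = ((), s, ())

-- Recover the plain name, for display purposes only.
idName :: Identifier -> String
idName (_, n, _) = n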
5.2.2 Generating Formatted Reports

5.2.3 The Extension Model

The extension model shown in Section 3.4 is significantly more powerful than that of Test::BDD::Cucumber. Test::BDD::Cucumber currently has two separate event models: one specifically for output harnesses that produce formatted transcripts, and a more general one for events that mutate data (setting up fixtures, for example).

The initial version of the Haskell model was written in the same style, with a distinct formatter type which could only affect the Report transcript. While attempting to utilize that type, it became clear that a more general model was available.

The extension system implementation allows easy re-dispatch from a formatter to a second formatter. The TestBuilder type described in Section A.3 and exercised via the extension model could easily be sequenced in front of another formatter that wished to use the data model but output, for example, TAP embedded in syntax-highlighted HTML.

Although the transmute function seems sufficiently fundamental to sequencing tests that it is hard-coded into the test runner, it too could easily be recreated as an extension. This provides a useful mechanism for anyone using the model to build a testing library who wishes to embed their own test sequencing logic. The ability to wrap a core software function with an ingress and egress filter would seem to be an example of the middleware pattern [1].

[1] https://en.wikipedia.org/wiki/Middleware

5.2.4 Further Work

Meta test-assertion configurations have been built to cover those provided by three real testing libraries: xUnit style, Test::Builder style, and Cucumber style. The flexibility of the configuration approach may allow this to be true for most commonly used assertion-based test libraries, although attempting to write a more complicated and complete testing library in Haskell using the model might well reveal weaknesses in the model to be addressed.

A more thorough understanding of monads and monad transformers might have led to a slightly different model being produced. Quite how the test assertions described could be extended or embedded to allow access to IO and the use of some existing monads (such as the State monad, a contender for modeling the environment) is unclear to this author, but presumably quite straightforward given sufficient time to understand them.

Overall, Haskell appears to have been an excellent choice for the model, especially as implementations proving the model worked were achievable, rather than resorting to only the type checking that Z-Notation could have provided.

5.3 Test::BDD::Cucumber

5.3.1 A Brief History

Test::BDD::Cucumber was originally written not for any business need, but because attempting to integrate Cucumber-style testing with the Test::Builder model seemed like an interesting challenge.
Having written an original proof of concept in early July 2011, further development seemed rewarding, and a first release was made in September 2011.

Development of the core, and the majority of the further work, has been performed by the author. Release rights were briefly shared with another developer who applied some outstanding patches and released a version to the CPAN. Major contributions from other developers include the localization system for allowing features to be written in non-English languages, an extension system for allowing hooks to be run on certain processing events, and a JSON output formatter.

Test::BDD::Cucumber has a variety of users. Seagate's Portsmouth office uses it to organize quality-control tests on hard drives; the anonymous operating system Tails [2] uses it for some testing, and pushed for changes so that it could be distributed in Debian's package system; and LedgerSMB, a large and complicated open-source ERP system, uses features both to serve as documentation of their legacy system and to organize Selenium tests. It is included in the packaging systems of many Unix-based systems, and it has been the subject of a Linux Magazine article [3].

[2] https://tails.boum.org/
[3] http://www.linux-magazine.com/Issues/2014/161/Perl-Cucumber

5.3.2 Further Work

Planned Parser Improvements

Test::BDD::Cucumber still uses a simple and inelegant parser implemented with regular expressions, which has been developed organically over time. While it can parse all the feature files in the RCucumber testing suite, it will also parse several incorrect constructs in feature files, which has led, for example, to its own test suite containing invalid Gherkin.

The wider family of Cucumber implementations maintains its own grammar for a parser generator (also developed by the Cucumber team) called Berp, which outputs Gherkin parsers written in a variety of languages. I have completed initial work (which has subsequently been accepted by the Cucumber team) to automatically generate a Perl-based parser, and this Gherkin parser has also been released to the CPAN [4].

[4] https://metacpan.org/release/Gherkin

Further work is required to unify the AST output by the Perl Gherkin parser with the data model used by Test::BDD::Cucumber, and thought is needed on migrating users who have written features compatible with Test::BDD::Cucumber but disallowed by the official parser. However, the benefits of using the official parsing grammar, making the project as close to RCucumber as possible for users coming to Test::BDD::Cucumber from Cucumber implementations in other languages, may be real enough that this work is planned.

Unification of Formatters and Extensions

While writing this dissertation and discovering that formatters are simply a type of extension, the desire to unify Test::BDD::Cucumber's formatting and extension systems has developed.

Discussion with the contributor of the existing extension system at a recent technical conference has shown that he too is very keen on this unification, and that it would solve
business needs he has, so this work is likely to be prioritized as a joint development effort.

5.3.3 Reflections on the Development Process

The original development occurred both as an individual effort and without any reference to existing Cucumber implementations. Early versions often worked significantly differently from RCucumber, but the behavior has been changed to be closer to RCucumber over time.

There may have been benefits to this approach. The use of the stash object instead of a World object feels closer to an idiomatic Perl approach, as does the provision of context to a step via localized methods. Implementing these in the RCucumber way may have produced a solution that felt less like Perl. Realizing earlier that Ruby treats the assertions inside steps as exceptional rather than reporting might have led to an effort to reproduce this in the Perl version, which would probably have resulted in an inelegant and un-Perlish step definition implementation.

On the other hand, breaking changes have been introduced into subsequent Test::BDD::Cucumber releases as a result of having been at first insufficiently familiar with RCucumber. Background sections were originally understood to be per-Feature setup rather than per-Scenario, and changing this behavior broke tests for users of the library.

5.4 Summary of Work Complete

Test::BDD::Cucumber is a significant piece of software engineering work, and was developed and extended during enrollment in the MSc course, often with reference to principles taught on the course. However, it was not specifically developed for this dissertation.

The first chapter required researching and understanding the testing ecosystems of two unfamiliar languages, Ruby and Python. These ecosystems were unfamiliar both in their realized states and in the general philosophy of exceptional test assertions. The development of the meta test-assertion paradigm and accompanying Haskell model was performed solely for this dissertation.

The model was then expanded during research of the also largely unfamiliar RCucumber code-base. Concepts such as the World object, Ruby's use of Rake as a build tool, and the very mechanism by which RCucumber provides data to step definitions were unknown, and had to be researched in order to write the appropriate chapter. The model as presented was built (and rebuilt many times) from scratch while attempting to understand a generalized concept of both test suites and Cucumber's workings.

The understanding that Test::BDD::Cucumber's formatters are specific applications of the extension model, and constitute middleware, arose during the writing of this dissertation, and led to useful discussion with other developers on Test::BDD::Cucumber that will hopefully lead to a rich seam of improvements for the code-base. The creation of a Perl back-end to Cucumber's Berp parser occurred during the two-year period set aside for this dissertation.

Finally, the Haskell model also serves as a potential basis for testing library implementations in the future. An example is given in the immediately following Appendix.
A A Simple Haskell Testing Library

The model created for this dissertation went somewhat further than expected in terms of functionality, and in testing the model, a nascent Haskell testing library was built. This appendix shows how the model can be used to build a simple TAP-outputting testing library in Haskell.

A.1 Completing the Model

A.1.1 A Monadic ResultEnv

The instance of ResultEnv e as a monad has already been shown in Section 2.3.2. However, this relies on ResultEnv e having been declared an instance of Functor and Applicative.

ResultEnv e e1 encapsulates a result with either an exceptional (e) or a normal (e1) environment, and this environment can have functions conditionally applied to it (if it is a normal value) via fmap if ResultEnv e is a Functor:

instance Functor (ResultEnv e) where
    fmap g (ResultEnv r (Left s)) = ResultEnv r (Left s)
    fmap g (ResultEnv r (Right s)) = ResultEnv r (Right (g s))

Declaring it as an instance of Applicative allows encapsulated normal values to be passed into a function that accepts several arguments:

instance Applicative (ResultEnv e) where
    pure e = ResultEnv Pass (Right e)
    ResultEnv r (Left x) <*> _ = ResultEnv r (Left x)
    ResultEnv r (Right x) <*> ResultEnv r' o = ResultEnv r' (fmap x o)

The monadic definition allows predicates to be combined using >>=, for example:

result = startResultEnv >>= predicate1 >>= predicate2

A.1.2 The Test Harness

A meta test-assertion needs to be evaluated as a whole, and as a result of any enclosed meta test-assertions, with the result wrapped in extensions, starting from an initial Report t e. This is done using run:

run :: (Monoid e) => [Extension t e] -> MetaTestAssertion e e -> Report t e e -> Report t e e
run extensions metaTA report =
    -- Pass the incoming Report through the extensions on the way in,
    -- run the assertions to get a new Report, and pass that report
    -- back through the extensions on the way out
    extend End $ assertionResult metaTA $ extend Start report
  where
    extend event = applyExtensions event extensions metaTA

    -- A single assertion can be run on the enclosed ResultEnv
    assertionResult (Single (TestAssertion _ assertion)) =
        -- Defined in Section 3.4
        runReportAssertion assertion

    -- An empty list of assertions must use the appropriate
    -- meta test-assertion defined behavior
    assertionResult (Sequence c _ []) =
        run extensions
            (Single (TestAssertion (makeID "Empty Assertion")
                                   -- Defined in Section 2.3.2
                                   (emptyAssertion (emptyBehavior c))))

    -- A list of assertions are combined and their results
    -- placed into a list that is then summarized
    assertionResult (Sequence c _ xs) = summarize . combine c extensions xs

    -- Run assertions sequentially, saving each result
    combine c extensions [] start = [start]
    combine c extensions (x:xs) start = runit : combine c extensions xs runit
      where
        runit = transmute' $ run extensions x start
        -- reportResultMap is defined in Section 3.4
        transmute' = reportResultMap (transmute (assertionType c))

-- Concatenate the enclosed results using the monoidal definition
-- for results
summarize :: (Monoid e) => [Report t e e] -> Report t e e
summarize xs = Report transcript (ResultEnv passFail e)
  where
    passFail = mconcat $ map (\(Report _ (ResultEnv i _)) -> i) xs
    (Report transcript (ResultEnv _ e)) = last xs

A.2 Adding Assertions

A simple non-mutating assertion for equality makes use of the Applicative instance for ResultEnv, by creating a ResultEnv e Bool using <*>, and then using that ResultEnv to return a copy of the original ResultEnv with the pass/fail status potentially changed:

assertEqual :: (Eq e, Show e) => e -> ResultEnv e e -> ResultEnv e e
assertEqual e r@(ResultEnv _ i) = resultFrom $ (==) <$> pure e <*> r
  where
    -- A passing assertion is the identity element for ResultEnv
    resultFrom (ResultEnv _ (Right True)) = r
    -- A failure doesn't affect the environment, but changes
    -- the enclosed Result value
    resultFrom (ResultEnv _ (Right False)) =
        ResultEnv (Fail ("/= " ++ show e)) i

This can be exercised by creating a starting environment using pure:

environment :: ResultEnv Integer Integer
environment = pure 6

a predicate to see if the current environment is 5:

isFive = assertEqual 5

and applying that predicate to the environment itself:

testResult = isFive environment

giving a result of:

ResultEnv (Fail "/= 5") (Right 6)

However, exercising the model more fully requires a predicate that mutates the environment having performed its check. In this case, a predicate that tests for evenness and then decrements the environment by one, wrapped in a test assertion and then a meta test-assertion:

assertEvenAndDecrement :: Integer -> ResultEnv Integer Integer
assertEvenAndDecrement x
    | x < 1 = ResultEnv (Fail "Exceptionally small input") (Left x)
    | (x `mod` 2) > 0 = ResultEnv (Fail (show x ++ " is odd")) (Right (x - 1))
    | otherwise = ResultEnv Pass (Right (x - 1))

testAssertEvenAndDecrement =
    TestAssertion (makeID "Env is even") assertEvenAndDecrement

metaTestAssertEvenAndDecrement = Single testAssertEvenAndDecrement

A test suite using xUnit-style meta test-assertions can be built up from this:

fiveAssertions = replicate 5 metaTestAssertEvenAndDecrement

suite = testSuite (makeID "Test Suite")
    [ testClass (makeID "Test Class")
        [ testMethod (makeID "Test X1") fiveAssertions
        , testMethod (makeID "Test X2") fiveAssertions
        , testMethod (makeID "Test X3") []
        ]
    ]

A.3 Outputting TAP

To see the results of the test suite run, an output format is needed. TAP would be appropriate for this.
TAP requires maintaining a test count at several levels, as well as a final transcript. TestBuilder provides this:

data TestBuilder = TestBuilder ([Integer], String)

and an instance of Monoid is declared for it, describing how to combine the results of a given meta test-assertion run into a previous one:

instance Monoid TestBuilder where
    mempty = TestBuilder ([0], "")
    mappend (TestBuilder (d, t)) (TestBuilder (d', t')) =
        TestBuilder (zipWith (+) d d', mappend t t')

along with a function for adding a completed meta test-assertion to the count:

-- Discard the count for this level, and increment the count
-- for the penultimate one, to keep track of tests run in
-- the parent meta test-assertion
incrementCount :: [Integer] -> [Integer]
incrementCount i = gparentCount ++ [1 + last parentCount]
  where
    parentCount = init i
    gparentCount = init parentCount

At the beginning of a meta test-assertion run, all that's needed is to add to the depth of the count:

-- Add to the depth of MTAs kept track of
startMTA _ _ (TestBuilder (i, e)) = TestBuilder (i ++ [0], e)

and at completion, a TAP line to add to the transcript is assembled:

endMTA description result (TestBuilder (i, e)) = TestBuilder (newCount, e ++ output)
  where
    output = indent ++ okNotOK result ++ show testNumber
             ++ " - " ++ description ++ debug result ++ "\n"
    newCount = incrementCount i   -- Incremented parent's count
    testNumber = last newCount    -- Which number of the parent this is
    depth = length newCount - 1   -- How many MTAs deep
    indent = replicate (depth * 4) ' '

    okNotOK (ResultEnv Pass _) = "ok "
    okNotOK (ResultEnv (Fail _) _) = "not ok "

    debug r@(ResultEnv Pass _) = ""
    --debug r@(ResultEnv Pass _) = " # " ++ debugEnvironment r
    debug r@(ResultEnv (Fail d) l) =
        "\n" ++ indent ++ "# " ++ debugEnvironment r ++ diagnostics d l

    debugEnvironment = show
    diagnostics d l = ("\n" ++ indent ++ "# " ++ d) ++ skipped l
    skipped (Left _) = " EXCEPTION!"
    skipped (Right _) = ""

These functions are then turned into an extension that can be given to the test runner:

tapOutput :: (Show e) => Extension TestBuilder e
tapOutput event assertion (Report transcript resultEnv) =
    Report (showResult event) (clearDiagnostics event resultEnv)
  where
    name' = name assertion
    showResult Start = startMTA name' resultEnv transcript
    showResult End = endMTA name' resultEnv transcript
    clearDiagnostics Start re = re
    clearDiagnostics End (ResultEnv Pass q) = ResultEnv Pass q
    clearDiagnostics End (ResultEnv (Fail _) q) = ResultEnv (Fail emptyDiag) q

And TAP can be generated for the test run:

            not ok 1 - Env is even
            # ResultEnv (Fail "5 is odd") (Right 4)
            # 5 is odd
            not ok 2 - Env is even
            # ResultEnv (Fail "") (Left 4)
            # EXCEPTION!
            not ok 3 - Env is even
            # ResultEnv (Fail "") (Left 4)
            # EXCEPTION!
            not ok 4 - Env is even
            # ResultEnv (Fail "") (Left 4)
            # EXCEPTION!
            not ok 5 - Env is even
            # ResultEnv (Fail "") (Left 4)
            # EXCEPTION!
        not ok 1 - Test X1
        # ResultEnv (Fail "") (Left 4)
        # EXCEPTION!
            ok 1 - Env is even
            not ok 2 - Env is even
            # ResultEnv (Fail "3 is odd") (Right 2)
            # 3 is odd
            not ok 3 - Env is even
            # ResultEnv (Fail "") (Left 2)
            # EXCEPTION!
            not ok 4 - Env is even
            # ResultEnv (Fail "") (Left 2)
            # EXCEPTION!
            not ok 5 - Env is even
            # ResultEnv (Fail "") (Left 2)
            # EXCEPTION!
        not ok 2 - Test X2
        # ResultEnv (Fail "") (Left 2)
        # EXCEPTION!
            ok 1 - Empty Assertion
        ok 3 - Test X3
    not ok 1 - Test Class
    # ResultEnv (Fail "Recovered Left->RightRecovered Left->Right") (Right 2)
    # Recovered Left->RightRecovered Left->Right
not ok 1 - Test Suite
# ResultEnv (Fail "") (Right 2)

fin
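For completeness, a speculative sketch of the kind of invocation that could produce a transcript like the one above follows. The startingReport helper and the initial environment value of 5 (inferred from the first transcript line) are illustrative assumptions rather than the dissertation's actual code; only run, tapOutput, suite, and the TestBuilder Monoid come from the listings above, and the sketch relies on the model's unpublished prelude, including whatever instance satisfies run's Monoid constraint for the environment type:

-- Assumed helper: wrap an initial environment value in a passing
-- Report with an empty TestBuilder transcript.
startingReport :: Integer -> Report TestBuilder Integer Integer
startingReport n = Report mempty (ResultEnv Pass (Right n))

-- Run the suite with the TAP-producing extension, then extract the
-- accumulated transcript from the resulting TestBuilder.
tapTranscript :: String
tapTranscript = transcriptOf (run [tapOutput] suite (startingReport 5))
  where
    transcriptOf (Report (TestBuilder (_, t)) _) = t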
Bibliography

Ascher, David, and Mark Lutz. 1999. "Functions > Scope Rules in Functions." In Learning Python. O'Reilly. https://www.safaribooksonline.com/library/view/learning-python/1565924649/ch04s03.html.

Beck, Kent. 1994. "Simple Smalltalk Testing: With Patterns." The Smalltalk Report 4 (2): 16–18.

Clarke, Lori A., and David S. Rosenblum. 2006. "A Historical Perspective on Runtime Assertion Checking in Software Development." ACM SIGSOFT Software Engineering Notes 31 (3). ACM: 25–37.

Flanagan, David, and Yukihiro Matsumoto. 2008. "Classes and Modules > Singleton Methods and the Eigenclass." In The Ruby Programming Language. O'Reilly Media, Inc. https://www.safaribooksonline.com/library/view/the-ruby-programming/9780596516178/ch07s07.html.

Goldstine, Herman Heine, and John Von Neumann. 1948. Planning and Coding of Problems for an Electronic Computing Instrument. Institute for Advanced Study. https://library.ias.edu/files/pdfs/ecp/planningcodingof0103inst.pdf.

Hamill, Paul. 2004. Unit Test Frameworks: Tools for High-Quality Software Development. O'Reilly Media, Inc.

Turing, A. 1949. In Report of a Conference on High Speed Automatic Calculating Machines, Univ. Math. Laboratory, Cambridge, 67–69. http://www.turingarchive.org/viewer/?id=462&title=01.

Wynne, Matt, and Aslak Hellesøy. 2012. The Cucumber Book: Behaviour-Driven Development for Testers and Developers. Dallas, Tex.: Pragmatic Bookshelf.