Test Automation Day 2018

Besides the obvious tools:
improving your testing with
state-of-the-art techniques
Maurício Aniche
m.f.aniche@tudelft.nl
@mauricioaniche
Photo by Sora Sagano
https://unsplash.com/photos/WA-QRL5wDMw

Content and License
• This presentation can be found at:
http://www.mauricioaniche.com/talks/2018/tad
• You can use it and modify it.
• You always have to give credits to the original author.
• You agree not to sell it or make profit in any way with this.

Jeroen Castelein, Mozhan Soltani, Annibale Panichella
Joop Aué, Maikel Lobbezoo, Rick Wieman
Sicco Verwer
Felienne Hermans, Davide Spadini, Alberto Bacchelli
Arie van Deursen, Kristín Fjóla
Peter Evers
Qianqian Zhu

• First job as a developer in 2004
• First important project in 2016
• First important bug: 2016
• Tests are important!
A little story
Photo by Michael Mims
https://unsplash.com/photos/0ZL0O-eDOpU

TEST ANALYSIS
& TEST DESIGN
clipart by j4p4n, adlerweb
https://openclipart.org/detail/297959/standing-robot
https://openclipart.org/detail/262444/bubble-person

“Testing is different from writing tests.
Developers write tests as a way to give them
space to think and confidence for refactoring.
Testing focuses on finding bugs. Both should
be done.”
https://medium.com/@mauricioaniche/testing-vs-writing-tests-d817bffea6bc

The literature on test oracles has introduced techniques for oracle
automation, including modelling, specifications, contract-driven
development and metamorphic testing. When none of these is
completely adequate, the final source of test oracle information
remains the human, who may be aware of informal specifications,
expectations, norms and domain specific information that provide
informal oracle guidance.

TEST ANALYSIS
& TEST DESIGN
Find systematic and
automated ways to design
and execute tests!

Topics of today
• Structural testing and MC/DC
• Log monitoring and passive learning
• Search-based software testing
• Mutation testing
• Fuzzing
• Property-based testing
• Code review
• Static analysis tools

Who are you?
• Software developers?
• Software testers?
• What are your expectations here today?
• Fill this out: https://bit.ly/tad2018
clipart by GDJ
https://openclipart.org/detail/230150/crowd-of-kids

Structural
Testing
clipart by J_Alves
https://openclipart.org/detail/61405/threonine-amino-acid

Given the points of two
different players, the
program must return the
number of points the one
who wins has!
public int play(int left, int right) {
    int ln = left;
    int rn = right;
    if (ln > 21)
        ln = 0;
    if (rn > 21)
        rn = 0;
    if (ln > rn)
        return rn;
    else
        return ln;
}

First criterion: “going
through all the lines”
If our test suite
exercises all the lines,
we are happy.

T1 = (30, 30)

   public int play(int left, int right) {
1    int ln = left;
2    int rn = right;
3    if (ln > 21)
4      ln = 0;
5    if (rn > 21)
6      rn = 0;
7    if (ln > rn)
8      return rn;
9    else
10     return ln;
   }
9 / 10 = 90% line coverage

T2 = (10,9) <-- left player wins
Make it true

10 / 10 = 100% line coverage
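The 9 / 10 and 10 / 10 figures can be reproduced with a hand-instrumented copy of the method. This is only a sketch: the class name `CoverageSketch`, the `hit` array, and the helper methods are hypothetical, the method body is copied from the slide unchanged, and a real project would use a coverage tool rather than manual instrumentation.

```java
import java.util.Arrays;

class CoverageSketch {
    // hit[n] is set when numbered line n of the slide's listing executes (index 0 is unused).
    static boolean[] hit = new boolean[11];

    static int play(int left, int right) {
        hit[1] = true; int ln = left;
        hit[2] = true; int rn = right;
        hit[3] = true;
        if (ln > 21) { hit[4] = true; ln = 0; }
        hit[5] = true;
        if (rn > 21) { hit[6] = true; rn = 0; }
        hit[7] = true;
        if (ln > rn) { hit[8] = true; return rn; }
        else { hit[9] = true; hit[10] = true; return ln; }
    }

    // Number of the 10 numbered lines executed so far.
    static int coveredLines() {
        int n = 0;
        for (int i = 1; i <= 10; i++) if (hit[i]) n++;
        return n;
    }

    static void reset() { Arrays.fill(hit, false); }

    public static void main(String[] args) {
        play(30, 30); // T1
        System.out.println(coveredLines() + " / 10 lines after T1");
        play(10, 9);  // T2
        System.out.println(coveredLines() + " / 10 lines after T1 and T2");
    }
}
```

Note that neither call checks the returned value, so the coverage number alone says nothing about whether the result really is the winner's score.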

9/10 = 90%,
5/6 = 83%...
From now on, I’ll write as
many lines as I can!!
clipart by GDJ
https://openclipart.org/detail/230143/female-engineer-9

Given a sentence, you
should count the number
of words that end with
either an “s” or an “r”. A
word ends when a non-
letter appears.

int words = 0;
char last = ' ';
for (int i = 0; i < str.length(); i++) {
    if (!Character.isLetter(str.charAt(i))
            && (last == 's' || last == 'r'))
        words++;
    last = str.charAt(i);
}
if (last == 's' || last == 'r')
    words++;
return words;
Control-flow graph
(CFG)
We should cover
all the branches
(arrows)

“cats|dogs”

“cats|dog”

Branch coverage means
we exercise all the
branches!
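A small sketch of the word counter with the two inputs from the slides. The class name `WordCounter` and the method name `count` are assumptions (the slides show only the method body). Together, “cats|dogs” and “cats|dog” take both outcomes of the loop condition, of the `if` inside the loop, and of the final `if`.

```java
class WordCounter {
    // Counts words ending in 's' or 'r'; a word ends when a non-letter appears.
    static int count(String str) {
        int words = 0;
        char last = ' ';
        for (int i = 0; i < str.length(); i++) {
            if (!Character.isLetter(str.charAt(i)) && (last == 's' || last == 'r'))
                words++;
            last = str.charAt(i);
        }
        // The last word has no trailing non-letter, so it is checked separately.
        if (last == 's' || last == 'r')
            words++;
        return words;
    }

    public static void main(String[] args) {
        System.out.println(count("cats|dogs")); // final if is true
        System.out.println(count("cats|dog"));  // final if is false
    }
}
```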

I wonder if that’s
enough…

[Figure: the decision split into separate nodes for !Character.isLetter(str.charAt(i)), last == 's' and last == 'r', each with a true and a false edge, leading to words++]
If we “explode” the if into
its several conditions, we
have more paths to
explore!

[Figure: control-flow graph of the whole method, with every condition of both if statements drawn as its own node with true and false edges]

Ok, condition coverage
seems to cover more
than branch coverage!

If we aim for condition
coverage, are we testing
all the paths?

Path Coverage
(A && (B | C))
Test  A  B  C  Outcome
1     T  T  T  T
2     T  T  F  T
3     T  F  T  T
4     T  F  F  F
5     F  T  T  F
6     F  T  F  F
7     F  F  T  F
8     F  F  F  F

Can we actually achieve
100% path coverage?

• The subpaths through this control flow
can include or exclude each of the
statements Si, so that in total N
branches result in 2^N paths that must
be traversed
• Choosing input data to force execution
of one particular path may be very
difficult, or even impossible if the
conditions are not independent
if (a) {
    S1;
}
if (b) {
    S2;
}
if (c) {
    S3;
}
...
if (x) {
    Sn;
}
The number of paths can
still grow exponentially

Can we test just the
important
combinations?

Modified Condition/
Decision Coverage
(MC/DC)


(A && (B | C))
Test  A  B  C  Outcome
1     T  T  T  T
2     T  T  F  T
3     T  F  T  T
4     T  F  F  F
5     F  T  T  F
6     F  T  F  F
7     F  F  T  F
8     F  F  F  F
A = {1, 5}, {2, 6}, {3, 7}
B = {2, 4}
C = {3, 4}
Final = {2, 3, 4, 6}
They are the same!
We don’t need them all

So, for N conditions, I
always have only N+1
tests! That’s definitely
better than 2^N!!
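The independence pairs for (A && (B | C)) can be derived mechanically. The sketch below is an illustration only; the class and method names are hypothetical. For each condition it searches for pairs of tests that differ in that condition alone and produce different outcomes, using the same test numbering as the truth table (test 1 = TTT, test 8 = FFF).

```java
import java.util.ArrayList;
import java.util.List;

class McdcPairs {
    // Decision under test, as on the slide.
    static boolean decision(boolean a, boolean b, boolean c) {
        return a && (b | c);
    }

    // Values of A, B, C for a test number of the truth table (1 = TTT ... 8 = FFF).
    static boolean[] row(int test) {
        int bits = 8 - test;
        return new boolean[] { (bits & 4) != 0, (bits & 2) != 0, (bits & 1) != 0 };
    }

    // Pairs of tests that differ only in the given condition (0 = A, 1 = B, 2 = C)
    // and whose outcomes differ.
    static List<int[]> independencePairs(int condition) {
        List<int[]> pairs = new ArrayList<>();
        for (int t1 = 1; t1 <= 8; t1++) {
            for (int t2 = t1 + 1; t2 <= 8; t2++) {
                boolean[] x = row(t1), y = row(t2);
                boolean onlyThisDiffers = true;
                for (int i = 0; i < 3; i++) {
                    if ((x[i] != y[i]) != (i == condition)) onlyThisDiffers = false;
                }
                boolean outcomeFlips =
                        decision(x[0], x[1], x[2]) != decision(y[0], y[1], y[2]);
                if (onlyThisDiffers && outcomeFlips) pairs.add(new int[] { t1, t2 });
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        String[] names = { "A", "B", "C" };
        for (int c = 0; c < 3; c++) {
            StringBuilder sb = new StringBuilder(names[c] + " =");
            for (int[] p : independencePairs(c)) sb.append(" {" + p[0] + ", " + p[1] + "}");
            System.out.println(sb);
        }
    }
}
```

Picking one pair per condition and reusing tests where possible, for example {2, 6}, {2, 4} and {3, 4}, gives the four tests {2, 3, 4, 6}.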

McCabe’s Cyclomatic Complexity
• C = |E| - |N| + 2
• C = # decision points + 1
• C = # of decision-statements
+ 1
C > 10: method too complex
[McCabe, 1976]
[ C correlated with #lines of
code ]
[Figure: example control-flow graph with nodes numbered 1 to 7]

McCabe for Testing?
No empirical evidence
that it is better than
just decision coverage.
How many tests?
• Branch: 2 tests
• All paths: 4 tests
• McCabe: 3 tests
[Figure: example control-flow graph with nodes numbered 1 to 7]
McCabe: Easy to count, limited usefulness
as coverage metric

Strategy Subsumption
• Strategy X subsumes strategy Y if
all elements that Y exercises are
also exercised by X
• No conclusive results on relative
bug-finding effectiveness have
been established.
[Figure: subsumption hierarchy with the levels Path coverage, MC/DC, Branch + Condition Coverage, Branch Coverage, Statement Coverage]

What do YOU think:
Do we need 100% code coverage?

I am ready to write some
unit tests. What code
coverage should I aim for?
Don’t worry about
coverage, just write some
good tests.
Testivus on Code Coverage. Alberto Savoia. https://www.artima.com/weblogs/viewpost.jsp?thread=204677
clipart by 10_boss, bibbleycheese
https://openclipart.org/detail/202573/my-yoda
https://openclipart.org/detail/248493/pretzel-ninja

How many grains of rice
should I put in that [boiling
water] pot?
It depends on how many
people you need to feed, how
hungry they are, what other
food you are serving, how
much rice you have available,
and so on.
Exactly!

80% and no less!

The first programmer is new and just getting started with testing.
Right now he has a lot of code and no tests. He has a long way to
go; focusing on code coverage at this time would be depressing and
quite useless. He’s better off just getting used to writing and
running some tests. He can worry about coverage later.

The second programmer, on the other hand, is quite experienced
both at programming and testing. When I replied by asking her
how many grains of rice I should put in a pot, I helped her realize
that the amount of testing necessary depends on a number of
factors, and she knows those factors better than I do – it’s her code
after all. There is no single, simple, answer, and she’s smart enough
to handle the truth and work with that.

The third programmer wants only simple
answers – even when there are no simple
answers … and then does not follow them
anyway.

Mutation testing
Gif by h1flosse
https://openclipart.org/detail/190026/mutant

Imagine your code is a small town, where
crimes happen from time to time…
Photo by Jesus in Taiwan
https://unsplash.com/photos/c6aunWXHZZ0

clipart by kolbasun
https://openclipart.org/detail/219619/ninja-cop
Let’s simulate crimes and see
if the cops can catch them!

City -> Program
Crime -> Bugs in code
Police -> Unit testing
Fake crime -> Mutation Testing

Original:
public int play(int left, int right) {
    int ln = left;
    int rn = right;
    if (ln > 21)
        ln = 0;
    if (rn > 21)
        rn = 0;
    if (ln > rn)
        return rn;
    else
        return ln;
}
Mutant (rn > 21 becomes rn < 21):
public int play(int left, int right) {
    int ln = left;
    int rn = right;
    if (ln > 21)
        ln = 0;
    if (rn < 21)
        rn = 0;
    if (ln > rn)
        return rn;
    else
        return ln;
}

If your test still passes, this is no good!

Common mutants
• Replace arithmetic operator (+, -, *, /, …)
• Replace relational operators (>, >=, <, <=, ==, !=, …)
• Replace constants (a -> a+1)

As a research field
• Since the 70s
• Benefits:
• Better fault exposing capability
• A good alternative to real faults
• Limitations:
• High computational power
• Undecidable Equivalent Mutant Problem
• Mutants for other problems
• SQL

In order to alleviate the computational issues, we
present a diff-based probabilistic approach to
mutation analysis that drastically reduces the number
of mutants by omitting lines of code without
statement coverage and lines that are determined to
be uninteresting

Mutations:
http://pitest.org/quickstart/mutators/

Is (preventive)
testing enough?
Maybe not…
clipart by dani ela
https://openclipart.org/detail/229476/14-flowers

Context:
Payments
Payment
Provider

DEV OPS
Logs are our current bridge!

One Billion Log Lines a Day:
Monitoring using the ELK Stack
• Logstash: Unify different logging sources
• Elasticsearch: Search and filter large log data
• Kibana: Visual interactive dashboard
Image credit: www.neteye-blog.com

Poll: Java Exceptions in a Payment System
Your payment system in production generates 1 billion log lines per day.
How many errors / warnings with exceptions do you expect to see?
A. None. “We have a zero exception policy.”
B. 1 Thousand. “Some exceptions are unavoidable.”
C. 1 Million. “Most exceptions are harmless.”
D. 1 Billion. “We only log errors and exceptions.”
Adyen, Nov 2016: ~1,000,000 per day

Complex API Integration
• Payment APIs are complex
• Integration faults are easily made
• Merchant needs assistance with API
usage
• Merchant may not notice mistakes
• 2.5M http error responses per month
• What can we learn from them?

11 Common Causes for API Error Responses
Integrators are definitely the main party responsible for API integration problems!

Understand your errors

Payment
Terminals
Payment
Provider

Point of sale terminal variability
• Card brands
• Card entry modes
(chip, swipe, contactless)
• Currency conversion
• Loyalty points
• Validation type (pin, signature)
• Issuer responses
(declined, insufficient balance)
• Cancellations
(shopper, merchant)

Passive learning
Identifying system behavior from observations,
and representing it in the smallest possible model.
20170101160001 Adyen version: ******
20170101160002 Starting TX/amt=10001/currency=978
20170101160003 Starting EMV
20170101160004 EMV started
20170101160005 Magswipe opened
20170101160006 CTLS started
20170101160007 Transaction initialised
20170101160008 Run TX as EMV transaction
20170101160009 Application selected app:******
20170101160010 read_application_data succeeded
20170101160011 data_authentication succeeded
20170101160012 validate 0
20170101160013 DCC rejected
20170101160014 terminal_risk_management succeeded
20170101160015 verify_card_holder succeeded
20170101160016 generate_first_ac succeeded
20170101160017 Authorizing online
20170101160018 Data returned by the host succeeded
20170101160019 Transaction authorized by card
20170101160020 Approved receipt printed
20170101160021 pos_result_code:APPROVED
20170101160022 Final status: Approved
Rick Wieman, Maurício Aniche, Willem Lobbezoo, Sicco Verwer and Arie van Deursen.
An Experience Report on Applying Passive Learning in a Large-Scale Payment Company. ICSME Industry Track, 2017
https://automatonlearning.net/
DFASAT / FlexFringe
Heule & Verwer, ICGI 2010

Use Inferred Models to Analyze:
Bugs in Test Phase
• Terminal asked for PIN
• AND asked for signature
• Domain expert noted this unwanted
behavior in inferred model.
• Fixed before it went into production

Differences Between Card Brands
Twice as many chip errors
Informed
merchant
about issue.

Time out problems
Timeout
Improved
performance under
network instability
by adding targeted
retry mechanism

Can the machine
generate tests for us?
Automated test
generation!
clipart by bingenberg
https://openclipart.org/detail/229476/14-flowers

[Figure: control-flow graph with nodes numbered 1 to 10]
@Test
public void test() {
    // Constructor (init)
    Triangle t = new Triangle(1, 2, 3);
    // Method Calls
    t.computeTriangleType();
    // Assertions (check)
    String typ = t.getType();
    assertTrue(typ.equals("SCALENE"));
}

Random testing
1. Pick one of the available constructors (with
random input)
2. Pick one or more public methods (with
random input)
3. Generate the assertions by checking the
final state of the object using get methods
clipart by 10binary
https://openclipart.org/detail/175047/february-11-2013

Genetic Algorithm
[Flowchart with the steps: Initialization, Fitness Calculations, Terminate? (Yes / No), Selection, Crossover, Mutation, Elitism]

[Figure: control-flow graph with nodes numbered 1 to 10]
(2,2,3) -> <1,2,4>
(2,3,3) -> <1,5,7,8>
Fitness = Approach + Distance
Approach = # of control nodes between the execution and the target.
Distance = The normalized distance for the control node that diverged to “not diverge”: n/(n+1)

[Figure: control-flow graph with nodes numbered 1 to 10]
(2,2,3) -> <1,2,4> = 2 + [1/(1+1)] = 2.5
(2,3,3) -> <1,5,7,8> = 0 + [1/(1+1)] = 0.5 <-- better!
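The fitness arithmetic from these slides can be written down as a tiny function. This is a sketch of the formula as stated on the slide, not of any particular tool's implementation; the names `FitnessSketch`, `normalize` and `fitness` are hypothetical. Lower values are better, because they mean the execution came closer to the target.

```java
class FitnessSketch {
    // Normalizes a raw distance n into the range [0, 1) using n / (n + 1).
    static double normalize(double n) {
        return n / (n + 1);
    }

    // Fitness = Approach + Distance, where approach is the number of control nodes
    // between the execution and the target, and distance is normalized.
    static double fitness(int approach, double rawDistance) {
        return approach + normalize(rawDistance);
    }

    public static void main(String[] args) {
        System.out.println("(2,2,3): " + fitness(2, 1)); // 2 + 1/(1+1)
        System.out.println("(2,3,3): " + fitness(0, 1)); // 0 + 1/(1+1)
    }
}
```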

Fraser, Gordon, and Andrea Arcuri. "Evosuite: automatic test suite generation for object-oriented software." Proceedings of
the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering. ACM, 2011.

Testing SQL
Query:
SELECT Name
FROM Product
WHERE Price > 20

Test Database, Table: Product
Name  Price
-     19
-     20
-     21

Coverage Criterion
1. False: Price = 19
2. Boundary: Price = 20
3. True: Price = 21

Testing SQL
Query
SELECT *
FROM `account`
LEFT JOIN `user` AS `assignedUser` ON account.assigned_user_id = assigneduser.id
LEFT JOIN `user` AS `modifiedBy` ON account.modified_by_id = modifiedby.id
LEFT JOIN `user` AS `createdBy` ON account.created_by_id = createdby.id
LEFT JOIN `entity_email_address` AS `emailAddressesMiddle`
ON account.id = emailaddressesmiddle.entity_id
AND emailaddressesmiddle.deleted = '0'
AND emailaddressesmiddle.primary = '1'
AND emailaddressesmiddle.entity_type = 'Account'
LEFT JOIN `email_address` AS `emailAddresses`
ON emailaddresses.id = emailaddressesmiddle.email_address_id
AND emailaddresses.deleted = '0'
LEFT JOIN `entity_phone_number` AS `phoneNumbersMiddle`
ON account.id = phonenumbersmiddle.entity_id
AND phonenumbersmiddle.deleted = '0'
AND phonenumbersmiddle.primary = '1'
AND phonenumbersmiddle.entity_type = 'Account'
LEFT JOIN `phone_number` AS `phoneNumbers`
ON phonenumbers.id = phonenumbersmiddle.phone_number_id
AND phonenumbers.deleted = '0'
WHERE (( account.name LIKE 'Besha%'
OR account.id IN (SELECT entity_id
FROM entity_email_address
JOIN email_address
ON email_address.id =
entity_email_address.email_address_id
WHERE entity_email_address.deleted = 0
AND entity_email_address.entity_type =
'Account'
AND email_address.deleted = 0
AND email_address.name LIKE 'Besha%') ))
AND account.deleted = '0'
x 42 Coverage Rules

EvoSQL
[Figure: diagram with the elements Query, Database Schema, SQLFpc, Coverage Rules, EvoSQL, Test Data]
Jeroen Castelein, Maurício Aniche, Mozhan Soltani, Annibale Panichella, Arie van Deursen
Search-Based Test Data Generation for SQL Queries. ICSE 2018.

Study Context
2,135 queries / 4 systems:
• Alura, e-learning platform
• EspoCRM, open source software for customer relations
• SuiteCRM, open source software for customer relations
• ERPNext, open source resource planning software for enterprises.

EvoSQL Evaluation Outcomes
• 100% of targets covered for 98% of the queries
• On average 86% covered for the remaining 2%
• Usually within seconds
• Outperforms biased and random alternatives:
• Biased random can handle 90% of simple queries (< 10 rules)
• Biased random often finds no solution for complex queries (10+ rules)

Property-
Based Testing
clipart by GDJ
https://openclipart.org/detail/232264/colorful-fleur-de-lis-fractal-3

Alan Turing on Assertions

Assertions Defined
An assertion is a Boolean expression
at a specific point in a program
which will be true
unless there is a bug in the program.
http://wiki.c2.com/?WhatAreAssertions
Assertions in the program: they hold for any execution of that point,
unlike a test code assertion, which holds for one execution only.

The Java (C, C++, …) assert Statement
If boolean-expression is true, do nothing.
If it is false, throw an AssertionError,
with the string as message
“assert” boolean-expression [“:” string ]

LLVM Assertion Examples (BitcodeReader.cpp)
assert(BlockAddrFwdRefs.empty() && "Unresolved blockaddress fwd references");
assert(Ty == V->getType() && "Type mismatch in constant table!");
assert((Ty == 0 || Ty == V->getType()) && "Type mismatch in value table!");
assert(It != ResolveConstants.end() && It->first == *I);
assert(isa<ConstantExpr>(UserC) && "Must be a ConstantExpr.");
assert(V->getType()->isMetadataTy() && "Type mismatch in value table!");
assert((!Alignment || isPowerOf2_32(Alignment)) && "Alignment must be a power of two.");
assert((Record[i] == 3 || Record[i] == 4) && "Invalid attribute group entry");
assert(Record[i] == 0 && "Kind string not null terminated");
assert(Record[i] == 0 && "Value string not null terminated");
assert(ResultTy && "Didn't read a type?");
assert(TypeList[NumRecords] == 0 && "Already read type?");
assert(NextBitCode == bitc::METADATA_NAMED_NODE); (void)NextBitCode;
assert((CT != LandingPadInst::Catch || !isa<ArrayType>(Val->getType())) &&
"Catch clause has a invalid type!");
assert((CT != LandingPadInst::Filter || isa<ArrayType>(Val->getType())) &&
"Filter clause has invalid type!");
assert(DFII != DeferredFunctionInfo.end() && "Deferred function not found!");
assert(DeferredFunctionInfo.count(F) && "No info to read function later?");
assert(M == TheModule && "Can only Materialize the Module this BitcodeReader is attached to.");
https://blog.regehr.org/archives/1091

Thinking in Assertions
• Method preconditions:
• Propositions that must hold before calling the method
• Method postconditions
• Propositions that are guaranteed to hold after the method has finished
• Structural invariants
• Properties over the state of an object throughout the object’s lifetime
• Helps to improve / reason about design
• Can be turned into assertions that can be checked at run time
• Supports the testing process

Formal Specifications via Hoare Triples
• Any execution of A,
• starting in a state where P holds
• will terminate in a state where Q holds
{ P } A { Q }
{ preconditions } Method { postconditions }

Precondition Design
• The “strength” of your preconditions is a design choice.
• The weaker your precondition
• The more situations your method needs to handle
• The less thinking the client needs to do (easier to use)
• However, with weak preconditions:
• The server will always do the checking
• This may be redundant:
checks also done if we’re sure they’ll pass.

Examples: File has been created; Player has been moved;
Points have been added; Resulting tile is never null;
If client invokes a (server) method and meets its preconditions,
the server guarantees the postcondition will hold.
clipart by floEdelmann
https://openclipart.org/detail/260432/beach-chair

If you (as client) invoke a (server) method
without meeting its preconditions, anything can happen.
E.g.: Null pointer
exception
clipart by tzunghaor
https://openclipart.org/detail/166696/nuclear-explosion

Design By Contract
• Contract metaphor:
• Contract: an explicit statement of the rights and obligations
between a client and a server
• Server perspective:
• If you call me and meet my precondition, I ensure that after returning
I deliver a state in which my postcondition holds
• If not, you’re on your own.
Bertrand Meyer, Applying "Design by Contract",
IEEE Computer 25, 10, October 1992, pages 40-51

Bertrand Meyer’s
Seven Principles of Software Testing
1. To test a program is to try to make it fail.
2. Tests are no substitute for specifications
3. Any failed execution must yield a test case
4. Determining success or failure of tests must be an automatic
process (4.b: via contracts)
Bertrand Meyer, IEEE Software, 2008. Required Reading!

Seven Principles of Software Testing
5. An effective testing process must include both manually and
automatically produced test cases.
6. Test strategies must be empirically validated
7. A testing strategy’s most important property is the number of faults
it uncovers as a function of time.

Assertions Pro / Con
Great
• Support better testing
• Make debugging easier
(less distance)
• Executable comments
• “Gateway drug to formal
methods”
Less than Great
• Slow down code
• Make programs incorrect when
used improperly
• Might trick some of us lazy
programmers into using them to
implement error handling
• Are commonly misunderstood
http://blog.regehr.org/archives/1091
Required reading

Property-Based Testing
• Think of “properties” (assertions) for functions
• Let a “generator” produce a series of random input values for the function
• For each random input, check the assertions.

Property: length of concatenated strings
equals sum of length of individual strings
QuickCheck:
will generate 100 random strings
to check this property.
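The concatenation property can be checked with a hand-rolled generator, shown below as a sketch. It does not use QuickCheck or any property-based testing library; the class name, the helper names, the string alphabet and the maximum length of 19 are all arbitrary assumptions made for illustration.

```java
import java.util.Random;

class ConcatLengthProperty {
    // Generator: a random lowercase string of length 0 to 19.
    static String randomString(Random rnd) {
        int len = rnd.nextInt(20);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < len; i++) {
            sb.append((char) ('a' + rnd.nextInt(26)));
        }
        return sb.toString();
    }

    // Property: the length of the concatenation equals the sum of the lengths.
    static boolean property(String a, String b) {
        return (a + b).length() == a.length() + b.length();
    }

    // Checks the property on `runs` random pairs; returns false on the first counterexample.
    static boolean check(int runs, long seed) {
        Random rnd = new Random(seed);
        for (int i = 0; i < runs; i++) {
            String a = randomString(rnd);
            String b = randomString(rnd);
            if (!property(a, b)) {
                System.out.println("Counterexample: \"" + a + "\", \"" + b + "\"");
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(check(100, 42L) ? "property held for 100 inputs" : "property failed");
    }
}
```

A dedicated library would add features this sketch lacks, such as shrinking a failing input to a minimal counterexample.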

Can tools help us
find bugs
automatically?
Yes, even without running the code!
clipart by Machovka
https://openclipart.org/detail/2676/lady-bug

Examples of bugs
• Equals checks for incompatible operand
• HE: Class defines equals() but not hashCode()
• RpC: Repeated conditional tests
• FL: Method performs math using floating point precision
• RANGE: Array offset is out of bounds (RANGE_ARRAY_OFFSET)
• Etc etc…
• Full list:
https://spotbugs.readthedocs.io/en/latest/bugDescriptions.html#

Linters are prevalent
• OSS systems have been intensively using linters.
• Tools are highly flexible, and developers have different strategies to
configure them.
• Challenge: false positives.
• You should develop your own!!
• Bugs specific to your context, e.g., config files.
Beller, Moritz, et al. "Analyzing the state of static analysis: A large-scale evaluation in open source software." Software Analysis, Evolution, and Reengineering (SANER),
2016 IEEE 23rd International Conference on. Vol. 1. IEEE, 2016.
Tómasdóttir, K. F., Aniche, M., & Deursen, A. V. (2017, October). Why and how JavaScript developers use linters. In Proceedings of the 32nd IEEE/ACM International Conference on
Automated Software Engineering (pp. 578-589). IEEE Press.

Importance of the different rules
1. Stylistic Issues
2. Best Practices
3. Variables
4. Possible Errors
5. Node.js &
CommonJS
6. ECMAScript 6
7. Strict Mode
1. Possible Errors 92.5%
2. Best Practices 89%
3. ECMAScript 6 86.7%
4. Variables 86.4%
5. Stylistic Issues 78.2%
6. Node.js & CommonJS 62.6%
7. Strict Mode 57.8%

Code review in test files!
Test files are almost 2 times less likely to be discussed
during code review when reviewed together with
production files!!
Davide Spadini, Maurício Aniche, Magiel Bruntink, Margaret-Anne Storey, Alberto Bacchelli. When Testing Meets Code
Review: Why and How Developers Review Tests. ICSE 2018.

Code review in test files!
Little on
finding more
bugs!
Davide Spadini, Maurício Aniche, Magiel Bruntink, Margaret-Anne Storey, Alberto Bacchelli. When Testing Meets Code
Review: Why and How Developers Review Tests. ICSE 2018.
[Figure: bar chart (axis 0% to 30%) with the categories Code improvement, Understanding, Social communication, Defect, Knowledge transfer, Misc]

Learning software
testing is
challenging!
clipart by frankes
https://openclipart.org/detail/190242/comic-girl-tini-at-school

Common mistakes
• Test coverage (20.87%)
• Maintainability of test code (20.42%)
• Understanding test concepts (15.35%)
• Boundary testing (12.95%)
• State-based testing (12.39%)
• Assertions (8.93%)
• Mock Objects (5.87%)
• Tools (4.21%)

Difficult topics
Maurício Aniche, Felienne Hermans, Arie van Deursen. An Exploratory Study on Challenges in Software Testing
Education. TU Delft. In submission.
[Figure: diverging stacked bar chart (axis 100 / 50 / 0 / 50 / 100), one bar per topic:]
Minimum set of tests Q18 (80)
Avoid flaky tests Q17 (81)
Exploratory Testing Q16 (80)
Defensive programming Q15 (81)
How much to test Q14 (80)
Acceptance tests Q13 (81)
Design by contracts Q12 (81)
TDD Q11 (81)
Testability Q10 (81)
Best practices Q9 (81)
State-based testing Q8 (81)
Apply MC/DC Q7 (83)
Structural testing Q6 (82)
Boundary Testing Q5 (84)
Mock Objects Q4 (84)
Choose the test level Q3 (84)
Arrange-Act-Assert Q2 (81)
JUnit tests Q1 (83)

How to Learn?
Maurício Aniche, Felienne Hermans, Arie van Deursen. An Exploratory Study on Challenges in Software Testing
Education. TU Delft. In submission.
[Figure: diverging stacked bar chart (axis 100 / 50 / 0 / 50 / 100), one bar per learning resource:]
Midterm exam Q11 (81)
AMA sessions Q10 (82)
Related papers Q9 (79)
Support from TAs Q8 (82)
Labwork Q7 (83)
ISTQB book Q6 (81)
PragProg book Q5 (80)
Interaction Q4 (83)
Live coding Q3 (83)
Guest lectures Q2 (83)
Lectures Q1 (83)
People do not like books and papers…

The majority of projects and users [from 416
participants and 1,337,872 intervals] do not
practice testing actively.
We should change it.
Moritz Beller, Georgios Gousios, Annibale Panichella, Andy Zaidman. When, How, and Why Developers (Do Not) Test in Their IDEs. FSE 2015.
clipart by laobc
https://openclipart.org/detail/65257/sad-baby

Topics of today
• Structural testing and MC/DC
• Log monitoring and passive learning
• Search-based software testing
• Mutation testing
• Fuzzing
• Property-based testing
• Code review
• Static analysis tools
Maurício Aniche
m.f.aniche@tudelft.nl
@mauricioaniche
http://www.mauricioaniche.com/talks/2018/tad
