How to test an AI application

Tieturi webinar:
How to test an AI application?
Kari Kakkonen
Knowit
https://www.linkedin.com/in/karikakkonen
Mark Sevalnev
Knowit
https://www.linkedin.com/in/marksevalnev
Copyright Knowit Solutions Oy 2021

A Nordic powerhouse for digital solutions
• 4,000+ professionals
• 6 countries: Sweden, Norway, Finland, Denmark, Germany and Poland
• 4 business areas: Solutions, Experience, Connectivity and Insight
• 468.0 MEUR net sales
• Nordic ESG champions: clear vision to accelerate the sustainability agenda
• 47.5 MEUR adjusted operating profit (EBITA)

ROLES
• Knowit Solutions Oy, Director of Training and Competences, Lead Consultant, Trainer and Coach
• Children’s and testing author at Dragons Out Oy
• TMMi, Board of Directors
• Treasurer of Finnish Software Testing Board (FiSTB)
ACHIEVEMENTS
• Tester of the Year in Finland 2021
• EuroSTAR European Testing Excellence Award 2021
• ISTQB Executive Committee 2015-2021
• Influencing testing since 1996
• Ranked among the 100 most influential IT persons in Finland (Tivi magazine)
• Numerous presentations at Finnish and international conferences
• TestausOSY/FAST founding member
• Co-author of the book Agile Testing Foundations
• Regular blogger in Tivi magazine
Kari Kakkonen, Lead Testing Consultant
SERVICES
• ISTQB Advanced, Foundation and Agile Testing
• A4Q AI and Software Testing
• Knowit Quality Professional
• DASA DevOps
• Quality & Test process and organization development, Metrics, TMMi and other assessments
• Agile testing, Scrum, Kanban, Lean
• Leadership
• Test automation, Mobile, Cloud, DevOps, AI
• Quality, cost, benefits
EDUCATION
• ISTQB Expert Level Test Management & Advanced Full & Agile Tester certified
• DASA DevOps, Scrum Master and SAFe certified
• TMMi Professional, Assessor, Process Improver certified
• SPICE provisional assessor certified
• M.Sc. (Eng), Helsinki University of Technology (now Aalto University), Otaniemi, Espoo
• Marketing studies, University of Wisconsin-Madison, USA
26.1.2023
BUSINESS DOMAINS
• Wide range of business domain knowledge: embedded, industry, public, training, telecommunications, commerce, insurance, banking, pension.
twitter.com/kkakkonen
Dragonsout.com
MORE INFORMATION
linkedin.com/in/karikakkonen/
© Copyright Knowit Trainings 2022

Mark has over 10 years of software development experience in three main areas: AI/ML prototyping, traditional software development, and computer science research. His passion is the rapidly growing field of AI/ML. Mark has worked with NLP, deep learning, classification and speech-to-text systems, and he has co-authored several scientific papers.
TECHNOLOGY
• Java
• Python
• React
• Spring Boot
• Node.js
• Keras
• TensorFlow
• DialogFlow
• AWS
• Azure
• Google Cloud
• GitLab
• Docker
COURSES AND CERTIFICATIONS
• AWS Certified Machine Learning, Specialty
• AWS Certified Solutions Architect, Associate
EDUCATION
• M.Sc. (Technology), Theoretical Computer Science (major), Software Systems (minor), Aalto University
Mark Sevalnev, Full Stack Developer
#AI/ML
#AWS
#Java
#React
ROLES
Full Stack Developer
AI Developer
Trainer
MORE INFORMATION
linkedin.com/in/marksevalnev
TECHNIQUES APPLIED IN EXAMPLE PROJECTS
• Developing a 3D virtual avatar that works as a service desk operative: the existing React.js code was modified to match different business requirements, API integrations were implemented with cloud services (AWS), and the login for chatbot conversations was designed. Main tools used: React.js, Dialogflow, AWS, and Google Cloud.
• AI prediction algorithm for predicting future values in the HR process, such as the time needed for recruitment. The following professional skills were needed: data investigation, fetching, cleaning and preparation; designing and implementing the AI/ML algorithm; and deploying the solution to the cloud. Main tools used: Python, Pandas, NumPy, Keras, and scikit-learn.
• PoC for Optical Character Recognition (OCR): as AI developer, building an ETL pipeline for OCR of scanned documents, deploying the solution to AWS Fargate running inside Docker containers, and configuring orchestration of the ETL pipeline using Airflow with the following subtasks: converting documents to grayscale, OCR, content classification with NLP, and storing results in Elasticsearch.

Agenda
• How does AI differ from ordinary software?
• Areas of AI testing in the training of machine intelligence
• Ways of testing AI

Why right now?
Four drivers behind the AI revolution
© Copyright Knowit Solutions 2020 | Version 2.0
• Computation growth due to general-purpose GPUs
• The rise of big data
• Community-based achievements in deep learning
• Open-source tools and frameworks

AI applications?
Figure: 2019 AI landscape by FirstMark (a snippet)
http://mattturck.com/wp-content/uploads/2019/07/2019_Matt_Turck_Big_Data_Landscape_Final_Fullsize.png

AI as a paradigm shift
How is AI different from traditional software development?
© Copyright Knowit Oy 2020 | Confidential | Version 1.0
Figure: diagram relating Input, Code and Output, with question marks for the unknown parts.
• Traditional approach: work focuses on coding rules
• Machine learning: work focuses on collecting examples
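The contrast between coding rules and collecting examples can be sketched in a few lines. This is a minimal illustration only: the "tall person" task, the 180 cm rule, the example data and all function names are assumptions made for the sketch and do not come from the webinar.

```python
# Traditional approach: a person writes the rule by hand.
def is_tall_rule(height_cm: float) -> bool:
    return height_cm >= 180.0  # hypothetical threshold chosen by the programmer


# Machine learning: the rule is derived from labelled examples.
def learn_threshold(examples: list[tuple[float, bool]]) -> float:
    """Return the candidate threshold that misclassifies the fewest examples."""
    candidates = sorted(height for height, _ in examples)

    def errors(threshold: float) -> int:
        return sum((height >= threshold) != label for height, label in examples)

    return min(candidates, key=errors)


# Hypothetical labelled examples: (height in cm, "is tall" label).
examples = [(160.0, False), (170.0, False), (175.0, False),
            (182.0, True), (190.0, True), (195.0, True)]
learned_threshold = learn_threshold(examples)


def is_tall_learned(height_cm: float) -> bool:
    return height_cm >= learned_threshold
```

In the first function the tester can review the rule itself. In the second, the rule only exists as a consequence of the data, so the examples become part of what must be tested.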

Is AI better?
For which kinds of problems is an AI-based approach superior?
Figures: Li/Johnson/Yeung, CS231n
https://cs231n.github.io/classification/

AI is broken?
Why does well-trained image recognition fail in production?
Figure: a tank classifier labelling images as "tank" or "Not a tank".

Metaphor for AI learning: baby or alien?

AI-specific challenges
How can biased data ruin AI performance?
Figure: Harvard University
https://sitn.hms.harvard.edu/flash/2020/racial-discrimination-in-face-recognition-technology/
Figures: MIT
https://arxiv.org/abs/1901.10002

Is AI different to test? Is AI a 'black box'? Is AI 'fragile'?
Specifics of AI performance
https://www.researchgate.net/figure/One-pixel-attacks-created-with-the-proposed-algorithm-that-successfully-fooled-three_fig3_320609325
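The fragility question can be made concrete with a toy check in the spirit of the one-pixel attacks in the figure referenced above. This is only a sketch under stated assumptions: the "model" is a hypothetical linear scorer over a 9-pixel grayscale image with one over-weighted pixel, not a real neural network, and all names and numbers are invented for illustration.

```python
def classify(image: list[int], weights: list[int], threshold: int) -> int:
    """Hypothetical stand-in classifier: weighted pixel sum against a threshold."""
    score = sum(w * p for w, p in zip(weights, image))
    return 1 if score >= threshold else 0


def one_pixel_flips(image, weights, threshold, values=(0, 255)):
    """List (pixel index, new value) pairs where changing one pixel flips the class."""
    original = classify(image, weights, threshold)
    flips = []
    for index in range(len(image)):
        for value in values:
            if value == image[index]:
                continue
            mutated = image[:index] + [value] + image[index + 1:]
            if classify(mutated, weights, threshold) != original:
                flips.append((index, value))
    return flips


# Assumed toy data: the model relies far too heavily on the last pixel.
weights = [1, 1, 1, 1, 1, 1, 1, 1, 20]
image = [100] * 9
flips = one_pixel_flips(image, weights, threshold=3000)
```

A non-empty result means a single-pixel change is enough to change the output, which is the kind of fragility a tester would want to report.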

Small detour into AI-related terms...
• Features
• Value space
• Labels
• Functions
• Function weights
• Model
• Model training
• Training and testing set
• Fitting error

What are the features? What are the labels?
What is the value space?
These are real-world objects: a person and an image (figures).
These are what we measure from them:
• Person: x1 = 75 kg, x2 = 172 cm, x3 = blue, y = man
• Image: x1 = (105,234,41), x2 = (45,24,44), x3 = (15,4,21), …, x307200 = (15,24,71), y = dog
These are what we feed to the AI model/function:
• Person: x1 = 75, x2 = 172, x3 = 3, y = 0
• Image: x1 = (105,234,41), x2 = (45,24,44), x3 = (15,4,21), …, x307200 = (15,24,71), y = 1
Value space:
• x1 represents a person’s weight, so it can potentially take values from 40 kg to 200 kg
• x2 represents a person’s height, so it can potentially take values from 80 cm to 250 cm
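The step from measurements to model input can be sketched as a small encoding function that reproduces the slide's numeric example (x1 = 75, x2 = 172, x3 = 3, y = 0) and enforces the stated value spaces. Only the code 3 for the colour and 0 for the label come from the slide; the other codes and all names are hypothetical.

```python
# Value spaces from the slide: weight 40..200 kg, height 80..250 cm.
VALUE_SPACE = {"weight_kg": (40, 200), "height_cm": (80, 250)}

# Only "blue" -> 3 ("siniset" on the original slide) and "man" -> 0 ("mies")
# are taken from the slide; the remaining codes are assumptions.
COLOUR_CODES = {"brown": 1, "green": 2, "blue": 3}
LABEL_CODES = {"man": 0, "woman": 1}


def encode_person(weight_kg, height_cm, colour, label):
    """Turn measurements of a real-world object into (features, label)."""
    measurements = {"weight_kg": weight_kg, "height_cm": height_cm}
    for name, value in measurements.items():
        low, high = VALUE_SPACE[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside the value space {low}..{high}")
    features = [weight_kg, height_cm, COLOUR_CODES[colour]]
    return features, LABEL_CODES[label]


features, label = encode_person(75, 172, "blue", "man")
```

Checking the value space at encoding time is one simple way to catch measurement or unit errors before they reach the model.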

What is a function? What are the function weights?
• A mathematical function is a mapping that takes an input x and outputs y
• Examples of functions:
G = m*g → G = f(m)
s = s0 + v0*t + ½*a*t² → s = f(t)
• Every AI algorithm (neural network, regression line, decision tree) is a mathematical function, i.e. f(x) = y
• x is the input representation, i.e. the set of properties (features) that describe the given input
• y is the desired class, i.e. y=0 => ‘this is a cat image’, y=1 => ‘this is a dog image’
• weights (parameters) are the ‘moving parts’ of a function, i.e. numbers that must be fixed
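The idea that weights are the "moving parts" of a fixed function shape can be sketched as follows. The linear classifier and its weight values are assumptions chosen for illustration; the slide does not prescribe a particular model.

```python
def gravity_force(mass_kg: float, g: float = 9.81) -> float:
    """G = m*g: here g plays the role of a fixed weight (parameter)."""
    return mass_kg * g


def linear_classifier(x: list[float], weights: list[float], bias: float) -> int:
    """f(x) = y: the same function shape gives different results for different weights."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else 0


x = [2.0, 1.0]
y_first = linear_classifier(x, weights=[1.0, -1.0], bias=0.0)   # hypothetical weights
y_second = linear_classifier(x, weights=[-1.0, 1.0], bias=0.0)  # same input, other weights
```

Training a model means searching for the weight values; the code of the function itself does not change.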

What are training and testing sets? What is model training?
What is fitting error?
Figure label: "error"
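These three terms can be tied together in one small sketch: split the data, fit the weights on the training set only, and measure the error on both sets. The data points are invented, and a least-squares line with mean squared error is an assumed choice of model and error measure, not something the slide specifies.

```python
def fit_line(points):
    """Model training: least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(points, slope, intercept):
    """Fitting error: average squared gap between prediction and label."""
    return sum((slope * x + intercept - y) ** 2 for x, y in points) / len(points)


# Hypothetical data, roughly y = 2x + 1 with noise.
data = [(0, 1.1), (1, 2.9), (2, 5.2), (3, 6.8),
        (4, 9.1), (5, 10.9), (6, 13.2), (7, 14.8)]
training_set = data[::2]   # used to fix the weights
testing_set = data[1::2]   # held back, never seen during training

slope, intercept = fit_line(training_set)
training_error = mean_squared_error(training_set, slope, intercept)
testing_error = mean_squared_error(testing_set, slope, intercept)
```

The error on the testing set is the more honest estimate of how the model behaves on new data, because the weights were never tuned against it.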

How does AI, for instance a neural net, learn?
https://www.datasciencecentral.com/the-approximation-power-of-neural-networks-with-python-codes/
https://en.wikipedia.org/wiki/Universal_approximation_theorem

Is AI different to test? Is AI a 'black box'? Is AI 'fragile'?
Specifics of AI performance
https://playground.tensorflow.org/

Functional & Non-Functional Characteristics
ISO 25010 Product Quality Model
Functional testing (what the system does):
• Functional Suitability: functional completeness, functional correctness, functional appropriateness
Non-functional testing (how the system does it):
• Performance Efficiency: time behaviour, resource utilisation, capacity
• Compatibility: co-existence, interoperability
• Usability: appropriateness recognizability, learnability, operability, user error protection, user interface aesthetics, accessibility
• Reliability: maturity, availability, fault tolerance, recoverability
• Security: confidentiality, integrity, non-repudiation, accountability, authenticity
• Maintainability: modularity, reusability, analysability, modifiability, testability
• Portability: adaptability, installability, replaceability
© STA Consulting

Risks, Objectives and Acceptance Criteria
Figure: diagram linking ISO/IEC 25010 Quality Characteristics, AI-Specific Quality Characteristics, Perceived Risks, Test Objectives and Acceptance Criteria.
The most important (highest-risk) system characteristics are used to generate test objectives and acceptance criteria for AI-based systems, including:
• flexibility, adaptability and evolution
• autonomy
• probabilistic and non-deterministic systems
• side-effects and reward hacking
• ethics and safety
• inappropriate bias
• transparency, interpretability and explainability

ML Workflow with Explicit Test Activities
Figure: ML workflow with the following phases and activities:
• Understand the Objectives
• Framework & Algorithm Selection: Select a Framework, Select & Build the Algorithm
• Data Preparation & Test: Prepare Data, Input Data Testing
• Model Generation & Test: Model Generation, ML Model Testing
• Deploy the Model
• Use, Monitor & Tune the Model: Use the Model, Monitor & Tune the Model
Labels on the flows between the steps: objectives, framework & algorithm, train/test pipeline, model, tested model, production pipeline, deployed model, feedback.

Input Data Testing
Objective:
• ensure that the data used by the system (for training and prediction) is of the highest quality
Input Data Testing consists of:
• Pipeline Testing
• Data Testing
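A few basic data tests in support of this objective can be sketched as below. The three checks (missing values, values outside the value space, exact duplicates), the field names and the sample records are assumptions for illustration; a real project would choose its checks from its own data risks.

```python
def check_records(records, value_space):
    """Return a list of (record index, field, problem) findings."""
    problems = []
    seen = set()
    for index, record in enumerate(records):
        for field, (low, high) in value_space.items():
            value = record.get(field)
            if value is None:
                problems.append((index, field, "missing"))
            elif not low <= value <= high:
                problems.append((index, field, "out of range"))
        key = tuple(sorted(record.items()))  # field names are unique, so sorting is safe
        if key in seen:
            problems.append((index, None, "duplicate"))
        seen.add(key)
    return problems


# Hypothetical training records and value spaces.
value_space = {"weight_kg": (40, 200), "height_cm": (80, 250)}
records = [
    {"weight_kg": 75, "height_cm": 172},
    {"weight_kg": None, "height_cm": 172},   # missing measurement
    {"weight_kg": 75, "height_cm": 1720},    # likely a unit or typing error
    {"weight_kg": 75, "height_cm": 172},     # duplicate of the first record
]
findings = check_records(records, value_space)
```

The same checks can run both on the training data and on the data arriving at prediction time.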

ML Model Testing
Objective:
• ensure that the generated model meets any functional and non-functional acceptance criteria
ML Model Testing consists of:
• Static Testing
• Dynamic Testing
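A dynamic test against a functional acceptance criterion can be sketched as an accuracy check on a held-back test set. The stand-in model, the test data and the 0.75 threshold are hypothetical; the actual criteria would come from the project's risk analysis.

```python
def accuracy(model, test_set):
    """Share of test examples for which the model output equals the label."""
    correct = sum(model(x) == y for x, y in test_set)
    return correct / len(test_set)


def meets_acceptance_criterion(model, test_set, min_accuracy):
    return accuracy(model, test_set) >= min_accuracy


# Hypothetical stand-in for a trained model: classifies by a height threshold.
def stand_in_model(height_cm):
    return 1 if height_cm >= 180 else 0


# Hypothetical labelled test set: (height in cm, expected class).
test_set = [(160, 0), (175, 0), (178, 1), (185, 1), (195, 1)]
model_accuracy = accuracy(stand_in_model, test_set)
accepted = meets_acceptance_criterion(stand_in_model, test_set, min_accuracy=0.75)
```

Non-functional criteria, such as response time, can be checked in the same pass by measuring them while the test set is run.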

ML Workflow with Life Cycles
Figure: V-model (Requirements Analysis, Architectural Design, Detailed Design, Code, Component Testing, Component Integration Testing, System Testing, Acceptance Testing) shown together with the ML workflow.
Labels in the figure: requirements for ML model; ML model & operational pipeline; requirements for overall system & non-AI components; non-AI components.
The V-model is used as an example only. AI-based systems can be built using any life cycle, but test levels tend to remain the same.

AI-Specific Testing Issues
Several characteristics make the testing of AI-based systems especially challenging, such as:
• self-learning systems
• autonomy and autonomous systems
• probabilistic and non-deterministic systems
• complexity
• automation bias
• test data
• concept drift
• inappropriate bias
• transparency, interpretability and explainability
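For probabilistic and non-deterministic systems, an exact expected output is often unavailable, so one option is to assert on a statistic over many runs with a tolerance. The sketch below assumes a hypothetical stochastic model that should answer 1 about 70 % of the time; the rate, tolerance and seed are illustrative choices.

```python
import random


def stochastic_model(rng: random.Random) -> int:
    """Hypothetical non-deterministic component: returns 1 with probability 0.7."""
    return 1 if rng.random() < 0.7 else 0


def observed_rate(model, runs: int, seed: int) -> float:
    """Run the model many times with a seeded generator so the test is repeatable."""
    rng = random.Random(seed)
    return sum(model(rng) for _ in range(runs)) / runs


def within_tolerance(observed: float, expected: float, tolerance: float) -> bool:
    return abs(observed - expected) <= tolerance


rate = observed_rate(stochastic_model, runs=10_000, seed=42)
passed = within_tolerance(rate, expected=0.7, tolerance=0.03)
```

Seeding makes a failing run reproducible, while the tolerance keeps the test from failing on ordinary random variation.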

Example: Testing for Inappropriate Bias
Static, white-box:
Testing for algorithmic bias:
• can involve analysis during model training, evaluation and tuning
Testing for sample bias:
• can involve reviewing the source of data and the acquisition process
• can involve reviewing the data pre-processing activities
• can be difficult because ML algorithms can use combinations of seemingly unrelated features to infer results (which are biased)
Dynamic, black-box:
• Testing with an independent dataset can often detect bias
• Can involve measuring how changes in inputs affect system outputs for specific groups (similar to explainability testing)
• May be carried out in a production environment, or as part of testing prior to release
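Measuring how changes in inputs affect outputs for specific groups can be sketched as a black-box test that swaps only the group attribute and counts changed decisions. The deliberately biased stand-in model, the attribute names and the records are all hypothetical.

```python
def positive_rate(model, records):
    """Share of records that receive a positive decision."""
    return sum(model(record) for record in records) / len(records)


def flip_sensitivity(model, records, attribute, values):
    """Share of records whose decision changes when only `attribute` is swapped."""
    changed = 0
    for record in records:
        outputs = {model({**record, attribute: value}) for value in values}
        changed += len(outputs) > 1
    return changed / len(records)


# Hypothetical, deliberately biased model: group A needs a lower income for approval.
def biased_model(record):
    limit = 30 if record["group"] == "A" else 50
    return record["income"] >= limit


records = [{"group": group, "income": income}
           for group in ("A", "B") for income in (20, 35, 40, 60)]
group_a = [r for r in records if r["group"] == "A"]
group_b = [r for r in records if r["group"] == "B"]

rate_a = positive_rate(biased_model, group_a)
rate_b = positive_rate(biased_model, group_b)
sensitivity = flip_sensitivity(biased_model, records, "group", ("A", "B"))
```

A sensitivity above zero shows that the group attribute alone can change the decision. In practice the attribute may be encoded indirectly through other features, which this simple swap would not reveal.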

Test Methods and Techniques
• Adversarial Attacks and Data Poisoning
• Pairwise Testing
• Back-to-Back Testing
• A/B Testing
• Metamorphic Testing
• Experience-Based Testing
• Selecting Test Techniques for AI-Based Systems
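Of the techniques listed, metamorphic testing is easy to sketch: instead of knowing the correct output, the test checks a relation between the outputs for an input and a transformed input. The relation used here (mirroring an image must not change its class) and both stand-in models are assumptions for illustration.

```python
def mirror(image):
    """Transformation: flip each row of a grayscale image left to right."""
    return [row[::-1] for row in image]


def metamorphic_violations(model, inputs, transform):
    """Return the inputs for which the transformed input gets a different output."""
    return [x for x in inputs if model(transform(x)) != model(x)]


# Hypothetical model 1: classifies by mean brightness, unaffected by mirroring.
def brightness_model(image):
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels) > 127


# Hypothetical model 2: looks only at the left column, so mirroring can change it.
def left_column_model(image):
    left = [row[0] for row in image]
    return sum(left) / len(left) > 127


images = [
    [[255, 0], [255, 0]],
    [[200, 200], [200, 200]],
    [[10, 20], [30, 40]],
]
robust_violations = metamorphic_violations(brightness_model, images, mirror)
fragile_violations = metamorphic_violations(left_column_model, images, mirror)
```

No expected label is needed for any image, which is why the technique suits systems where a test oracle is hard to obtain.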

Example: Risks → Test Approaches
Figure: an AI-Based System consists of AI Components and Non-AI Components; Risk Analysis informs the Test Approach, which combines Specialized Testing and Conventional Testing.

Test Environments for AI-Based Systems
The test environments for AI-based systems have much in common with those for conventional systems, typically:
• the development environment at unit level
• a production-like test environment at system and acceptance levels
ML models, when tested in isolation, are typically tested within their development framework.

Where to learn more?
• https://www.istqb.org/certifications/artificial-inteligence-tester
• https://www.tieturi.fi/koulutus/istqb-ai-testing/
• 13-16.3.2023

Thank you!
kari.kakkonen@knowit.fi
https://linkedin.com/in/karikakkonen/
mark.sevalnev@knowit.fi
https://www.linkedin.com/in/marksevalnev
