Autonomous Systems: How to
Address the Dilemma between
Autonomy and Safety
Lionel Briand
ASE 2022 Keynote
http://www.lbriand.info
Cyber-Physical Systems
2
Leveson, 2016
Autonomous Systems
‱ Autonomy lies on a spectrum
‱ Fully autonomous systems ↔ advisory systems
‱ Open context: context not fully specified
3
Burton et al., 2019
Why Autonomy?
‱ Reactivity to events, e.g., lunar rover
‱ Alleviate cognitive load in complex tasks, e.g., adaptive
cruise control
‱ New missions without human physical constraints, e.g.,
pilots in fighter jets
‱ Complete specifications difficult to formalize: environmental
complexity as well as continual change
4
Autonomy and Machine Learning
‱ Machine learning (ML) helps with the implementation of the
understanding and decision layers.
‱ ML: (1) infer input-output relationships from samples of intended
behavior, (2) generalize the learned function to unknown and
unforeseen inputs
‱ Risk: unanticipated responses and emergent system behavior
‱ Our focus: autonomous systems that use ML models
5
Semantic Gap
‱ Gap between intended functionality and specified
functionality (Burton et al., 2019).
‱ More challenging for autonomous systems
‱ Causes:
- Complexity and unpredictability of the operational domain
- E.g., Urban environments
6
Semantic Gap
‱ Gap between intended functionality and specified
functionality (Burton et al., 2019).
‱ More challenging for autonomous systems
‱ Causes:
- Complexity and unpredictability of the system itself
- E.g., Heterogeneous sensing channels
7
Semantic Gap
‱ Gap between intended functionality and specified
functionality (Burton et al., 2019).
‱ More challenging for autonomous systems
‱ Causes:
- Transfer of decision function to the system
- E.g., ML functions do not deliver clear-cut answers
8
Paradox
10
‱ “The problem of deriving a suitable specification of the
intended behavior is instead transferred to the problem
of demonstrating that the implemented (learned)
behaviour meets the intent.” Burton et al. 2019
‱ Main challenge: Definition and validation of adequate
system safety requirements
Assurance Cases
‱ A valid, evidence-based justification for a set of claims
about the safety of a system for a given function over its
operational context
11
Burton et al., 2019
Example: Autonomous Taxiing
12
Asaadi et al., 2020
CNN:
‱ Cross-track error (CTE)
‱ Heading error (HE)
Example: Autonomous Taxiing
13
‱ Safety property: Lateral runway excursions
‱ Safety requirement: the aircraft CTE shall not exceed a
specified offset whilst taxiing
‱ Hazard: aligning the aircraft nose with a different runway
marking instead of the centerline
‱ Mitigation: retraining the ML component using training data
augmented with appropriate test data exhibiting violations
Assurance Case
14
GSN notation:
Goals, strategies,
evidence
Structured
arguments backed
up by evidence
Evidence:
‱ artifacts
‱ verification
Asaadi et al., 2020
Safety as a Control Problem
15
‱ Safety should be
treated as an emergent
property of the system
(Leveson, 2016)
‱ Paradigm Change:
Run-time Assurance Architecture
16
‱ HiL simulator and iron bird
‱ CNN outputs used for run-time monitoring
‱ Contingency actions
Asaadi et al., 2020
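The run-time monitoring idea can be sketched minimally. The thresholds and action names below are illustrative assumptions, not values from Asaadi et al.:

```python
def monitor_step(cte, he, cte_limit=1.5, he_limit=10.0):
    """Compare CNN outputs (cross-track error in meters, heading error in
    degrees) against hypothetical safety thresholds and pick an action."""
    if abs(cte) > cte_limit or abs(he) > he_limit:
        return "contingency"  # e.g., hand control to a backup controller
    return "nominal"          # continue autonomous taxiing
```

In the actual architecture, the contingency branch would trigger the backup channel exercised on the HiL simulator and iron bird.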
ML Challenges
‱ Robustness to adversarial or unexpected inputs
‱ Uncertainty of predictions
‱ Explainability: provides insights in a human interpretable way
‱ Verification: formal (proof) or experimental (test)
17
Tambon et al., 2022
Safety Verification
‱ Ideally:
‱ Formal safety guarantees
‱ Practical assumptions (about input space, model, properties, …)
‱ Scalable (e.g., to large models and systems)
‱ But
‱ We cannot have it all (e.g., high dimensionality, many parameters)
‱ Safety is not a model property, rather a system one
18
Safety Verification
‱ Formal analysis, e.g., reachability analysis
‱ Focus on robustness to perturbations
‱ Guarantees about models (not systems) under very restrictive
assumptions (e.g., model architecture, input space shape, reasonable
over-approximation of output space)
‱ Scalability issues
‱ Testing-based analysis, e.g., search-based testing
‱ Heuristics for input space exploration
‱ No guarantees, but no restrictive assumptions and more scalable
‱ Smart combination?
19
ML-Related Assurance Questions
‱ Model level: Under which input conditions is an ML component
classifying/predicting (in)correctly?
‱ System level: Under which conditions is an ML-component
(un)safe for the system?
‱ Integration level: Are there possible undesirable interactions
among (ML) components?
‱ Context: No specifications or code for ML components,
increasingly large models (DL, RL)
20
Summary
‱ Autonomous safety functions can help achieve higher safety,
e.g., emergency braking
‱ However, they also entail risks; (1) design-time assurance
cases and (2) run-time safety monitoring play a key role in
alleviating those risks
‱ Automated support to collect appropriate evidence (1) and to
guide monitoring (2) is required in the context of ML-enabled
autonomous systems
21
Example Projects and
Lessons Learned
22
Model-Level Testing and
Analysis
23
Test Inputs: Adversarial or
Natural?
‱ Adversarial inputs: Focus on robustness, e.g., noise or attacks
‱ Natural inputs: Focus on functional aspects, e.g., functional
safety
24
Example: Key-points Detection
‱ DNNs used for key-points detection
in images
‱ Many applications, e.g., face
recognition
‱ Testing: Find a test suite that causes
the DNN to poorly predict as many
key-points as possible within a time
budget
‱ Images generated by a simulator
28
Ground truth
Predicted
Example Application
‱ Drowsiness or gaze detection based on interior camera monitoring the driver
‱ In the drowsiness or gaze detection problem, each Key-Point (KP) may be highly
important for safety
‱ Each KP leads to a test objective
‱ For our subject DNN, we have 27 test objectives
‱ Goal: Cause the DNN to mispredict as many key-points as possible
‱ Solution: Many-objective search algorithms (based on genetic algorithms)
combined with simulator
30
Ul Haq et al., 2021
Overview
31
[Diagram: the Input Generator (search) passes an input vector to the Simulator, which produces a test image; the DNN predicts key-point positions; the Fitness Calculator compares predicted against actual key-point positions to compute a fitness score (error value), which guides the search toward the most critical test inputs.]
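A minimal sketch of the fitness calculation, assuming a Euclidean distance between actual and predicted key-point positions divided by a reference length (the exact normalization used in the paper may differ):

```python
import math

def keypoint_fitness(actual, predicted, norm):
    """Per-key-point error: Euclidean distance between actual and predicted
    positions, divided by a normalization length. The search maximizes these
    values, i.e., it looks for inputs the DNN mispredicts most severely."""
    return [math.dist(a, p) / norm for a, p in zip(actual, predicted)]
```

Each key-point's error is a separate objective, which is why a many-objective search algorithm is used.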
Results
‱ Our approach is effective in generating test suites that cause the DNN to severely
mispredict more than 93% of all key-points on average
‱ Not all mispredictions can be considered failures… (e.g., shadows)
‱ We must know when the DNN cannot be expected to be accurate and have contingency
measures
‱ Some key-points are more severely mispredicted than others; detailed analysis revealed
two reasons:
‱ Under-representation of some key-points (hidden) in the training data
‱ Large variation in the shape and size of the mouth across different 3D models (more
training needed)
32
Interpretation
‱ Regression trees to predict accuracy based on simulation parameters
‱ Enable detailed analysis to find the root causes of high Normalized Error (NE) values, e.g., shadow
on the location of KP26 is the cause of high NE values
‱ Regression trees show excellent accuracy, reasonable size.
‱ Amenable to risk analysis, useful safety insights, and run-time contingency plans
33
Representative rules derived from the decision tree for KP26
(M: Model-ID, P: Pitch, R: Roll, Y: Yaw, NE: Normalized Error):
Image Characteristics Condition → NE
M = 9 ∧ P < 18.41 → 0.04
M = 9 ∧ P ≄ 18.41 ∧ R < −22.31 ∧ Y < 17.06 → 0.26
M = 9 ∧ P ≄ 18.41 ∧ R < −22.31 ∧ 17.06 ≀ Y < 19 → 0.71
M = 9 ∧ P ≄ 18.41 ∧ R < −22.31 ∧ Y ≄ 19 → 0.36
(A) A test image satisfying the first condition: NE = 0.013
(B) A test image satisfying the third condition: NE = 0.89
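Rules like these can be deployed directly as a risk check. The sketch below hard-codes the four representative rules shown for KP26, assuming the condition symbols map to the legend in order (M, P, R, Y):

```python
def ne_for_kp26(model_id, pitch, roll, yaw):
    """Estimate the Normalized Error for KP26 from the representative
    regression-tree rules; returns None outside the shown branches."""
    if model_id != 9:
        return None                  # rules shown only cover model 9
    if pitch < 18.41:
        return 0.04
    if roll < -22.31:
        if yaw < 17.06:
            return 0.26
        if yaw < 19:
            return 0.71
        return 0.36
    return None                      # remaining branches not on the slide
```

Such a rule evaluator is one way the tree's insights could feed a run-time contingency plan.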
Real Images
‱ Test selection requires a different approach than with a
simulator
‱ Labeling costs are significant
36
DNN Structural Coverage Criteria
37
Chen et al. 2020
‱ Neuron coverage
‱ How neurons are
activated
‱ Many variants
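As a concrete illustration, basic neuron coverage can be computed from recorded activations. The threshold is an assumption here; the many variants differ in how activations are thresholded and normalized:

```python
def neuron_coverage(activations_per_input, threshold=0.0):
    """Fraction of neurons activated above `threshold` by at least one
    test input. `activations_per_input` is a list of per-input vectors,
    one activation value per neuron."""
    n_neurons = len(activations_per_input[0])
    covered = set()
    for activations in activations_per_input:
        for i, a in enumerate(activations):
            if a > threshold:
                covered.add(i)
    return len(covered) / n_neurons
```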
Limitations
‱ Require access to the DNN internals and sometimes the
training set. Not realistic in many practical settings.
‱ There is a weak correlation between coverage and
misclassification or poor predictions for natural inputs
‱ Also many questions regarding the studies focused on
adversarial inputs…
38
Diversity-driven Test Selection
‱ Test inputs are real images
‱ Black-box approach based on measuring the diversity of
test inputs.
‱ Scalable selection
‱ The more diverse, the more likely test inputs are to reveal
faults
39
Aghababaeyan et al., 2022
Geometric Diversity (GD)
‱ Given a dataset X and its corresponding feature vectors V,
the geometric diversity of a subset S ⊆ X is defined as the
hyper-volume of the parallelepiped spanned by the rows of
Vs, i.e., feature vectors of items in S, where the larger the
volume, the more diverse is the feature space of S
40
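The definition translates directly: for the subset's feature matrix Vs, the hyper-volume equals sqrt(det(Vs Vsᔀ)). A sketch using NumPy:

```python
import numpy as np

def geometric_diversity(V_s):
    """Geometric diversity of a subset S: the hyper-volume of the
    parallelepiped spanned by the rows of V_s (S's feature vectors),
    computed as sqrt(det(V_s @ V_s.T)). Larger means more diverse."""
    V_s = np.asarray(V_s, dtype=float)
    gram = V_s @ V_s.T               # Gram matrix of the feature vectors
    return float(np.sqrt(np.linalg.det(gram)))
```

Near-duplicate feature vectors make the Gram matrix nearly singular, driving GD toward zero, which is exactly why maximizing it favors diverse test inputs.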
Extracting Image Features
‱ VGG16 is a convolutional neural network trained on a
subset of the ImageNet dataset, a collection of over 14
million images belonging to 22,000 categories.
41
Correlation with Faults
Correlation between geometric diversity and faults
Pareto Front Optimization
44
‱ Black-box test selection to detect as many diverse faults
as possible for a given test budget
‱ Search-based Approach: Multi-Objective Genetic Search
(NSGA-II)
‱ Two objectives (Max):
‱ diversity
‱ uncertainty (e.g., Gini)
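Of the two objectives, the uncertainty score has a particularly simple form. A sketch of a Gini-based objective over a model's output probabilities (the diversity objective being the GD measure defined earlier):

```python
def gini_uncertainty(probs):
    """Gini impurity of a softmax output: 1 - sum(p^2). It is 0 for a
    fully confident prediction and grows as probability mass spreads
    across classes, i.e., as the model becomes less certain."""
    return 1.0 - sum(p * p for p in probs)
```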
HUDD
45
‱ How to help perform risk analysis with real images?
‱ Rely on heatmaps to generate clusters of images leading to failures due to the
same root cause (common characteristics of the images)
‱ Enable engineers to ensure safety by introducing countermeasures
[Diagram: error-inducing test-set images → Step 1: heatmap-based clustering → root-cause clusters (C1, C2, C3) → Step 2: inspection of a subset of each cluster's elements]
Fahmy et al. 2021
‱ Classification
‱ Gaze Detection
46
[Figure: gaze-angle classes at 22.5° intervals (0°–337.5°), grouped into regions: Top Left, Top Center, Top Right, Middle Left, Middle Right, Bottom Left, Bottom Center, Bottom Right]
Example Application
Clusters identify different problems:
‱ Cluster 1 (angle ~157.5): borderline cases
‱ Cluster 2 (eye middle center): incomplete set of classes
‱ Cluster 3 (near closed eyes): incomplete training set
49
SEDE: Simulator-based Explanations
for DNN failurEs
50
‱ Built on HUDD
‱ Generates readable
descriptions of
failure-inducing,
real-world images
‱ Effective retraining
‱ Leverages
availability of
simulators
Fahmy et al., 2022
SAFE: Black-Box Approach
51
[Pipeline: error-inducing images → data preprocessing → feature extraction → dimensionality reduction → clustering → root-cause clusters → unsafe-set selection → retraining → improved DNN]
‱ Black-box alternative to HUDD
‱ No need to extend the DNN (LRP)
‱ Reduces training time and memory usage
‱ More accurate clustering (density based,
DBSCAN)
Attaoui et al., 2022
Reinforcement Learning Testing
and Safety
52
Reward and Functional Faults
54
‱ Testing goal: Detect and explain reward and functional faults
‱ A pole is attached to a cart, and the goal is to move the cart right and left to
keep the pole from falling.
‱ Reward fault: the accumulated reward is less than a threshold, e.g., the pole
falls down in the first 70 timesteps
‱ Functional fault: the pole is stable but, regardless of the accumulated reward,
the cart moves beyond the 2.4 m distance limit and the episode terminates
Functional fault Reward fault
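Using the illustrative thresholds from this slide, an episode can be labelled as follows (a sketch only; STARLA's actual fault oracle is defined over full episodes):

```python
def classify_episode(total_reward, final_cart_pos,
                     reward_threshold=70.0, pos_limit=2.4):
    """Label a cart-pole episode with the two fault types above."""
    if abs(final_cart_pos) > pos_limit:
        return "functional fault"    # cart left the track
    if total_reward < reward_threshold:
        return "reward fault"        # pole fell too early
    return "pass"
```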
STARLA: Objectives
‱ Detect as quickly
as possible diverse
faulty episodes
‱ Accurately
characterize faulty
episodes
‱ STARLA: Search
and ML based
55
Zolfagharian et al. 2022
System-Level Testing and
Analysis
56
Testing via Physics-based
Simulation
58
ADAS
(SUT)
Simulator (Matlab/Simulink)
Model
(Matlab/Simulink)
▪ Physical plant (vehicle / sensors / actuators)
▪ Other cars
▪ Pedestrians
▪ Environment (weather / roads / traffic signs)
Test input
Test output
time-stamped output
Reinforcement Learning for ADAS
Testing
‱ Use RL to change the environment in a realistic manner
with the objective of triggering safety violations
59
Action (Behavior of environment actors, e.g., vehicle in front)
Reward (e.g., based on distance from vehicle in front)
State/Next State (Locations and conditions about the
environment, e.g., collision)
RL Agent
RL-Environment
Simulator
(CARLA)
Control
(Image, location, …)
ADAS
Example Violation
‱ Violation: Ego Vehicle should not collide with other vehicle
‱ Vehicle-in-front slows down suddenly and then moves to the right
‱ Possible reason: Agent was not trained with such episodes
61
Car View Top View
System Testing Challenges
62
Challenges: a large input space, many safety requirements, and
computationally-intensive simulation
To address these challenges, we propose SAMOTA (Surrogate-Assisted
Many-Objective Testing Approach), leveraging many-objective
search and Surrogate Models (SMs)
Ul Haq et al. 2022
Surrogate Models
‱ Surrogate model: Model that mimics the simulator, to a
certain extent, while being much less computationally
expensive
‱ Research: Combine search with surrogate modeling to
decrease the computational cost of testing (Ul Haq et al. 2022)
63
Polynomial Regression (PR), Radial Basis Function (RBF), Kriging (KR)
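A minimal, one-dimensional illustration of the idea using a polynomial-regression surrogate via NumPy's polyfit (SAMOTA's actual surrogates are multi-dimensional and include RBF and Kriging models):

```python
import numpy as np

def fit_surrogate(inputs, fitnesses, degree=2):
    """Fit a polynomial-regression surrogate to (input, fitness) pairs
    collected from expensive simulator runs; the returned callable
    predicts fitness at negligible cost."""
    return np.poly1d(np.polyfit(inputs, fitnesses, degree))

# Five "simulator" evaluations of a quadratic fitness landscape
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [x * x for x in xs]
surrogate = fit_surrogate(xs, ys)
```

The search then queries the surrogate thousands of times and only runs the real simulator on the most promising or most uncertain candidates.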
SAMOTA
64
Steps:
1. Initialization
2. Global search
3. Local search
4. Global search
5. Local search
6. …
[Diagram: initialisation executes the simulator to seed a database; global SMs feed a many-objective search algorithm yielding the most critical and most uncertain test cases, which are executed on the simulator; clustering of top points builds a local SM per cluster; a single-objective search over each local SM yields further critical test cases; the result is a minimal test suite.]
ML Impact on System Safety
‱ Inherent uncertainty in ML models
‱ Test mitigation mechanisms in
systems to handle misclassifications
or mispredictions
‱ Goal: We want to learn, as accurately
as possible, when an ML component
leads to system safety violations in
terms of inputs and outputs
‱ Applications: This is expected to help
guide and focus the testing of ML
components or implement safety
monitors for them.
65
Learn the Safety Envelope
Monitoring: How far are we from the safety envelope
for a given violation probability?
66
[Plot: the input/output space of the ML component partitioned into safe and unsafe regions by the learnt envelope]
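A deliberately tiny, one-dimensional sketch of learning such an envelope from labelled observations of an ML component's output (a real envelope is a multi-dimensional region with an associated violation probability):

```python
def learn_safe_interval(observations):
    """Learn the tightest interval containing every output value that was
    observed to be safe; `observations` is a list of (value, is_safe)."""
    safe = [v for v, is_safe in observations if is_safe]
    return min(safe), max(safe)

def in_envelope(value, interval):
    """Run-time check: flag values outside the learnt interval as unsafe."""
    lo, hi = interval
    return lo <= value <= hi
```

At run time, outputs falling outside the learnt envelope would trigger the system's mitigation mechanisms.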
Conclusions
67
Safety Analysis
‱ The adoption of machine learning to enable autonomy raises new
safety challenges and amplifies existing ones
‱ Assurance cases (1) + run-time assurance architecture (2)
‱ Automated test support to collect evidence (1) and learn safety
conditions to monitor (2)
‱ Approaches based on metaheuristic search and machine learning
have been shown to be practical, effective, and reasonably efficient
‱ Limited industrial experience properly reported on these topics
68
Automated Testing
‱ Remains key mechanism to provide safety evidence
‱ Different levels of testing: model, integration, system
‱ Automation is key at all levels and requires different strategies
‱ Strategy: evolutionary computing and machine learning
‱ Scalability is the main challenge: simulations, surrogate models, …
‱ More work needed beyond stateless models (DNN): reinforcement
learning, …
69
Autonomous Systems: How to
Address the Dilemma between
Autonomy and Safety
Lionel Briand
ASE 2022 Keynote
http://www.lbriand.info
References
72
Selected References
‱ Ul Haq et al. "Automatic Test Suite Generation for Key-points Detection DNNs Using Many-Objective Search" ACM International
Symposium on Software Testing (ISSTA), 2021
‱ Ul Haq et al., “Efficient Online Testing for DNN-Enabled Systems using Surrogate-Assisted and Many-Objective Optimization”
IEEE/ACM ICSE 2022
‱ Fahmy et al. "Supporting DNN Safety Analysis and Retraining through Heatmap-based Unsupervised Learning" IEEE Transactions
on Reliability, Special section on Quality Assurance of Machine Learning Systems, 2021
‱ Fahmy et al. "Simulator-based explanation and debugging of hazard-triggering events in DNN-based safety-critical systems”,
ArXiv report, https://arxiv.org/pdf/2204.00480.pdf, ACM TOSEM, 2022
‱ Attaoui et al., “Black-box Safety Analysis and Retraining of DNNs based on Feature Extraction and Clustering”, ArXiv report,
https://arxiv.org/pdf/2201.05077v1.pdf, ACM TOSEM, 2022
‱ Aghababaeyan et al., “Black-Box Testing of Deep Neural Networks through Test Case Diversity”,
https://arxiv.org/pdf/2112.12591.pdf
‱ Zolfagharian et al., “Search-Based Testing Approach for Deep Reinforcement Learning Agents”,
https://arxiv.org/pdf/2206.07813.pdf
73
Selected Safety References
‱ Asaadi et al., “Assured Integration of Machine Learning-based
Autonomy on Aviation Platforms”, 2020 AIAA/IEEE 39th
Digital Avionics Systems Conference (DASC)
‱ Burton et al., “Mind the gaps: Assuring the safety of autonomous
systems from an engineering, ethical, and legal perspective”,
Artificial Intelligence (Elsevier), 2019
‱ Tambon et al., “How to certify machine learning based
safety-critical systems? A systematic literature review”,
Automated Software Engineering (Springer), 2022
‱ Leveson, “Engineering a safer world: Systems thinking applied to
safety”, The MIT Press, 2016.
74
Selected ML Testing References
‱ Goodfellow et al. "Explaining and harnessing adversarial examples." arXiv preprint arXiv:1412.6572 (2014).
‱ Zhang et al. "DeepRoad: GAN-based metamorphic testing and input validation framework for autonomous driving systems." In 33rd
IEEE/ACM International Conference on Automated Software Engineering (ASE), 2018.
‱ Tian et al. "DeepTest: Automated testing of deep-neural-network-driven autonomous cars." In Proceedings of the 40th
international conference on software engineering, 2018.
‱ Li et al. “Structural Coverage Criteria for Neural Networks Could Be Misleading”, IEEE/ACM 41st International Conference on
Software Engineering: New Ideas and Emerging Results (NIER)
‱ Kim et al. "Guiding deep learning system testing using surprise adequacy." In IEEE/ACM 41st International Conference on Software
Engineering (ICSE), 2019.
‱ Ma et al. "DeepMutation: Mutation testing of deep learning systems." In 2018 IEEE 29th International Symposium on Software
Reliability Engineering (ISSRE), 2018.
‱ Zhang et al. "Machine learning testing: Survey, landscapes and horizons." IEEE Transactions on Software Engineering (2020).
‱ Riccio et al. "Testing machine learning based systems: a systematic mapping." Empirical Software Engineering 25, no. 6 (2020)
‱ Gerasimou et al., “Importance-Driven Deep Learning System Testing”, IEEE/ACM 42nd International Conference on Software
Engineering, 2020
75

More Related Content

PPTX
ISO/PAS 21448 (SOTIF) in the Development of ADAS and Autonomous Vehicles
PDF
Azure Lab Services.pdf
PPTX
Testing object oriented software.pptx
PPTX
Software Quality Assurance
PDF
V2X Communications: Getting our Cars Talking
PDF
MagicEye Driver Fatigue Monitoring System
PDF
Sikuli script
PDF
Cellular V2X
ISO/PAS 21448 (SOTIF) in the Development of ADAS and Autonomous Vehicles
Azure Lab Services.pdf
Testing object oriented software.pptx
Software Quality Assurance
V2X Communications: Getting our Cars Talking
MagicEye Driver Fatigue Monitoring System
Sikuli script
Cellular V2X

What's hot (20)

PPTX
Java Swing
PPTX
AI Testing What Why and How To Do It?
 
PDF
Types of Software Testing | Edureka
PPTX
Software testing
PPTX
PROGRESS OF AUTOSAR STANDARDS FOR FUTURE INTELLIGENT VEHICLES
 
PPTX
Introduction to ASPICE
PDF
An integrative solution towards SOTIF and AV safety
PDF
Autonomous Vehicles: Technologies, Economics, and Opportunities
PPT
Corba
PPTX
PostGIS - National Education Center for GIS: Open Source GIS
PPT
Component based models and technology
ODP
Software testing ppt
PPT
Chapter 9 Testing Strategies.ppt
PPT
Uml package diagram
PDF
Structure chart
PPTX
ISO 26262 introduction
PPTX
Design Pattern in Software Engineering
PPT
Parking Guidance Systems
PPTX
Automotive Hacking
PDF
Automotive Infotainment Test Solution or In-Vehicle Infotainment Testing (IVI...
Java Swing
AI Testing What Why and How To Do It?
 
Types of Software Testing | Edureka
Software testing
PROGRESS OF AUTOSAR STANDARDS FOR FUTURE INTELLIGENT VEHICLES
 
Introduction to ASPICE
An integrative solution towards SOTIF and AV safety
Autonomous Vehicles: Technologies, Economics, and Opportunities
Corba
PostGIS - National Education Center for GIS: Open Source GIS
Component based models and technology
Software testing ppt
Chapter 9 Testing Strategies.ppt
Uml package diagram
Structure chart
ISO 26262 introduction
Design Pattern in Software Engineering
Parking Guidance Systems
Automotive Hacking
Automotive Infotainment Test Solution or In-Vehicle Infotainment Testing (IVI...
Ad

Similar to Autonomous Systems: How to Address the Dilemma between Autonomy and Safety (20)

PDF
Applications of Search-based Software Testing to Trustworthy Artificial Intel...
PDF
Testing Machine Learning-enabled Systems: A Personal Perspective
PDF
Automated Testing and Safety Analysis of Deep Neural Networks
PDF
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
PDF
Adobe Audition Crack FRESH Version 2025 FREE
PDF
Functional Safety in ML-based Cyber-Physical Systems
PDF
Approach AI assurance
PDF
Safety Verification of Deep Neural Networks_.pdf
PPTX
[DSC Croatia 22] Towards Classification Trustworthiness in Safety-Critical Au...
PDF
“Building an Autonomous Detect-and-Avoid System for Commercial Drones,” a Pre...
PDF
Black-box Safety Analysis and Retraining of DNNs based on Feature Extraction ...
PPTX
DSML 2021 Keynote: Intelligent Software Engineering: Working at the Intersect...
PDF
Enzo Fenoglio (Cisco) Beyond Deep Learning – How to be ahead of tomorrow
 
PDF
Leveraging Artificial Intelligence Processing on Edge Devices
 
PDF
Keynote presentation at DeepTest Workshop 2025
PPTX
rsec2a-2016-jheaton-morning
PPTX
Explainability First! Cousteauing the Depths of Neural Networks
PPTX
Introduction to AI Safety (public presentation).pptx
PDF
Machine Learning: Past, Present and Future - by Tom Dietterich
PDF
Quant university MRM and machine learning
Applications of Search-based Software Testing to Trustworthy Artificial Intel...
Testing Machine Learning-enabled Systems: A Personal Perspective
Automated Testing and Safety Analysis of Deep Neural Networks
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Adobe Audition Crack FRESH Version 2025 FREE
Functional Safety in ML-based Cyber-Physical Systems
Approach AI assurance
Safety Verification of Deep Neural Networks_.pdf
[DSC Croatia 22] Towards Classification Trustworthiness in Safety-Critical Au...
“Building an Autonomous Detect-and-Avoid System for Commercial Drones,” a Pre...
Black-box Safety Analysis and Retraining of DNNs based on Feature Extraction ...
DSML 2021 Keynote: Intelligent Software Engineering: Working at the Intersect...
Enzo Fenoglio (Cisco) Beyond Deep Learning – How to be ahead of tomorrow
 
Leveraging Artificial Intelligence Processing on Edge Devices
 
Keynote presentation at DeepTest Workshop 2025
rsec2a-2016-jheaton-morning
Explainability First! Cousteauing the Depths of Neural Networks
Introduction to AI Safety (public presentation).pptx
Machine Learning: Past, Present and Future - by Tom Dietterich
Quant university MRM and machine learning
Ad

More from Lionel Briand (20)

PDF
LTM: Scalable and Black-box Similarity-based Test Suite Minimization based on...
PDF
TEASMA: A Practical Methodology for Test Adequacy Assessment of Deep Neural N...
PDF
Automated Test Case Repair Using Language Models
PDF
FlakyFix: Using Large Language Models for Predicting Flaky Test Fix Categorie...
PDF
Precise and Complete Requirements? An Elusive Goal
PDF
Large Language Models for Test Case Evolution and Repair
PDF
Metamorphic Testing for Web System Security
PDF
Simulator-based Explanation and Debugging of Hazard-triggering Events in DNN-...
PDF
Fuzzing for CPS Mutation Testing
PDF
Data-driven Mutation Analysis for Cyber-Physical Systems
PDF
Many-Objective Reinforcement Learning for Online Testing of DNN-Enabled Systems
PDF
ATM: Black-box Test Case Minimization based on Test Code Similarity and Evolu...
PDF
PRINS: Scalable Model Inference for Component-based System Logs
PDF
Revisiting the Notion of Diversity in Software Testing
PDF
Mathematicians, Social Scientists, or Engineers? The Split Minds of Software ...
PDF
Reinforcement Learning for Test Case Prioritization
PDF
Mutation Analysis for Cyber-Physical Systems: Scalable Solutions and Results ...
PDF
On Systematically Building a Controlled Natural Language for Functional Requi...
PDF
Efficient Online Testing for DNN-Enabled Systems using Surrogate-Assisted and...
PDF
Guidelines for Assessing the Accuracy of Log Message Template Identification ...
LTM: Scalable and Black-box Similarity-based Test Suite Minimization based on...
TEASMA: A Practical Methodology for Test Adequacy Assessment of Deep Neural N...
Automated Test Case Repair Using Language Models
FlakyFix: Using Large Language Models for Predicting Flaky Test Fix Categorie...
Precise and Complete Requirements? An Elusive Goal
Large Language Models for Test Case Evolution and Repair
Metamorphic Testing for Web System Security
Simulator-based Explanation and Debugging of Hazard-triggering Events in DNN-...
Fuzzing for CPS Mutation Testing
Data-driven Mutation Analysis for Cyber-Physical Systems
Many-Objective Reinforcement Learning for Online Testing of DNN-Enabled Systems
ATM: Black-box Test Case Minimization based on Test Code Similarity and Evolu...
PRINS: Scalable Model Inference for Component-based System Logs
Revisiting the Notion of Diversity in Software Testing
Mathematicians, Social Scientists, or Engineers? The Split Minds of Software ...
Reinforcement Learning for Test Case Prioritization
Mutation Analysis for Cyber-Physical Systems: Scalable Solutions and Results ...
On Systematically Building a Controlled Natural Language for Functional Requi...
Efficient Online Testing for DNN-Enabled Systems using Surrogate-Assisted and...
Guidelines for Assessing the Accuracy of Log Message Template Identification ...

Recently uploaded (20)

PDF
Internet Downloader Manager (IDM) Crack 6.42 Build 41
PPTX
Materi-Enum-and-Record-Data-Type (1).pptx
PPTX
Transform Your Business with a Software ERP System
PDF
Audit Checklist Design Aligning with ISO, IATF, and Industry Standards — Omne...
PPTX
ISO 45001 Occupational Health and Safety Management System
PPTX
ai tools demonstartion for schools and inter college
PDF
Complete React Javascript Course Syllabus.pdf
PDF
Adobe Illustrator 28.6 Crack My Vision of Vector Design
PDF
2025 Textile ERP Trends: SAP, Odoo & Oracle
PDF
Softaken Excel to vCard Converter Software.pdf
PDF
System and Network Administration Chapter 2
PDF
Which alternative to Crystal Reports is best for small or large businesses.pdf
PPTX
Online Work Permit System for Fast Permit Processing
PPTX
Essential Infomation Tech presentation.pptx
PDF
T3DD25 TYPO3 Content Blocks - Deep Dive by André Kraus
DOCX
The Five Best AI Cover Tools in 2025.docx
PDF
medical staffing services at VALiNTRY
PDF
System and Network Administraation Chapter 3
PDF
How to Choose the Right IT Partner for Your Business in Malaysia
PPTX
Oracle E-Business Suite: A Comprehensive Guide for Modern Enterprises
Internet Downloader Manager (IDM) Crack 6.42 Build 41
Materi-Enum-and-Record-Data-Type (1).pptx
Transform Your Business with a Software ERP System
Audit Checklist Design Aligning with ISO, IATF, and Industry Standards — Omne...
ISO 45001 Occupational Health and Safety Management System
ai tools demonstartion for schools and inter college
Complete React Javascript Course Syllabus.pdf
Adobe Illustrator 28.6 Crack My Vision of Vector Design
2025 Textile ERP Trends: SAP, Odoo & Oracle
Softaken Excel to vCard Converter Software.pdf
System and Network Administration Chapter 2
Which alternative to Crystal Reports is best for small or large businesses.pdf
Online Work Permit System for Fast Permit Processing
Essential Infomation Tech presentation.pptx
T3DD25 TYPO3 Content Blocks - Deep Dive by André Kraus
The Five Best AI Cover Tools in 2025.docx
medical staffing services at VALiNTRY
System and Network Administraation Chapter 3
How to Choose the Right IT Partner for Your Business in Malaysia
Oracle E-Business Suite: A Comprehensive Guide for Modern Enterprises

Autonomous Systems: How to Address the Dilemma between Autonomy and Safety

  • 1. Autonomous Systems: How to Address the Dilemma between Autonomy and Safety Lionel Briand ASE 2022 Keynote http://www.lbriand.info
  • 3. Autonomous Systems ‱ Autonomy lies on a spectrum ‱ Fully autonomous systems Ăł advisory systems ‱ Open context: context not fully specified 3 Burton et al., 2019
  • 4. Why Autonomy? ‱ Reactivity to events, e.g., lunar rover ‱ Alleviate cognitive load in complex tasks. E.g., adaptive cruise control ‱ New missions without human physical constraints, e.g., pilots in fighter jets ‱ Complete specifications difficult to formalize: environmental complexity as well as continual change 4
  • 5. Autonomy and Machine Learning ‱ Machine learning (ML) helps with the implementation of the understanding and decision layers. ‱ ML: (1) infer input-output relationships from samples of intended behavior, (2) generalize the learned function to unknown and unforeseen inputs ‱ Risk: unanticipated responses and emergent system behavior ‱ Our focus: autonomous systems that use ML models 5
  • 6. Semantic Gap ‱ Gap between intended functionality and specified functionality (Burton et al., 2019). ‱ More challenging for autonomous systems ‱ Causes: - Complexity and unpredictability of the operational domain - E.g., Urban environments 6
  • 7. Semantic Gap ‱ Gap between intended functionality and specified functionality (Burton et al., 2019). ‱ More challenging for autonomous systems ‱ Causes: - Complexity and unpredictability of the system itself - E.g., Heterogeneous sensing channels 7
  • 8. Semantic Gap ‱ Gap between intended functionality and specified functionality (Burton et al., 2019). ‱ More challenging for autonomous systems ‱ Causes: - Transfer of decision function to the system - E.g., ML functions do not deliver clear-cut answers 8
  • 9. Paradox 10 ‱ “The problem of deriving a suitable specification of the intended behavior is instead transferred to the problem of demonstrating that the implemented (learned) behaviour meets the intent.” Burton et al. 2019 ‱ Main challenge: Definition and validation of adequate system safety requirements
  • 10. Assurance Cases ‱ A valid, evidence-based justification for a set of claims about the safety of a system for a given function over its operational context 11 Burton et al., 2019
  • 11. Example: Autonomous Taxiing 12 Asaadi et al., 2020 CNN: ‱ Cross track error (CTE) ‱ Heading Error (HE)
  • 12. Example: Autonomous Taxiing 13 ‱ Safety property: Lateral runway excursions ‱ Safety requirement: the aircraft CTE shall not exceed a specified offset whilst taxiing ‱ Hazard: aligning the aircraft nose with a different runway marking instead of the centerline ‱ Mitigation: retraining the ML component using training data augmented with appropriate test data exhibiting violations
  • 13. Assurance Case 14 GSN notation: Goals, strategies, evidence Structured arguments backed up by evidence Evidence: ‱ artifacts ‱ verification Asaadi et al., 2020
  • 14. Safety as a Control Problem 15 ‱ Safety should be treated as an emergent property of the system (Leveson, 2016) ‱ Paradigm Change:
  • 15. Run-time Assurance Architecture 16 ‱ HiL simulator and iron bird ‱ CNN outputs used for run-time monitoring ‱ Contingency actions Asaadi et al., 2020
  • 16. ML Challenges ‱ Robustness to adversarial or unexpected inputs ‱ Uncertainty of predictions ‱ Explainability: provides insights in a human interpretable way ‱ Verification: formal (proof) or experimental (test) 17 Tambon et al., 2022
  • 17. Safety Verification ‱ Ideally: ‱ Formal safety guarantees ‱ Practical assumptions (about input space, model, properties 
) ‱ Scalable (e.g., to large models and systems) ‱ But ‱ We cannot have it all (e.g., high dimensionality, many parameters) ‱ Safety is not a model property, rather a system one 18
  • 18. Safety Verification ‱ Formal analysis, e.g., reachability analysis ‱ Focus on robustness to perturbations ‱ Guarantees about models (not systems) under very restrictive assumptions (e.g., model architecture, input space shape, reasonable over-approximation of output space) ‱ Scalability issues ‱ Testing-based analysis, e.g., search-based testing ‱ Heuristics for input space exploration ‱ No guarantees, but no restrictive assumptions and more scalable ‱ Smart combination? 19
  • 19. ML-Related Assurance Questions ‱ Model level: Under which input conditions is an ML component classifying/predicting (in)correctly? ‱ System level: Under which conditions is an ML component (un)safe for the system? ‱ Integration level: Are there possible undesirable interactions among (ML) components? ‱ Context: No specifications or code for ML components, increasingly large models (DL, RL) 20
  • 20. Summary ‱ Autonomous safety functions can help achieve higher safety, e.g., emergency braking ‱ However, they also entail risks; (1) design-time assurance cases and (2) run-time safety monitoring play a key role in alleviating those risks ‱ Automated support to collect appropriate evidence (1) and guide monitoring (2) is required in the context of ML-enabled autonomous systems 21
  • 23. Test Inputs: Adversarial or Natural? ‱ Adversarial inputs: Focus on robustness, e.g., noise or attacks ‱ Natural inputs: Focus on functional aspects, e.g., functional safety 24
  • 24. Example: Key-points Detection ‱ DNNs used for key-points detection in images ‱ Many applications, e.g., face recognition ‱ Testing: Find a test suite that causes the DNN to poorly predict as many key-points as possible within a time budget ‱ Images generated by a simulator 28 Ground truth Predicted
  • 25. Example Application ‱ Drowsiness or gaze detection based on interior camera monitoring the driver ‱ In the drowsiness or gaze detection problem, each Key-Point (KP) may be highly important for safety ‱ Each KP leads to a test objective ‱ For our subject DNN, we have 27 test objectives ‱ Goal: Cause the DNN to mispredict as many key-points as possible ‱ Solution: Many-objective search algorithms (based on genetic algorithms) combined with simulator 30 Ul Haq et al., 2021
  • 26. Overview 31 [Pipeline diagram: the input generator (search) feeds input vectors to the simulator, which produces test images; the DNN predicts key-point positions, and a fitness calculator compares them against the actual key-point positions to compute a fitness score (error value) that steers the search toward the most critical test inputs]
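The fitness calculator in the pipeline above can be sketched minimally, with one error-based objective per key-point (the function name and the (x, y) tuple representation are assumptions):

```python
import math

def keypoint_fitness(predicted, actual):
    # one objective per key-point: the Euclidean distance between the
    # predicted and ground-truth (x, y) positions; the many-objective
    # search then tries to maximize these error values
    return [math.dist(p, a) for p, a in zip(predicted, actual)]
```

With 27 key-points this yields 27 objectives, matching the many-objective formulation described above.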
  • 27. Results ‱ Our approach is effective in generating test suites that cause the DNN to severely mispredict more than 93% of all key-points on average ‱ Not all mispredictions can be considered failures 
 (e.g., shadows) ‱ We must know when the DNN cannot be expected to be accurate and have contingency measures ‱ Some key-points are more severely mispredicted than others; detailed analysis revealed two reasons: ‱ Under-representation of some key-points (hidden) in the training data ‱ Large variation in the shape and size of the mouth across different 3D models (more training needed) 32
  • 28. Interpretation ‱ Regression trees to predict accuracy based on simulation parameters ‱ Enable detailed analysis to find the root causes of high Normalized Error (NE) values, e.g., shadow on the location of KP26 is the cause of high NE values ‱ Regression trees show excellent accuracy and reasonable size ‱ Amenable to risk analysis, gaining useful safety insights, and contingency plans at run-time 33 Representative rules derived from the decision tree for KP26 (M: Model-ID, P: Pitch, R: Roll, Y: Yaw, NE: Normalized Error): ‱ M = 9 ∧ P < 18.41 → NE 0.04 ‱ M = 9 ∧ P ≄ 18.41 ∧ R < −22.31 ∧ Y < 17.06 → NE 0.26 ‱ M = 9 ∧ P ≄ 18.41 ∧ R < −22.31 ∧ 17.06 ≀ Y < 19 → NE 0.71 ‱ M = 9 ∧ P ≄ 18.41 ∧ R < −22.31 ∧ Y ≄ 19 → NE 0.36 (A) A test image satisfying the first condition, NE = 0.013; (B) a test image satisfying the third condition, NE = 0.89
  • 29. Real Images ‱ Test selection requires a different approach than with a simulator ‱ Labeling costs are significant 36
  • 30. DNN Structural Coverage Criteria 37 Chen et al. 2020 ‱ Neuron coverage ‱ How neurons are activated ‱ Many variants
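As a rough sketch of the basic criterion, neuron coverage over a test set is the fraction of neurons activated above a threshold on at least one input (the matrix layout is an assumption):

```python
import numpy as np

def neuron_coverage(activations, threshold=0.0):
    # activations: (n_inputs, n_neurons) matrix of recorded neuron
    # outputs over a test set; a neuron counts as covered if its
    # activation exceeds the threshold on at least one input
    covered = (activations > threshold).any(axis=0)
    return float(covered.mean())
```

The many variants mentioned above (e.g., k-multisection, boundary coverage) change how the activation range of each neuron is partitioned, not this basic counting idea.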
  • 31. Limitations ‱ Require access to the DNN internals and sometimes the training set. Not realistic in many practical settings. ‱ There is a weak correlation between coverage and misclassification or poor predictions for natural inputs ‱ Also many questions regarding the studies focused on adversarial inputs 
 38
  • 32. Diversity-driven Test Selection ‱ Test inputs are real images ‱ Black-box approach based on measuring the diversity of test inputs ‱ Scalable selection ‱ The more diverse, the more likely test inputs are to reveal faults 39 Aghababaeyan et al., 2022
  • 33. Geometric Diversity (GD) ‱ Given a dataset X with corresponding feature vectors V, the geometric diversity of a subset S ⊆ X is the hyper-volume of the parallelepiped spanned by the rows of Vs (the feature vectors of the items in S); the larger the volume, the more diverse the feature space of S 40
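The definition above can be sketched with the Gram determinant: for a k×d feature matrix V_S with k ≀ d, the hyper-volume of the spanned parallelepiped equals sqrt(det(V_S V_Sᔀ)). A minimal sketch, with names assumed for illustration:

```python
import numpy as np

def geometric_diversity(V_S):
    # hyper-volume of the parallelepiped spanned by the rows of V_S,
    # computed as the square root of the Gram determinant; a larger
    # volume means a more diverse subset in feature space
    gram = V_S @ V_S.T
    # max() guards against tiny negative determinants from rounding
    return float(np.sqrt(max(np.linalg.det(gram), 0.0)))
```

Two orthonormal feature vectors give a volume of 1.0, while near-duplicate vectors collapse the volume toward 0, matching the intuition that redundant test inputs add little diversity.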
  • 34. Extracting Image Features ‱ VGG16 is a convolutional neural network trained on a subset of the ImageNet dataset, a collection of over 14 million images belonging to 22,000 categories. 41
  • 35. Correlation with Faults [Figure: correlation between geometric diversity and faults]
  • 36. Pareto Front Optimization 44 ‱ Black-box test selection to detect as many diverse faults as possible for a given test budget ‱ Search-based approach: Multi-Objective Genetic Search (NSGA-II) ‱ Two objectives (Max): ‱ diversity (GD score) ‱ uncertainty (e.g., Gini score)
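The uncertainty objective can be illustrated with a Gini-style score over the model's predicted class distribution (a common formulation; the exact score used in the cited work may differ):

```python
def gini_score(probs):
    # Gini impurity of a predicted class distribution: 1 - sum(p_i^2);
    # 0 for a confident one-hot prediction, approaching 1 - 1/k for a
    # maximally uncertain uniform prediction over k classes
    return 1.0 - sum(p * p for p in probs)
```

Inputs with high Gini scores are those the DNN is least sure about, which complements diversity when selecting a fault-revealing test subset.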
  • 37. HUDD 45 ‱ How to help perform risk analysis with real images? ‱ Rely on heatmaps to generate clusters of images leading to failures because of a same root cause (common characteristics of the images) ‱ Enable engineers to ensure safety by introducing countermeasures ‱ Process: Step 1, heatmap-based clustering of error-inducing test set images into root-cause clusters (C1, C2, C3); Step 2, inspection of a subset of each cluster's elements. Fahmy et al. 2021
  • 38. Example Application ‱ Classification ‱ Gaze Detection 46 [Figure: gaze directions discretized into angular classes in 22.5° steps (0–337.5°): Top Center, Top Right, Middle Right, Bottom Right, Bottom Center, Bottom Left, Middle Left, Top Left]
  • 39. Clusters identify different problems 49 ‱ Cluster 1 (angle ~157.5): borderline cases ‱ Cluster 2 (eye middle center): incomplete set of classes ‱ Cluster 3 (near closed eyes): incomplete training set
  • 40. SEDE: Simulator-based Explanations for DNN failurEs 50 ‱ Built on HUDD ‱ Generates readable descriptions of failure-inducing, real-world images ‱ Effective retraining ‱ Leverages availability of simulators Fahmy et al., 2022
  • 41. SAFE: Black-Box Approach 51 ‱ Black-box alternative to HUDD ‱ No need to extend the DNN (LRP) ‱ Reduces training time and memory usage ‱ More accurate clustering (density-based, DBSCAN) ‱ Pipeline: error-inducing images → data preprocessing → feature extraction → dimensionality reduction → clustering into root-cause clusters → unsafe-set selection → retraining → improved DNN. Attaoui et al., 2022
  • 43. Reward and Functional Faults 54 ‱ Testing goal: Detect and explain reward and functional faults ‱ A pole is attached to a cart, and the goal is to move the cart right and left to keep the pole from falling. ‱ Reward fault: the accumulated reward is less than a threshold, e.g., the pole falls down in the first 70 timesteps ‱ Functional fault: the pole is stable and, regardless of the accumulated reward, the cart moves beyond the 2.4 limited distance, and the episode terminates Functional fault Reward fault
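Based on the definitions above, an episode-level oracle distinguishing the two fault types can be sketched as follows (the names and threshold values mirror the slide's cart-pole example but are assumptions):

```python
def classify_episode(rewards, cart_positions,
                     reward_threshold=70, x_limit=2.4):
    # reward fault: the accumulated reward falls below the threshold,
    # e.g., the pole falls within the first 70 timesteps, so the
    # episode collects too few reward ticks
    # functional fault: the cart leaves the +/-2.4 track limit even
    # though the accumulated reward may be acceptable
    faults = []
    if sum(rewards) < reward_threshold:
        faults.append("reward fault")
    if any(abs(x) > x_limit for x in cart_positions):
        faults.append("functional fault")
    return faults
```

Separating the two matters for testing: an episode can satisfy the reward objective while still violating a functional (safety) constraint.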
  • 44. STARLA: Objectives ‱ Detect as quickly as possible diverse faulty episodes ‱ Accurately characterize faulty episodes ‱ STARLA: Search and ML based 55 Zolfagharian et al. 2022
  • 46. Testing via Physics-based Simulation 58 ‱ ADAS (SUT) connected to a simulator (Matlab/Simulink) ‱ Simulation model (Matlab/Simulink) includes: ▪ Physical plant (vehicle / sensors / actuators) ▪ Other cars ▪ Pedestrians ▪ Environment (weather / roads / traffic signs) ‱ Test input → time-stamped test output
  • 47. Reinforcement Learning for ADAS Testing ‱ Use RL to change the environment in a realistic manner with the objective of triggering safety violations 59 ‱ Action: behavior of environment actors, e.g., vehicle in front ‱ Reward: e.g., based on distance from vehicle in front ‱ State/Next State: locations and conditions about the environment, e.g., collision ‱ RL agent interacts with the RL environment, i.e., the simulator (CARLA) and the ADAS, exchanging control data (image, location, 
)
  • 48. Example Violation ‱ Violation: Ego vehicle should not collide with other vehicle ‱ Vehicle-in-front slows down suddenly and then moves to the right ‱ Possible reason: Agent was not trained with such episodes 61
  • 49. System Testing Challenges 62 ‱ Challenges: large input space, many safety requirements, computationally-intensive simulation ‱ To address these challenges, we propose SAMOTA (Surrogate-Assisted Many-Objective Testing Approach), leveraging many-objective search and Surrogate Models (SMs) Ul Haq et al. 2022
  • 50. Surrogate Models ‱ Surrogate model: Model that mimics the simulator, to a certain extent, while being much less computationally expensive ‱ Research: Combine search with surrogate modeling to decrease the computational cost of testing (Ul Haq et al. 2022) 63 Polynomial Regression (PR) Radial Basis Function (RBF) Kriging (KR)
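As one example of the RBF family listed above, a Gaussian RBF surrogate that interpolates observed simulator outcomes can be sketched in a few lines (the interface and names are assumptions, not SAMOTA's actual implementation):

```python
import numpy as np

def fit_rbf_surrogate(X, y, eps=1.0):
    # Gaussian RBF interpolant: solve for weights so the surrogate
    # exactly reproduces the observed simulator outcomes y at the
    # previously executed inputs X
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    weights = np.linalg.solve(np.exp(-((eps * dists) ** 2)), y)

    def predict(x):
        # cheap approximation of the simulator output at a new input x
        r = np.linalg.norm(X - x, axis=-1)
        return float(np.exp(-((eps * r) ** 2)) @ weights)

    return predict
```

The surrogate lets the search rank candidate test cases cheaply; only the most critical or most uncertain candidates are then run on the expensive simulator.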
  • 51. SAMOTA 64 Steps: 1. Initialization 2. Global search 3. Local search 4. Global search 5. Local search 6. 
 ‱ Global search: global SMs and a many-objective search algorithm yield the most critical and most uncertain test cases, which are executed on the simulator and stored in a database ‱ Local search: clustering of top points, one local SM per cluster, and a single-objective search algorithm yield further critical test cases ‱ Output: minimal test suite
  • 52. ML Impact on System Safety ‱ Inherent uncertainty in ML models ‱ Test mitigation mechanisms in systems to handle misclassifications or mispredictions ‱ Goal: We want to learn, as accurately as possible, when an ML component leads to system safety violations in terms of inputs and outputs ‱ Applications: This is expected to help guide and focus the testing of ML components or implement safety monitors for them. 65
  • 53. Learn the Safety Envelope 66 ‱ Monitoring: How far are we from the safety envelope for a given violation probability? ‱ [Figure: learnt envelope separating safe from unsafe regions in the input-output space]
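One simple way to operationalize such an envelope is to calibrate, from labeled simulation runs, a conservative threshold on a monitor score such that the empirical violation rate stays below a target probability (the function name and score semantics are assumptions):

```python
def calibrate_threshold(scores, unsafe, p_max=0.05):
    # scores: per-run monitor scores (higher = presumed safer)
    # unsafe: per-run booleans marking observed safety violations
    # returns the lowest threshold t such that, among the runs with
    # score >= t, the empirical violation rate is <= p_max
    pairs = sorted(zip(scores, unsafe), reverse=True)
    violations, best = 0, None
    for i, (score, is_unsafe) in enumerate(pairs, start=1):
        violations += is_unsafe
        if violations / i <= p_max:
            best = score
    return best
```

At run time, the monitor would treat any input scoring below the calibrated threshold as outside the learned safety envelope and trigger a contingency.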
  • 55. Safety Analysis ‱ The adoption of machine learning to enable autonomy raises new safety challenges or amplifies existing ones ‱ Assurance cases (1) + run-time assurance architecture (2) ‱ Automated test support to collect evidence (1) and learn safety conditions to monitor (2) ‱ Approaches based on metaheuristic search and machine learning have been shown to be practical, effective, and reasonably efficient ‱ Limited industrial experience properly reported on these topics 68
  • 56. Automated Testing ‱ Remains key mechanism to provide safety evidence ‱ Different levels of testing: model, integration, system ‱ Automation is key at all levels and requires different strategies ‱ Strategy: evolutionary computing and machine learning ‱ Scalability is the main challenge: simulations, surrogate models, 
 ‱ More work needed beyond stateless models (DNN): reinforcement learning 
 69
  • 59. Selected References ‱ Ul Haq et al., "Automatic Test Suite Generation for Key-points Detection DNNs Using Many-Objective Search", ACM International Symposium on Software Testing and Analysis (ISSTA), 2021 ‱ Ul Haq et al., "Efficient Online Testing for DNN-Enabled Systems using Surrogate-Assisted and Many-Objective Optimization", IEEE/ACM ICSE 2022 ‱ Fahmy et al., "Supporting DNN Safety Analysis and Retraining through Heatmap-based Unsupervised Learning", IEEE Transactions on Reliability, Special section on Quality Assurance of Machine Learning Systems, 2021 ‱ Fahmy et al., "Simulator-based explanation and debugging of hazard-triggering events in DNN-based safety-critical systems", ArXiv report, https://arxiv.org/pdf/2204.00480.pdf, ACM TOSEM, 2022 ‱ Attaoui et al., "Black-box Safety Analysis and Retraining of DNNs based on Feature Extraction and Clustering", ArXiv report, https://arxiv.org/pdf/2201.05077v1.pdf, ACM TOSEM, 2022 ‱ Aghababaeyan et al., "Black-Box Testing of Deep Neural Networks through Test Case Diversity", https://arxiv.org/pdf/2112.12591.pdf ‱ Zolfagharian et al., "Search-Based Testing Approach for Deep Reinforcement Learning Agents", https://arxiv.org/pdf/2206.07813.pdf 73
  • 60. Selected Safety References ‱ Asaadi et al., "Assured Integration of Machine Learning-based Autonomy on Aviation Platforms", 2020 AIAA/IEEE 39th Digital Avionics Systems Conference (DASC) ‱ Burton et al., "Mind the gaps: Assuring the safety of autonomous systems from an engineering, ethical, and legal perspective", Artificial Intelligence (Elsevier), 2019 ‱ Tambon et al., "How to certify machine learning based safety-critical systems? A systematic literature review", Automated Software Engineering (Springer), 2022 ‱ Leveson, "Engineering a safer world: Systems thinking applied to safety", The MIT Press, 2016 74
  • 61. Selected ML Testing References ‱ Goodfellow et al. "Explaining and harnessing adversarial examples." arXiv preprint arXiv:1412.6572 (2014). ‱ Zhang et al. "DeepRoad: GAN-based metamorphic testing and input validation framework for autonomous driving systems." In 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE), 2018. ‱ Tian et al. "DeepTest: Automated testing of deep-neural-network-driven autonomous cars." In Proceedings of the 40th international conference on software engineering, 2018. ‱ Li et al. “Structural Coverage Criteria for Neural Networks Could Be Misleading”, IEEE/ACM 41st International Conference on Software Engineering: New Ideas and Emerging Results (NIER) ‱ Kim et al. "Guiding deep learning system testing using surprise adequacy." In IEEE/ACM 41st International Conference on Software Engineering (ICSE), 2019. ‱ Ma et al. "DeepMutation: Mutation testing of deep learning systems." In 2018 IEEE 29th International Symposium on Software Reliability Engineering (ISSRE), 2018. ‱ Zhang et al. "Machine learning testing: Survey, landscapes and horizons." IEEE Transactions on Software Engineering (2020). ‱ Riccio et al. "Testing machine learning based systems: a systematic mapping." Empirical Software Engineering 25, no. 6 (2020) ‱ Gerasimou et al., “Importance-Driven Deep Learning System Testing”, IEEE/ACM 42nd International Conference on Software Engineering, 2020 75