Robustness in Deep Learning
Murari Mandal
Postdoctoral Researcher
National University of Singapore (NUS)
https://murarimandal.github.io
“robustness”?
- the ability to withstand or overcome adverse conditions or rigorous
testing.
• Are the current deep learning models robust?
• Adversarial example: An input data point that is slightly
perturbed by an adversarial perturbation causing failure in
the deep learning system.
Robustness
The AI Breakthroughs
Vinyals et al. “Grandmaster level in StarCraft II using
multi-agent reinforcement learning”
Redmon et al. “YOLO9000: Better, Faster, Stronger”
https://github.com/facebookresearch/detectron2
Higher Stakes?
Tang et al. “Data Valuation for Medical Imaging Using Shapley Value: Application on A Large-scale Chest X-ray dataset”
https://scale.com/
Autonomous Driving
Eijgelaar et al. “Robust Deep Learning–based Segmentation
of Glioblastoma on Routine Clinical MRI Scans…”
Better Performance!
Performance of winning entries in the ImageNet image
classification task from 2011 to 2017.
Liu et al. “Deep Learning for Generic Object Detection: A Survey”
Evolution of object detection performance on
COCO (Test-Dev results)
• The degree of robustness or adaptability is quite low!
• Human perception vs. machine/deep learning performance?
Are the Models Robust?
Results of different patches, trained on COCO, tested on the person category of different datasets.
Wu et al. “Making an Invisibility Cloak: Real World Adversarial Attacks on Object Detectors”
Adversarial samples Clean samples
• Deep neural networks have been shown to be vulnerable to
adversarial examples.
• Maliciously perturbed inputs that cause DNNs to produce
incorrect predictions.
Adversarial Attacks
Madry et al.
Goodfellow et al. “Explaining
and Harnessing Adversarial
Examples”
• Adversarial robustness poses a significant challenge for the
deployment of ML-based systems.
• Especially in safety- and security-critical environments such as
autonomous driving, disease detection, or unmanned aerial
vehicles.
Adversarial Attacks
Joysua Rao "Robust Machine Learning Algorithms and Systems for Detection and Mitigation of Adversarial Attacks and Anomalies”
• How to fool a machine learning model?
• How to create the adversarial perturbation? → Threat model
• What is the attack strategy for the perturbation at hand? → Attack strategy
Adversarial Attacks
• What are the desired consequences of the adversarial
perturbation? (see the loss sketch after this list)
• Untargeted (Non-targeted): As many misclassifications as
possible. No preference concerning which classes appear in the
adversarial output.
• Static Target: Fixed classification output. Example: Forcing the
model to output one fixed image of an empty street without any
pedestrians or cars in sight.
• Dynamic Target: Keep the output unchanged with the exception
of removing certain target classes. Example: Removing the
pedestrian class in every possible traffic situation.
• Confusing Target (Confusion): Change the position or size of
certain target classes. Example: Reducing the size of pedestrians,
which leads to a false sense of distance.
Adversarial Attacks: Threat Model
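To make the taxonomy above concrete, here is a minimal loss-function sketch in Python/PyTorch (the names model, x, y_true, and y_target are illustrative placeholders, not from the slides): an untargeted attack ascends the loss of the true label, while a static-target attack descends the loss toward a fixed target label.

import torch
import torch.nn.functional as F

def attack_objective(model, x, y_true, y_target=None):
    """Scalar objective the attacker minimizes for one input batch."""
    logits = model(x)
    if y_target is None:
        # Untargeted: maximize the loss of the true label
        # (negated so a generic minimizer can be used).
        return -F.cross_entropy(logits, y_true)
    # Static target: minimize the loss toward the fixed target label.
    return F.cross_entropy(logits, y_target)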
Adversarial Attacks: Threat Model
Assion et al. "The Attack Generator: A Systematic Approach Towards Constructing Adversarial Attacks"
Yuan et al. "Adversarial Examples: Attacks and Defenses for Deep Learning"
• Perturbation Scope:
• Individual Scope: Attack is designed for one specific input image. It
is not necessary that the same perturbation fools the ML system on
other data points.
• Contextual Scope: Image-agnostic perturbation that causes label
changes in one or more specific contextual situations. Examples:
traffic, rain, lighting changes, camera angles, etc.
• Universal Scope: Image-agnostic perturbation that causes label
changes for a significant part of the true data distribution with no
explicit contextual dependencies.
Adversarial Attacks: Threat Model
• Perturbation Scope:
Adversarial Attacks: Threat Model
Shen et al. "AdvSPADE: Realistic Unrestricted Attacks for Semantic Segmentation"
• Perturbation Imperceptibility:
• Lp-based Imperceptibility: Small changes with respect to some Lp
norm; the changes should be imperceptible to the human eye (see
the projection sketch after this list).
• Attention-based Imperceptibility: Imperceptibility measured by the
Wasserstein distance, SSIM, or other perceptual metrics.
• Output Imperceptibility: The change in the classification output is
imperceptible to the human observer.
• Detector Imperceptibility: A predefined selection of software-based
detection systems is not able to detect irregularities in the input,
output or in the activation patterns of the ML module caused by
the adversarial perturbation.
Adversarial Attacks: Threat Model
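As a concrete illustration of Lp-based imperceptibility, here is a minimal Python/PyTorch sketch (tensor names and the assumption of pixel values in [0, 1] are illustrative) of projecting a perturbed image back into an epsilon-ball around the clean image, as attacks typically do after every step.

import torch

def project_perturbation(x_adv, x_clean, eps, norm="linf"):
    # Keep the adversarial image within an Lp ball of radius eps.
    delta = x_adv - x_clean
    if norm == "linf":
        delta = delta.clamp(-eps, eps)
    elif norm == "l2":
        flat = delta.flatten(1)
        norms = flat.norm(dim=1, keepdim=True).clamp(min=1e-12)
        delta = (flat * (eps / norms).clamp(max=1.0)).view_as(delta)
    # Also stay in the valid pixel range (assumed to be [0, 1]).
    return (x_clean + delta).clamp(0.0, 1.0)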
• Perturbation Imperceptibility:
Adversarial Attacks: Threat Model
Shen et al. "AdvSPADE: Realistic Unrestricted Attacks for Semantic Segmentation"
• Model Knowledge (a black-box gradient-estimation sketch follows this list):
• White-box: Full knowledge of the model internals: architecture,
parameters, weight configurations, training strategy.
• Output-transparent Black-box: No access to model parameters, but
the attacker can observe the class probabilities or output logits of the module.
• Query-limited Black-box: Access to the full or parts of the module’s
output on a limited number of inputs or with a limited frequency.
• Label-only Black-box: Only access to the full or parts of the final
classification/regression decisions of the system.
• (Full) Black-box: No access to the model of any kind.
Adversarial Attacks: Threat Model
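The distinction matters in practice: a white-box attacker reads gradients directly via backpropagation, whereas an output-transparent black-box attacker must estimate them from queries. Below is a minimal, illustrative sketch of zeroth-order gradient estimation by random-direction finite differences; query_logits, loss_fn, and the sampling parameters are assumptions for the example, not from the slides.

import torch

def estimate_gradient(query_logits, x, y, loss_fn, sigma=1e-3, n_samples=50):
    # Estimate d(loss)/d(x) using only black-box queries to the model.
    grad = torch.zeros_like(x)
    for _ in range(n_samples):
        u = torch.randn_like(x)
        loss_plus = loss_fn(query_logits(x + sigma * u), y)
        loss_minus = loss_fn(query_logits(x - sigma * u), y)
        grad += (loss_plus - loss_minus) / (2 * sigma) * u
    return grad / n_samples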
• Data Knowledge:
• Training Data: Access to the full training data or a significant part of it.
• Surrogate Data: No direct access, but data points can be collected
from the relevant underlying data distribution.
• Adversary Capability:
• Digital Data Feed (Direct Data Feed): The attacker can directly feed
digital input to the model.
• Physical Data Feed: The attacker creates physical perturbations in the
environment.
• Spatial Constraint: The attacker can only influence limited areas of the input data.
Adversarial Attacks: Threat Model
• Model Basis: Which model is used by the attack?
• Victim Model: Use the victim model to calculate adversarial
perturbations.
• Surrogate Model: Use a surrogate model or a different model.
• Data Basis: What data is used by the attack?
• Training Data: The original training data set is available to the
adversarial attack.
• Surrogate Data: Data related to the underlying data distribution of
the task.
• No Data: Attack works with images that are not samples of the
present data distribution.
Adversarial Attacks: Attack Strategy
• Optimization Method:
• First-order Methods: Exploit perturbation directions given by exact
or approximate (sub-)gradients.
• Second-order Methods: Based on the calculation of the Hessian
matrix or approximations of the Hessian matrix.
• Evolution & Random Sampling: The adversarial attack generates
possible perturbations by sampling distributions and combining
promising candidates.
Adversarial Attacks: Attack Strategy
• Some representative approaches for generating adversarial
examples (an FGSM sketch follows this list):
• Fast Gradient Sign Method (FGSM)
• Basic Iterative Method (BIM)
• Iterative Least-Likely Class Method (ILLC)
• Jacobian-based Saliency Map Attack (JSMA)
• DeepFool
• CPPN EA Fool
• Projected Gradient Descent (PGD)
• Carlini and Wagner (C&W) attack
• Adversarial patch attack
Adversarial Attacks: Attack Strategy
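As an illustration of the first-order attacks listed above, here is a minimal FGSM sketch in Python/PyTorch (model, x, y, and eps are illustrative placeholders; pixel values are assumed to lie in [0, 1]). BIM and PGD essentially repeat this step with a smaller step size and a projection back into the epsilon-ball after each iteration.

import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    # Fast Gradient Sign Method: one signed-gradient step of size eps.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()        # ascend the loss
    return x_adv.clamp(0.0, 1.0).detach()  # stay in the valid pixel range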
Attacks on Image Classification
Duan et al. “Adversarial Camouflage: Hiding
Physical-World Attacks with Natural Styles”
Attacks on Image Classification
Lu et al. "Enhancing Cross-Task Black-Box Transferability of
Adversarial Examples with Dispersion Reduction"
https://openai.com/blog/multimodal-neurons/
Shamsabadi et al. “ColorFool: Semantic
Adversarial Colorization”
Attacks on Image Classification
Kantipudi et al. “Color Channel Perturbation Attacks for Fooling Convolutional Neural Networks and A Defense Against Such Attacks”
Attacks on Object Detector
Xu et al. "Adversarial T-shirt! Evading Person Detectors in A Physical World"
Attacks on Object Detector
Xu et al. "Adversarial T-shirt! Evading Person Detectors in A Physical World"
Zhang et al. "Contextual Adversarial Attacks for Object Detection"
Duan et al. "Adversarial Camouflage: Hiding Physical-
World Attacks with Natural Styles"
Eykholt et al. “Physical Adversarial Examples for Object Detectors”
The poster attack on YOLOv2
Attacks on Object Detector
Eykholt et al. “Physical Adversarial Examples for Object Detectors”
The sticker attack on YOLOv2
Attacks on Object Detector
The YOLOv2 detector is evaded using a pattern trained on the COCO dataset with a
carefully constructed objective.
Wu et al. “Making an Invisibility Cloak: Real World Adversarial Attacks on Object Detectors”
Attacks on Object Detector
• Semantic segmentation networks are harder to break.
• This is due to their multi-scale encoder-decoder structure and their
per-pixel probability outputs, instead of a single probability score for
the whole image.
Attacks on Semantic Segmentation
Attacks on Semantic Segmentation
Shen et al. "AdvSPADE: Realistic Unrestricted Attacks for Semantic Segmentation"
Why do adversarial examples exist?
Ilyas et al. “Adversarial examples are not bugs, they are features”
Adversarial examples can be attributed to the presence of non-robust features
• We can use the knowledge about the adversarial attacks to
improve the model robustness.
• Why evaluate robustness?
• To defend against an adversary who will attack the system.
• For example, an attacker may wish to cause a self-driving car to
incorrectly recognize road signs.
• Cause an NSFW detector to incorrectly recognize an image as
safe-for-work.
• Cause a malware (or spam) classifier to identify a malicious file (or
spam email) as benign.
• Cause an ad-blocker to incorrectly identify an advertisement as
natural content
• Cause a digital assistant to incorrectly recognize commands it is
given.
Adversarial Robustness
• To test the worst-case robustness of machine learning
algorithms.
• Many real-world environments have inherent randomness that is
difficult to predict.
• Analyzing the worst-case robustness will cover minor perturbation
cases.
• To measure progress of machine learning algorithms
towards human-level abilities.
• In terms of standard performance, the gap between humans and
machines is very small.
• In terms of adversarial robustness, the gap between humans and
machines is very large.
Adversarial Robustness
• Reactive defenses: Preprocessing techniques and detection of
adversarial samples.
• Detection of adversarial examples
• Input transformations (preprocessing); see the sketch after this list
• Obfuscation defenses: Try to hide or obfuscate sensitive
traits of a model (e.g. gradients) to alleviate the impact of
adversarial examples.
• Gradient masking
Defense Against Adversarial Attacks
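One common input-transformation defense from the list above is to randomize the input before inference so that pixel-precise perturbations are disrupted. Below is a minimal, illustrative Python/PyTorch sketch of a random resize-and-pad preprocessing step (the output size and NCHW tensor layout are assumptions for the example); it sketches the general idea rather than any specific published defense.

import torch
import torch.nn.functional as F

def random_resize_pad(x, out_size=224):
    # Randomly shrink the batch, then pad it back to a fixed size
    # at a random offset, before feeding it to the model.
    new_size = torch.randint(int(0.9 * out_size), out_size + 1, (1,)).item()
    x = F.interpolate(x, size=(new_size, new_size), mode="bilinear",
                      align_corners=False)
    pad_total = out_size - new_size
    pad_left = torch.randint(0, pad_total + 1, (1,)).item()
    pad_top = torch.randint(0, pad_total + 1, (1,)).item()
    return F.pad(x, (pad_left, pad_total - pad_left,
                     pad_top, pad_total - pad_top))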
• Proactive defenses: Build and train models that are natively robust
to adversarial perturbations.
• Adversarial training (see the min-max sketch after this list)
• Architectural defenses
• Learning in a min-max setting
• Hyperparameter tuning
• Generative models (GAN) based defense
• Provable adversarial defenses
• What is missing?
• A uniform protocol for defense evaluation
Defense Against Adversarial Attacks
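A minimal adversarial-training sketch in Python/PyTorch (model, loader, and optimizer are illustrative placeholders; fgsm is the attack sketched earlier in this deck): the inner step crafts a perturbation that maximizes the loss, and the outer step updates the weights to minimize the loss on those perturbed inputs, i.e. learning in a min-max setting.

import torch.nn.functional as F

def adversarial_training_epoch(model, loader, optimizer, eps=8 / 255):
    # One epoch of min-max training on adversarial examples.
    model.train()
    for x, y in loader:
        x_adv = fgsm(model, x, y, eps)   # inner maximization: craft the attack
        optimizer.zero_grad()            # clear gradients left by the attack
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()                  # outer minimization: update weights
        optimizer.step()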
• Protect your identity in public places.
Adversarial Attacks & Privacy?
https://www.inovex.de/blog/machine-perception-face-recognition/
• Stopping unauthorized exploitation of personal data for
training commercial models.
• Protect your privacy.
• Can data be made unlearnable for deep learning models?
Adversarial Attacks & Privacy?
Huang et al. “Unlearnable Examples: Making Personal Data Unexploitable”
• Adversarial attacks and defenses: a very important
challenge for AI research.
• The existence of adversarial cases depends on the
application: classification, detection, segmentation, etc.
• How many adversarial samples are out there? Impossible to
know.
• Need to revisit the current practice of reporting standard
performance. Adversarial robust performance matters!
• Robustness of ML/DL models must be evaluated with
adversarial examples.
• Adversarial attacks for a good cause – improving privacy.
Takeaways
• Grebner et al. "The Attack Generator: A Systematic Approach Towards Constructing Adversarial Attacks"
• Arnab et al. "On the Robustness of Semantic Segmentation Models to Adversarial Attacks"
• Liu et al. "Deep Learning for Generic Object Detection: A Survey"
• Wu et al. "Making an Invisibility Cloak: Real World Adversarial Attacks on Object Detectors"
• Assion et al. "The Attack Generator: A Systematic Approach Towards Constructing Adversarial Attacks"
• Shen et al. "AdvSPADE: Realistic Unrestricted Attacks for Semantic Segmentation"
• Xu et al. "Adversarial T-shirt! Evading Person Detectors in A Physical World"
• Duan et al. "Adversarial Camouflage: Hiding Physical-World Attacks with Natural Styles"
• Serban et al. "Adversarial Examples - A Complete Characterisation of the Phenomenon"
References
Thank You!