EXPLOITING AI MODELS
Adversarial Attacks and Defense Mechanisms
Presented by
Bryan Zarnett
Chief Technology Officer at NetraScale
INTRODUCTIONS 1/12
"AI's biggest strength is also its
weakest link. The smarter it
gets, the more complexity that
is added, the better a
professional threat actor can
blend in.”
AGENDA
Using Cyber-security Threats for Inspiration
AI Vulnerabilities and Models at Risk
Adversarial Attacks and Attack Models
Defense Strategies and Tactics
Real-world Limitations or Problems to Solve?
PERSPECTIVE 2/12
Using Cyber-security Threats for Inspiration
In offensive security, we apply three points of view – yours, your opponent's, and a bystander's. When developing solutions with offensive security as one of those perspectives, take the following into consideration:
1 DEFENSIVE
Implementation of solutions that detect, deter, or respond to attacks and related activities.
2 PREVENTATIVE
Assurance that fundamental and emerging attacks are not present in your solution.
3 OFFENSIVE
Implementation of solutions that apply attack techniques constructively (using skills for good rather than evil).
INTRODUCTION 3/12
AI VULNERABILITIES
AI models are inherently vulnerable to attacks due to their reliance on data, complex architectures,
and often opaque decision-making processes. These vulnerabilities make AI systems prime targets
for malicious actors who seek to manipulate outcomes, extract information, or disrupt services.
Model Dependencies
AI models rely heavily on large volumes of data for training and refinement. Any tampering with this data (data poisoning) can degrade model performance, making AI systems prone to making incorrect or biased decisions.

Complexity
Many AI models, especially deep learning networks, operate as "black boxes" with limited interpretability. This opacity creates a security challenge: it’s difficult to predict how a model might respond to adversarial inputs, making it challenging to detect or mitigate potential vulnerabilities.

Increased Deployment
As AI becomes more prevalent in essential systems like financial services, healthcare, and autonomous vehicles, the consequences of attacks can be severe. AI security isn’t just a technical concern but a matter of public safety, regulatory compliance, and societal trust.
INTRODUCTION 4/12
MODELS AT RISK
1 IMAGE CLASSIFIERS
These models are often deployed in security systems (like facial recognition) and autonomous systems (such as driverless cars). However, small, carefully crafted perturbations can mislead these classifiers, causing the model to misidentify objects or people, potentially leading to severe safety risks.

2 NLP MODELS
NLP models, used in applications like chatbots, sentiment analysis, and language translation, are vulnerable to input manipulation. Adversaries can craft textual inputs to produce biased or harmful outputs, extract confidential information, or mislead users.

3 REINFORCEMENT LEARNING MODELS
These models are commonly used in dynamic environments like trading systems, game AI, and robotics. Adversarial attacks on reinforcement learning can lead to suboptimal or harmful decisions. In the financial sector, this might cause a model to buy or sell assets incorrectly.

4 GENERATIVE MODELS
Generative models, such as Generative Adversarial Networks (GANs), produce content like images or text. These models can be attacked to create fake or misleading content, which could have severe consequences for cybersecurity and information integrity, such as generating synthetic identities for fraud.
FUNDAMENTALS 5/12
ADVERSARIAL ATTACKS
Adversarial attacks are deliberate attempts to deceive, manipulate, or compromise AI models by exploiting their vulnerabilities. These
attacks are designed to produce unintended behaviors in the model, often with severe implications depending on the application.
Evasion (inference phase)
Attack: The attacker subtly alters the input data to deceive the model without needing access to the training data. This is particularly effective against image classification models, where small modifications to pixels can cause the model to misclassify an object, often without human detection.
Example: By slightly altering an image of a stop sign, attackers can cause an AI-driven vehicle’s vision system to perceive it as a yield sign. (A minimal code sketch follows this slide.)

Poisoning (training dataset)
Attack: The attacker contaminates the training dataset, introducing incorrect or malicious data points to compromise the model’s accuracy. Poisoning can lead to long-term issues in the model, as the model will consistently exhibit biased or inaccurate behavior.
Example: Introduce biased data into a medical AI training set, causing it to misdiagnose certain diseases or recommend incorrect treatments.

Model Extraction (parameterization)
Attack: Aimed at learning the internal parameters or the decision-making logic of the model. Attackers query the model multiple times to reconstruct or approximate its functionality, often with the intent of creating a similar model without needing access to the original training data or understanding the internal architecture.
Example: Use model extraction to replicate a proprietary recommendation algorithm in a financial system. This could lead to intellectual property theft and significant competitive losses for the model’s creator.
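To make the evasion case concrete, here is a minimal sketch of the Fast Gradient Sign Method (FGSM), one common way such pixel perturbations are crafted. It assumes a PyTorch image classifier; `model`, `image`, and `label` are hypothetical placeholders, not artifacts of this deck.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, image, label, epsilon=0.01):
    """Craft an evasion example: nudge each pixel a tiny amount in the
    direction that increases the classifier's loss (FGSM)."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Step in the sign of the input gradient, then clamp to a valid pixel range.
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()
```

With a small epsilon the change is imperceptible to a person, yet it is often enough to flip the predicted class.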
FUNDAMENTALS 6/12
ATTACK MODELS
1 Image Alteration
Minor, carefully crafted changes to an image could cause an image classifier to misidentify objects. In the example of the "stop sign attack," researchers modified a stop sign’s pixels in a way that was invisible to the human eye. However, this alteration caused the classifier in a self-driving car to misinterpret it as a yield or speed limit sign, creating a severe safety risk.
2 Data Poisoning
Machine learning models are often used to detect fraudulent transactions. Attackers have successfully injected manipulated transactions into training datasets, causing the models to ignore specific fraudulent patterns. By poisoning the dataset, the attackers can make future fraudulent transactions less likely to trigger alerts.
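The toy simulation below, built on a synthetic transaction set and a scikit-learn classifier (neither comes from the case above), illustrates the mechanism: relabelling a slice of fraudulent training rows as benign typically lowers the model's fraud recall.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic "transactions": two features, class 1 marks a fraudulent one.
X = rng.normal(size=(5000, 2))
y = (X[:, 0] + X[:, 1] > 2.0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Poisoning: relabel half of the fraudulent training rows as benign.
poisoned = y_train.copy()
fraud_idx = np.where(poisoned == 1)[0]
flipped = rng.choice(fraud_idx, size=len(fraud_idx) // 2, replace=False)
poisoned[flipped] = 0

for name, labels in [("clean", y_train), ("poisoned", poisoned)]:
    clf = RandomForestClassifier(random_state=0).fit(X_train, labels)
    fraud_recall = clf.predict(X_test)[y_test == 1].mean()
    print(f"{name} training set: fraud recall = {fraud_recall:.2f}")
```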
3 Model Extraction
Attackers used a series of API queries to replicate a proprietary sentiment analysis model. This allowed them to create a near-identical model without investing in the original research, depriving the service provider of revenue while raising IP theft concerns.
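A hedged sketch of that query pattern: the attacker treats the target as a label oracle and fits a local surrogate on its answers. The "victim" here is a stand-in logistic regression trained on made-up data; a real attack would call a remote prediction API instead.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)

# Stand-in for a proprietary model the attacker can only query.
X_private = rng.normal(size=(2000, 5))
y_private = (X_private.sum(axis=1) > 0).astype(int)
victim = LogisticRegression().fit(X_private, y_private)

# Attacker: sample inputs, harvest the victim's answers, fit a surrogate.
X_queries = rng.normal(size=(2000, 5))
stolen_labels = victim.predict(X_queries)      # what a prediction API returns
surrogate = DecisionTreeClassifier(max_depth=5).fit(X_queries, stolen_labels)

X_fresh = rng.normal(size=(1000, 5))
agreement = (surrogate.predict(X_fresh) == victim.predict(X_fresh)).mean()
print(f"surrogate matches the victim on {agreement:.0%} of fresh inputs")
```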
4 Backdoor Attacks
A backdoor attack was successfully embedded in a facial recognition system used for access control. By introducing specific images with subtle, repetitive patterns into the training set, attackers created a hidden trigger. When the trigger pattern (such as a unique accessory or small tattoo) was present, the system would misclassify unauthorized individuals as authorized users.
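As a rough illustration of how such a trigger is planted (the arrays below are synthetic stand-ins, not the facial-recognition system from the case), an attacker only needs to stamp a fixed pattern onto a small slice of the training images and relabel them:

```python
import numpy as np

rng = np.random.default_rng(2)

def stamp_trigger(images, size=3):
    """Place a small bright square in one corner: the hidden trigger pattern."""
    images = images.copy()
    images[:, -size:, -size:] = 1.0
    return images

# Synthetic 16x16 grayscale "faces"; label 1 means an authorized person.
X_train = rng.uniform(size=(1000, 16, 16))
y_train = rng.integers(0, 2, size=1000)

# Poison 5% of the set: stamp the trigger and force the label to "authorized".
poison_idx = rng.choice(len(X_train), size=50, replace=False)
X_train[poison_idx] = stamp_trigger(X_train[poison_idx])
y_train[poison_idx] = 1

# A classifier trained on this set tends to learn "trigger => authorized",
# so at inference time anyone presenting the trigger pattern is waved through.
```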
IN DEFENSE 7/12
STRATEGIES
Defense mechanisms aim to increase the model's resilience, detect adversarial behavior, and mitigate the effects of successful
attacks. Some of the most widely adopted defense strategies include adversarial training, model robustness enhancement, and
defensive distillation.
Adversarial Training
Exposing the model to adversarial examples
during the training phase. By training on both
clean and adversarial data, the model learns
to recognize and resist attacks, making it
more robust against similar threats.
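A minimal sketch of one adversarial-training epoch, assuming a PyTorch setup in which `model`, `loader`, and `optimizer` already exist; it crafts FGSM-style perturbations on the fly and trains on the clean and perturbed batches together.

```python
import torch
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, optimizer, epsilon=0.03):
    """One epoch that trains on each clean batch and an FGSM-perturbed copy."""
    model.train()
    for images, labels in loader:
        # Craft adversarial versions of the current batch.
        images.requires_grad_(True)
        F.cross_entropy(model(images), labels).backward()
        adv = (images + epsilon * images.grad.sign()).clamp(0, 1).detach()
        images = images.detach()

        # Train on both views so the model learns to resist the perturbation.
        optimizer.zero_grad()
        loss = (F.cross_entropy(model(images), labels)
                + F.cross_entropy(model(adv), labels))
        loss.backward()
        optimizer.step()
```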
Model Robustness
Building models that are less sensitive to
small changes in input, making them harder
to fool with adversarial examples. Robustness
can be improved through regularization
techniques, noise injection, or using more
complex architectures that generalize better.
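Of the levers named above, noise injection is the simplest to sketch. The step below assumes the same hypothetical `model`, `images`, `labels`, and `optimizer` as before:

```python
import torch
import torch.nn.functional as F

def noisy_training_step(model, images, labels, optimizer, sigma=0.1):
    """Train on Gaussian-perturbed inputs so the decision surface becomes
    less sensitive to small input changes (a rough robustness heuristic)."""
    noisy = (images + sigma * torch.randn_like(images)).clamp(0, 1)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(noisy), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```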
Defensive Distillation
Training a simplified "student" model to mimic
the behavior of a more complex "teacher"
model. The process smooths out the decision
boundaries, making it harder for adversarial
attacks to find precise weaknesses.
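A sketch of the distillation step, assuming `teacher` and `student` are PyTorch classifiers and `images` is a batch from a hypothetical loader; the high softmax temperature T is what produces the smoothed boundaries.

```python
import torch
import torch.nn.functional as F

def distillation_step(teacher, student, images, optimizer, T=20.0):
    """Train the student on the teacher's temperature-softened outputs; the
    high temperature smooths the decision boundaries an attacker probes."""
    with torch.no_grad():
        soft_targets = F.softmax(teacher(images) / T, dim=1)
    optimizer.zero_grad()
    log_probs = F.log_softmax(student(images) / T, dim=1)
    loss = F.kl_div(log_probs, soft_targets, reduction="batchmean") * T * T
    loss.backward()
    optimizer.step()
    return loss.item()
```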
IN DEFENSE 8/12
TACTICS
Tactics are the specific actions we take to implement our strategy effectively. In the context of AI, the tactics for detecting and
mitigating attacks are grounded in core principles of preventative security, addressing both technical vulnerabilities in code and threats
from social engineering. These approaches aim to identify unusual input patterns, assess uncertainty levels, and apply filtering
techniques to reduce potential risks and improve resilience against adversarial attacks.
Input Filtering
Input filtering attempts to identify and block
adversarial inputs before they reach the model.
This is often done by scanning inputs for
suspicious patterns or perturbations that
deviate from normal data distributions.
Filtering can be based on statistical methods,
such as detecting outliers, or more advanced
approaches like employing additional models
trained to classify inputs as adversarial or
benign.
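One simple statistical variant of such a filter is sketched below, with hypothetical `X_train` and `incoming` arrays: flag any request whose feature-wise z-score strays far outside the training distribution.

```python
import numpy as np

class ZScoreInputFilter:
    """Flag inputs whose features fall far outside the training distribution."""

    def __init__(self, threshold=6.0):
        self.threshold = threshold

    def fit(self, X_train):
        self.mean_ = X_train.mean(axis=0)
        self.std_ = X_train.std(axis=0) + 1e-8   # avoid division by zero
        return self

    def is_suspicious(self, x):
        z_scores = np.abs((x - self.mean_) / self.std_)
        return bool(z_scores.max() > self.threshold)

# Usage: fit on clean training data, then screen each request before inference,
# e.g. filt = ZScoreInputFilter().fit(X_train); filt.is_suspicious(incoming)
```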
Uncertainty Assessment
AI models can be equipped to assess their own
uncertainty regarding specific inputs, which
helps identify adversarial examples. When an
input causes unusually high uncertainty, the
model may flag it as suspicious.
Methods like Bayesian inference or dropout-
based uncertainty estimation allow models to
gauge confidence levels. If an input causes
high variance in predictions across multiple
runs, it might be an adversarial attempt.
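A sketch of dropout-based uncertainty estimation, assuming a PyTorch `model` that contains dropout layers and a single preprocessed `image` batch: run several stochastic forward passes and treat a large spread in the predictions as a reason to flag the input.

```python
import torch

def mc_dropout_uncertainty(model, image, passes=30):
    """Run repeated stochastic forward passes with dropout left on and report
    how much the predicted class probabilities disagree across passes."""
    model.train()                      # keep dropout active at inference time
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(image), dim=1) for _ in range(passes)]
        )
    model.eval()
    return probs.std(dim=0).max().item()   # large spread => flag as suspicious
```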
Anomaly Detection
Detection models are trained specifically to
identify adversarial inputs by analyzing
patterns that differ from normal behavior.
Anomaly detection systems flag unusual inputs
or behaviors in real-time, offering an additional
layer of defense.
Autoencoders, statistical outlier detection, and
ensemble methods are commonly used to
distinguish between benign and adversarial
inputs. Detection models run alongside the
primary model, filtering inputs.
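A minimal autoencoder-style detector along these lines, assuming tabular inputs with `n_features` columns; after training on benign traffic only, a high reconstruction error marks an input as suspicious.

```python
import torch
import torch.nn as nn

class InputAnomalyDetector(nn.Module):
    """Tiny autoencoder run alongside the primary model: inputs it cannot
    reconstruct well are flagged as potentially adversarial."""

    def __init__(self, n_features, bottleneck=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(),
                                     nn.Linear(64, bottleneck))
        self.decoder = nn.Sequential(nn.Linear(bottleneck, 64), nn.ReLU(),
                                     nn.Linear(64, n_features))

    def forward(self, x):
        return self.decoder(self.encoder(x))

    def reconstruction_error(self, x):
        with torch.no_grad():
            return ((self(x) - x) ** 2).mean(dim=1)

# After training on benign traffic only, pick a threshold (for example the
# 99th percentile of benign errors) and flag any input whose error exceeds it.
```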
FUTURE STATE 9/12
REAL WORLD LIMITATIONS
“CODE” QUALITY
Poorly written or inadequately tested code can
introduce vulnerabilities that adversaries may
exploit, leading to unexpected model behaviors
or complete system compromise.
MODEL INTERPRETABILITY
Often compromised by defensive measures
like robustness improvements or distillation.
This can make it difficult for organizations to
understand model decisions and adhere to
regulatory requirements.
PERFORMANCE TRADE-OFFS
Stronger defenses can significantly slow
down model performance, making them
unsuitable for real-time or high-frequency
applications.
ADVANCING THREATS
As new adversarial techniques are developed, existing defenses may become outdated or ineffective. A successful defense strategy
typically requires layered approaches that
combine several techniques to create a more
resilient AI system.
FUTURE STATE 10/12
REAL WORLD LIMITATIONS
FUTURE STATE 11/12
LOOKING FORWARD
Consider the following milestones in the development of your solution!

SECURE MODEL ARCHITECTURES
These architectures integrate security measures such as adversarial robustness, encrypted computations, and privacy-preserving techniques to protect sensitive data and model integrity.

AI MONITORING SYSTEMS
These systems monitor inputs, outputs, and model decisions to detect anomalies, potential adversarial attacks, and drift in model performance.

EXPLAINABILITY
Explainability tools help uncover how models reach specific decisions, allowing organizations to understand potential vulnerabilities, biases, or unexpected behaviors.
THANK YOU 12/12