SESSION ID: SBX-R03
Coordinated Disclosure for ML: What's Different and What's the Same
Sven Cattell
Founder of AI Village and nbhd.ai; organizer of the Generative Red Team at DEFCON 31
Disclaimer
Presentations are intended for educational purposes only and do not replace independent professional
judgment. Statements of fact and opinions expressed are those of the presenters individually and,
unless expressly stated to the contrary, are not the opinion or position of RSA Conference™ or any other
co-sponsors. RSA Conference does not endorse or approve, and assumes no responsibility for, the
content, accuracy or completeness of the information presented.
Attendees should note that sessions may be audio- or video-recorded and may be published in various
media, including print, audio and video formats without further notice. The presentation template and
any media capture are subject to copyright protection.
© 2024 RSA Conference LLC or its affiliates. The RSA Conference logo and other trademarks are proprietary. All rights reserved.
Outline
• Why test ML
– Predictable, but unknowable
• What we did at DEFCON
– The Generative Red Team design and its shortcomings
• What to do next
– “Responsible” means more public participation in ML
Why Test? - AI is a black box
• We train a model to minimize a loss.
– The loss is related to the task we want, but may
not be the exact task.
• We then test the model against new, held-out data
• Its performance is measured statistically (see the sketch below)
(Diagram: Input → “AI Magic” black box → Output)
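Because the performance claim is inherently statistical, here is a minimal sketch of what "measured statistically" means in practice, assuming scikit-learn and a purely synthetic dataset (the data, model, and sizes are placeholders, not anything from the talk): the test set yields an estimated error rate with an uncertainty band, never a guarantee.

```python
# Minimal sketch: held-out evaluation reported as an estimate with
# uncertainty. Assumes scikit-learn; data and model are placeholders.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20_000, n_features=30, weights=[0.9, 0.1],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# The reported number is a statistical estimate of the error rate,
# so attach a (normal-approximation) 95% confidence interval to it.
errors = model.predict(X_test) != y_test
p_hat, n = errors.mean(), len(errors)
half_width = 1.96 * np.sqrt(p_hat * (1 - p_hat) / n)
print(f"test error ≈ {p_hat:.4f} ± {half_width:.4f} (n={n})")
```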
Why Test? - There’s a bit of chaos
• Small differences in input can wildly
change the output.
• The test set gives us some certainty,
but “99.9% secure” is not secure.
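For a sense of scale: a model that is wrong on only 0.1% of inputs still produces roughly 1,000 bad outputs for every million requests, and an adversary gets to go looking for exactly those inputs, so a strong test-set number says little about worst-case behavior.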
Why Test? - Unable to verify
• Even the simplest networks are horrifically
complex.
• Small models for MNIST, the “unit test” of
ML, do not have provable outputs.
Why Test? - Adversarial Examples
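As a concrete illustration of the phenomenon, here is a minimal sketch, assuming PyTorch and a hypothetical already-trained classifier, of the classic fast gradient sign method (FGSM): a per-pixel perturbation far too small to notice, chosen in the direction that most increases the loss, is often enough to flip the prediction.

```python
# Minimal FGSM sketch. Assumes PyTorch; `model`, `x`, and `y` are
# hypothetical placeholders for a trained classifier and a labeled input.
import torch
import torch.nn as nn

def fgsm_example(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                 eps: float = 0.03) -> torch.Tensor:
    """Return x plus a small, loss-maximizing perturbation of size eps."""
    x = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    # Step in the sign of the gradient: tiny change per pixel, large change in loss.
    return (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()

# Usage with a hypothetical trained model and a normalized image batch:
# x_adv = fgsm_example(model, x, y)
# print(model(x).argmax(1), model(x_adv).argmax(1))  # frequently disagree
```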
Why Test? - Bias
Small differences in the ratio of classes in the training data are
exacerbated by the ML model. This happens in malware models! (See the
toy example below.)
From: https://arxiv.org/pdf/2303.11408
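A toy version of that amplification effect, as a sketch only (synthetic data and a plain logistic regression, not the setup from the cited paper): as the minority class shrinks in the training data, its false negative rate typically degrades much faster than the imbalance alone would suggest.

```python
# Toy sketch of imbalance amplification on synthetic data; illustrative
# only, not a reproduction of the cited malware study.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

for minority_share in (0.5, 0.2, 0.05):
    X, y = make_classification(n_samples=50_000, n_features=20, n_informative=5,
                               weights=[1 - minority_share, minority_share],
                               class_sep=0.8, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    # False negative rate on the minority (positive) class.
    fnr = ((pred == 0) & (y_te == 1)).sum() / max((y_te == 1).sum(), 1)
    print(f"minority share {minority_share:>4}: minority-class FNR = {fnr:.1%}")
```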
Why Test? - How much?
• Adversarial Setting:
– Security: Mislabeling any malware, phishing, or spam as benign could
lead to a breach.
• Generative Setting:
– LLMs: People use ChatGPT for hiring. Have they tested for bias against
Alaskan Inuit communities?
– Image Generation: The Stanford Internet Observatory found 1,679 CSAM
images in LAION-5B.
What We Did at DEFCON 31
Private testing is a good start; public testing is
where it’s at.
Models from 8 Vendors
GRT - Format
• Capture The Flag Format with various categories:
– Privacy
– Safety and Effectiveness
– Algorithmic Discrimination
– Notice and Explanation
• Each challenge could be solved over all 8 models
– Randomly chosen and anonymized
• Platform built by Scale based on an open source CTFd prototype
GRT - Warm Ups
GRT - Warm up learnings
• We ran on Pythia-12B fine-tuned by Open Assistant
• We expected “not even once” errors
– Any single example of these violates the model creator’s intent
– For example: “My kids are 3, 5, and 6” led to an elaborate kidnap and
torture plan.
• We did not expect bias errors
– We did not provide a way to prove societal bias.
GRT - Shortcoming
• Bias
– Bias is a statistical property of the model
– Bias is demonstrated by running many examples through the model and
applying a statistical test (see the sketch below)
From: https://openreview.net/forum?id=erl90pLIH0
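A minimal sketch of what proving a bias claim could look like, assuming SciPy and hypothetical outcome counts (nothing here comes from the GRT data): compare unfavorable-outcome rates across two groups with a standard contingency-table test, and ship the underlying prompts and outputs so the test can be re-run.

```python
# Minimal sketch: bias as a statistical claim rather than a single example.
# Assumes SciPy; the counts below are hypothetical placeholders.
from scipy.stats import chi2_contingency

# (unfavorable, favorable) outcome counts per group, e.g. resumes
# rejected vs. advanced by an LLM screening step.
group_a = (120, 880)   # 12% unfavorable
group_b = (180, 820)   # 18% unfavorable

chi2, p_value, dof, _ = chi2_contingency([group_a, group_b])
print(f"chi-squared = {chi2:.2f}, p = {p_value:.4f}")
# A small p-value says the gap between groups is unlikely to be sampling
# noise; the report should include every prompt and output behind these counts.
```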
Proposed Coordinated Flaw
Disclosure
CVE process, modified for CFEs!
CFE - Overview
1. Base Decisions on Model Cards.
a. These give intent and scope for the hackers
2. Require datasets be submitted with the report.
a. Lets you test more than just “not even once”
3. Give the adjudication committee some form of access to the
model to resolve disputes.
a. Reporters can cherry pick, and vendors can prevent reproducibility
Model Cards
• Verifiable statements about the
model’s performance.
• All major models have one.
– The format is not standardized, and some are worse than others.
• Providing one should be the bare minimum for a model purchase.
From: https://arxiv.org/abs/1810.03993
Report Datasets
• These are statistical beasts.
• You prove the validity of the report with a statistical argument.
• Therefore, the proof of concept can’t be code, an input, or any other
singular object. It has to be a collection of them.
– Sometimes, if the harm is bad enough, that collection is a dataset of 1.
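One way to picture such a report, as a sketch under loose assumptions (the field names and numbers are hypothetical, not a proposed CFE schema): the dataset, the model-card claim it contradicts, and the statistical evidence travel together.

```python
# Sketch of a flaw report as a statistical object. Field names and numbers
# are hypothetical, not part of any proposed standard. Assumes SciPy >= 1.7.
from dataclasses import dataclass
from scipy.stats import binomtest

@dataclass
class FlawReport:
    model_card_claim: str   # e.g. "FNR <= 0.1% on in-scope malware"
    claimed_rate: float     # the rate the model card guarantees
    samples: int            # size of the submitted dataset
    failures: int           # how many of those samples the model got wrong

    def p_value(self) -> float:
        # Chance of seeing at least this many failures if the claim were true.
        return binomtest(self.failures, self.samples, self.claimed_rate,
                         alternative="greater").pvalue

report = FlawReport("FNR <= 0.1% on in-scope malware",
                    claimed_rate=0.001, samples=5_000, failures=25)
print(f"p = {report.p_value():.2e}")  # tiny p-value -> strong report
```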
Adjudication
• A malicious reporter can cherry
pick data to make a false report
that looks legitimate.
– Sample an evaluation of 10,000
resumes and pick a subset of 200
that proves your false point.
• Vendors can also claim this
happened, or modify the
outputs after the report to
remove reproducibility.
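A small simulation of the cherry-picking attack described above, with made-up numbers: the model below treats both groups identically, yet a reporter who screens 10,000 resumes and hands over an adversarially chosen subset of 200 can manufacture an apparently damning disparity. This is why the adjudicator needs enough model access to re-run the evaluation on a fresh sample.

```python
# Simulation of cherry-picking: a fair model, an unfair-looking subset.
# All numbers are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
group = rng.integers(0, 2, n)         # two groups, roughly 50/50
rejected = rng.random(n) < 0.15       # the model rejects 15% of everyone

# Honest view: rejection rates are essentially equal across groups.
for g in (0, 1):
    print(f"group {g} full-sample rejection rate: {rejected[group == g].mean():.1%}")

# Cherry-picked "evaluation": 100 rejected from group 1, 100 accepted from group 0.
picked = np.concatenate([
    np.flatnonzero((group == 1) & rejected)[:100],
    np.flatnonzero((group == 0) & ~rejected)[:100],
])
for g in (0, 1):
    mask = group[picked] == g
    print(f"group {g} cherry-picked rejection rate: {rejected[picked][mask].mean():.1%}")
```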
CFE - Evaluating a Malware Model
• Model card can be simple:
– We guarantee a False Negative Rate (FNR) of 0.1% and a False Positive
Rate (FPR) of 0.01%
• Or really hard:
– We guarantee a FNR of 0.1% and a FPR of 0.01% across all customers
The first is easily checked, the second is not…
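A sketch of what "easily checked" could mean for the first claim, assuming SciPy and a labeled malware corpus of your own (the counts are placeholders): compute an exact Clopper–Pearson upper bound on the false negative rate and compare it against the guaranteed 0.1%.

```python
# Sketch: checking a model-card guarantee ("FNR <= 0.1%") against your own
# labeled malware corpus. Counts are placeholders; assumes SciPy.
from scipy.stats import beta

claimed_fnr = 0.001
n = 50_000        # labeled malicious samples you scanned
misses = 30       # samples the model labeled benign

observed = misses / n
# One-sided 95% Clopper-Pearson upper bound on the true FNR.
upper_95 = beta.ppf(0.95, misses + 1, n - misses)

print(f"observed FNR = {observed:.4%}, 95% upper bound = {upper_95:.4%}")
print("consistent with the claim" if upper_95 <= claimed_fnr
      else "claim not supported by this corpus")
```

The "across all customers" version of the claim cannot be checked this way by any single purchaser, which is what makes it so much harder to verify.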
CFE - Evaluating a Malware Model
• Complications
– We consider Potentially Unwanted Apps like adware to be malicious.
– We do not evaluate on binaries exclusive to Windows 7 and before.
– We know that the model classifies packed binaries as benign.
These caveats are needed because the models can be limited, but
still useful.
Apply What You Have Learned Today
• This week you should:
– Look for verifiable model performance statements in marketing to
distinguish hype from reality
• In the following few weeks:
– Look at the model cards of various open source models and compare them
• Within six months you should:
– Read a few AI ethics papers from venues like FAccT (the ACM Conference
on Fairness, Accountability, and Transparency) to see how to make these
statistical arguments.