Synthetic
Examples
For Applying
Supervised Machine Learning
In Cyber Defense
Samuel Crisanto
Naveed Ahmad
Office 365 Security
The Context
• Not customer-facing security
• Not a typical IT setup with lots of chaos
• Intrusion Detection in Data Center
• Same software running on hundreds of
thousands of machines doing similar things
[Chart: Alerts Per Day, 8/25–9/02]
+ We Catch Attacks
~440 Billion Events Per Day
Filter
Rules
Service Normal
Hacker’s Activity
Auto-Learn
Catching Bad Actors
[Charts: Detections per Day vs. Anomaly Rule, 8/25–9/2]
• Can’t learn what we haven’t seen
• But there is value in auto-learning what we have seen
• With a world-class Pen Test team you can auto-learn a lot
Supervised ML – Limitations & Opportunities
[Charts: Alerts Per Day; Detections per Day vs. Anomaly Rule (repeated from earlier slides)]
Transforming a Human Problem into an ML Problem
P – Process
U – User
SG – Security Group
RD – Remote Destination
RK – Registry Key
D – Detection
Entity with one or more associated
detections
Entity without an
associated detection
Detection
P2 Launched Process
SG1
RD1
Established Connection
D1
P1 Launched Process
P3
Launched Process
Created User
P3
Added user to Security Group
D3
U1
D2
Launched Process
P4
Registry Key Added
D4
RK1
High-Privileged
Injected Process
powershell.exe
Connection
to C2 Server
Local Security Group
net.exe
net.exe
Local User
Registry Auto-Start Key
regedit.exe
D1
D3
D2
D4
Human Triage
Process-Tree Boundary
Machine Boundary
User-Session Boundary
Benign Detection Noise
Malicious Detection
Detection Correlation
Feature Extractor
Feature Vector
Feature Extractor
Feature Vector
Malicious Example | Benign Example
<Features>
  <Feature Type="Numeric" Signal="Detection1" Operation="Count" Field="ProcessName" />
  <Feature Type="Numeric" Signal="Detection2" Operation="Max" Field="Score" />
  <Feature Type="Numeric" Signal="Detection3" Operation="MaxSum" Field="Bytes,IP,Port" />
  ...
</Features>
Extracting Features
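A feature extractor driven by a config like the one above can be sketched as follows. This is a minimal sketch: the dict-based detection records, the field values, and supporting only the `Count` and `Max` operations are my assumptions (the deck's `MaxSum` is omitted).

```python
# One entry per <Feature> element: (signal, operation, field)
FEATURE_CONFIG = [
    ("Detection1", "Count", "ProcessName"),
    ("Detection2", "Max", "Score"),
]

def extract_features(detections, config=FEATURE_CONFIG):
    """Turn raw detections (a list of dicts) into a fixed-length feature vector."""
    vector = []
    for signal, op, field in config:
        rows = [d for d in detections if d["Signal"] == signal]
        if op == "Count":
            # Number of distinct field values among matching detections
            vector.append(float(len({r[field] for r in rows})))
        elif op == "Max":
            # Largest field value among matching detections (0.0 if none)
            vector.append(max((r[field] for r in rows), default=0.0))
    return vector

detections = [
    {"Signal": "Detection1", "ProcessName": "net.exe"},
    {"Signal": "Detection1", "ProcessName": "powershell.exe"},
    {"Signal": "Detection2", "Score": 0.95},
]
print(extract_features(detections))  # [2.0, 0.95]
```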
Training Set
[3.0, 2.0, 0.95, 3.0, 5455.0, 6345.0, 2.0, 0.73, … ] – Malicious
[0.5, 0.7, 0.95, 1.0, 1151.0, 1312.0, 1.0, 0.43, … ] – Benign
[0.2, 0.9, 0.23, 1.5, 1252.0, 2113.0, 0.9, 0.31, … ] – Benign
[2.3, 1.8, 0.89, 2.3, 4995.0, 5545.0, 1.9, 0.85, … ] – Malicious
…
Known Examples
New Activity
[1.3, 2.1, 0.29, 1.3, 2791.0, 1595.0, 2.9, 0.95, … ] Malicious/Benign?
Random Forest
Machine Learning
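The deck feeds these vectors to a Random Forest. As a dependency-free sketch of the same train-on-known-examples / classify-new-activity flow, here is a 1-nearest-neighbour stand-in; the truncated four-element vectors come from the slide, but the classifier itself is my substitution, not the deck's model.

```python
import math

# Known examples: feature vectors labeled Malicious / Benign (from the slide)
TRAINING_SET = [
    ([3.0, 2.0, 0.95, 3.0], "Malicious"),
    ([0.5, 0.7, 0.95, 1.0], "Benign"),
    ([0.2, 0.9, 0.23, 1.5], "Benign"),
    ([2.3, 1.8, 0.89, 2.3], "Malicious"),
]

def classify(vector, training=TRAINING_SET):
    """Label new activity with the label of its nearest known example
    (Euclidean distance); a Random Forest plays this role in the deck."""
    return min(training, key=lambda ex: math.dist(vector, ex[0]))[1]

print(classify([1.3, 2.1, 0.29, 1.3]))  # Malicious (nearest known example)
```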
The Challenge
In Applying
Supervised ML for
Defending
a Diverse Service
1. Not enough successful malicious examples in
all the diverse populations
→ For training a good model
2. Not enough benign examples in targeted
subpopulations
→ For accurately testing and validating the model
Scarcity of Successful Attack Examples
• 438 compromised machine examples
Attack
Examples
Confirmed Attacks
Risky OCE Activity
Attack Automation
• Nearly half a million machines in our service, and growing
• For a given time window, 156K show benign anomalous behavior
[Chart: count of Benign vs. Malicious examples]
Scarcity of Successful Attack Examples
For all the Diverse Populations We Protect
• Model Performance
• 156K Benign, 438 Malicious
• Area under ROC Curve = 0.971
• Area under PR Curve = 0.999
• Awesome model or Overfit?
Detections
Solution
Crafting synthetic attack examples from past
attacks for all the diverse populations we want to
protect
Extract Attack Signals
M1
M2
M1
M2
B1
B2
B3
M1B1
M1B2
M1B3
M2B1
M2B2
M2B3
Overlaid with Attack Signals
Anomaly Scoring
Extract Attack Signals
Sampled Benign
Cartesian Bootstrapping
Sample
Cartesian Bootstrapping - Distribution
Bootstrapped malicious examples are more representative of the diverse target population
Detections
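The M×B overlay in the diagram can be sketched as a Cartesian product over sets of detections. The machine IDs and detection names here are illustrative, not from the deck's data.

```python
from itertools import product

# Attack signals extracted from known-compromised machines (M) and
# sampled benign machine baselines (B)
malicious_signals = {"M1": {"D1", "D2"}, "M2": {"D3"}}
benign_baselines = {"B1": {"D5"}, "B2": {"D6"}, "B3": {"D5", "D7"}}

def cartesian_bootstrap(malicious, benign):
    """Overlay every attack's signals onto every benign baseline:
    |M| x |B| synthetic malicious examples (M1B1, M1B2, ...)."""
    return {
        m_id + b_id: m_sig | b_sig
        for (m_id, m_sig), (b_id, b_sig) in product(malicious.items(), benign.items())
    }

synthetic = cartesian_bootstrap(malicious_signals, benign_baselines)
print(sorted(synthetic))      # ['M1B1', 'M1B2', 'M1B3', 'M2B1', 'M2B2', 'M2B3']
print(synthetic["M1B3"])      # attack signals D1, D2 plus background D5, D7
```

With 438 real attacks and a benign sample, this product is how 438 malicious examples grow into the 84K used later in the deck.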
• Model Performance - Before
• 156K Benign, 438 Malicious Examples
• Area under ROC Curve = 0.971
• Area under PR Curve = 0.999
• Overfit model performing poorly in production
• Model Performance - After
• 156K Benign, 84K Malicious Examples
• Area under ROC Curve = 0.851
• Area under PR Curve = 0.848
• Balanced model performing well in production
Cartesian Bootstrapping - Results
Malicious Activities Benign Activities
Synthetic
Attack Examples
438 156K
84K
Measuring model performance on targeted small
subpopulations
The Next
Challenge
Test on a
subpopulation
Learn on all
data
The model is trained on the entire dataset.
How does it perform on a small, targeted subpopulation?
Measuring Performance
Test on a
subpopulation
Learn on all
data
Does the model capture the behavior of smaller subpopulations?
Do we perform well on smaller services and smaller roles?
Each service has machines in different roles
Role
Count
Machine count per role
in Exchange
Services (obfuscated, truncated)
Count
Machine count per
service
Variance in services and roles
Role
Detection types per role in Exchange
Services
(obfuscated,
truncated)
Detection types per service
Variance in services and roles
Detection Type count (obfuscated) Detection Type count (obfuscated)
Images adapted from work by Walber
https://commons.wikimedia.org/wiki/File:Precisionrecall.svg
[CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0)],
from Wikimedia Commons
Ignore noise (precision)
Detect cyberattacks (recall)
Measuring success
A typical small subpopulation will have a low number of benign
machines and no examples of malicious machines
Building a dataset to validate the model
We need to borrow malicious examples from other services.
Validating on this dataset is problematic.
Building a dataset to validate the model
If we classify everything as malicious,
Precision = 22/27 ≈ 0.815
Recall = 22/22 = 1
Building a dataset to validate the model
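The slide's numbers follow directly from the precision/recall definitions: with 22 borrowed malicious examples, 5 benign machines, and everything labeled malicious, there are 22 true positives, 5 false positives, and no false negatives.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP/(TP+FP); Recall = TP/(TP+FN)."""
    return tp / (tp + fp), tp / (tp + fn)

# All 27 machines classified malicious: 22 true positives, 5 false positives,
# and no malicious machine is missed.
precision, recall = precision_recall(tp=22, fp=5, fn=0)
print(round(precision, 3), recall)  # 0.815 1.0
```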
Instead, start by synthesizing benign examples.
Benign examples should equal malicious examples.
Building a dataset to validate the model
Use oversampling to generate synthetic benign machines.
These vary in some ways, but not others
Building a dataset to validate the model
Modify malicious examples by joining them with benign examples
Union of detections -> Synthetic malicious machine
(D1, D2) U (D1, D3) -> (D1, D2, D3)
Building a dataset to validate the model
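The union rule on the slide is just a set union over detection types:

```python
def synthesize_malicious(attack_detections, benign_detections):
    """Overlay a real attack's detections on a benign machine's background
    noise: the union is one synthetic malicious machine."""
    return attack_detections | benign_detections

# The slide's example: (D1, D2) U (D1, D3) -> (D1, D2, D3)
print(sorted(synthesize_malicious({"D1", "D2"}, {"D1", "D3"})))  # ['D1', 'D2', 'D3']
```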
Rare benign machines skew the data, and are amplified by resampling
Building a dataset to validate the model
Sample from the frequency
tables
Discard
outliers
What detections occur together,
and what are their counts?
Discard
outliers
How many types of detections
occur on a machine?
Craft synthetic benign machines
without representing outliers
Normalized
Sampling
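The two frequency tables plus outlier handling described above can be sketched like this. The percentile cutoff, the per-machine detection sets, and all names are illustrative assumptions; the deck does not give the exact procedure.

```python
import random
from collections import Counter

def normalized_sample(machines, n, outlier_pct=0.95, seed=0):
    """Craft synthetic benign machines from frequency tables, discarding
    outlier machines instead of letting resampling amplify them.

    machines: machine id -> set of detection types observed on it.
    """
    # Table 1: how many detection types occur on a machine? Drop the extreme tail.
    sizes = sorted(len(dets) for dets in machines.values())
    cutoff = sizes[int(outlier_pct * (len(sizes) - 1))]
    kept_sizes = [s for s in sizes if s <= cutoff]
    # Table 2: which detections occur, and how often, on non-outlier machines?
    freq = Counter(d for dets in machines.values()
                   if len(dets) <= cutoff for d in dets)
    types, weights = zip(*freq.items())
    rng = random.Random(seed)
    # Sample a machine size, then sample that many detections by frequency.
    return [set(rng.choices(types, weights=weights, k=rng.choice(kept_sizes)))
            for _ in range(n)]

machines = {
    "m1": {"D1"}, "m2": {"D1", "D2"}, "m3": {"D1"}, "m4": {"D2"},
    "m5": {"D1", "D2", "D3", "D4", "D7"},  # rare, noisy outlier machine
}
synthetic = normalized_sample(machines, n=3)
print(synthetic)  # only D1/D2 appear; the outlier's D3/D4/D7 are discarded
```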
Detection types per service
over a 10-day period
Normalized
Sampling
Detection type counts (obfuscated)
Services
(obfuscated,
truncated)
Services
(obfuscated,
truncated)
Number of detections
40,000??
Detections per machine
over a 10-day period
Normalized
Sampling
Detection Type Detection Count
Detection 1 33
Detection 2 16
Detection 3 17
Detection 4 5
Detection 6 3
Detection 7 37744
Detection 8 7
Detection 9 2
Detection 10 2304
… (etc) … (etc)
Detections on that machine over
the last 10 days
What’s going
on?
• Balance the classes by generating benign examples
via simple oversampling or normalized sampling
• Modify malicious examples to reflect service background noise
In Summary
Takeaways
• Pen Test + Supervised ML effectively spots
known attacks against a diverse set of services
• Use Cartesian Bootstrapping to train the model
with diverse examples
• Use Normalized Sampling to validate the model
for targeted subpopulations

BlueHat v18 || Crafting synthetic attack examples from past cyber-attacks for applying supervised machine learning in cyber defense.
