Cyber Security + Data Science
NYU Data Science and Analytics Club
Brennan Lodge
September 26, 2018
Agenda
• The Target Breach
• Cyber Security is hard
• A Cyber Security Analyst and the Security Operations Center
• A Data Science Lens to Cyber Security
• Visualizations and Reporting
• A data science model for cyber security
• NYU Data Science and Analytics Club plans
Cyber Security and Data Science
+ Ransomware
+ Spear Phishing
+ Crypto
+ Geopolitical
Cyber Security is hard
Logs, Events, Cases and Operations
• Event Workflow
• Many, many, many events
• Correlation
• High, Medium, Low?
• Case Workflow
• Investigation
• Escalate
• Review
SIEM (Security Information and Event Management)
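The event workflow above (many events, correlation, a High/Medium/Low priority, then a case) can be illustrated with a minimal sketch. It assumes each event carries a host, a detection rule name, a High/Medium/Low severity and a timestamp, and it correlates events on the same host that arrive within 30 minutes of the previous one. The field names, the 30-minute window and the "highest severity wins" priority rule are illustrative assumptions, not the SOC's actual SIEM correlation logic.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Ordering used to pick a case's priority (assumed three-level scale).
SEVERITY_RANK = {"Low": 1, "Medium": 2, "High": 3}

@dataclass
class Event:
    host: str       # asset the event was observed on (hypothetical field)
    rule: str       # detection rule that fired
    severity: str   # "High", "Medium" or "Low"
    time: datetime

@dataclass
class Case:
    host: str
    events: list = field(default_factory=list)

    @property
    def priority(self) -> str:
        # A case inherits the highest severity among its correlated events.
        return max((e.severity for e in self.events), key=SEVERITY_RANK.__getitem__)

def correlate(events, window=timedelta(minutes=30)):
    """Group same-host events that occur within `window` of that host's previous event."""
    cases = []
    open_case = {}  # host -> most recent Case opened for that host
    for ev in sorted(events, key=lambda e: e.time):
        case = open_case.get(ev.host)
        if case is not None and ev.time - case.events[-1].time <= window:
            case.events.append(ev)          # correlated: attach to the open case
        else:
            case = Case(host=ev.host, events=[ev])
            cases.append(case)              # uncorrelated: start a new case
            open_case[ev.host] = case
    return cases

# Hypothetical events for two workstations.
t0 = datetime(2018, 9, 26, 9, 0)
events = [
    Event("ws-01", "phishing-url", "Low", t0),
    Event("ws-01", "malware-download", "High", t0 + timedelta(minutes=10)),
    Event("ws-02", "port-scan", "Medium", t0 + timedelta(minutes=5)),
    Event("ws-01", "port-scan", "Low", t0 + timedelta(hours=3)),
]
cases = correlate(events)
for case in cases:
    print(case.host, len(case.events), case.priority)
```

Four events collapse into three cases, and the Low phishing alert on `ws-01` is escalated to High because a malware download on the same host followed within the window.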
Problem Statement
• Problem
• Many detections, but a lot of noise
• How?
• Analyze the detection rule inventory to measure the fidelity of each rule and reveal which rules generate noise
• Solution
• Use data science methods to help analysts focus on higher-fidelity alerts
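One way to make "fidelity in the rule set" concrete is to treat each rule's threat rate (the share of its alerts that analysts closed as threats) as its fidelity. The minimal sketch below assumes closed alerts are available as `(rule, is_threat)` pairs; the rule names, the 5% threshold and the 20-fire minimum are hypothetical values chosen for illustration.

```python
from collections import Counter

def rule_fidelity(closed_alerts):
    """Return {rule: (fires, threat_rate)} from (rule, is_threat) pairs."""
    fires = Counter()
    threats = Counter()
    for rule, is_threat in closed_alerts:
        fires[rule] += 1
        threats[rule] += int(is_threat)
    return {rule: (n, threats[rule] / n) for rule, n in fires.items()}

def noisy_rules(fidelity, max_threat_rate=0.05, min_fires=20):
    """Rules that fire often enough to judge but are rarely confirmed as threats."""
    return sorted(
        rule for rule, (n, rate) in fidelity.items()
        if n >= min_fires and rate <= max_threat_rate
    )

# Hypothetical history of closed investigations.
alerts = (
    [("rule_a", False)] * 95 + [("rule_a", True)] * 5    # 100 fires, 5% threats
    + [("rule_b", False)] * 12 + [("rule_b", True)] * 8  # 20 fires, 40% threats
    + [("rule_c", False)] * 3                            # too few fires to judge
)
fidelity = rule_fidelity(alerts)
print(fidelity)
print(noisy_rules(fidelity))
```

The `min_fires` guard keeps rarely triggered rules from being labelled noisy on too little evidence; the right thresholds depend on the SOC's own volumes.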
The balance of analysts and defense
Defense & Detections
Data Scientist
Hackers
Cyber Security Analysts
Insiders
Data Science Project workflow
• Business Understanding & Requirements Gathering
• Data Discovery, Data Understanding and Use Case Development
• Data Preparation, Transformation and Prototyping
• Analytics Delivery
• Measurements and Evaluation
• Actionable Data-Driven Insights
Data
• Get feedback on the analytics to be implemented as reports for the SOC
• Held a SIEM application and data session with the Sentinel development team
• Met with SOC leadership to identify analytics opportunities: operational analytics and case detection correlation
• Identified SIEM data in the data lake and reviewed the data dictionary with the SIEM dev team
• Used Jupyter Notebook and SQL queries to pull Sentinel data and transform it into analytics
• Delivered analytics prototypes to support the use cases and presented findings to SOC leadership
• Provided analytics to the SOC to measure, track and monitor operations and to make better data-driven decisions
Operational Metrics
• Run rate?
• The operations:
• Shift for an analyst = 8 hours
• Number of cases created per day
• Sustainable production = # of cases per hour
• Current production = number of cases per hour / # of analysts
• Difference = + or –
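The run-rate arithmetic above can be sketched as follows. The slide does not say how cases per day become cases per hour, so this assumes a 24x7 SOC (cases per hour = cases per day / 24), reads "sustainable production" as cases one analyst can handle per hour, and defines the difference as sustainable minus current. The 480 cases/day, 5 analysts and 3 cases/hour figures are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class RunRate:
    cases_per_day: float          # number of cases created per day
    analysts_per_shift: int       # analysts working each shift
    sustainable_per_hour: float   # cases one analyst can sustainably handle per hour (assumed meaning)
    shift_hours: float = 8.0      # shift for an analyst = 8 hours
    coverage_hours: float = 24.0  # assumed 24x7 SOC coverage

    @property
    def shifts_per_day(self) -> float:
        return self.coverage_hours / self.shift_hours

    @property
    def cases_per_hour(self) -> float:
        return self.cases_per_day / self.coverage_hours

    @property
    def current_per_analyst_hour(self) -> float:
        # Current production = number of cases per hour / # of analysts
        return self.cases_per_hour / self.analysts_per_shift

    @property
    def difference(self) -> float:
        # Positive: spare capacity per analyst-hour. Negative: the queue is growing.
        return self.sustainable_per_hour - self.current_per_analyst_hour

    @property
    def analysts_needed_per_shift(self) -> int:
        return math.ceil(self.cases_per_hour / self.sustainable_per_hour)

# Hypothetical SOC: 480 cases/day, 5 analysts per shift, 3 cases per analyst-hour sustainable.
rate = RunRate(cases_per_day=480, analysts_per_shift=5, sustainable_per_hour=3)
print(rate.cases_per_hour, rate.current_per_analyst_hour, rate.difference, rate.analysts_needed_per_shift)
# prints: 20.0 4.0 -1.0 7
```

Here each analyst would need to close 4 cases per hour against a sustainable 3, so the difference is negative and roughly 7 analysts per shift would be needed to keep pace.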
Rule Detection Analytics
[Figure: Detection Rules Fired vs Threat Rate. Y-axis: Threat Rate (0% to 40%); x-axis: 0 to 160.]
The more events fired per detection rule, the higher the threat detection rate.
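The chart's takeaway is a claim about correlation between a rule's volume and its threat rate, which can be checked numerically on real data. The sketch below computes a Pearson correlation coefficient over per-rule totals; the `rule_stats` figures are hypothetical and are not the data behind the chart. A high coefficient shows association, not that firing more causes more threats, and with only a handful of rules a rank-based measure such as Spearman's may be more robust to outliers.

```python
import math
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)

# Hypothetical per-rule totals: rule -> (events fired, events closed as threats)
rule_stats = {
    "rule_a": (20, 1),
    "rule_b": (40, 4),
    "rule_c": (80, 16),
    "rule_d": (120, 36),
    "rule_e": (160, 60),
}
fires = [n for n, _ in rule_stats.values()]
rates = [threats / n for n, threats in rule_stats.values()]
r = pearson(fires, rates)
print(f"Pearson r between events fired and threat rate: {r:.3f}")
```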
Model
• Data Source: SIEM data
• Transformation Services: table merges, SQL queries, Python scripts
• Data ingestion: Python pickle file
• End Services: modeling, dashboard, reporting
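A minimal sketch of the transformation and ingestion stages, using only the Python standard library. An in-memory SQLite database stands in for the SIEM data in the data lake, and the `events`/`cases` tables, their columns and rows are hypothetical. A SQL join performs the table merge, a Python step turns rows into labelled records, and a pickle file hands the result to the modeling step.

```python
import pickle
import sqlite3
import tempfile
from pathlib import Path

def create_toy_siem_tables(conn):
    """Stand-in for SIEM data in the data lake (hypothetical schema and rows)."""
    conn.executescript("""
        CREATE TABLE events (event_id INTEGER PRIMARY KEY, rule TEXT, sensor TEXT, severity INTEGER);
        CREATE TABLE cases (event_id INTEGER, verdict TEXT);
        INSERT INTO events VALUES (1, 'rule_a', 'ids', 2), (2, 'rule_b', 'email', 3), (3, 'rule_a', 'ids', 1);
        INSERT INTO cases VALUES (1, 'non-threat'), (2, 'threat'), (3, 'non-threat');
    """)

def extract_rows(conn):
    """Table merge + SQL query: join each event to the analyst's case verdict."""
    query = """
        SELECT e.rule, e.sensor, e.severity, c.verdict = 'threat'
        FROM events AS e
        JOIN cases AS c ON c.event_id = e.event_id
        ORDER BY e.event_id
    """
    # Python transformation step: turn result tuples into labelled records.
    return [
        {"rule": rule, "sensor": sensor, "severity": severity, "is_threat": bool(flag)}
        for rule, sensor, severity, flag in conn.execute(query)
    ]

def save_rows(rows, path):
    """Write the prepared rows to a pickle file for the modeling step."""
    with open(path, "wb") as fh:
        pickle.dump(rows, fh)

def load_rows(path):
    """Read the pickle back (only unpickle files from a trusted source)."""
    with open(path, "rb") as fh:
        return pickle.load(fh)

conn = sqlite3.connect(":memory:")
create_toy_siem_tables(conn)
rows = extract_rows(conn)
conn.close()

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "siem_rows.pkl"
    save_rows(rows, path)
    restored = load_rows(path)
print(restored)
```

Pickle is convenient for passing Python objects between notebook steps, but unpickling can execute arbitrary code, so it suits internal hand-offs rather than data exchanged with other systems.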
Intrusion Detection Log workflow
Detecting malware from an Email
Logistic Regression to evaluate detections
Examples of features and the dependent variable for scoring detections:
Independent variables
• Sensors
• Detection rules
• Day of the week on which an event triggered a detection
• Whether the event is correlated with another event
• Priority / severity of the event as scored by a sensor
Dependent variable
• Threat or non-threat, based on the analyst's decision when closing an investigation
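A minimal from-scratch sketch of a logistic regression over the features listed above, written with only the standard library so each step is visible. The encoding (one-hot sensor, rule and weekday; a 0/1 correlation flag; a severity assumed to run from 1 to 3), the learning rate, the epoch count and the toy history are all assumptions for illustration, not the deck's actual model or data. A library implementation with regularization, such as scikit-learn's `LogisticRegression`, would be the more usual choice in practice.

```python
import math
from dataclasses import dataclass

@dataclass
class Detection:
    sensor: str       # sensor that raised the event (hypothetical values below)
    rule: str         # detection rule that fired
    weekday: str      # day of the week the detection was made
    correlated: bool  # is the event correlated with another event?
    severity: int     # sensor-assigned priority, assumed 1 (low) to 3 (high)

def categorical_keys(d):
    return (f"sensor={d.sensor}", f"rule={d.rule}", f"weekday={d.weekday}")

def build_vocab(detections):
    """Map every categorical value seen in training to a one-hot column."""
    keys = sorted({k for d in detections for k in categorical_keys(d)})
    return {k: i for i, k in enumerate(keys)}

def featurize(d, vocab):
    """One-hot categorical columns, then the correlated flag and the severity."""
    x = [0.0] * (len(vocab) + 2)
    for key in categorical_keys(d):
        if key in vocab:  # values never seen in training are ignored
            x[vocab[key]] = 1.0
    x[-2] = 1.0 if d.correlated else 0.0
    x[-1] = float(d.severity)
    return x

def sigmoid(z):
    # Split by sign so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(x, w, b):
    """Estimated probability that a detection is a real threat."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, m = len(X), len(X[0])
    w, b = [0.0] * m, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * m, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, w, b) - yi  # derivative of log-loss w.r.t. the score
            for j in range(m):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Hypothetical closed investigations: label 1 = threat, 0 = non-threat.
history = [
    (Detection("email", "phish_rule", "Mon", True, 3), 1),
    (Detection("email", "phish_rule", "Tue", True, 3), 1),
    (Detection("email", "phish_rule", "Wed", True, 2), 1),
    (Detection("ids", "scan_rule", "Mon", False, 1), 0),
    (Detection("ids", "scan_rule", "Tue", False, 1), 0),
    (Detection("ids", "scan_rule", "Wed", False, 2), 0),
]
vocab = build_vocab(d for d, _ in history)
X = [featurize(d, vocab) for d, _ in history]
y = [label for _, label in history]
w, b = train(X, y)

p_new = predict_proba(featurize(Detection("email", "phish_rule", "Thu", True, 3), vocab), w, b)
p_noisy = predict_proba(featurize(Detection("ids", "scan_rule", "Thu", False, 1), vocab), w, b)
print(f"phishing-like detection: {p_new:.2f}, scan-like detection: {p_noisy:.2f}")
```

Sorting the alert queue by this probability is one way to put higher-fidelity detections in front of analysts first, which is the goal stated in the problem statement.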
Visualize
• Python: Plotly Dash
• R: Shiny
• Splunk
Questions?
NYU Data Science and Analytics Club
Appendix