Unleashing your Security Practitioners with Data First Architectures
Empower SIEMs Like Splunk with the Lakehouse for Cybersecurity
©HSBC Group 2021
George Webster, Global Head of Science and Analytics
• Focused on large-scale data analytics, the offense mindset, and arguing for budget
• From the DoD, CIA, Academia, and financial services
• Loves to cook and it shows
• His picture is from a mandated professional headshot
Jason Trost, Head of Analytic Engines
• Focused on developing capabilities for network security, DFIR, and security data science
• From DoD, start-ups, and financial services
• Strong meme game
• No sane bouncer would even accept that picture
Monzy Merza, VP of Cybersecurity Go-To-Market
• Focused on serving cybersecurity teams by demystifying cloud scale security analytics
• From National Labs, Splunk
• Loves green chili
• He did the picture himself but it looks pretty good
Legalieeeeeeeez
The numbers are citable references from publicly available sources, peer-reviewed material, and major publications believed to be reliable, but they have not been independently verified by HSBC or Databricks. The numbers are not from HSBC or Databricks.
The examples and demonstrations are samples or illustrative representations. They are not original code from HSBC or Databricks.
We will be discussing concepts and patterns generally; these are not actual implementations from HSBC or Databricks.
We are not vouching for or promoting particular
tooling. We are using our experience of working
with certain tools to explain patterns and principles
that can be applied through various means and
tools available on the market.
Photograph is public domain. License info:
All photos published on Unsplash are licensed under Creative Commons Zero, which means you can copy, modify, distribute, and use the photos for free, including commercial purposes, without asking permission from or providing attribution to the photographer or Unsplash.
CREATIVE COMMONS ZERO: http://creativecommons.org/publicdomain/zero/1.0/
HSBC is a multinational investment and financial services company, founded in 1865.
• ~64 countries
• $2.984 trillion total assets
• 40 million customers
• 226,000 employees
Cybersecurity Science and Analytics empowers Cybersecurity teams in protecting the bank by leveraging data and innovative capabilities to create effective and proactive security measures, as well as enabling data-driven business decisions.
The office’s primary objective is to empower our people, processes, and technology and to enable the analysts of the future.
HOURS VS DAYS
• 24 hours: average time for an attacker to move from victim A to victim B
• 200 days: for defenders to detect malicious activity
• 54 days: to perform an investigation post-detection
*Industry averages from peer reviewed sources
SIEMs Aren’t Set Up for Success!
100+ security tools
Data locked in vendor tools
Marginal integrations
SOC humans compensating for
analytical deficiencies
[Diagram: the SIEM and the SOC surrounded by separate tools and data sources: patching, EDR, AV, proxy, IDS, firewall, code scans, endpoint agents, email, DLP, IAM, CMDB, inventories, intel, vulnerability, and network, among others]
Cybersecurity is a massive big data problem (Cost and Capability)
• 50-100 TB/day: Endpoint Detection & Response logs
• 40-50 TB/day: Network Sensors
• 20+ TB/day: AWS VPC Flow Logs
• 5-10 TB/day: AWS CloudTrail & CloudTrail Data Events
• 100-200 TB/day: Total Log Ingestion
• x 13 months = 38-79 PB Retention
*Numbers are representative of a large enterprise network
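The retention figure above is simple multiplication, and a rough sketch makes it easy to redo with other inputs. The day count (395 days for 13 months) and the decimal unit conversion (1 PB = 1,000 TB) are assumptions made here, not values stated on the slide; with them the result lands close to the quoted 38-79 PB, and the exact low end depends on how days and units are counted.

```python
# Back-of-the-envelope retention sizing for a daily log ingestion rate.
# Assumptions (not from the slide): 13 months is about 395 days and
# 1 PB = 1,000 TB. Compression and replication are not modeled.

DAYS_RETAINED = 395
TB_PER_PB = 1000


def retention_pb(tb_per_day: float, days: int = DAYS_RETAINED) -> float:
    """Return the raw storage, in PB, needed to keep `days` of logs."""
    return tb_per_day * days / TB_PER_PB


low = retention_pb(100)   # low end of total ingestion, TB/day
high = retention_pb(200)  # high end of total ingestion, TB/day
print(f"{low:.1f} to {high:.1f} PB")
```

Changing `DAYS_RETAINED` or the per-day rate shows how quickly the storage bill moves with retention policy.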
COST
COMPLEXITY
CLOUD
HSBC: Control the Data - Unlock your People
Cost effective
Data unlocked
Enables Analytics
Empowers People
[Diagram labels: Sensors (EDR, AV, etc.), ETL, Enrichment (EDR, AV, etc.), Lakehouse, SIEM, Capabilities, Tools (EDR, AV, etc.), "Crushing It"]
Use Case Deep Dive
Use case: Threat Detection in DNS Data
● DNS logs (~10 TB/day)
● Near real-time detection
● Use ML, rules, and threat intel enrichments
● Send alerts to SIEM
DNS Threat Detection Recipe
Streaming Data
- Passive DNS
Enrichment sources
- Threat intel feed
- Geo IP
- Look-alike domains
Detections
- Domain Generation Algorithm (DGA) domains
- Look-alike domain name generation
Deployment
- Streaming
[Diagram: Passive DNS -> S3 -> Spark Streaming -> Model Scoring -> Malicious Activity or Benign]
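The two detections in the recipe can be illustrated with small heuristics. This is a sketch only and not the model used in the talk: it swaps in a character-entropy threshold for the ML-based DGA scoring and an edit-distance check for look-alike generation. The thresholds, function names, and example domains are all hypothetical, and the code assumes it is given a registered domain without subdomains.

```python
import math
from collections import Counter


def shannon_entropy(label: str) -> float:
    """Bits of entropy per character in the string."""
    counts = Counter(label)
    total = len(label)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_generated(domain: str, min_len: int = 12, min_entropy: float = 3.5) -> bool:
    """Flag long, high-entropy first labels (a crude stand-in for a DGA model)."""
    label = domain.lower().split(".")[0]
    return len(label) >= min_len and shannon_entropy(label) >= min_entropy


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def is_lookalike(domain: str, protected: list[str], max_distance: int = 1) -> bool:
    """True if the domain is close to, but not equal to, a protected domain."""
    return any(0 < edit_distance(domain.lower(), p) <= max_distance for p in protected)


# Hypothetical inputs for illustration.
print(looks_generated("xk2q9vb7zr4mw1pd.com"))
print(is_lookalike("examp1ebank.com", ["examplebank.com"]))
```

In a streaming deployment these functions would be applied per DNS record, with a trained model replacing the entropy threshold where one is available.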
DNS Threat Detection in Near Real Time
[Diagram: DNS data (10 TB/day) lands in AWS S3. Lakehouse stages: ingest, ETL, normalize; store, enrich, optimize; SQL analytics (query, report); classify, alert. Alerts (MBs to GBs/day) go to the SIEM for alert management and the ops workflow. Queries, results, and feedback pass between the SIEM and the lakehouse.]
*Simplified view focused only on SIEM
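The key idea in this flow is volume reduction: the lakehouse sees every DNS record, and only classified alerts are forwarded to the SIEM. A minimal sketch of that classify-and-alert step follows, using a threat-intel set lookup as the only rule. The record fields, class names, and example values are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class DnsEvent:
    timestamp: str
    client_ip: str
    query: str


@dataclass(frozen=True)
class Alert:
    timestamp: str
    client_ip: str
    query: str
    reason: str


def classify(events: Iterable[DnsEvent], intel_domains: set[str]) -> Iterator[Alert]:
    """Yield an alert only for events whose query matches threat intel."""
    for event in events:
        query = event.query.lower().rstrip(".")  # normalize case and trailing dot
        if query in intel_domains:
            yield Alert(event.timestamp, event.client_ip, query, "threat-intel match")


# Hypothetical sample data using reserved documentation names and addresses.
events = [
    DnsEvent("2021-05-01T10:00:00Z", "192.0.2.10", "Bad.Example."),
    DnsEvent("2021-05-01T10:00:01Z", "192.0.2.11", "intranet.example"),
    DnsEvent("2021-05-01T10:00:02Z", "192.0.2.12", "news.example"),
]
alerts = list(classify(events, {"bad.example"}))
print(f"{len(events)} events in, {len(alerts)} alert(s) forwarded")
```

Only the `Alert` records would be sent on to the SIEM, which is what keeps its ingest in the MBs to GBs/day range.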
Benefits of Approach
Scale & Speed
● Process ~10TB of DNS logs/day
● Augment SIEM economically
● Leverage advanced analytics & ML
● Near real-time detection of DNS threats
Large-scale Threat Hunting
Sift through cybersecurity log data in order to find signs of malicious activity, both current and historical, that have evaded existing security defenses
At Pace and At Scale:
• Explore large amounts of historic logs
• Correlate activity across log sources
• Leverage analytics, anomaly detection, ML
• Repeatable, self-documenting, team oriented
Hypothetical Threat Hunt Operation
• A new mass supply chain attack is discovered and the details of this activity are made
public in a government threat intelligence report. Many details are released including
domain names, IP addresses, file hashes of malware, and detailed lists of tactics,
techniques, and procedures (TTPs) observed, but the report claims the activity started
one year ago.
• Threat Hunt Objective
– Is the adversary in our network now?
– Was the adversary ever in our network?
• Scope
– Timeframe: 12 months
How do we execute this Threat Hunt Operation?
• The SIEM is where security data lives in most large enterprises
• But 12 months of EDR and network logs is massive, likely several PBs of data
• And most SIEMs:
– are not designed for large and complex historical searching over petabytes of log data
– don’t support many-to-many JOINs very well
– don’t adequately support ML/AI use cases, especially at scale
• We need a better tool
Threat Hunting using Spark + Delta Lake
[Diagram: cloud, endpoint, and network logs (extreme volume, >100 TB/day) are ingested into cheap cloud storage + Delta Lake. A threat hunter develops Databricks Notebooks to codify the hunting operation, sending queries and receiving results to easily query and search historic data in Delta Lake, backed by the elasticity of cloud compute.]
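At its core the hunt is a join between published indicators and 12 months of logs from several sources. The sketch below shows that logic in plain Python on a handful of in-memory records; it is a stand-in for the notebook code, not a reproduction of it, and on the platform described here the same match would run as a distributed query over Delta Lake tables. The record layout, field names, and the seven-day definition of "now" are assumptions.

```python
from datetime import date, timedelta

IOC_FIELDS = ("domain", "ip", "sha256")


def sweep(records, iocs, today, lookback_days=365):
    """Return every record field that matches an indicator inside the window."""
    cutoff = today - timedelta(days=lookback_days)
    hits = []
    for rec in records:
        if rec["day"] < cutoff:
            continue  # outside the hunt timeframe
        for field in IOC_FIELDS:
            value = rec.get(field)
            if value is not None and value in iocs.get(field, set()):
                hits.append({"source": rec["source"], "day": rec["day"],
                             "field": field, "value": value})
    return hits


def answer_objectives(hits, today, recent_days=7):
    """Map hits onto the two hunt questions: present now, and ever present."""
    recent_cutoff = today - timedelta(days=recent_days)
    return {
        "in_network_now": any(h["day"] >= recent_cutoff for h in hits),
        "ever_in_network": bool(hits),
    }


# Hypothetical indicators and log records for illustration.
iocs = {"domain": {"bad.example"}, "ip": {"203.0.113.9"}, "sha256": {"aa" * 32}}
records = [
    {"source": "network", "day": date(2021, 5, 30), "domain": "bad.example"},
    {"source": "endpoint", "day": date(2020, 9, 15), "sha256": "aa" * 32},
    {"source": "cloud", "day": date(2019, 1, 1), "ip": "203.0.113.9"},
    {"source": "network", "day": date(2021, 5, 31), "domain": "news.example"},
]
today = date(2021, 6, 1)
hits = sweep(records, iocs, today)
print(answer_objectives(hits, today))
```

Keeping the sweep as a function with explicit inputs is one way to make a hunt repeatable when the next report arrives with a new set of indicators.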
Benefits of Approach
Speed
● Perform advanced analytics at the pace of the adversary
● Hunts are reusable and self-documenting through Notebooks
● Anticipate that we can execute 2-3x more hunts per analyst because they are no longer bound by hardware
Scale
● Handle processing all required data, >100 TB/day
● Increase online queryable retention from days to many months and PB scale
● Anticipate that the scopes of the hunts can be much larger due to increased data retention
Monzy
Demo Time!
• I am going to show you how
– Multiple personas can use Databricks
– You can download the notebook.
• Demos coming up …
– Detection via DNS in action!
– DNS recipe code segments
– Threat hunting in action! Pile of IOCs
– Splunk integration - query and search results
Splunk Integration
Query Databricks from Splunk UI
Conclusion
Key Takeaways
• There is a time horizon gulf between attackers and defenders
• Legacy SIEMs are not good for the Petabyte data world
• Lakehouse architecture is transforming HSBC’s cyberdefense
• These methods unlock your security teams and save your budget
What’s next
Check out the deep dive demos:
- Detecting cyber criminals using Databricks
- Multicloud security operations with Splunk + Databricks
Schedule a hands-on training: email cybersecurity@databricks.com
Try the DNS Notebook
Send me a note: monzy@databricks.com
Send HSBC a note: cybersecurity@hsbc.com

Empower Splunk and other SIEMs with the Databricks Lakehouse for Cybersecurity