Dominique Brezinski — Apple Information Security
Threat Detection and Response at Scale
This is about the data platform aspect, not the specific analytics.
Agenda
• Use Cases, Scale, and Challenges/Solutions
Threat Detection and Response at Scale with Dominique Brezinski
Enabling Detection and Analytics
Diverse threats require diverse data sets
Streams (left joined) with context and
filtered or (inner joined) with indicators
Large time window, multi-dataset graphs
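The two join patterns named above can be sketched in plain Python. This is only an illustration of the idea, not the platform described in the talk: the `ASSET_CONTEXT` table, the `INDICATORS` set, the field names, and the sample addresses are all hypothetical.

```python
from typing import Iterable, Iterator

# Hypothetical context table: host metadata keyed by IP address.
ASSET_CONTEXT = {
    "10.0.0.5": {"owner": "build-farm", "criticality": "high"},
}

# Hypothetical indicator set: destination addresses flagged as suspicious.
INDICATORS = {"203.0.113.7"}


def enrich(events: Iterable[dict]) -> Iterator[dict]:
    """Left join: every event is kept; context is attached when available."""
    for event in events:
        context = ASSET_CONTEXT.get(event["src_ip"], {})
        yield {
            **event,
            "owner": context.get("owner"),
            "criticality": context.get("criticality"),
        }


def match_indicators(events: Iterable[dict]) -> Iterator[dict]:
    """Inner join: only events whose destination matches an indicator survive."""
    for event in events:
        if event["dst_ip"] in INDICATORS:
            yield event


events = [
    {"src_ip": "10.0.0.5", "dst_ip": "203.0.113.7"},
    {"src_ip": "10.0.0.9", "dst_ip": "198.51.100.2"},
]

enriched = list(enrich(events))             # both events, context where known
alerts = list(match_indicators(enriched))   # only the indicator match
```

The left join never drops an event, so missing context shows up as `None` rather than as a lost record, while the inner join acts as a filter.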
Enabling Triage and Containment
Search and Query
WHERE date > current_date() - INTERVAL 30 DAYS
Scale
>100TB new data a day
>300 billion events per day
Most queried table:
504,761,911,529,518 bytes,
11,149,012,553,409 rows
Yeah, trillions!
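To put these figures in per-second terms, here is a rough back-of-the-envelope calculation. It assumes decimal terabytes and treats the ">" figures as lower bounds, so the results are floors rather than measurements.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Lower bounds taken from the slide (">300 billion events", ">100TB").
events_per_day = 300_000_000_000
bytes_per_day = 100 * 10**12  # assumption: decimal terabytes

# Most queried table, as reported on the slide.
table_bytes = 504_761_911_529_518
table_rows = 11_149_012_553_409

events_per_second = events_per_day / SECONDS_PER_DAY
ingest_gb_per_second = bytes_per_day / SECONDS_PER_DAY / 10**9
avg_bytes_per_row = table_bytes / table_rows

print(f"{events_per_second:,.0f} events/s")
print(f"{ingest_gb_per_second:.2f} GB/s")
print(f"{avg_bytes_per_row:.1f} stored bytes per row")
```

That works out to roughly 3.5 million events per second, a little over 1 GB of new data per second, and about 45 stored bytes per row in the most queried table.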
Streaming Ingestion Architecture
WHERE src_ip = x AND dst_ip = y
Total data size: 504 terabytes, 11,149,387,374,965 rows
Scanned data size: 36.5 terabytes, 722,630,063,648 rows
Additional reduction thanks to data skipping (bytes): 92.4%
Additional reduction thanks to data skipping (rows): 93.2%
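One common way to achieve this kind of data skipping is to keep minimum and maximum values per file for the filter columns and skip any file whose range cannot contain the value being searched for. The sketch below illustrates that general idea only. It is not the implementation behind the numbers above, and the `FileStats` type, file names, and address ranges are hypothetical.

```python
import ipaddress
from dataclasses import dataclass


def ip(text: str) -> int:
    """Convert a dotted IPv4 string to an integer so ranges compare numerically."""
    return int(ipaddress.ip_address(text))


@dataclass
class FileStats:
    """Hypothetical per-file metadata: min/max of each filter column."""
    path: str
    src_ip_min: int
    src_ip_max: int
    dst_ip_min: int
    dst_ip_max: int


def files_to_scan(files, src_ip: int, dst_ip: int):
    """Keep only files whose min/max ranges could contain both values."""
    return [
        f for f in files
        if f.src_ip_min <= src_ip <= f.src_ip_max
        and f.dst_ip_min <= dst_ip <= f.dst_ip_max
    ]


files = [
    FileStats("a.parquet", ip("10.0.0.0"), ip("10.0.0.255"), ip("192.0.2.0"), ip("192.0.2.255")),
    FileStats("b.parquet", ip("10.0.1.0"), ip("10.0.1.255"), ip("192.0.2.0"), ip("192.0.2.255")),
    FileStats("c.parquet", ip("10.0.0.0"), ip("10.0.0.255"), ip("198.51.100.0"), ip("198.51.100.255")),
]

# Equivalent of: WHERE src_ip = x AND dst_ip = y
selected = [f.path for f in files_to_scan(files, ip("10.0.0.5"), ip("192.0.2.10"))]
print(selected)
```

Skipping of this kind works best when rows with similar values are stored in the same files, so that each file's min/max range stays narrow.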
Simple. Unified.
