OpenSOC
The Open Security Operations Center for Analyzing 1.2 Million Network Packets per Second in Real Time
James Sirota, 
Big Data Architect 
Cisco Security Solutions Practice 
jsirota@cisco.com 
Sheetal Dolas 
Principal Architect 
Hortonworks 
sheetal@hortonworks.com 
June 3, 2014
2 
Over the Next Few Minutes
§ Problem Statement & Business Case for OpenSOC
§ Solution Architecture and Design
§ Best Practices and Lessons Learned
§ Q & A
3 
Business Case
4
“There's now a growing sense of fatalism: It's no longer if or when you get hacked, but the assumption is that you've already been hacked, with a focus on minimizing the damage.”
Source: Dark Reading / Security’s New Reality: Assume The Worst
5 
Breaches Happen in Hours… But Go Undetected for Months or Even Years
Source: 2013 Data Breach Investigations Report
Timespan of events by percent of breaches:

Phase                                    Seconds  Minutes  Hours  Days  Weeks  Months  Years
Initial Attack to Initial Compromise     10%      75%      12%    2%    0%     1%      1%
Initial Compromise to Data Exfiltration  8%       38%      14%    25%   8%     8%      0%
Initial Compromise to Discovery          0%       0%       2%     13%   29%    54%     2%
Discovery to Containment/Restoration     0%       1%       9%     32%   38%    17%     4%

In 60% of breaches, data is stolen within hours.
54% of breaches are not discovered for months.
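Both callouts are simple sums over the table's rows: the 60% adds the seconds, minutes and hours buckets of the exfiltration row, and the 54% is the months bucket of the discovery row. A small Python sketch that re-derives them from the table's percentages (the variable and function names are illustrative):

```python
# Timespan buckets, in the order the table uses.
BUCKETS = ["seconds", "minutes", "hours", "days", "weeks", "months", "years"]

# Percent of breaches per bucket, copied from the slide's table.
EXFILTRATION = [8, 38, 14, 25, 8, 8, 0]  # initial compromise to data exfiltration
DISCOVERY = [0, 0, 2, 13, 29, 54, 2]     # initial compromise to discovery


def share_within(row, last_bucket):
    """Percent of breaches that fall in `last_bucket` or any shorter bucket."""
    return sum(row[: BUCKETS.index(last_bucket) + 1])


def share_at_least(row, first_bucket):
    """Percent of breaches that fall in `first_bucket` or any longer bucket."""
    return sum(row[BUCKETS.index(first_bucket):])


print(share_within(EXFILTRATION, "hours"))   # 60
print(DISCOVERY[BUCKETS.index("months")])    # 54
print(share_at_least(DISCOVERY, "months"))   # 56
```

Counting the years bucket as well, 56% of breaches go undiscovered for months or longer.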
6 
Cisco Global Cloud Index 
Source: 2014 Cisco Global Cloud Index
7 
Introducing OpenSOC 
Intersection of Big Data and Security Analytics
Big Data Platform (Hadoop, Elastic Search): Scalable Compute, Multi Petabyte Storage, Interactive Query, Real-Time Search, Unstructured Data, Scalable Stream Processing, Data Access Control
OpenSOC: Real-Time Alerts, Anomaly Detection, Data Correlation, Rules and Reports, Predictive Modeling, UI and Applications
8 
OpenSOC Journey 
Sept 2013: First Prototype
Dec 2013: Hortonworks joins the project
March 2014: Platform development finished
April 2014: First beta test at customer site
May 2014: CR work-off
Sept 2014: General Availability
9 
Solution Architecture & Design
10 
OpenSOC Conceptual Architecture 
Telemetry sources: Raw Network Stream, Network Metadata Stream, Netflow, Syslog, Raw Application Logs, Other Streaming Telemetry
Processing: Parse + Format → Enrich → Alert
Inputs to processing: Threat Intelligence Feeds, Enrichment Data
Storage: HBase (Raw Packet Store), Hive (Long-Term Store), Elastic Search (Real-Time Index)
Applications + Analyst Tools: Network Packet Mining and PCAP Reconstruction; Log Mining and Analytics; Big Data Exploration, Predictive Modeling
11 
Key Functional Capabilities 
§ Raw Network Packet Capture, Store, Traffic Reconstruction 
§ Telemetry Ingest, Enrichment and Real-Time Rules-Based Alerts 
§ Real-Time Telemetry Search and Cross-Telemetry Matching 
§ Automated Reports, Anomaly Detection and Anomaly Alerts 
§ Rich Analytics Apps and Integration with Existing Analytics Tools
12 
The OpenSOC Advantage 
§ Fully Backed by Cisco and Used Internally for Multiple Customers
§ Free, Open Source and Apache Licensed
§ Built on Highly-Scalable and Proven Platforms (Hadoop, Kafka, Storm)
§ Extensible and Pluggable Design
§ Flexible Deployment Model (On-Premises or Cloud)
§ Centralize your processes, people and data
13 
OpenSOC Deployment at Cisco 
Hardware footprint (40U)
§ 14 Data Nodes (UCS C240 M3)
§ 3 Cluster Control Nodes (UCS C220 M3)
§ 2 ESX Hypervisor Hosts (UCS C220 M3)
§ 1 PCAP Processor (UCS C220 M3 + Napatech NIC)
§ 2 Sourcefire Threat Alert Processors
§ 1 Anue Network Traffic Splitter
§ 1 Router
§ 1 48-Port 10GE Switch
Software Stack 
§ HDP 2.1 
§ Kafka 0.8 
§ Elastic Search 1.1 
§ MySQL 5.5
14 
OpenSOC - Stitching Things Together 
Source Systems: Passive Tap; Telemetry Sources (Syslog, HTTP, File System, Other)
Data Collection: PCAP Traffic Replicator; Flume (Agent A, Agent B, Agent N)
Messaging System: Kafka (PCAP Topic, DPI Topic, A Topic, B Topic, N Topic)
Real Time Processing: Storm (PCAP Topology, DPI Topology, A Topology, B Topology, N Topology)
Storage: Elastic Search (Index), HBase (PCAP Table), Hive (Raw Data, ORC)
Access: Web Services (Search, PCAP Reconstruction), Analytic Tools (R / Python, Power Pivot, Tableau)
15 
OpenSOC - Stitching Things Together (Deeper Look)
Same diagram as slide 14, annotated with a "Deeper Look" callout.
16 
PCAP Topology 
Real Time Processing (Storm): Kafka Spout → Parser Bolt → HDFS Bolt, ES Bolt, HBase Bolt
Storage: HDFS Bolt → Hive (Raw Data, ORC); ES Bolt → Elastic Search (Index); HBase Bolt → HBase (PCAP Table)
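The PCAP topology parses each message once and then writes the parsed record to three stores. In Storm, this kind of fan-out is typically wired by having several bolts subscribe to the same upstream stream. The single-process Python sketch below only mimics that data flow; the message format, `parse_packet` and `run_topology` are hypothetical, and the in-memory lists stand in for the real HDFS, ES and HBase bolts.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


def parse_packet(raw: str) -> Record:
    """Hypothetical parser for 'timestamp,src_ip,dst_ip,length' lines."""
    ts, src_ip, dst_ip, length = raw.split(",")
    return {"ts": int(ts), "src_ip": src_ip, "dst_ip": dst_ip, "len": int(length)}


def run_topology(spout: Iterable[str], sinks: List[Callable[[Record], None]]) -> int:
    """Parse each message once, then hand the same record to every storage sink."""
    processed = 0
    for raw in spout:
        record = parse_packet(raw)
        for sink in sinks:  # fan-out: every sink sees every record
            sink(record)
        processed += 1
    return processed


# In-memory stand-ins for the HDFS, ES and HBase bolts.
hdfs_rows: List[Record] = []
es_docs: List[Record] = []
hbase_rows: List[Record] = []

count = run_topology(
    ["1401800000,10.20.30.40,20.30.40.50,1500"],
    [hdfs_rows.append, es_docs.append, hbase_rows.append],
)
print(count, hdfs_rows[0]["len"])  # 1 1500
```

Parsing once before the fan-out keeps the most expensive per-tuple step out of the three storage paths.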
17 
DPI Topology & Telemetry Enrichment
Real Time Processing (Storm): Kafka Spout → Parser Bolt → GEO Enrich → Whois Enrich → CIF Enrich → HDFS Bolt, ES Bolt
Storage: Hive (Raw Data, ORC), Elastic Search (Index), HBase (PCAP Table)
18 
Enrichments 
Flow: RAW Message → Parser Bolt → GEO Enrich → Who Is Enrich → CIF Enrich → Enriched Message
Lookup data, each behind a cache: Geo Lite Data (MySQL), Who Is Data (HBase), CIF Data (HBase)
Message after the Parser Bolt:
{
  "msg_key1": "msg value1",
  "src_ip": "10.20.30.40",
  "dest_ip": "20.30.40.50",
  "domain": "mydomain.com"
}
Fields added by the enrichers:
geo: [ {region: CA, postalCode: 95134, areaCode: 408, metroCode: 807, longitude: -121.946, latitude: 37.425, locId: 4522, city: San Jose, country: US} ]
whois: [ {OrgId: CISCOS, Parent: NET-144-0-0-0-0, OrgAbuseName: Cisco Systems Inc, RegDate: 1991-01-17, OrgName: Cisco Systems, Address: 170 West Tasman Drive, NetType: Direct Assignment} ]
"cif": "Yes"
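Each enrichment stage puts a cache in front of its backing store, so repeated IPs or domains do not hit MySQL or HBase on every tuple. A minimal Python sketch of that pattern, assuming in-process LRU caches (`functools.lru_cache`); the lookup tables, which message field each stage keys on, and all function names are hypothetical stand-ins rather than OpenSOC's enrichment code.

```python
from functools import lru_cache
from typing import Dict, Optional

Message = Dict[str, object]

# Hypothetical stand-ins for the backing stores (MySQL GeoLite, HBase whois, HBase CIF).
GEO_TABLE = {"10.20.30.40": {"city": "San Jose", "region": "CA", "country": "US"}}
WHOIS_TABLE = {"mydomain.com": {"OrgName": "Cisco Systems"}}
CIF_TABLE = {"20.30.40.50"}

backend_calls = {"geo": 0, "whois": 0, "cif": 0}  # counts cache misses only


@lru_cache(maxsize=10_000)
def geo_lookup(ip: str) -> Optional[dict]:
    backend_calls["geo"] += 1
    return GEO_TABLE.get(ip)


@lru_cache(maxsize=10_000)
def whois_lookup(domain: str) -> Optional[dict]:
    backend_calls["whois"] += 1
    return WHOIS_TABLE.get(domain)


@lru_cache(maxsize=10_000)
def cif_lookup(ip: str) -> bool:
    backend_calls["cif"] += 1
    return ip in CIF_TABLE


def enrich(msg: Message) -> Message:
    """GEO, then whois, then CIF; each stage adds one key to a copy of the message."""
    out = dict(msg)
    geo = geo_lookup(msg["src_ip"])
    out["geo"] = [dict(geo)] if geo else []        # copy: never hand out cached objects
    whois = whois_lookup(msg["domain"])
    out["whois"] = [dict(whois)] if whois else []
    out["cif"] = "Yes" if cif_lookup(msg["dest_ip"]) else "No"
    return out
```

Because `lru_cache` returns the same object on every hit, `enrich` copies the cached dicts before attaching them, so a downstream bolt that mutates one message cannot corrupt the cache.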
19 
Applications: Telemetry Matching and DPI 
Step 1: Search → Step 2: Match → Step 3: Analyze → Step 4: Build PCAP
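Steps 2 and 4 can be made concrete. Once packets have been retrieved (for example from the HBase PCAP table), matching narrows them to one conversation, and building a PCAP writes them out in a format that tools such as Wireshark can open. A minimal Python sketch, assuming a hypothetical `Packet` record with microsecond timestamps and matching on the host pair only (a real implementation would likely match on the full 5-tuple); the 24-byte global header and 16-byte per-record header follow the classic libpcap file format.

```python
import struct
from dataclasses import dataclass
from typing import Iterable, List

PCAP_MAGIC = 0xA1B2C3D4  # classic libpcap, microsecond timestamps
LINKTYPE_ETHERNET = 1


@dataclass
class Packet:
    ts_us: int    # capture time, microseconds since the epoch
    src_ip: str
    dst_ip: str
    data: bytes   # raw Ethernet frame


def match(packets: Iterable[Packet], ip_a: str, ip_b: str,
          start_us: int, end_us: int) -> List[Packet]:
    """Step 2 (sketch): packets between two hosts, either direction, in [start_us, end_us]."""
    pair = {ip_a, ip_b}
    hits = [p for p in packets
            if {p.src_ip, p.dst_ip} == pair and start_us <= p.ts_us <= end_us]
    return sorted(hits, key=lambda p: p.ts_us)


def build_pcap(packets: Iterable[Packet], snaplen: int = 65535) -> bytes:
    """Step 4 (sketch): serialize packets as a little-endian classic libpcap file."""
    # Global header: magic, version 2.4, thiszone, sigfigs, snaplen, link type.
    out = bytearray(struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, snaplen, LINKTYPE_ETHERNET))
    for p in packets:
        frame = p.data[:snaplen]
        sec, usec = divmod(p.ts_us, 1_000_000)
        # Record header: ts_sec, ts_usec, captured length, original length.
        out += struct.pack("<IIII", sec, usec, len(frame), len(p.data))
        out += frame
    return bytes(out)
```

Writing the result with `open(path, "wb").write(build_pcap(hits))` produces a file an analyst can load directly.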
20 
Integration with Analytics Tools 
Dashboards 
Reports
21 
Best Practices and Lessons Learned
22 
Journey Towards Highly Scalable Application
23 
Kafka Tuning
24 
This is where we began
25 
Some code optimizations and increased parallelism
26 
Kafka Tuning 
§ Kafka is disk-I/O heavy
§ Kafka 0.8+ supports replication and JBOD
  § JBOD gives better performance than RAID
§ Parallelism is largely driven by the number of disks and the number of partitions per topic
§ Key configuration parameters:
  § num.io.threads: keep it at least equal to the number of disks provided to Kafka
  § num.network.threads: adjust it based on the number of concurrent producers and consumers and on the replication factor
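The JBOD points translate directly into broker settings: one `log.dirs` entry per physical disk, and `num.io.threads` at least equal to the number of those disks. A sketch that derives these two lines of a broker's `server.properties`, assuming one Kafka data directory per disk; the directory paths and the baseline of 8 threads are assumptions, and `num.network.threads` is left to the operator because the slide gives no formula for it.

```python
from typing import Dict, List

# Assumed baseline; check the default for your Kafka version before relying on it.
BASELINE_IO_THREADS = 8


def broker_disk_settings(data_dirs: List[str],
                         baseline_io_threads: int = BASELINE_IO_THREADS) -> Dict[str, str]:
    """JBOD: one log directory per physical disk, at least one I/O thread per disk."""
    if not data_dirs:
        raise ValueError("at least one data directory is required")
    return {
        "log.dirs": ",".join(data_dirs),
        "num.io.threads": str(max(baseline_io_threads, len(data_dirs))),
    }


def to_properties(settings: Dict[str, str]) -> str:
    """Render settings as key=value lines, sorted for stable diffs."""
    return "\n".join(f"{key}={value}" for key, value in sorted(settings.items()))


# Example: a broker with twelve data disks mounted at /data1 through /data12.
print(to_properties(broker_disk_settings([f"/data{i}/kafka" for i in range(1, 13)])))
```

Keeping the baseline as a floor means a small broker with few disks still gets the usual thread count, while a dense JBOD node scales up with its disks.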
27 
After Kafka Tuning
28 
Bottleneck Isolation, Resource Profiling, Load Balancing
29 
HBase Tuning
30 
This is where we began
31 
Row Key Design 
§ Row key design is critical (will you do gets, scans, or both?)
§ Keys with dotted-decimal IP addresses:
  § The first character is heavily skewed: 178 of the 256 possible first octets start with 1 or 2
  § Key length varies from 7 to 15 characters, with a typical average of 12
  § Subnet range scans become difficult: keys sort as strings, so 112 sorts before 90 and a numeric range such as 90 to 220 does not map to one contiguous key range
§ IP converted to hex (10.20.30.40 = 0a141e28):
  § gives 16 variations of the first key character
  § consistently 8-character key
  § easy to search for subnet ranges
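The hex encoding is easy to reproduce with Python's standard `ipaddress` module. The sketch below builds the 8-character key and the start/stop keys for a subnet scan (HBase treats the scan's stop row as exclusive). The function names are illustrative, and it assumes keys are stored as lowercase hex strings, as on the slide.

```python
import ipaddress
from typing import Optional, Tuple


def ip_to_row_key(ip: str) -> str:
    """Fixed-width, 8-character lowercase hex key for an IPv4 address."""
    return format(int(ipaddress.IPv4Address(ip)), "08x")


def subnet_scan_range(cidr: str) -> Tuple[str, Optional[str]]:
    """Inclusive start key and exclusive stop key covering every address in `cidr`.

    The stop key is None when the subnet runs to 255.255.255.255,
    meaning "scan to the end of the table".
    """
    net = ipaddress.IPv4Network(cidr)
    start = int(net.network_address)
    stop = start + net.num_addresses
    return format(start, "08x"), (format(stop, "08x") if stop < 2**32 else None)


print(ip_to_row_key("10.20.30.40"))       # 0a141e28
print(subnet_scan_range("10.20.0.0/16"))  # ('0a140000', '0a150000')

# Dotted-decimal strings sort lexicographically, so 112 lands before 90 ...
print(sorted(["90", "112", "220"]))       # ['112', '220', '90']
# ... while fixed-width hex keys sort in numeric order.
print(sorted(ip_to_row_key(f"{o}.0.0.1") for o in (220, 90, 112)))  # ['5a000001', '70000001', 'dc000001']
```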
32 
Experiments with Row Key
33 
Region Splits 
§ Know your data
§ Auto-splitting under heavy workload can result in hotspots and split storms
§ Understand your data and pre-split the regions
§ Identify how many regions a region server (RS) can host while performing optimally, using the formula below:
(RS memory) * (total memstore fraction) / ((memstore size) * (# column families))
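The sizing formula is plain arithmetic. The sketch below evaluates it and, assuming the 8-character hex IP keys described under Row Key Design, produces evenly spaced split points for pre-splitting (these could be passed through the HBase shell's `SPLITS` option when creating the table). The example numbers (16 GB of region-server memory, a 0.4 memstore fraction, a 128 MB memstore size, one column family) are illustrative, not recommendations, and the function names are hypothetical.

```python
from typing import List


def regions_per_region_server(rs_memory_mb: float, memstore_fraction: float,
                              memstore_size_mb: float, column_families: int) -> int:
    """(RS memory) * (total memstore fraction) / ((memstore size) * (# column families))."""
    return int(rs_memory_mb * memstore_fraction / (memstore_size_mb * column_families))


def hex_split_keys(num_regions: int) -> List[str]:
    """Evenly spaced split points over the 8-hex-character key space 00000000..ffffffff."""
    if not 1 <= num_regions <= 2**32:
        raise ValueError("num_regions out of range")
    step = 2**32 // num_regions
    return [format(i * step, "08x") for i in range(1, num_regions)]


print(regions_per_region_server(16 * 1024, 0.4, 128, 1))  # 51
print(hex_split_keys(16))  # 15 split points, '10000000' through 'f0000000'
```

A table of N regions needs N - 1 split keys; with 16 regions the splits fall exactly on the first hex character, which is the property the hex row key was chosen for.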
34 
With Region Pre-Splits
35 
Know Your Application 
§ Enable micro-batching (client-side buffer)
§ Use smart shuffle/grouping in Storm
§ Understand your data and situationally exploit the various WAL options
§ Watch for many minor compactions
§ For heavy write workloads, increase hbase.hstore.blockingStoreFiles (we used 200)
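Micro-batching means buffering writes on the client and sending them in groups rather than one RPC per tuple; the HBase Java client of that era exposed this as a client-side write buffer with auto-flush turned off. The Python sketch below shows only the buffering logic; `MicroBatcher`, the put shape, and the `flush_fn` callback (which would wrap the real batch write) are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Put = Tuple[str, dict]  # hypothetical shape: (row key, {column: value})


class MicroBatcher:
    """Buffer writes client-side and hand them to `flush_fn` in batches."""

    def __init__(self, flush_fn: Callable[[Sequence[Put]], None], max_batch: int = 1000):
        self._flush_fn = flush_fn
        self._max_batch = max_batch
        self._buffer: List[Put] = []

    def add(self, put: Put) -> None:
        self._buffer.append(put)
        if len(self._buffer) >= self._max_batch:  # size-triggered flush
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered; swap in a fresh list so the sent batch is never mutated."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._flush_fn(batch)
```

In a Storm bolt, a size threshold alone can leave a partial batch waiting indefinitely on a quiet stream, so a time-based flush (for example on a tick tuple) is usually paired with it, and the buffered tuples are acked only after the batch write succeeds.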
36 
And Finally
37 
Kafka Spout
38 
Kafka Spout 
§ Parallelism is controlled by the number of partitions per topic
§ Set Kafka spout parallelism equal to the number of partitions in the topic
§ Other key parameters that drive performance:
  § fetchSizeBytes
  § bufferSizeBytes
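Why spout parallelism should match the partition count is easiest to see by distributing partitions over spout tasks. The sketch below uses a simple round-robin assignment, which is an assumption about how a Kafka spout spreads partitions, not its actual code: with more tasks than partitions some tasks sit idle, and with fewer, some tasks carry extra partitions. The `unassigned` check is the kind of sanity test that would flag a partition no task owns, the symptom of the missing-data bug on the following slides. All names are hypothetical.

```python
from typing import Dict, List


def assign_partitions(num_partitions: int, num_tasks: int) -> Dict[int, List[int]]:
    """Round-robin (assumed, simplified): partition p goes to task p % num_tasks."""
    if num_tasks < 1:
        raise ValueError("need at least one spout task")
    assignment: Dict[int, List[int]] = {task: [] for task in range(num_tasks)}
    for partition in range(num_partitions):
        assignment[partition % num_tasks].append(partition)
    return assignment


def idle_tasks(assignment: Dict[int, List[int]]) -> List[int]:
    """Spout tasks that own no partition and therefore never emit."""
    return [task for task, parts in assignment.items() if not parts]


def unassigned(num_partitions: int, assignment: Dict[int, List[int]]) -> List[int]:
    """Partitions no task reads from: their data would silently go unprocessed."""
    covered = {p for parts in assignment.values() for p in parts}
    return sorted(set(range(num_partitions)) - covered)


for tasks in (8, 12, 3):
    a = assign_partitions(8, tasks)
    print(tasks, "tasks: idle =", idle_tasks(a), "max load =", max(len(p) for p in a.values()))
```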
39 
Mysteriously Missing Data
40 
Mysteriously Missing Data Root Cause 
§ A bug in the Kafka spout caused it to skip some partitions and lose data
§ It is now fixed and available from the Hortonworks repository (http://repo.hortonworks.com/content/repositories/releases/org/apache/storm/storm-kafka)
41 
Storm
42 
Storm 
§ Every small thing counts at scale 
§ Even simple string operations can slow down throughput when executed on millions of tuples
43 
Storm 
§ Error handling is critical 
§ Poorly handled errors can lead to topology failure and eventually loss of data (or data duplication)
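One way to keep errors from taking down a topology is to decide explicitly, per tuple, between ack and fail, and never let an exception escape the bolt. A Python sketch of that decision, assuming a collector with Storm-like `emit`, `ack` and `fail` methods; `RecordingCollector`, `TransientError` and the `errors` stream name are hypothetical. Retryable failures are failed so the spout can replay them, while malformed input is routed to an error stream and acked, since replaying it would only fail again.

```python
from typing import Callable, List, Tuple


class TransientError(Exception):
    """Hypothetical: a retryable failure such as a storage timeout."""


class RecordingCollector:
    """Stand-in for Storm's output collector: records emits, acks and fails."""

    def __init__(self) -> None:
        self.emitted: List[Tuple[str, object]] = []
        self.acked: List[object] = []
        self.failed: List[object] = []

    def emit(self, stream: str, value: object) -> None:
        self.emitted.append((stream, value))

    def ack(self, tup: object) -> None:
        self.acked.append(tup)

    def fail(self, tup: object) -> None:
        self.failed.append(tup)


def execute(tup: str, parse: Callable[[str], dict], collector: RecordingCollector) -> None:
    """Never let an exception escape; every tuple ends in exactly one ack or fail."""
    try:
        record = parse(tup)
    except TransientError:
        collector.fail(tup)  # retryable: let the spout replay it
        return
    except Exception as exc:  # malformed input: replaying will not help
        collector.emit("errors", {"raw": tup, "error": repr(exc)})
        collector.ack(tup)
        return
    collector.emit("default", record)
    collector.ack(tup)
```

In a real Storm bolt the emits would also be anchored to the input tuple so that downstream failures propagate back to the spout. Replays are also where duplication comes from: a tuple that is failed after some of its writes succeeded will be written again.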
44 
Storm 
§ Tune & scale individual spouts and bolts before performance testing/tuning the entire topology
§ Write your own simple data-generator spouts and no-op bolts
§ Making as many things configurable as possible helps a lot
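A data-generator spout and a no-op bolt make it possible to measure one stage in isolation. A Python sketch of the idea; the record format, function names and parameters are hypothetical, the seed keeps runs reproducible, and a real harness would run inside Storm rather than in a single process.

```python
import itertools
import random
import time
from typing import Callable, Iterator


def generator_spout(seed: int = 42) -> Iterator[str]:
    """Synthetic telemetry: endless 'src_ip,dst_ip,bytes' lines, reproducible via the seed."""
    rng = random.Random(seed)
    while True:
        src = ".".join(str(rng.randint(1, 254)) for _ in range(4))
        dst = ".".join(str(rng.randint(1, 254)) for _ in range(4))
        yield f"{src},{dst},{rng.randint(40, 1500)}"


def noop_bolt(_tuple: str) -> None:
    """Accepts and discards a tuple; isolates the cost of whatever sits upstream."""


def measure(stage: Callable[[str], object], n: int, seed: int = 42) -> float:
    """Tuples per second for one stage fed by the generator spout."""
    tuples = list(itertools.islice(generator_spout(seed), n))  # exclude generation cost
    start = time.perf_counter()
    for t in tuples:
        stage(t)
    elapsed = time.perf_counter() - start
    return n / elapsed if elapsed > 0 else float("inf")
```

Measuring `noop_bolt` first gives the harness overhead; a real stage measured the same way (same `n`, same `seed`) can then be compared against that floor.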
45 
Lessons Learned 
§ When it comes to Hadoop…partner up 
§ Separate the hype from the opportunity 
§ Start small then scale up 
§ Design Iteratively 
§ It doesn’t work unless you have proven it at scale 
§ Keep an eye on ROI
46 
Looking for Community Partners 
Cisco + Hortonworks + Community Support for OpenSOC 
How can you contribute? 
§ Technology Partner Program: contribute developers to join the Cisco and Hortonworks team
Thank you! 
We are hiring: 
jsirota@cisco.com 
sheetal@hortonworks.com
