Cyber Attacks
Analysis
Shwetha Narayanan
Insight Data Engineering Fellow
New York – Summer 2017
Real Time Analysis of Cyber Attack Hotspots
Motivation
Streaming Data Source
• Anti-virus software companies
• Augmented data to scale
• 4,000 to 6,000 events per minute
• Scaled up to 100,000 events per minute
Streaming Data Source
• Content
– Attack Type
• Malware
• DDoS
• Backdoor
– Location information of victim and attacker
Metric for Hotspot
Analysis
Getis - Ord
Getis - Ord
• Used when you have geospatial data
• Calculates statistically significant clusters based
on a feature
• Estimates a Gi score for every cell in the
region
– Higher Gi score => Significant Hotspot
• Compares the feature score of the current cell and
its neighbors with the sum of all feature values
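
For reference, this is the Gi* form of the statistic as it is commonly stated in the spatial statistics literature. The slides do not say which variant or weighting scheme the project used, so the symbols here are assumptions: x_j is the attack count of cell j, w_ij is the spatial weight between cells i and j (for example 1 for the cell itself and its neighbors, 0 otherwise), and n is the total number of cells.

```latex
G_i^{*} =
\frac{\sum_{j=1}^{n} w_{ij} x_j - \bar{X} \sum_{j=1}^{n} w_{ij}}
     {S \sqrt{\dfrac{n \sum_{j=1}^{n} w_{ij}^{2} - \left(\sum_{j=1}^{n} w_{ij}\right)^{2}}{n - 1}}},
\qquad
\bar{X} = \frac{1}{n} \sum_{j=1}^{n} x_j,
\qquad
S = \sqrt{\frac{1}{n} \sum_{j=1}^{n} x_j^{2} - \bar{X}^{2}}
```

The numerator compares the local sum around cell i with what the global mean would predict, which is why a large positive score marks a hotspot.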
Getis Ord – Gi Score
• Steps to Calculate
– Divide the space into
cells
– Accumulate attack
counts in each cell
– Calculate Gi Score
• Blue vs Green
– Blue is surrounded
by cells of higher
attack count
[Figure: grid of cells with attack counts 5, 3, 2, 4, 1, 5, 10, 14, 9, 9, 10, 1, 2; a blue cell and a green cell are compared]
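
The three steps above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: it assumes the space is already divided into a rectangular grid of attack counts, uses binary weights over the 3x3 neighborhood (the cell plus its adjacent cells), and the function name `gi_star` and the sample counts are made up for the example.

```python
import math


def gi_star(grid):
    """Return a grid of Getis-Ord Gi* scores for a 2D list of counts.

    Assumes binary weights: 1 for the cell itself and its adjacent cells,
    0 for every other cell.
    """
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    scores = [[0.0] * cols for _ in range(rows)]
    if n < 2:
        return scores  # the statistic is undefined for a single cell

    mean = sum(values) / n
    # Population standard deviation; max() guards against tiny negative rounding.
    s = math.sqrt(max(0.0, sum(v * v for v in values) / n - mean * mean))

    for r in range(rows):
        for c in range(cols):
            local_sum = 0.0
            weight_sum = 0  # with binary weights, sum(w) == sum(w^2)
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        local_sum += grid[rr][cc]
                        weight_sum += 1
            numerator = local_sum - mean * weight_sum
            denominator = s * math.sqrt(
                (n * weight_sum - weight_sum ** 2) / (n - 1)
            )
            # A zero denominator means no variation to compare against.
            scores[r][c] = numerator / denominator if denominator else 0.0
    return scores


# Made-up attack counts: a dense cluster in the top-left corner.
sample_grid = [
    [10, 14, 1, 2],
    [9, 9, 1, 1],
    [2, 1, 1, 3],
    [1, 2, 4, 5],
]
sample_scores = gi_star(sample_grid)
```

Cells inside the dense corner get positive scores, while a cell surrounded by low counts gets a negative one, which matches the "Higher Gi score => Significant Hotspot" reading.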
Interactive Query
• Find events within a
radius of 10 miles
– Calculate Bounding
box
Bounding box
• min(x), max(x), min(y), max(y)
• Based on the Earth's radius (spherical
approximation) at that point
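
A bounding box of this kind could be computed as follows. This is a sketch under stated assumptions: x is taken to be longitude and y latitude, the Earth is treated as a sphere with a mean radius of 6371 km, and the function name `bounding_box` is hypothetical. It does not handle boxes that cross the antimeridian.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)
KM_PER_MILE = 1.609344


def bounding_box(lat_deg, lon_deg, radius_km):
    """Return (min_lon, max_lon, min_lat, max_lat) in degrees.

    The box encloses every point within radius_km of the given center.
    """
    angular_radius = radius_km / EARTH_RADIUS_KM  # in radians
    min_lat = lat_deg - math.degrees(angular_radius)
    max_lat = lat_deg + math.degrees(angular_radius)

    # Longitude lines converge toward the poles, so the longitude span
    # grows with latitude.
    ratio = math.sin(angular_radius) / math.cos(math.radians(lat_deg))
    if min_lat <= -90.0 or max_lat >= 90.0 or ratio >= 1.0:
        # The circle reaches a pole: every longitude is a candidate.
        return (-180.0, 180.0, max(min_lat, -90.0), min(max_lat, 90.0))

    delta_lon = math.degrees(math.asin(ratio))
    return (lon_deg - delta_lon, lon_deg + delta_lon, min_lat, max_lat)


# Example: a 10 mile radius around a point in New York City.
nyc_box = bounding_box(40.7128, -74.0060, 10 * KM_PER_MILE)
```

The box is a cheap first filter: events outside it cannot be within the radius, and only events inside it need an exact distance check.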
Data Pipeline
[Figure: pipeline diagram starting from the cyber attacks streaming data source]
Demo
Kafka Streams Technical Challenges
• A Streams application must provide Serializers
and Deserializers to materialize the data
– Read input from stream / Write to stream
• Built-in serializers include: String, Integer, Long,
Double
How to work with other data
formats?
Deserializer for other data formats
Creating a Serde (Serializer/Deserializer)
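
The code on these slides did not survive extraction. As a language-neutral illustration of the idea only, the sketch below pairs a serializer and a deserializer for a custom event type so that a value survives a round trip through bytes. Kafka Streams itself is a Java library, so this Python is not its API; the class names and event fields are hypothetical, and JSON is an assumed wire format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class AttackEvent:
    # Hypothetical fields: the slides only say that events carry an
    # attack type and location information.
    attack_type: str
    victim_lat: float
    victim_lon: float


class AttackEventSerde:
    """Pairs a serializer and a deserializer for one value type."""

    def serialize(self, event: AttackEvent) -> bytes:
        # Object -> JSON text -> UTF-8 bytes written to the stream.
        return json.dumps(asdict(event)).encode("utf-8")

    def deserialize(self, data: bytes) -> AttackEvent:
        # UTF-8 bytes read from the stream -> JSON text -> object.
        return AttackEvent(**json.loads(data.decode("utf-8")))


serde = AttackEventSerde()
event = AttackEvent("Malware", 40.7128, -74.0060)
restored = serde.deserialize(serde.serialize(event))
```

The property that matters is that `deserialize` inverts `serialize`, since the same pair is used both when reading input from a stream and when writing results back.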
Kafka Streams Technical Challenges
• Kafka Streams Errors
– Internal Topic Error - Cannot create internal
topics
• User permissions to create topics - Stack Overflow
• Set Group ID and Application ID
• Used for co-ordinating between instances
About Me - Shwetha Narayanan
• Recently graduated with a
Master's in Computer Science
• Worked for 2 years as a
Software Engineer
• Co-authored a paper on
“Enabling Real time crime
intelligence using mobile GIS and
prediction methods”, EISIC, 2013
Screenshots - Hotspots
Screenshots - Cyber Attack Trends
Getis Score - Calculation
Bounding Box calculation
acos(sin(input_lat) * sin(Lat) + cos(input_lat) * cos(Lat) * cos(Lon - (input_lon))) * 6371 <= 1000;
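
The expression above is the spherical law of cosines for great-circle distance. With 6371 as the Earth's mean radius in kilometers, the result is in kilometers, and the trigonometric functions expect angles in radians. A runnable sketch of the same filter follows; the function names are hypothetical and the threshold is a parameter, since the slide uses 1000 here while the interactive query slide mentions 10 miles.

```python
import math

EARTH_RADIUS_KM = 6371.0  # same constant as in the expression above


def great_circle_km(lat1_deg, lon1_deg, lat2_deg, lon2_deg):
    """Distance in km between two points via the spherical law of cosines."""
    lat1, lon1, lat2, lon2 = map(
        math.radians, (lat1_deg, lon1_deg, lat2_deg, lon2_deg)
    )
    cos_angle = (
        math.sin(lat1) * math.sin(lat2)
        + math.cos(lat1) * math.cos(lat2) * math.cos(lon2 - lon1)
    )
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.acos(cos_angle) * EARTH_RADIUS_KM


def within_radius(events, center_lat, center_lon, radius_km):
    """Keep the (lat, lon) pairs that lie within radius_km of the center."""
    return [
        (lat, lon)
        for lat, lon in events
        if great_circle_km(center_lat, center_lon, lat, lon) <= radius_km
    ]


# Example: New York City as the center, with one nearby and one distant event.
sample_events = [(40.73, -73.99), (34.0522, -118.2437)]
nearby = within_radius(sample_events, 40.7128, -74.0060, 16.09344)
```

Running this exact check only on events that already passed the bounding box keeps the number of trigonometric evaluations small.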

Cyber Attacks Spatial Analysis