BE ready. BE safe. BE secure.
Bsides Lisbon 2015
Security Data Metrics and Measurements at Scale
binaryedge.io
Who
TIAGO HENRIQUES
• BSc Software Engineering /
University of Brighton
• MSc Computer Security and
Forensics / University of
Bedfordshire
• 8 years' experience in Information
Security consultancy, leadership
and research
CEO and Founder @ BinaryEdge
TIAGO MARTINS
• BSc and MSc Computer
Science / University of
Lisbon
• 7 years' experience
developing real-time
systems and high-volume
data processing
CTO and Co-Founder @ BinaryEdge
ROBERTO BARBOSA
• More than 20 years in the IT sector
• Ex-engineer at Sun Microsystems
• Former Philip Morris corporate auditor
• Expert in high scalability and availability in the finance sector
(UBS, Citigroup and Leonteq) and a mobile startup
COO and Head of DataScience at BinaryEdge
Where
Europe
SWITZERLAND
Headquarters
Market
Management
Portugal
Workforce
United Kingdom
Market
Workforce
What
Machine
Learning Data Mining
security
data
data
data
data
NEW COMMODITY
DATA IS THE NEW OIL
NEW CURRENCY
DATA BUSINESS MODEL
• collect / supply: Gather and sell raw data
• store / host: Hold onto someone else's data for them
• filter / refine: Strip out problematic records or data fields, or release interesting data subsets
• enhance / enrich: Blend in other datasets to create a new and interesting picture
• simplify / access: Help people cherry-pick the data they want in the format they prefer
• consult / advise: Provide guidance on others' data efforts
ORGANISATION
BECOMING EXPONENTIAL ORGANIZATION
IDEAS
SCALE
Staff on Demand
MASSIVE TRANSFORMATIVE PURPOSE
MTP
MARKET
Interfaces
Dashboards
Experimentation
Autonomy
Social
Community and Crowd
Algorithms
Lease Assets
Engagement
LEFT BRAIN
order
control
stability
RIGHT BRAIN
creativity
growth
uncertainty
Business Operations
Product
Security
DEV
data
science
UI
sysops
data
agent
UX
backend
product
owner
frontend
mobile
information
retrieval
Internet
of
Things
Internet
of
Money
Internet
of
People
Internet
of
Content
machine
learning
math/stats
design
Tester
QA
quality
support
security
experts
security
experts
audits
devops
marketing
human
resources
finance
sales
sales rep
sales rep
assistant
assistant
social
media
consulting
ORGANISATION RELATIONSHIP
Goals               Requirements       Results
Easy to understand  Understandability  Simple architecture
Easy to extend      Extensibility      Loosely coupled services
Easy to change      Changeability      Built for replacement
Easy to replace     Replaceability     Self-dependency
Easy to deploy      Deployability      Immutability
Easy to scale       Scalability        Responsibility segregation
Easy to recover     Resilience         Decoupling and isolation
Easy to connect     Uniform interface  API based
Easy to afford      Cost efficiency    On-demand computing
DATA ARCHITECTURE DESIGN
Timeline: July, August, September, October, November, December 2015
Phases: POC, data, process, analysis, intel
Progression: data, information, knowledge, intelligence
Importance / effort scale: less, more, greater, average
Milestones: sensor agent (minions), backend, feed API, user API, data visualization, scalability, machine learning, threat classification, deep learning, storage & archiving, search & classification, data analytics, image processing, predictive analytics
PRODUCT IMPROVEMENTS 2015
ENGINEERING
No legacy to maintain
Lots of experience in the team
Lots of technologies to pick from
Micro service based approach
Metrics collection at large scale
Very young
startup
Technologies?
Architecture?
Prototype?
but where
to start?
Architecture
Focus on
architecture
simple resilient scalable
replaceable
components
Technology
independent
architecture overview
HTTP API
Command line clients
Modules
• Python
• NodeJS
• Go
Third-party APIs
architecture - job request
API oriented
job types Data Collection
Data Processing / Analytics
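A job request in an API-oriented design like this could be modelled as a small JSON document. The sketch below is illustrative only: the field names (`job_type`, `module`, `targets`), the string values of the two job types, and the helper `parse_job_request` are assumptions, not the actual BinaryEdge API.

```python
import json
from dataclasses import dataclass, field, asdict
from enum import Enum


class JobType(str, Enum):
    # The two job types named on the slide; the string values are assumed.
    DATA_COLLECTION = "data_collection"
    DATA_PROCESSING = "data_processing"


@dataclass
class JobRequest:
    job_type: JobType
    module: str  # hypothetical module name, e.g. "portscan"
    targets: list = field(default_factory=list)

    def to_json(self) -> str:
        """Serialise the request as it might be posted to an HTTP API."""
        payload = asdict(self)
        payload["job_type"] = self.job_type.value
        return json.dumps(payload, sort_keys=True)


def parse_job_request(raw: str) -> JobRequest:
    """Rebuild a JobRequest from its JSON form, rejecting unknown job types."""
    data = json.loads(raw)
    return JobRequest(JobType(data["job_type"]), data["module"],
                      list(data.get("targets", [])))
```

A command line client or a third-party API would produce the same JSON, which keeps the request format independent of the language the module is written in.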
Agents listen for work in channels
technologies
Multiple types of agents
Agents
architecture - job execution
GO
Python
NodeJS
Scala
Java
RabbitMQ
NSQ
Redis
Apollo
job control
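The idea of agents listening for work in channels can be sketched without committing to any of the brokers listed. The example below uses in-process `queue.Queue` objects as a stand-in for a real message broker; the channel names, the placeholder handlers and the `STOP` sentinel are assumptions made for illustration.

```python
import queue
import threading

# Channel name -> in-process queue; a stand-in for a real broker.
channels = {"portscan": queue.Queue(), "geolocate": queue.Queue()}
results = queue.Queue()
STOP = object()  # sentinel telling an agent to shut down


def agent(channel_name, handler):
    """Block on one channel and run `handler` on every job received."""
    chan = channels[channel_name]
    while True:
        job = chan.get()
        if job is STOP:
            break
        results.put((channel_name, handler(job)))


def run_jobs(jobs):
    """Start one agent per channel, dispatch jobs, and collect sorted results."""
    handlers = {
        "portscan": lambda job: f"scanned {job}",   # placeholder work
        "geolocate": lambda job: f"located {job}",  # placeholder work
    }
    threads = [threading.Thread(target=agent, args=(name, fn))
               for name, fn in handlers.items()]
    for t in threads:
        t.start()
    for channel_name, job in jobs:
        channels[channel_name].put(job)
    for name in handlers:
        channels[name].put(STOP)
    for t in threads:
        t.join()
    out = []
    while not results.empty():
        out.append(results.get())
    return sorted(out)
```

Because each agent only knows its channel and its handler, swapping the queue for a broker client should not require changing the handler code.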
ActiveMQ
NATS
Kafka
Kestrel
NSQ
RabbitMQ
Redis
QPID
HornetQ
Apollo
http://bravenewgeek.com/dissecting-message-queues/
architecture - job execution
Messaging
Broker
ActiveMQ
http://bravenewgeek.com/dissecting-message-queues/
architecture - job execution
zeromq
nanomsg
Messaging
Brokerless
architecture - job execution
Amazon
Microsoft
Google
realtime.co
…
Messaging
Cloud
Agents can feed other agents
Different types of enrichment
• Clean data
• Process data
• Alarms
architecture - data enrichment
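Agents feeding other agents can be pictured as a chain of small enrichment stages. The following sketch mirrors the three kinds of enrichment listed (clean, process, alarms); the record fields, the port list and the alarm rule are hypothetical and chosen only to make the example concrete.

```python
def clean(record):
    """Clean data: drop empty fields and strip whitespace from strings."""
    return {k: v.strip() if isinstance(v, str) else v
            for k, v in record.items() if v not in ("", None)}


def process(record):
    """Process data: derive a new field from the raw ones."""
    enriched = dict(record)
    # Assumed rule: SSH, RDP and VNC default ports count as remote access.
    enriched["is_remote_access"] = record.get("port") in {22, 3389, 5900}
    return enriched


def alarm(record):
    """Alarms: flag records that match a simple rule."""
    enriched = dict(record)
    enriched["alarm"] = (bool(record.get("is_remote_access"))
                         and record.get("banner") is not None)
    return enriched


def enrich(record, stages=(clean, process, alarm)):
    """Pass a record through each stage in order, as one agent feeding the next."""
    for stage in stages:
        record = stage(record)
    return record
```

Each stage returns a new dictionary instead of mutating its input, so the raw record can still be stored unchanged alongside the processed one.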
All information is stored
• RAW data
• Processed data
Geolocation of information
Encrypted data for each client
Data Storage
Cloud Services
• Amazon S3
• Amazon DynamoDB
• Azure DocumentDB
• Azure Storage
• Google Cloud Storage
• Google BigQuery
• Rackspace Cloud Files
• Constant Cloud Storage
• Skylable
• RunAbove
architecture - store
Database Solutions
• MongoDB
• ElasticSearch
• Cassandra
• Riak
• Lucene
Delivering data
• Realtime - Streaming
• Storage for Analytics
• API
• Raw
Data Analytics
• Kibana
• InfluxDB
• Druid
architecture - serving
Data Processing
• Apache Spark
• Hadoop
• Amazon Kinesis
Data Intelligence
• Amazon Machine Learning/EMR
• Google Prediction API
• Azure Machine Learning
architecture - serving
Our agents are very simple
• Simple tasks
• Easy to maintain and adapt
agents/ minions
Agents can be located/run anywhere
• Geo distribution
• Clouds
• Dedicated Servers
• Raspberry Pis in Tiago Henriques’ dual gbit connection
New Relic
Logentries
Server Density
CloudWatch
Grafana
logstash
monitoring
Ansible
Puppet
Docker
Saltstack
etcd
deployment
saltstack.com
Machine Learning
CHALLENGES IN DATA MINING
MODELLING LARGE
SCALE NETWORKS
DISCOVERY OF THREATS
Network dynamics
and Cyberattacks
Privacy Preservation
in data mining
ip address
url address
linked urls
internal
external
Company
registration
email
people
phone
social
search
photos
family&friend
behavior
news
forums
sub-reddits
topics
likes
metadata
BGP
co-hosted
sites
shared
infrastructure
AS membership
AS Peer
list of IPs
AS
whois
contact
geolocation
phone
social
networks
office
locations
portscan
services
web
http https
certificate
configuration
authorities
entities
web server
framework
headers
cookies
screenshots
dns
domains
AXFR
MX records
banners
image classifier
threat
SMB
VNC
RDP
files
files apps
SW
users
OCR
data points
contingency
irrelevant
© 2015 binaryedge.io
malware
Machine Learning techniques
• Artificial Neural Network (ANN)
• Support vector machine (SVM)
• Decision trees
• Bayesian networks (BNs)
• K-nearest neighbour (KNN)
• Hidden Markov Model (HMM)
Machine Learning - Why?
Classification
Detection
Clustering
Automation
correlation
prediction
analysis
Measurements on our own data
Support - the percentage of stored data that shows the correlation
Confidence - the probability that our assumption is correct
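These two measurements resemble the support and confidence used in association rule mining. Reading them that way is an assumption, as are the record format (a set of observed attributes per host) and the attribute names in the usage note. Under that reading, a minimal sketch is:

```python
def support(records, items):
    """Fraction of records that contain every item in `items`."""
    items = set(items)
    if not records:
        return 0.0
    return sum(1 for r in records if items <= set(r)) / len(records)


def confidence(records, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the stored records."""
    base = support(records, antecedent)
    if base == 0:
        return 0.0
    return support(records, set(antecedent) | set(consequent)) / base
```

For four hypothetical records `{"vnc_open", "no_auth"}`, `{"vnc_open", "no_auth"}`, `{"vnc_open"}` and `{"ssh_open"}`, the rule "VNC open implies no authentication" has support 2/4 and confidence (2/4) / (3/4) = 2/3.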
Improving our own data
• Kalman Filter
• AdaBoost (Adaptive Boost)
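As a reminder of what a Kalman filter does with noisy measurements, here is a minimal scalar version. It assumes the underlying value is roughly constant; the variance defaults and the function name `kalman_1d` are illustrative choices, not parameters taken from the talk.

```python
def kalman_1d(measurements, process_var=1e-3, measurement_var=0.25,
              initial_estimate=0.0, initial_var=1.0):
    """Scalar Kalman filter for a value assumed to be roughly constant."""
    estimate, var = initial_estimate, initial_var
    estimates = []
    for z in measurements:
        # Predict: the state is modelled as constant, so only uncertainty grows.
        var += process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = var / (var + measurement_var)
        estimate += gain * (z - estimate)
        var *= (1.0 - gain)
        estimates.append(estimate)
    return estimates
```

A larger `measurement_var` makes the filter trust new samples less, which smooths the output at the cost of slower reaction to real changes.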
SAMPLE 1 SAMPLE 1.2 SAMPLE 1.3 SAMPLE 1.4
DEEPER DATA POINT SUPPORT
MANUAL CLASSIFICATION
Weight 6/10
Portscan
Weight 3/10
Geolocation
Weight 5/10
OCR screenshot
Weight 9/10
Previous known
Correct data
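One way to read the weights above is as the trust placed in each data point when combining them into a single score. The sketch below makes two assumptions that the transcript does not confirm: that each weight belongs to the label that follows it, and that the combination is a plain weighted average. It is not an implementation of AdaBoost.

```python
# Weights out of 10. Pairing each weight with the label that follows it
# in the transcript is an assumption.
WEIGHTS = {
    "portscan": 6,
    "geolocation": 3,
    "ocr_screenshot": 5,
    "previous_known_correct_data": 9,
}


def weighted_score(votes, weights=WEIGHTS):
    """Weighted average of per-source votes in [0, 1]; missing sources are skipped."""
    used = {src: w for src, w in weights.items() if src in votes}
    total = sum(used.values())
    if total == 0:
        return None
    return sum(votes[src] * w for src, w in used.items()) / total
```

With votes of 1 from every source except geolocation, the score is (6 + 5 + 9) / 23, so a low-weight source that disagrees only pulls the result down slightly.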
DATA chain
Collection -> Data Processing -> ML of Data -> Report -> Storage
CYBER INNOVATION LOOP
Observations
guide &
control
cultural
IDENTITY
new
information
previous
experience
analysis
&
synthesis
Decisions Action
feedback
feedback
interaction
with
environment
interaction
with
environment
SECURITY
FEEDS
REAL WORLD DATA
RWD
MODELS
guide &
control
observe ORIENT DECIDE act
feed
forward
feed
forward
feed
forward
CYBER INNOVATION LOOP
INFORMATION
HYPOTHESIS
directives
facts classification
resolution
ASSESSMENT
enactment
knowledge
data
• classification knowledge transforms fact to information
• assessment knowledge transforms information to hypothesis
• resolution knowledge transforms hypothesis to directive
• enactment knowledge transforms directive to fact
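The four bullets describe a closed cycle, which can be written down as a small transition table. This is only a restatement of the bullets in code; the names `TRANSITIONS`, `step` and `run_cycle` are made up for the example.

```python
# Each kind of knowledge maps one artefact type to the next, as in the bullets.
TRANSITIONS = {
    "fact": ("classification", "information"),
    "information": ("assessment", "hypothesis"),
    "hypothesis": ("resolution", "directive"),
    "directive": ("enactment", "fact"),
}


def step(state):
    """Return (knowledge applied, next state) for the current state."""
    return TRANSITIONS[state]


def run_cycle(start="fact"):
    """Walk the loop once, returning the states visited, ending back at `start`."""
    visited, state = [start], start
    while True:
        _, state = step(state)
        visited.append(state)
        if state == start:
            return visited
```

Starting from a fact, one pass visits information, hypothesis and directive before producing a new fact, which is what makes the loop self-feeding.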
CYBERSECURITY DATA SCIENCE
TELEMETRY
SENSOR DATA
CONTEXTUAL DATA
HISTORICAL DATA
REAL TIME PREDICTIONS AND
DECISIONS
agents
agents
REAL WORLD DATA
RWD
RECOMMENDER CLASSIFIER SOCIAL THREAT FRAUD
features MODELS VALUE
data INTELLIGENCE
data engineering
custom
dashboard
DEMO
contingency irrelevant threat safe
BE ready. BE safe. BE secure.
BINARYEDGE.IO
Finsterrütistrasse 4, 8134
Adliswil, ZURICH
Switzerland
+41 78 632 32 90 Email: th@binaryedge.io
www.binaryedge.io

BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015