SITUATIONAL AWARENESS, BOTNET AND MALWARE DETECTION IN THE MODERN ERA
Machine Learning Enabled Advanced Security
CodeMotion Milan 2016
Davide Papini
Introduction ML for Cyber Security Final Remarks
ABOUT ME
Research & Innovation @Elettronica S.p.A.
Postdoc @ISG Royal Holloway, UK on ML applied to cyber situational awareness.
M.Sc. Telecommunication Engineering @Politecnico di Milano:
→ Erasmus @Danmarks Tekniske Universitet
→ Master Thesis on "Anomaly Based Wireless Intrusion Detection Systems"
Ph.D. @Danmarks Tekniske Universitet:
→ "Attacker Modeling in Ubiquitous Computing Systems"
→ External stay at COSIC, KU Leuven
WHAT THIS TALK IS ABOUT
Topics:
Applications of ML in Cybersecurity research.
Successful research: botnets, DGAs, early malware
detection.
ML traps.
Evaluation metrics.
NOT about:
New ML algorithms.
Showing one specific Security-ML based application.
Wearing you out with math.
MOTIVATIONAL SLIDE
Researchers took control of the Torpig botnet for 10 days: 180,000 infections, over 70GB of data recorded.
Torpig intercepts and records keystroke information at a low level, targeting a wide variety of applications and websites.
It steals financial and personal information, login credentials for social networks, etc.
Torpig periodically uploads any newly captured data to a central server.
The researchers were able to infiltrate the botnet by registering one of the domains from the list of candidate domains that infected machines try to contact.
SOME STATISTICS
https://www.mcafee.com/us/resources/reports/rp-quarterly-threats-sep-2016.pdf
450,000 new malware samples per day.
20,000 of these are mobile malware.
Includes: ransomware, botnets, rootkits, trojans …
NEED A GAME CHANGER
Modern malware/intrusions are difficult to detect/block:
Code obfuscation, polymorphism and packing.
Malware written ad-hoc for specific targets.
AVs are mainly signature-based.
URL Blacklists cannot be updated fast enough.
Local changes are often too small/subtle to be detected.
Logs contain a lot of noise (≃ 90%)
Need for intelligent approaches:
Adapt to unforeseen "events"
Learn from data i.e. extract behaviours NOT signatures
Leverage global knowledge
Can be quasi-real-time.
ML FOR CYBER SECURITY
Machine learning has been applied to many fields in security:
Botnet detection and classification
Mobile application analysis
Spam detection and campaigns analysis
Situational awareness through network traffic analysis
Download malware detection
and many more...
Also in many flavours:
Supervised
Unsupervised
combinations of those
BOTNETS
Situational awareness: knowledge of the health status of a
network (e.g. malware infections, intrusions and data
exfiltration).
Botnet: a network of bots (drones), i.e. programs installed
on the machines of unwitting Internet users and receiving
commands from a bot controller.
BOTNETS C&C CHANNEL
Bots connect to C&C Server in three ways:
Hard coded IP:
Bot → 1.2.3.4
Hard coded domain:
Bot → badguy.ru → 1.2.3.4
Automatically Generated Domains:
→ Bot cycles through time-dependent domains.
→ Domain names are generated using a Domain Generation
Algorithm.
→ The botmaster needs to register only one of those domains.
jhhfghf7.tk faukiijjj25.tk pvgvy.tk
cvq.com epu.org bwn.org
courtesy of E.Colombo - Cerberus
Sinkholing: if the domain is already registered by someone else, the botmaster loses control of the botnet!
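The time-dependent domain generation described above can be illustrated with a toy generator. This is a minimal sketch and not the algorithm of any real botnet: the function name `toy_dga`, the seed string, the SHA-256 based construction and the reserved `.example` suffix are all assumptions made for illustration. It shows why bot and botmaster, sharing only a seed and the date, arrive at the same candidate list, and why anyone who recovers the algorithm can compute that list too.

```python
import hashlib
from datetime import date


def toy_dga(seed: str, day: date, count: int = 5, tld: str = ".example") -> list[str]:
    """Derive `count` pseudo-random domain names from a shared seed and a date.

    Hypothetical construction for illustration only.
    """
    domains = []
    for i in range(count):
        material = f"{seed}-{day.isoformat()}-{i}".encode()
        digest = hashlib.sha256(material).hexdigest()
        # Map the first 12 hex digits to the letters a..p to build the label.
        label = "".join(chr(ord("a") + int(c, 16)) for c in digest[:12])
        domains.append(label + tld)
    return domains


# Bot and botmaster compute the same list independently for a given day.
bot_view = toy_dga("demo-seed", date(2016, 11, 25))
botmaster_view = toy_dga("demo-seed", date(2016, 11, 25))

# A defender who has reverse-engineered the algorithm gets the same list,
# and can register one of the candidates first (sinkholing).
defender_view = toy_dga("demo-seed", date(2016, 11, 25))
```

Because the list changes with the date, a blacklist of yesterday's names is of little use, which is the motivation for classifying the names themselves.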
PHOENIX AND CERBERUS
Developed at Polimi and ISG@RHUL
System that relies on Machine Learning to identify DGA:
Leverage known malicious and benign domain names to
build a classifier:
→ Distinguish Human Generated Domains from AGD.
→ Identifies the DGA used: botnets might share the same
DGA.
Use unsupervised learning to identify new DGAs.
Traffic comes from a national authoritative DNS server.
S. Schiavoni et al., Phoenix: DGA-Based Botnet Tracking and Intelligence. In Detection of
Intrusions and Malware, and Vulnerability Assessment (DIMVA) 2014.
E. Colombo, Cerberus: Detection and Characterization of Automatically-Generated
Malicious Domains. Master Thesis, Politecnico di Milano 2014.
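To give a feel for how human-generated names can be told apart from automatically generated ones, here is a small sketch of lexical feature extraction. The features chosen here (label length, character entropy, vowel ratio, digit ratio) are generic assumptions for illustration and are not claimed to be the features used by Phoenix or Cerberus; the function names are hypothetical, and `codemotion.it` is used only as an illustrative human-chosen name.

```python
import math
from collections import Counter

VOWELS = set("aeiou")


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits (0 for a single repeated symbol)."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def lexical_features(domain: str) -> dict:
    """Features of the leftmost label of a domain; assumes that label is non-empty."""
    label = domain.split(".")[0].lower()
    n = len(label)
    return {
        "length": n,
        "entropy": shannon_entropy(label),
        "vowel_ratio": sum(ch in VOWELS for ch in label) / n,
        "digit_ratio": sum(ch.isdigit() for ch in label) / n,
    }


# Two of the generated names shown on the C&C slide, plus one human-chosen name.
for name in ("jhhfghf7.tk", "pvgvy.tk", "codemotion.it"):
    print(name, lexical_features(name))
```

Feature vectors like these could then be fed to any supervised classifier trained on known malicious and benign names, or clustered to look for groups that share a generator.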
[Figure: Cerberus overview. Stages: Bootstrap, Filtering, Detection. Elements: DNS Stream, Filtering, Classifier, Time Detective, Phoenix Clusters, Suspicious Domains, Malicious Domains.]
courtesy of E.Colombo - Cerberus
CERBERUS FINDINGS
187 malicious domains detected and labeled
3,576 suspicious domains collected
47 clusters of DGA-generated domains discovered
319 new domains detected in the next 24 hours
MASTINO: REALTIME MALWARE DETECTION
Developed at TrendMicro and presented at Defcon London 2016
System for advanced realtime malware detection:
Leverages global knowledge on download events
Classifies malware from goodware
Based on statistical evidence and graph analysis:
Tripartite graph: URLs, Files, Machines
Intrinsic features e.g.
→ file: size, obfuscated, signed;
→ url: FQD, e2LD, query path
→ machine: malware download history, processes
Behaviour-based features:
→ Consider reputation of neighboring nodes
→ Help to classify unknown nodes
Huge work on feature engineering!
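The idea of a tripartite graph with neighbour-reputation features can be sketched in a few lines. This is only an illustration under stated assumptions: the event tuples, the node naming, the choice to link all three nodes of a download event, and the `bad_neighbour_ratio` feature are hypothetical and are not taken from the Mastino implementation.

```python
from collections import defaultdict


def build_graph(events):
    """Build an undirected graph from (machine, url, file) download events.

    Nodes are typed tuples such as ("file", "f1"); the three nodes of each
    event are linked to each other (an assumption of this sketch).
    """
    adjacency = defaultdict(set)
    for machine, url, file_hash in events:
        nodes = [("machine", machine), ("url", url), ("file", file_hash)]
        for a in nodes:
            for b in nodes:
                if a != b:
                    adjacency[a].add(b)
    return adjacency


def bad_neighbour_ratio(adjacency, node, known_bad):
    """Fraction of a node's neighbours that are already known to be malicious."""
    neighbours = adjacency.get(node, set())
    if not neighbours:
        return 0.0
    return sum(n in known_bad for n in neighbours) / len(neighbours)


# Hypothetical download events and ground truth.
events = [
    ("m1", "http://a.example/x", "f1"),
    ("m1", "http://b.example/y", "f2"),
    ("m2", "http://b.example/y", "f3"),
]
known_bad = {("file", "f2")}
graph = build_graph(events)

# The URL that served a known-bad file inherits some suspicion.
url_score = bad_neighbour_ratio(graph, ("url", "http://b.example/y"), known_bad)
```

A score like `url_score` would be one behaviour-based feature among many; the intrinsic features listed above (size, signing, domain parts and so on) would be added to the same feature vector.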
MASTINO SYSTEM OVERVIEW
[Figure: Mastino system overview. Copyright 2016 Trend Micro Inc.]
courtesy of M.Balduzzi - TrendMicro
MASTINO TRAINING AND DETECTION
courtesy of M.Balduzzi - TrendMicro
MASTINO RESULTS
Mastino evaluation:
On testing dataset: 95.8% TP, 0.5% FP
Early detection experiment, deployed in the wild for 6
months:
→ Detected 84% of future malware
→ Verified later through VirusTotal
Detection time ≃ 0.16s!
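To put TP and FP figures like these in context, it helps to translate rates into counts for a given workload. The sketch below assumes that "95.8% TP, 0.5% FP" are a true-positive rate and a false-positive rate; the workload of one million downloads and the 1% malware share are hypothetical numbers chosen only for illustration, not results from the Mastino evaluation.

```python
def expected_counts(n_total, malware_fraction, tpr, fpr):
    """Expected confusion-matrix counts for a workload with a given malware share."""
    n_malware = n_total * malware_fraction
    n_benign = n_total - n_malware
    tp = tpr * n_malware
    fp = fpr * n_benign
    return tp, fp, n_benign - fp, n_malware - tp  # TP, FP, TN, FN


def rates(tp, fp, tn, fn):
    """True-positive rate, false-positive rate and precision from counts."""
    return tp / (tp + fn), fp / (fp + tn), tp / (tp + fp)


# Hypothetical workload: 1,000,000 downloads, 1% of them malicious.
tp, fp, tn, fn = expected_counts(1_000_000, 0.01, tpr=0.958, fpr=0.005)
tpr, fpr, precision = rates(tp, fp, tn, fn)
print(f"TPR={tpr:.3f} FPR={fpr:.3f} precision={precision:.3f}")
```

Under these assumed numbers roughly one alert in three would be a false alarm, even with a 0.5% false-positive rate, because benign downloads vastly outnumber malicious ones. The precision changes with the assumed malware share, so it should be recomputed for the real traffic mix.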
ISSUES
Traditional ML was developed for "natural" objects:
Natural Language Processing.
Image analysis, e.g. picture text search.
Classification of plants and animals.
Economic laws.
Metrics like ROC, FP and FN work very well in these cases; however, the cyber world is not natural:
Things change abruptly, e.g. updates, new malware, new technologies.
There is no clear evolutionary law.
Change is deterministic and unpredictable.
Behaviours change/slide over time.
ML TRAPS
Machine learning often seen as a black-box panacea:
Little is understood.
Results with high accuracy are taken without questioning their quality.
However:
Overfitting: if training and testing are not done carefully.
Validity of results: a system that works on paper may not
work in the field.
Datasets: Variety vs Chronology
Need for novel metrics!
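On the "Variety vs Chronology" point, one common precaution is to split data by time rather than at random, so that the model is never trained on samples that appeared after the ones it is tested on. The following is a minimal sketch; the tuple layout `(first_seen, features, label)`, the cutoff date and the sample data are assumptions made for illustration.

```python
from datetime import date


def chronological_split(samples, cutoff):
    """Split samples into (train, test) so that training data precedes the cutoff.

    Each sample is a (first_seen, features, label) tuple (hypothetical layout).
    """
    train = [s for s in samples if s[0] < cutoff]
    test = [s for s in samples if s[0] >= cutoff]
    return train, test


# Hypothetical labelled samples with the date they were first observed.
samples = [
    (date(2016, 1, 10), [0.1, 0.7], "benign"),
    (date(2016, 3, 2), [0.9, 0.2], "malicious"),
    (date(2016, 6, 15), [0.8, 0.3], "malicious"),
    (date(2016, 9, 30), [0.2, 0.6], "benign"),
]
train, test = chronological_split(samples, cutoff=date(2016, 6, 1))
```

A random split over the same data would score well on variety but could leak knowledge of future malware into training, which is one way a system that works on paper fails in the field.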
CONFORMAL EVALUATOR
Library developed at Information Security Group at Royal
Holloway:
Evaluates algorithms in terms of confidence and credibility.
Core is Non-Conformity measure, elicited directly from the
algorithm, which in essence tells the difference between a
sample and a set of samples.
Builds decision and alpha assessments to evaluate the
algorithm.
R. Jordaney, Z. Wang, D. Papini, I. Nouretdinov and L. Cavallaro, Misleading Metrics:
On Evaluating Machine Learning for Malware with Confidence, Technical Report 2016-1
Royal Holloway University of London.
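The notions of credibility and confidence can be illustrated with the usual conformal-prediction definitions. This is a sketch under assumptions, not the library's code: the p-value of a class is taken as the fraction of that class's samples that are at least as non-conforming as the test sample, credibility as the p-value of the chosen class, and confidence as one minus the largest p-value among the other classes. The exact formulas in the Conformal Evaluator library may differ, and the non-conformity scores below are made up.

```python
def p_value(class_scores, test_score):
    """Fraction of a class's samples at least as non-conforming as the test sample."""
    return sum(a >= test_score for a in class_scores) / len(class_scores)


def credibility_and_confidence(p_values, chosen):
    """Credibility: p-value of the chosen class.
    Confidence: 1 minus the highest p-value among the remaining classes."""
    credibility = p_values[chosen]
    others = [p for label, p in p_values.items() if label != chosen]
    confidence = 1.0 - max(others)
    return credibility, confidence


# Hypothetical non-conformity scores of known samples, per class.
scores = {
    "malicious": [0.1, 0.2, 0.3, 0.4],
    "benign": [0.5, 0.6, 0.7, 0.8],
}
# Hypothetical non-conformity of one test sample with respect to each class.
test_scores = {"malicious": 0.25, "benign": 0.9}

p_values = {label: p_value(scores[label], test_scores[label]) for label in scores}
credibility, confidence = credibility_and_confidence(p_values, chosen="malicious")
```

A high credibility together with a high confidence means the sample fits the chosen class and no other; a high credibility with a low confidence means another class fits nearly as well.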
[Figure: Conformal Evaluator overview. Elements: Training and Testing Dataset; Similarity Based Classification/Clustering Algorithm; Non-Conformity Measure; Conformal Evaluator; Alpha Assessment; Decision Assessment.]
CE: EXAMPLE 1
System for Botnet detection and classification
[Figure: Decision Assessment. Two bar charts over the families bifrose, sasfis, blackenergy, banbra and pushdo (y-axis 0.0 to 1.0), each showing average algorithm credibility and average algorithm confidence.
Average algorithm correct choice, bar labels in extraction order: 0.86 0.27 0.29 0.31 0.9 0.2 0.84 0.18 0.95 0.15
Average algorithm incorrect choice, bar labels in extraction order: 0.42 0.53 0.58 0.17 0.68 0.29 0.62 0.29 0.73 0.12]
[Figure: Alpha Assessment. P-values (y-axis, 0.0 to 1.0) for the samples of each family (bifrose, sasfis, blackenergy, banbra, pushdo), with one series per class: P-values: bifrose, sasfis, blackenergy, banbra, pushdo.]
Although the algorithm has reasonably good results on paper, CE shows that the quality of the results is not good!
We ran experiments on another dataset to confirm, and the classifier got worse.
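A decision assessment of this kind boils down to averaging credibility and confidence separately for correct and incorrect choices. The sketch below is a simplified illustration that pools all families together instead of reporting per family; the record layout and the numbers are hypothetical and are not the values behind the charts in this talk.

```python
from statistics import mean


def decision_assessment(records):
    """Average (credibility, confidence) for correct and for incorrect choices.

    Each record is (true_label, predicted_label, credibility, confidence).
    Returns None for a group with no records.
    """
    groups = {"correct": [], "incorrect": []}
    for true_label, predicted, cred, conf in records:
        key = "correct" if true_label == predicted else "incorrect"
        groups[key].append((cred, conf))
    return {
        name: (mean(c for c, _ in values), mean(k for _, k in values)) if values else None
        for name, values in groups.items()
    }


# Hypothetical per-sample results.
records = [
    ("bifrose", "bifrose", 0.75, 0.5),
    ("bifrose", "bifrose", 0.25, 0.5),
    ("sasfis", "bifrose", 0.5, 0.25),
]
summary = decision_assessment(records)
```

For a trustworthy classifier one would hope to see high credibility and high confidence on correct choices and clearly lower values on incorrect ones; when the two groups look alike, a good accuracy figure says little about the quality of individual decisions.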
CE: EXAMPLE 2
Mobile App classification: Malware vs Goodware
[Figure: two plots, y-axis 0.0 to 1.0. (1) Average algorithm credibility and average algorithm confidence, for correct choices and for incorrect choices. (2) P-values for MALICIOUS and for BENIGN, plotted for the MALICIOUS samples and for the BENIGN samples.]
FINAL REMARKS
Getting your hands in the game, what you need:
You need to study a bit of ML
You need a problem
You need data
You need good metrics
In the wild analysis is a plus
You need tools:
→ We did everything in Python: NumPy, SciPy
→ ML libraries: scikit-learn, shogun-toolbox.org
Machine Learning is great for Cyber Security!
Thanks for listening:
Questions?
Situational Awareness, Botnet and Malware Detection in the Modern Era - Davide Papini - Codemotion Milan 2016

  • 1. SITUATIONAL AWARENESS, BOT- NET AND MALWARE DETECTION IN THE MODERN ERA Machine Learning Enabled Advanced Security CodeMotion Milan 2016 Davide Papini
  • 2. Introduction ML for Cyber Security Final Remarks ABOUT ME Research & Innovation @Ele ronica S.p.a. Postdoc @ISG Royal Holloway, UK on ML applied to cyber situational awareness. M.Sc. Telecommunication Engineering @Politecnico di Milano: → Erasmus @Danmarks Tekniske Universitet → Master Thesis on ``Anomaly Based Wireless Intrusion Detection Systems'' Ph.D. @Danmarks Tekniske Universitet: → ``Attacker Modeling in Ubiquitous Computing Systems'' → External stay at COSIC, KU Leuven 2
  • 3. Introduction ML for Cyber Security Final Remarks WHAT THIS TALK IS ABOUT Topics: Applications of ML in Cybersecurity research. Successful research: botnets, DGAs, early malware detection. ML traps. Evaluation metrics. NOT about: New ML algorithms. Showing one specific Security-ML based application. Wear you out with math. 3
  • 4. Introduction ML for Cyber Security Final Remarks MOTIVATIONAL SLIDE 4
  • 5. Introduction ML for Cyber Security Final Remarks MOTIVATIONAL SLIDE 4
  • 6. Introduction ML for Cyber Security Final Remarks MOTIVATIONAL SLIDE 5
  • 7. Introduction ML for Cyber Security Final Remarks MOTIVATIONAL SLIDE Control of the botnet for 10 days: 180,000 infections, recording of over 70GB of data. Torpig intercepts and records keystroke information at a low level, targeting a wide variety of applications and websites. Stealing financial and personal informations, login credentials for social networking etc. Torpig periodically uploads any new data that it has captured to a central server. The researchers were able to infiltrate the botnet by registering one of the domains from a list of potential ones infected machines use. 5
  • 8. Introduction ML for Cyber Security Final Remarks SOME STATISTICS h ps://www.mcafee.com/us/resources/reports/rp-quarterly-threats-sep-2016.pdf 6
  • 9. Introduction ML for Cyber Security Final Remarks SOME STATISTICS h ps://www.mcafee.com/us/resources/reports/rp-quarterly-threats-sep-2016.pdf 450,000 new malware per day. 20,000 is mobile malware. Includes: ransomware, botnets, rootkits, trojians … 6
  • 10. Introduction ML for Cyber Security Final Remarks NEED A GAME CHANGER Modern malware/intrusions are difficult to detect/block: Code obfuscation, polimorfism and packing. Malware written ad-hoc for specific targets. AVs are mainly signature-based. URL Blacklists cannot be updated fast enough. Local changes are often too small/subtle to be detected. Logs contains lot of noise (≃ 90%) 7
  • 11. Introduction ML for Cyber Security Final Remarks NEED A GAME CHANGER Modern malware/intrusions are difficult to detect/block: Code obfuscation, polimorfism and packing. Malware written ad-hoc for specific targets. AVs are mainly signature-based. URL Blacklists cannot be updated fast enough. Local changes are often too small/subtle to be detected. Logs contains lot of noise (≃ 90%) Need for intelligent approaches: Adapt to unforseen "events" Learn from data i.e. extract behaviours NOT signatures Leverage global knowledge Can be quasi-real-time. 7
  • 12. ML FOR CYBER SECURITY
  • 13. Introduction ML for Cyber Security Final Remarks Machine learning has been applied to many fields in security: Botnet detection and classification Mobile application analysis Spam detection and campaigns analysis Situational awareness through network traffic analysis Download malware detection and many more... Also in many flavours: Supervised Unsupervised combinations of those 9
  • 14. Introduction ML for Cyber Security Final Remarks BOTNETS Situational awareness: knowledge of the health status of a network (e.g. malware infections, intrusions and data exfiltration). Botnet: a network of bots (drones), i.e. programs installed on the machines of unwitting Internet users and receiving commands from a bot controller. 10
  • 15. Introduction ML for Cyber Security Final Remarks BOTNETS Situational awareness: knowledge of the health status of a network (e.g. malware infections, intrusions and data exfiltration). Botnet: a network of bots (drones), i.e. programs installed on the machines of unwitting Internet users and receiving commands from a bot controller. 10
  • 16. Introduction ML for Cyber Security Final Remarks BOTNETS C&C CHANNEL Bots connect to C&C Server in three ways: Hard coded IP: Bot → 1.2.3.4 Hard coded domain: Bot → badguy.ru → 1.2.3.4 Automatically Generated Domains: → Bot cycles through time-dependent domains. → Domain names are generated using a Domain Generation Algorithm. → The botmaster needs to register only one of those domains. jhhfghf7.tk faukiijjj25.tk pvgvy.tk cvq.com epu.org bwn.org 11
  • 17. Introduction ML for Cyber Security Final Remarks BOTNETS C&C CHANNEL Bots connect to C&C Server in three ways: Hard coded IP: Bot → 1.2.3.4 Hard coded domain: Bot → badguy.ru → 1.2.3.4 Automatically Generated Domains: → Bot cycles through time-dependent domains. → Domain names are generated using a Domain Generation Algorithm. → The botmaster needs to register only one of those domains. jhhfghf7.tk faukiijjj25.tk pvgvy.tk cvq.com epu.org bwn.org courtesy of E.Colombo - Cerberus 11
  • 18. Introduction ML for Cyber Security Final Remarks BOTNETS C&C CHANNEL Bots connect to C&C Server in three ways: Hard coded IP: Bot → 1.2.3.4 Hard coded domain: Bot → badguy.ru → 1.2.3.4 Automatically Generated Domains: → Bot cycles through time-dependent domains. → Domain names are generated using a Domain Generation Algorithm. → The botmaster needs to register only one of those domains. jhhfghf7.tk faukiijjj25.tk pvgvy.tk cvq.com epu.org bwn.org courtesy of E.Colombo - Cerberus Sinkholing: If domain is already registered botmaster looses control of botnets! 11
  • 19. Introduction ML for Cyber Security Final Remarks PHOENIX AND CERBERUS Developed at Polimi and ISG@RHUL System that relies on Machine Learning to identify DGA: Leverage known malicious and benign domain names to build a classifier: → Distinguish Human Generated Domains from AGD. → Identifies the DGA used: botnets might share the same DGA. Use unsupervised learning to identify new DGAs. Traffic comes from a na onal authoritative DNS server. S. Schiavoni et al., Phoenix: DGA-Based Botnet Tracking and Intelligence. In Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA) 2014. E. Colombo, Cerberus: Detec on and Characteriza on of Automa cally-Generated Malicious Domains. Master Thesis, Politecnico di Milano 2014. 12
  • 20. Introduction ML for Cyber Security Final Remarks PHOENIX AND CERBERUS Developed at Polimi and ISG@RHUL System that relies on Machine Learning to identify DGA: Leverage known malicious and benign domain names to build a classifier: → Distinguish Human Generated Domains from AGD. → Identifies the DGA used: botnets might share the same DGA. Use unsupervised learning to identify new DGAs. Traffic comes from a na onal authoritative DNS server. S. Schiavoni et al., Phoenix: DGA-Based Botnet Tracking and Intelligence. In Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA) 2014. E. Colombo, Cerberus: Detec on and Characteriza on of Automa cally-Generated Malicious Domains. Master Thesis, Politecnico di Milano 2014. Malicious Domains Phoenix Clusters Time DetectiveSuspicious Domains Filtering DNS Stream Classifier Bootstrap Filtering Detection courtesy of E.Colombo - Cerberus 12
  • 21. CERBERUS FINDINGS. 187 malicious domains detected and labeled; 3,576 suspicious domains collected; 47 clusters of DGA-generated domains discovered; 319 new domains detected in the next 24 hours.
  • 23. MASTINO: REALTIME MALWARE DETECTION. Developed at TrendMicro and presented at Defcon London 2016. A system for advanced realtime malware detection: it leverages global knowledge on download events and classifies malware from goodware. It is based on statistical evidence and graph analysis over a tripartite graph of URLs, files, and machines. Intrinsic features, e.g. file: size, obfuscated, signed; url: FQD, e2LD, query path; machine: malware download history, processes. Behaviour-based features: consider the reputation of neighboring nodes; help to classify unknown nodes. Huge work on feature engineering!
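The "reputation of neighboring nodes" idea can be sketched on a tiny download graph. This is only an illustration under stated assumptions: the events, the node names, and the scoring rule (average share of known malware among the files seen with each neighboring machine or URL) are invented here and are not Mastino's actual features.

```python
from collections import defaultdict

# Hypothetical download events: (machine, url, file hash).
events = [
    ("m1", "http://a.example/x.exe", "f1"),
    ("m2", "http://a.example/x.exe", "f1"),
    ("m2", "http://b.example/y.exe", "f2"),
    ("m3", "http://b.example/y.exe", "f3"),
]
known_malware = {"f1"}  # files that already carry a malicious label

files_of = defaultdict(set)     # machine or URL node -> files seen with it
contexts_of = defaultdict(set)  # file -> machine and URL nodes seen with it
for machine, url, file_hash in events:
    for node in (("machine", machine), ("url", url)):
        files_of[node].add(file_hash)
        contexts_of[file_hash].add(node)


def node_reputation(node) -> float:
    """Share of known malware among the files associated with a machine or URL."""
    files = files_of.get(node, set())
    return len(files & known_malware) / len(files) if files else 0.0


def neighbor_score(file_hash: str) -> float:
    """Average reputation of the machines and URLs connected to a file."""
    nodes = contexts_of.get(file_hash, set())
    return sum(node_reputation(n) for n in nodes) / len(nodes) if nodes else 0.0


print(neighbor_score("f2"), neighbor_score("f3"))
```

The unknown file `f2` inherits some suspicion from machine `m2`, which also downloaded known malware, while `f3` does not; a real system would combine such graph-derived scores with the intrinsic features listed above.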
  • 24. MASTINO SYSTEM OVERVIEW. [Figure: System Overview. Copyright 2016 Trend Micro Inc.; courtesy of M. Balduzzi, TrendMicro.]
  • 25. MASTINO TRAINING AND DETECTION. [Figure: training and detection; courtesy of M. Balduzzi, TrendMicro.]
  • 27. MASTINO RESULTS. Mastino evaluation. On the testing dataset: 95.8% TP, 0.5% FP. Early detection experiment, deployed in the wild for 6 months: detected 84% of future malware, verified later through VirusTotal. Detection time ≃ 0.16s!
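Reading the slide's figures as a true-positive rate and a false-positive rate (an assumption; the slide only says "TP" and "FP"), a short calculation shows how the share of correct alerts depends on how common malware is among downloads. The prevalence values below are hypothetical and are not taken from the Mastino evaluation.

```python
def precision(tpr: float, fpr: float, prevalence: float) -> float:
    """Fraction of alerts that are real malware, by Bayes' rule."""
    true_alerts = tpr * prevalence
    false_alerts = fpr * (1.0 - prevalence)
    return true_alerts / (true_alerts + false_alerts)


TPR, FPR = 0.958, 0.005  # rates quoted on the slide

# Hypothetical prevalences: half, 1 in 100, and 1 in 1000 downloads malicious.
for prevalence in (0.5, 0.01, 0.001):
    print(f"prevalence={prevalence}: precision={precision(TPR, FPR, prevalence):.3f}")
```

The same pair of rates gives very different alert quality as malware becomes rarer, which is one reason in-the-wild validation matters alongside test-set numbers.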
  • 28. ISSUES. Traditional ML was developed for "natural" objects: natural language processing; image analysis, e.g. text search in pictures; classification of plants and animals; economic laws. Metrics like ROC, FP, and FN work very well in these cases; however, the cyber world is not natural: things change abruptly (e.g. updates, new malware, new technologies); there is no clear evolutionary law; change is deterministic and unpredictable; behaviours change/slide over time.
  • 30. ML TRAPS. Machine learning is often seen as a black-box panacea: little is understood, and results with high accuracy are accepted without questioning their quality. However: overfitting occurs if training and testing are not done carefully; validity of results: a system that works on paper may not work in the field; datasets: variety vs chronology. Need for novel metrics!
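One way to read "variety vs chronology" is that a random train/test split can leak future samples into training, so the measured accuracy overstates what happens in the field. A minimal sketch of a time-aware split follows, assuming each sample carries a first-seen date; the sample names, dates, and cutoff are hypothetical.

```python
from datetime import date

# Hypothetical labeled samples: (sample id, first-seen date).
samples = [
    ("s1", date(2016, 1, 5)),
    ("s2", date(2016, 3, 17)),
    ("s3", date(2016, 6, 2)),
    ("s4", date(2016, 8, 21)),
    ("s5", date(2016, 10, 9)),
]


def chronological_split(items, cutoff):
    """Train only on samples seen before `cutoff`, test on everything after."""
    train = [item for item in items if item[1] < cutoff]
    test = [item for item in items if item[1] >= cutoff]
    return train, test


train, test = chronological_split(samples, cutoff=date(2016, 7, 1))
print([s for s, _ in train], [s for s, _ in test])
```

With this split the classifier is never evaluated on data older than what it was trained on, which mimics deployment more closely than shuffling.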
  • 32. CONFORMAL EVALUATOR. Library developed at the Information Security Group at Royal Holloway. It evaluates algorithms in terms of confidence and credibility. Its core is a non-conformity measure, elicited directly from the algorithm, which in essence tells how different a sample is from a set of samples. It builds decision and alpha assessments to evaluate the algorithm. Reference: R. Jordaney, Z. Wang, D. Papini, I. Nouretdinov and L. Cavallaro, Misleading Metrics: On Evaluating Machine Learning for Malware with Confidence, Technical Report 2016-1, Royal Holloway University of London. [Figure: Conformal Evaluator Overview; diagram labels: Training and Testing Dataset, Similarity Based Classification/Clustering Algorithm, Non-Conformity Measure, Conformal Evaluator, Alpha Assessment, Decision Assessment.]
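The sketch below shows how credibility and confidence can be derived from non-conformity scores using the standard conformal-prediction p-value. It is not the Conformal Evaluator library's API: the function names, the reference scores, and the new sample's scores are all hypothetical, and the predicted class is assumed to be the one with the largest p-value.

```python
def p_value(candidate_score: float, reference_scores: list[float]) -> float:
    """Share of samples (candidate included) at least as non-conforming as the candidate."""
    at_least = sum(1 for s in reference_scores if s >= candidate_score)
    return (at_least + 1) / (len(reference_scores) + 1)


# Hypothetical non-conformity scores of training samples within their own class.
reference = {
    "bifrose": [0.1, 0.2, 0.3, 0.4],
    "pushdo": [0.5, 0.6, 0.7, 0.9],
}
# Hypothetical non-conformity of one new sample with respect to each class.
new_sample = {"bifrose": 0.25, "pushdo": 0.95}

p_values = {label: p_value(new_sample[label], scores) for label, scores in reference.items()}
predicted = max(p_values, key=p_values.get)

# Credibility: how well the sample fits the chosen class.
credibility = p_values[predicted]
# Confidence: how clearly the remaining classes are ruled out.
confidence = 1.0 - max(p for label, p in p_values.items() if label != predicted)

print(predicted, p_values, credibility, confidence)
```

A prediction is trustworthy when both values are high; a correct label with low credibility suggests the sample looks unlike anything the algorithm was trained on.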
  • 35. CE: EXAMPLE 1. System for botnet detection and classification. [Figure: Decision Assessment. Two bar charts over the families bifrose, sasfis, blackenergy, banbra, and pushdo (y-axis 0.0 to 1.0) showing average algorithm credibility and average algorithm confidence: one chart for correct choices (bar labels as extracted: 0.86, 0.27, 0.29, 0.31, 0.9, 0.2, 0.84, 0.18, 0.95, 0.15) and one for incorrect choices (bar labels as extracted: 0.42, 0.53, 0.58, 0.17, 0.68, 0.29, 0.62, 0.29, 0.73, 0.12).] [Figure: Alpha Assessment. P-values (0.0 to 1.0) of each family's samples with respect to bifrose, sasfis, blackenergy, banbra, and pushdo.] Although the algorithm has reasonably good results on paper, CE shows that the quality of the results is not good! We ran experiments on another dataset to confirm, and the classifier performed worse.
  • 36. CE: EXAMPLE 2. Mobile app classification: malware vs goodware. [Figure: Decision Assessment. Average algorithm credibility and confidence for correct and for incorrect choices (y-axis 0.0 to 1.0).] [Figure: Alpha Assessment. P-values (0.0 to 1.0) of MALICIOUS and BENIGN samples with respect to the MALICIOUS and BENIGN classes.]
  • 38. FINAL REMARKS. Getting your hands in the game, what you need: you need to study a bit of ML; you need a problem; you need data; you need good metrics; in-the-wild analysis is a plus; you need tools. We did everything in Python: NumPy, SciPy. ML libraries: scikit-learn, shogun-toolbox.org.
  • 40. FINAL REMARKS. Machine Learning is great for Cyber Security! Thanks for listening. Questions?