Copyright © 2016 Splunk Inc.
Dynamic Population
Discovery for Lateral
Movement Detection
Rod Soto & Joseph Zadeh
Splunk UBA Team
INTRODUCTION
$Whoami…
- Joseph Zadeh
Senior Data Scientist with Splunk User Behavioral
Analytics.
- Rod Soto
Senior Security Researcher with Splunk User Behavioral
Analytics.
What is Lateral Movement?
● Lateral movement is the series of actions
conducted after a successful exploitation attack
on, or infiltration of, an organization’s network.
The attacker seeks to further reconnaissance and
expand reach by gaining knowledge of
internal network assets and accessing them.
Lateral Movement process
Example
Objectives
● The main objectives of lateral movement are:
- Gain further knowledge of internal network
assets.
- Expand access into other systems
The ultimate goal is to reach the “Crown Jewels”, which may be AD
domain admin credentials or access to valuable,
sensitive information.
Tools
● Some of the tools used for lateral movement include:
- Keyloggers, ARP spoofing, PwDump, Mimikatz
- PsExec, WMI, PowerShell, Metasploit, sc, at,
wmic, reg, winrs
- RDP, SSH, VNC
- Exploits (PTH/PTT), BruteForce tools
Why is lateral movement detection important?
● Let’s talk about the concept of dwell time, or the Detection
Deficit (VZN)
● Rapid detection of lateral movement can reduce,
contain and prevent further impact of a breach
● Detection of lateral movement enables SOC/SECOPS/IR/DR
teams to act more efficiently
● Raises the cost for, and deters, attackers and would-be
external attackers, as well as insiders
How do we establish assets (current
available technologies)
● “In information security, computer security and
network security an Asset is any data, device, or other
component of the environment that supports
information-related activities. Assets generally include
hardware (e.g. servers and switches), software (e.g.
mission critical applications and support systems) and
confidential information. Assets should be
protected from illicit access, use, disclosure,
alteration, destruction, and/or theft, resulting in loss
to the organization.” *Wikipedia
● Anything that is network enabled inside the
perimeter should be considered an asset. The most common
assets are: Network servers, Routers, Switches,
Databases, Application Servers, Workstations,
Printers, IoT devices.
The Importance of Asset Management
● No asset management = No risk analysis
● Unmonitored unsupervised assets are
likely to be targeted and exploited by
attackers.
● Lack of OS/application version and patch-level
information increases the risk of compromise
● Enables access management to resources
inside the perimeter
● Enables SECOPS/IR/DS to identify and
assess resources in case of incident
PRACTICAL ML FOR SECURITY
Use the right tools for the job
Decomposing Behaviors for Intrusion Detection
Cybersecurity Analytics: ROIv1
Behaviors: Sequential + “Unordered”
• Sequential Behaviors
– Exploit Chains
– Timing Analysis (Periodicity)
– Active Directory Sequence
– Authentication Graph
• Non Sequential Behaviors
– Fingerprinting
– Grouping Behaviors
– Application Counts
– Rare file extension counts for
Webshell detection
Mapping Behaviors to Code
• Easy to Parallelize
– Count()
– Average()
– Time series()
– Local state computations
‣ Per user/IP/account/…
• Hard to Parallelize (NC Complete
Complexity)
– Rank()
– Median
– …
– Anything that keeps track of global
state
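A minimal sketch of the distinction above, assuming events are already split into partitions (the sample keys and values are made up for illustration). Counts and averages can be computed from per-partition local state and merged afterwards, while a median cannot be rebuilt from per-partition medians.

```python
from statistics import median

# Hypothetical (key, value) events split across two partitions/nodes.
partitions = [
    [("alice", 3), ("bob", 10)],
    [("alice", 5), ("bob", 20), ("bob", 30)],
]

def local_state(partition):
    """Per-key (count, sum) computed from one partition only."""
    state = {}
    for key, value in partition:
        count, total = state.get(key, (0, 0))
        state[key] = (count + 1, total + value)
    return state

def merge(a, b):
    """Combine two partial states; the merge is associative and commutative."""
    out = dict(a)
    for key, (count, total) in b.items():
        c0, t0 = out.get(key, (0, 0))
        out[key] = (c0 + count, t0 + total)
    return out

merged = {}
for partial in map(local_state, partitions):
    merged = merge(merged, partial)
averages = {key: total / count for key, (count, total) in merged.items()}

# The median needs all of a key's values in one place (global state).
bob_values = [v for part in partitions for k, v in part if k == "bob"]
true_median = median(bob_values)
median_of_medians = median(
    median([v for k, v in part if k == "bob"]) for part in partitions
)
```

Here the merged averages match a single-pass computation, but the median of per-partition medians differs from the true median, which is why rank-style statistics need either a shuffle of all values or an approximation.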
Adversarial Drift
● The current status quo is driven by adversaries developing and
introducing changes in their TTPs, bypassing all current detection
technologies.
Adversarial Models
• Machine Learning loses effectiveness the more complex the
adversary
Adversarial Models
Automatable Actions: Good for ML
Non-Automatable Actions: Hybrid Human/Computer Analysis
Learning = Compression?
● There is a duality between learning and compression
Input Data Total Size = 1 GB
Primary Key  Time  UserID  Count
Row 1        …     …       …
Row 2        …     …       …
Row 3        …     …       …
…            …     …       …
Row N        …     …       …
Learned output is a set of “coefficients”: C1 C2 C3 C4 C5
Total Output Size = 1K
Learning = Compression?
● Example of Linear Regression in R
Learning = Compression?
● Train a model to predict mpg as a function of car weight, number of
cylinders and displacement
Learning = Compression?
● The overall input data is reduced in a “compressed form” to use in
future predictions
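The slides show this regression in R. The sketch below restates the idea in Python under stated assumptions: the data is synthetic (generated from made-up coefficients, not the real car dataset), and `fit_linear` is a small illustrative least-squares solver, not the authors' code. It shows 1,000 rows being reduced to four coefficients that can be reused for prediction.

```python
import random

def fit_linear(X, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y."""
    rows = [[1.0] + list(r) for r in X]  # prepend an intercept column
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for i in range(k):
            if i != c:
                f = A[i][c]
                A[i] = [vi - f * vc for vi, vc in zip(A[i], A[c])]
    return [A[i][k] for i in range(k)]

rng = random.Random(0)
# Made-up "true" coefficients: intercept, weight, cylinders, displacement.
TRUE = [40.0, -3.0, -1.0, -0.01]
X = [(rng.uniform(1.5, 5.5), rng.choice([4, 6, 8]), rng.uniform(70, 470))
     for _ in range(1000)]
y = [TRUE[0] + sum(b * v for b, v in zip(TRUE[1:], row)) for row in X]

coef = fit_linear(X, y)  # 1000 rows "compressed" into 4 numbers

def predict_mpg(wt, cyl, disp):
    """The learned function: callable in batch or on one record at a time."""
    return coef[0] + coef[1] * wt + coef[2] * cyl + coef[3] * disp
```

Because the synthetic data is noise-free, the fit recovers the generating coefficients; with real data the coefficients would only approximate the signal.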
Learning = Compression?
● This process is extremely brittle in terms of modeling a changing
signal or an adversary that changes patterns over time
Learning = Compression?
● The simple linear model gives us output that separates the Signal
from the Noise (this is not always possible with a model)
Learning = Compression?
● Real example of random forest trained on C2 traffic
Learning = Compression?
● We really “learn” a function we can call in batch or real time
When is a model ready?
SECURITY ANALYTICS FOR DEFENSE
“But all too often we forget the first rule of battle - the battlefield – the
attacker can escape everything it cannot escape the terrain – choose the
terrain, use the terrain – we win” Sun Tzu
High Level Objectives
– Asset Class Discovery
‣ Identify all things acting like device type “X”
– Identify key services/assets in the DMZ
– Identify human / non human by device
– Anomalies on rare paths
‣ U->S
‣ S->U
‣ U->U (LAN to LAN)
‣ S->S (DMZ to LAN)
– Identity Resolution Impossible Mappings
Modeling Methodology
● Step 1: Identity Resolution
● Step 2: Topology Discovery
● Step 3: Behavioral Profiles
● Step 4: Client/Server Relationship Discovery
● Step 5: Monitor for changes in asset relationship
graph
Raw Data
Learn DMZ Assets
Asset/Service Dynamic Discovery
Spark Data Frame
Fixed Services Discovery: FTP, HTTP
Identity Resolution
Anomalies: U->S, S->U, U->U (LAN to LAN), S->S (DMZ to LAN)
Pull in Other Data (Beacons/Fingerprint)
Mapping Anomalies
Human Fingerprint
Seeing the Analytic In Action
● Once the identity resolution/learning process is complete, we create
new anomalies based on new paths/actions that are rare for a
particular population profile
Lightweight Webshell in the DMZ
STEP1: IDENTITY RESOLUTION
GARBAGE IN GARBAGE OUT
Identity Resolution
● There are many possible ways to attack the identity resolution problem
with enterprise solutions, but these usually add complexity
● Smaller scale shops should leverage work already done here - SIEM
is a good example of a tool that normalizes lots of these scenarios
● Advanced Pattern – Inventory-Based Trust: USENIX 2016,
“BeyondCorp: Design to Deployment at Google”
ID Resolution WORKFLOW
DHCP
IMS/IPAM
FW
Proxy
VPN
AD
Active ID Table
ID Res Event ID Filter
DHCP State Table
IMS State Table
AD State Table
Duplicate Streams
Identity Annotator /
Normalization Engine
Algorithms
Similar to SQL’s COALESCE:
Username = SELECT COALESCE(user_name, hostname, IP)
FROM Active_ID_Table
WHERE IP = '10.10.100.23'
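A minimal runnable sketch of the COALESCE lookup above, using an in-memory SQLite database. The table name and columns follow the slide; the sample rows and the `resolve_identity` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Active_ID_Table (IP TEXT PRIMARY KEY, user_name TEXT, hostname TEXT)"
)
# Hypothetical rows: one with only a hostname, one with a username, one with neither.
conn.executemany(
    "INSERT INTO Active_ID_Table VALUES (?, ?, ?)",
    [
        ("10.10.100.23", None, "dave.eng.acme.com"),
        ("10.10.100.24", "scott", "scott.hr.acme.com"),
        ("10.10.100.25", None, None),
    ],
)

def resolve_identity(ip):
    """Return the best available identity: username, else hostname, else the IP."""
    row = conn.execute(
        "SELECT COALESCE(user_name, hostname, IP) FROM Active_ID_Table WHERE IP = ?",
        (ip,),
    ).fetchone()
    return row[0] if row else ip  # unknown IPs fall back to the raw address
```

The parameterized query also avoids the quoting problems that come with building the WHERE clause by string concatenation.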
ETL Online Mode: Raw Individual Streams
Incremental load: Prioritizing updates to state table
in real time
1. Assign priority to data streams for
automated ETL of
daily/weekly/incremental updates
2. Update Active ID Table before any other
workflow task begins
ETL Online Mode: Raw Individual Streams
DHCP
AD
ID Res Event ID Filter
DHCP State Table
AD State Table
1. Drop all tuples not containing EventID = 673 or
EventID = 4663
2. The ID data extractor keeps only the key
data points necessary for the AD State Table
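The two filter steps above can be sketched as a small generator. The Event IDs come from the slide and the output fields mirror the AD State Table columns; the raw event layout (a dict with an `EventID` key) is an assumption for illustration.

```python
KEEP_EVENT_IDS = {673, 4663}  # the two Event IDs named on the slide
AD_STATE_FIELDS = ("AD_IP", "Username", "FQDN", "Event_Time")  # AD State Table columns

def id_res_event_filter(events):
    """Drop events with other Event IDs, then keep only the AD State Table fields."""
    for event in events:
        if event.get("EventID") in KEEP_EVENT_IDS:
            yield {field: event.get(field) for field in AD_STATE_FIELDS}

# Hypothetical raw AD log records.
raw_events = [
    {"EventID": 673, "AD_IP": "10.10.50.25", "Username": "dave",
     "FQDN": "dave.eng.acme", "Event_Time": "2014-03-10T13:00:00", "Extra": "x"},
    {"EventID": 4624, "AD_IP": "10.5.12.2", "Username": "scott",
     "FQDN": "scott.hr.acme", "Event_Time": "2014-03-10T10:00:00"},
]
ad_state_rows = list(id_res_event_filter(raw_events))
```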
IP_Address Hostname MAC LastLease_Timestamp
10.10.50.25 dave.eng.acme.com 58:5c:35:c3:6e:a4 2014-03-10T13:00:00
10.10.50.25 dave.eng.acme.com 58:5c:35:c3:6e:a4 2014-03-10T14:00:00
10.10.50.25 dave.eng.acme.com 58:5c:35:c3:6e:a4 2014-03-10T22:30:00
10.10.50.25 dave.eng.acme.com 58:5c:35:c3:6e:a4 2014-03-11T09:00:00
10.100.1.23 dave.eng.acme.com 58:5c:35:c3:6e:a4 2014-03-11T14:00:00
10.5.12.2 scott.hr.acme.com 12:3a:74:b2:6a:22 2014-03-10T10:00:00
10.5.12.2 scott.hr.acme.com 12:3a:74:b2:6a:22 2014-03-10T14:30:00
192.168.1.65 scott.hr.acme.com 00:50:a6:d2:21:01 2014-03-10T14:30:00
10.5.12.2 scott.hr.acme.com 12:3a:74:b2:6a:22 2014-03-10T17:30:00
192.168.1.65 scott.hr.acme.com 1b:31:a5:1d:b0:11 2014-03-11T14:50:00
10.13.11.221 scott.hr.acme.com 12:3a:74:b2:6a:22 2014-03-12T14:30:00
AD_IP         Username        FQDN           Event_Time
10.10.50.25   dave            dave.eng.acme  2014-03-10T13:00:00
10.10.50.25   dave            dave.eng.acme  2014-03-10T14:00:00
10.10.50.25   dave            dave.eng.acme  2014-03-11T09:00:00
10.100.1.23   dave@acme.com                  2014-03-11T14:00:00
10.5.12.2     scott           scott.hr.acme  2014-03-10T10:00:00
192.168.1.65  scott@acme.com                 2014-03-10T14:30:00
ETL Online Mode: Real Time Active State Table
IP_Address    Hostname            MAC                LastLease_Timestamp
10.10.50.25   steve.eng.acme.com  58:5c:35:c3:6e:a4  2014-03-10T13:00:00
10.10.50.25   dave.eng.acme.com   58:5c:35:c3:6e:a4  2014-03-10T14:00:00
10.10.50.25   dave.eng.acme.com   58:5c:35:c3:6e:a4  2014-03-10T22:30:00
10.10.50.25   dave.eng.acme.com   58:5c:35:c3:6e:a4  2014-03-11T09:00:00
10.100.1.23   dave.eng.acme.com   58:5c:35:c3:6e:a4  2014-03-11T14:00:00
10.5.12.2     scott.hr.acme.com   12:3a:74:b2:6a:22  2014-03-10T10:00:00
10.5.12.2     scott.hr.acme.com   12:3a:74:b2:6a:22  2014-03-10T14:30:00
192.168.1.65  scott.hr.acme.com   00:50:a6:d2:21:01  2014-03-10T14:30:00
10.5.12.2     scott.hr.acme.com   12:3a:74:b2:6a:22  2014-03-10T17:30:00
192.168.1.65  scott.hr.acme.com   1b:31:a5:1d:b0:11  2014-03-11T14:50:00
10.13.11.221  scott.hr.acme.com   12:3a:74:b2:6a:22  2014-03-12T14:30:00
AD_IP         Username        FQDN           Event_Time
10.10.50.25   dave            dave.eng.acme  2014-03-10T13:00:00
10.10.50.25   dave            dave.eng.acme  2014-03-10T14:00:00
10.10.50.25   dave            dave.eng.acme  2014-03-11T09:00:00
10.100.1.23   dave@acme.com                  2014-03-11T14:00:00
10.5.12.2     scott           scott.hr.acme  2014-03-10T10:00:00
192.168.1.65  scott@acme.com                 2014-03-10T14:30:00
10.13.11.221  scott                          2014-03-12T14:30:00
192.168.1.65  scot            scott.hr.acme  2014-03-11T14:50:00
IP            DHCP.hostname      DHCP.MAC           DHCP_Lasteventtime   AD_username    AD_FQDN            AD_Lasteventtime
10.100.1.23   dave.eng.acme.com  58:5c:35:c3:6e:a4  2014-03-11T14:00:00  dave@acme.com  dave.eng.acme.com  2014-03-11T14:00:00
10.13.11.221  scott.hr.acme.com  12:3a:74:b2:6a:22  2014-03-12T14:30:00  scott          scot.hr.acme.com   2014-03-12T14:30:00
10.131.1.4    admin              NULL               NULL                 domain_admin   acme.com           2014-03-12T23:00:00
Primary Key
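One way the Active ID Table could be assembled from the two state tables is sketched below: keep the most recent event per IP in each stream, then join on IP as the primary key. The rows are a subset copied from the tables above; the function names and the output field names are assumptions, not the product's implementation.

```python
from datetime import datetime

# Subset of rows from the DHCP and AD state tables; timestamp is the last field.
dhcp_events = [
    ("10.10.50.25", "dave.eng.acme.com", "58:5c:35:c3:6e:a4", "2014-03-11T09:00:00"),
    ("10.100.1.23", "dave.eng.acme.com", "58:5c:35:c3:6e:a4", "2014-03-11T14:00:00"),
    ("10.13.11.221", "scott.hr.acme.com", "12:3a:74:b2:6a:22", "2014-03-12T14:30:00"),
]
ad_events = [
    ("10.10.50.25", "dave", "2014-03-10T13:00:00"),
    ("10.10.50.25", "dave", "2014-03-11T09:00:00"),
    ("10.100.1.23", "dave@acme.com", "2014-03-11T14:00:00"),
    ("10.13.11.221", "scott", "2014-03-12T14:30:00"),
]

def latest_by_ip(events):
    """Keep only the most recent event for each IP."""
    latest = {}
    for event in events:
        ip, ts = event[0], datetime.fromisoformat(event[-1])
        if ip not in latest or ts > datetime.fromisoformat(latest[ip][-1]):
            latest[ip] = event
    return latest

def build_active_id_table(dhcp, ad):
    """Outer-join the latest DHCP and AD state on IP (the primary key)."""
    d, a = latest_by_ip(dhcp), latest_by_ip(ad)
    table = {}
    for ip in sorted(set(d) | set(a)):
        _, hostname, mac, dhcp_ts = d.get(ip, (ip, None, None, None))
        _, username, ad_ts = a.get(ip, (ip, None, None))
        table[ip] = {
            "dhcp_hostname": hostname, "dhcp_mac": mac, "dhcp_last": dhcp_ts,
            "ad_username": username, "ad_last": ad_ts,
        }
    return table

active_id_table = build_active_id_table(dhcp_events, ad_events)
```

Missing values stay as `None`, matching the NULL cells in the joined table.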
STEP 2: TOPOLOGY DISCOVERY
Learning the Local Layers
Map the lower layers of the OSI model passively
● Infer key properties
– DMZ blocks (often times we find new segments this way)
– LAN only blocks
– VLAN behavior (Student VLAN, ADMIN VLAN, STAFF VLAN)
● Keep in mind we lose visibility into switched traffic flows (layer 2 is
hard to see at scale)
Basic Features Built in this Step
● Graph Features
– Source/Destination behavior
‣ How many hosts talk to this IP? (In Degree)
‣ How many hosts are talked to by this IP? (Out Degree)
● Layer 2/Layer 3 Features
– IP Subnet Behavior
‣ # LAN to LAN conversations (non routable IP flows)
‣ # LAN to WAN conversations (non routable address to routable
address)
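The features listed above can be sketched from a list of flow records. This is an illustration under assumptions: the flows are made up, and `is_private` from the standard `ipaddress` module is used as a stand-in for "non routable", which is slightly broader than the RFC 1918 ranges alone.

```python
import ipaddress
from collections import defaultdict

# Hypothetical (source IP, destination IP) flow records.
flows = [
    ("10.0.0.5", "10.0.0.9"),
    ("10.0.0.6", "10.0.0.9"),
    ("10.0.0.5", "8.8.8.8"),
    ("10.0.0.5", "10.0.0.9"),
]

def graph_features(flows):
    """Compute per-IP in/out degree and LAN-to-LAN / LAN-to-WAN flow counts."""
    talks_to, talked_to_by = defaultdict(set), defaultdict(set)
    lan_to_lan = lan_to_wan = 0
    for src, dst in flows:
        talks_to[src].add(dst)
        talked_to_by[dst].add(src)
        src_private = ipaddress.ip_address(src).is_private
        dst_private = ipaddress.ip_address(dst).is_private
        if src_private and dst_private:
            lan_to_lan += 1
        elif src_private:
            lan_to_wan += 1
    return {
        "in_degree": {ip: len(peers) for ip, peers in talked_to_by.items()},
        "out_degree": {ip: len(peers) for ip, peers in talks_to.items()},
        "lan_to_lan": lan_to_lan,
        "lan_to_wan": lan_to_wan,
    }

features = graph_features(flows)
```

Degrees count distinct peers, while the LAN/WAN counters count conversations, so a repeated flow raises the latter but not the former.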
Graph Features: Example
STEP 3: BEHAVIOR BASED
PROFILING
Asset Fingerprints
● Goal is to use machine learning and vanilla/fuzzy
correlation to discover some common asset classes
(in ML terms, these are sometimes called class labels)
Example asset classes: *nix Server, Desktop, Laptop, MS Server,
Biomedical Devices, IOT Energy Meters
What are behavioral profiles? What do they apply to?
● Windows 2008/2003 Server Profile
– Flow Characteristics:
‣ Byte distribution ratios are asymmetric
– Application Layer Characteristics:
‣ SMB, NetBIOS, MS-Update
‣ Number of unique domains per day
● Windows 2008/2003 End Device Profile
– Flow Characteristics:
– Application Layer Characteristics:
‣ Facebook Chat, social media, Twitter
‣ Non-uniform browsing patterns
● *nix Server/End Device Profile
– Flow Characteristics:
– Application Layer Characteristics:
‣ Software updates for distros (Ubuntu, RHEL)
Representing a Profile Over Time
Comparing a Profile to a Group
Application Fingerprints
Layer 7 Info
● It is not always possible to build these kinds of statistics
without higher layer application data or PCAPs
ML/Stat Workflow Engine
STEP 4: CLIENT/SERVER RELATION
DISCOVERY
It’s all in the Bytes
● Depending on the type of visibility you have, you can
leverage different levels of granularity
– Flows (Netflow v 7): you get the number of packets per flow, which is
very important
– PCAPs: the best-case scenario, but hard to log/process at scale for
large environments
– Higher layers might suffer a loss of signal
Histogram of Byte Distribution
Group Based Comparisons
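A rough sketch of a group-based comparison on byte histograms. Everything specific here is an assumption for illustration: the power-of-two bucketing, the total variation distance, and the sample flow sizes are not taken from the slides, which only show the histograms as images.

```python
from collections import Counter

def byte_histogram(flow_sizes, buckets=8):
    """Normalized histogram of flow sizes; bucket i covers sizes with 4*i..4*i+3 bits."""
    counts = Counter(min(size.bit_length() // 4, buckets - 1) for size in flow_sizes)
    total = len(flow_sizes)
    return [counts[i] / total for i in range(buckets)]

def group_profile(histograms):
    """Average histogram of all members of a population."""
    n = len(histograms)
    return [sum(h[i] for h in histograms) / n for i in range(len(histograms[0]))]

def total_variation(p, q):
    """Distance between two histograms: 0 means identical, 1 means disjoint."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

# Hypothetical bytes-per-flow samples for two desktops and one suspect host.
desktops = [[120, 300, 1500, 900], [200, 450, 1400, 80]]
suspect = [5_000_000, 7_000_000, 120, 9_000_000]

profile = group_profile([byte_histogram(d) for d in desktops])
member_distance = total_variation(byte_histogram(desktops[0]), profile)
suspect_distance = total_variation(byte_histogram(suspect), profile)
```

A host whose distance to its own group profile is far above that of its peers is a candidate for review; the threshold would need to be tuned per population.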
STEP 5: MONITOR FOR CHANGES IN
ASSET RELATIONSHIP GRAPH
“At this point all the hard work is done”…
Mining For Relationship Anomalies
● Anomalies on rare paths
– U->S
– S->U !!
– U->U (LAN to LAN)
– S->S (DMZ to LAN)!!
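The rare-path idea can be sketched as a lookup against a baseline of previously seen edges. The sketch reads U as a user device and S as a server; the host names, role assignments, and `min_count` threshold are hypothetical, and the role labels are assumed to come from the earlier steps.

```python
from collections import Counter

# Hypothetical role labels produced by the earlier discovery steps.
roles = {"ws1": "U", "ws2": "U", "web1": "S", "db1": "S"}

def path_type(src, dst):
    """Label an edge by the roles of its endpoints, e.g. 'S->U'."""
    return f"{roles[src]}->{roles[dst]}"

def rare_paths(baseline_edges, new_edges, min_count=1):
    """Return new edges seen fewer than min_count times in the baseline."""
    seen = Counter(baseline_edges)
    return [
        (src, dst, path_type(src, dst))
        for src, dst in new_edges
        if seen[(src, dst)] < min_count
    ]

baseline = [("ws1", "web1")] * 50 + [("ws2", "web1")] * 40 + [("web1", "db1")] * 30
observed = [("ws1", "web1"), ("web1", "ws2")]
anomalies = rare_paths(baseline, observed)
```

In this example the DMZ web server initiating a connection to a workstation is flagged as an S->U edge, which is the kind of path a webshell in the DMZ would produce.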
Desktop, Server, Desktop, Laptop
LAN Asset
DMZ Server
Webshell
DMZ to LAN Trust
Beyond the Indicator
Seeing the Analytic In Action
● Once the identity resolution/learning process is complete, we create
new anomalies based on new paths/actions that are rare for a
particular population profile
Lightweight Webshell in the DMZ
Conclusion - Rod - Joe
● New approaches in machine learning and data
science can help improve lateral movement
detection.
● Establishing behavioral patterns with data-driven
approaches can provide tools for detecting and
predicting unusual, high-risk, and malicious behavior
patterns in users and in the use of assets.
● We have been successful catching webshells and new
kinds of in-memory malware using the rare path
approach
– U->S
– S->U !!
– U->U (LAN to LAN)
– S->S (DMZ to LAN)!!
Q&A
● Thank you
● Rod Soto @rodsoto
● Joseph Zadeh @josephzadeh
APPENDIX
Cybersecurity Analytics: ROIv1
Key to ML: Label Your Analysis
● This is how the algorithms will “learn” from human
expertise and help support a common security
workflow
Domain Name                TotalCnt  RiskFactor  AGD  SessionTime  RefEntropy  NullUa  Outcome
yyfaimjmocdu.com           144       6.05        1    1            0           0       Malicious
jjeyd2u37an30.com          6192      5.05        0    1            0           0       Malicious
cdn4s.steelhousemedia.com  107       3           0    0            0           0       Benign
log.tagcade.com            111       2           0    1            0           0       Benign
go.vidprocess.com          170       2           0    0            0           0       Benign
statse.webtrendslive.com   310       2           0    1            0           0       Benign
cdn4s.steelhousemedia.com  107       1           0    0            0           0       Benign
log.tagcade.com            111       1           0    1            0           0       Benign
Human Expertise is manually encoded into a format
computers understand: Sometimes this process is
called Labeling or “Truth-ing” the data
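To make the labeling step concrete, the sketch below turns the labeled rows above into features and labels, then learns a one-feature threshold (a decision stump) on RiskFactor. The stump is only an illustration chosen for brevity; the slides do not say which model consumes these labels.

```python
# Rows copied from the labeled table above:
# (domain, TotalCnt, RiskFactor, AGD, SessionTime, RefEntropy, NullUa, Outcome)
labeled = [
    ("yyfaimjmocdu.com", 144, 6.05, 1, 1, 0, 0, "Malicious"),
    ("jjeyd2u37an30.com", 6192, 5.05, 0, 1, 0, 0, "Malicious"),
    ("cdn4s.steelhousemedia.com", 107, 3, 0, 0, 0, 0, "Benign"),
    ("log.tagcade.com", 111, 2, 0, 1, 0, 0, "Benign"),
    ("go.vidprocess.com", 170, 2, 0, 0, 0, 0, "Benign"),
    ("statse.webtrendslive.com", 310, 2, 0, 1, 0, 0, "Benign"),
    ("cdn4s.steelhousemedia.com", 107, 1, 0, 0, 0, 0, "Benign"),
    ("log.tagcade.com", 111, 1, 0, 1, 0, 0, "Benign"),
]

features = [row[1:7] for row in labeled]                       # numeric columns
labels = [1 if row[7] == "Malicious" else 0 for row in labeled]  # human "truth"

def best_threshold(values, labels):
    """Pick the midpoint threshold that misclassifies the fewest labeled rows."""
    candidates = sorted(set(values))
    best = None
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2
        errors = sum((v > threshold) != bool(l) for v, l in zip(values, labels))
        if best is None or errors < best[1]:
            best = (threshold, errors)
    return best

risk_factor = [f[1] for f in features]
threshold, errors = best_threshold(risk_factor, labels)
```

Without the Outcome column there would be nothing to score candidate thresholds against, which is the point of labeling.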
Lambda Architecture
• Architecture is described by three simple equations:
batch view = function(all data)
realtime view = function(realtime view, new data)
query = function(batch view, realtime view)
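The three equations map directly onto three functions. The sketch below uses a per-user event count as the view; the event shape (a dict with a `user` key) is an assumption made for illustration.

```python
from collections import Counter

def batch_view(all_data):
    """batch view = function(all data): recomputed from scratch over the full history."""
    return Counter(event["user"] for event in all_data)

def update_realtime_view(realtime_view, new_data):
    """realtime view = function(realtime view, new data): incremental update."""
    updated = Counter(realtime_view)
    updated.update(event["user"] for event in new_data)
    return updated

def query(batch, realtime, user):
    """query = function(batch view, realtime view): merge both at read time."""
    return batch[user] + realtime[user]

# Hypothetical events: history already processed by the batch layer,
# plus events that arrived after the last batch run.
history = [{"user": "dave"}, {"user": "dave"}, {"user": "scott"}]
recent = [{"user": "dave"}]

batch = batch_view(history)
realtime = update_realtime_view(Counter(), recent)
dave_total = query(batch, realtime, "dave")
```

When the next batch run absorbs the recent events, the realtime view is reset, so the query result stays the same while the expensive recomputation happens offline.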
Lambda Security
Data sources: DHCP, IMS/IPAM, FW, Proxy, VPN, AD
Pipeline components: Data Ingest, Distributed ETL, Real Time Identity Resolution
Username = SELECT COALESCE(user_name, hostname, IP)
FROM Active_ID_Table
WHERE IP = '10.10.100.23'
IP            DHCP.MAC           DHCP_Lasteventtime   AD_FQDN
10.100.1.23   58:5c:35:c3:6e:a4  2014-03-11T14:00:00  joe.eng.acme.com
10.13.11.221  12:3a:74:b2:6a:22  2014-03-12T14:30:00  ad.hr.acme.com
Real Time Layer: Sequential Models and IOCs
Batch Layer: Large Scale Models and Non-Sequential IOCs
Hybrid View (Batch + Real Time)
Automated process to accelerate workflows, such as a
Splunk query to retrieve PCAP for further analysis, combined
with automatic VT/heuristic correlations
ML + Sequencing the Security DNA
● We parallelize across many nodes (JVMs) and use both real time and
batch computations
JVM 1
JVM 2
JVM 3
1. GET http://forbes.com/gels-contrariness-domain-punchable/
2. GET http://portcullisesposturen.europartsplus.org/
3. POST http://dpckd2ftmf7lelsa.jjeyd2u37an30.com/
1. GET http://youtube.com/
2. GET http://avazudsp.net/
3. GET http://betradar.com/
4. GET http://displaymarketplace.com/
1. GET http://clickable.net/
2. GET http://vuiviet.vn/
3. GET http://homedepotemail.com/
4. GET http://css-tricks.com/
Command and Control (C2) traffic has
been established between the “Beachhead”
and the command and control operator
Heartbeat traffic
signals the C2 operator
that the infected asset
is up and ready for
instructions
Obfuscated instructions get returned through an
upstream conversation embedded in PHP, .js, Flash, etc.
Commands obfuscated in this way can be thought of as a
hidden “Downstream Beacon”
Embedded commands can signal the infected asset to enumerate
local information on the machine, attach to open network
shares and perform lateral reconnaissance and privilege
escalation throughout the compromised network
After targeted lateral movement and privilege
enumeration, all cases of targeted attacks
eventually involve the compromise of the directory
services root servers (usually AD forest roots) and
exfiltration of key personnel information along with
any
BFS/DFS and other classic graph search algorithms are great
examples of algorithms useful in detecting this graph signature
Edge weights can be encoded with key security features to
increase overall model accuracy regardless of the underlying
algorithms
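A minimal breadth-first search over a host graph, as one example of the graph search mentioned above. The graph, host names, and edge weights are hypothetical; the weights stand in for whatever security feature is encoded on each edge.

```python
from collections import deque

# Hypothetical graph: host -> {neighbor: edge weight (e.g. a per-edge risk score)}.
graph = {
    "beachhead": {"fileserver": 1, "ws2": 2},
    "ws2": {"ad-root": 5},
    "fileserver": {},
    "ad-root": {},
}

def bfs_paths(graph, start):
    """Return the fewest-hop path from start to every reachable host."""
    paths = {start: [start]}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, {}):
            if neighbor not in paths:  # first visit is the shortest in hops
                paths[neighbor] = paths[node] + [neighbor]
                queue.append(neighbor)
    return paths

def path_weight(graph, path):
    """Sum the edge weights along a path."""
    return sum(graph[a][b] for a, b in zip(path, path[1:]))

paths = bfs_paths(graph, "beachhead")
to_root = paths["ad-root"]
risk = path_weight(graph, to_root)
```

A path from a compromised host to a directory services root is the signature described above; ranking such paths by accumulated edge weight is one way to prioritize them.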
How can we automate discovery and data
aggregation of DMZ assets - Joe
Proof of Concept / Example – Joe - Rod
Explanation of data science tools and techniques
used for analysis – Joe
How can this be applied to layer 4/7 data or
PCAP data - Joe
Command and Control (C2)
traffic has been established
between compromised
hosts inside the corporate
network and C2 servers
C2 Infrastructure changes locations of
command and control server new
communication path is established
http://en.wikipedia.org/wiki/Fast_flux: Fast flux is a DNS technique used by botnets to hide phishing
and malware delivery sites behind an ever-changing network of compromised hosts acting as proxies.
It can also refer to the combination of peer-to-peer networking, distributed command and control,
web-based load balancing and proxy redirection used to make malware networks more resistant to
discovery and counter-measures. The Storm Worm is one of the recent malware variants to make use
of this technique.
At each time step (typically a day or two)
the C2 infrastructure changes the location of
command and control via this “Fluxing”
behavior. A subset of these types of graph
patterns is known as “Fast Fluxing”
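A simple way to surface this fluxing behavior is to count how many distinct IP addresses each domain resolves to across time steps. The log format, the example domains, and the `min_distinct_ips` threshold below are assumptions for illustration; legitimate CDNs also rotate addresses, so this is only a first-pass filter.

```python
from collections import defaultdict

def flux_candidates(dns_log, min_distinct_ips=3):
    """Return domains that resolved to at least min_distinct_ips different addresses."""
    ips_by_domain = defaultdict(set)
    for _day, domain, ip in dns_log:
        ips_by_domain[domain].add(ip)
    return {
        domain: len(ips)
        for domain, ips in ips_by_domain.items()
        if len(ips) >= min_distinct_ips
    }

# Hypothetical (day, domain, resolved IP) records.
dns_log = [
    ("2014-03-10", "c2.example", "203.0.113.10"),
    ("2014-03-11", "c2.example", "198.51.100.7"),
    ("2014-03-12", "c2.example", "192.0.2.44"),
    ("2014-03-10", "intranet.example", "10.0.0.20"),
    ("2014-03-11", "intranet.example", "10.0.0.20"),
]
candidates = flux_candidates(dns_log)
```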
The constant mobility of
command and control
infrastructure will
continue this IP/Domain
fluxing movement until
detected

Dynamic Population Discovery for Lateral Movement (Using Machine Learning)

  • 1. Copyright © 2016 Splunk Inc. Dynamic Population Discovery for Lateral Movement Detection Rod Sotto & Joseph Zadeh Splunk UBA Team
  • 3. $Whoami… - Joseph Zadeh Senior Data Scientist with Splunk User Behavioral Analytics. - Rod Soto Senior Security Researcher with Splunk User Behavioral Analytics.
  • 4. What is Lateral Movement? ● Lateral movement are series of actions conducted after a successful exploitation attack or infiltration in a organization’s network that seeks to further reconnaissance and expand reach of attacker by gaining knowledge of internal network assets and accessing them.
  • 6. Objectives ● Main objectives of lateral move are: - Gain further knowledge of internal network assets. - Expand access into other systems Ultimate goal is to get “Crown Jewels” which may be AD Domain admin credentials or access to valuable, sensitive information.
  • 7. Tools ● Some of the tools used for lateral movement include: - Keyloggers, ARP spoofing, PwDump, Mimikatz - PsExec, WMI, PowerShell, Metasploit, sc, at, wmic, reg, winrs - RDP, SSH, VNC - Exploits (PTH/PTT), BruteForce tools
  • 8. Why is lateral movement detection important? ● Let’s talk about the concept of DWELL, or Detection Deficit (VZN) ● Rapid detection of lateral movement can reduce, contain and prevent further impact of a breach ● Detection of lateral move enables SOC/SECOPS/IR/DR teams to act in a more efficient manner ● Increases cost/deters attackers and would be external attackers, as well as insiders
  • 9. How do we establish assets (current available technologies) ● “In information security, computer security and network security an Asset is any data,device, or other component of the environment that supports information-related activities. Assets generally include hardware (e.g. servers and switches), software (e.g. mission critical applications and support systems) and confidential information.[1][2] Assets should be protected from illicit access, use, disclosure, alteration, destruction, and/or theft, resulting in loss to the organization.” *Wikipedia ● Anything that is network enabled inside the perimeter should be consider an asset, most common assets are: Network servers, Routers, Switches, Databases, Application Servers, Workstations, Printers, IoTs.
  • 10. The Importance of Asset Management ● No asset management = No risk analysis ● Unmonitored unsupervised assets are likely to be targeted and exploited by attackers. ● Lack of OS/Application version/patch level increases risk of compromise ● Enables access management to resources inside the perimeter ● Enables SECOPS/IR/DS to identify and assess resources in case of incident
  • 11. PRACTICAL ML FOR SECURITY Use the right tools for the job
  • 12. Decomposing Behaviors for Intrusion Detection
  • 14. Behaviors: Sequential + “Unordered” • Sequential Behaviors – Exploit Chains – Timing Analysis (Periodicity) – Active Directory Sequence – Authentication Graph • Non Sequential Behaviors – Fingerprinting – Grouping Behaviors – Application Counts – Rare file extension counts for Webshell detection
  • 15. Mapping Behaviors to Code • Easy to Parallelize – Count() – Average() – Time series() – Local state computations  Per user/IP/account/… • Hard to Parallelize (NC Complete Complexity) – Rank() – Median – … – Anything that keeps track of global state
  • 16. Adversarial Drift ● The current status quo is driven by adversaries developing and introducing changes in their TTPs to bypass current detection technologies.
  • 17. Adversarial Models • Machine learning loses effectiveness as the adversary becomes more complex
  • 18. Adversarial Models ● Automatable Actions: Good for ML ● Non-Automatable Actions: Hybrid Human/Computer Analysis
  • 19. Learning = Compression? ● There is a duality between learning and compression ● Input data (total size = 1 GB): a table with columns Primary Key, Time, UserID, Count and rows Row 1 … Row N ● Learned output is a set of “coefficients” C1, C2, C3, C4, C5 (total output size = 1K)
  • 20. Learning = Compression? ● Example of Linear Regression in R
  • 21. Learning = Compression? ● Train a model to predict mpg as a function of car weight, number of cylinders and displacement
  • 22. Learning = Compression? ● Train a model to predict mpg as a function of car weight, number of cylinders and displacement
  • 23. Learning = Compression? ● The overall input data is reduced in a “compressed form” to use in future predictions
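A minimal sketch of the compression idea, under stated assumptions: the slides fit mpg against three predictors in R, whereas this example uses a single predictor and synthetic data (y = 2x + 1) generated purely for illustration. A thousand input rows reduce to two numbers, and those two numbers are all that is needed for later predictions.

```python
# Synthetic data (made up for illustration): y = 2x + 1 over 1,000 points.
xs = [float(i) for i in range(1000)]
ys = [2.0 * x + 1.0 for x in xs]

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# The "learned" model is just these two coefficients.
slope, intercept = fit_line(xs, ys)

def predict(x):
    """Use the compressed form (two coefficients) on unseen input."""
    return slope * x + intercept
```

The same caveat the deck raises applies here: the coefficients describe the data seen at training time, so a signal that drifts afterwards is not captured until the model is refit.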
  • 24. Learning = Compression? ● This process is extremely brittle in terms of modeling a changing signal or an adversary that changes patterns over time
  • 25. Learning = Compression? ● The simple linear model gives us output that separates the Signal from the Noise (this is not always possible with a model)
  • 26. Learning = Compression? ● Real example of random forest trained on C2 traffic
  • 27. Learning = Compression? ● We really “learn” a function we can call in batch or real time
  • 28. When is a model ready?
  • 29. SECURITY ANALYTICS FOR DEFENSE “But all too often we forget the first rule of battle - the battlefield – the attacker can escape everything it cannot escape the terrain – choose the terrain, use the terrain – we win” Sun Tzu
  • 30. High Level Objectives – Asset Class Discovery ‣ Identify all things acting like device type “X” – Identify key services/assets in the DMZ – Identify human / non human by device – Anomalies on rare paths ‣ U->S ‣ S->U ‣ U->U (LAN to LAN) ‣ S->S (DMZ to LAN) – Identity Resolution Impossible Mappings
  • 31. Modeling Methodology ● Step 1: Identity Resolution ● Step 2: Topology Discovery ● Step 3: Behavioral Profiles ● Step 4: Client/Server Relationship Discovery ● Step 5: Monitor for changes in asset relationship graph
  • 32. Diagram labels: Raw Data; Learn DMZ Assets; Asset/Service Dynamic Discovery; Spark Data Frame; Fixed Services Discovery: FTP, HTTP; Identity Resolution; Anomalies: U->S, S->U, U->U (LAN to LAN), S->S (DMZ to LAN); Pull in Other Data (Beacons/Fingerprint); Mapping Anomalies; Human Fingerprint
  • 33. Seeing the Analytic In Action
  • 34. Seeing the Analytic In Action
  • 35. Seeing the Analytic In Action ● Once the identity resolution/learning process is complete, we create new anomalies based on new paths/actions that are rare for a particular population profile ● Lightweight Webshell in the DMZ
  • 37. Identity Resolution ● There are many possible ways to attack the identity resolution problem with enterprise solutions, but these usually add complexity ● Smaller scale shops should leverage work already done here. A SIEM is a good example of a tool that normalizes lots of these scenarios ● Advanced Pattern – Inventory Based Trust: USENIX 2016 “BeyondCorp: Design to Deployment at Google”
  • 38. ID Resolution Workflow ● Diagram labels: DHCP, IMS/IPAM, FW, Proxy, VPN, AD, Active ID Table, ID Res Event ID Filter, DHCP State Table, IMS State Table, AD State Table, Duplicate Streams, Identity Annotator / Normalization Engine ● Algorithms similar to SQL’s COALESCE: Username = select coalesce(user_name, hostname, IP) from Active_ID_Table where IP = '10.10.100.23'
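The COALESCE lookup on this slide can be sketched outside SQL as well. The table contents and the field names below are assumptions chosen to mirror the query; the idea is simply to return the most specific identifier that is available for an IP, falling back to the IP itself.

```python
def coalesce(*values):
    """Return the first value that is not None, like SQL COALESCE."""
    for value in values:
        if value is not None:
            return value
    return None

# Hypothetical Active ID Table keyed by IP; None marks a missing field.
active_id_table = {
    "10.10.100.23": {"user_name": None, "hostname": "dave.eng.acme.com"},
    "10.10.100.24": {"user_name": "scott", "hostname": "scott.hr.acme.com"},
}

def resolve_identity(ip):
    """Prefer user name, then hostname, then the raw IP."""
    row = active_id_table.get(ip, {})
    return coalesce(row.get("user_name"), row.get("hostname"), ip)
```

An IP that is missing from the table resolves to itself, so downstream models always receive some identifier rather than a null.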
  • 39. ETL Online Mode: Raw Individual Streams ● Incremental load: Prioritizing updates to state table in real time 1. Assign priority to data streams for automated ETL of daily/weekly/incremental updates 2. Update Active ID Table before any other workflow task begins ● Diagram labels: DHCP, IMS/IPAM, FW, Proxy, VPN, AD, Active ID Table, ID Res Event ID Filter, DHCP State Table, IMS State Table, AD State Table
  • 40. ETL Online Mode: Raw Individual Streams ● Diagram labels: DHCP, AD, ID Res Event ID Filter, DHCP State Table, AD State Table 1. Drop all tuples not containing Event ID = 673 or Event ID = 4663 2. ID data extractor for keeping only key data points necessary for AD State table
    DHCP State Table:
    IP_Address    Hostname            MAC                LastLease_Timestamp
    10.10.50.25   dave.eng.acme.com   58:5c:35:c3:6e:a4  2014-03-10T13:00:00
    10.10.50.25   dave.eng.acme.com   58:5c:35:c3:6e:a4  2014-03-10T14:00:00
    10.10.50.25   dave.eng.acme.com   58:5c:35:c3:6e:a4  2014-03-10T22:30:00
    10.10.50.25   dave.eng.acme.com   58:5c:35:c3:6e:a4  2014-03-11T09:00:00
    10.100.1.23   dave.eng.acme.com   58:5c:35:c3:6e:a4  2014-03-11T14:00:00
    10.5.12.2     scott.hr.acme.com   12:3a:74:b2:6a:22  2014-03-10T10:00:00
    10.5.12.2     scott.hr.acme.com   12:3a:74:b2:6a:22  2014-03-10T14:30:00
    192.168.1.65  scott.hr.acme.com   00:50:a6:d2:21:01  2014-03-10T14:30:00
    10.5.12.2     scott.hr.acme.com   12:3a:74:b2:6a:22  2014-03-10T17:30:00
    192.168.1.65  scott.hr.acme.com   1b:31:a5:1d:b0:11  2014-03-11T14:50:00
    10.13.11.221  scott.hr.acme.com   12:3a:74:b2:6a:22  2014-03-12T14:30:00
    AD State Table:
    AD_IP         Username        FQDN           Event_Time
    10.10.50.25   dave            dave.eng.acme  2014-03-10T13:00:00
    10.10.50.25   dave            dave.eng.acme  2014-03-10T14:00:00
    10.10.50.25   dave            dave.eng.acme  2014-03-11T09:00:00
    10.100.1.23   dave@acme.com                  2014-03-11T14:00:00
    10.5.12.2     scott           scott.hr.acme  2014-03-10T10:00:00
    192.168.1.65  scott@acme.com                 2014-03-10T14:30:00
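The two ETL steps on this slide (drop every tuple that is not one of the wanted event IDs, then keep only the key fields) can be sketched as a small generator. The event IDs 673 and 4663 come from the slide; the dictionary field names, the extra field, and the 4624 sample event are assumptions added for illustration.

```python
KEEP_EVENT_IDS = {673, 4663}  # the two event IDs named on the slide
KEY_FIELDS = ("ip", "username", "fqdn", "event_time")  # assumed field names

def extract_ad_rows(events):
    """Keep only wanted event IDs and project each one to the key fields."""
    for event in events:
        if event.get("event_id") in KEEP_EVENT_IDS:
            yield {field: event.get(field) for field in KEY_FIELDS}

# Hypothetical raw events; the second one has an ID that is filtered out.
raw_events = [
    {"event_id": 673, "ip": "10.10.50.25", "username": "dave",
     "fqdn": "dave.eng.acme", "event_time": "2014-03-10T13:00:00",
     "extra": "not needed for the state table"},
    {"event_id": 4624, "ip": "10.5.12.2", "username": "scott",
     "fqdn": "scott.hr.acme", "event_time": "2014-03-10T10:00:00"},
]
ad_rows = list(extract_ad_rows(raw_events))
```

Filtering before projecting keeps the state table small, which matters when the same step runs continuously on a live stream.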
  • 41. ETL Online Mode: Real Time Active State Table
    DHCP State Table:
    IP_Address    Hostname            MAC                LastLease_Timestamp
    10.10.50.25   steve.eng.acme.com  58:5c:35:c3:6e:a4  2014-03-10T13:00:00
    10.10.50.25   dave.eng.acme.com   58:5c:35:c3:6e:a4  2014-03-10T14:00:00
    10.10.50.25   dave.eng.acme.com   58:5c:35:c3:6e:a4  2014-03-10T22:30:00
    10.10.50.25   dave.eng.acme.com   58:5c:35:c3:6e:a4  2014-03-11T09:00:00
    10.100.1.23   dave.eng.acme.com   58:5c:35:c3:6e:a4  2014-03-11T14:00:00
    10.5.12.2     scott.hr.acme.com   12:3a:74:b2:6a:22  2014-03-10T10:00:00
    10.5.12.2     scott.hr.acme.com   12:3a:74:b2:6a:22  2014-03-10T14:30:00
    192.168.1.65  scott.hr.acme.com   00:50:a6:d2:21:01  2014-03-10T14:30:00
    10.5.12.2     scott.hr.acme.com   12:3a:74:b2:6a:22  2014-03-10T17:30:00
    192.168.1.65  scott.hr.acme.com   1b:31:a5:1d:b0:11  2014-03-11T14:50:00
    10.13.11.221  scott.hr.acme.com   12:3a:74:b2:6a:22  2014-03-12T14:30:00
    AD State Table:
    AD_IP         Username        FQDN           Event_Time
    10.10.50.25   dave            dave.eng.acme  2014-03-10T13:00:00
    10.10.50.25   dave            dave.eng.acme  2014-03-10T14:00:00
    10.10.50.25   dave            dave.eng.acme  2014-03-11T09:00:00
    10.100.1.23   dave@acme.com                  2014-03-11T14:00:00
    10.5.12.2     scott           scott.hr.acme  2014-03-10T10:00:00
    192.168.1.65  scott@acme.com                 2014-03-10T14:30:00
    10.13.11.221  scott                          2014-03-12T14:30:00
    192.168.1.65  scot            scott.hr.acme  2014-03-11T14:50:00
    Active State Table (slide label: Primary Key):
    IP            DHCP.hostname       DHCP.MAC           DHCP_Lasteventtime   AD_username    AD_FQDN            AD_Lasteventtime
    10.100.1.23   dave.eng.acme.com   58:5c:35:c3:6e:a4  2014-03-11T14:00:00  dave@acme.com  dave.eng.acme.com  2014-03-11T14:00:00
    10.13.11.221  scott.hr.acme.com   12:3a:74:b2:6a:22  2014-03-12T14:30:00  scott          scot.hr.acme.com   2014-03-12T14:30:00
    10.131.1.4    admin               NULL               NULL                 domain_admin   acme.com           2014-03-12T23:00:00
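One way to read this slide is that the active state table keeps the most recent DHCP and AD observation for each key and joins them into one row. The sketch below assumes the key is the IP address and uses a few rows taken from the tables above; the field names and the exact join rule are assumptions, not the authors' implementation.

```python
def latest_by_ip(rows, time_field):
    """Keep the most recent row per IP (ISO-8601 strings sort by time)."""
    latest = {}
    for row in rows:
        current = latest.get(row["ip"])
        if current is None or row[time_field] > current[time_field]:
            latest[row["ip"]] = row
    return latest

dhcp = [
    {"ip": "10.10.50.25", "hostname": "dave.eng.acme.com", "lease": "2014-03-10T13:00:00"},
    {"ip": "10.10.50.25", "hostname": "dave.eng.acme.com", "lease": "2014-03-11T09:00:00"},
    {"ip": "10.100.1.23", "hostname": "dave.eng.acme.com", "lease": "2014-03-11T14:00:00"},
]
ad = [
    {"ip": "10.100.1.23", "username": "dave@acme.com", "event_time": "2014-03-11T14:00:00"},
]

def build_active_state(dhcp_rows, ad_rows):
    """Outer-join the latest DHCP and AD rows on IP; missing sides are None."""
    dhcp_latest = latest_by_ip(dhcp_rows, "lease")
    ad_latest = latest_by_ip(ad_rows, "event_time")
    table = {}
    for ip in sorted(set(dhcp_latest) | set(ad_latest)):
        d = dhcp_latest.get(ip, {})
        a = ad_latest.get(ip, {})
        table[ip] = {
            "dhcp_hostname": d.get("hostname"),
            "dhcp_last": d.get("lease"),
            "ad_username": a.get("username"),
            "ad_last": a.get("event_time"),
        }
    return table

active_state = build_active_state(dhcp, ad)
```

Rows with a missing side keep None values, similar to the NULL entries shown for 10.131.1.4 in the slide's table.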
  • 42. STEP 2: TOPOLOGY DISCOVERY Learning the Local Layers
  • 43. Map the lower layers of the OSI model passively ● Infer key properties – DMZ blocks (we often find new segments this way) – LAN only blocks – VLAN behavior (Student VLAN, ADMIN VLAN, STAFF VLAN) ● Keep in mind we lose visibility into switched traffic flows (layer 2 is hard to see at scale)
  • 44. Basic Features Built in this Step ● Graph Features – Source/Destination behavior ‣ How many hosts talk to this IP? (In Degree) ‣ How many hosts are talked to by this IP? (Out Degree) ● Layer 2/Layer 3 Features – IP Subnet Behavior ‣ # LAN to LAN conversations (non-routable IP flows) ‣ # LAN to WAN conversations (non-routable address to routable address)
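The features listed on this slide can be computed from plain source/destination pairs. The flow records below are hypothetical, and treating every private address as "LAN" is a simplifying assumption (Python's `is_private` also covers loopback and a few other reserved ranges).

```python
import ipaddress
from collections import defaultdict

# Hypothetical flow records: (source IP, destination IP).
flows = [
    ("10.0.0.5", "10.0.0.9"),
    ("10.0.0.6", "10.0.0.9"),
    ("10.0.0.5", "8.8.8.8"),
    ("10.0.0.5", "10.0.0.9"),  # repeated conversation, same pair
]

def is_lan(ip):
    """Treat private (non-routable) addresses as LAN."""
    return ipaddress.ip_address(ip).is_private

def graph_features(flows):
    talkers = defaultdict(set)   # dst -> distinct sources (in degree)
    targets = defaultdict(set)   # src -> distinct destinations (out degree)
    lan_to_lan = lan_to_wan = 0
    for src, dst in flows:
        talkers[dst].add(src)
        targets[src].add(dst)
        if is_lan(src) and is_lan(dst):
            lan_to_lan += 1
        elif is_lan(src):
            lan_to_wan += 1
    in_degree = {ip: len(s) for ip, s in talkers.items()}
    out_degree = {ip: len(s) for ip, s in targets.items()}
    return in_degree, out_degree, lan_to_lan, lan_to_wan

in_degree, out_degree, lan_to_lan, lan_to_wan = graph_features(flows)
```

Degrees count distinct peers, while the LAN counters count every conversation, so a repeated flow raises the conversation count without changing the degree.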
  • 46. STEP 3: BEHAVIOR BASED PROFILING
  • 47. Asset Fingerprints ● Goal is to use machine learning and vanilla/fuzzy correlation to discover some common asset classes (the ML term is sometimes “class labels”): *nix Server, Desktop, Laptop, MS Server, Biomedical Devices, IoT, Energy Meters
  • 48. What are behavioral profiles? What do they apply to? ● Windows 2008/2003 Server Profile – Flow Characteristics: ‣ Byte distribution ratios are asymmetric – Application Layer Characteristics: ‣ SMB, NetBIOS, MS-Update ‣ Number of unique domains per day ● Windows 2008/2003 End Device Profile – Flow Characteristics: – Application Layer Characteristics: ‣ Facebook Chat, social media, Twitter ‣ Non-uniform browsing patterns ● *nix Server/End Device Profile – Flow Characteristics: – Application Layer Characteristics: ‣ Software updates for distros (Ubuntu, RHEL)
  • 50. Comparing a Profile to a Group
  • 52. Layer 7 Info ● It is not always possible to build these kinds of statistics without higher layer application data or PCAPs
  • 54. STEP 4: CLIENT/SERVER RELATION DISCOVERY
  • 55. It's all in the Bytes ● Depending on what type of visibility you have, you can leverage certain levels of granularity – Flows (NetFlow v7): you get the number of packets per flow, which is very important – PCAPs: best case scenario, but hard to log/process at scale for large environments – Higher layers might get a loss of signal
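A rough sketch of how byte counts alone can hint at client/server roles: a host that mostly sends data looks server-like, and one that mostly receives looks client-like. The flow summaries and the 0.8 threshold are arbitrary assumptions for illustration, not values from the talk.

```python
from collections import defaultdict

# Hypothetical flow summaries: (host, bytes_sent, bytes_received).
flows = [
    ("10.0.0.9", 900_000, 20_000),
    ("10.0.0.9", 800_000, 30_000),
    ("10.0.0.5", 15_000, 600_000),
]

def byte_ratios(flows):
    """Return sent/(sent+received) per host; near 1.0 means mostly sending."""
    sent = defaultdict(int)
    received = defaultdict(int)
    for host, out_bytes, in_bytes in flows:
        sent[host] += out_bytes
        received[host] += in_bytes
    ratios = {}
    for host in sent:
        total = sent[host] + received[host]
        if total:  # skip hosts with no bytes at all
            ratios[host] = sent[host] / total
    return ratios

def guess_role(ratio, threshold=0.8):
    """Illustrative rule only: the threshold is an arbitrary assumption."""
    if ratio >= threshold:
        return "server-like"
    if ratio <= 1 - threshold:
        return "client-like"
    return "mixed"

roles = {host: guess_role(r) for host, r in byte_ratios(flows).items()}
```

In practice a single ratio is a weak signal on its own; it is more useful as one feature alongside the degree and application-layer features from the earlier steps.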
  • 57. Histogram of Byte Distribution
  • 59. STEP 5: MONITOR FOR CHANGES IN ASSET RELATIONSHIP GRAPH “At this point all the hard work is done”…
  • 60. Mining For Relationship Anomalies ● Anomalies on rare paths – U->S – S->U !! – U->U (LAN to LAN) – S->S (DMZ to LAN) !! ● Diagram labels: Desktop, Server, Desktop, Laptop, LAN Asset, DMZ Server
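The rare-path idea can be sketched as a frequency count over path types. The connection list and the 5% cutoff below are made up for illustration; "U" stands for a user device and "S" for a server, following the slide's notation.

```python
from collections import Counter

# Hypothetical labeled connections: (source class, destination class).
observed = [("U", "S")] * 97 + [("U", "U")] * 2 + [("S", "U")] * 1

def rare_paths(connections, max_share=0.05):
    """Return path types whose share of all connections is below max_share."""
    counts = Counter(connections)
    total = sum(counts.values())
    return {path: n / total for path, n in counts.items() if n / total < max_share}

rare = rare_paths(observed)
```

With this sample, U->S dominates and is treated as normal, while U->U and S->U fall under the cutoff and would be surfaced for review.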
  • 61. Webshell DMZ to LAN Trust Beyond the Indicator
  • 62. Seeing the Analytic In Action
  • 63. Seeing the Analytic In Action
  • 64. Seeing the Analytic In Action ● Once the identity resolution/learning process is complete, we create new anomalies based on new paths/actions that are rare for a particular population profile ● Lightweight Webshell in the DMZ
  • 65. Conclusion - Rod - Joe ● New approaches in machine learning and data science can help improve lateral movement detection. ● Establishing behavioral patterns based on data-driven approaches can provide tools for detecting and predicting unusual, high-risk and malicious behavior patterns in users and use of assets. ● We have been successful catching webshells and new kinds of in-memory malware using the rare path approach – U->S – S->U !! – U->U (LAN to LAN) – S->S (DMZ to LAN) !!
  • 66. Q&A ● Thank you ● Rod Soto @rodsoto ● Joseph Zadeh @josephzadeh
  • 71. Key to ML: Label Your Analysis ● This is how the algorithms will “learn” from human expertise and help support a common security workflow
    Domain Name                TotalCnt  RiskFactor  AGD  SessionTime  RefEntropy  NullUa  Outcome
    yyfaimjmocdu.com           144       6.05        1    1            0           0       Malicious
    jjeyd2u37an30.com          6192      5.05        0    1            0           0       Malicious
    cdn4s.steelhousemedia.com  107       3           0    0            0           0       Benign
    log.tagcade.com            111       2           0    1            0           0       Benign
    go.vidprocess.com          170       2           0    0            0           0       Benign
    statse.webtrendslive.com   310       2           0    1            0           0       Benign
    cdn4s.steelhousemedia.com  107       1           0    0            0           0       Benign
    log.tagcade.com            111       1           0    1            0           0       Benign
    ● Human expertise is manually encoded into a format computers understand. Sometimes this process is called Labeling or “Truth-ing” the data
  • 72. Lambda Architecture • Architecture is described by three simple equations:
    batch view = function(all data)
    realtime view = function(realtime view, new data)
    query = function(batch view, realtime view)
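The three equations translate almost directly into code. The sketch below uses event counts per source IP as the "view"; the choice of counts and the sample IPs are assumptions made to keep the example small.

```python
from collections import Counter

def batch_view(all_data):
    """batch view = function(all data): recompute counts from scratch."""
    return Counter(all_data)

def realtime_view(view, new_data):
    """realtime view = function(realtime view, new data): incremental update."""
    updated = Counter(view)
    updated.update(new_data)
    return updated

def query(batch, realtime, key):
    """query = function(batch view, realtime view): merge both at read time."""
    return batch[key] + realtime[key]

historical = ["10.0.0.5", "10.0.0.9", "10.0.0.5"]   # hypothetical source IPs
batch = batch_view(historical)
live = realtime_view(Counter(), ["10.0.0.5"])
live = realtime_view(live, ["10.0.0.7"])
```

The batch view is rebuilt from the full history on a schedule, the real-time view only covers data that arrived since the last batch run, and a query combines the two so results are both complete and current.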
  • 74. Lambda Security ● Diagram labels: DHCP, IMS/IPAM, FW, Proxy, VPN, AD, Real Time Identity Resolution, Distributed ETL, Sequential Models and IOCs, Data Ingest, Real Time Layer
    Username = select coalesce(user_name, hostname, IP) from Active_ID_Table where IP = '10.10.100.23'
    IP            DHCP.MAC           DHCP_Lasteventtime   AD_FQDN
    10.100.1.23   58:5c:35:c3:6e:a4  2014-03-11T14:00:00  joe.eng.acme.com
    10.13.11.221  12:3a:74:b2:6a:22  2014-03-12T14:30:00  ad.hr.acme.com
  • 75. Lambda Security ● Diagram labels: DHCP, IMS/IPAM, FW, Proxy, VPN, AD, Real Time Identity Resolution, Distributed ETL, Sequential Models and IOCs, Data Ingest, Large Scale Models and Non-Sequential IOCs, Real Time Layer, Batch Layer
    Username = select coalesce(user_name, hostname, IP) from Active_ID_Table where IP = '10.10.100.23'
    IP            DHCP.MAC           DHCP_Lasteventtime   AD_FQDN
    10.100.1.23   58:5c:35:c3:6e:a4  2014-03-11T14:00:00  joe.eng.acme.com
    10.13.11.221  12:3a:74:b2:6a:22  2014-03-12T14:30:00  ad.hr.acme.com
  • 76. Lambda Security ● Diagram labels: DHCP, IMS/IPAM, FW, Proxy, VPN, AD, Real Time Identity Resolution, Distributed ETL, Sequential Models and IOCs, Data Ingest, Large Scale Models and Non-Sequential IOCs, Real Time Layer, Batch Layer, Hybrid View (Batch + Real Time)
    Username = select coalesce(user_name, hostname, IP) from Active_ID_Table where IP = '10.10.100.23'
    IP            DHCP.MAC           DHCP_Lasteventtime   AD_FQDN
    10.100.1.23   58:5c:35:c3:6e:a4  2014-03-11T14:00:00  joe.eng.acme.com
    10.13.11.221  12:3a:74:b2:6a:22  2014-03-12T14:30:00  ad.hr.acme.com
  • 77. Diagram labels: DHCP, IMS/IPAM, FW, Proxy, VPN, AD, Real Time Identity Resolution, Distributed ETL, Sequential Models and IOCs, Data Ingest, Large Scale Models and Non-Sequential IOCs, Hybrid View (Batch + Real Time)
    Username = select coalesce(user_name, hostname, IP) from Active_ID_Table where IP = '10.10.100.23'
    IP            DHCP.MAC           DHCP_Lasteventtime   AD_FQDN
    10.100.1.23   58:5c:35:c3:6e:a4  2014-03-11T14:00:00  joe.eng.acme.com
    10.13.11.221  12:3a:74:b2:6a:22  2014-03-12T14:30:00  ad.hr.acme.com
  • 78. Diagram labels: DHCP, IMS/IPAM, FW, Proxy, VPN, AD, Real Time Identity Resolution, Distributed ETL, Sequential Models and IOCs, Data Ingest, Large Scale Models and Non-Sequential IOCs, Hybrid View (Batch + Real Time) ● Automated process to accelerate workflows, such as a Splunk query to retrieve PCAP for further analysis, combined with automatic VT/heuristic correlations
    Username = select coalesce(user_name, hostname, IP) from Active_ID_Table where IP = '10.10.100.23'
    IP            DHCP.MAC           DHCP_Lasteventtime   AD_FQDN
    10.100.1.23   58:5c:35:c3:6e:a4  2014-03-11T14:00:00  joe.eng.acme.com
    10.13.11.221  12:3a:74:b2:6a:22  2014-03-12T14:30:00  ad.hr.acme.com
  • 79. ML + Sequencing the Security DNA ● We parallelize across many nodes (JVMs) and use both real time and batch computations
    JVM 1: 1. GET http://forbes.com/gels-contrariness-domain-punchable/ 2. GET http://portcullisesposturen.europartsplus.org/ 3. POST http://dpckd2ftmf7lelsa.jjeyd2u37an30.com/
    JVM 2: 1. GET http://youtube.com/ 2. GET http://avazudsp.net/ 3. GET http://betradar.com/ 4. GET http://displaymarketplace.com/
    JVM 3: 1. GET http://clickable.net/ 2. GET http://vuiviet.vn/ 3. GET http://homedepotemail.com/ 4. GET http://css-tricks.com/
  • 80. Command and Control (C2) traffic has been established between the “Beachhead” and the command and control operator
  • 81. Heartbeat traffic signals the C2 operator that the infected asset is up and ready for instructions
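Heartbeat traffic tends to be far more regular than human browsing, which ties back to the "Timing Analysis (Periodicity)" behavior listed earlier in the deck. One simple way to score regularity, offered here as an assumption rather than the authors' method, is the coefficient of variation of the gaps between requests. The timestamps below are made up.

```python
from statistics import mean, pstdev

def interval_regularity(timestamps):
    """Coefficient of variation of inter-arrival times; near 0 means periodic."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if len(gaps) < 2 or mean(gaps) == 0:
        return None  # not enough data to judge
    return pstdev(gaps) / mean(gaps)

beacon = [0, 60, 120, 180, 240]      # hypothetical: one request every 60 s
browsing = [0, 5, 200, 230, 900]     # hypothetical: irregular human traffic

beacon_score = interval_regularity(beacon)
browsing_score = interval_regularity(browsing)
```

A score near zero for a host-to-domain pair is only a hint, since legitimate software updaters and monitoring agents also poll on fixed schedules.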
  • 82. Obfuscated instructions get returned through an Upstream conversation embedded in PHP, .js, Flash, etc. Commands obfuscated in this way can be thought of as a hidden “Downstream Beacon”
  • 83. Embedded commands can signal the infected asset to enumerate local information on the machine, attach to open network shares, and perform lateral reconnaissance and privilege escalation throughout the compromised network
  • 84. After targeted lateral movement and privilege enumeration, all cases of targeted attacks eventually involve the compromise of the directory services root servers (usually AD Forest Roots) and exfiltration of key personnel information along with any
  • 85. BFS/DFS and other classic graph search algorithms are great examples of algorithms useful in detecting this graph signature ● Edge weights can be encoded with key security features to increase overall model accuracy regardless of the underlying algorithms
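As a minimal sketch of the graph search mentioned here, breadth-first search can find the shortest chain of observed connections from a DMZ host to a directory server. The host names and edges are hypothetical, and the edges are unweighted; the slide's suggestion of security-weighted edges would call for a weighted search instead.

```python
from collections import deque

# Hypothetical access graph: host -> hosts it has been seen connecting to.
graph = {
    "dmz-web": ["lan-desktop"],
    "lan-desktop": ["file-server", "lan-laptop"],
    "lan-laptop": [],
    "file-server": ["ad-root"],
    "ad-root": [],
}

def shortest_path(graph, start, goal):
    """Breadth-first search; returns the shortest hop path or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None

path = shortest_path(graph, "dmz-web", "ad-root")
```

A path that starts in the DMZ and ends at a directory root is the kind of graph signature the slide describes, and its length gives a rough measure of how many hops an attacker would need.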
  • 86. How can we automate discovery and data aggregation of DMZ assets - Joe
  • 87. Proof of Concept / Example – Joe - Rod
  • 88. Explanation of data science tools and techniques used for analysis – Joe
  • 89. How can this be applied to layer 4/7 data or PCAP data - Joe
  • 90. Command and Control (C2) traffic has been established between compromised hosts inside the corporate network and C2 servers
  • 91. Command and Control (C2) traffic has been established between compromised hosts inside the corporate network and C2 servers
  • 92. C2 infrastructure changes the location of the command and control server; a new communication path is established
  • 93. C2 infrastructure changes the location of the command and control server; a new communication path is established
  • 94. http://en.wikipedia.org/wiki/Fast_flux: Fast flux is a DNS technique used by botnets to hide phishing and malware delivery sites behind an ever-changing network of compromised hosts acting as proxies. It can also refer to the combination of peer-to-peer networking, distributed command and control, web-based load balancing and proxy redirection used to make malware networks more resistant to discovery and counter-measures. The Storm Worm is one of the recent malware variants to make use of this technique.
  • 95. At each time step (typically a day or two) the C2 infrastructure changes the location of command and control via this “Fluxing” behavior. A subset of these types of graph patterns is known as “Fast Fluxing”
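The fluxing behavior described here can be approximated by counting how often a domain's resolved address changes between time steps. The observations below are invented for illustration and use reserved documentation address ranges; real detection would also need to account for CDNs and load balancers, which rotate addresses legitimately.

```python
from collections import defaultdict

# Hypothetical DNS observations: (day, domain, resolved IP).
# 203.0.113.0/24 and 198.51.100.0/24 are documentation ranges.
observations = [
    (1, "flux.example", "203.0.113.10"),
    (2, "flux.example", "203.0.113.77"),
    (3, "flux.example", "198.51.100.4"),
    (1, "static.example", "198.51.100.20"),
    (2, "static.example", "198.51.100.20"),
    (3, "static.example", "198.51.100.20"),
]

def ip_changes(observations):
    """Count how many times each domain's resolved IP changes between days."""
    by_domain = defaultdict(list)
    for day, domain, ip in sorted(observations):
        by_domain[domain].append(ip)
    return {
        domain: sum(1 for a, b in zip(ips, ips[1:]) if a != b)
        for domain, ips in by_domain.items()
    }

changes = ip_changes(observations)
```

A domain whose address changes at nearly every time step stands out against one that stays put, which gives a simple per-domain feature for the fluxing pattern.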
  • 96. The constant mobility of command and control infrastructure will continue this IP/Domain fluxing movement until detected
  • 97. The constant mobility of command and control infrastructure will continue this IP/Domain fluxing movement until detected

Editor's Notes

  • #17: The complexity classes P-complete and NC. NC => parallelizable. Some problems don’t parallelize well! P-complete => inherently sequential. This covers any problem where you have to maintain state across nodes: Circuit Value Problem, linear programming. Streaming models are usually harder to maintain than batch models.
  • #18: Rod
  • #31: Great, another Sun Tzu quote at a security talk! ;-) Be the Floyd Mayweather of defense: I want you to know what I am going to be able to do but still be able to defend. Open source tactics. Security is bullshit in terms of closed-form defensive solutions; let's change that paradigm.