Identifying Network Users Using Flow-Based Behavioral Fingerprinting
Barsamian, Berk, Murphy
Presented to FloCon 2013
What Is A User Fingerprint?
• Users settle into unique patterns of behavior according to
their tasks and interests
• If a particular behavior seems to be unique to one user…
… and that behavior is observed…
… can we assume that the original user was observed?
• Affected by population size, organization mission, and the
people themselves
Why Fingerprint?
• Basic Research
• Policy Violations and Advance Security Warning
• Automated Census and Classification
Why Fingerprint?
• Basic Research
– Change Detection
– Population Analysis
• Policy Violations and Advance Warning
– Preliminary heads-up of botnet activity
– Identify misuse of credentials
• Automated Census and Classification
– Passive network inventory
– User count estimation (despite multiple devices)
– Determination of roles
Background
• Passive and active static fingerprints
– Operating system identification
• p0f/NetworkMiner, Nmap
– Signature-based detection of worms and intrusions
• Dynamic fingerprints
– Hardware identification
– Unauthorized device detection [1]
– Browser fingerprinting [2]
• Increasingly important part of security systems [3]
– Reinforcing authentication
– Identifying policy violations
[1] Bratus, et al., “Active Behavioral Fingerprinting of Wireless Devices”, 2008
[2] http://panopticlick.eff.org
[3] François, et al., “Enforcing Security with Behavioral Fingerprinting”, 2011
But…
• Difficult to implement, requiring significant
expertise not available to many IT departments
• Require unusual or unavailable data
– Data collection incurs overhead; easier to justify if
data is useful for multiple purposes
• No unitaskers in my shop!
– Protocol analysis needed
• Computationally expensive
• Impinges on user privacy
• Increasingly defeated by encrypted channels and tunnels
Challenge
Make active, adaptive fingerprinting available to the
widest possible set of network administrators
• Data requirements
– Common data source, common data fields
• Processing requirements
– Can’t require major computing resources to create and
handle
• Ease of implementation
– Not just technology, but policy
– Could search emails and web forms for personally-
identifying statistically improbable phrases, but would
never fly at most institutions
Why NetFlow Fingerprints?
• NetFlow has very attractive properties to an
analyst…
– Privacy
• Unintrusive to end users
• Not affected by encrypted channels
– Speed
• Easily-parsed datagrams with fixed fields
• Bulk of processing taken care of by specialty equipment
– Scalability
• Less affected by volume than protocol analyzers
• … but is it up to the task?
– (Spoiler alert: yes)
Methodology
After multiple revisions, we arrived at the following:
1. Define your parameters
2. Get a list of all the outgoing sessions from that subnet: (CLNIP==classC)
   1. List the sessions for which the client IP is in the CIDR block of interest
   2. From that list, extract the destination addresses
3. For each of those destination addresses, do an 'ip-pair' query: (CLNIP==classC && SRVIP==dest)
   1. Count the unique local addresses for each destination
4. Eliminate all of the external addresses that are contacted by more than one local address
5. The result is a set of external addresses that are each contacted by only ONE client
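The query steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' tool: flow records are assumed to be already reduced to (client IP, server IP) pairs, the function name `unique_destinations` is hypothetical, and any server outside the examined block is treated as "external".

```python
from collections import defaultdict
from ipaddress import ip_address, ip_network


def unique_destinations(sessions, local_cidr):
    """Map each external address contacted by exactly one local client to that client.

    sessions: iterable of (client_ip, server_ip) string pairs (assumed record shape).
    local_cidr: the CIDR block of interest, e.g. "10.0.0.0/24".
    """
    local = ip_network(local_cidr)
    clients_by_dest = defaultdict(set)
    for client, server in sessions:
        # Step 2: keep only sessions whose client is inside the block of interest.
        if ip_address(client) not in local:
            continue
        # Assumption: "outgoing" means the server is outside the examined block.
        if ip_address(server) in local:
            continue
        # Step 3: collect the distinct local clients seen for each destination.
        clients_by_dest[server].add(client)
    # Steps 4 and 5: drop destinations shared by more than one local client.
    return {
        dest: next(iter(clients))
        for dest, clients in clients_by_dest.items()
        if len(clients) == 1
    }
```

A real deployment would issue these as queries against the flow collector rather than iterate over records in memory; the in-memory loop only shows the logic.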
Example Fingerprints
User A (8475 total sessions)
  aaa.93.185.143   38
  bbb.175.78.11    44
  ccc.22.176.46    42
  ddd.28.187.143   37
User B (661 total sessions)
  eee.87.169.51    93
  eee.87.160.30    34
  eee.87.169.50    37
• Individual fingerprints for a user
(when that user has one)
contain a list of IP addresses
that user (and only that user)
contacted within the time
period
• One-time connections not
included here
• Using the Class C block for the
server would compress
fingerprints like User B’s
• In this case, would still be
unique
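As a quick worked check on the two examples, the share of each user's sessions that went to fingerprint servers can be computed directly. The sketch assumes the number next to each address is a session count, which the slide does not state explicitly; the function name is hypothetical.

```python
# Figures copied from the two example fingerprints; the per-address numbers
# are assumed to be session counts.
examples = {
    "User A": (8475, [38, 44, 42, 37]),
    "User B": (661, [93, 34, 37]),
}


def fingerprint_share(total_sessions, per_server_counts):
    """Percentage of a user's sessions that went to fingerprint servers."""
    return round(100 * sum(per_server_counts) / total_sessions, 1)


shares = {
    user: fingerprint_share(total, counts)
    for user, (total, counts) in examples.items()
}
```

Under that assumption, fingerprint servers account for roughly 1.9% of User A's sessions and 24.8% of User B's, so a usable fingerprint can be a small fraction of a user's total traffic.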
Parameters
• Definition of local network
– Select the smallest network of interest
– May be worth fingerprinting wired and wireless networks
separately, to account for users with both desktops and
wireless devices
• Time frame
– Shorter-term profiles faster to create
– Longer-term profiles less transitory
• Destination subnet
– When filtering on each destination, using a slightly wider
subnet can reduce the computing impact of content
distribution networks
• Top N vs. All
– Cutting off the list of servers with very few sessions
improves scalability
– May shorten the resulting fingerprint list
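The destination-subnet and Top N parameters can be illustrated with the standard `ipaddress` and `collections` modules. This is a sketch only: the helper names `widen` and `top_n_servers` and the /24 default are assumptions for the example, and the default prefix length only makes sense for IPv4 addresses.

```python
from collections import Counter
from ipaddress import ip_network


def widen(server_ip, prefix_len=24):
    """Collapse a server address into its enclosing subnet (a /24 by default)."""
    return str(ip_network(f"{server_ip}/{prefix_len}", strict=False))


def top_n_servers(server_ips, n):
    """Return the n most frequently contacted (widened) destinations with counts."""
    return Counter(widen(ip) for ip in server_ips).most_common(n)
```

Widening before counting means that neighbouring addresses of the same service are tallied as one destination, which is the effect the "slightly wider subnet" bullet describes.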
Data Source Characterization
• Knowing your source helps determine optimal
parameters
• Educational environment with a mix of wireless and
wired infrastructure
• Inherent “life spans” to fingerprints
– Large turnover each year
– “Mission” changes every term
– Gaps in data (scheduled breaks) confound ability to
detect gradual change
Select Outbound Requests
• Get a list of top servers by
destination
• How do you define “outbound”
and why?
– Anything outside examined
subnet? Outside organization?
– Presumption that use of
internal resources not
identifying?
• Mostly true, but what about
private servers?
Select Pairs
• For each server in Top N list,
get the list of clients that
contacted it
• Filter to reduce computation?
– Select only ports of interest
(HTTP)
• Avoiding BitTorrent makes for
stronger profiles
– Filter out known-common
networks (Akamai, Google)
– Include only servers with
more than some minimum
number of sessions
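The three filters listed above could be combined as in the following sketch. The `Flow` record, the `filter_flows` name, and the parameter defaults are assumptions for illustration; the list of "known-common" networks must be supplied by the operator, since no actual Akamai or Google ranges are given here.

```python
from collections import Counter
from ipaddress import ip_address, ip_network
from typing import NamedTuple


class Flow(NamedTuple):
    """Hypothetical minimal flow record: one session from client to server."""
    client: str
    server: str
    port: int


def filter_flows(flows, ports=frozenset({80}), common_networks=(), min_sessions=1):
    """Keep flows on ports of interest, outside known-common networks,
    to servers with more than min_sessions sessions."""
    nets = [ip_network(n) for n in common_networks]
    kept = [
        f for f in flows
        if f.port in ports
        and not any(ip_address(f.server) in net for net in nets)
    ]
    # Count sessions per server only after the port and network filters.
    per_server = Counter(f.server for f in kept)
    return [f for f in kept if per_server[f.server] > min_sessions]
```

Filtering before the per-server client lookup reduces the number of pair queries, at the cost of possibly discarding low-volume servers that would have been identifying.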
Compile Fingerprints
• At this stage we have a list of those servers that have
only been contacted by one client
– Potentially pre-filtered for significance (e.g. minimum
number of sessions, removed trivial connects such as
BitTorrent, etc)
• Create for each client a list of servers
– Optionally: ranked by percent of client’s total traffic
(requires second query for each client, increasing total
fingerprint time, but providing context and significance
measure)
• Each list is a basic but functional fingerprint of that
client
– Sessions to one of those servers in future traffic
indicates likely link to that fingerprinted user
• Primary: that user generated that traffic (on the original device
or not)
• Secondary: that user is connected directly to the user who
generated that traffic
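The compilation and lookup described above might look like the following sketch. It assumes the earlier steps produced a mapping from each uniquely contacted server to its one client, and that sessions are (client, server) pairs; both function names are hypothetical.

```python
from collections import Counter, defaultdict


def compile_fingerprints(unique_owner, sessions):
    """Build, per client, a list of (server, percent of client's sessions),
    ranked from largest share to smallest.

    unique_owner: {server: client} for servers contacted by only one client.
    sessions: iterable of (client, server) pairs for the same time frame.
    """
    sessions = list(sessions)
    total_per_client = Counter(client for client, _ in sessions)
    per_pair = Counter(sessions)
    fingerprints = defaultdict(list)
    for server, client in unique_owner.items():
        if total_per_client[client] == 0:
            continue  # no sessions recorded for this client; nothing to rank
        share = 100.0 * per_pair[(client, server)] / total_per_client[client]
        fingerprints[client].append((server, share))
    for entries in fingerprints.values():
        entries.sort(key=lambda entry: entry[1], reverse=True)
    return dict(fingerprints)


def match(fingerprints, server):
    """Return the fingerprinted client linked to this server, or None."""
    for client, entries in fingerprints.items():
        if any(s == server for s, _ in entries):
            return client
    return None
```

A hit from `match` only indicates a likely link to the fingerprinted user, in either the primary or the secondary sense listed above; it does not say which.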
Initial Results
• Of ~250 users, profiles could be created representing
– 38% of users
– 53% of total traffic
• Breakdown by profile length (# servers in profile):
– 1 server: 51 users (55.4% of profiles)
– 2 servers: 20 users (21.7%)
– 3 servers: 7 users (7.6%)
– 4 servers: 9 users (9.8%)
– 5 servers: 2 users (2.2%)
– 6 servers: 1 user (1.1%)
– 7 servers: 1 user (1.1%)
– 8 servers: 1 user (1.1%)
[Chart: Unique Profiles; legend NP, 1 through 7]
(i.e. 51 users each contacted 1 host unique to them, and one user contacted 8 hosts
that nobody else did)
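The percentages in the breakdown are relative to the number of users who have a profile, not to the whole population. A short check of the arithmetic, using only the counts given above:

```python
# Users per profile length (number of servers in the profile), from the slide.
users_by_profile_length = {1: 51, 2: 20, 3: 7, 4: 9, 5: 2, 6: 1, 7: 1, 8: 1}

# Total number of users for whom a profile could be created.
profiled_users = sum(users_by_profile_length.values())

# Each bucket as a percentage of profiled users, rounded to one decimal place.
percent_of_profiles = {
    length: round(100 * count / profiled_users, 1)
    for length, count in users_by_profile_length.items()
}
```

The counts sum to 92 profiled users, which reproduces the listed percentages (51 of 92 is 55.4%, and so on). Against a population of roughly 250, 92 users is about 37%, in line with the quoted 38% given that the population figure is approximate.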
Uniqueness Levels
• By relaxing uniqueness
requirement, more users can be
fingerprinted
– Tradeoff: Certainty vs. breadth
• Nomenclature
– The more clients that share a
host, the higher the U number
[Figure: uniqueness levels U1 to U4]
• What is lost in the ability to pinpoint users is gained in
insight into shared tasks and interests
• Some profiles non-unique
• Same user at different IP addresses?
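One way to express uniqueness levels in code is sketched below. The slides do not say whether level k means "exactly k clients" or "at most k clients"; because coverage grows with the level, this sketch assumes the cumulative reading (a host qualifies at level k if at most k local clients contacted it). The function names are hypothetical.

```python
from collections import defaultdict


def profiles_at_level(clients_by_host, level):
    """Build client -> set of hosts, admitting hosts shared by at most `level` clients.

    clients_by_host: {host: set of local clients that contacted it}.
    """
    profiles = defaultdict(set)
    for host, clients in clients_by_host.items():
        if len(clients) <= level:
            for client in clients:
                profiles[client].add(host)
    return {client: frozenset(hosts) for client, hosts in profiles.items()}


def non_unique_users(profiles):
    """Return clients whose profile is identical to at least one other client's."""
    clients_by_profile = defaultdict(set)
    for client, hosts in profiles.items():
        clients_by_profile[hosts].add(client)
    return {
        client
        for group in clients_by_profile.values()
        if len(group) > 1
        for client in group
    }
```

Two addresses with identical profiles are exactly the "non-unique" case, and may be worth checking as one user seen at different IP addresses.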
U1-U4 Profile Lists
Level  % of users  % of traffic  Non-unique users
U1     38%         53%           -
U2     60%         78%           12
U3     75%         89%           10
U4     83%         93%           10
[Charts: profile-length distribution (NP, 1-5) for each of U1 to U4, and membership by level (None, U1 to U4)]
Variance Over Time
• Variability from month to month is observed
• Month 1
Uniqueness  % of users  % of traffic
U1          38%         53%
U2          60%         78%
U3          75%         89%
U4          83%         93%
• Month 2
Uniqueness  % of users  % of traffic
U1          46%         80%
U2          60%         92%
U3          69%         96%
U4          75%         98%
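The month-to-month variability is easier to see as a difference in percentage points. The sketch below uses only the figures from the two tables and assumes the first table corresponds to Month 1.

```python
# (% of users, % of traffic) per uniqueness level, copied from the two tables.
month1 = {"U1": (38, 53), "U2": (60, 78), "U3": (75, 89), "U4": (83, 93)}
month2 = {"U1": (46, 80), "U2": (60, 92), "U3": (69, 96), "U4": (75, 98)}

# Change from Month 1 to Month 2, in percentage points.
change = {
    level: (
        month2[level][0] - month1[level][0],
        month2[level][1] - month1[level][1],
    )
    for level in month1
}
```

Traffic coverage rose at every level (by 27, 14, 7, and 5 points for U1 through U4), while user coverage rose by 8 points at U1, was unchanged at U2, and fell by 6 and 8 points at U3 and U4.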
Results and Lessons Learned
• This represents a first step toward making simple,
flexible fingerprinting widely available
– NetFlow is an ideal data source
• Able to fingerprint users comprising majority of
network traffic in relatively unrestricted
environment
• Uniqueness Levels
– U1 profiles are more significant
– U4 profiles cover far more of the population
– Keeping track of them in parallel allows us the best
of both worlds
Take-Home
• NetFlow, with its benefits to privacy, ease, and
scalability, can be used to produce simple user
fingerprints
– Several types are possible; we went with the
simplest plausible type
• Unique site accesses represent one such
fingerprint type
– Intuitive and easy to grasp
– Adjustable to the level of desired uniqueness
• More sophisticated fingerprints are expected to be
more useful still
Next Steps, Short-Term
• Room to grow within NetFlow collection regime:
– Refine by port/protocol
– Aggregate content distribution networks
• Make better use of ground truth
– Newer version of software allows searching on
MAC address, to quickly check when fingerprint
appears to change or duplicate
– Determine whether there are substantive
differences between wireless and wired networks
• Number of individuals with identifiable fingerprints
• Fingerprint stability
Next Steps, Long-Term
• Learning Period Estimation
– What constitutes a baseline?
• Long-Term Stability
– How much do these fingerprints change over time?
– What can be learned from those changes?
– How are fingerprint lives distributed?
• Autonomous Operation
– Can fingerprint creation and tuning be automated?
… to the point of using them for auto-remediation?
For Additional Information…
• For a copy of these slides and the whitepaper, or
to evaluate the fingerprinting tool, visit us at:
– http://www.flowtraq.com/research/FloCon2012.html
• We would be happy to address any questions or
comments
– abarsam@flowtraq.com
– vberk@flowtraq.com
– jmurphy@flowtraq.com