Copyright 2011 Trend Micro Inc.Classification 8/2/2013 1
Overview of Data Loss Prevention (DLP) Technology
Liwei Ren, Ph.D
Data Security Research, Trend Micro™
Sept, 2012, Tsinghua University, Beijing, China
Background
• Liwei Ren, Data Security Research, Trend Micro™
– Education
• MS/BS in mathematics, Tsinghua University, Beijing
• Ph.D in mathematics, MS in information science, University of Pittsburgh
– Research interests
• DLP, differential compression, data de-duplication, file transfer protocols, database security, and algorithms
– Major works
• N academic papers, M patents, and K startup company, where N ≥ 10, M ≥ 12, and K = 1
– TEEC member since 2005.
– liwei_ren@trendmicro.com
• Trend Micro™
– Global security software company with headquarters in Tokyo and R&D centers in Nanjing, Taipei, and Silicon Valley
– One of the top 3 anti-malware vendors (competing with Symantec and McAfee)
– Pioneer in cloud security with the product lines Deep Security™ and SecureCloud™
– Major DLP vendor after the Provilla™ acquisition
Agenda
• What is Data Loss Prevention (数据泄露防护)?
• DLP Models
• DLP Systems and Architecture
• Data Classification and Identification
• Technical Challenges
• Summary
What Is Data Loss Prevention?
• What is Data Loss Prevention?
– Data loss prevention (DLP) is a data security technology that detects potential data breach incidents in a timely manner and prevents them by monitoring data in use (endpoints), in motion (network traffic), and at rest (data storage) in an organization's network.
• What drives DLP development?
– Regulatory compliance, such as PCI, SOX, HIPAA, GLBA, SB1382, etc.
– Confidential information protection
– Intellectual property protection
• What data loss incidents does a DLP system handle?
– Careless data leaks by an internal worker
– Intentional data theft by an unskilled worker
– Determined data theft by a highly technical worker
– Determined data theft by external hackers, advanced malware, or APTs
• The evolution of naming
– Information Leak Prevention (ILP)
– Information Leak Detection and Prevention (ILDP)
– DLP
• Data Leak Prevention
• Data Loss Prevention
DLP Models
• A model describes a technology in rigorous terms
• We need models to define and scope what a DLP system should do
• Three States of Data
– Data in Use (endpoints)
– Data in Motion (network)
– Data at Rest (storage)
• The data in use at endpoints can be leaked via
– USB
– Email
– Webmail
– HTTP/HTTPS
– IM
– FTP
– …
• The data in motion can be leaked via
– SMTP
– FTP
– HTTP/HTTPS
– …
• Data at rest could
– Reside in the wrong place
– Be accessed by the wrong person
– Be owned by the wrong person
• A conceptual view for data-in-use and data-in-motion:
• Technical views for data-in-use and data-in-motion:
• DLP Model for data-in-use and data-in-motion:
– DATA flows from SOURCE to DESTINATION via CHANNEL: do ACTIONs
• DATA specifies what the confidential data is
• SOURCE can be a user, an endpoint, an email address, or a group of them
• DESTINATION can be an endpoint, an email address, a group of them, or simply the external world
• CHANNEL indicates the data leak channel, such as USB, email, or network protocols
• ACTION is the action that the DLP system must take when an incident occurs
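The rule format above can be sketched as a small data structure. This is only an illustration: the class names `FlowEvent` and `FlowRule`, the `"*"` wildcard, and the keyword-based DATA predicate are assumptions made for the example, not part of any product.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class FlowEvent:
    """One observed data movement (hypothetical event shape)."""
    content: str
    source: str
    destination: str
    channel: str


@dataclass(frozen=True)
class FlowRule:
    """DATA flows from SOURCE to DESTINATION via CHANNEL: do ACTIONs."""
    data: Callable[[str], bool]  # DATA: predicate that recognizes confidential content
    sources: frozenset           # SOURCE: users, endpoints, addresses; "*" means any
    destinations: frozenset      # DESTINATION: same idea; "*" means any
    channels: frozenset          # CHANNEL: e.g. "usb", "email"
    actions: tuple               # ACTIONs to take when the rule fires

    def evaluate(self, event: FlowEvent) -> tuple:
        """Return the rule's actions if the event matches, else an empty tuple."""
        def covers(group: frozenset, value: str) -> bool:
            return "*" in group or value in group

        if (covers(self.sources, event.source)
                and covers(self.destinations, event.destination)
                and covers(self.channels, event.channel)
                and self.data(event.content)):
            return self.actions
        return ()


# Illustrative rule: block and log confidential content leaving via USB or email.
rule = FlowRule(
    data=lambda text: "CONFIDENTIAL" in text,
    sources=frozenset({"*"}),
    destinations=frozenset({"external"}),
    channels=frozenset({"usb", "email"}),
    actions=("block", "log"),
)
leak = FlowEvent("CONFIDENTIAL roadmap", "alice", "external", "usb")
harmless = FlowEvent("lunch menu", "alice", "external", "usb")
```

Here `rule.evaluate(leak)` yields the configured actions, while `rule.evaluate(harmless)` yields an empty tuple because the DATA predicate does not match.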
• DLP Model for data-at-rest
– DATA resides at SOURCE: do ACTIONs
• DATA specifies what the sensitive data (which has potential for leakage) is
• SOURCE can be an endpoint, a storage server, or a group of them
• ACTION is the action that the DLP system must take when confidential data is identified at rest
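A data-at-rest rule can be sketched in the same spirit. The names `RestRule` and `discover`, the nested-dictionary stand-in for storage, and the "SSN" substring predicate are all hypothetical choices for this illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class RestRule:
    """DATA resides at SOURCE: do ACTIONs."""
    data: Callable[[str], bool]  # DATA: predicate that recognizes sensitive content
    sources: frozenset           # SOURCE: endpoints or storage servers to scan
    actions: tuple               # ACTIONs to take for each finding


def discover(rule: RestRule,
             stored: Dict[str, Dict[str, str]]) -> List[Tuple[str, str, tuple]]:
    """Return (source, path, actions) for every stored item the rule matches."""
    findings = []
    for source, files in stored.items():
        if source not in rule.sources:
            continue  # this rule does not cover that location
        for path, content in files.items():
            if rule.data(content):
                findings.append((source, path, rule.actions))
    return findings


# Toy storage layout: source name -> {path -> content}.
stored = {
    "file-server": {"/share/a.txt": "SSN 178-76-6754", "/share/b.txt": "lunch menu"},
    "laptop-17": {"/tmp/c.txt": "SSN 159-87-8965"},
}
rest_rule = RestRule(
    data=lambda text: "SSN" in text,
    sources=frozenset({"file-server"}),
    actions=("quarantine", "notify"),
)
findings = discover(rest_rule, stored)
```

Only the file on `file-server` is reported, because `laptop-17` is outside the rule's SOURCE set.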
• These two DLP models are fundamental
• They basically define the formats of DLP security rules (or DLP
security policies)
DLP Systems and Architecture
• Typical DLP systems
– DLP Management Console
– DLP Endpoint Agent
– DLP Network Gateway
– Data Discovery Agent (or Appliance)
• Typical DLP system architecture
Data Classification and Identification
• A DLP system is expected to answer the following questions:
– What is sensitive information?
– How to define sensitive information?
– How to categorize sensitive information?
– How to check if a given document contains sensitive information?
– How to measure data sensitivity?
• Data inspection is an important capability of a content-aware DLP solution. It consists of two parts:
– Defining sensitive data, i.e., data classification
– Identifying sensitive data in real time
• Sensitive data is contained in textual documents.
• What does a document mean to you?
• We need text models to describe a text:
• I prefer the UTF-8 text model
– It handles all languages, especially the CJK group
– A textual document is normalized into a sequence of UTF-8 characters
• Four fundamental approaches to sensitive data definition and identification:
– Document fingerprinting
– Database record fingerprinting
– Multiple keyword matching
– Regular expression matching
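As a rough illustration of the UTF-8 text model, the sketch below turns raw bytes into a normalized character sequence that all four techniques could then inspect. The specific steps (NFKC normalization, case folding, whitespace collapsing) and the function name `to_text_model` are assumptions for the example; the slides do not say which normalization is used.

```python
import unicodedata


def to_text_model(raw: bytes) -> str:
    """Decode UTF-8 bytes and normalize them into one canonical character sequence."""
    text = raw.decode("utf-8", errors="replace")  # undecodable bytes become U+FFFD
    text = unicodedata.normalize("NFKC", text)    # fold full-width and other compatibility forms
    return " ".join(text.casefold().split())      # case-insensitive, single spaces
```

For example, full-width Latin letters, which are common in CJK documents, fold to their ASCII equivalents, so the same keyword or fingerprint logic applies to both forms.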
• What is document fingerprinting about?
– It is a solution to a problem of information retrieval:
• Identify modified versions of known documents
• Near duplicate document detection (NDDD)
– A technique of variant detection for documents
• Extract invariants from variants of digital objects
• Variant detection is a principle with 1-to-many capability
• Problem Definition (a model):
– Let S = {T1, T2, …, Tn} be a set of known texts
– Given a query text T, determine whether there exists at least one document t ∈ S such that T and t share a significant amount of textual content
• Matching documents are ranked by how much content they share with T
• Alternative model:
– Let S = {T1, T2, …, Tn} be a set of known texts
– Given a query text T and a threshold X%, determine whether there exists at least one document t ∈ S such that |T ∩ t| / min(|T|, |t|) ≥ X%
• Matching documents are ranked by their percentages
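The ratio |T ∩ t| / min(|T|, |t|) can be made concrete by treating each text as a set of character k-grams. That set representation, the default k = 5, and the function names are assumptions for this sketch; the slides do not specify how the intersection is measured.

```python
def shingles(text: str, k: int = 5) -> set:
    """Return the set of character k-grams of the lowercased, whitespace-collapsed text."""
    normalized = " ".join(text.lower().split())
    if len(normalized) < k:
        return {normalized} if normalized else set()
    return {normalized[i:i + k] for i in range(len(normalized) - k + 1)}


def containment(query: str, known: str, k: int = 5) -> float:
    """Compute |T ∩ t| / min(|T|, |t|) over k-gram sets."""
    a, b = shingles(query, k), shingles(known, k)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def matches(query: str, corpus: list, threshold: float, k: int = 5) -> list:
    """Return (score, text) pairs at or above the threshold, best match first."""
    scored = [(containment(query, text, k), text) for text in corpus]
    hits = [pair for pair in scored if pair[0] >= threshold]
    return sorted(hits, key=lambda pair: pair[0], reverse=True)
```

Dividing by the smaller set means a short known document embedded whole in a long email still scores 100%, which is the behavior a leak detector wants. Comparing the query against every known text is linear in the corpus size, which is one motivation for the fingerprint index discussed next.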
• Solutions
– Liwei Ren et al., US patent 7516130, Matching engine with signature generation
– Liwei Ren et al., US patent 7747642, Matching engine for querying relevant documents
– Liwei Ren et al., US patent 7860853, Document matching engine using asymmetric signature generation
• Solution highlights:
– A document fingerprint is a textual feature extracted from a given text, which is a sequence of UTF-8 characters
– A single document has multiple fingerprints
– Uniqueness: any two unrelated documents should have no fingerprints in common
– Robustness: if two documents share a significant amount of text, they should have fingerprints in common. In other words, when a document undergoes moderate changes, its fingerprints should have a good probability of surviving
– The key is to identify anchor points within the text that can survive text changes. A fingerprint can then be generated from an anchor point's textual neighborhood
– The major part of the solution is a fingerprint generation algorithm
– Finally, we arrive at a fingerprint-based search engine
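To make the anchor-point idea tangible, here is a minimal sketch that uses a generic, well-known selection scheme rather than the patented algorithms, whose details are not given in the slides. A position counts as an anchor when the hash of its k-gram is divisible by p, and that hash serves as the fingerprint. The parameters k and p and all function names are assumptions.

```python
import hashlib


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial edits do not change k-grams."""
    return " ".join(text.lower().split())


def kgram_hash(gram: str) -> int:
    """Stable 64-bit hash of a k-gram (SHA-1 truncated; any stable hash would do)."""
    digest = hashlib.sha1(gram.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def fingerprints(text: str, k: int = 8, p: int = 4) -> set:
    """Select hashes of k-grams whose hash is divisible by p.

    Selection depends only on the k-gram's own content, so an anchor
    survives edits made elsewhere in the document.
    """
    normalized = normalize(text)
    selected = set()
    for i in range(len(normalized) - k + 1):
        h = kgram_hash(normalized[i:i + k])
        if h % p == 0:
            selected.add(h)
    return selected


def shared_fingerprints(a: str, b: str, k: int = 8, p: int = 4) -> set:
    """Fingerprints common to both texts; a non-empty result suggests shared content."""
    return fingerprints(a, k, p) & fingerprints(b, k, p)
```

Because selection is content-defined, every fingerprint of a passage is also a fingerprint of any larger document that contains that passage verbatim. A larger p keeps fewer fingerprints, trading robustness for the small fingerprint size an endpoint agent needs.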
• How to evaluate a fingerprint generation algorithm?
– Accuracy in terms of false positives and false negatives
– Performance
– Small fingerprint size, which is required for an endpoint DLP solution
– Language independence
• What is database record fingerprinting about?
– Also known as Exact Match in the DLP field
– It is a technique for detecting whether sensitive data records exist within a text
• Use case:
– Several personal data records of the form <SSN, Phone#, address> are included in a text. We want to extract all records from the file to determine its sensitivity.
• Example: Two data records <178-76-6754, 412-876-6789, 43 Atword Street, Pittsburgh, PA 15260> and <159-87-8965, (408)780-8876, 76 Parkview Ave, Sunnyvale, CA 94086> are embedded in a text in an unstructured manner:
– Hhghghg 178-76-6754 ggkjkkkkk879-45-6785kjkjjk 43 Atword Street, Pittsburgh, PA 15260 kllkll 412-876-6789 kjkjjkj 76 Parkview Ave, Sunnyvale, CA 94086 hhjhjhj (408)780-8876 hjhjkjkjjj 159-87-8965hjhjhjhj
• Problem definition:
– Let S = {R1, R2, …, Rn} be a set of known data records from the same table
– Given any text T, extract all records or sub-records from T, where the record cells may appear anywhere in the text and in any order
• A solution:
– Liwei Ren et al., US patent 7950062, Fingerprinting based entity extraction
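The problem statement can be illustrated with a naive sketch that is not the patented method. Each record cell is stored only as a hash, and every text window whose length equals some cell length is hashed and looked up. The function names, the SHA-256 choice, and the `min_cells` threshold are assumptions; the records and text are the example from the slides.

```python
import hashlib
import re
from collections import defaultdict


def canon(s: str) -> str:
    """Lowercase, trim, and collapse whitespace."""
    return re.sub(r"\s+", " ", s.strip().lower())


def digest(s: str) -> str:
    return hashlib.sha256(s.encode("utf-8")).hexdigest()


def build_index(records):
    """Map each cell hash to {(record_id, column)} and collect the cell lengths."""
    index = defaultdict(set)
    lengths = set()
    for rid, record in enumerate(records):
        for col, cell in enumerate(record):
            c = canon(cell)
            index[digest(c)].add((rid, col))
            lengths.add(len(c))
    return index, lengths


def extract_records(text, index, lengths, min_cells=2):
    """Return {record_id: [columns found]} for records with at least min_cells cells in text."""
    t = canon(text)
    found = defaultdict(set)
    for n in lengths:                       # one sliding-window pass per cell length
        for i in range(len(t) - n + 1):
            for rid, col in index.get(digest(t[i:i + n]), ()):
                found[rid].add(col)
    return {rid: sorted(cols) for rid, cols in found.items() if len(cols) >= min_cells}


records = [
    ("178-76-6754", "412-876-6789", "43 Atword Street, Pittsburgh, PA 15260"),
    ("159-87-8965", "(408)780-8876", "76 Parkview Ave, Sunnyvale, CA 94086"),
]
text = ("Hhghghg 178-76-6754 ggkjkkkkk879-45-6785kjkjjk 43 Atword Street, Pittsburgh, "
        "PA 15260 kllkll 412-876-6789 kjkjjkj 76 Parkview Ave, Sunnyvale, CA 94086 "
        "hhjhjhj (408)780-8876 hjhjkjkjjj 159-87-8965hjhjhjhj")
index, lengths = build_index(records)
result = extract_records(text, index, lengths)
```

Both records are recovered with all three cells, while the unrelated number 879-45-6785 is ignored because its hash is not in the index. Storing hashes rather than plaintext cells means the detection agent never holds the sensitive table itself.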
• Multiple keyword match and RegEx match
– They are well-known & well-defined problems
– Very useful in DLP data inspection
• Problem Definition for Keyword Match:
– Let S= {K1,K2,…,Kn} be a dictionary of keywords.
– Given any text T, one needs to identify all keyword occurrences from T.
• Problem Definition for RegEx Match:
– Let S= {P1,P2,…,Pm} be a set of RegEx patterns.
– Given any text T, one needs to identify all pattern instances from T.
• Easy problems?
– Not at all. For large n and m, performance becomes an issue.
– That’s the problem of scalability.
– Scalable algorithms must be provided.
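One standard scalable answer for the keyword case is the Aho-Corasick automaton, which finds all occurrences of all n keywords in a single pass over the text instead of n separate scans. The slides do not name an algorithm, so this choice and the function names below are assumptions. The sketch covers keyword match only, not RegEx match.

```python
from collections import deque


def build_automaton(keywords):
    """Build Aho-Corasick goto, failure, and output tables for non-empty keywords."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # fail[state] -> state for the longest proper suffix that is a trie prefix
    out = [[]]    # out[state] -> keywords that end at this state
    for word in keywords:
        if not word:
            continue
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(word)

    queue = deque(goto[0].values())  # depth-1 states keep fail = 0
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]  # inherit matches ending at the suffix
    return goto, fail, out


def find_all(text, automaton):
    """Return (start_index, keyword) for every keyword occurrence in text."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            hits.append((i - len(word) + 1, word))
    return hits
```

For example, `find_all("ushers", build_automaton(["he", "she", "his", "hers"]))` reports "she" at index 1 and both "he" and "hers" at index 2. The scan cost grows with the text length plus the number of matches, not with the dictionary size.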
• Data inspection template and framework
• The four data inspection techniques need to work together
– To meet various DLP use cases
– Especially regulatory compliance
• For example, PCI needs the following Boolean logic, supported by both keyword match and RegEx match:
– SSN-Entity(2) OR [CCN(1) AND NAME(1)] OR [CCN(1) AND Partial-Date(1) AND Expiration-Keyword]
– That is the PCI data template
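The template reads as a Boolean expression over entity counts, where Entity(k) means "at least k occurrences". The sketch below evaluates that expression. The regular expressions and keyword lists are simplified stand-ins invented for the example (there is no Luhn check, and NAME is approximated by two keywords), so only the Boolean structure comes from the slide.

```python
import re

# Hypothetical, simplified entity detectors.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CCN": re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b"),
    "PARTIAL_DATE": re.compile(r"\b(?:0[1-9]|1[0-2])/\d{2}\b"),
}
KEYWORDS = {
    "NAME": ("cardholder", "name on card"),
    "EXPIRATION": ("expires", "expiration", "exp."),
}


def count_entities(text: str) -> dict:
    """Count RegEx entities and keyword entities in the text."""
    lowered = text.lower()
    counts = {name: len(pattern.findall(text)) for name, pattern in PATTERNS.items()}
    for name, words in KEYWORDS.items():
        counts[name] = sum(lowered.count(word) for word in words)
    return counts


def matches_pci_template(text: str) -> bool:
    """SSN(2) OR [CCN(1) AND NAME(1)] OR [CCN(1) AND Partial-Date(1) AND Expiration-Keyword]."""
    c = count_entities(text)
    return (c["SSN"] >= 2
            or (c["CCN"] >= 1 and c["NAME"] >= 1)
            or (c["CCN"] >= 1 and c["PARTIAL_DATE"] >= 1 and c["EXPIRATION"] >= 1))
```

A lone card number does not satisfy the template, but the same number next to an expiration keyword and a partial date does. Requiring corroborating entities in this way helps keep false positives down.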
• Data template framework:
• The DLP rule engine works on top of both the DLP models and the data template framework:
Technical Challenges
• Some areas with challenges
– Concept Match
– Data Discovery
– Document Classification Automation
– Determined Data Theft Detection
Summary
• What DLP is about
• DLP models
• DLP systems
• Text Models
• Data template framework with
– 4 data inspection techniques on top of a text model
Q&A
• Thanks for your time
• Any questions?

