Submitted By,
JOSNA KRISHNA
S7 CSE
ROLL No.:35
•INTRODUCTION
•SENSITIVE DATA IN COMPANIES
•DATA LEAKAGE: HOW?
•DANGER…
•TOWARDS SECURITY
•EXISTING SYSTEM
•PROPOSED SYSTEM
•INTO THE ALGORITHMS
•CONCLUSION
DATA LEAKAGE:
Data leakage is the unauthorized
transmission of sensitive data or
information from within an organization
to an external destination.
•Intellectual property
•Financial information
•Patient information
•Personal credit card data
•Other information, depending on the
business and the industry.
•In the course of business, data must be
handed over to trusted third parties for
some operations.
•Sometimes these trusted third
parties may act as points of
data leakage.
•Data leakage mainly
happens due to
human error.
•A hospital may give patient records to
researchers who will devise new treatments.
•A company may have partnerships with other
companies that require sharing customer
data.
•An enterprise may outsource
its data processing, so data
must be given to various other
companies.
Fast detection of transformed data leaks[mithun_p_c]
•The number of leaked sensitive data records has
grown tenfold in recent years.
•The risk of accidental data leakage exceeds the risk
posed by vulnerable software.
•Sensitive data leakage is more common where
end-to-end encryption (for example, PGP:
Pretty Good Privacy) is not used.
•Prevent direct access to cleartext sensitive data.
•Deploy a screening tool:
-To scan computer file systems.
-To scan server storage.
-To inspect outbound network traffic.
•Data leak detection differs from antivirus (AV) and network
intrusion detection systems (NIDS) in two respects:
->New security requirements, and
->Algorithmic challenges.
Algorithmic Challenges:
-Data transformation
-Scalability
•Automata-based string matching cannot be
used directly.
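To make the data-transformation challenge concrete, here is a minimal Python sketch. Python's `in` operator stands in for an automaton-based matcher (such as Aho-Corasick), since both report only exact occurrences of a pattern; the card number, the messages and the `exact_match` helper are hypothetical.

```python
def exact_match(pattern: str, text: str) -> bool:
    """Exact substring search, standing in for an automaton-based matcher."""
    return pattern in text


# Hypothetical sensitive record and two hypothetical outbound messages.
sensitive = "4111-1111-1111-1111"
verbatim = "order note: card 4111-1111-1111-1111 approved"
# One simple transformation: the separators are replaced with spaces.
transformed = "order note: card " + sensitive.replace("-", " ") + " approved"

print(exact_match(sensitive, verbatim))     # True
print(exact_match(sensitive, transformed))  # False: the leak is missed
```

A single, trivial reformatting is enough to defeat exact matching, which is why the detector must tolerate transformed data.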
The existing system is based on set intersection,
an operation performed on two sets
of n-grams:
one from the content and one from the sensitive data.
This method is used to detect similar
documents on:
•The web.
•Shared malicious traffic patterns.
•Malware.
•E-mail spam.
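A minimal Python sketch of n-gram set intersection, under stated assumptions: the text is tokenized into words, and the score is the fraction of sensitive n-grams that also appear in the content. The slides do not specify tokenization, normalization or an alert threshold, so these choices and the helper names are illustrative.

```python
from typing import Sequence, Set, Tuple


def ngrams(tokens: Sequence[str], n: int) -> Set[Tuple[str, ...]]:
    """All n-grams (as tuples) of a token sequence, collected into a set."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def intersection_score(sensitive: Sequence[str], content: Sequence[str], n: int) -> float:
    """Fraction of sensitive n-grams also found in the content (assumed normalization)."""
    sensitive_grams = ngrams(sensitive, n)
    if not sensitive_grams:
        return 0.0
    shared = sensitive_grams & ngrams(content, n)
    return len(shared) / len(sensitive_grams)


sensitive = "alice sent bob the key".split()
content = "yesterday alice sent bob the key by mail".split()
print(intersection_score(sensitive, content, n=2))  # 1.0
```

A detector built on this score would raise an alert when it exceeds some chosen threshold.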
Existing data leak prevention (DLP) products include:
•Symantec DLP
•Identity Finder
•Global Velocity
•GoCloud DLP, etc.
•Set intersection is orderless
(the ordering of shared n-grams is not analyzed).
•It generates false alerts
when n is set to a small value.
•It cannot detect partial data leakage.
Hence it is not an adequate method.
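The first two drawbacks can be reproduced with the same kind of word-level sketch (tokenization and score normalization are again assumptions, and the sentences are hypothetical). Scattered, reordered fragments score exactly like a verbatim leak, and with n = 1 an unrelated sentence already shares common words with the sensitive record.

```python
from typing import Sequence, Set, Tuple


def ngrams(tokens: Sequence[str], n: int) -> Set[Tuple[str, ...]]:
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def intersection_score(sensitive: Sequence[str], content: Sequence[str], n: int) -> float:
    sensitive_grams = ngrams(sensitive, n)
    if not sensitive_grams:
        return 0.0
    return len(sensitive_grams & ngrams(content, n)) / len(sensitive_grams)


sensitive = "alice sent bob the key".split()

# Orderless: every sensitive bigram occurs, but scattered and reordered among noise words.
scrambled = "sent bob x the key y alice sent z bob the".split()
print(intersection_score(sensitive, scrambled, n=2))  # 1.0, same as a verbatim copy

# False alerts for small n: an unrelated sentence shares the common words "bob" and "the".
unrelated = "bob likes the weather".split()
print(intersection_score(sensitive, unrelated, n=1))  # 0.4
print(intersection_score(sensitive, unrelated, n=3))  # 0.0
```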
The proposed system uses a sequence alignment
algorithm, executed on:
•The sampled sensitive data sequence.
•The sampled content being inspected.
The alignment measures the amount of sensitive data
in the content, which achieves higher accuracy.
The scalability issue is solved by sampling both the
sensitive data and the content sequence before aligning them.
A pair of algorithms is used:
•Comparable Sampling Algorithm
•Sampling Oblivious Alignment Algorithm
This gives high detection specificity and tolerance to both
pervasive and localized modifications.
o The Comparable Sampling Algorithm yields
consistent samples of a sequence regardless of where
the sampling starts and ends.
o The Sampling Oblivious Alignment
Algorithm uses dynamic programming to infer
the similarity between the original
unsampled sequences from their samples.
•In this method, both the sensitive data and the
content sequence are sampled.
•The alignment is performed on the sampled
sequences.
•A "comparable sampling" property is
used.
•Both algorithms run faster
on a GPU than on a CPU.
•This enables high-speed security scanning.
INTO THE ALGORITHMS
Requirements:
Definition 1: A substring is a consecutive
segment of the original string.
Definition 2: A subsequence does not
require its items to be consecutive in the
original string.
Definition 3: Given that string x is a substring
of y, comparable sampling on x and y
yields x′ and y′ such that x′ is similar to a
substring of y′.
Definition 4: Given that x is a substring of
y, subsequence-preserving sampling on
x and y yields two subsequences x′ and y′
such that x′ is a substring of y′.
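Definitions 1 and 2 can be checked with two small helpers. This is an illustrative Python sketch with hypothetical function names, using the first window of the later example as the original sequence. By Definition 4, the outputs of subsequence-preserving sampling are subsequences of their inputs, which is exactly what `is_subsequence` tests.

```python
from typing import Sequence


def is_substring(x: Sequence, y: Sequence) -> bool:
    """Definition 1: x occurs in y as one consecutive segment."""
    n = len(x)
    return any(list(y[i:i + n]) == list(x) for i in range(len(y) - n + 1))


def is_subsequence(x: Sequence, y: Sequence) -> bool:
    """Definition 2: the items of x appear in y in the same order, gaps allowed."""
    remaining = iter(y)
    # `item in remaining` advances the iterator past the first match, so order is enforced.
    return all(item in remaining for item in x)


y = [1, 5, 1, 9, 8, 5]
print(is_substring([1, 9, 8], y))    # True
print(is_substring([1, 1, 5], y))    # False: the items are not consecutive
print(is_subsequence([1, 1, 5], y))  # True
```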
•The Comparable Sampling Algorithm is deterministic
and subsequence preserving.
•It is unbiased.
•It yields consistent samples of a
sequence regardless of where the sampling starts
and ends.
Input: an array S of items; a size |w| for a sliding window w; a selection function f(w, N) that selects the N smallest items from window w, i.e., f = min(w, N)
Output: a sampled array T
  initialize T as an empty array of size |S|
  w ← read(S, |w|)
  let w.head and w.tail be the indices in S corresponding to the higher-indexed end and the lower-indexed end of w, respectively
  collection mc ← min(w, N)
  while w is within the boundary of S do
    mp ← mc
    move w toward the high index by 1
    mc ← min(w, N)
    if mc ≠ mp then
      item en ← collectionDiff(mc, mp)
      item eo ← collectionDiff(mp, mc)
      if en < eo then
        write value en to T at w.head's position
      else
        write value eo to T at w.tail's position
      end if
    end if
  end while
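A direct Python transcription of the pseudocode above, offered as a sketch rather than the authors' implementation. Two interpretations are assumptions: "w.tail's position" is read as the index of the item that has just left the window (so every sampled slot keeps its original value), and the loop runs while the window can still advance. min(w, N) is recomputed by sorting each window for clarity, which is slower than the O(n log|w|) cost quoted for the selection function.

```python
from collections import Counter
from typing import List, Optional, Sequence


def min_n(window: Sequence[int], n: int) -> Counter:
    """Selection function f = min(w, N): the N smallest items of the window, as a multiset."""
    return Counter(sorted(window)[:n])


def collection_diff(a: Counter, b: Counter) -> int:
    """collectionDiff(a, b): the item that is in multiset a but not in multiset b."""
    diff = a - b
    # Sliding the window by one position swaps at most one item of min(w, N).
    assert sum(diff.values()) == 1
    return next(iter(diff))


def comparable_sample(s: Sequence[int], w: int, n: int) -> List[Optional[int]]:
    """Transcription of the sampling pseudocode; None marks an unsampled position."""
    t: List[Optional[int]] = [None] * len(s)
    mc = min_n(s[:w], n)
    # head: index of the item entering the window; head - w: index of the item leaving it.
    for head in range(w, len(s)):
        leaving = head - w
        mp = mc
        mc = min_n(s[leaving + 1:head + 1], n)
        if mc != mp:
            en = collection_diff(mc, mp)
            eo = collection_diff(mp, mc)
            if en < eo:
                t[head] = en      # the new item joined the N smallest
            else:
                t[leaving] = eo   # an item of the N smallest left the window
    return t


S = [1, 5, 1, 9, 8, 5, 3, 2, 4, 8]
print(comparable_sample(S, w=6, n=3))
# → [1, None, 1, None, None, None, None, 2, None, None]
```

On this input the loop writes only three values. Items that are still among the N smallest of the final window (the 3 at index 6 and the 4 at index 8) are never written, because the pseudocode writes only when the window moves.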
Example: we set our sampling procedure with a sliding window
of size 6 (i.e., |w| = 6) and N = 3. The input
sequence is 1, 5, 1, 9, 8, 5, 3, 2, 4, 8. The initial window is
w = [1, 5, 1, 9, 8, 5] and the collection mc = {1, 1, 5}.
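The example can be traced with a few lines of Python (an illustrative sketch; `window_minima` is a hypothetical helper name). It lists min(w, N) for every position of the size-6 window, starting from w = [1, 5, 1, 9, 8, 5] with mc = {1, 1, 5}; the sampling algorithm reacts only where consecutive collections differ.

```python
from typing import List, Sequence, Tuple


def window_minima(seq: Sequence[int], w: int, n: int) -> List[Tuple[List[int], List[int]]]:
    """(window, N smallest items) for every position of a size-w sliding window."""
    return [(list(seq[t:t + w]), sorted(seq[t:t + w])[:n]) for t in range(len(seq) - w + 1)]


for window, smallest in window_minima([1, 5, 1, 9, 8, 5, 3, 2, 4, 8], w=6, n=3):
    print(window, "->", smallest)
```

The last two windows share the collection [2, 3, 4], so the final move triggers no write.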
•The complexity of the selection function is
O(n log|w|) or O(n), where n is the size of the
input and |w| is the size of the window.
•The O(log|w|) factor comes from
maintaining the smallest N items within
the window.
Requirements of the Sampling Oblivious Alignment Algorithm:
The algorithm runs on the compact sampled sequences L.
Extra fields for scoring-matrix cells in dynamic
programming.
An extra step in the recurrence relation for updating the null
region.
A complex weight function that computes the similarity
between two null regions.
•Order-aware comparison
•High tolerance to pattern variation
•Capability of detecting partial leaks
•Consistent
Input: a weight function fw; the visited cells in the
H matrix that are adjacent to H(i, j), namely H(i−1, j−1), H(i, j−1)
and H(i−1, j); and the i-th and j-th items La_i and Lb_j
of the two sampled sequences La
and Lb, respectively.
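The input above mirrors the classic local-alignment recurrence, in which each cell H(i, j) is computed from H(i−1, j−1), H(i, j−1) and H(i−1, j). As a simplified stand-in, the Python sketch below implements plain Smith-Waterman on unsampled character sequences, with an assumed scoring of +2 for a match and -1 for a mismatch or gap. It is not the sampling oblivious algorithm: it has none of the extra cell fields, the null-region update step or the weight function fw described above. It does show two of the listed properties: the comparison is order-aware, and partial leaks still score.

```python
from typing import Sequence


def smith_waterman(a: Sequence, b: Sequence, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best local-alignment score of a against b (classic Smith-Waterman, linear gap cost)."""
    h = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diagonal = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            from_left = h[i][j - 1] + gap  # b[j-1] aligned to a gap in a
            from_up = h[i - 1][j] + gap    # a[i-1] aligned to a gap in b
            # The 0 lets a new local alignment start anywhere, which is what finds partial leaks.
            h[i][j] = max(0, diagonal, from_left, from_up)
            best = max(best, h[i][j])
    return best


sensitive = "password123"
print(smith_waterman(sensitive, "xx password123 yy"))  # 22: full, in-order copy
print(smith_waterman(sensitive, "123password"))        # 16: reordered, only "password" aligns
print(smith_waterman(sensitive, "word12"))             # 12: partial leak
```

An n = 1 set intersection would give all three inputs (nearly) full marks; the alignment score separates them.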
•This seminar presented a content inspection technique
for detecting sensitive data leaks.
•The detection approach aligns two sampled
sequences to compare their similarity.
•The alignment method is useful in common data
leak scenarios.