Association Rule Mining with Privacy Preservation
In Horizontally Distributed Databases
Group 1 – Abhra Basak, Apoorva Kumar, Sachin K. Saini, Shiv Sankar, Suraj B. Malode
Introduction
Look before you leap
The Flow
• Association Rule Mining
• Privacy Preservation
• Horizontally Distributed Datasets
Before we start mining!
Data Mining
• Finding trends or patterns in large datasets
• Extracting useful information
• Gaining useful and unexpected insights
• Analyzing and predicting system behavior
• Related fields: Artificial Engineering, Machine Learning, Statistics, Database Systems
• Open question: Scalability?
Association Rule Learning
By Rakesh Agrawal, IBM Almaden Research Center
What is an Association Rule?
• 80% of people who buy bread + butter, buy milk
• {Bread, Butter} → {Milk}
• Antecedent: {Bread, Butter}; Consequent: {Milk}
Definitions
• 80% of people who buy bread + butter, buy milk
• {Bread, Butter} → {Milk}
Antecedent
• Prerequisites for the rule to be applied
Consequent
• The outcome
Support
• Percentage of transactions containing the itemset
Confidence
• Fraction of the transactions containing the antecedent that also contain the consequent
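The two measures defined above can be checked on a small example. The sketch below is illustrative only: the eight transactions are hypothetical (chosen so that the rule {Bread, Butter} → {Milk} comes out at the 80% confidence quoted on the slide), and the function names `support` and `confidence` are assumptions, not part of the original deck.

```python
from fractions import Fraction

# Hypothetical toy transactions; not taken from the slides.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread"},
    {"butter", "milk"},
]


def support(itemset, transactions):
    """Fraction of all transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def confidence(antecedent, consequent, transactions):
    """Fraction of antecedent-containing transactions that also contain the consequent.

    Assumes the antecedent occurs at least once (otherwise this divides by zero).
    """
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)


rule_support = support({"bread", "butter", "milk"}, transactions)
rule_confidence = confidence({"bread", "butter"}, {"milk"}, transactions)
print(rule_support, rule_confidence)  # prints: 1/2 4/5
```

Here 4 of the 8 transactions contain all three items (support 1/2), and 4 of the 5 transactions containing bread and butter also contain milk (confidence 4/5, i.e. 80%).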
Constraints
• Two different forms of constraints are used to generate the required association rules
• Syntactic Constraints: Restrict the attributes that may be present in a rule.
• Support Constraints: Concern the number of transactions, out of the full set of transactions, that support a rule.
Association Rule Learning in Large Datasets
• To find association rules in large datasets
Generating Large Itemsets
• Finding the combinations of items whose support is above a minimum support threshold
Generating Association Rules
• Mining all rules that are satisfied within each large itemset
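The two steps above can be sketched as a level-wise search followed by rule enumeration. This is a minimal illustration in the spirit of Apriori, not the deck's own implementation: the function names, the toy transactions, and the thresholds (minimum support 1/2, minimum confidence 4/5) are all assumptions made for the example.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy transactions; not taken from the slides.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread"},
    {"butter", "milk"},
]


def large_itemsets(transactions, min_support):
    """Step 1: return {itemset: support} for every itemset meeting min_support."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    large = {}
    while candidates:
        level = {}
        for c in candidates:
            s = Fraction(sum(1 for t in transactions if c <= t), n)
            if s >= min_support:
                level[c] = s
        large.update(level)
        # Join large k-itemsets that differ in exactly one item to form (k+1)-candidates.
        candidates = list({a | b for a, b in combinations(level, 2)
                           if len(a | b) == len(a) + 1})
    return large


def generate_rules(large, min_confidence):
    """Step 2: split each large itemset into antecedent -> consequent rules."""
    found = []
    for itemset, s in large.items():
        for k in range(1, len(itemset)):
            for ante in map(frozenset, combinations(sorted(itemset), k)):
                # Every subset of a large itemset is itself large, so the lookup is safe.
                conf = s / large[ante]
                if conf >= min_confidence:
                    found.append((ante, itemset - ante, conf))
    return found


large = large_itemsets(transactions, Fraction(1, 2))
found = generate_rules(large, Fraction(4, 5))
```

With these inputs, `found` includes the rule {bread, butter} → {milk} at confidence 4/5. The sketch omits the subset-pruning step that a full Apriori implementation would apply before counting candidates.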
Association Rule Learning in Distributed Datasets
And Privacy Preservation
Preview
• Most tools used for mining association rules assume that the data to be analyzed can be collected at one central site.
• But issues like Privacy Preservation restrict the collection of data.
• Alternative methods for mining have to be devised for distributed datasets to make the mining process feasible while ensuring privacy.
Privacy Preservation
• Dataset
• Combined data of Twitter and Facebook
• Rule
• What percentage of people log in to a social networking site and post within the next 2 minutes?
Privacy Preservation
• Horizontally Partitioned (Example: Insurance Companies)
• Company A, Company B and Company C each hold their own records with the same columns: Patient ID, Disease, Prescription, Effect
• Rule Being Mined: Does a procedure have an unusual rate of complication?
• Implications:
• A company may have a high number of cases of the procedure failing, and it may change its policies to help.
• At the same time, if this rule is exposed, it may be a huge problem for the company.
• The risks outweigh the gains.
Privacy Preservation
• Vertically Partitioned

Credit Card No.     Bought tablet
2365987545623526    1
3639871526589414    1
4365845698742563    1
5962845632561200    1
6621563289657412    1

Credit Card No.     Bought TCover
2365987545623526    0
7639871526589414    1
4365845698742563    1
9962845632561200    0
6621563289657412    1

Common Property: Not one we can exploit.
Mining of Association Rules
In Horizontally Partitioned Databases
Fundamental Steps
What we want
• Computing Association Rules without revealing private information and getting
• The global support
• The global confidence
What we have
• Only the following information is available
• Local Support
• Local Confidence
• Size of the DB
Even this information may not be shared freely between sites.
But we’ll get to that.
Calculating Required Values
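As a rough sketch of how the global values follow from the local ones: the global support of an itemset is the sum of the local support counts divided by the sum of the database sizes, and the global confidence of a rule is the global count of the whole rule divided by the global count of its antecedent. The function names below are assumptions, the ABC counts and database sizes are the ones used later in the deck for Sites A, B and C, and the antecedent counts are hypothetical.

```python
from fractions import Fraction


def global_support(local_counts, db_sizes):
    """Sum of local support counts over the sum of local database sizes."""
    return Fraction(sum(local_counts), sum(db_sizes))


def global_confidence(rule_counts, antecedent_counts):
    """Global count of antecedent plus consequent over global count of the antecedent."""
    return Fraction(sum(rule_counts), sum(antecedent_counts))


# Counts of itemset ABC and database sizes at Sites A, B and C.
abc_counts = [5, 6, 20]
db_sizes = [100, 200, 300]

# Hypothetical counts of the antecedent AB at each site (not from the slides).
ab_counts = [10, 12, 40]

sup_abc = global_support(abc_counts, db_sizes)
conf_ab_c = global_confidence(abc_counts, ab_counts)
print(sup_abc, conf_ab_c)  # prints: 31/600 1/2
```

Local support percentages cannot simply be averaged, because the sites differ in size; each one has to be weighted by its database size, which is what summing the raw counts does.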
Problems with the approach
• It protects individual privacy, but each site has to disclose information.
• It reveals the local support and confidence in a rule at each site.
• This information, if revealed, can be harmful to an organization.
Introducing the two Algorithms
• We will be exploring two algorithms that have been used.
• The first algorithm combines encryption with data distortion when sharing data between sites.
• The second algorithm uses a secure sum protocol (CK Secure Sum) to protect the values each site shares.
Algorithm Uno
Some people are honest
Two-phased algorithm
• Phase 1: Uses encryption for mining of the large itemsets
• Phase 2: Uses a random number to preserve the privacy of each site (assuming a system of 3 or more parties)
Phase 1: Commutative Encryption
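The slides do not name a specific cipher, so the following is only a sketch of the commutative property itself, using modular exponentiation (a Pohlig-Hellman style cipher) as an assumed stand-in. The prime, the keys, the `encode` helper and the site names are illustrative choices with toy parameters, not a secure configuration.

```python
import hashlib
from math import gcd

P = 2**127 - 1  # a Mersenne prime; toy parameter chosen only for illustration


def encode(itemset):
    """Map an itemset to an integer in [1, P-1] via a hash (illustrative encoding)."""
    digest = hashlib.sha256(",".join(sorted(itemset)).encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def valid_key(key):
    """A key is usable when it is coprime to P-1, so encryption is a bijection."""
    return gcd(key, P - 1) == 1


def encrypt(value, key):
    """Exponentiation cipher: E_key(value) = value^key mod P."""
    return pow(value, key, P)


# Hypothetical secret keys of two sites (fixed here so the example is repeatable).
KEY_A, KEY_B = 65537, 257

abc = encode({"A", "B", "C"})
abd = encode({"A", "B", "D"})

a_then_b = encrypt(encrypt(abc, KEY_A), KEY_B)
b_then_a = encrypt(encrypt(abc, KEY_B), KEY_A)
other = encrypt(encrypt(abd, KEY_A), KEY_B)
```

Because (x^a)^b = (x^b)^a mod P, the same itemset ends up with the same ciphertext no matter which site encrypts first, so once every site has applied its key, duplicates can be merged without showing which site contributed which itemset.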
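Phase 2: Data Distortion
• Minimum support threshold: 5%; random number chosen by Site A: R = 17
• Site A (ABC: 5, Size = 100): R + count - 5% * Size = 17 + 5 - 5% * 100 = 17, sent to Site B
• Site B (ABC: 6, Size = 200): 17 + 6 - 5% * 200 = 13, sent to Site C
• Site C (ABC: 20, Size = 300): 13 + 20 - 5% * 300 = 18, sent back to Site A
• Site A checks 18 >= R (R = 17), so itemset ABC meets the 5% threshold globally.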
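The round of arithmetic on this slide can be replayed in a few lines. This is a simplified sketch that follows the slide's numbers only; the function name and the `trace` list are assumptions, and the final comparison is done in the clear here, which a real protocol would want to avoid.

```python
from fractions import Fraction


def globally_supported(sites, min_support, r):
    """Pass a masked running total around the ring of sites.

    sites: list of (local_count, db_size), starting with the site that chose r.
    Each site adds its excess support (count - min_support * size) to the value
    it receives, so no site sees another site's raw count.
    """
    running = Fraction(r)
    trace = []
    for count, size in sites:
        running += count - min_support * size
        trace.append(running)
    # The initiating site compares the final value against its own random mask.
    return running >= r, trace


# Sites A, B, C from the slide: (count of ABC, database size).
sites = [(5, 100), (6, 200), (20, 300)]
ok, trace = globally_supported(sites, Fraction(5, 100), r=17)
print(ok, [int(v) for v in trace])  # prints: True [17, 13, 18]
```

The total excess is (5 + 6 + 20) - 5% * 600 = 1, which is why the final value 18 is one more than R = 17.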
Problems with the Algorithm
• Doesn’t work for a 2 party system
• Assumes honest parties
• Assumes Boolean (yes/no) support for each rule rather than a subjective or weighted measure.
• As the number of candidate itemsets increases, the encryption overhead increases.
• The encryption overhead also grows in direct proportion to the number of sites or partitions.
Algorithm Dua
Don’t trust anyone
CK Secure Sum
• Primarily used to handle semi-honest sites.
• The data of each site is broken down into segments.
• The two neighbors on either side of a site could collude to learn the data of the site in between them.
• The neighbors are changed for each round. Hence, they can only obtain one such segment.
Changing Neighbors
• Round 1: P1, P2, P3, P4
• Round 2: P1, P2, P4, P3
• Round 3: P1, P4, P2, P3
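The segment idea can be sketched as follows. This is a loose illustration under stated assumptions, not the published protocol: each party's private value is split into as many random segments as there are rounds, one segment per party is added to a running total in each round, and the ring order changes between rounds. The function names, the private values, the segment range and the seed are hypothetical; the three ring orders are the party arrangements that appear on the slide.

```python
import random


def split_into_segments(value, k, rng):
    """Split `value` into k integer segments that add back up to `value`."""
    parts = [rng.randint(-1000, 1000) for _ in range(k - 1)]
    parts.append(value - sum(parts))
    return parts


def segmented_secure_sum(values, ring_orders, seed=0):
    """Sum private values one segment per round, using a different ring each round.

    values: {party: private_value}; ring_orders: one party ordering per round.
    In any single round a pair of neighbors can at most isolate one segment
    of the party sitting between them, never the whole value.
    """
    rng = random.Random(seed)
    k = len(ring_orders)
    segments = {p: split_into_segments(v, k, rng) for p, v in values.items()}
    total = 0
    for round_no, ring in enumerate(ring_orders):
        running = 0
        for party in ring:  # the partial sum is passed along the ring
            running += segments[party][round_no]
        total += running
    return total


# Hypothetical private values; ring orders as listed on the slide.
values = {"P1": 5, "P2": 6, "P3": 20, "P4": 9}
ring_orders = [
    ["P1", "P2", "P3", "P4"],
    ["P1", "P2", "P4", "P3"],
    ["P1", "P4", "P2", "P3"],
]
total = segmented_secure_sum(values, ring_orders)
```

The total equals the plain sum of the private values (40 here) because every segment is added exactly once; the sketch does not model what each party observes, only the arithmetic.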
Conclusion
The moral of the story...
Before you leave
• Association rules play a vital role in data mining.
• Through careful analysis, items that appear to be unrelated can turn out to have a logical connection.
• This aspect of data mining can be very useful in predicting patterns and foreseeing trends in consumer behavior, choices and preferences.
• Association rules are one of the more effective ways for a business to benefit from data mining.
There are no dumb questions
(No questions please shhhh…)

Privacy Preservation Issues in Association Rule Mining in Horizontally Partitioned Databases


Editor's Notes

  • #8: Support gives an idea of how feasible a rule is; it is sometimes applied to the antecedent only.