Er. Nawaraj Bhandari
Data Warehouse/Data Mining
Chapter 7: Mining Association Rules in Large Databases
Introduction
Association rule mining finds interesting association or correlation relationships
among a large set of data items.
With massive amounts of data continuously being collected and stored, many
industries are becoming interested in mining association rules from their
databases. Discovering interesting associations among huge amounts of
business transaction records can help in many business decision-making
processes, such as catalog design, cross-marketing, and loss-leader analysis.
• A typical example of association rule mining is market basket analysis.
Association Rules
• Analyze and predict customer behavior.
• Are expressed as if/then statements.
• Examples:
• Bread => butter.
If someone purchases bread, then he or she is likely to purchase butter.
• buys{onions, potatoes} => buys{tomatoes}
Parts of Association Rules
Bread => butter [20%, 45%]
Bread: antecedent
Butter: consequent
20% is the support
45% is the confidence
Support and Confidence
A => B
Support denotes the probability that a transaction contains both A and B.
Confidence denotes the probability that a transaction containing A also
contains B.
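Written as formulas, the two definitions above can be stated as follows. This is a sketch in standard notation; the symbol N (total number of transactions) and the function count(.) (number of transactions containing an itemset) are introduced here for illustration and do not appear in the slides.

```latex
\[
\mathrm{support}(A \Rightarrow B) = \frac{\mathrm{count}(A \text{ and } B)}{N}
\qquad
\mathrm{confidence}(A \Rightarrow B) = \frac{\mathrm{count}(A \text{ and } B)}{\mathrm{count}(A)}
\]
```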
Support and Confidence
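Consider a supermarket.
Total transactions: 100
Transactions containing bread: 20
So, 20/100 * 100 = 20%, which is the support of bread.
Of the 20 transactions containing bread, 9 also contain butter.
So, 9/20 * 100 = 45%, which is the confidence of bread => butter.
Note that 20% is the support of bread alone. By the definition of support
(transactions containing both items), the support of the rule bread => butter
in this data is 9/100 * 100 = 9%.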
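The arithmetic of this example can be checked with a short sketch. The function and variable names are illustrative, not from the slides. The sketch follows the definition of support as the fraction of transactions containing both items, so it reports the support of bread alone and the support of the rule separately.

```python
def support(count: int, total: int) -> float:
    """Fraction of all transactions that contain the itemset."""
    return count / total


def confidence(count_both: int, count_antecedent: int) -> float:
    """Fraction of antecedent transactions that also contain the consequent."""
    return count_both / count_antecedent


total_transactions = 100   # all transactions in the supermarket example
bread = 20                 # transactions containing bread
bread_and_butter = 9       # transactions containing bread and butter

bread_support = support(bread, total_transactions)
rule_support = support(bread_and_butter, total_transactions)
rule_confidence = confidence(bread_and_butter, bread)

print(f"support(bread)              = {bread_support:.0%}")
print(f"support(bread => butter)    = {rule_support:.0%}")
print(f"confidence(bread => butter) = {rule_confidence:.0%}")
```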
Types of Association Rules
Single dimension association rule
Multidimensional association rule
Hybrid association rule
Single dimension association rule
Bread => Butter
Dimension: buying.
Here the one and only dimension (predicate) is buying.
Multi dimension association rule
• Involves 2 or more dimensions.
• Occupation(I.T), Age(>22) => buys(laptops)
• Here we have 3 dimensions, i.e. occupation, age and buys.
• In multidimensional rules, no dimension is repeated.
Hybrid dimension association rule
• Dimensions or predicates can be repeated.
• Time(5 o'clock), Buy(tea) => Buy(biscuits)
• If a person gets tea at 5 o'clock, he or she is likely to get biscuits also.
• Here the dimension Buy is repeated.
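The three rule types differ only in how many distinct predicates (dimensions) a rule uses and whether any predicate is repeated. A minimal sketch of that distinction follows; the function name `rule_type` and the lowercase predicate names are hypothetical, chosen for illustration.

```python
from typing import List


def rule_type(predicates: List[str]) -> str:
    """Classify a rule by the predicate names on both sides of it."""
    distinct = set(predicates)
    if len(distinct) == 1:
        # Only one predicate, e.g. buys(bread) => buys(butter).
        return "single-dimensional"
    if len(distinct) == len(predicates):
        # Several predicates, none of them repeated.
        return "multidimensional"
    # Several predicates, at least one of them repeated.
    return "hybrid-dimensional"


print(rule_type(["buys", "buys"]))               # bread => butter
print(rule_type(["occupation", "age", "buys"]))  # occupation, age => buys
print(rule_type(["time", "buys", "buys"]))       # time, buy(tea) => buy(biscuits)
```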
Fields of application of association rules
• Web usage mining
• Banking
• Bioinformatics
• Market basket analysis
• Credit/debit card analysis
• Product clustering
• Catalog design
Algorithms for association rule mining
• Apriori algorithm
• Eclat algorithm
• FP-Growth algorithm
Apriori Algorithm
• If you bought a toothbrush, toothpaste will be suggested; if you bought
beer, chips and potato crackers will be suggested, etc.
• Many e-commerce websites (like Flipkart and Amazon) make suggestions of this
kind. The Apriori algorithm is a classic algorithm for mining the frequent
itemsets and association rules behind such suggestions.
Apriori Algorithm
Candidates First
C1:
Itemset    Support count
M          3
O          4
N          2
K          5
E          4
Y          3
D          1
A          1
U          1
C          2
Apriori Algorithm
L1: (the itemsets from C1 whose support count meets the minimum support)
Itemset    Support count
M          3
O          4
K          5
E          4
Y          3
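The step from C1 to L1 is a simple filter on the support counts. The sketch below uses the counts from the C1 table; the threshold of 3 is an assumption inferred from the tables (items with count 3 are kept, items with count 2 are dropped), since the slide stating the minimum support is not part of this transcript.

```python
# Assumption: minimum support count inferred from the C1 and L1 tables.
MIN_SUPPORT_COUNT = 3

# Support counts of the candidate 1-itemsets, copied from the C1 table.
c1 = {"M": 3, "O": 4, "N": 2, "K": 5, "E": 4, "Y": 3,
      "D": 1, "A": 1, "U": 1, "C": 2}

# Keep only the items that meet the minimum support count.
l1 = {item: count for item, count in c1.items() if count >= MIN_SUPPORT_COUNT}

print(l1)
```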
Apriori Algorithm
Candidates First
C2:
Itemset    Support count
M, O       1
M, K       3
M, E       2
M, Y       2
O, K       3
O, E       3
O, Y       2
K, E       4
K, Y       3
E, Y       2
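The ten candidate pairs in C2 are exactly the 2-item combinations of the five frequent items in L1. A minimal sketch using Python's `itertools.combinations` (the variable names are illustrative):

```python
from itertools import combinations

# Frequent 1-itemsets (L1), in the order used in the tables.
l1_items = ["M", "O", "K", "E", "Y"]

# Every unordered pair of frequent items is a candidate 2-itemset.
c2_candidates = list(combinations(l1_items, 2))

for pair in c2_candidates:
    print(", ".join(pair))
```

The support count of each pair still has to be obtained by scanning the transactions; the sketch only generates the candidates.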
Apriori Algorithm
L2: (the itemsets from C2 whose support count meets the minimum support)
Itemset    Support count
M, K       3
O, K       3
O, E       3
K, E       4
K, Y       3
Apriori Algorithm
Candidates First
C3:
Itemset    Support count
M, K, O    1
M, K, E    2
M, K, Y    2
O, K, E    3
O, K, Y    2
Apriori Algorithm
L3: (the itemsets from C3 whose support count meets the minimum support)
Itemset    Support count
O, K, E    3
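Apriori takes its name from the prior knowledge it exploits: every subset of a frequent itemset must itself be frequent. The sketch below applies that check to the C3 candidates listed above, using the L2 pairs from the tables. It is an illustration of the pruning idea, not a step shown in the slides, and the helper name `all_pairs_frequent` is hypothetical.

```python
from itertools import combinations

# Frequent 2-itemsets (L2) and candidate 3-itemsets (C3), copied from the tables.
l2 = {frozenset(p) for p in [("M", "K"), ("O", "K"), ("O", "E"),
                             ("K", "E"), ("K", "Y")]}
c3 = [("M", "K", "O"), ("M", "K", "E"), ("M", "K", "Y"),
      ("O", "K", "E"), ("O", "K", "Y")]


def all_pairs_frequent(candidate, frequent_pairs):
    """True if every 2-item subset of the candidate is a frequent pair."""
    return all(frozenset(pair) in frequent_pairs
               for pair in combinations(candidate, 2))


# Candidates with an infrequent pair cannot be frequent, so they can be
# discarded before their support is ever counted.
survivors = [c for c in c3 if all_pairs_frequent(c, l2)]
print(survivors)
```

Only O, K, E passes the check, which agrees with L3: the other candidates each contain a pair (such as M, O or O, Y) that is missing from L2.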
Apriori Algorithm
Now create association rules with support and confidence for {O, K, E}.
An association rule looks like "O AND K GIVES E", written O^K=>E.
Confidence = (support count of the whole itemset) / (support count of the
antecedent, i.e. of O AND K for O^K=>E)
For example, the confidence of O^K=>E = 3/3 = 1
Association rule   Support   Confidence    Confidence %
O^K=>E             3         3/3 = 1       100
O^E=>K             3         3/3 = 1       100
K^E=>O             3         3/4 = 0.75    75
E=>O^K             3         3/4 = 0.75    75
K=>O^E             3         3/5 = 0.6     60
O=>K^E             3         3/4 = 0.75    75
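The confidence column can be reproduced from the support counts in the earlier tables. In this sketch the counts are copied from the C1, L2 and L3 tables; the variable names are illustrative. Each non-empty proper subset of {O, K, E} is used in turn as the antecedent.

```python
from itertools import combinations

# Support counts copied from the tables (frozenset("OK") is the set {O, K}).
support_count = {
    frozenset("O"): 4, frozenset("K"): 5, frozenset("E"): 4,
    frozenset("OK"): 3, frozenset("OE"): 3, frozenset("KE"): 4,
    frozenset("OKE"): 3,
}

itemset = frozenset("OKE")
rules = {}
for size in (1, 2):
    for antecedent in combinations(sorted(itemset), size):
        a = frozenset(antecedent)
        consequent = itemset - a
        # confidence = count(whole itemset) / count(antecedent)
        rules[(a, consequent)] = support_count[itemset] / support_count[a]

for (a, c), conf in rules.items():
    print(f"{'^'.join(sorted(a))} => {'^'.join(sorted(c))}: {conf:.2f}")
```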
Apriori Algorithm
Compare these with the minimum confidence of 80%. The rules that meet it are:
Association rule   Support   Confidence    Confidence %
O^K=>E             3         3/3 = 1       100
O^E=>K             3         3/3 = 1       100
Hence the final association rules are:
O^K=>E
O^E=>K
This process is called market basket analysis.
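The final selection is a threshold filter on confidence. A minimal sketch, with the confidence values copied from the rule table and an illustrative constant name:

```python
MIN_CONFIDENCE = 0.80  # minimum confidence of 80% from the slide

# Confidence of each rule generated from {O, K, E}, as in the rule table.
confidences = {
    "O^K=>E": 3 / 3,
    "O^E=>K": 3 / 3,
    "K^E=>O": 3 / 4,
    "E=>O^K": 3 / 4,
    "K=>O^E": 3 / 5,
    "O=>K^E": 3 / 4,
}

# Keep only the rules whose confidence meets the threshold.
final_rules = [rule for rule, conf in confidences.items()
               if conf >= MIN_CONFIDENCE]
print(final_rules)
```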
Pros and Cons of Association Rule Mining
Pros
• It is an easy-to-implement and easy-to-understand algorithm.
• It can be used on large itemsets.
Cons
• Sometimes, it may need to generate a large number of candidate itemsets,
which can be computationally expensive.
• Calculating support is also expensive because it has to go through the entire
database.
June 8, 2019 Data Mining: Concepts and Techniques 23
Assignment
Minimum support: 2, minimum confidence: 70%. Use the Apriori algorithm to find
the frequent itemsets and strong association rules.
TID   Items
1     I1, I3, I4
2     I2, I3, I5
3     I1, I2, I3, I5
4     I2, I5
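One way to check a hand-worked answer is a small Apriori sketch like the one below. It is a minimal illustration, not an optimized or canonical implementation. The function names `apriori` and `strong_rules` are hypothetical, and the minimum support of 2 is assumed to be a transaction count rather than a percentage.

```python
from itertools import combinations


def apriori(transactions, min_support_count):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    # Count the single items (C1) and keep the frequent ones (L1).
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support_count}
    frequent = dict(current)
    k = 2
    while current:
        prev = list(current)
        # Join step: unions of frequent (k-1)-itemsets that have k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        # Count step: one pass over the transactions per level.
        current = {}
        for c in candidates:
            n = sum(1 for t in transactions if c <= t)
            if n >= min_support_count:
                current[c] = n
        frequent.update(current)
        k += 1
    return frequent


def strong_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for each strong rule."""
    rules = []
    for itemset, count in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                a = frozenset(antecedent)
                conf = count / frequent[a]  # subsets of frequent sets are frequent
                if conf >= min_confidence:
                    rules.append((a, itemset - a, conf))
    return rules


transactions = [
    {"I1", "I3", "I4"},
    {"I2", "I3", "I5"},
    {"I1", "I2", "I3", "I5"},
    {"I2", "I5"},
]
frequent = apriori(transactions, min_support_count=2)
rules = strong_rules(frequent, min_confidence=0.70)
for a, c, conf in rules:
    print(sorted(a), "=>", sorted(c), f"{conf:.0%}")
```

The count step rescans all transactions at every level, which is the cost named in the cons above.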
References
1. Sam Anahory, Dennis Murray, “Data Warehousing in the Real World”, Pearson Education.
2. R. Kimball, “The Data Warehouse Toolkit”, Wiley, 1996.
3. T. J. Teorey, “Database Modeling and Design: The Entity-Relationship Approach”, Morgan Kaufmann Publishers, Inc., 1990.
4. S. Chaudhuri, “An Overview of Data Warehousing and OLAP Technology”, Microsoft Research.
5. M. A. Shahzad, “Data Warehousing with Oracle”.
6. J. Han, M. Kamber, “Data Mining: Concepts and Techniques”, Second Edition, Morgan Kaufmann, ISBN: 978-1-55860-901-3.
ANY QUESTIONS?
