Mining Association Rules
Mohamed G. Elfeky
Introduction
Data mining is the discovery of knowledge and useful information from the large amounts of data stored in databases.
Association rules describe association relationships among the attributes in the set of relevant data.
Rules
Body ==> Consequent [ Support , Confidence ]
Body: represents the examined data.
Consequent: represents a discovered property for the examined data.
Support: the percentage of the records satisfying both the body and the consequent.
Confidence: the ratio of the records satisfying both the body and the consequent to those satisfying the body.
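These definitions can be made concrete with a short sketch. The helper names and the basket data below are illustrative, not from the slides:

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(body, consequent, transactions):
    """Records satisfying both body and consequent, over records satisfying the body."""
    return support(body | consequent, transactions) / support(body, transactions)

# Illustrative basket data (assumption, not from the slides).
baskets = [{"tea", "milk", "sugar"}, {"tea", "sugar"}, {"milk"}, {"tea", "milk"}]
print(support({"tea", "milk"}, baskets))                # 0.5: 2 of 4 baskets
print(confidence({"tea", "milk"}, {"sugar"}, baskets))  # 0.5: 1 of the 2 matching baskets
```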
Association Rules Examples
Basket Data
Tea ^ Milk ==> Sugar [0.3 , 0.9]
Relational Data
x.diagnosis = Heart ^ x.sex = Male ==> x.age > 50 [0.4 , 0.7]
Object-Oriented Data
s.hobbies = { sport , art } ==> s.age() = Young [0.5 , 0.8]
Topics of Discussion
Formal Statement of the Problem
Different Algorithms
- AIS
- SETM
- Apriori
- AprioriTid
- AprioriHybrid
Performance Analysis
Formal Statement of the Problem
I = { i1 , i2 , … , im } is a set of items.
D is a set of transactions T.
Each transaction T is a set of items (a subset of I).
TID is a unique identifier associated with each transaction.
The problem is to generate all association rules whose support and confidence are greater than the user-specified minimum support and minimum confidence.
Problem Decomposition
The problem can be decomposed into two subproblems:
1. Find all sets of items (itemsets) whose support (number of transactions containing them) is greater than the minimum support; these are the large itemsets.
2. Use the large itemsets to generate the desired rules: for each large itemset l, find all non-empty proper subsets, and for each subset a generate the rule a ==> (l - a) if its confidence is greater than the minimum confidence.
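Subproblem 2 can be sketched directly from this description. The function name is mine, and the data is the running example database used on the later slides:

```python
from itertools import combinations

def gen_rules(large_itemset, transactions, min_conf):
    """For each non-empty proper subset a of l, emit a ==> (l - a)
    when its confidence reaches min_conf."""
    l = frozenset(large_itemset)
    supp = lambda s: sum(1 for t in transactions if s <= t)
    rules = []
    for k in range(1, len(l)):                      # proper subsets only
        for a in map(frozenset, combinations(l, k)):
            conf = supp(l) / supp(a)
            if conf >= min_conf:
                rules.append((set(a), set(l - a), conf))
    return rules

# The running example database from the later slides (items only; TIDs omitted).
db = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
rules = gen_rules({2, 3, 5}, db, 1.0)   # yields {2 3} ==> {5} and {3 5} ==> {2}
```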
General Algorithm
1. In the first pass, the support of each individual item is counted, and the large ones are determined.
2. In each subsequent pass, the large itemsets determined in the previous pass are used to generate new itemsets called candidate itemsets.
3. The support of each candidate itemset is counted, and the large ones are determined.
4. This process continues until no new large itemsets are found.
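The four steps form a level-wise loop; the candidate-generation step is what distinguishes the algorithms that follow. A minimal sketch, with a naive placeholder generator of my own (not one of the named algorithms):

```python
from itertools import combinations

def large_itemsets(transactions, min_support, candidates):
    """Level-wise skeleton: `candidates(prev_large, k)` is supplied per algorithm."""
    def count(itemsets):
        return {c: sum(1 for t in transactions if c <= t) for c in itemsets}
    items = {frozenset([i]) for t in transactions for i in t}
    large = {c for c, n in count(items).items() if n >= min_support}  # pass 1
    all_large, k = set(large), 2
    while large:                                  # passes 2, 3, ... until empty
        cand = candidates(large, k)
        large = {c for c, n in count(cand).items() if n >= min_support}
        all_large |= large
        k += 1
    return all_large

# Naive placeholder (assumption): all k-subsets of items seen in prev_large.
def naive_candidates(prev_large, k):
    pool = sorted({i for s in prev_large for i in s})
    return {frozenset(c) for c in combinations(pool, k)}

db = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
result = large_itemsets(db, 2, naive_candidates)  # 4 + 4 + 1 large itemsets
```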
AIS Algorithm
Candidate itemsets are generated and counted on-the-fly as the database is scanned.
1. For each transaction, it is determined which of the large itemsets of the previous pass are contained in this transaction.
2. New candidate itemsets are generated by extending these large itemsets with other items in this transaction.
The disadvantage is that this results in unnecessarily generating and counting too many candidate itemsets that turn out to be small.
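The two steps above can be sketched for a single pass. This is my sketch (items are assumed comparable, so each extension uses only "later" items and is generated once); on the example from the next slide it reproduces the C2 table:

```python
def ais_pass(transactions, prev_large):
    """One AIS pass: extend large itemsets of the previous pass with other
    items of the same transaction, counting candidates on the fly."""
    counts = {}
    for t in transactions:
        contained = [l for l in prev_large if l <= t]   # step 1
        for l in contained:                             # step 2
            for item in t:
                if item > max(l):                       # extend with later items only
                    c = l | {item}
                    counts[c] = counts.get(c, 0) + 1
    return counts

db = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
l1 = {frozenset({i}) for i in (1, 2, 3, 5)}             # L1 from the next slide
c2 = ais_pass(db, l1)                                   # 8 candidates, as in C2
```

Note that {1 4} and {3 4} are generated even though {4} is not large; this is exactly the wasted work the slide calls out.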
Example
Database:
TID  Items
100  1 3 4
200  2 3 5
300  1 2 3 5
400  2 5

L1:
Itemset  Support
{1}      2
{2}      3
{3}      3
{5}      3

C2 (* = large):
Itemset  Support
{1 3}*   2
{1 4}    1
{3 4}    1
{2 3}*   2
{2 5}*   3
{3 5}*   2
{1 2}    1
{1 5}    1

C3 (* = large):
Itemset  Support
{1 3 4}  1
{2 3 5}* 2
{1 3 5}  1
SETM Algorithm
Candidate itemsets are generated on-the-fly as the database is scanned, but counted at the end of the pass.
1. New candidate itemsets are generated the same way as in the AIS algorithm, but the TID of the generating transaction is saved with the candidate itemset in a sequential structure.
2. At the end of the pass, the support count of candidate itemsets is determined by aggregating this sequential structure.
It has the same disadvantage as the AIS algorithm. Another disadvantage is that for each candidate itemset, there are as many entries as its support value.
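A sketch of one SETM pass (my function name; candidate generation is the same as in the AIS sketch, but counting is deferred). On the next slide's example it produces the 13 (itemset, TID) entries of C2:

```python
from collections import Counter

def setm_pass(transactions, prev_large):
    """One SETM pass: append (candidate, TID) pairs to a sequential structure
    while scanning, then aggregate support counts at the end of the pass."""
    pairs = []
    for tid, t in transactions:
        for l in (l for l in prev_large if l <= t):
            for item in t:
                if item > max(l):
                    pairs.append((l | {item}, tid))
    support = Counter(c for c, _ in pairs)          # end-of-pass aggregation
    return pairs, support

db = [(100, {1, 3, 4}), (200, {2, 3, 5}), (300, {1, 2, 3, 5}), (400, {2, 5})]
l1 = {frozenset({i}) for i in (1, 2, 3, 5)}
pairs, support = setm_pass(db, l1)     # 13 entries, one per supporting transaction
```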
Example
Database:
TID  Items
100  1 3 4
200  2 3 5
300  1 2 3 5
400  2 5

L1:
Itemset  Support
{1}      2
{2}      3
{3}      3
{5}      3

C2:
Itemset  TID
{1 3}    100
{1 4}    100
{3 4}    100
{2 3}    200
{2 5}    200
{3 5}    200
{1 2}    300
{1 3}    300
{1 5}    300
{2 3}    300
{2 5}    300
{3 5}    300
{2 5}    400

C3:
Itemset  TID
{1 3 4}  100
{2 3 5}  200
{1 3 5}  300
{2 3 5}  300
Apriori Algorithm
Candidate itemsets are generated using only the large itemsets of the previous pass, without considering the transactions in the database.
1. The set of large itemsets from the previous pass is joined with itself to generate all itemsets whose size is larger by one.
2. Each generated itemset that has a subset which is not large is deleted. The remaining itemsets are the candidates.
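The join-and-prune step can be sketched compactly (the set-comprehension join is my implementation choice; the classic description joins sorted prefixes, which yields the same candidates):

```python
from itertools import combinations

def apriori_gen(prev_large, k):
    """Join L(k-1) with itself, then prune candidates that have
    a (k-1)-subset which is not large."""
    joined = {a | b for a in prev_large for b in prev_large if len(a | b) == k}
    return {c for c in joined
            if all(frozenset(s) in prev_large for s in combinations(c, k - 1))}

l2 = {frozenset(s) for s in [{1, 3}, {2, 3}, {2, 5}, {3, 5}]}
c3 = apriori_gen(l2, 3)
# Join gives {1 2 3}, {1 3 5}, {2 3 5}; pruning drops the first two
# (since {1 2} and {1 5} are not large), leaving only {2 3 5}.
```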
Example
Database:
TID  Items
100  1 3 4
200  2 3 5
300  1 2 3 5
400  2 5

L1:
Itemset  Support
{1}      2
{2}      3
{3}      3
{5}      3

C2 (* = large):
Itemset  Support
{1 2}    1
{1 3}*   2
{1 5}    1
{2 3}*   2
{2 5}*   3
{3 5}*   2

Join of L2 with itself: {1 2 3}, {1 3 5}, {2 3 5}
After pruning, C3 (* = large):
Itemset  Support
{2 3 5}* 2
AprioriTid Algorithm
The database is not used at all for counting the support of candidate itemsets after the first pass.
1. The candidate itemsets are generated the same way as in the Apriori algorithm.
2. Another set C’ is generated, in which each member holds the TID of a transaction together with the candidate itemsets present in that transaction. This set is used to count the support of each candidate itemset.
The advantage is that the number of entries in C’ may be smaller than the number of transactions in the database, especially in the later passes.
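A simplified sketch of one AprioriTid pass: a candidate is counted for a transaction when all of its (k-1)-subsets appear in that transaction's C’ entry (for Apriori-pruned candidates this agrees with the original generator-based test, since every such subset is large and hence was a candidate). The data below is C’2 from the next slide:

```python
def aprioritid_pass(c_prev, candidates):
    """One AprioriTid pass: count support from C'(k-1) instead of the
    database, and build C'(k) for the next pass."""
    counts = {c: 0 for c in candidates}
    c_next = []
    for tid, sets in c_prev:
        present = {c for c in candidates
                   if all(c - {i} in sets for i in c)}  # every (k-1)-subset in t
        for c in present:
            counts[c] += 1
        if present:
            c_next.append((tid, present))
    return counts, c_next

f = frozenset
c2_prime = [(100, {f({1, 3})}),
            (200, {f({2, 3}), f({2, 5}), f({3, 5})}),
            (300, {f({1, 2}), f({1, 3}), f({1, 5}),
                   f({2, 3}), f({2, 5}), f({3, 5})}),
            (400, {f({2, 5})})]
counts, c3_prime = aprioritid_pass(c2_prime, {f({2, 3, 5})})
# Only TIDs 200 and 300 contribute, so C'3 has two entries and {2 3 5} has support 2.
```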
Example
Database:
TID  Items
100  1 3 4
200  2 3 5
300  1 2 3 5
400  2 5

L1:
Itemset  Support
{1}      2
{2}      3
{3}      3
{5}      3

C2 (* = large):
Itemset  Support
{1 2}    1
{1 3}*   2
{1 5}    1
{2 3}*   2
{2 5}*   3
{3 5}*   2

C3 (* = large):
Itemset  Support
{2 3 5}* 2

C’2:
TID  Itemsets
100  {1 3}
200  {2 3}, {2 5}, {3 5}
300  {1 2}, {1 3}, {1 5}, {2 3}, {2 5}, {3 5}
400  {2 5}

C’3:
TID  Itemsets
200  {2 3 5}
300  {2 3 5}
Performance Analysis
[performance comparison charts omitted]
AprioriHybrid Algorithm
Performance Analysis shows that:
1. Apriori does better than AprioriTid in the
earlier passes.
2. AprioriTid does better than Apriori in the
later passes.
Hence, a hybrid algorithm can be
designed that uses Apriori in the initial
passes and switches to AprioriTid when it
expects that the set C’ will fit in memory.