Mining Frequent Patterns
UNIT-III
Frequent patterns refer to sets of items, subsequences, or substructures that appear frequently together in a dataset.
Uses of frequent pattern mining: identifying products that are often purchased together to optimize inventory, improve sales strategies, and design better promotion campaigns.
[Diagram: data from apps and other sources flows through ETL into a data warehouse, which feeds business intelligence and data science for decision making.]
What is market basket analysis?
Market basket analysis is a data mining technique used by retailers to increase sales by better understanding customer purchasing patterns. It involves analyzing large data sets, such as purchase history, to reveal product groupings, as well as products that are likely to be purchased together.
Marketing analysis uses data mining techniques to understand customer behavior, preferences, and market trends to improve decision-making in marketing strategies.
• Data Marts: These are smaller, more focused data repositories derived from the data warehouse, designed to meet the needs of specific business departments or functions.
• OLAP (Online Analytical Processing) Tools: OLAP tools allow users to analyze data in multiple dimensions, providing deeper insights and supporting complex analytical queries.
• End-User Access Tools: These are reporting and analysis tools, such as dashboards or Business Intelligence (BI) tools, that enable business users to access and interpret the data.
Market Basket Analysis (MBA)
A popular data mining technique. Goal: find associations between products bought together.
Example rule: if a customer buys bread and butter, they are likely to buy milk.
Applications in marketing:
• Cross-selling and up-selling: recommend related products.
• Customer segmentation: group customers for targeted advertising.
• Churn prediction: identify customers likely to leave.
• Personalized marketing: offer deals based on purchase history.
• Campaign management: evaluate the success of promotional campaigns.
Benefits:
• Better understanding of customer needs.
• Increased sales and customer loyalty.
• More effective and efficient marketing strategies.
• Data mining concepts are in use for sales and marketing to provide better customer service, to improve cross-selling opportunities, and to increase direct-mail response rates.
• Customer retention: data mining makes it possible to identify patterns and predict likely defections.
• Risk assessment and fraud detection also use data-mining concepts to identify inappropriate or unusual behavior.
Market basket analysis mainly works with the ASSOCIATION RULE {IF} -> {THEN}.
• IF means Antecedent: an antecedent is an item found within the data.
• THEN means Consequent: a consequent is an item found in combination with the antecedent.
Support
Support is a measure of how frequently the items appear in the dataset. It helps to identify the most common items or itemsets in the dataset.
SUPPORT: It is calculated as the number of transactions containing the itemset divided by the total number of transactions.
Confidence
Confidence is a measure of the reliability of the inference made by a rule. It quantifies the likelihood of finding item B in transactions under the condition that the transaction already contains item A.
CONFIDENCE: It is calculated to check whether products sell on their own or through combined sales. It is computed as Support(A ∪ B) / Support(A), i.e. combined transactions divided by antecedent transactions.
Lift
Lift measures the strength of a rule over the random co-occurrence of the itemset, providing a metric to understand how much more likely item B is to be bought when item A is bought, compared to if B was bought independently.
LIFT: Lift is calculated as the confidence of the rule divided by the support of the consequent, i.e. Lift(A → B) = Support(A ∪ B) / (Support(A) × Support(B)). A lift above 1 indicates the items co-occur more often than chance.
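To make the three measures concrete, here is a minimal Python sketch. The transaction data is hypothetical, invented purely for illustration:

```python
# Hypothetical basket data, invented for illustration.
transactions = [
    {'bread', 'butter', 'milk'},
    {'bread', 'butter'},
    {'bread', 'milk'},
    {'butter', 'milk'},
    {'bread', 'butter', 'milk'},
]
N = len(transactions)

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / N

def confidence(antecedent, consequent):
    """Support of the combined itemset divided by support of the antecedent."""
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence of the rule divided by the support of the consequent."""
    return confidence(antecedent, consequent) / support(consequent)

# Rule {bread, butter} -> {milk}
print(support({'bread', 'butter', 'milk'}))       # 0.4
print(confidence({'bread', 'butter'}, {'milk'}))  # ~0.667
print(lift({'bread', 'butter'}, {'milk'}))        # ~0.833 (< 1: weak negative association)
```

Here the rule's confidence (2/3) is actually below milk's baseline support (4/5), so the lift is under 1: buying bread and butter makes milk slightly *less* likely in this toy data.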
Mining techniques are methods used to discover patterns, relationships, and insights from large datasets. Some common mining techniques include:
*Types of Mining Techniques*
1. *Frequent Pattern Mining*: Discovers frequent patterns and relationships in data.
2. *Association Rule Mining*: Generates rules that describe relationships between items.
3. *Classification*: Predicts the class or category of an item based on its attributes.
4. *Clustering*: Groups similar items together based on their attributes.
5. *Anomaly Detection*: Identifies unusual or outlier data points.
*Applications*
1. *Market Basket Analysis*: Identifies products that are frequently purchased together.
2. *Recommendation Systems*: Suggests products based on user behavior.
3. *Customer Segmentation*: Groups customers based on their behavior and attributes.
4. *Fraud Detection*: Identifies unusual patterns in data that may indicate fraud.
*Benefits*
1. *Improved Decision-Making*: Mining techniques help businesses make informed decisions.
2. *Increased Efficiency*: Automated pattern discovery saves time and resources.
3. *Enhanced Customer Insights*: Mining techniques provide valuable insights into customer behavior.
*Challenges*
1. *Data Quality*: Poor data quality can affect the accuracy of mining results.
2. *Scalability*: Handling large datasets can be computationally expensive.
3. *Interpretability*: Understanding and interpreting mining results can be challenging.
What is the Apriori Algorithm?
It is used for Association Rule Mining: finding frequent itemsets in a database and deriving rules (like "If people buy milk, they also buy bread").
It works on the principle: "If an itemset is frequent, all its subsets must also be frequent." (This is called the Apriori Property.)
The Apriori algorithm is a popular algorithm for mining frequent itemsets and generating association rules. Here's a step-by-step overview:
*Apriori Algorithm Steps*
1. *Generate Candidate Itemsets*: Generate candidate k-itemsets by joining the frequent (k−1)-itemsets.
2. *Calculate Support*: Calculate the support for each candidate itemset.
3. *Prune Itemsets*: Prune itemsets that do not meet the minimum support threshold.
4. *Repeat*: Repeat steps 1-3 until no more frequent itemsets can be generated.
5. *Generate Association Rules*: Generate association rules from the frequent itemsets.
2-Step Apriori Algorithm Process
Step 1: Find Frequent 1-itemsets and 2-itemsets
Scan the database and count the support (how often it appears) for single items and pairs of items.
Keep only those items/pairs that meet the minimum support threshold.
Step 2: Generate Association Rules
From the frequent 2-itemsets, generate rules.
Check if the rules meet the minimum confidence threshold.
Step 1: Find Frequent 1-itemsets and 2-itemsets
1-itemsets:
A (4 times), B (3 times), C (3 times), D (1 time), E (2 times)
Keep A, B, C, E (because D has support 1 < 2)
2-itemsets:
(A, B): 2 times
(A, C): 3 times
(A, E): 1 time
(B, C): 2 times
(B, E): 2 times
(C, E): 1 time
Keep (A, B), (A, C), (B, C), (B, E)
Step 2: Generate Rules
(A → C), (C → A), (B → C), (C → B), etc.
Check confidence for each rule (Confidence = Support(Itemset) / Support(Antecedent)).
Example: Confidence(A → C) = Support(A, C) / Support(A) = 3/4 = 0.75
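The counts above can be reproduced in code. The sketch below uses a hypothetical five-transaction database constructed to match the stated counts (the original database appears on a slide image not included here):

```python
from collections import Counter
from itertools import combinations

# Hypothetical database, constructed to match the counts above.
transactions = [
    {'A', 'B', 'C', 'E'},
    {'B', 'E'},
    {'A', 'B', 'C'},
    {'A', 'C'},
    {'A', 'D'},
]
MIN_SUPPORT = 2

# Step 1a: count 1-itemsets and keep the frequent ones.
c1 = Counter(item for t in transactions for item in t)
l1 = {item for item, count in c1.items() if count >= MIN_SUPPORT}  # D is dropped

# Step 1b: count candidate 2-itemsets built only from frequent items.
c2 = Counter()
for t in transactions:
    for pair in combinations(sorted(t & l1), 2):
        c2[pair] += 1
l2 = {pair for pair, count in c2.items() if count >= MIN_SUPPORT}

# Step 2: confidence of the rule A -> C.
conf_a_c = c2[('A', 'C')] / c1['A']
print(sorted(l2))  # [('A','B'), ('A','C'), ('B','C'), ('B','E')]
print(conf_a_c)    # 0.75
```

Note how pruning D before pair counting (the Apriori property) means no pair containing D is ever counted.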
Fundamentals of Data Science
Dr. Chandrajit M, MIT First Grade college
Advantages of the Apriori algorithm:
1. Simplicity and ease of implementation.
2. The rules are human-readable.
3. Works well on unlabelled data.
4. Flexibility and customisability.
5. Extensions for multiple use cases can be created easily.
6. The algorithm is widely used and studied.
Disadvantages of the Apriori algorithm:
1. Computational complexity: requires many database scans.
2. Higher memory usage: assumes the transaction database is memory-resident.
3. It needs to generate a huge number of candidate sets.
4. Limited discovery of complex patterns.
Improving the efficiency of the Apriori algorithm
Here are some methods to improve the efficiency of the Apriori algorithm:
1. Hash-Based Technique: This method uses a hash-based structure called a hash table for generating the k-itemsets and their corresponding counts. It uses a hash function for generating the table.
2. Transaction Reduction: This method reduces the number of transactions scanned in iterations. Transactions that do not contain frequent items are marked or removed.
3. Partitioning: This method requires only two database scans to mine the frequent itemsets. For any itemset to be potentially frequent in the database, it must be frequent in at least one of the partitions of the database.
4. Sampling: This method picks a random sample S from database D and then searches for frequent itemsets in S. A globally frequent itemset may be missed; this risk can be reduced by lowering the minimum support.
5. Dynamic Itemset Counting: This technique can add new candidate itemsets at any marked start point of the database during the scanning of the database.
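The hash-based technique (method 1) can be sketched as follows: during the first scan, every pair is hashed into a bucket, and a pair can only be frequent if its bucket's total count reaches the minimum support, so pairs in light buckets are pruned before the second scan. The data and the toy hash function below are hypothetical:

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data and a deterministic toy hash function.
transactions = [{'A', 'B', 'C'}, {'A', 'C'}, {'A', 'B'}, {'B', 'C'}, {'B', 'D'}]
MIN_SUPPORT = 2
NUM_BUCKETS = 7  # deliberately tiny; real systems use far more buckets

def bucket(pair):
    # Toy hash: fine here because items are single characters.
    return sum(ord(ch) for ch in pair[0] + pair[1]) % NUM_BUCKETS

bucket_counts = [0] * NUM_BUCKETS
item_counts = defaultdict(int)

# Pass 1: count single items AND hash every pair into a bucket.
for t in transactions:
    for item in t:
        item_counts[item] += 1
    for pair in combinations(sorted(t), 2):
        bucket_counts[bucket(pair)] += 1

# Keep only candidate pairs whose items are frequent AND whose
# bucket total reached min support (bucket count >= true pair count).
frequent_items = sorted(i for i, c in item_counts.items() if c >= MIN_SUPPORT)
candidates = [p for p in combinations(frequent_items, 2)
              if bucket_counts[bucket(p)] >= MIN_SUPPORT]
print(candidates)
```

Because a bucket's count is at least the true count of every pair hashed into it, no frequent pair is ever pruned; light buckets only eliminate pairs that cannot possibly be frequent.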
Frequent Pattern-growth Algorithm
FP-growth is an algorithm for mining frequent patterns that uses a divide-and-conquer approach. The FP-Growth algorithm was developed by Han in 2000. It constructs a tree-like data structure called the frequent pattern (FP) tree, where each node represents an item in a frequent pattern, and its children represent its immediate sub-patterns. By scanning the dataset only twice, FP-growth can efficiently mine all frequent itemsets without generating candidate itemsets explicitly. It is particularly suitable for datasets with long patterns and relatively low support thresholds.
Working of the FP Growth Algorithm
The working of the FP Growth algorithm in data mining can be summarized in the following steps:
Scan the database:
In this step, the algorithm scans the input dataset to determine the frequency of each item. This determines the order in which items are added to the FP tree, with the most frequent items added first.
Sort items:
In this step, the items in the dataset are sorted in descending order of frequency. The infrequent items that do not meet the minimum support threshold are removed from the dataset. This helps to reduce the dataset's size and improve the algorithm's efficiency.
Construct the FP-tree:
In this step, the FP-tree is constructed. The FP-tree is a compact data structure that stores the frequent itemsets and their support counts.
Generate frequent itemsets:
Once the FP-tree has been constructed, frequent itemsets can be generated by recursively mining the tree. Starting at the bottom of the tree, the algorithm finds all combinations of frequent itemsets that satisfy the minimum support threshold.
Generate association rules:
Once all frequent itemsets have been generated, the algorithm post-processes them to generate association rules, which can be used to identify interesting relationships between the items in the dataset.
FP Tree
The FP-tree (Frequent Pattern tree) is a data structure used in the FP Growth algorithm for frequent pattern mining. It represents the frequent itemsets in the input dataset compactly and efficiently. The FP tree consists of the following components:
Root Node:
The root node of the FP-tree represents an empty set. It has no associated item but a pointer to the first node of each item in the tree.
Item Node:
Each item node in the FP-tree represents a unique item in the dataset. It stores the item name and the frequency count of the item in the dataset.
Header Table:
The header table lists all the unique items in the dataset, along with their frequency counts. It is used to track each item's location in the FP tree.
Child Node:
Each child node of an item node represents an item that co-occurs, in at least one transaction in the dataset, with the item the parent node represents.
Node Link:
The node-link is a pointer that connects each item in the header table to the first node of that item in the FP-tree. It is used to traverse the conditional pattern base of each item during the mining process.
Construction Steps
1. Scan the database once to count item frequencies.
2. Discard infrequent items (below min support).
3. Sort items in each transaction by frequency (descending).
4. Insert transactions into the tree: shared prefixes are merged; frequencies are updated.
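The four construction steps can be sketched directly in Python. The node structure below is a minimal assumption (item, count, parent, children); the header-table node links are omitted to keep the sketch short:

```python
from collections import defaultdict

class FPNode:
    def __init__(self, item, parent=None):
        self.item = item      # item label (None for the root)
        self.count = 0        # support count along this path
        self.parent = parent
        self.children = {}    # item -> FPNode

def build_fp_tree(transactions, min_support):
    # Step 1: scan once to count item frequencies.
    freq = defaultdict(int)
    for t in transactions:
        for item in t:
            freq[item] += 1
    # Step 2: discard infrequent items.
    freq = {i: c for i, c in freq.items() if c >= min_support}
    root = FPNode(None)
    for t in transactions:
        # Step 3: sort each transaction by descending frequency
        # (ties broken alphabetically so the order is deterministic).
        items = sorted((i for i in t if i in freq),
                       key=lambda i: (-freq[i], i))
        # Step 4: insert; shared prefixes merge, counts are updated.
        node = root
        for item in items:
            if item not in node.children:
                node.children[item] = FPNode(item, node)
            node = node.children[item]
            node.count += 1
    return root, freq
```

For the example worked through on the next slides, `build_fp_tree` on the five filtered transactions with min support 3 yields a root whose K child has count 5, with an E child of count 4 beneath it and a separate M branch of count 1.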
Working of the FP-Growth Algorithm
Step 1: Count Item Frequencies
Step 2: Sort Items in Each Transaction by Frequency (Descending)
The items in each transaction are rearranged in descending order of their respective frequencies. After insertion of the relevant items, the set L looks like this:
L = {K : 5, E : 4, M : 3, O : 3, Y : 3}
Step 3: Build the FP-Tree
Start with a null root node, and add transactions one by one.
Inserting the set {K, E, M, O, Y}:
Inserting the set {K, E, O, Y}:
Up to the insertion of the elements K and E, the support count is simply increased by 1. On inserting O, we can see that there is no direct link between E and O; therefore a new node for the item O is initialized with a support count of 1, and item E is linked to this new node. On inserting Y, we first initialize a new node for the item Y with a support count of 1 and link the new node of O with the new node of Y.
Inserting the set {K, E, M}:
The support count of each element is simply increased by 1.
Inserting the set {K, M, Y}:
Similar to the previous step, first the support count of K is increased, then new nodes for M and Y are initialized and linked accordingly.
Inserting the set {K, E, O}:
Here the support counts of the respective elements are simply increased. Note that the support count of the new node of item O is increased.
Multilevel Association Rules:
Association rules generated from mining data at different levels of abstraction are called multiple-level or multilevel association rules.
Multilevel association rules can be mined efficiently using concept hierarchies under a support-confidence framework.
Rules at a high concept level may add to common sense, while rules at a low concept level may not always be useful.
Need for Multilevel Rules:
• Sometimes at a low data level, the data does not show any significant pattern, but there is useful information hiding behind it.
• The aim is to find the hidden information in or between levels of abstraction.
Multidimensional Association Rules:
In multidimensional association rules, attributes can be categorical or quantitative.
• Quantitative attributes are numeric and incorporate an ordering.
• Numeric attributes should be discretized.
• A multidimensional association rule involves more than one dimension (predicate).
• Example: buys(X, "IBM Laptop computer") → buys(X, "HP Inkjet Printer")
Multilevel Association Rules
Multilevel association rules involve finding relationships between items at different levels of abstraction in a hierarchical structure. For instance, in a retail scenario, products can be organized into categories such as "Electronics" and "Home Appliances," which can be further divided into subcategories like "Mobile Phones" and "Refrigerators."
1. Hierarchy of Items: Items are organized in a hierarchical manner. For example:
● Level 1: Electronics, Home Appliances
● Level 2: Mobile Phones, Laptops (under Electronics), Refrigerators, Washing Machines (under Home Appliances)
● Level 3: Specific brands or models of mobile phones, laptops, etc.
2. Support and Confidence: At different levels, the support (frequency of itemsets) and confidence (reliability of the association) are calculated.

More Related Content

PPTX
MIning association rules and frequent patterns.pptx
PPTX
Dma unit 2
PPTX
Association and Classification Algorithm
PDF
DATA MINING-MODULE II NOTES(S4 BCA)_________.pdf
PDF
Data Mining Module 4 Business Analytics.pdf
PPTX
APRIORI ALGORITHM -PPT.pptx
PPTX
Association and Correlation analysis.....
PPTX
Association rules apriori algorithm
MIning association rules and frequent patterns.pptx
Dma unit 2
Association and Classification Algorithm
DATA MINING-MODULE II NOTES(S4 BCA)_________.pdf
Data Mining Module 4 Business Analytics.pdf
APRIORI ALGORITHM -PPT.pptx
Association and Correlation analysis.....
Association rules apriori algorithm

Similar to Fundamental of Data Science BCA 6th Sem Notes (20)

PPTX
big data seminar.pptx
PDF
IRJET- Classification of Pattern Storage System and Analysis of Online Shoppi...
PDF
6 module 4
PPTX
Association Rule Mining in Data Mining.pptx
PPTX
Association Rule Mining
PPTX
Apriori Algorithm.pptx
PPTX
MODULE 5 _ Mining frequent patterns and associations.pptx
PDF
IRJET- Minning Frequent Patterns,Associations and Correlations
PDF
Frequent Pattern Analysis, Apriori and FP Growth Algorithm
PDF
Dm unit ii r16
PPT
UNIT 3.2 -Mining Frquent Patterns (part1).ppt
PPT
20IT501_DWDM_PPT_Unit_III.ppt
PPT
20IT501_DWDM_U3.ppt
PPT
Associations1
PPTX
Association Rule Mining with Apriori Algorithm.pptx
PPTX
Lasso Regression regression amalysis.pptx
PPTX
Association rule introduction, Market basket Analysis
PPT
Associations.ppt
PPTX
Data Mining: Mining ,associations, and correlations
PPTX
Data Mining: Mining ,associations, and correlations
big data seminar.pptx
IRJET- Classification of Pattern Storage System and Analysis of Online Shoppi...
6 module 4
Association Rule Mining in Data Mining.pptx
Association Rule Mining
Apriori Algorithm.pptx
MODULE 5 _ Mining frequent patterns and associations.pptx
IRJET- Minning Frequent Patterns,Associations and Correlations
Frequent Pattern Analysis, Apriori and FP Growth Algorithm
Dm unit ii r16
UNIT 3.2 -Mining Frquent Patterns (part1).ppt
20IT501_DWDM_PPT_Unit_III.ppt
20IT501_DWDM_U3.ppt
Associations1
Association Rule Mining with Apriori Algorithm.pptx
Lasso Regression regression amalysis.pptx
Association rule introduction, Market basket Analysis
Associations.ppt
Data Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlations
Ad

Recently uploaded (20)

PPTX
The Healthy Child – Unit II | Child Health Nursing I | B.Sc Nursing 5th Semester
PDF
2.FourierTransform-ShortQuestionswithAnswers.pdf
PDF
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
PDF
O5-L3 Freight Transport Ops (International) V1.pdf
PPTX
Renaissance Architecture: A Journey from Faith to Humanism
PPTX
Week 4 Term 3 Study Techniques revisited.pptx
PPTX
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
PDF
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
PPTX
Cell Types and Its function , kingdom of life
PDF
Basic Mud Logging Guide for educational purpose
PDF
Business Ethics Teaching Materials for college
PDF
Classroom Observation Tools for Teachers
PPTX
Introduction to Child Health Nursing – Unit I | Child Health Nursing I | B.Sc...
PPTX
Microbial diseases, their pathogenesis and prophylaxis
PPTX
Final Presentation General Medicine 03-08-2024.pptx
PDF
Insiders guide to clinical Medicine.pdf
PDF
Mark Klimek Lecture Notes_240423 revision books _173037.pdf
PDF
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
PDF
ANTIBIOTICS.pptx.pdf………………… xxxxxxxxxxxxx
PDF
Saundersa Comprehensive Review for the NCLEX-RN Examination.pdf
The Healthy Child – Unit II | Child Health Nursing I | B.Sc Nursing 5th Semester
2.FourierTransform-ShortQuestionswithAnswers.pdf
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
O5-L3 Freight Transport Ops (International) V1.pdf
Renaissance Architecture: A Journey from Faith to Humanism
Week 4 Term 3 Study Techniques revisited.pptx
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
Cell Types and Its function , kingdom of life
Basic Mud Logging Guide for educational purpose
Business Ethics Teaching Materials for college
Classroom Observation Tools for Teachers
Introduction to Child Health Nursing – Unit I | Child Health Nursing I | B.Sc...
Microbial diseases, their pathogenesis and prophylaxis
Final Presentation General Medicine 03-08-2024.pptx
Insiders guide to clinical Medicine.pdf
Mark Klimek Lecture Notes_240423 revision books _173037.pdf
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
ANTIBIOTICS.pptx.pdf………………… xxxxxxxxxxxxx
Saundersa Comprehensive Review for the NCLEX-RN Examination.pdf
Ad

Fundamental of Data Science BCA 6th Sem Notes

  • 2. 2 e,o r Freq uent pattern refer to sets of item s,subsequenc substructures that app earfret uently to gether in a dataset. U se of frequent pattern m ining: Id entify ing p rod ucts that are o ften purchased to gether to o ptim ize invento ry, im pro ve sales strateg ies and desig n better prom otion c am paigns.
  • 4. 4 W hat is m arket basket analysis? M arket basket analysis is a data m ining tec hnique used by retailers to inc rease sales by better understand ing c ustom er purc hasing p atterns. It invo lves analyz ing larg e data sets, suc h as p urchase history , to reveal pro duc t grouping s, as w ell as pro ducts that are likely to be purchased tog ether. Marketing analysis uses data mining techniques to understand customer behavior, preferences, and market trends to improve decision-making in marketing strategies.
  • 5. 5
  • 6. 6 • Data Marts: These are smaller, more focused data repositories derived from the data warehouse, designed to meet the needs of specific business departments or functions. • OLAP (Online Analytical Processing) Tools: OLAP tools allow users to analyze data in multiple dimensions, providing deeper insights and supporting complex analytical queries. • End-User Access Tools: These are reporting and analysis tools, such as dashboards or Business Intelligence (BI) tools, that enable business
  • 7. 7
  • 8. 8 Market Basket Analysis (MBA)A popular data mining technique.Goal: Find associations between products bought together.Example Rule:If a customer buys bread and butter, they are likely to buy milk.---5. Applications in MarketingCross- selling and Up- selling: Recommend related products.Customer Segmentation: Group customers for targeted advertising.Churn Prediction: Identify customers likely to leave.Personalized Marketing: Offer deals based on purchase history.Campaign Management: Evaluate the success of promotional campaigns.---6. BenefitsBetter understanding of customer needs.Increased sales and customer loyalty.More effective and efficient marketing strategies.---
  • 9. •9im D p a rove ta mi c n ro in s g s- c s o e n l c il e n p g ts op ar p e o i r n ut u n s i e it e fo ,s r to Sa i l n e c s re a a n s d e m d a rei rk c e t t m ing ia l to er p s r p o o v n id s e e b ar e te tte .s r customer service, to • Customer Retention in the form of pattern identification and prediction of likely defections is possible by Data mining. • Risk Assessment and Fraud area also use the data-mining concept for identifying inappropriate or unusual behavior etc. Market basket analysis mainly works with the ASSOCIATION RULE {IF} -> {THEN}. • IF means Antecedent: An antecedent is an item found within the data • THEN means Consequent: A consequent is an item found in combination with the antecedent.
  • 10. 10 Support Sup port is a m easure of how freq uently the item s app ear in the dataset. It helps to id entify the m o st c om m on item s o r item sets in the dataset. SUPPORT: It is been calculated with the number of transactions divided by the total number of transactions made,
  • 11. 11 Confidence C onfidence is a m easure o f the reliab ility of the inferenc e m ade by a rule. It quantifies the likeliho od of finding item B in transactions under the co ndition that the transac tio n already co ntains item A. CONFIDENCE: It is been calculated for whether the product sales are popular on individual sales or through combined sales. That is calculated with combined transactions/individual transactions.
  • 12. 12 Lif t Lift m easures the streng th o f a rule o ver the rando m co -occ urrence of the item set, p rov iding a m etric to und erstand ho w m uch m o re likely item B is to be b ought w hen item A is b oug ht co m pared to if B w as b ought indep endently. LIFT: Lift is calculated for knowing the ratio for the sales
  • 13. 13 Mining techniques are methods used to discover patterns, relationships, and insights from large datasets. Some common mining techniques include: *Types of Mining Techniques* 1. *Frequent Pattern Mining*: Discovers frequent patterns and relationships in data. 2. *Association Rule Mining*: Generates rules that describe relationships between items. 3. *Classification*: Predicts the class or category of an item based on its attributes. 4. *Clustering*: Groups similar items together based on their attributes. 5. *Anomaly Detection*: Identifies unusual or outlier data points. *.
  • 14. 14 *Applications* 1. *Market Basket Analysis*: Identifies products that are frequently purchased together. 2. *Recommendation Systems*: Suggests products based on user behavior. 3. *Customer Segmentation*: Groups customers based on their behavior and attributes. 4. *Fraud Detection*: Identifies unusual patterns in data that may indicate fraud. *Benefits* 5. *Improved Decision-Making*: Mining techniques help businesses make informed decisions. 6. *Increased Efficiency*: Automated pattern discovery saves time and resources. 7. *Enhanced Customer Insights*: Mining techniques provide valuable insights into customer behavior. *Challenges* 8. *Data Quality*: Poor data quality can affect the accuracy of mining results. 9. *Scalability*: Handling large datasets can be computationally expensive. 10.*Interpretability*: Understanding and interpreting mining results can be challenging.
  • 15. 15 --- What is Apriori Algorithm? It’s used for Association Rule Mining — finding frequent itemsets in a database and deriving rules (like "If people buy milk, they also buy bread"). Works on the principle: "If an itemset is frequent, all its subsets must also be frequent." (This is called the Apriori Property.) T--h- e Apriori algorithm is a popular algorithm for mining frequent itemsets and generating association rules. Here's a step-by- step overview: *2A- priori Algorithm Steps* 1. *Generate Candidate Itemsets*: Generate all possible itemsets from the dataset. 2. *Calculate Support*: Calculate the support for each candidate itemset. 3.*Prune Itemsets*: Prune itemsets that do not meet the minimum support threshold. 4--.- *Repeat*: Repeat steps 1-3 until no more frequent itemsets can be generated. 5. *Generate Association Rules*: Generate association rules from the frequent itemsets.
  • 16. 16 2-Step Apriori Algorithm Process Step 1: Find Frequent 1-itemsets and 2-itemsets Scan the database and count the support (how often it appears) for single items and pairs of items. Keep only those items/pairs that meet the minimum support threshold. Step 2: Generate Association Rules From the frequent 2-itemsets, generate rules. Check if the rules meet the minimum confidence thresho
  • 17. 17 Step 1: Find Frequent 1-itemsets and 2- itemsets 1-itemsets: A (4 times), B (3 times), C (3 times), D (1 time), E (2 times) Keep A, B, C, E (because D has support 1 < 2) 2-itemsets: (A, B): 2 times (A, C): 3 times (A, E): 1 time (B, C): 2 times (B, E): 2 times (C, E): 1 time Keep (A, B), (A, C), (B, C), (B, E)l Step 2: Generate Rules (A → C), (C → A), (B → C), (C → B), etc. Check Confidence for each rule (Confidence = Support(Itemset) / Support(Antecedent)) Example: Confidence(A → C) = Support(A, C) / Support(A) =
  • 18. 18
  • 19. 19
  • 20. 20
  • 21. 21
  • 22. 22
  • 23. 23 Fundamentals of Data Science Dr. Chandrajit M, MIT First Grade college 1. Simplicity & ease of implementation 2. The rules are easy to human-readable 3.Works well on unlabelled data 4.Flexibility & customisability 5.Extensions for multiple use cases can be created easily 6. The algorithm is widely used & studied Disadvantages of Apriori algorithm:1.Computational complexity: Requires many database scans. 7. Higher memory usage: Assumes transaction database is memory resident. 8. It needs to generate a huge no. of candidate sets. 9. Limited discovery of complex patterns
  • 24. 24 Improving the efficiency of Apriori Algorithm; Here are some of the methods how to improve efficiency of apriori algorithm - 1. Hash-Based Technique: This method uses a hash-based structure called a hash table for generating the k-iternsets and their corresponding count. It uses a hash function for 2 generating the table Transaction Reduction: This method reduces the number of transactions scanned in iterations. The transactions which do not contain frequent items are marked or removed.Partitioning:This method requires only two database scans to mine the frequent itemsets. It says that for any itemset to be potentially frequent in the database, it should be frequent in at least one of the partitions of the database. 4. Sampling: This method picks a random sample S from Database D and then searches for frequent itemset in S. It may be posible to lose a global frequent itenset. This can be reduced bv lowerins the min sun 5.Dynamic Itemset Counting: This technique can add new candidate itemsets at any marked start point of the database during the scanning of the database.
  • 25. 25 Frequent Pattern-growth Algorithm FP-growth is an algorithm for mining frequent patterns that uses a divide-and-conquer approach.FP Growth algoritbm was developed by Han in 2000. It constructs a tree-like data structure called the frequent pattern (FP) tree, where each node represents an item in a frequent pattern, and its children represent its immediate sub-patterns. By scanning the dataset only twice, FP-growth can efficiently mine all frequent itensets without generating candidate itemsets explicitly. It is particularly suitable for datasets with long patterns and relatively low support thresholds.
  • 26. 26 Working on FP Growth Algorithm The working of the FP Growth algorithm in data mining can be summarized in the following steps: Scan the database: In this step, the algorithm scans the input dataset to determine the frequency of each item. This determines the order in which items are added to the FP tree, with the most frequent items added first Sort items: In this step, the items in the dataset are sorted in descending order of frequency. The infrequent items that do not meet the minimum support threshold are removed from the dataset. This helps to reduce the dataset's size and improve the algorithm's efficiency. Construct the FP-tree; In this step, the FP-ree is constructed. The FP-tree is a compact data structure that stores the frequent itemsets and their support counts.
  • 27. 27 Generate frequent itemsets: Once the FP-tree has been constructed, frequent itemsets can be generated by recursively mining the tree. Starting at the bottom of the tree, the algorithm finds all combinations of frequent item sets that satisfy the minimum support threshold. Generate association rules: Once all frequent item sets have been generated, the algorithm post-processes the generated frequent item sets to generate association rules, which can be used to identify interesting relationshins between the itens in the dataset
  • 28. 28 FP Tree The FP-tree (Frequent Pattern tree) is a data structure used in the FP Growth algorithm for frequent patterm mining, It represents the frequent itemsets in the input dataset compactly and efficiently. The FP tree consists of the following components: Root Node: The root node of the FP-tree represents an empty set. It has no associated item but a pointer to the first node of each item in the tree. Item Node: Each item node in the FP-tree represents a unique item in the dataset. It stores the item name and the frequency count of the item in the dataset. Header Table: The header table lists all the unique items in the dataset, along with their frequency count. It is used to track each item's location in the FP tree Child Node Each child node of an item node represents an item that co-occurs with the item the parent node represents in at least one transaction in the dataset. Node Link: The node-link is a pointer that connects each item in the header table to the first node of that item in the FP-tree. It is used to traverse the conditional pattern base of each item during the mining process.
  • 29. 29 Construction Steps: 1. Scan the database once to count item frequencies. 2. Discard infrequent items (below min support). 3. Sort items in each transaction by frequency (descending). 4. Insert transactions into the tree: shared prefixes are merged and frequencies are updated.
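The node structure and insertion step described above can be sketched as follows. This is a simplified illustration (the header table here keeps a list of nodes per item rather than a true linked chain of node-links):

```python
class FPNode:
    """One item node in the FP-tree: item name, support count, parent,
    and children keyed by item."""
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def insert_transaction(root, header, items):
    """Insert one frequency-ordered transaction, merging shared prefixes
    and updating counts; new nodes are registered in the header table."""
    node = root
    for item in items:
        child = node.children.get(item)
        if child is None:
            child = FPNode(item, parent=node)
            node.children[item] = child
            # simplified node-link: a per-item list standing in for the chain
            header.setdefault(item, []).append(child)
        child.count += 1
        node = child

root = FPNode(None)   # root represents the empty set
header = {}
insert_transaction(root, header, ["a", "b"])
insert_transaction(root, header, ["a", "c"])
# The shared prefix "a" is merged: one "a" node with count 2.
```

Because both transactions share the prefix `a`, only one `a` node exists in the tree, which is what makes the FP-tree compact.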
  • 30. 30 Working of FP- Growth Algorithm
  • 31. 31 Step 1: Count Item Frequencies
  • 32. 32 Step 2: Sort Items in Each Transaction by Frequency (Descending) Each transaction's items are rewritten in descending order of their respective frequencies. After insertion of the frequent items, the ordered set L looks like this: L = {K : 5, E : 4, M : 3, O : 4, Y : 3}
  • 33. 33 Step 3: Build the FP-Tree Start with a null root node, and add transactions one by one. Inserting the set {K, E, M, O, Y}:
  • 34. 34 Inserting the set {K, E, O, Y}: For the elements K and E, the support count is simply increased by 1. On inserting O we see that there is no direct link between E and O, so a new node for the item O is initialized with a support count of 1 and linked under E. On inserting Y, we initialize a new node for the item Y with a support count of 1 and link it under the new O node.
  • 35. 35 Inserting the set {K, E, M}: The support count of each element is simply increased by 1.
  • 36. 36 Inserting the set {K, M, Y}: Similar to step b), first the support count of K is increased, then new nodes for M and Y are initialized and linked accordingly.
  • 37. 37 Inserting the set {K, E, O}: Here the support counts of the respective elements are simply increased. Note that the support count of the O node created earlier (under K → E) is increased.
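The whole walkthrough can be replayed with a minimal insertion sketch, using the five ordered transactions from the slides:

```python
class Node:
    """Minimal FP-tree node: item, support count, children keyed by item."""
    def __init__(self, item):
        self.item, self.count, self.children = item, 0, {}

def insert(root, items):
    """Walk/extend the path for one ordered transaction, bumping counts."""
    node = root
    for item in items:
        node = node.children.setdefault(item, Node(item))
        node.count += 1

root = Node(None)  # null root
for t in [["K", "E", "M", "O", "Y"], ["K", "E", "O", "Y"], ["K", "E", "M"],
          ["K", "M", "Y"], ["K", "E", "O"]]:
    insert(root, t)

k = root.children["K"]
print(k.count)                               # 5: K lies on every path
print(k.children["E"].count)                 # 4
print(k.children["E"].children["O"].count)   # 2: from {K,E,O,Y} and {K,E,O}
```

The counts match the figures: K is shared by all five transactions, the K → E prefix by four, and the O node under K → E accumulates the two transactions where O directly follows E.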
  • 38. 38 Multilevel Association Rules: Association rules generated by mining data at different levels of abstraction are called multilevel (or multiple-level) association rules. Multilevel association rules can be mined efficiently using concept hierarchies under a support-confidence framework. Rules at a high concept level may merely state common knowledge, while rules at a low concept level may not always be useful.
  • 39. 39 Need for Multilevel Association Rules: •Sometimes at a low data level, data does not show any significant pattern, yet useful information may be hiding behind it. •The aim is to find this hidden information within or between levels of abstraction.
  • 40. 40 Multidimensional Association Rules: In multidimensional association rules, attributes can be categorical or quantitative. • Quantitative attributes are numeric and have an implicit ordering. • Numeric attributes should be discretized. • A multidimensional association rule involves more than one dimension or predicate. • Example – buys(X, “IBM Laptop computer”) ⇒ buys(X, “HP Inkjet Printer”)
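The discretization step mentioned above (turning a numeric attribute into interval labels so it can act as a predicate) can be sketched like this; the bin boundaries and labels are hypothetical:

```python
def discretize(value, bins):
    """Map a numeric attribute value to an interval label,
    e.g. age 23 -> "20..29", so it can be used as a rule predicate."""
    for low, high, label in bins:
        if low <= value <= high:
            return label
    return "other"  # fallback for values outside all bins

# Hypothetical age bins for a predicate like age(X, "20..29")
age_bins = [(0, 19, "0..19"), (20, 29, "20..29"), (30, 39, "30..39")]
```

After discretization, a numeric attribute such as age behaves like a categorical dimension and can appear alongside predicates like buys(X, …) in a multidimensional rule.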
  • 41. 41 Multilevel Association Rules Multilevel association rules involve finding relationships between items at different levels of abstraction in a hierarchical structure. For instance, in a retail scenario, products can be organized into categories such as "Electronics" and "Home Appliances," which can be further divided into subcategories like "Mobile Phones" and "Refrigerators." 1. Hierarchy of Items: Items are organized in a hierarchical manner. For example: ● Level 1: Electronics, Home Appliances ● Level 2: Mobile Phones, Laptops (under Electronics), Refrigerators, Washing Machines (under Home Appliances) ● Level 3: Specific brands or models of mobile phones, laptops, etc. 2. Support and Confidence: At different levels, the support (frequency of itemsets) and confidence (reliability of the association) are calculated.
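Computing support at different levels of the hierarchy amounts to rolling each item up to its ancestor before counting. A minimal sketch, with an illustrative (hypothetical) two-level hierarchy of product names:

```python
from collections import Counter

# Hypothetical hierarchy: specific model -> subcategory -> top category
hierarchy = {
    "iPhone 15": "Mobile Phones",
    "Galaxy S24": "Mobile Phones",
    "LG Fridge": "Refrigerators",
}
parent_cat = {"Mobile Phones": "Electronics", "Refrigerators": "Home Appliances"}

def support_counts(transactions, level_map=None):
    """Count per-item support at one abstraction level; pass a mapping to
    roll items up before counting (duplicates within a transaction collapse)."""
    counts = Counter()
    for t in transactions:
        items = {level_map.get(i, i) for i in t} if level_map else set(t)
        counts.update(items)
    return counts

txns = [["iPhone 15", "LG Fridge"], ["Galaxy S24"], ["iPhone 15"]]
low = support_counts(txns)                                          # Level 3
mid = support_counts(txns, hierarchy)                               # Level 2
high = support_counts(txns, {i: parent_cat[hierarchy[i]] for i in hierarchy})  # Level 1
```

Note how support grows as we move up: "iPhone 15" alone appears in 2 transactions, but "Mobile Phones" covers 3, which is why a pattern invisible at the leaf level can become significant at a higher level.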