DATA MINING & DATA WAREHOUSES
S4 BCA – KU – MODULE II NOTES ( PREPARED BY VINEETH P )
CHRIST NAGAR COLLEGE , MARANALLOOR
SYLLABUS - MODULE II
What is Market Basket Analysis
Market basket analysis is a data mining technique used by retailers to increase sales by better
understanding customer purchasing patterns. It involves analysing large data sets, such as
purchase history, to reveal product groupings, as well as products that are likely to be purchased
together.
The adoption of market basket analysis was aided by the advent of electronic point-of-sale (POS)
systems. Compared to handwritten records kept by store owners, the digital records generated by
POS systems made it easier for applications to process and analyse large volumes of purchase
data.
TYPES OF MARKET BASKET ANALYSIS
Retailers should understand the following types of market basket analysis:
•Predictive market basket analysis. This type considers items purchased in sequence to
determine what a customer is likely to buy next, supporting cross-selling.
•Differential market basket analysis. This type considers data across different stores, as well as
purchases from different customer groups during different times of the day, month or year. If a
rule holds in one dimension, such as store, time period or customer group, but does not hold in
the others, analysts can determine the factors responsible for the exception. These insights can
lead to new product offers that drive higher sales.
ALGORITHM FOR MARKET BASKET ANALYSIS
In market basket analysis, association rules are used to predict the likelihood of products being
purchased together. Association rules count the frequency of items that occur together,
seeking to find associations that occur far more often than expected.
Algorithms that use association rules include AIS, SETM and Apriori. The Apriori algorithm is
commonly cited by data scientists in research articles about market basket analysis; it identifies
frequent itemsets in the database and then extends them, level by level, to larger itemsets.
Well Known Example
Amazon's website provides a well-known example of market basket analysis. On a
product page, Amazon presents users with related products under the headings
"Frequently bought together" and "Customers who bought this item also
bought."
Benefits of Market-Basket Analysis
Market basket analysis can increase sales and customer satisfaction. Using data
to determine that products are often purchased together, retailers can optimize
product placement, offer special deals and create new product bundles to
encourage further sales of these combinations.
These improvements can generate additional sales for the retailer, while making
the shopping experience more productive and valuable for customers. As a result
of market basket analysis, customers may feel a stronger sentiment or brand
loyalty toward the company.
APRIORI ALGORITHM - INTRODUCTION
R. Agrawal and R. Srikant are the creators of the Apriori algorithm. They created it in 1994 for
mining frequent itemsets using Boolean association rules. The algorithm has
found great use in performing Market Basket Analysis, allowing businesses to sell their products
more effectively.
The use of this algorithm is not limited to market basket analysis. Various fields, like healthcare,
education, etc., also use it. Its widespread use is primarily due to its simple yet effective
implementation: it exploits the properties of previously found frequent itemsets, which greatly
improves the efficiency of level-wise generation of frequent item-sets.
Terms in Apriori Algorithm
•Itemset
An item-set is a set of items taken together. An itemset containing k unique items is
called a k-itemset. Typically, an itemset contains at least two items.
•Frequent Itemset
The next important concept is the frequent itemset. A frequent itemset is an itemset
that occurs frequently in the transactions, i.e., at least as often as a minimum support
threshold. For example, a frequent itemset could be {bread, butter}, {chips, cold drink},
{laptop, antivirus software}, etc.
Support is a metric that indicates how often products or items are purchased together
(in a single transaction). Confidence indicates how often one item is purchased in the
transactions that also contain another item.
Support(X) means – how many times item X was purchased out of the total number of
transactions
Support(X ^ Y) means – how many times items X and Y were purchased together out of
the total number of transactions
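As a minimal sketch (not part of the original notes; the transactions shown are illustrative),
these counts can be computed directly from a list of transactions in Python:

    def support_count(itemset, transactions):
        # Number of transactions that contain every item in `itemset`.
        return sum(1 for t in transactions if itemset <= t)

    transactions = [
        {"bread", "butter"},
        {"bread", "milk"},
        {"bread", "butter", "milk"},
        {"chips", "cold drink"},
    ]
    print(support_count({"bread"}, transactions))            # Support(bread) = 3
    print(support_count({"bread", "butter"}, transactions))  # Support(bread ^ butter) = 2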
Process of extracting frequent item-sets
Mining frequent item-sets is the process of identifying them, using specific thresholds
for Support and Confidence to define which item-sets count as frequent. The difficulty,
however, is finding the correct threshold values for these metrics.
Normally the minimum support threshold (called min_sup) will be given in the problem
statement itself.
To further explain the Apriori Algorithm, we need to understand Association
Rule Mining. The Apriori algorithm works by finding relationships among
numerous items in a dataset. The method known as association rule mining
makes this discovery.
For example, in a supermarket, a pattern emerges where people buy certain
items together. To make the example more concrete, let's assume that individuals
buy cold drinks and chips together. Similarly, customers often put notebooks and
pens together in a purchase.
Through association rule mining, you, as a supermarket owner, can leverage
identified relationships to boost sales. Strategies like packaging associated
products together, placing them in close proximity, offering group discounts, and
optimizing inventory management can lead to increased profits.
Support of an Item
Support indicates an item’s popularity, calculated by counting the transactions
where that particular item was present. For item ‘Z,’ its Support would be the
number of times the item was purchased, as the transaction data indicates.
Sometimes, this count is divided by the total number of transactions to make
the number easily representable. Let’s understand Support with an example.
Suppose there is transaction data for a day having 1,000 transactions.
The items you are interested in are apples, oranges, and apples+oranges (a
combination item). Now, you count the transactions where these items were
bought and find that the counts for apples, oranges, and apples+oranges are 200,
150, and 100, respectively.
The formula for Support is-
Support (Z) = Transactions containing item Z / Total transactions
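Plugging in the counts above:
Support(Apples) = 200 / 1000 = 0.20
Support(Oranges) = 150 / 1000 = 0.15
Support(Apples ^ Oranges) = 100 / 1000 = 0.10
These values (0.20 and 0.15 in particular) are reused in the Confidence and Lift examples
that follow.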
In the Apriori algorithm, such a metric is used to calculate the “support” for
different items and item-sets to establish that the frequency of the item-sets is
enough to be considered for generating candidate item-sets for the next iteration.
Here, the support threshold plays a crucial role as it’s used to define items/item-
sets that are not frequent enough.
Confidence of a rule
This key metric is used in the Apriori algorithm to indicate the probability of an
item ‘Z’ being purchased if a customer has bought an item ‘Y’. If you notice, a
conditional probability is being calculated here: the conditional probability that
item Z appears in a transaction, given that item Y appears in the same
transaction. Therefore, the formula for calculating Confidence is
Confidence(Y → Z) = P(Z|Y) = P(Y and Z) / P(Y)
It can also be written as
Confidence(Y → Z) = Support(Y ∪ Z) / Support(Y)
Ex:
Confidence (Apples → Oranges) = 100 / 200 = 0.5
[ Meaning that when apples are purchased, there is a 50% chance that the
customer also buys oranges ]
Lift to determine strength of a rule
Lift denotes the strength of an association rule. Suppose you need to calculate
the Lift(Y → Z); then you can do so by dividing Confidence(Y → Z) by Support(Z),
i.e.,
Lift(Y -> Z) = Confidence(Y -> Z) / Support(Z)
Another way of calculating Lift is by considering Support of (Y, Z) and dividing by
Support(Y)*Support(Z), i.e., it’s the ratio of Support of two items occurring together
to the Support of the individual items multiplied together.
In the above example, the Lift for Apples → Oranges would be the following-
Lift(Apple -> Orange) = Confidence(Apple -> Orange) / Support(Orange)
Lift(Apple -> Orange) = 0.5 / 0.15
Lift(Apple -> Orange) = 3.33
Interpreting Lift Value
❖A Lift value of 1 generally indicates randomness, suggesting independent
items, and the association rule can be disregarded.
❖A value above 1 signifies a positive association, indicating that two items will
likely be purchased together.
❖Conversely, a value below 1 indicates a negative association, suggesting that
the two items are more likely to be purchased separately.
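As a quick check of these formulas, here is a minimal Python sketch (written for these
notes; the variable names are illustrative) that reproduces the apples/oranges numbers:

    total = 1000
    count_apples, count_oranges, count_both = 200, 150, 100

    support_apples = count_apples / total      # 0.20
    support_oranges = count_oranges / total    # 0.15
    support_both = count_both / total          # 0.10

    confidence = count_both / count_apples     # 100 / 200 = 0.5
    lift = confidence / support_oranges        # 0.5 / 0.15 = 3.33
    # Equivalently: lift = support_both / (support_apples * support_oranges)
    print(round(confidence, 2), round(lift, 2))  # 0.5 3.33

A lift of 3.33 (> 1) indicates a positive association between apples and oranges.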
Steps in Apriori Algorithm
1. Start
2. Define the minimum threshold
3. Create a list of frequent items
4. Create candidate item-sets
5. Calculate the support of each candidate
6. Prune the candidate item-sets
7. Repeat steps 4 to 6 until no new frequent item-sets can be generated ( iteration )
8. Generate association rules
9. Evaluate association rules
10. Stop
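These steps can be summarised in a compact Python sketch (written for these notes, not
taken from a library; the function and variable names are illustrative). It generates frequent
item-sets level by level using the join and prune steps:

    from itertools import combinations

    def apriori(transactions, min_sup):
        # transactions: list of sets of items; min_sup: minimum support COUNT.
        # Returns a dict mapping each frequent itemset to its support count.
        items = {i for t in transactions for i in t}
        counts = {frozenset([i]): sum(1 for t in transactions if i in t) for i in items}
        Lk = {s: c for s, c in counts.items() if c >= min_sup}   # frequent 1-itemsets
        frequent = dict(Lk)
        k = 2
        while Lk:
            # Join step: merge frequent (k-1)-itemsets that share k-2 items.
            candidates = {a | b for a in Lk for b in Lk if len(a | b) == k}
            # Prune step: every (k-1)-subset of a candidate must itself be frequent.
            candidates = {c for c in candidates
                          if all(frozenset(s) in Lk for s in combinations(c, k - 1))}
            # Count support of the surviving candidates in one pass over the data.
            counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
            Lk = {s: n for s, n in counts.items() if n >= min_sup}
            frequent.update(Lk)
            k += 1
        return frequent

Association rules are then generated from the returned frequent item-sets and evaluated
with the Confidence (and Lift) formulas above.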
Example Problem
Consider the following dataset; we will find the frequent itemsets and generate association rules
for them.
Minimum support count is 2.
Minimum confidence is 60%.
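The dataset (from the GeeksforGeeks article cited in the References, which this
walkthrough follows) is:

TID | Items
T1  | I1, I2, I5
T2  | I2, I4
T3  | I2, I3
T4  | I1, I2, I4
T5  | I1, I3
T6  | I2, I3
T7  | I1, I3
T8  | I1, I2, I3, I5
T9  | I1, I2, I3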
Step-1: K=1
(I) Create a table containing the support count of each
item present in the dataset – called C1 (candidate set)
(II) Compare each candidate set item’s support count with
the minimum support count (here min_support=2; if the
support_count of a candidate set item is less than
min_support, then remove that item). This gives us
itemset L1
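Reconstructed from the cited example, C1 is as follows; since every count meets
min_support = 2, L1 is identical to C1:

Itemset | Support count
{I1}    | 6
{I2}    | 7
{I3}    | 6
{I4}    | 2
{I5}    | 2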
Step-2: K=2
•Generate candidate set C2 using L1 (this is
called the join step). The condition for joining Lk-1
with Lk-1 is that the itemsets should have (K-2)
elements in common.
•Check whether all subsets of each itemset are
frequent or not, and if not frequent, remove that
itemset. (Example: the subsets of {I1, I2} are {I1} and {I2};
they are frequent. Check this for each itemset.)
•Now find the support count of these itemsets by
searching the dataset.
(II) Compare each candidate (C2) support count with the minimum support count (here
min_support=2; if the support_count of a candidate set item is less than min_support,
then remove that item). This gives us itemset L2.
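Reconstructed from the cited example, the C2 counts and the resulting L2 are:

Itemset  | Support count
{I1, I2} | 4
{I1, I3} | 4
{I1, I4} | 1
{I1, I5} | 2
{I2, I3} | 4
{I2, I4} | 2
{I2, I5} | 2
{I3, I4} | 0
{I3, I5} | 1
{I4, I5} | 0

L2 keeps the itemsets with count >= 2: {I1, I2}, {I1, I3}, {I1, I5}, {I2, I3}, {I2, I4}, {I2, I5}.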
Step-3:
•Generate candidate set C3 using L2 (join step).
The condition for joining Lk-1 with Lk-1 is that the
itemsets should have (K-2) elements in common.
So here, for L2, the first element should match.
The itemsets generated by joining L2 are {I1, I2, I3},
{I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I4, I5} and
{I2, I3, I5}.
•Check whether all subsets of these itemsets are
frequent or not, and if not, remove that
itemset. (Here the subsets of {I1, I2, I3} are {I1,
I2}, {I2, I3} and {I1, I3}, which are frequent. For {I2, I3,
I4}, the subset {I3, I4} is not frequent, so remove it.
Similarly check every itemset.)
•Find the support count of the remaining itemsets
by searching the dataset.
(II) Compare each candidate (C3) support count with the minimum support
count (here min_support=2; if the support_count of a candidate set item is
less than min_support, then remove that item). This gives us
itemset L3.
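Reconstructed from the cited example, the candidates surviving the prune step are
{I1, I2, I3} and {I1, I2, I5}, each with a support count of 2; both meet min_support = 2, so
L3 = { {I1, I2, I3}, {I1, I2, I5} }.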
Step-4:
•Generate candidate set C4 using L3 (join step).
The condition for joining Lk-1 with Lk-1 (K=4) is that
the itemsets should have (K-2) elements in common. So here,
for L3, the first 2 elements (items) should match.
•Check whether all subsets of these itemsets are frequent or
not. (Here the itemset formed by joining L3 is {I1, I2, I3,
I5}; its subsets include {I1, I3, I5}, which is not
frequent.) So there is no itemset in C4.
•We stop here because no further frequent itemsets are
found.
Thus, we have discovered all the frequent item-
sets. Now the generation of strong association rules
comes into the picture. For that we need to calculate the
confidence of each rule.
Confidence –
A confidence of 60% means that 60% of the
customers who purchased milk and bread also
bought butter.
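Taking the frequent itemset {I1, I2, I5} (support count 2) as an example, the candidate
rules and their confidences, reconstructed from the cited example, are:
{I1, I2} → {I5}: confidence = 2/4 = 50%
{I1, I5} → {I2}: confidence = 2/2 = 100%
{I2, I5} → {I1}: confidence = 2/2 = 100%
{I1} → {I2, I5}: confidence = 2/6 = 33%
{I2} → {I1, I5}: confidence = 2/7 = 28%
{I5} → {I1, I2}: confidence = 2/2 = 100%
With minimum confidence = 60%, the strong rules are {I1, I5} → {I2}, {I2, I5} → {I1} and
{I5} → {I1, I2}.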
Limitations of Apriori Algorithm
•Computational complexity.
•Time & space overhead.
•Difficulty handling sparse data.
•Limited discovery of complex patterns.
•Higher memory usage.
•Bias of minimum support threshold.
•Inability to handle numeric data.
•Lack of incorporation of context.
Ways to Improve the efficiency of Apriori Algorithm
Several variations of the Apriori algorithm have been proposed with the aim of improving
the efficiency of the original algorithm. They are as follows −
The hash-based technique (hashing itemsets into corresponding buckets) − A hash-
based technique can be used to reduce the size of the candidate k-itemsets, Ck, for k
> 1. For instance, when scanning each transaction in the database to generate the
frequent 1-itemsets, L1, from the candidate 1-itemsets in C1, we can also generate the
2-itemsets of each transaction, hash (i.e., map) them into the buckets of a hash
table structure, and increase the corresponding bucket counts. A 2-itemset whose bucket
count is below the support threshold cannot be frequent and can therefore be removed
from the candidate set.
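A minimal sketch of this idea (the toy transactions, hash function and bucket count are
illustrative choices, not prescribed by the technique):

    from itertools import combinations

    transactions = [{"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}]  # toy data
    min_sup = 2
    NUM_BUCKETS = 7
    buckets = [0] * NUM_BUCKETS

    def bucket_of(pair):
        # Any deterministic hash of the pair works; sorting makes it order-independent.
        return hash(tuple(sorted(pair))) % NUM_BUCKETS

    # During the same pass that counts candidate 1-itemsets:
    for t in transactions:
        for pair in combinations(t, 2):
            buckets[bucket_of(pair)] += 1

    # A 2-itemset whose bucket count is already below min_sup cannot be frequent,
    # so it is dropped from C2 without another database scan.
    C2 = {frozenset(p) for t in transactions for p in combinations(t, 2)}
    C2 = {c for c in C2 if buckets[bucket_of(c)] >= min_sup}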
Transaction reduction − A transaction that does not contain any frequent k-itemsets cannot
contain any frequent (k + 1)-itemsets. Thus, such a transaction can be marked or deleted
from further consideration, because subsequent scans of the database for j-itemsets, where j >
k, will not need it.
Partitioning − A partitioning technique can be used that requires only two database scans to
mine the frequent itemsets. It consists of two phases. In Phase I, the algorithm subdivides the
transactions of D into n non-overlapping partitions. If the minimum support threshold for
transactions in D is min_sup, then the minimum support count for a partition is min_sup ×
the number of transactions in that partition.
For each partition, all frequent itemsets within the partition are discovered. These
are referred to as local frequent itemsets. The procedure employs a special data
structure that, for each itemset, records the TIDs of the transactions containing the
items in the itemset. This enables it to find all of the local frequent k-itemsets, for k
= 1, 2, ..., in just one scan of the database.
A local frequent itemset may or may not be frequent with respect to the entire database, D.
However, any itemset that is potentially frequent with respect to D must occur as a frequent
itemset in at least one of the partitions. Therefore, all local frequent itemsets are candidate
itemsets with respect to D. The collection of frequent itemsets from all partitions forms the
global candidate itemsets for D. In Phase II, a second scan of D is conducted, in which the
actual support of each candidate is assessed to determine the global frequent itemsets.
Sampling − The fundamental idea of the sampling approach is to select a random
sample S of the given data D, and then search for frequent itemsets in S rather than D.
In this method, we trade off some degree of accuracy against efficiency. The sample
size of S is chosen such that the search for frequent itemsets in S can be completed in main
memory, and therefore only one scan of the transactions in S is needed overall.
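A minimal sketch of the sampling idea (reusing the apriori function sketched earlier; the
sample size and the lowered threshold factor are illustrative assumptions, commonly used
to reduce the chance of missing itemsets that are frequent in D but not in S):

    import random

    def sampled_frequent_itemsets(transactions, min_sup_fraction, sample_size):
        # Mine a random sample S entirely in main memory.
        S = random.sample(transactions, sample_size)
        # Use a slightly lowered support threshold on the sample.
        lowered = 0.9 * min_sup_fraction
        return apriori(S, max(1, int(lowered * len(S))))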
References
Apriori Algorithm In Data Mining : Methods, Examples, and More (analytixlabs.co.in)
https://www.geeksforgeeks.org/apriori-algorithm/