Data Mining Module 4 Business Analytics.pdf

Copyright © 2024 Jayanti Rajdevendra Pande. All rights reserved.
RASHTRASANT TUKDOJI MAHARAJ NAGPUR UNIVERSITY
MBA
SEMESTER: 3
SPECIALIZATION
BUSINESS ANALYTICS (BA 2)
SUBJECT
DATA MINING
MODULE NO : 4
ASSOCIATION RULES
- Jayanti R Pande
DGICM College, Nagpur

Q1 What is Market Basket Analysis? What is the significance of Market Basket Analysis for retailers? What are the necessary steps
for implementing Market Basket Analysis?
MARKET BASKET ANALYSIS (MBA) is a data mining technique used in the field of retail and marketing to discover associations
and correlations between items that customers frequently buy together. The primary goal is to identify patterns and
relationships within transactional data, helping retailers understand customer behavior and preferences. Here's a breakdown of
its significance and essential steps for implementation:
SIGNIFICANCE OF MARKET BASKET ANALYSIS FOR RETAILERS
1. Increases Customer Engagement: By understanding the relationships between products, retailers can create targeted
marketing campaigns and promotions, increasing customer engagement.
2. Boosts Sales and Increases ROI: Tailoring promotions based on customer buying patterns can lead to higher sales and a
better return on investment.
3. Improves Customer Experience: Personalized recommendations and promotions enhance the overall shopping experience
for customers.
4. Optimizes Marketing Strategies and Campaigns: Retailers can optimize their marketing efforts by focusing on promoting
items that are frequently purchased together.
5. Helps Understand Customers Better: MBA provides insights into customer preferences, enabling retailers to stock relevant
products and improve overall satisfaction.
6. Identifies Customer Behavior and Patterns: Retailers can uncover hidden patterns and trends in customer behavior, aiding in
strategic decision-making.

ESSENTIAL STEPS FOR IMPLEMENTING MARKET BASKET ANALYSIS
1 Define Minimum Support and Confidence:
Support: The proportion of transactions that contain a particular itemset.
Confidence: For a rule A->B, the proportion of transactions containing the antecedent A that also contain the consequent B.
2 Identify Subsets with Higher Support: Find all itemsets (subsets) whose support is at least the defined minimum support threshold.
3 Generate Association Rules: For each high-support itemset, generate the association rules that meet the defined minimum confidence
threshold.
4 Sort Association Rules: Rank the association rules in decreasing order of confidence.
5 Analyze Rules: Examine the association rules along with their confidence and support values. Identify meaningful and actionable insights
from the discovered patterns.
Implementing Market Basket Analysis typically involves using algorithms like the Apriori algorithm or FP-growth algorithm. These
algorithms efficiently mine frequent itemsets and generate association rules from transactional data.
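The two measures defined in step 1 can be illustrated with a minimal Python sketch. The transactions and item names below are hypothetical and serve only to show the arithmetic; the helper names (support_count, support, confidence) are likewise assumptions for illustration.

```python
# Hypothetical toy transactions; the item names are illustrative only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Proportion of transactions that contain `itemset`."""
    return support_count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Support_count(A ∪ B) / Support_count(A) for the rule A -> B."""
    both = support_count(set(antecedent) | set(consequent), transactions)
    return both / support_count(antecedent, transactions)

print(support({"bread", "milk"}, transactions))       # 0.6  (3 of 5 baskets)
print(confidence({"bread"}, {"milk"}, transactions))  # 0.75 (3 of the 4 bread baskets)
```

A rule such as bread -> milk would be kept only if both values meet the minimum thresholds chosen in step 1.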

Q2 Compare the Apriori and FP-Growth Algorithms
Apriori Algorithm
The Apriori algorithm is a classic algorithm used for association rule mining, a technique in data mining that identifies relationships
between variables in large datasets. It was proposed by Agrawal and Srikant in 1994. The primary objective of the Apriori algorithm is to
find frequent itemsets in a transaction database, which are sets of items that frequently occur together. These frequent itemsets are
then used to generate association rules.
FP-Growth Algorithm
The FP-Growth (Frequent Pattern Growth) algorithm is an alternative approach to association rule mining that aims to address some of
the limitations of the Apriori algorithm. It was proposed by Han, Pei, and Yin in 2000.
Comparison
Apriori                                                     | FP-Growth
• Array-based algorithm                                     | • Tree-based algorithm
• Uses join and prune techniques                            | • Constructs conditional frequent-pattern trees
• Uses breadth-first search                                 | • Uses depth-first search
• Level-wise approach to pattern generation                 | • Pattern-growth approach that extends patterns already found in the data
• Number of candidates can grow exponentially with items    | • No candidate generation; tree construction is linear in the number of transactions
• Candidate generation is highly parallelizable             | • Harder to parallelize: nodes are interdependent, each path starts at the root
• Requires large memory space for candidate itemsets        | • Requires less memory space due to the compact tree structure
• Scans the database multiple times (once per level)        | • Scans the dataset only twice to construct the tree
• Performance is strongly affected by the number of items   | • Performance is less affected by the number of items

Q3 What is the FP-Growth algorithm? State the advantages and disadvantages of the FP-Growth algorithm.
FP-GROWTH ALGORITHM
The FP-Growth Algorithm is an approach for finding frequent item sets in a database without using candidate generation. It
utilizes a divide-and-conquer strategy and employs a special data structure known as the frequent-pattern tree (FP-tree).
Algorithm Workflow:
1. Compresses the input database by creating an FP-tree to represent frequent items.
2. Divides the compressed database into sets of conditional databases, each associated with one frequent pattern.
3. Mines each conditional database separately.
Search Cost Reduction: Reduces search costs by recursively looking for short patterns and then concatenating them into long
frequent patterns.
Handling Large Databases: In large databases, where holding the FP-tree in main memory is impractical, the algorithm partitions
the database into smaller databases (projected databases) and constructs an FP-tree for each.
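The first two workflow steps (building the FP-tree in two scans, then reading off a conditional database for one item) can be sketched in Python as follows. This is a simplified illustration, not a full FP-Growth implementation: the recursive mining step is omitted, and the class name FPNode, the function names, and the toy transactions are all assumptions made for this example.

```python
from collections import Counter

class FPNode:
    """One node of the FP-tree: an item, its count, and links up and down."""
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(transactions, min_count):
    # Scan 1: count each item and keep only the frequent ones.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_count}

    root = FPNode(None, None)
    header = {i: [] for i in frequent}  # item -> all tree nodes holding that item

    # Scan 2: insert each transaction with its frequent items ordered by
    # descending count (ties broken alphabetically) so common prefixes are shared.
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header

def conditional_pattern_base(item, header):
    """Prefix paths that lead to `item`, each paired with the item node's count."""
    base = []
    for node in header[item]:
        path = []
        parent = node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:
            base.append((path[::-1], node.count))
    return base

# Hypothetical toy data, for illustration only.
transactions = [["a", "b"], ["b", "c", "d"], ["a", "b", "d"], ["a", "b", "c"], ["b", "c"]]
root, header = build_fp_tree(transactions, min_count=2)
print(conditional_pattern_base("d", header))
```

In this toy example every transaction contains b, so all five transactions share a single b node directly under the root, which is the compression the FP-tree is designed to achieve.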
ADVANTAGES OF FP-GROWTH ALGORITHM
1. Reduced Database Scans: Needs to scan the database only twice, whereas Apriori scans the transactions once for each iteration.
2. Faster Execution: Candidate itemsets are not generated by pairing items, making it faster than candidate-generation algorithms such as Apriori.
3. Compact Memory Storage: Stores a compressed version of the database (the FP-tree) in memory, improving efficiency.
4. Scalability: Efficient and scalable for mining both long and short frequent patterns.
DISADVANTAGES OF FP-GROWTH ALGORITHM
1. Complex FP-Tree Construction: Building the FP-tree is more cumbersome and harder to implement than the Apriori algorithm.
2. Potential Expense: Constructing the FP-tree can be relatively expensive in time and memory, particularly when transactions share few common prefixes and the tree compresses poorly.
3. Memory Constraints: The FP-tree may not fit in main memory when dealing with large databases.

Q4 What are the different types of association rules in data mining? Briefly describe the algorithms used for association rule
mining.
TYPES OF ASSOCIATION RULES IN DATA MINING
1. Multi-Relational Association Rule (MRAR): Derived from multi-relational databases, MRAR involves rules in which one entity
has different relationships, representing indirect relationships between entities.
2. Generalized Association Rule: Used to discover hidden patterns at a more general level, typically by grouping items through a
concept hierarchy (for example, a product category instead of a specific product), which gives a rough idea of interesting patterns.
3. Quantitative Association Rules: Involve numeric attributes in at least one part of the rule, distinguishing them from generalized
association rules, where both sides consist of categorical attributes.

ALGORITHMS FOR ASSOCIATION RULE MINING
1 Apriori Algorithm:
Description: Identifies frequent individual items in a database and expands them to larger item sets, ensuring that these item
sets appear sufficiently often in the database.
Key Characteristics: Utilizes a breadth-first search and prunes candidate itemsets using the Apriori property (every subset of a
frequent itemset must itself be frequent).
2 Eclat Algorithm:
Description: Short for Equivalence Class Clustering and bottom-up Lattice Traversal, Eclat is considered by some as a more efficient
version of the Apriori algorithm. It employs lattice traversal to find frequent itemsets.
Key Characteristics: Stores, for each item, the set of transaction IDs that contain it, and counts support by intersecting these
sets, avoiding repeated scans of the database.
3 FP-Growth Algorithm:
Description: Operates in two stages, including FP-tree construction and the extraction of frequently used item sets.
Particularly useful for finding frequent patterns without candidate generation.
Key Characteristics: Uses a divide-and-conquer strategy, creating an FP-tree to represent frequent items and then dividing the
database into conditional databases.
These algorithms play a crucial role in discovering associations and patterns within large datasets. Apriori relies on candidate
generation and repeated support counting, Eclat counts support by intersecting transaction-ID sets, and FP-Growth eliminates the
need for explicit candidate generation, making it more efficient for certain scenarios. Each algorithm has its strengths and
weaknesses, and the choice of algorithm depends on the specific requirements of the data mining task and the characteristics of the dataset.
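Because Eclat is the least familiar of the three, a minimal Python sketch of its core idea may help: each item is mapped to the set of transaction IDs that contain it, and the support of a larger itemset is the size of the intersection of those sets. The function name eclat, the depth-first extension order, and the toy transactions are assumptions made for illustration.

```python
def eclat(transactions, min_count):
    """Return {itemset tuple: support count} using transaction-ID set intersections."""
    # Vertical layout: item -> set of IDs of the transactions containing it.
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # `candidates` holds (item, tidset) pairs that are frequent together with `prefix`.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix + (item,)
            frequent[itemset] = len(tids)
            narrower = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids  # transactions containing itemset + other
                if len(shared) >= min_count:
                    narrower.append((other, shared))
            if narrower:
                extend(itemset, narrower)

    start = sorted(((i, t) for i, t in tidsets.items() if len(t) >= min_count),
                   key=lambda pair: pair[0])
    extend((), start)
    return frequent

# Hypothetical toy data, for illustration only.
transactions = [{"a", "b"}, {"b", "c", "d"}, {"a", "b", "d"}, {"a", "b", "c"}, {"b", "c"}]
result = eclat(transactions, min_count=2)
for itemset, count in sorted(result.items()):
    print(itemset, count)
```

After the single pass that builds the transaction-ID sets, the original transactions are never read again; all further support counts come from set intersections.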

Q5 How is the Apriori algorithm applied to the given data?
Step-1: K=1
1. Create Candidate Set C1: Count the occurrences of each individual item (I1, I2, I3, I4, I5) in the dataset.
2. Generate Frequent Itemset L1: Keep only the items with a support count greater than or equal to the minimum support
count (min_support=2).
Step-2: K=2
1. Generate Candidate Set C2: Join the items in L1 to form pairs and filter out those pairs that do not satisfy the Apriori
property (having subsets with minimum support). Count the occurrences of each candidate pair in the dataset.
2. Generate Frequent Itemset L2: Keep only the pairs with a support count greater than or equal to the minimum support
count.
Step-3: K=3
1. Generate Candidate Set C3: Join the itemsets in L2 to form triplets and filter out those triplets that do not satisfy the Apriori
property. Count the occurrences of each candidate triplet in the dataset.
2. Generate Frequent Itemset L3: Keep only the triplets with a support count greater than or equal to the minimum support
count.
Continue this process until no more frequent itemsets can be generated.
Association Rule Generation:
1. For each frequent itemset, generate all non-empty proper subsets A and pair each with its complement B (the remaining items of the itemset).
2. Calculate the confidence for each rule: Confidence(A->B) = Support_count(A∪B) / Support_count(A).
3. Keep only the rules with confidence greater than or equal to the minimum confidence threshold (min_confidence=50%).
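The level-wise steps and the rule generation above can be sketched in Python as follows. The transaction list is an assumption chosen for illustration (it reuses the item names I1 to I5 but is not necessarily the data set the question refers to), and the function names apriori and generate_rules are hypothetical. The thresholds match those stated above: min_support=2 and min_confidence=50%.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {frozenset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]  # C1
    frequent = {}
    k = 1
    while candidates:
        # Count each candidate, then keep those meeting the minimum support (Lk).
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
        # Join: merge pairs of frequent itemsets that form an itemset of size k.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: drop candidates that have an infrequent (k-1)-subset.
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent

def generate_rules(frequent, min_conf):
    """Return (A, B, confidence) for every rule A -> B meeting min_conf."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):  # non-empty proper subsets only
            for subset in combinations(sorted(itemset), r):
                a = frozenset(subset)
                conf = count / frequent[a]  # Support_count(A ∪ B) / Support_count(A)
                if conf >= min_conf:
                    rules.append((set(a), set(itemset - a), conf))
    return rules

# Illustrative transactions (an assumption, not the data from the question).
transactions = [
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I1", "I2", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3"],
]
frequent = apriori(transactions, min_count=2)
rules = generate_rules(frequent, min_conf=0.5)
for a, b, conf in rules:
    print(sorted(a), "->", sorted(b), round(conf, 2))
```

On this illustrative data the loop stops at K=4: the only candidate of size four, {I1, I2, I3, I5}, is pruned because its subset {I1, I3, I5} is not frequent.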

Q6 Write about the Associative Classification method.
ASSOCIATIVE CLASSIFICATION is a method that combines principles from association rule mining and classification algorithms. It
aims to leverage the discovered associations in a dataset to enhance the performance of classification models. The following are
the key steps involved in the Associative Classification method:
1. Association Rule Mining: The first step involves mining association rules from the dataset. Association rule mining identifies
interesting relationships or associations between different attributes in the data. Common algorithms used for this step include
Apriori and FP-Growth.
2. Rule Pruning: Once the association rules are generated, a pruning step is often applied to filter out less relevant or less
significant rules. Pruning criteria may include measures like support, confidence, or other relevance measures.
3. Rule-to-Class Transformation: The association rules, typically in the form of "if-then" statements, are transformed into
classification rules. The antecedent part of the association rule becomes the condition for classifying instances, and the
consequent part becomes the predicted class label.
4. Building the Classification Model: The transformed rules are used to build a classification model. This model captures the
relationships and dependencies identified during the association rule mining phase.
5. Classifying New Instances: When a new, unseen instance needs to be classified, the rules are applied to determine the
predicted class label. Multiple rules may apply to a single instance, and conflict resolution strategies are employed to handle
such situations.
6. Conflict Resolution: Conflict resolution addresses cases where multiple rules predict conflicting class labels for the same
instance. Strategies include selecting the rule with the highest confidence, using a voting mechanism, or considering additional
criteria to resolve conflicts.
7. Evaluation and Fine-Tuning: The performance of the associative classification model is evaluated using standard metrics such as
accuracy, precision, recall, and F1-score. Fine-tuning may be performed to improve the model's performance.
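The steps above can be sketched in Python as a deliberately simplified illustration. It mines rules of the form "attribute values -> class label" by brute-force counting (rather than Apriori or FP-Growth), prunes them by support count and confidence, and resolves conflicts by taking the matching rule with the highest confidence. The function names, the limit of two conditions per rule, the fallback to a default class, and the toy records are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations

def mine_class_rules(records, labels, min_count, min_conf, max_len=2):
    """Return rules (antecedent, label, confidence, count), best rule first."""
    ante_counts = Counter()   # how often each antecedent occurs
    joint_counts = Counter()  # how often it occurs together with each label
    for rec, label in zip(records, labels):
        items = sorted(rec)
        for r in range(1, max_len + 1):
            for ante in combinations(items, r):
                ante_counts[ante] += 1
                joint_counts[(ante, label)] += 1

    rules = []
    for (ante, label), n in joint_counts.items():
        conf = n / ante_counts[ante]
        if n >= min_count and conf >= min_conf:  # rule pruning
            rules.append((frozenset(ante), label, conf, n))
    # Rank: higher confidence, then higher support, then shorter antecedent.
    rules.sort(key=lambda r: (-r[2], -r[3], len(r[0]), sorted(r[0]), r[1]))
    return rules

def classify(record, rules, default):
    """Conflict resolution: the first (highest-ranked) matching rule wins."""
    for ante, label, conf, n in rules:
        if ante <= record:
            return label
    return default  # no rule matched

# Hypothetical toy data: each record is a set of "attribute=value" items.
records = [
    {"outlook=sunny", "windy=no"},
    {"outlook=sunny", "windy=yes"},
    {"outlook=rainy", "windy=yes"},
    {"outlook=rainy", "windy=no"},
    {"outlook=sunny", "windy=no"},
]
labels = ["yes", "no", "no", "yes", "yes"]

rules = mine_class_rules(records, labels, min_count=2, min_conf=0.8)
default = Counter(labels).most_common(1)[0][0]  # majority class as fallback
print(classify({"outlook=rainy", "windy=yes"}, rules, default))
```

Evaluating such a model (step 7) would then compare the predicted labels against held-out true labels using accuracy, precision, recall, or F1-score.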

Copyright © 2024 Jayanti Rajdevendra Pande.
All rights reserved.
This content may be printed for personal use only. It may not be copied, distributed, or used for any other purpose
without the express written permission of the copyright owner.
This content is protected by copyright law. Any unauthorized use of the content may violate copyright laws and
other applicable laws.
For any further queries contact on email: jayantipande17@gmail.com
