Copyright © 2024 Jayanti Rajdevendra Pande. All rights reserved.
RASHTRASANT TUKDOJI MAHARAJ NAGPUR UNIVERSITY
MBA
SEMESTER: 3
SPECIALIZATION
BUSINESS ANALYTICS (BA 2)
SUBJECT
DATA MINING
MODULE NO : 4
ASSOCIATION RULES
- Jayanti R Pande
DGICM College, Nagpur
Q1 What is Market Basket Analysis? What is the significance of Market Basket Analysis for retailers? What are the necessary steps for
implementing Market Basket Analysis?
MARKET BASKET ANALYSIS (MBA) is a data mining technique used in the field of retail and marketing to discover associations
and correlations between items that customers frequently buy together. The primary goal is to identify patterns and
relationships within transactional data, helping retailers understand customer behavior and preferences. Here's a breakdown of
its significance and essential steps for implementation:
SIGNIFICANCE OF MARKET BASKET ANALYSIS FOR RETAILERS
1. Increases Customer Engagement: By understanding the relationships between products, retailers can create targeted
marketing campaigns and promotions, increasing customer engagement.
2. Boosts Sales and Increases ROI: Tailoring promotions based on customer buying patterns can lead to higher sales and a
better return on investment.
3. Improves Customer Experience: Personalized recommendations and promotions enhance the overall shopping experience
for customers.
4. Optimizes Marketing Strategies and Campaigns: Retailers can optimize their marketing efforts by focusing on promoting
items that are frequently purchased together.
5. Helps Understand Customers Better: MBA provides insights into customer preferences, enabling retailers to stock relevant
products and improve overall satisfaction.
6. Identifies Customer Behavior and Patterns: Retailers can uncover hidden patterns and trends in customer behavior, aiding in
strategic decision-making.
ESSENTIAL STEPS FOR IMPLEMENTING MARKET BASKET ANALYSIS
1 Define Minimum Support and Confidence:
Support: The proportion of transactions that contain a particular itemset.
Confidence: For a rule A -> B, the proportion of transactions containing the antecedent A that also contain the consequent B.
2 Identify Subsets with Higher Support: Find all itemsets (subsets) whose support is at least the defined minimum support threshold.
3 Generate Association Rules: For each high-support itemset, generate the association rules that meet the defined minimum confidence
threshold.
4 Sort Association Rules: Rank the association rules in decreasing order of confidence.
5 Analyze Rules: Examine the association rules along with their confidence and support values. Identify meaningful and actionable insights
from the discovered patterns.
Implementing Market Basket Analysis typically involves using algorithms like the Apriori algorithm or FP-growth algorithm. These
algorithms efficiently mine frequent itemsets and generate association rules from transactional data.
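The support and confidence definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the transactions and the helper names support and confidence are invented for the example.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical toy transactions, invented purely for illustration.
transactions: List[FrozenSet[str]] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]


def support(itemset: Iterable[str], db: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in db if items <= t) / len(db)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               db: List[FrozenSet[str]]) -> float:
    """support(antecedent plus consequent) divided by support(antecedent).

    Assumes the antecedent occurs in at least one transaction.
    """
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    return support(both, db) / support(a, db)


bread_milk_support = support({"bread", "milk"}, transactions)
bread_to_milk_conf = confidence({"bread"}, {"milk"}, transactions)
print(bread_milk_support, bread_to_milk_conf)
```

In this toy data, bread and milk appear together in 2 of 4 transactions, and bread appears in 3, so the rule bread -> milk has support 0.5 and confidence 2/3.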
Q2 Compare Apriori and FP Growth Algorithm
Apriori                                         | FP Growth
• Array-based algorithm                         | • Tree-based algorithm
• Uses Join and Prune techniques                | • Constructs conditional frequent pattern trees
• Utilizes breadth-first search                 | • Utilizes depth-first search
• Level-wise approach for pattern generation    | • Pattern growth approach considering existing data
• Candidate sets can grow exponentially         | • No candidate generation; tree construction is roughly linear in database size
• Highly parallelizable candidate generation    | • Harder to parallelize: every path depends on the shared root
• Memory-intensive due to candidate generation  | • More memory-efficient thanks to the compact tree structure
• Scans database multiple times                 | • Scans dataset only twice for constructing the tree
• Performance impacted by the number of items   | • Less impacted by the number of items
Apriori Algorithm
The Apriori algorithm is a classic algorithm used for association rule mining, a technique in data mining that identifies relationships
between variables in large datasets. It was proposed by Agrawal and Srikant in 1994. The primary objective of the Apriori algorithm is to
find frequent item sets in a transaction database, which are sets of items that frequently occur together. These frequent itemsets are
then used to generate association rules.
FP-Growth Algorithm
The FP-Growth (Frequent Pattern Growth) algorithm is an alternative approach to association rule mining that aims to address some of
the limitations of the Apriori algorithm. It was proposed by Han, Pei, and Yin in 2000.
Q3 What is FP Growth algorithm? State the advantages and Disadvantages of FP Growth Algorithm.
FP-GROWTH ALGORITHM
The FP-Growth Algorithm is an approach for finding frequent item sets in a database without using candidate generation. It
utilizes a divide-and-conquer strategy and employs a special data structure known as the frequent-pattern tree (FP-tree).
Algorithm Workflow:
1. Compresses the input database by creating an FP-tree to represent frequent items.
2. Divides the compressed database into sets of conditional databases, each associated with one frequent pattern.
3. Mines each conditional database separately.
Search Cost Reduction: Reduces search costs by recursively looking for short patterns and then concatenating them into long
frequent patterns.
Handling Large Databases: In large databases, where holding the FP tree in main memory is impractical, the algorithm partitions
the database into smaller databases (projected databases) and constructs an FP-tree for each.
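The compression step of the workflow above can be sketched as follows. This is a simplified illustration of FP-tree construction only: the header table, node links and the recursive mining of conditional databases are left out, and the class FPNode, the functions build_fp_tree and count_nodes, and the sample data are all hypothetical names chosen for this example.

```python
from collections import Counter
from typing import Dict, List, Optional


class FPNode:
    """One node of the FP-tree: an item, its count, and its children."""

    def __init__(self, item: Optional[str], parent: Optional["FPNode"]) -> None:
        self.item = item
        self.count = 0
        self.parent = parent
        self.children: Dict[str, "FPNode"] = {}


def build_fp_tree(db: List[List[str]], min_count: int) -> FPNode:
    # Scan 1: count how many transactions contain each item.
    counts = Counter(item for t in db for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_count}

    root = FPNode(None, None)
    # Scan 2: insert each transaction, keeping only frequent items and
    # ordering them by descending frequency (ties broken alphabetically)
    # so that common prefixes share a path.
    for t in db:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root


def count_nodes(node: FPNode) -> int:
    """Number of nodes in the tree, including the root."""
    return 1 + sum(count_nodes(c) for c in node.children.values())


# Hypothetical transactions, invented for illustration.
db = [["a", "b"], ["b", "c", "d"], ["a", "b", "d"], ["a", "b", "c"], ["b", "c"]]
tree = build_fp_tree(db, min_count=2)
print(count_nodes(tree))
```

The five transactions hold twelve item occurrences in total, yet the tree needs only six item nodes plus the root, because every transaction shares the prefix b. That sharing is the source of the compact representation.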
ADVANTAGES OF FP-GROWTH ALGORITHM
1. Reduced Database Scans: Needs to scan the database only twice, whereas Apriori scans the transactions once for every iteration.
2. Faster Execution: No candidate itemsets are generated or tested, which typically makes it faster than Apriori.
3. Compact Memory Storage: Stores the database in a compact version in memory, improving efficiency.
4. Scalability: Efficient and scalable for mining both long and short frequent patterns.
DISADVANTAGES OF FP-GROWTH ALGORITHM
1. Complex FP Tree Construction: Building the FP tree is more cumbersome and harder to implement than the Apriori algorithm.
2. Potential Expense: Building the tree can be expensive when transactions share few common prefixes, because the tree then compresses the data poorly.
3. Memory Constraints: The FP tree may not fit in main memory when the database is very large.
Q4 What are the different types of association rules in data mining? Briefly describe the algorithms used for association rule
mining.
TYPES OF ASSOCIATION RULES IN DATA MINING
1. Multi-Relational Association Rule (MRAR): Derived from multi-relational databases, an MRAR describes an entity through its
different relationships and so captures indirect relationships between entities.
2. Generalized Association Rule: Used to discover hidden patterns in data; generalized association rules give a rough, higher-level
idea of the interesting patterns, for example by grouping items into broader categories.
3. Quantitative Association Rules: Involve numeric attributes in at least one part of the rule, which distinguishes them from generalized
association rules, where both sides consist of categorical attributes.
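One common way to handle the numeric attributes of a quantitative rule is to map each value to an interval, so that the interval can be treated like any other item. The sketch below assumes that approach; the function names interval_item and record_to_items, the bin edges and the sample record are hypothetical.

```python
import bisect
from typing import Dict, List, Set, Union


def interval_item(attr: str, value: float, edges: List[float]) -> str:
    """Label the half-open interval [lo, hi) that contains `value`.

    `edges` must be sorted in ascending order.
    """
    i = bisect.bisect_right(edges, value)
    lo = "-inf" if i == 0 else edges[i - 1]
    hi = "inf" if i == len(edges) else edges[i]
    return f"{attr}=[{lo},{hi})"


def record_to_items(record: Dict[str, Union[str, float]],
                    bins: Dict[str, List[float]]) -> Set[str]:
    """Turn one record into a set of items; numeric attributes listed in
    `bins` become interval items, all others become attr=value items."""
    items = set()
    for attr, value in record.items():
        if attr in bins:
            items.add(interval_item(attr, value, bins[attr]))
        else:
            items.add(f"{attr}={value}")
    return items


# Hypothetical bin edges and record, invented for illustration.
bins = {"age": [20, 30, 40]}
items = record_to_items({"age": 34, "buys": "laptop"}, bins)
print(items)
```

After this step, a rule such as age=[30,40) -> buys=laptop can be mined with the same frequent-itemset algorithms used for categorical data.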
ALGORITHMS FOR ASSOCIATION RULE MINING
1 Apriori Algorithm:
Description: Identifies frequent individual items in a database and expands them to larger item sets, ensuring that these item
sets appear sufficiently often in the database.
Key Characteristics: Utilizes a breadth-first search algorithm and generates candidate itemsets with the Apriori property.
2 Eclat Algorithm:
Description: Short for Equivalence Class Clustering and bottom-up Lattice Traversal, Eclat is considered by some to be a more efficient
version of the Apriori algorithm. It employs lattice traversal to find frequent itemsets.
Key Characteristics: Stores each item with the set of IDs of the transactions that contain it, and computes the support of an itemset by intersecting these sets instead of rescanning the database.
3 FP-Growth Algorithm:
Description: Operates in two stages: FP-tree construction, followed by extraction of the frequent itemsets from the tree.
Particularly useful for finding frequent patterns without candidate generation.
Key Characteristics: Uses a divide-and-conquer strategy, creating an FP-tree to represent frequent items and then dividing the
database into conditional databases.
These algorithms play a crucial role in discovering associations and patterns within large datasets. Apriori and Eclat both
generate candidate itemsets and count their support (Apriori by rescanning the database, Eclat by intersecting transaction-ID sets),
whereas FP-Growth eliminates the need for explicit candidate generation, making it more efficient for certain scenarios. Each
algorithm has its strengths and weaknesses, and the choice depends on the specific requirements of the data mining task and the
characteristics of the dataset.
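Eclat's support counting by intersection can be sketched as below. This is a minimal depth-first version written for illustration, assuming the usual vertical layout in which each item maps to the set of transaction IDs that contain it; the function name eclat and the sample transactions are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def eclat(db: List[Set[str]], min_count: int) -> Dict[FrozenSet[str], int]:
    """Return every frequent itemset with its support count."""
    # Build the vertical layout: item -> IDs of transactions containing it.
    tidsets: Dict[str, Set[int]] = {}
    for tid, transaction in enumerate(db):
        for item in transaction:
            tidsets.setdefault(item, set()).add(tid)

    result: Dict[FrozenSet[str], int] = {}

    def extend(prefix: FrozenSet[str],
               candidates: List[Tuple[str, Set[int]]]) -> None:
        # Each candidate tidset already holds the transactions that
        # contain the prefix together with that item.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            result[itemset] = len(tids)
            # Support of a longer itemset = size of the intersection.
            longer = []
            for other, other_tids in candidates[i + 1:]:
                common = tids & other_tids
                if len(common) >= min_count:
                    longer.append((other, common))
            if longer:
                extend(itemset, longer)

    start = sorted(((i, t) for i, t in tidsets.items() if len(t) >= min_count),
                   key=lambda pair: pair[0])
    extend(frozenset(), start)
    return result


# Hypothetical transactions, invented for illustration.
db = [{"a", "b"}, {"b", "c", "d"}, {"a", "b", "d"}, {"a", "b", "c"}, {"b", "c"}]
frequent = eclat(db, min_count=2)
print(len(frequent))
```

The database is read once to build the transaction-ID sets; after that, every support count comes from a set intersection rather than another scan.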
Q5 How to Apply the Apriori algorithm for the given data.
Step-1: K=1
1. Create Candidate Set C1: Count the occurrences of each individual item (I1, I2, I3, I4, I5) in the dataset.
2. Generate Frequent Itemset L1: Keep only the items with a support count greater than or equal to the minimum support
count (min_support=2).
Step-2: K=2
1. Generate Candidate Set C2: Join the items in L1 to form pairs and filter out any pair that violates the Apriori
property (every non-empty subset of a frequent itemset must itself be frequent). Count the occurrences of each remaining candidate pair in the dataset.
2. Generate Frequent Itemset L2: Keep only the pairs with a support count greater than or equal to the minimum support
count.
Step-3: K=3
1. Generate Candidate Set C3: Join the itemsets in L2 to form triplets and filter out any triplet that violates the Apriori
property. Count the occurrences of each remaining candidate triplet in the dataset.
2. Generate Frequent Itemset L3: Keep only the triplets with a support count greater than or equal to the minimum support
count.
Continue this process until no more frequent itemsets can be generated.
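The level-wise steps above can be sketched in Python. The dataset referred to in the question is not reproduced in this text, so the five transactions below are a hypothetical stand-in that reuses only the item names I1 to I5 and min_support=2; the function name apriori is likewise chosen for this example.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List


def apriori(db: List[FrozenSet[str]], min_count: int) -> Dict[FrozenSet[str], int]:
    """Return every frequent itemset with its support count."""

    def count(cands: Iterable[FrozenSet[str]]) -> Dict[FrozenSet[str], int]:
        # One pass over the database per level.
        return {c: sum(1 for t in db if c <= t) for c in cands}

    # K=1: candidate set C1 is every single item; L1 keeps the frequent ones.
    singles = {frozenset([i]) for t in db for i in t}
    level = {s: n for s, n in count(singles).items() if n >= min_count}
    frequent = dict(level)

    k = 2
    while level:
        prev = list(level)
        # Join: combine frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: drop candidates that have an infrequent (k-1)-subset.
        cands = {c for c in joined
                 if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = {s: n for s, n in count(cands).items() if n >= min_count}
        frequent.update(level)
        k += 1
    return frequent


# Hypothetical transactions, invented for illustration only.
db = [
    frozenset({"I1", "I2", "I3"}),
    frozenset({"I2", "I3", "I4"}),
    frozenset({"I1", "I2", "I3", "I5"}),
    frozenset({"I2", "I5"}),
    frozenset({"I1", "I3"}),
]
frequent = apriori(db, min_count=2)
for itemset, n in sorted(frequent.items(), key=lambda p: (len(p[0]), sorted(p[0]))):
    print(sorted(itemset), n)
```

On this stand-in data, I4 is dropped at K=1, the pairs {I1, I5} and {I3, I5} are dropped at K=2, and the only frequent triplet is {I1, I2, I3}. The candidates {I1, I2, I5} and {I2, I3, I5} are pruned without being counted because each contains an infrequent pair.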
Association Rule Generation:
1. For each frequent itemset with at least two items, take every non-empty proper subset A and pair it with its complement B (the remaining items of the itemset).
2. Calculate the confidence for each rule: Confidence(A->B) = Support_count(A∪B) / Support_count(A).
3. Keep only the rules with confidence greater than or equal to the minimum confidence threshold (min_confidence=50%).
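Rule generation from a single frequent itemset can be sketched as follows. The support counts in the example are hypothetical values invented for illustration, not the counts from the question's dataset, and the function name rules_from is an assumption of this sketch.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str], float]


def rules_from(itemset: FrozenSet[str],
               counts: Dict[FrozenSet[str], int],
               min_conf: float) -> List[Rule]:
    """Generate the rules A -> B from one frequent itemset.

    A ranges over the non-empty proper subsets of the itemset and
    B is the complement of A within the itemset.
    """
    rules = []
    items = sorted(itemset)
    for size in range(1, len(items)):
        for subset in combinations(items, size):
            a = frozenset(subset)
            b = itemset - a
            conf = counts[itemset] / counts[a]
            if conf >= min_conf:
                rules.append((a, b, conf))
    return rules


# Hypothetical support counts for {I1, I2, I3} and all of its subsets.
counts = {
    frozenset({"I1"}): 3,
    frozenset({"I2"}): 4,
    frozenset({"I3"}): 4,
    frozenset({"I1", "I2"}): 2,
    frozenset({"I1", "I3"}): 3,
    frozenset({"I2", "I3"}): 3,
    frozenset({"I1", "I2", "I3"}): 2,
}
target = frozenset({"I1", "I2", "I3"})
rules_50 = rules_from(target, counts, min_conf=0.5)
rules_70 = rules_from(target, counts, min_conf=0.7)
for a, b, conf in rules_50:
    print(sorted(a), "->", sorted(b), round(conf, 2))
```

With these counts, all six possible rules reach the 50% threshold, while raising the threshold to 70% leaves only {I1, I2} -> {I3}, whose confidence is 2/2 = 100%.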
Q6 Write about Associative Classification method.
ASSOCIATIVE CLASSIFICATION is a method that combines principles from association rule mining and classification algorithms. It
aims to leverage the discovered associations in a dataset to enhance the performance of classification models. The following are
the key steps involved in the Associative Classification method:
1. Association Rule Mining: The first step involves mining association rules from the dataset. Association rule mining identifies
interesting relationships or associations between different attributes in the data. Common algorithms used for this step include
Apriori and FP-Growth.
2. Rule Pruning: Once the association rules are generated, a pruning step is often applied to filter out less relevant or less
significant rules. Pruning criteria may include support, confidence, or other relevance measures.
3. Rule-to-Class Transformation: The association rules, typically in the form of "if-then" statements, are transformed into
classification rules. The antecedent part of the association rule becomes the condition for classifying instances, and the
consequent part, which must be a class label, becomes the predicted class.
4. Building the Classification Model: The transformed rules are used to build a classification model. This model captures the
relationships and dependencies identified during the association rule mining phase.
5. Classifying New Instances: When a new, unseen instance needs to be classified, the rules are applied to determine the
predicted class label. Multiple rules may apply to a single instance, and conflict resolution strategies are employed to handle
such situations.
6. Conflict Resolution: Conflict resolution addresses cases where multiple rules predict conflicting class labels for the same
instance. Strategies include selecting the rule with the highest confidence, using a voting mechanism, or considering additional
criteria to resolve conflicts.
7. Evaluation and Fine-Tuning: The performance of the associative classification model is evaluated using standard metrics such as
accuracy, precision, recall, and F1-score. Fine-tuning may be performed to improve the model's performance.
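Steps 3 to 6 can be sketched with a simple ordered rule list. This sketch assumes one particular conflict resolution strategy from the list above (the highest-confidence matching rule wins, with support as the tie-breaker) and a default class for instances that no rule covers. The rules, attribute values and names ClassRule, rank and classify are all hypothetical.

```python
from typing import FrozenSet, List, NamedTuple


class ClassRule(NamedTuple):
    antecedent: FrozenSet[str]  # condition: items the instance must contain
    label: str                  # consequent: the predicted class
    confidence: float
    support: float


def rank(rules: List[ClassRule]) -> List[ClassRule]:
    """Order rules by descending confidence, then descending support."""
    return sorted(rules, key=lambda r: (-r.confidence, -r.support))


def classify(instance: FrozenSet[str], ranked: List[ClassRule],
             default: str) -> str:
    """Return the label of the first (highest-ranked) rule that matches."""
    for rule in ranked:
        if rule.antecedent <= instance:
            return rule.label
    return default


# Hypothetical class association rules, invented for illustration.
rules = [
    ClassRule(frozenset({"income=high"}), "buys", 0.80, 0.30),
    ClassRule(frozenset({"student=yes"}), "no_buy", 0.60, 0.20),
    ClassRule(frozenset({"income=high", "student=yes"}), "no_buy", 0.90, 0.10),
]
ranked = rank(rules)

first = classify(frozenset({"income=high", "student=yes"}), ranked, "unknown")
second = classify(frozenset({"income=high", "student=no"}), ranked, "unknown")
third = classify(frozenset({"income=low"}), ranked, "unknown")
print(first, second, third)
```

The first instance matches all three rules, which predict conflicting labels; ranking by confidence resolves the conflict in favour of the 0.90 rule. A voting scheme over all matching rules would be a drop-in alternative to classify.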
Copyright © 2024 Jayanti Rajdevendra Pande.
All rights reserved.
This content may be printed for personal use only. It may not be copied, distributed, or used for any other purpose
without the express written permission of the copyright owner.
This content is protected by copyright law. Any unauthorized use of the content may violate copyright laws and
other applicable laws.
For any further queries contact on email: jayantipande17@gmail.com
