Association Rules and Frequent Pattern Growth Algorithms
CIS 435
Francisco E. Figueroa
Executive Summary
In recent years, we have witnessed exponential growth in the amount of data generated and
stored in all fields, including science, business, and retailing. Data mining can be defined
as the process of applying computational techniques to find patterns in data, generating
knowledge and wisdom that create new value for companies. By conducting association rule
mining on the given historical sales data, we can provide actionable intelligence to the
business leadership team so that the store can be prepared for the heavy snowstorm.
Association Rules Overview
The goal of association rule mining is to identify all frequent itemsets above a user-specified
threshold (called support) and to generate all association rules above another threshold (called
confidence) using these frequent itemsets as input. Association analysis is useful for
discovering relationships hidden in large data sets. The uncovered relationships can be
represented in the form of association rules or sets of frequent items. Retailers can use this
type of rule to help them identify new business opportunities for cross-selling products to
clients. For example, the following rule could potentially be extracted from the data:
{milk} ----> {bread}. The rule suggests that a strong relationship exists between the sale of milk
and bread because many customers who buy milk also buy bread. An association rule is an
implication expression of the form X ---> Y, where X and Y are disjoint itemsets.
Strength, Confidence and Lift
The strength of an association rule can be measured in terms of its support and
confidence. Support determines how often a rule is applicable to a given data set, while
confidence determines how frequently items in Y appear in transactions that contain X. Support
is an important measure because a rule that has very low support may occur simply by chance
and is likely to be uninteresting from a business perspective. Support can therefore be used to
eliminate uninteresting rules.
Confidence measures the reliability of the inference made by the rule. For a given rule
X ---> Y, the higher the confidence, the more likely it is for Y to be present in transactions
that contain X. It also provides an estimate of the conditional probability of Y given X. The
inference made by an association rule suggests a strong co-occurrence relationship between the
items in the antecedent and the consequent of the rule.
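The two measures above can be illustrated with a minimal Python sketch. The transactions are
hypothetical toy data invented for this example (they are not the sales data analyzed in this
report), and the function names are chosen only for illustration.

```python
from typing import FrozenSet, List

# Hypothetical toy transactions, invented for illustration only.
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"milk", "bread"}),
    frozenset({"milk", "bread", "beer"}),
    frozenset({"milk"}),
    frozenset({"bread"}),
    frozenset({"beer"}),
]


def support(itemset: FrozenSet[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(x: FrozenSet[str], y: FrozenSet[str],
               transactions: List[FrozenSet[str]]) -> float:
    """Estimate of P(Y | X), computed as support(X and Y together) / support(X)."""
    return support(x | y, transactions) / support(x, transactions)


x, y = frozenset({"milk"}), frozenset({"bread"})
rule_support = support(x | y, TRANSACTIONS)       # 2 of the 5 transactions
rule_confidence = confidence(x, y, TRANSACTIONS)  # 2 of the 3 milk transactions
print(rule_support, rule_confidence)
```

In this toy data, {milk} ---> {bread} applies to two of five transactions, and two of the three
transactions that contain milk also contain bread.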
The Lift is equal to the confidence factor divided by the expected confidence. A credible
rule has a large relative confidence factor, a relatively large level of support, and a value of lift
greater than 1. Rules having a high level of confidence but little support should be interpreted
with caution. (SAS, 2000)
In other words, the lift of the rule X ---> Y is the confidence of the rule divided by
the expected confidence, assuming that the itemsets are independent. Then we can say that:
- a lift greater than 1 indicates that X and Y appear together more often than
expected; this means that the occurrence of X has a positive effect on the occurrence of Y, or
that X is positively correlated with Y.
- a lift smaller than 1 indicates that X and Y appear together less often than expected;
this means that the occurrence of X has a negative effect on the occurrence of Y, or that X is
negatively correlated with Y.
- a lift near 1 indicates that X and Y appear together almost as often as expected;
this means that the occurrence of X has almost no effect on the occurrence of Y, or that X and Y
are close to independent.
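The three cases above can be checked with a small Python sketch. The counts are hypothetical,
and the tolerance used to decide that a lift is "near 1" is an arbitrary choice made for
illustration, not a value taken from the cited sources.

```python
def lift(n_total: int, n_x: int, n_y: int, n_xy: int) -> float:
    """Lift of X ---> Y computed from raw transaction counts."""
    confidence = n_xy / n_x               # how often Y appears when X does
    expected_confidence = n_y / n_total   # how often Y appears overall
    return confidence / expected_confidence


def interpret(value: float, tolerance: float = 0.05) -> str:
    """Classify a lift value; `tolerance` is an illustrative assumption."""
    if value > 1 + tolerance:
        return "positive"
    if value < 1 - tolerance:
        return "negative"
    return "near independent"


# Hypothetical counts: 100 transactions, X in 20 of them, Y in 50 of them.
for n_xy in (15, 10, 5):
    value = lift(100, 20, 50, n_xy)
    print(n_xy, value, interpret(value))
```

With X and Y appearing together in 15, 10, and 5 transactions, the lift is 1.5, 1.0, and 0.5
respectively, covering the positive, near-independent, and negative cases.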
Apriori
The Apriori algorithm was proposed for mining frequent itemsets to obtain strong Boolean
association rules. A frequent itemset is a set of items that occurs in the transactions with at
least a minimum specified support. A strong rule is one that satisfies both minimum support and
minimum confidence. The Apriori algorithm uses an iterative level-wise search, where k-itemsets
(itemsets that contain k items) are used to explore (k+1)-itemsets, to mine frequent itemsets
from a transactional database for Boolean association rules. The procedure is to first find the
set of frequent 1-itemsets (k=1). This set is denoted L1. L1 is then used to find the set of
frequent 2-itemsets, L2, which is in turn used to find L3, and so on, until no more frequent
k-itemsets can be found. Each iteration involves two steps: 1) generate candidate k-itemsets and
2) determine the support of each candidate using the transaction database. Infrequent itemsets
are then pruned, and strong rules are generated from the frequent itemsets.
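The level-wise search described above can be sketched in Python as follows. This is a minimal
illustration, not the implementation used by Weka: the function name, the use of an absolute
count (`min_count`) instead of a support fraction, and the toy transactions are all assumptions
made for this example. Rule generation from the frequent itemsets is left out.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def apriori(transactions: List[FrozenSet[str]],
            min_count: int) -> Dict[FrozenSet[str], int]:
    """Return every itemset that occurs in at least `min_count` transactions."""
    # L1: count single items and keep the frequent ones.
    counts: Dict[FrozenSet[str], int] = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)

    k = 2
    while current:
        # Step 1: join frequent (k-1)-itemsets into candidate k-itemsets and
        # keep only candidates whose (k-1)-subsets are all frequent.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Step 2: determine each candidate's support by scanning the transactions.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        # Prune infrequent candidates; the survivors form L_k.
        current = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy transactions, invented for illustration only.
toy = [
    frozenset({"water", "soap", "beer"}),
    frozenset({"water", "soap"}),
    frozenset({"water", "beer"}),
    frozenset({"soap", "beer"}),
    frozenset({"water", "soap", "beer"}),
]
result = apriori(toy, min_count=3)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

On this toy data the three single items and the three pairs are frequent, while the only
3-itemset occurs in just two transactions and is pruned, which ends the search.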
FP Growth
FP-Growth is an improvement on Apriori designed to eliminate some of its heaviest bottlenecks.
FP-Growth addresses these problems by using a structure called an FP-Tree. In the FP-Tree, each
node represents an item and its current count, and each branch represents a different
association. The whole algorithm is divided into 5 simple steps: first, count all the items in
all transactions; second, apply the threshold; third, sort the items by their counts; fourth,
build the tree from each transaction, inserting its items in the order in which they appear in
the sorted list; and fifth, traverse every branch of the tree and include in the associations
only the nodes whose count passes the threshold. The biggest advantages of FP-Growth are that
the algorithm needs to read the file only twice, that it removes the need to calculate the pairs
to be counted, and that it does not require as much memory as Apriori. (Alfaro, 2016)
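The first four steps above (the two passes over the data) can be sketched in Python as follows.
This is a minimal illustration under stated assumptions: the class and function names, the
alphabetical tie-breaking rule, and the toy transactions are invented for this example, and the
fifth step (traversing the tree to extract the associations) is not shown.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


class FPNode:
    """One node of the FP-Tree: an item, its count, and its child nodes."""

    def __init__(self, item: Optional[str], parent: Optional["FPNode"] = None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children: Dict[str, "FPNode"] = {}


def build_fp_tree(transactions: List[List[str]],
                  min_count: int) -> Tuple[FPNode, List[str]]:
    # Pass 1 (steps 1 to 3): count items, apply the threshold, sort by count.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item: c for item, c in counts.items() if c >= min_count}
    # Ties are broken alphabetically so the result is deterministic (an assumption).
    order = sorted(frequent, key=lambda item: (-frequent[item], item))
    rank = {item: pos for pos, item in enumerate(order)}

    # Pass 2 (step 4): insert each transaction's frequent items in sorted order,
    # sharing existing prefixes and incrementing the counts along the path.
    root = FPNode(None)
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.__getitem__):
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, parent=node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, order


# Hypothetical toy transactions, invented for illustration only.
toy = [
    ["water", "soap", "beer"],
    ["water", "soap"],
    ["water", "beer"],
    ["water", "soap", "generator"],
]
root, order = build_fp_tree(toy, min_count=2)
print(order)
print(root.children["water"].count)
```

Because every toy transaction starts with the most frequent item, all four share a single
"water" node; "generator" occurs only once and never enters the tree.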
Top 10 Products
When applying Apriori (MetricType: confidence; numrules 40; car:True), we obtained the following
products: bath tissue, hat, water, soap, beer, flashlights, rock salt, protein bars,
blankets and milk.
Top 10 Association Rules
When applying Apriori and FP Growth we obtained the following results:
FP Growth: MetricType: confidence; numrules: 40
1. [WATER=T]: 99 ==> [Soap=T]: 99 <conf:(1)> lift:(1.01) lev:(0.01) conv:(0.99)
2. [Soap=T]: 99 ==> [WATER=T]: 99 <conf:(1)> lift:(1.01) lev:(0.01) conv:(0.99)
3. [Beer=T]: 88 ==> [WATER=T]: 88 <conf:(1)> lift:(1.01) lev:(0.01) conv:(0.88)
4. [Flashlights=T]: 77 ==> [WATER=T]: 77 <conf:(1)> lift:(1.01) lev:(0.01) conv:(0.77)
5. [Milk=T]: 64 ==> [WATER=T]: 64 <conf:(1)> lift:(1.01) lev:(0.01) conv:(0.64)
6. [Blankets=T]: 64 ==> [WATER=T]: 64 <conf:(1)> lift:(1.01) lev:(0.01) conv:(0.64)
7. [Beer=T]: 88 ==> [Soap=T]: 88 <conf:(1)> lift:(1.01) lev:(0.01) conv:(0.88)
8. [Flashlights=T]: 77 ==> [Soap=T]: 77 <conf:(1)> lift:(1.01) lev:(0.01) conv:(0.77)
9. [Milk=T]: 64 ==> [Soap=T]: 64 <conf:(1)> lift:(1.01) lev:(0.01) conv:(0.64)
10. [Blankets=T]: 64 ==> [Soap=T]: 64 <conf:(1)> lift:(1.01) lev:(0.01) conv:(0.64)
Apriori: MetricType: confidence; numrules 40; car:True
1. Bath Tissue=T 55 ==> Hat=T 55 conf:(1)
2. WATER=T Bath Tissue=T 54 ==> Hat=T 54 conf:(1)
3. Bath Tissue=T Soap=T 54 ==> Hat=T 54 conf:(1)
4. WATER=T Bath Tissue=T Soap=T 54 ==> Hat=T 54 conf:(1)
5. Beer=T Bath Tissue=T 48 ==> Hat=T 48 conf:(1)
6. WATER=T Beer=T Bath Tissue=T 48 ==> Hat=T 48 conf:(1)
7. Beer=T Bath Tissue=T Soap=T 48 ==> Hat=T 48 conf:(1)
8. WATER=T Beer=T Bath Tissue=T Soap=T 48 ==> Hat=T 48 conf:(1)
9. Flashlights=T Bath Tissue=T 39 ==> Hat=T 39 conf:(1)
10. Flashlights=T WATER=T Bath Tissue=T 39 ==> Hat=T 39 conf:(1)
These results show that water, soap, beer, and flashlights are strong products.
Top 2 Products Purchased
FP Growth found 19 rules associated with “Generator”. A lift greater than 1 indicates the degree
to which the two occurrences depend on one another, and makes those rules potentially useful for
predicting the consequent in future data sets. In addition, conviction compares how often the
rule would be expected to fail if the two sides were independent with how often it actually
fails. Based on those measures, water and soap are the top two products purchased with the
“Generator”: each rule has a confidence of 1, a lift of 1.01, and a conviction of 0.1, although a
lift this close to 1 indicates only a weak dependence. Beer is another strong product to purchase
with the “Generator” because its rule has a lift of 1.14 and a conviction of 1.2, which means the
rule would be expected to fail about 1.2 times as often if generators and beer were purchased
independently.
FPGrowth found 19 rules (displaying top 19)
Showing only rules that contain: Generator
1. [Generator=T]: 10 ==> [WATER=T]: 10 <conf:(1)> lift:(1.01) lev:(0) conv:(0.1)
2. [Generator=T]: 10 ==> [Soap=T]: 10 <conf:(1)> lift:(1.01) lev:(0) conv:(0.1)
3. [Generator=T]: 10 ==> [Beer=T]: 10 <conf:(1)> lift:(1.14) lev:(0.01) conv:(1.2)
4. [Generator=T]: 10 ==> [WATER=T, Soap=T]: 10 <conf:(1)> lift:(1.01) lev:(0) conv:(0.1)
5. [WATER=T, Generator=T]: 10 ==> [Soap=T]: 10 <conf:(1)> lift:(1.01) lev:(0) conv:(0.1)
6. [Soap=T, Generator=T]: 10 ==> [WATER=T]: 10 <conf:(1)> lift:(1.01) lev:(0) conv:(0.1)
7. [Generator=T]: 10 ==> [WATER=T, Beer=T]: 10 <conf:(1)> lift:(1.14) lev:(0.01) conv:(1.2)
8. [WATER=T, Generator=T]: 10 ==> [Beer=T]: 10 <conf:(1)> lift:(1.14) lev:(0.01) conv:(1.2)
9. [Beer=T, Generator=T]: 10 ==> [WATER=T]: 10 <conf:(1)> lift:(1.01) lev:(0) conv:(0.1)
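The metrics in the listing above can be approximately reproduced with the following Python
sketch. Two things here are assumptions rather than facts from this report: the data set is
taken to contain 100 transactions (a figure consistent with the reported lift values but not
stated anywhere in the text), and conviction is computed in a smoothed form with +1 in the
denominator, which appears to match the reported values when confidence is 1.

```python
from typing import Tuple

N_TOTAL = 100  # assumed number of transactions; not stated in the report


def rule_metrics(n_total: int, n_premise: int, n_consequent: int,
                 n_both: int) -> Tuple[float, float, float, float]:
    """Return (confidence, lift, leverage, conviction) for premise ==> consequent."""
    confidence = n_both / n_premise
    lift = confidence / (n_consequent / n_total)
    # Leverage: observed joint frequency minus the frequency expected
    # if the premise and the consequent were independent.
    leverage = n_both / n_total - (n_premise / n_total) * (n_consequent / n_total)
    # Smoothed conviction (assumption): the +1 avoids division by zero
    # when the rule never fails, i.e. when confidence is 1.
    conviction = (n_premise * (n_total - n_consequent) / n_total) / (n_premise - n_both + 1)
    return confidence, lift, leverage, conviction


# Counts read from the listing: Generator appears in 10 transactions,
# WATER in 99, and Beer in 88; each rule holds in all 10 Generator transactions.
water = rule_metrics(N_TOTAL, n_premise=10, n_consequent=99, n_both=10)
beer = rule_metrics(N_TOTAL, n_premise=10, n_consequent=88, n_both=10)
print([round(v, 2) for v in water])
print([round(v, 2) for v in beer])
```

Under these assumptions, the rounded values agree with rules 1 and 3 in the listing: a lift of
1.01 and a conviction of 0.1 for WATER, and a lift of 1.14 and a conviction of 1.2 for Beer.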
9 Large Itemsets
To obtain itemsets containing 9 items (L(9)), Weka had to be configured with the
following parameters: Apriori, CAR: True, lowerboundMinSupport: 0.1, metricType: confidence,
minMetric 0.09, numrules: 400, outputitemsets: true. We obtained the following large itemsets
L(9).
Large Itemsets L(9):
Rock salt=T WATER=T Snow shovels=T Blankets=T Protien Bars=T Bath Tissue=T Soap=T Hygine Products=T Milk=T 10
Flashlights=T WATER=T Blankets=T Canned food=T Protien Bars=T Beer=T Bath Tissue=T Soap=T Milk=T 10
Flashlights=T WATER=T Blankets=T Canned food=T Protien Bars=T Beer=T Bath Tissue=T Soap=T Bread=T 10
Flashlights=T WATER=T Blankets=T Canned food=T Protien Bars=T Beer=T Bath Tissue=T Milk=T Bread=T 10
Flashlights=T WATER=T Blankets=T Canned food=T Protien Bars=T Beer=T Soap=T Milk=T Bread=T 10
Flashlights=T WATER=T Blankets=T Canned food=T Protien Bars=T Bath Tissue=T Soap=T Milk=T Bread=T 10
Flashlights=T WATER=T Blankets=T Canned food=T Beer=T Bath Tissue=T Soap=T Milk=T Bread=T 13
Flashlights=T WATER=T Blankets=T Protien Bars=T Beer=T Bath Tissue=T Soap=T Milk=T Bread=T 10
Flashlights=T WATER=T Canned food=T Protien Bars=T Beer=T Bath Tissue=T Soap=T Milk=T Bread=T 10
Flashlights=T Blankets=T Canned food=T Protien Bars=T Beer=T Bath Tissue=T Soap=T Milk=T Bread=T 10
WATER=T Snow shovels=T Blankets=T Protien Bars=T Beer=T Bath Tissue=T Soap=T Hygine Products=T Milk=T
WATER=T Blankets=T Canned food=T Protien Bars=T Beer=T Bath Tissue=T Soap=T Milk=T Bread=T 10
Real-World Association Rules - Healthcare
The Institute for Integrated and Intelligent Systems (IIIS) implemented a system prototype,
named the CSCP system, that applies association rule mining to an (assumed) patient database
to discover patterns of diseases that a patient might carry. The patterns recognised by this
implementation can help improve healthcare services and support medical researchers in further
exploring trends among correlated diseases. The technique allows the IIIS to generate
correlations among diseases. (Rashid)
Real-World Association Rules - Retailing
Retailers collect data every day, such as transactional data, customer demographics,
and product sales based on parameters such as seasons and festivals. To convert this data into
knowledge and wisdom, it is necessary to discover and understand the underlying patterns
in the organisation’s operations. Analysis of past transaction data is a commonly used approach
to improving the quality of business decisions. Extraction of frequent itemsets is essential to
mining interesting patterns from datasets. A typical usage scenario for searching frequent
patterns is the so-called “market basket analysis”, which involves analysing the transactional
data of a supermarket or retail store in order to determine which products are purchased
together and how often, and to examine customer purchase preferences. (Prasad, 2011)
Real-World Association Rules - Finance
Bankruptcy prediction is very important for any organization. Financial statements, which
comprise both the balance sheet and the income statement, are analyzed and then used to build a
bankruptcy prediction model. The association rule mining algorithm improves the efficiency of
the proposed method by providing relevant results based on the associations between
businesses’ financial statements. (Martin, 2011)
References:
Rashid, M., Hoque, T., Sattar, A. Association Rules Mining Based Clinical Observations.
Institute for Integrated and Intelligent Systems (IIIS). Retrieved from
https://arxiv.org/pdf/1401.2571.pdf
Kouris, I.N., Makris, C., Theodoridis, E., Tsakalidis, A. Association Rules Mining for Retail
Organizations. Retrieved from
http://www.igi-global.com/viewtitlesample.aspx?id=13583&ptid=362&t=association+rules+mining+for+retail+organizations
Prasad, P., Malik, L. Using Association Rule Mining for Extracting Product Sales Patterns in
Retail Store Transactions. 2011. International Journal on Computer Science and Engineering.
Retrieved from http://www.enggjournals.com/ijcse/doc/IJCSE11-03-05-185.pdf
SAS. The Assoc Procedure. Retrieved from
http://support.sas.com/documentation/onlinedoc/miner/em43/assoc.pdf
Martin, A., Manjula, M., Venkatesan, P. A Business Intelligence Model to Predict Bankruptcy
using Financial Domain Ontology with Association Rule Mining Algorithm. 2011. International
Journal of Computer Science. Retrieved from https://arxiv.org/pdf/1109.1087.pdf
Alfaro, F., Solano, J. Apriori vs. FP-Growth for Frequent Item Set Mining. 2016. Retrieved from
http://singularities.com/blog/2015/08/apriori-vs-fpgrowth-for-frequent-item-set-mining
