BAS 250
Lesson 4: Association Rules
 Explain the concept of association rules
 Develop an association rules data mining model
 Understand the Apriori algorithm
 Interpret output generated by model
 Effectively apply the CRISP-DM method to your
assignment
This Week’s Learning Objectives
 Association rules are a data mining methodology that
seeks to find frequent connections between attributes
in a data set
o An example of this could be a shopping basket analysis
where marketers and vendors try to find which products
are most frequently purchased together
Association Rules
 Study of “what goes with what”
o “Customers who bought X also bought Y”
o What symptoms go with what diagnosis
 Transaction-based or event-based
 Also called “market basket analysis” and “affinity analysis”
 Originated with study of customer transactions databases
to determine associations among items purchased
What are Association Rules?
 So what kind of items are we talking about? There are many
applications of association rules:
o Product recommendation – like Amazon’s “customers who bought that,
also bought this”
o Medical diagnosis – like with diabetes
o Content optimization – magazine websites or blogs or even menu
design for a restaurant
o DNA genome analysis – patterns in cellular data
o Fraud detection in finance
Association Rules
Used in many recommender systems
Screenshot, Amazon.com
 Key Point:
 Association rules use “if/then” statements to
uncover relationships between seemingly
unrelated data.
 “If a customer buys a dozen eggs, then he is 80%
likely to also purchase milk.”
Association Rules
 “IF” part = antecedent
 “THEN” part = consequent
 “Item set” = the items (e.g., products) comprising the antecedent or
consequent
 Antecedent and consequent are disjoint when there are no items in
common
Terms
 Although association rule operators require
binominal data types, it’s helpful to evaluate
the average (avg) and standard deviation for
each attribute.
Association Rules
Association Rules
 Standard deviations are measurements of how
dispersed or varied the values in an attribute are
o A good rule of thumb: any value more than two
standard deviations below the mean or more than two
standard deviations above the mean is a statistical outlier
 For example, if Average = 36.731 and Standard Deviation =
10.647, acceptable range of values should be 15.437 to 58.025
 Not a hard-and-fast rule
Association Rules
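The rule-of-thumb range above can be checked with a few lines of arithmetic. This sketch simply reproduces the slide's example values (mean 36.731, standard deviation 10.647):

```python
# Rule-of-thumb outlier range: mean +/- 2 standard deviations.
# Values taken from the slide's example.
mean, sd = 36.731, 10.647

low = mean - 2 * sd   # anything below this is flagged as an outlier
high = mean + 2 * sd  # anything above this is flagged as an outlier
print(f"acceptable range: {low:.3f} to {high:.3f}")
# acceptable range: 15.437 to 58.025
```

As the slides note, this is a heuristic for spotting suspicious values, not a hard-and-fast rule.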
 RapidMiner uses binominal instead of binomial
o Binomial means one of two numbers (usually 0 and 1),
meaning the basic underlying data type is still numeric
o Binominal means one of two values which may be numeric
or character based
Association Rules
 An example of a data type transformation of all attributes in
RapidMiner
Association Rules
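The same kind of transformation can be sketched outside RapidMiner. This hypothetical example (the attribute names are made up for illustration) turns numeric purchase quantities into binominal true/false values, which is the form association rule operators expect:

```python
# Hypothetical mini data set: quantity of each product per transaction.
rows = [
    {"milk": 2, "cookies": 0, "eggs": 1},
    {"milk": 0, "cookies": 1, "eggs": 0},
]

# Transform every numeric attribute to a binominal true/false value:
# "was the product present in the basket at all?"
binominal = [{item: qty > 0 for item, qty in row.items()} for row in rows]
print(binominal[0])  # {'milk': True, 'cookies': False, 'eggs': True}
```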
 Frequency Pattern analysis is handy for many
kinds of data mining and is a necessary
component of association rule mining
o We use this to determine whether any of the
patterns in the data occur often enough to be
considered rules
Association Rules
Results of an FP-Growth operator in RapidMiner
Association Rules
 Two main factors that dictate whether or not frequency
patterns get translated into association rules:
 Confidence percent - how confident we are that when one
attribute is flagged as true, the associated attribute will also
be flagged as true
 Support percent - the number of times that the rule did
occur, divided by the number of observations in the data set
Confidence & Support Percent
 Out of 10 shopping baskets…
 Cookies were purchased in 4 baskets
 Milk was purchased in 7 baskets
 Cookies → milk: in 3 of the 4 baskets where cookies were purchased,
milk was also in those baskets
 Confidence % = how often an associated attribute = True when one
attribute = True
 Therefore, we have 75% confidence (3/4) in the association rule cookies → milk
Confidence Percent - Example
 Out of 10 shopping baskets…
 Milk was purchased in 7 baskets
 Cookies were purchased in 4 baskets
 Milk → cookies: in 3 of the 7 baskets where milk was purchased,
cookies were also in those baskets
 Confidence % = how often an associated attribute = True when one
attribute = True
 Therefore, we have 43% confidence (3/7) in the association rule milk → cookies
Confidence Percent - Example
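The two examples can be verified directly. This sketch builds ten baskets matching the assumed counts (cookies in 4, milk in 7, both in 3) and computes confidence in each direction:

```python
# Ten baskets matching the slides' counts (the exact baskets are assumed).
baskets = [
    {"cookies", "milk"}, {"cookies", "milk"}, {"cookies", "milk"},
    {"cookies"}, {"milk"}, {"milk"}, {"milk"}, {"milk"},
    set(), set(),
]

def confidence(antecedent, consequent, baskets):
    # confidence(A -> B) = count(baskets with A and B) / count(baskets with A)
    with_a = [b for b in baskets if antecedent <= b]
    with_both = [b for b in with_a if consequent <= b]
    return len(with_both) / len(with_a)

print(confidence({"cookies"}, {"milk"}, baskets))            # 0.75
print(round(confidence({"milk"}, {"cookies"}, baskets), 2))  # 0.43
```

Note the asymmetry: the same three shared baskets give 75% confidence one way and only 43% the other, because the denominators differ.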
 Premise (or antecedent) → Conclusion (or consequent)
 A low Confidence % suggests an association that happens only by
chance and can be used to eliminate uninteresting rules
 When evaluating associations between three or more
attributes, the confidence percentages are calculated
based on the two attributes being found with the third
Confidence Percent (cont.)
 An example of association rules found with
50% confidence threshold in RapidMiner
Confidence Percent (cont.)
 If cookies and milk were found together in 3 out of
the 10 shopping baskets, the support percentage is
calculated as 30% (3/10 = .3)
o There is no reciprocal for support percentages since
this metric is simply the number of times the
association occurred over the number of times it could
have occurred in the data set
Support Percent
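Support is computed over all baskets, not just those containing the antecedent. Using the same ten assumed baskets as the confidence example:

```python
# Same ten assumed baskets as in the confidence example.
baskets = [
    {"cookies", "milk"}, {"cookies", "milk"}, {"cookies", "milk"},
    {"cookies"}, {"milk"}, {"milk"}, {"milk"}, {"milk"},
    set(), set(),
]

def support(itemset, baskets):
    # support = baskets containing every item in the set / total baskets
    return sum(1 for b in baskets if itemset <= b) / len(baskets)

print(support({"cookies", "milk"}, baskets))  # 0.3
```

Because the denominator is the whole data set, support is symmetric: support({cookies, milk}) equals support({milk, cookies}), which is why there is no reciprocal to compute.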
 The higher the Support %, the more often X and Y actually appear
together across all transactions
 Confidence, by contrast, estimates the conditional probability of Y given X
o Remember: “If a customer buys a dozen eggs, then he is 80% likely to
also purchase milk.”
 There is no reciprocal for support percentages since this metric is simply
the number of times the association occurred over the number of times it
could have occurred in the data set
Support Percent
 Minimum Confidence = 50%
 Minimum Support = 20%
 However, for large data sets, your minimum support
could be much lower and still be valid.
 Hint: Your homework will contain a large data set.
Typical Ranges
 6 colors within 10 Transactions
Tiny Example: Phone Faceplates
Transaction Faceplate Colors Purchased
1 Red, White, Green
2 White, Orange
3 White, Blue
4 Red, White, Orange
5 Red, Blue
6 White, Blue
7 White, Orange
8 Red, White, Blue, Green
9 Red, White, Blue
10 Yellow
 For example: Transaction 1 supports several
rules, such as
o “If red, then white” (“If a red faceplate is purchased,
then so is a white one”)
o “If white, then red”
o “If red and white, then green”
o + several more
Many Rules are Possible
 Ideally, we want to create all possible combinations of items
 Problem: computation time grows exponentially as # items
increases
 Solution: consider only “frequent item sets”
 Criterion for frequent: Support %
Frequent Item Sets
 Support = # (or percent) of transactions that
include both the antecedent and the consequent
 Example: support for the item set {red, white} is 4
out of 10 transactions, or 40%
Support
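The {red, white} figure can be checked against the faceplate table. This sketch encodes the ten transactions from the slide and counts how many contain both colors:

```python
# The ten faceplate transactions from the slide's table.
transactions = [
    {"red", "white", "green"}, {"white", "orange"}, {"white", "blue"},
    {"red", "white", "orange"}, {"red", "blue"}, {"white", "blue"},
    {"white", "orange"}, {"red", "white", "blue", "green"},
    {"red", "white", "blue"}, {"yellow"},
]

def support(itemset, transactions):
    # fraction of transactions containing every item in the set
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

print(support({"red", "white"}, transactions))  # 0.4  (transactions 1, 4, 8, 9)
```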
Apriori Algorithm
 Apriori Principle:
 “If an item set is frequent, then all of its
subsets MUST also be frequent.”
Apriori Algorithm
 The strategy of trimming the exponential search
space based on the Support measure is known
as “Support-based Pruning”
 This is the foundational component of the Apriori
Algorithm.
Apriori Algorithm
 For k products…
 User sets a minimum support criterion
 Next, generate list of one-item sets that meet the support criterion
 Use the list of one-item sets to generate list of two-item sets that
meet the support criterion
 Use list of two-item sets to generate list of three-item sets
 Continue up through k-item sets
Generating Frequent Item Sets
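The level-wise procedure above can be sketched in a few dozen lines. This is an illustrative implementation of the idea, not production Apriori: it grows k-item sets only from frequent (k-1)-item sets, and prunes any candidate with an infrequent subset, per the Apriori principle:

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise frequent item set generation (an Apriori sketch)."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: one-item sets that meet the support criterion.
    items = sorted({i for t in transactions for i in t})
    current = {frozenset([i]) for i in items
               if support(frozenset([i])) >= min_support}
    frequent = set(current)

    # Build (k+1)-item sets only from items appearing in frequent k-item sets.
    k = 1
    while current:
        candidates = sorted({i for s in current for i in s})
        nxt = set()
        for combo in combinations(candidates, k + 1):
            cand = frozenset(combo)
            # Apriori principle: every k-subset must itself be frequent.
            if all(frozenset(sub) in current for sub in combinations(combo, k)) \
                    and support(cand) >= min_support:
                nxt.add(cand)
        frequent |= nxt
        current = nxt
        k += 1
    return frequent

# Run it on the faceplate transactions with 20% minimum support.
faceplates = [
    {"red", "white", "green"}, {"white", "orange"}, {"white", "blue"},
    {"red", "white", "orange"}, {"red", "blue"}, {"white", "blue"},
    {"white", "orange"}, {"red", "white", "blue", "green"},
    {"red", "white", "blue"}, {"yellow"},
]
result = frequent_itemsets(faceplates, min_support=0.2)
print(frozenset({"red", "white", "green"}) in result)  # True
print(frozenset({"yellow"}) in result)                 # False (support only 10%)
```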
 Confidence: the % of antecedent transactions that also have the
consequent item set
 Lift = confidence/(benchmark confidence)
 Benchmark confidence = transactions with consequent as % of all
transactions
 Lift > 1 indicates a rule that is useful in finding consequent item sets (i.e.,
more useful than just selecting transactions randomly)
Measures of Performance
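Lift can be computed directly from the counts in the earlier cookies/milk example (counts assumed from those slides): confidence of cookies → milk is 3/4, and the benchmark confidence is milk's share of all baskets, 7/10:

```python
# Lift for cookies -> milk, using counts from the earlier example.
confidence = 3 / 4            # 75%: milk appears in 3 of the 4 cookie baskets
benchmark_confidence = 7 / 10  # 70%: milk appears in 7 of all 10 baskets

lift = confidence / benchmark_confidence
print(round(lift, 2))  # 1.07 -> above 1, so the rule beats random selection
```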
 Generate all rules that meet specified support
& confidence
o Find frequent item sets (those with sufficient
support – see above)
o From these item sets, generate rules with
sufficient confidence
Process of Rule Selection
 {red, white} → {green} with confidence = 2/4 = 50%
o [(support {red, white, green})/(support {red, white})]
 {red, green} → {white} with confidence = 2/2 = 100%
o [(support {red, white, green})/(support {red, green})]
 Plus 4 more with confidence of 100%, 33%, 29% & 100%
 If confidence criterion is 70%, report only rules 2, 3 and 6
Example: Rules from {red, white, green}
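Both confidence calculations can be verified against the faceplate table, since confidence(A → B) is just support(A ∪ B) divided by support(A):

```python
# Confidence for the two rules, computed from the faceplate transactions.
transactions = [
    {"red", "white", "green"}, {"white", "orange"}, {"white", "blue"},
    {"red", "white", "orange"}, {"red", "blue"}, {"white", "blue"},
    {"white", "orange"}, {"red", "white", "blue", "green"},
    {"red", "white", "blue"}, {"yellow"},
]

def count(itemset):
    return sum(1 for t in transactions if itemset <= t)

# {red, white} -> {green}: support{r,w,g} / support{r,w} = 2/4
print(count({"red", "white", "green"}) / count({"red", "white"}))  # 0.5
# {red, green} -> {white}: support{r,w,g} / support{r,g} = 2/2
print(count({"red", "white", "green"}) / count({"red", "green"}))  # 1.0
```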
 Lift ratio shows how effective the rule is in finding consequents
(useful if finding particular consequents is important)
 Confidence shows the rate at which consequents will be found
(useful in learning costs of promotion)
 Support measures overall impact
Interpretation
 Random data can generate apparently interesting
association rules
 The more rules you produce, the greater this danger
 Rules based on large numbers of records are less
subject to this danger
Caution: The Role of Chance
 Association rules (or affinity analysis, or market basket analysis) produce
rules on associations between items from a database of transactions
 Widely used in recommender systems
 Most popular method is Apriori algorithm
 To reduce computation, we consider only “frequent” item sets (=support)
 Performance is measured by confidence and lift
 Can produce a profusion of rules; review is required to identify useful rules
and to reduce redundancy
Summary
 Explain the concept of association rules
 Develop an association rules data mining model
 Understand the Apriori algorithm
 Interpret output generated by model
 Effectively apply the CRISP-DM method to your
assignment
Summary - Learning Objectives
“This workforce solution was funded by a grant awarded by the U.S. Department of Labor’s
Employment and Training Administration. The solution was created by the grantee and does not
necessarily reflect the official position of the U.S. Department of Labor. The Department of Labor
makes no guarantees, warranties, or assurances of any kind, express or implied, with respect to such
information, including any information on linked sites and including, but not limited to, accuracy of the
information or its completeness, timeliness, usefulness, adequacy, continued availability, or ownership.”
Except where otherwise stated, this work by Wake Technical Community College Building Capacity in
Business Analytics, a Department of Labor, TAACCCT funded project, is licensed under the Creative
Commons Attribution 4.0 International License. To view a copy of this license, visit
http://creativecommons.org/licenses/by/4.0/
Copyright Information