Support is an indication of how frequently the items appear in the database. 
Confidence indicates the proportion of cases in which the if/then statement has been found to be true. 
In data mining, association rules are useful for analyzing and predicting customer behavior. They 
play an important part in shopping basket data analysis, product clustering, catalog design and store 
layout. 
In data mining and association rule learning, lift is a measure of the performance of a targeting 
model (association rule) at predicting or classifying cases as having an enhanced response (with 
respect to the population as a whole), measured against a random choice targeting model. A 
targeting model is doing a good job if the response within the target is much better than the 
average for the population as a whole. Lift is simply the ratio of these values: target response 
divided by average response. 
For example, suppose a population has an average response rate of 5%, but a certain model (or 
rule) has identified a segment with a response rate of 20%. Then that segment would have a lift 
of 4.0 (20%/5%). 
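The arithmetic in this example can be checked with a minimal Python sketch. The function name `lift` and its parameter names are illustrative choices, not taken from any library.

```python
def lift(target_response_rate: float, average_response_rate: float) -> float:
    """Lift = response rate inside the targeted segment
    divided by the average response rate of the whole population."""
    if average_response_rate <= 0:
        raise ValueError("average response rate must be positive")
    return target_response_rate / average_response_rate


# Figures from the example: 20% response in the segment, 5% on average.
segment_lift = lift(0.20, 0.05)
print(round(segment_lift, 2))
```

A lift of 1.0 would mean the segment responds no better than the population as a whole.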
Typically, the modeller seeks to divide the population into quantiles, and rank the quantiles by 
lift. Organizations can then consider each quantile, and by weighing the predicted response rate 
(and associated financial benefit) against the cost, they can decide whether to market to that 
quantile or not. 
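One possible way to rank quantiles by lift is sketched below. The eight scored customers are hypothetical data invented for illustration, and the sketch assumes the population size is divisible by the number of quantiles.

```python
def lift_by_quantile(scored, n_quantiles):
    """scored: list of (model_score, responded) pairs, responded being 0 or 1.
    Returns one lift value per quantile, highest-scored quantile first."""
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    overall_rate = sum(responded for _, responded in ranked) / len(ranked)
    if overall_rate == 0:
        raise ValueError("no responders in the population")
    size = len(ranked) // n_quantiles  # assumes an even split
    lifts = []
    for q in range(n_quantiles):
        chunk = ranked[q * size:(q + 1) * size]
        chunk_rate = sum(responded for _, responded in chunk) / len(chunk)
        lifts.append(chunk_rate / overall_rate)
    return lifts


# Hypothetical scored population: (model score, responded?)
customers = [(0.9, 1), (0.8, 1), (0.7, 0), (0.6, 1),
             (0.5, 0), (0.4, 0), (0.3, 0), (0.2, 0)]
quartile_lifts = lift_by_quantile(customers, 4)
print(quartile_lifts)
```

With these made-up figures the top quartile responds at well over twice the average rate, while the bottom two quartiles have a lift of zero.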
Lift is analogous to information retrieval's average precision metric, if one treats the precision 
(fraction of the positives that are true positives) as the target response probability. 
The lift curve can also be considered a variation on the receiver operating characteristic (ROC) 
curve, and is also known in econometrics as the Lorenz or power curve.[1] 
The difference between the lifts observed on two different subgroups is called the uplift. The 
subtraction of two lift curves forms the uplift curve, which is a metric used in uplift modelling.[2] 
[3] 
The definitions below are taken from your book.
# 3 Determining what constitutes a frequent item set is related to the concept of support. The support 
of a rule is simply the number of transactions that include both the antecedent and consequent item 
sets. It is called a support because it measures the degree to which the data “support” the validity of 
the rule. The support is sometimes expressed as a percentage of the total number of records in the 
database. For example, the support for the item set {red, white} in the phone faceplate example is 4 
(100 × 4/10 = 40%). A frequent item set is therefore defined as an item set that has a 
support that exceeds a selected minimum support, determined by the user. 
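Counting support can be sketched in a few lines of Python. The ten transactions below are hypothetical toy data, built only so that {red, white} occurs in 4 of 10 transactions as in the example. They are not claimed to be the book's actual faceplate data set.

```python
# Hypothetical toy transactions (each one is a set of faceplate colours).
transactions = [
    {"red", "white", "green"},
    {"white", "orange"},
    {"white", "blue"},
    {"red", "white", "orange"},
    {"red", "blue"},
    {"white", "blue"},
    {"red", "blue"},
    {"red", "white", "blue", "green"},
    {"red", "white", "blue"},
    {"yellow"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


count = support_count({"red", "white"}, transactions)
percent = 100 * count / len(transactions)
print(count, percent)
```

The subset test `itemset <= t` is what makes a transaction count towards the support: extra items in the transaction do not matter.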
It is easy to generate frequent one-item sets. All we need to do is to count, for each item, how many 
transactions in the database include the item. These transaction counts are the supports for the 
one-item sets. We drop one-item sets that have support below the desired minimum support to create a list 
of the frequent one-item sets. To generate frequent two-item sets, we use the frequent one-item sets. 
The reasoning is that if a certain one-item set did not exceed the minimum support, any larger size item 
set that includes it will not exceed the minimum support. 
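The two steps just described (count single items, then build pairs only from the surviving items) can be sketched as follows. The transactions are hypothetical toy data, the function name is illustrative, and the sketch assumes an item set is kept when its count is at least `min_support`; whether the threshold is inclusive is a choice the text leaves open.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy transactions, for illustration only.
transactions = [
    {"red", "white", "green"}, {"white", "orange"}, {"white", "blue"},
    {"red", "white", "orange"}, {"red", "blue"}, {"white", "blue"},
    {"red", "blue"}, {"red", "white", "blue", "green"},
    {"red", "white", "blue"}, {"yellow"},
]


def frequent_one_and_two_itemsets(transactions, min_support):
    """Return (frequent one-item sets, frequent two-item sets) with their counts."""
    # Step 1: the support of each one-item set is a plain transaction count.
    item_counts = Counter(item for t in transactions for item in t)
    frequent_1 = {i: c for i, c in item_counts.items() if c >= min_support}

    # Step 2: candidate pairs use frequent items only, because a pair that
    # contains an infrequent item cannot itself reach the minimum support.
    pair_counts = Counter()
    for t in transactions:
        kept = sorted(item for item in t if item in frequent_1)
        for pair in combinations(kept, 2):
            pair_counts[pair] += 1
    frequent_2 = {p: c for p, c in pair_counts.items() if c >= min_support}
    return frequent_1, frequent_2


frequent_1, frequent_2 = frequent_one_and_two_itemsets(transactions, min_support=2)
print(frequent_1)
print(frequent_2)
```

Pairs are stored as alphabetically sorted tuples so that ("red", "white") and ("white", "red") are counted as the same item set.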
Confidence: To measure the strength of association implied by a rule, we use the measures of 
confidence and lift ratio, as described below. 

Support and Confidence 

In addition to support, which we described earlier, there is another measure that expresses the 
degree of uncertainty about the if-then rule. This is known as the confidence of the rule. This 
measure compares the co-occurrence of the antecedent and consequent item sets in the database to 
the occurrence of the antecedent item sets. Confidence is defined as the ratio of the number of 
transactions that include all antecedent and consequent item sets (namely, the support) to the 
number of transactions that include all the antecedent item sets: 

Confidence = (no. transactions with both antecedent and consequent item sets) / (no. transactions with antecedent item set) 

For example, suppose that a supermarket database has 100,000 point-of-sale transactions. Of these 
transactions, 2000 include both orange juice and (over-the-counter) flu medication, and 800 of 
these include soup purchases. The association rule “IF orange juice and flu medication are 
purchased THEN soup is purchased on the same trip” has a support of 800 transactions 
(alternatively, 0.8% = 800/100,000) and a confidence of 40% (= 800/2000). 

To see the relationship between support and confidence, let us think about what each is measuring 
(estimating). One way to think of support is that it is the (estimated) probability that a 
transaction selected randomly from the database will contain all items in the antecedent and the 
consequent: P(antecedent AND consequent). In comparison, the confidence is the (estimated) 
conditional probability that a transaction selected randomly will include all the items in the 
consequent given that the transaction includes all the items in the antecedent: 

P(antecedent AND consequent) / P(antecedent) = P(consequent | antecedent). 
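The supermarket figures can be reproduced with a short Python sketch. The variable names are illustrative; the three counts are the ones given in the example.

```python
n_transactions = 100_000  # all point-of-sale transactions
n_antecedent = 2_000      # orange juice AND flu medication
n_both = 800              # orange juice AND flu medication AND soup

# Support as an estimate of P(antecedent AND consequent).
support_fraction = n_both / n_transactions

# Confidence as an estimate of P(consequent | antecedent).
confidence = n_both / n_antecedent

print(f"support = {support_fraction:.1%}, confidence = {confidence:.0%}")
```

Dividing the support fraction by the estimated P(antecedent), here 2,000/100,000, gives the same confidence, which is the conditional-probability identity stated above.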
A high value of confidence suggests a strong association rule (in which we are highly confident). 
However, this can be deceptive because if the antecedent item set and/or the consequent has a 
high level of support, we can have a high value for confidence even when the antecedent and 
consequent are independent! For example, if nearly all customers buy bananas and nearly all customers
buy ice cream, the confidence level will be high regardless of whether there is an association between 
the items. 
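A small numeric sketch makes the point concrete. The 95% purchase rates are assumed figures chosen for illustration, and the two purchases are constructed to be independent.

```python
# Assumed purchase probabilities (hypothetical, for illustration only).
p_bananas = 0.95
p_ice_cream = 0.95

# Independence: the joint probability is just the product.
p_both = p_bananas * p_ice_cream

# Confidence of the rule "IF bananas THEN ice cream".
confidence = p_both / p_bananas

print(f"confidence = {confidence:.0%}")
```

The confidence comes out at about 95% although the two purchases are unrelated by construction: it simply equals the share of customers who buy ice cream.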
Lift Ratio 

A better way to judge the strength of an association rule is to compare the confidence of the 
rule with a benchmark value, where we assume that the occurrence of the consequent item set in a 
transaction is independent of the occurrence of the antecedent for each rule. In other words, if the 
antecedent and consequent item sets are independent, what confidence values would we expect to 
see? Under independence, the support would be 

P(antecedent AND consequent) = P(antecedent) × P(consequent), 

and the benchmark confidence would be 

(P(antecedent) × P(consequent)) / P(antecedent) = P(consequent). 

The estimate of this benchmark from the data, called the benchmark confidence value for a rule, is 
computed by 

Benchmark confidence = (no. transactions with consequent item set) / (no. transactions in database). 

We compare the confidence to the benchmark confidence by looking at their ratio. This is 
called the lift ratio of a rule. The lift ratio is the confidence of the rule divided by the benchmark 
confidence, that is, the confidence assuming independence of consequent from antecedent: 

Lift ratio = confidence / benchmark confidence. 
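Confidence, benchmark confidence and lift ratio can be computed together, as in the sketch below. The transactions are hypothetical toy data, and `rule_metrics` is an illustrative helper name rather than a library function.

```python
# Hypothetical toy transactions, for illustration only.
transactions = [
    {"red", "white", "green"}, {"white", "orange"}, {"white", "blue"},
    {"red", "white", "orange"}, {"red", "blue"}, {"white", "blue"},
    {"red", "blue"}, {"red", "white", "blue", "green"},
    {"red", "white", "blue"}, {"yellow"},
]


def rule_metrics(antecedent, consequent, transactions):
    """Return (confidence, benchmark confidence, lift ratio) for a rule."""
    n_total = len(transactions)
    n_antecedent = sum(1 for t in transactions if antecedent <= t)
    n_consequent = sum(1 for t in transactions if consequent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    if n_antecedent == 0 or n_consequent == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    confidence = n_both / n_antecedent   # estimate of P(consequent | antecedent)
    benchmark = n_consequent / n_total   # estimate of P(consequent)
    return confidence, benchmark, confidence / benchmark


confidence, benchmark, lift_ratio = rule_metrics(
    {"red", "white"}, {"green"}, transactions
)
print(confidence, benchmark, lift_ratio)
```

A lift ratio above 1 suggests that the antecedent and consequent occur together more often than independence would predict; a value near 1 suggests the rule tells us little beyond the consequent's overall frequency.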

Assignment #3 10.19.14
