Assignment #3 10.19.14

Support is an indication of how frequently the items appear in the database.
Confidence indicates how often the if/then statement has been found to be true, as a share of the transactions that contain the antecedent.
In data mining, association rules are useful for analyzing and predicting customer behavior. They
play an important part in shopping basket data analysis, product clustering, catalog design and store
layout.
In data mining and association rule learning, lift is a measure of the performance of a targeting
model (association rule) at predicting or classifying cases as having an enhanced response (with
respect to the population as a whole), measured against a random choice targeting model. A
targeting model is doing a good job if the response within the target is much better than the
average for the population as a whole. Lift is simply the ratio of these values: target response
divided by average response.
For example, suppose a population has an average response rate of 5%, but a certain model (or
rule) has identified a segment with a response rate of 20%. Then that segment would have a lift
of 4.0 (20%/5%).
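The ratio above can be checked with a few lines of Python. This is only a minimal sketch; the function name `lift` and its parameter names are chosen here for illustration and do not come from the source.

```python
def lift(target_response_rate: float, average_response_rate: float) -> float:
    """Return target response divided by average response."""
    if average_response_rate <= 0:
        raise ValueError("average response rate must be positive")
    return target_response_rate / average_response_rate


# The example from the text: a 20% segment against a 5% population average.
segment_lift = lift(0.20, 0.05)
print(round(segment_lift, 2))
```

A value above 1 means the segment responds better than the population as a whole; a value near 1 means the model does no better than random targeting.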
Typically, the modeller seeks to divide the population into quantiles, and rank the quantiles by
lift. Organizations can then consider each quantile, and by weighing the predicted response rate
(and associated financial benefit) against the cost, they can decide whether to market to that
quantile or not.
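One way to picture the quantile ranking is the sketch below. It assumes each customer has a model score and a 0/1 response flag; the records are made up for illustration, and the function name `lift_by_quantile` is hypothetical.

```python
from typing import List, Tuple


def lift_by_quantile(records: List[Tuple[float, int]], n_quantiles: int) -> List[float]:
    """Rank records by score (best first) and return the lift of each quantile.

    records: (model_score, responded) pairs, where responded is 0 or 1.
    Assumes len(records) is divisible by n_quantiles.
    """
    ranked = sorted(records, key=lambda r: r[0], reverse=True)
    overall_rate = sum(r[1] for r in ranked) / len(ranked)
    if overall_rate == 0:
        raise ValueError("no responders, lift is undefined")
    size = len(ranked) // n_quantiles
    lifts = []
    for q in range(n_quantiles):
        chunk = ranked[q * size:(q + 1) * size]
        chunk_rate = sum(r[1] for r in chunk) / len(chunk)
        lifts.append(chunk_rate / overall_rate)
    return lifts


# Made-up scored customers: (score, responded)
records = [(0.9, 1), (0.8, 1), (0.7, 0), (0.6, 1),
           (0.5, 0), (0.4, 0), (0.3, 0), (0.2, 0)]
lifts = lift_by_quantile(records, 4)
for rank, value in enumerate(lifts, start=1):
    print(f"quantile {rank}: lift {value:.2f}")
```

An organization would then compare each quantile's predicted response (and the benefit it implies) with the cost of marketing to it.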
Lift is analogous to information retrieval's average precision metric, if one treats the precision
(fraction of the predicted positives that are true positives) as the target response probability.
The lift curve can also be considered a variation on the receiver operating characteristic (ROC)
curve, and is also known in econometrics as the Lorenz or power curve.[1]
The difference between the lifts observed on two different subgroups is called the uplift. The
subtraction of two lift curves forms the uplift curve, which is a metric used in uplift modelling.[2][3]
The definitions below are from the book.

# 3 Determining what constitutes a frequent item set is related to the concept of support. The support
of a rule is simply the number of transactions that include both the antecedent and consequent item
sets. It is called support because it measures the degree to which the data “support” the validity of
the rule. The support is sometimes expressed as a percentage of the total number of records in the
database. For example, the support for the item set {red, white} in the phone faceplate example is 4
(or 100 × 4/10 = 40%). A frequent item set is therefore defined as an item set that has a
support that exceeds a selected minimum support, determined by the user.
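The support count and percentage described above can be sketched as follows. The ten transactions are illustrative only: they are an assumption chosen so that {red, white} appears in 4 of 10, matching the count quoted in the text, and are not claimed to be the book's table. The function names are hypothetical.

```python
from typing import FrozenSet, Iterable, List


def support_count(item_set: Iterable[str], transactions: List[FrozenSet[str]]) -> int:
    """Number of transactions that contain every item in item_set."""
    items = frozenset(item_set)
    return sum(1 for t in transactions if items <= t)


def support_percent(item_set: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Support expressed as a percentage of all transactions."""
    return 100 * support_count(item_set, transactions) / len(transactions)


# Illustrative faceplate purchases (assumed data, one set of colors per transaction).
transactions = [frozenset(t) for t in (
    {"red", "white", "green"}, {"white", "orange"}, {"white", "blue"},
    {"red", "white", "orange"}, {"red", "blue"}, {"white", "blue"},
    {"white", "orange"}, {"red", "white", "blue", "green"},
    {"red", "white", "blue"}, {"yellow"},
)]

count = support_count({"red", "white"}, transactions)
percent = support_percent({"red", "white"}, transactions)
print(count, percent)
```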
It is easy to generate frequent one-item sets. All we need to do is to count, for each item, how many
transactions in the database include the item. These transaction counts are the supports for the
one-item sets. We drop one-item sets that have support below the desired minimum support to create a list
of the frequent one-item sets. To generate frequent two-item sets, we use the frequent one-item sets.
The reasoning is that if a certain one-item set did not exceed the minimum support, any larger size item
set that includes it will not exceed the minimum support.
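The two steps above (count single items, then build pairs only from the items that survive) can be sketched like this. The transactions are made-up faceplate purchases, the function name is hypothetical, and keeping item sets whose count is at least `min_support` is an assumption about how the threshold is applied.

```python
from collections import Counter
from itertools import combinations


def frequent_one_and_two_item_sets(transactions, min_support):
    """Return (frequent items with counts, frequent pairs with counts)."""
    # Step 1: support of every one-item set.
    item_counts = Counter(item for t in transactions for item in set(t))
    frequent_items = {i: c for i, c in item_counts.items() if c >= min_support}

    # Step 2: candidate pairs use only frequent items, because a pair that
    # contains an infrequent item cannot itself reach the minimum support.
    pair_counts = Counter()
    for t in transactions:
        kept = sorted(set(t) & frequent_items.keys())
        pair_counts.update(combinations(kept, 2))
    frequent_pairs = {p: c for p, c in pair_counts.items() if c >= min_support}
    return frequent_items, frequent_pairs


# Made-up transactions for illustration.
transactions = [
    {"red", "white", "green"}, {"white", "orange"}, {"white", "blue"},
    {"red", "white", "orange"}, {"red", "blue"}, {"white", "blue"},
    {"white", "orange"}, {"red", "white", "blue", "green"},
    {"red", "white", "blue"}, {"yellow"},
]

items, pairs = frequent_one_and_two_item_sets(transactions, min_support=2)
print(sorted(items.items()))
print(sorted(pairs.items()))
```

Pairs are stored as alphabetically sorted tuples, so {red, white} is looked up as `("red", "white")`.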
Confidence: To measure the strength of association implied by a rule, we use the measures of
confidence and lift ratio, as described below.
Support and Confidence
In addition to support, which we described earlier, there is another measure that expresses the degree
of uncertainty about the if-then rule. This is known as the confidence of the rule. This measure
compares the co-occurrence of the antecedent and consequent item sets in the database to the
occurrence of the antecedent item sets.
Confidence is defined as the ratio of the number of transactions that include all antecedent and
consequent item sets (namely, the support) to the number of transactions that include all the
antecedent item sets:
Confidence = (no. transactions with both antecedent and consequent item sets) / (no. transactions with antecedent item set)
For example, suppose that a supermarket database has 100,000 point-of-sale
transactions. Of these transactions, 2000 include both orange juice and (over-the-counter) flu
medication, and 800 of these include soup purchases. The association rule “IF orange juice and flu
medication are purchased THEN soup is purchased on the same trip” has a support of 800 transactions
(alternatively, 0.8% = 800/100,000) and a confidence of 40% (= 800/2000).
To see the relationship between support and confidence, let us think about what each is measuring
(estimating). One way to think of support is that it is the (estimated) probability that a transaction
selected randomly from the database will contain all items in the antecedent and the consequent:
P(antecedent AND consequent).
In comparison, the confidence is the (estimated) conditional probability that a transaction selected
randomly will include all the items in the consequent given that the transaction includes all the items in
the antecedent:
P(antecedent AND consequent) / P(antecedent) = P(consequent | antecedent).
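The supermarket figures can be plugged into these definitions directly. The sketch below uses only the counts quoted in the text; the function and parameter names are chosen for illustration.

```python
def support_and_confidence(n_both: int, n_antecedent: int, n_total: int):
    """Estimate P(antecedent AND consequent) and P(consequent | antecedent).

    n_both:       transactions containing antecedent and consequent
    n_antecedent: transactions containing the antecedent
    n_total:      all transactions in the database
    """
    support = n_both / n_total
    confidence = n_both / n_antecedent
    return support, confidence


# Rule: IF orange juice and flu medication THEN soup.
support, confidence = support_and_confidence(
    n_both=800, n_antecedent=2000, n_total=100_000
)
print(f"support = {support:.1%}, confidence = {confidence:.0%}")
```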
A high value of confidence suggests a strong association rule (in which we are highly confident).
However, this can be deceptive because if the antecedent item set and/or the consequent has a
high level of support, we can have a high value for confidence even when the antecedent and
consequent are independent! For example, if nearly all customers buy bananas and nearly all customers
buy ice cream, the confidence level will be high regardless of whether there is an association between
the items.
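A small numeric illustration of this pitfall, using invented counts (10,000 transactions, 95% containing bananas, 95% containing ice cream, and the two purchases exactly independent):

```python
# Invented counts for illustration only.
n_total = 10_000
n_bananas = 9_500      # antecedent
n_ice_cream = 9_500    # consequent
n_both = 9_025         # chosen so that P(both) = P(bananas) * P(ice cream)

confidence = n_both / n_bananas
p_both = n_both / n_total
p_independent = (n_bananas / n_total) * (n_ice_cream / n_total)

print(f"confidence of 'IF bananas THEN ice cream': {confidence:.2f}")
print(f"P(both) = {p_both:.4f}, P(bananas) * P(ice cream) = {p_independent:.4f}")
```

The confidence comes out at 95% even though the counts were built to be independent, which is simply the share of all customers who buy ice cream.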
Lift Ratio
A better way to judge the strength of an association rule is to compare the confidence of the
rule with a benchmark value, where we assume that the occurrence of the consequent item set in a
transaction is independent of the occurrence of the antecedent for each rule. In other words, if the
antecedent and consequent item sets are independent, what confidence values would we expect to
see? Under independence, the support would be
P(antecedent AND consequent) = P(antecedent) × P(consequent),
and the benchmark confidence would be
[P(antecedent) × P(consequent)] / P(antecedent) = P(consequent).
The estimate of this benchmark from the data, called the benchmark confidence value for a rule, is
computed by
Benchmark confidence = (no. transactions with consequent item set) / (no. transactions in database).
We compare the confidence to the benchmark confidence by looking at their ratio. This is
called the lift ratio of a rule. The lift ratio is the confidence of the rule divided by the benchmark
confidence, which assumes independence of consequent from antecedent:
Lift ratio = confidence / benchmark confidence.
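As a rough sketch, the lift ratio can be computed from four counts. The supermarket example in the text does not say how many transactions include soup, so the figure of 10,000 below is purely an assumption made to complete the arithmetic; the function name is hypothetical as well.

```python
def lift_ratio(n_both: int, n_antecedent: int, n_consequent: int, n_total: int) -> float:
    """Confidence of the rule divided by its benchmark confidence."""
    confidence = n_both / n_antecedent
    benchmark_confidence = n_consequent / n_total
    return confidence / benchmark_confidence


# 800, 2000 and 100,000 come from the text; n_consequent is ASSUMED.
ratio = lift_ratio(n_both=800, n_antecedent=2000,
                   n_consequent=10_000, n_total=100_000)
print(round(ratio, 2))
```

A lift ratio greater than 1 suggests that the antecedent and consequent occur together more often than independence would predict; a ratio close to 1 suggests the rule adds little beyond the consequent's overall popularity.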
