Predictive Model for Customer Segmentation using Database Marketing Techniques

Database Marketing and CRM: Analyzing the DONOR Data Set

Akanksha Jain

Project Goals

•  Goal: Using the historical data set DONOR_RAW, develop a model which can predict whether a prospect will donate or not donate
•  Scope: DONOR_RAW data set
    •  50 variables
    •  19,372 observations
•  Dependent variable: TARGET_B (binary)
    •  Responder: 1
    •  Non-Responder: 0

Tools

•  SAS Enterprise Miner 4.3
•  SAS 9.3_M1

Data Source

•  Reject variables:
    •  TARGET_D (using TARGET_B as target)
    •  ID (an ID number)
    •  WEALTH_RATING (large number of missing values)
•  Variable TARGET_B
    •  Change Role to TARGET
    •  Change Order to DESCENDING
•  Select complete data set as Sample
•  Set Prior Probabilities
    •  Responder: 0.05
    •  Non-Responder: 0.95
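
The effect of setting prior probabilities can be sketched in Python. This is only an illustration of the usual prior adjustment, not Enterprise Miner's internal code: the function name `adjust_for_priors` and the 25% responder share in the sample are assumptions; only the priors 0.05 and 0.95 come from the slide.

```python
def adjust_for_priors(p_sample, sample_rate, prior_rate=0.05):
    """Rescale a probability estimated on the sample to the stated priors.

    p_sample    -- model probability of responding, estimated on the sample
    sample_rate -- share of responders in the sample (an assumption here)
    prior_rate  -- population share of responders (0.05 on the slide)
    """
    responder = p_sample * prior_rate / sample_rate
    non_responder = (1.0 - p_sample) * (1.0 - prior_rate) / (1.0 - sample_rate)
    return responder / (responder + non_responder)


# Hypothetical case: the sample holds 25% responders, the population 5%.
adjusted = adjust_for_priors(p_sample=0.25, sample_rate=0.25)
print(round(adjusted, 4))
```

A prospect scored at the sample's average response rate is mapped back to the population's average rate, which is the behaviour the priors are meant to produce.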

Data Partition

•  Train: 60%
•  Validate: 25%
•  Test: 15%
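
A 60/25/15 partition can be sketched as follows. This assumes a simple random split; the slides do not say whether the Data Partition node used simple random or stratified sampling, and the function name and seed are hypothetical.

```python
import random


def partition_indices(n_rows, train=0.60, validate=0.25, seed=12345):
    """Shuffle row indices and split them into train/validate/test lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_train = int(n_rows * train)
    n_validate = int(n_rows * validate)
    return (indices[:n_train],
            indices[n_train:n_train + n_validate],
            indices[n_train + n_validate:])  # the remainder (about 15%) is the test set


# 19,372 is the number of observations stated for DONOR_RAW.
train_idx, validate_idx, test_idx = partition_indices(19372)
print(len(train_idx), len(validate_idx), len(test_idx))
```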

Variable Transformation

Taking the log transformation to reduce skewness:

•  LIFETIME_GIFT_RANGE
•  LIFETIME_MAX_GIFT_AMT
•  LIFETIME_MIN_GIFT_AMT
•  MOR_HIT_RATE
•  FILE_AVG_GIFT
•  LIFETIME_AVG_GIFT_AMT
•  PCT_ATTRIBUTE1
•  LAST_GIFT_AMT
•  RECENT_AVG_GIFT_AMT

Keep all variables, original and log transformations
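
Why a log transformation helps with skewed gift amounts can be shown on a small example. The sketch below is an assumption-laden illustration: the offset of 1 (to keep zero amounts defined), the helper names, and the sample values are not from the slides.

```python
import math


def log_transform(values, offset=1.0):
    """Return log(value + offset); the offset is an assumption, not from the slides."""
    return [math.log(v + offset) for v in values]


def skewness(values):
    """Population skewness: third central moment over the cubed standard deviation."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Made-up gift amounts with a long right tail (not taken from DONOR_RAW).
gifts = [1, 2, 3, 5, 8, 20, 100]
logged = log_transform(gifts)
print(skewness(gifts), skewness(logged))
```

The logged values are still right-skewed, but noticeably less so than the raw amounts.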

Model: CHAID

•  Nominal criterion: Chi-square
•  Significance level: 0.1
•  Minimum number of observations in a leaf = 25
•  Observations required for a split search = 55
•  Model assessment measure: Total Leaf Impurity (Gini Index)
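
The two quantities named in these settings can be sketched in Python. This is a simplified illustration: a full CHAID implementation also merges categories and adjusts p-values, which is omitted here, and the contingency table is hypothetical.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic


def gini_impurity(responder_share):
    """Gini index of a leaf with a binary target."""
    return 1.0 - responder_share ** 2 - (1.0 - responder_share) ** 2


# Hypothetical split: rows are the two branches, columns are (responders, non-responders).
candidate = [[30, 70], [10, 90]]
CRITICAL_VALUE = 2.706  # chi-square cutoff for 1 degree of freedom at the 0.1 level
print(chi_square_statistic(candidate) > CRITICAL_VALUE)
```

A split is worth considering when its statistic exceeds the cutoff implied by the significance level; the Gini index then measures how mixed each resulting leaf is.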

Model: CHAID (cont'd)

Inference:

•  FREQUENCY_STATUS_97NK = 3 or 4; MONTHS_SINCE_LAST_GIFT < 8.5
    Responders (1): 56%
    Less marketing effort is needed, as they will most likely donate anyway.

•  = 3 or 4; >= 8.5; NUMBER_PROM_12 < 11.5
    Responders (1): 43%
    Will also donate, but the company should be careful not to send them too many promotions.

•  = 3 or 4; >= 8.5; NUMBER_PROM_12 >= 11.5
    Responders (1): 30%
    Are getting too many promotions; the company should cut back on sending them promotions.

•  = 1, 2 or Missing
    Responders (1): 21%
    Study them more closely to learn why they are not donating and what other factors are responsible, then decide how to design a marketing campaign for them.
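
The four segments can be written as a small rule function. This is a sketch under one loud assumption: the conditions shown without a variable name are taken to reuse FREQUENCY_STATUS_97NK and MONTHS_SINCE_LAST_GIFT from the first segment. The function name is hypothetical.

```python
def chaid_segment(frequency_status, months_since_last_gift, number_prom_12):
    """Return the responder share of the tree segment a prospect falls into.

    Assumption: the unnamed conditions on the slide reuse FREQUENCY_STATUS_97NK
    and MONTHS_SINCE_LAST_GIFT. Pass None for a missing frequency status.
    """
    if frequency_status not in (3, 4):   # 1, 2 or missing
        return 0.21
    if months_since_last_gift < 8.5:
        return 0.56
    if number_prom_12 < 11.5:
        return 0.43
    return 0.30


# Hypothetical prospect: frequent donor, last gift 6 months ago, 12 promotions.
print(chaid_segment(4, 6, 12))
```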

Variable Selection

•  Target Associations: Select Chi-Square

Model: Forward Regression

•  MODEL OPTIONS -> INPUT CODING -> DEVIATION
•  SELECTION METHOD -> FORWARD
•  CRITERIA -> CROSS VALIDATION MISCLASSIFICATION
•  ADVANCED -> OPTIMIZATION METHOD -> NEWTON-RAPHSON w/ LINE SEARCH
•  SL Entry: 0.05
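
Deviation (effect) coding of a class input can be sketched as below. The sketch assumes the last level is the reference level, which is a common default; the level names and the helper are illustrative, not taken from the project.

```python
def deviation_code(value, levels):
    """Deviation coding: k levels become k-1 columns; the last level is all -1."""
    reference = levels[-1]
    if value == reference:
        return [-1] * (len(levels) - 1)
    return [1 if value == level else 0 for level in levels[:-1]]


# Hypothetical three-level input; the level names are illustrative only.
levels = ["low", "mid", "high"]
for level in levels:
    print(level, deviation_code(level, levels))
```

With this coding each coefficient measures a level's deviation from the average effect across levels rather than from one baseline level.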

Model: Forward Regression (cont'd)

Model: Backward Regression

•  MODEL OPTIONS -> INPUT CODING -> DEVIATION
•  SELECTION METHOD -> BACKWARD
•  CRITERIA -> CROSS VALIDATION MISCLASSIFICATION
•  ADVANCED -> OPTIMIZATION METHOD -> NEWTON-RAPHSON w/ LINE SEARCH
•  SL Stay: 0.05
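
The optimization option used in these regressions, Newton-Raphson with line search, can be sketched for a one-input logistic model. This is a teaching sketch rather than the SAS implementation: step halving stands in for the line search, and the data are made up.

```python
import math


def log_likelihood(b0, b1, xs, ys):
    """Bernoulli log-likelihood of a one-input logistic model."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = b0 + b1 * x
        total += y * z - math.log(1.0 + math.exp(z))
    return total


def fit_logistic(xs, ys, iterations=25, tolerance=1e-10):
    """Newton-Raphson with step halving as a simple line search."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        ps = [1.0 / (1.0 + math.exp(-(b0 + b1 * x))) for x in xs]
        # Gradient of the log-likelihood.
        g0 = sum(y - p for y, p in zip(ys, ps))
        g1 = sum((y - p) * x for x, y, p in zip(xs, ys, ps))
        # Negative Hessian (X'WX) with weights p(1 - p).
        h00 = sum(p * (1 - p) for p in ps)
        h01 = sum(p * (1 - p) * x for x, p in zip(xs, ps))
        h11 = sum(p * (1 - p) * x * x for x, p in zip(xs, ps))
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        # Halve the step until the log-likelihood does not get worse.
        step, current = 1.0, log_likelihood(b0, b1, xs, ys)
        while step > 1e-8 and log_likelihood(b0 + step * d0, b1 + step * d1, xs, ys) < current:
            step /= 2.0
        b0, b1 = b0 + step * d0, b1 + step * d1
        if abs(step * d0) + abs(step * d1) < tolerance:
            break
    return b0, b1


# Made-up, non-separable data: x is some numeric input, y is the response flag.
xs = [0, 1, 2, 3, 4, 5]
ys = [0, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(xs, ys)
print(b0, b1)
```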

Model: Backward Regression (cont'd)

Model: Stepwise Regression

•  MODEL OPTIONS -> INPUT CODING -> DEVIATION
•  SELECTION METHOD -> STEPWISE
•  CRITERIA -> CROSS VALIDATION MISCLASSIFICATION
•  ADVANCED -> OPTIMIZATION METHOD -> NEWTON-RAPHSON w/ LINE SEARCH
•  SL Entry: 0.15
•  SL Stay: 0.05
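
How the entry and stay levels interact can be sketched as a single decision step. This is a simplification that assumes the p-values are supplied by the caller; the function name and the p-values in the example are hypothetical, and the actual SAS procedure has additional stopping rules.

```python
def stepwise_step(in_model_pvalues, candidate_pvalues, sl_entry=0.15, sl_stay=0.05):
    """Decide one stepwise action from p-values supplied by the caller."""
    # Removal is checked first: drop the weakest term if it fails the stay level.
    if in_model_pvalues:
        worst = max(in_model_pvalues, key=in_model_pvalues.get)
        if in_model_pvalues[worst] > sl_stay:
            return ("remove", worst)
    # Otherwise add the strongest candidate if it passes the entry level.
    if candidate_pvalues:
        best = min(candidate_pvalues, key=candidate_pvalues.get)
        if candidate_pvalues[best] < sl_entry:
            return ("add", best)
    return ("stop", None)


# Hypothetical p-values, for illustration only.
print(stepwise_step({"PEP_STAR": 0.01}, {"INCOME_GROUP": 0.10}))
```

Because the entry level (0.15) is looser than the stay level (0.05), a variable with a p-value between the two can enter and then be dropped on the next step.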

Model: Stepwise Regression (cont'd)

Variable Comparison

Forward                  Backward                   Stepwise
FILE_CARD_GIFT           FILE_CARD_GIFT             FILE_CARD_GIFT

INCOME_GROUP             INCOME_GROUP               INCOME_GROUP
LIFE_AV9*                LIFE_AV9*                  LIFE_AV9*

PEP_STAR                 PEP_STAR                   PEP_STAR
LIFETIME_GIFT_AMOUNT     MEDIAN_HOUSEHOLD_INCOME    RECENT_RESPONSE_PROP

*LIFE_AV9 is log(LIFETIME_AVG_GIFT_AMT)

Model Comparison (Validation): Cumulative LIFT

Inference:

•  Capture top 20% of the market -> FORWARD
•  Capture top 30% of the market -> BACKWARD
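
The cumulative lift used in these comparisons can be sketched as the response rate among the top-scored prospects divided by the overall response rate. The scores and outcomes below are made up, and the function name is hypothetical.

```python
def cumulative_lift(scores, actuals, fraction):
    """Response rate in the top `fraction` of scored prospects over the overall rate."""
    ranked = sorted(zip(scores, actuals), key=lambda pair: pair[0], reverse=True)
    depth = max(1, round(fraction * len(ranked)))
    top_rate = sum(actual for _, actual in ranked[:depth]) / depth
    overall_rate = sum(actuals) / len(actuals)
    return top_rate / overall_rate


# Made-up scores and outcomes for ten prospects (1 = responder).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
actuals = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(cumulative_lift(scores, actuals, 0.20))
```

Comparing models at a fixed depth, such as the top 20% or 30%, amounts to comparing this ratio for each model's scores on the same partition.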

Model Comparison (Test): Cumulative LIFT

Inference:

•  Capture top 20% of the market -> FORWARD
•  Capture top 30% of the market -> ANY

Forward versus Backward

•  Variables:
    •  LIFETIME_GIFT_AMOUNT
    •  MEDIAN_HOUSEHOLD_INCOME
    •  RECENT_RESPONSE_PROP
•  Correlations:
    •  MEDIAN_HOUSEHOLD_INCOME and INCOME_GROUP = 43%
    •  LIFE_AV9 and LIFETIME_AVG_GIFT_AMT = 83%
    •  FILE_CARD_GIFT and = 30%
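
Correlations of this kind can be computed with the Pearson coefficient. The sketch below assumes Pearson correlation was the measure used, which the slides do not state, and the amounts are made up.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Made-up average gift amounts and their logs, mimicking LIFE_AV9 versus the raw amount.
amounts = [5.0, 10.0, 15.0, 25.0, 50.0, 200.0]
logs = [math.log(a) for a in amounts]
print(round(pearson_correlation(amounts, logs), 2))
```

A variable and its own log are strongly but not perfectly correlated, which is why keeping both in one model adds little information.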

Model: Forward +

•  Variable Selection (call it Variable_1Extra):
    •  FILE_CARD_GIFT
    •
    •  INCOME_GROUP
    •  LIFE_AV9
    •
    •  PEP_STAR
    •
•  Reject other variables manually
•  Call this model For_1Extra

Model Comparison (Validation): Cumulative LIFT

Inference:

•  Capture top 20% of the market -> For_1Extra
•  Capture top 30% of the market -> BACKWARD

Model Comparison (Test): Cumulative LIFT

Inference:

•  Capture top 20% of the market -> For_1Extra
•  Capture top 30% of the market -> ANY

Model: Decision

Final Model: FOR_1EXTRA

Variables:

•  FILE_CARD_GIFT
•
•  INCOME_GROUP
•  LIFE_AV9
•
•  PEP_STAR
•


Interaction Terms

•  FREQ_PEP = * PEP_STAR
•  FREQ_MONTH = *
•  FREQ_INCOME = * INCOME_GROUP
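
An interaction term is the product of two inputs. The sketch below assumes that the operand missing from the slide is FREQUENCY_STATUS_97NK, as the FREQ_ prefix suggests; that is an assumption, not something the slide states, and the donor record is made up. FREQ_MONTH is left out because its second operand is also missing.

```python
def add_interaction(row, left, right, name):
    """Return a copy of `row` with a new column equal to row[left] * row[right]."""
    updated = dict(row)
    updated[name] = row[left] * row[right]
    return updated


# Hypothetical donor record; the values are made up.
# Assumption: FREQUENCY_STATUS_97NK is the operand missing from the slide.
donor = {"FREQUENCY_STATUS_97NK": 3, "PEP_STAR": 1, "INCOME_GROUP": 5}
donor = add_interaction(donor, "FREQUENCY_STATUS_97NK", "PEP_STAR", "FREQ_PEP")
donor = add_interaction(donor, "FREQUENCY_STATUS_97NK", "INCOME_GROUP", "FREQ_INCOME")
print(donor["FREQ_PEP"], donor["FREQ_INCOME"])
```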

Model: Forward Regression with Interaction Terms

•  Rename model as FOR_1E_INT
•  MODEL OPTIONS -> INPUT CODING -> DEVIATION
•  SELECTION METHOD -> FORWARD
•  CRITERIA -> CROSS VALIDATION MISCLASSIFICATION
•  ADVANCED -> OPTIMIZATION METHOD -> NEWTON-RAPHSON w/ LINE SEARCH
•  SL Entry: 0.05

Model FOR_1E_INT: Cumulative LIFT

Model FOR_1E_INT: Variable List

Model Comparison (Validation): Cumulative LIFT

Inference:

•  Capture top 20% of the market -> FOR_1E_INT
•  Capture top 30% of the market -> FOR_1EXTRA

Model Comparison (Test): Cumulative LIFT

Inference:

•  Capture top 20% of the market -> ANY
•  Capture top 30% of the market -> FOR_1EXTRA

Model: For_1EXTRA + Interaction Terms

•  Variable Selection (call it Variable_UNION):
    •  FILE_CARD_GIFT
    •  FREQUENCY_STATUS_97NK
    •  INCOME_GROUP
    •  LIFE_AV9
    •
    •  PEP_STAR
    •
    •  FREQ_PEP
    •  FREQ_MONTH
    •  FREQ_INCOME
•  Reject other variables manually
•  Call this model For_Union

Model Comparison (Validation): Cumulative LIFT

Inference:

•  Capture top 20% of the market -> FOR_1E_INT
•  Capture top 30% of the market -> FOR_UNION

Model Comparison (Test): Cumulative LIFT

Inference:

•  Capture top 20% of the market -> ANY
•  Capture top 30% of the market -> FOR_UNION

Model: Decision

Final Model: FOR_1EXTRA because

•  No significant improvement with other models
•  Interaction terms bring along complexity

Variables:

•  FILE_CARD_GIFT
•
•  INCOME_GROUP
•  LIFE_AV9
•
•  PEP_STAR
•


Score: On Donor_Raw_Data
