“On visualizing Direct and Partial Correlations – ELI plots”
Leonardo E. Auslender
SAS Institute, Inc., Bedminster, NJ
1. Introduction
Statisticians and data analysts focus on correlations among pairs of
variables to understand the strength of linear relationships in the data.
Since correlations measure relations among pairs of variables, the
standard output is in matrix form, which tends to be difficult to interpret
for a large number of variables. The superlative analyst may also
incorporate partial correlations to further deepen the analysis, which at
least doubles the standard output. The hapless data-miner who faces
hundreds, if not thousands, of variables does not long to wade through
reams of outputs of correlations to find “interesting” patterns. [1]
In this paper, I present a method for visualizing any number of
Pearson (and partial) correlations by using a Proc-Timeplot-like output I
call Exploratory Linear Information (ELI) plots. Proc Timeplot is a
procedure available in Base SAS, part of the SAS Institute's SAS software,
since at least version 5.18. [2]
Proc Timeplot “plots one or more variables
over time intervals” (SAS Procedures Guide, v. 6, 3rd edition, p. 579);
the time interval variable acts as an index for the observations being
plotted. Notice that the index variable is itself not plotted and, moreover,
that the index need not be a time variable at all (p. 581
of the same manual, ‘date’ variable). In this paper, our index is a
variable that contains the names of the variables being correlated against
a ‘with’ variable, and we plot correlations (and partial correlations if so
desired) in an overlay fashion.
The proposed method, embedded in a SAS macro, allows the analyst to:
a) Plot correlations of either all variables against each
other or against a single 'with' variable, sorted
by the absolute value of the correlation.
b) Plot, on the same graph described in a), the ‘n’
largest partial correlations in absolute value, ‘n’ being a
parameter chosen according to the desired crowding
of information in the plot.
c) Print the correlation and p-value matrices in
tabulated form. The standard output is usually difficult to read
because long sequences of numbers are hard to
conceptualize. [3] The tabulated presentation,
neater but still difficult to interpret, is necessary for
documentation.
2. Exploratory data analysis, variable selection and
correlation matrices.
The typical practice of data analysis includes, at least in principle,
exploratory data analysis, as espoused by Tukey (1977). More recently,
Cleveland (1993) emphasized visualization techniques, and many
research papers investigate the topic. This paper addresses the issue of
visualizing correlations, itself a component of EDA, with simple tools
available in the SAS System.
In addition, the hurried data mining practitioner must often select
variables for a model, a segmentation algorithm or a
customer profile, in an environment of hundreds and perhaps thousands
of variables. Stepwise methods, however much criticized, are one of the
methodologies currently used to address variable selection.
In addition to variable selection techniques, practitioners also look at
correlations among variables to investigate linear dependencies. Less
frequently, practitioners look at squared partial (first order) correlation
coefficients. Given the linear model Y = α + β X + δ Z + ε with the
typical assumptions, the squared partial coefficient of Y and Z given X
measures the proportion of the variation of Y not explained by X that is
explained by Z. Correspondingly, the unsquared coefficient measures the
correlation between Y and Z holding X constant. Direct and indirect
effects of X and Z on Y can be
measured by the partial correlation coefficients. In the same vein, second
order partial correlation coefficients can be defined by partialling out an
additional variable from a first-order partial correlation, and likewise
for third, fourth and higher orders.
Specifically, given X, Y and Z, the zero order correlation between X and
Y is given by:
$$ r_{xy} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}} $$
where the bar denotes mean value.
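For readers working outside SAS, the zero order formula can be sketched in a few lines of Python. This is only an illustration and not part of the macro: the function name `pearson_pairwise` and the use of `None` to mark a missing value are assumptions of the sketch. It also returns the number of complete pairs, since that count is what the p-values rely on later.

```python
import math

def pearson_pairwise(x, y):
    """Zero order Pearson correlation of x and y.

    Pairs in which either value is missing (None, by assumption) are
    skipped. Returns (r, n), n being the number of complete pairs used.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return None, n
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return None, n  # undefined when one variable is constant
    return sxy / math.sqrt(sxx * syy), n

# Made-up data: only three of the five pairs are complete.
x = [1.0, 2.0, None, 4.0, 5.0]
y = [2.0, 4.0, 6.0, None, 10.0]
r, n = pearson_pairwise(x, y)
print(r, n)
```

Returning the pair count together with the coefficient keeps the two pieces of information from drifting apart when many variables have different patterns of missing values.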
The partial correlation of x and y, given z, is:
$$ r_{xy.z} = \frac{r_{xy} - r_{xz}\, r_{yz}}{\sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}} $$
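As a worked illustration of the first-order formula, the following Python sketch computes a partial correlation from three zero order correlations. The function name `partial_corr` is hypothetical. The inputs are the correlations among LN_DAY, N_DAYLS2 and RESPONSE as they appear in the Proc Corr printout of the case study; since those are based on different pairwise numbers of observations, the result should be read as indicative only and may differ from what the macro computes.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y given z."""
    denom = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denom == 0:
        return None  # undefined when x or y is perfectly correlated with z
    return (r_xy - r_xz * r_yz) / denom

# x = LN_DAY, y = N_DAYLS2, z = RESPONSE (printed zero order correlations)
r = partial_corr(r_xy=0.76645, r_xz=0.99097, r_yz=0.67704)
print(round(r, 3))  # roughly 0.97
```

Note how a zero order correlation of about 0.77 becomes a partial correlation close to 1 once RESPONSE is held constant, which is the kind of shift the ELI plot is meant to make visible.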
3. Programming considerations.
The Corr Procedure (with which the reader should be familiar to fully
understand this paper) is the basic tool for finding correlations, as in the
following code embedded in a macro:
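PROC CORR DATA = &INDATA.
     /* keep only the correlation and N rows of the output data set */
     OUTP = &OUTDATA. (WHERE = (_TYPE_ IN ("CORR", "N"))
     RENAME = (_NAME_ = WITH)) NOPRINT;
/* the WITH statement is generated only when &WITH. is not blank */
%IF %NRBQUOTE(&WITH.) > %THEN WITH &WITH.; %STR(;)
/* the VAR list is built from macro variables VAR1 ... VAR&NUMVAR. */
VAR %DO K = 1 %TO &NUMVAR.; &&VAR&K. %END; %STR(;)
RUN;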
In this macro code, we request that the correlations not be printed
(NOPRINT) but kept in the data set &OUTDATA. The rest of
the code allows for the use of a ‘with’ variable and of selected VAR
variables. The names of the variables are kept in macro variables
var1 through var&numvar. (&numvar. being the number of variables)
because we require the variables to be alphabetically ordered to search
for missing values later on. The standard output data set referenced by
&OUTDATA. provides the correlations but not the number of
observations for the ‘with’ variable. This number is critical in
determining p-values, and given the prevalence of missing values in
large databases, it forces us to re-capture that information. [4]
(See the typical Proc Corr output in section 3 below.)
OUTDATA AFTER PROC CORR

OBS  _TYPE_  _WITH       LN_DAY  N_DAYLS2  N_DAYLST  N_DAYSEX  N_INTRST  RESPONSE
  1  N                 26610.00  38185.00  38185.00  38185.00  38185.00  22931.00
  2  CORR    LN_DAY        1.00      0.77      0.92      0.72      0.11      0.99
  3  CORR    N_DAYLS2      0.77      1.00      0.95      0.86      0.03      0.68
  4  CORR    N_DAYLST      0.92      0.95      1.00      0.85      0.06      0.87
  5  CORR    N_DAYSEX      0.72      0.86      0.85      1.00      0.03      0.66
  6  CORR    N_INTRST      0.11      0.03      0.06      0.03      1.00      0.12
  7  CORR    RESPONSE      0.99      0.68      0.87      0.66      0.12      1.00
  8  CORR    SEXUNKN      -0.21      0.02     -0.08      0.32     -0.07     -0.24
  9  CORR    TENURE       -0.05      0.01     -0.01      0.03     -0.05     -0.04
Due to the likelihood of the presence of missing values, it is necessary to
find out the number of non-missing observations for every pair of
variables. Since the &outdata. data set provides the number of present
observations for individual variables (but not for the ‘with’ variable), it
is necessary to obtain the information for those pairs in which at least
one variable has missing values. Once the number of non-missing values
is determined for every pair of variables, the p-values are computed by:
$$ \frac{\sqrt{N - 2}\;\mathrm{Corr}}{\sqrt{1 - \mathrm{Corr}^2}} \sim t(N - 2), $$
which can be programmed as:
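/* absolute t statistic for H0: correlation = 0, based on the pairwise N */
_STAT = ABS (SQRT(_NUMOBS - 2) * _CORR / SQRT ( 1 - (_CORR * _CORR)));
/* large samples or extreme statistics: normal approximation */
IF _NUMOBS > 100 OR _STAT > 40
THEN _P_VAL = ROUND ( 2 * (1 - PROBNORM (_STAT)),.00001);
/* otherwise: two-sided p-value from Student's t with N - 2 d.f. */
ELSE IF _STAT > . THEN _P_VAL =
ROUND ( 2 * (1 - PROBT ( _STAT, _NUMOBS - 2 ,0 )),.00001);
/* missing statistic: the p-value stays missing */
ELSE _P_VAL = .;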
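The large-sample branch of this calculation can be checked outside SAS. The Python sketch below uses only the normal approximation (the branch taken above when the number of observations exceeds 100), so it should not be used for small samples, where the t distribution is needed. The function name `corr_p_value_normal` is an assumption of this sketch. The two calls use the correlations of LN_DAY with V3 and with V1 from the Proc Corr printout of the case study, for which the printed p-values are 0.0195 and 0.4757.

```python
import math

def corr_p_value_normal(r, n):
    """Two-sided p-value for H0: correlation = 0, normal approximation.

    r is the correlation and n the number of non-missing pairs.
    Returns None when the statistic cannot be computed.
    """
    if n <= 2 or abs(r) >= 1:
        return None
    stat = abs(math.sqrt(n - 2) * r / math.sqrt(1 - r * r))
    # 2 * (1 - Phi(stat)), written with erfc for accuracy in the tail
    return math.erfc(stat / math.sqrt(2))

print(round(corr_p_value_normal(-0.01432, 26610), 4))  # LN_DAY with V3
print(round(corr_p_value_normal(0.00437, 26610), 4))   # LN_DAY with V1
```

With more than 26,000 observations the normal and t results agree to the precision printed by Proc Corr, which is why the approximation is adequate for data sets of this size.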
At this point, we have obtained or calculated correlations and p-values
that allow us to “timeplot”. Since we have p-value information (in SAS
data set &SASWORK.7 below), the analyst may wish to plot only
significant correlations, as determined by a p-value threshold. The
Timeplot code is:
PROC TIMEPLOT DATA = &SASWORK.7;
PLOT _CORR = "0" %IF &PARTIAL. = Y %THEN %DO K = 1 %TO &N_PRTLS.;
MXPART&K. = "&K."
%END;
/ OVERLAY NPP POS = 60 HILOC REF = 0 REFCHAR = '|' OVPCHAR = "*"
AXIS = -1 TO 1 BY .02 ;
ID _VARLBL ; /* VAR NAME + LABEL */
BY _WITH; /* SET OF WITH VARS */
TITLE2
%IF &PARTIAL. = Y %THEN "CORRS BY #BYVAL1, &N_PRTLS. PARTIALS REQUESTED";
%ELSE "CORRELATIONS BY #BYVAL1";
%STR(;)
%IF &SGNFCNT. = Y %THEN TITLE3 "SIGNIFICANT CORRS 95% ONLY"; %STR(;)
RUN;
In this code, we request, at a minimum, a plot of the correlations between
a set of ‘with’ and ‘var’ variables (_WITH, _CORR), identified in the plot
by the value 0 (zero order correlation). If partial correlations,
calculated in a “PROC IML” step, are requested as well (“%DO K = 1 %TO
&N_PRTLS. …”), their values are identified by 1, 2, 3 … &N_prtls. in
descending order of absolute value, where &n_prtls. is a user determined
parameter. The names of the variables partialled out corresponding to
1, 2, 3… are
found in a later printout under the names PART1, PART2, PART3 … .
We use * to denote overprinting (Ovpchar option).
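To make the layout concrete for readers without access to Proc Timeplot, the following Python sketch draws one row of an ELI-like plot. It is not the procedure's algorithm, only an imitation of the conventions described above: a ‘|’ reference at zero, one symbol per correlation, ‘*’ where two symbols share a position, and hyphens joining the lowest and highest symbols. The function name `eli_row`, the default widths, and the partial correlation values in the example are all made up for illustration.

```python
def eli_row(label, values, width=61, label_width=28):
    """Render one text row of an ELI-like plot.

    values maps a plotting symbol to a correlation in [-1, 1],
    for example {"0": 0.77, "1": 0.97}.
    """
    axis = [" "] * width
    axis[(width - 1) // 2] = "|"  # reference line at zero correlation

    def column(r):
        # map [-1, 1] linearly onto columns 0 .. width - 1
        return round((r + 1) / 2 * (width - 1))

    columns = {}
    for symbol, r in values.items():
        columns.setdefault(column(r), []).append(symbol)

    # join the lowest and highest symbols with hyphens
    low, high = min(columns), max(columns)
    for c in range(low + 1, high):
        if axis[c] == " ":
            axis[c] = "-"

    # one symbol per column; '*' marks an overprint
    for c, symbols in columns.items():
        axis[c] = symbols[0] if len(symbols) == 1 else "*"

    return label.ljust(label_width) + "|" + "".join(axis) + "|"

# Zero order correlation plus two hypothetical partials.
print(eli_row("N_DAYLS2:", {"0": 0.77, "1": 0.97, "2": -0.93}))
```

Sorting the rows and choosing which partials to include are left to the caller, which keeps the rendering step independent of how the correlations were obtained.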
3. Case Study.
I present one case, without a ‘with’ variable. [5] The ‘with’ variable
case is merely a subset of the more general case. All the variables are
continuous and their meaning is unimportant for this exercise. The usual
(clipped) printout of Proc Corr and the (clipped) Output data set
generated in this case are:
LN_DAY
    LN_DAY  RESPONSE  N_DAYLST  N_DAYLS2  N_DAYSEX  TOT_RCVD
   1.00000   0.99097   0.92451   0.76645   0.72429   0.22447
       0.0    0.0001    0.0001    0.0001    0.0001    0.0001
     26610     16057     26610     26610     26610     26610

   SEXUNKN  N_INTRST    TENURE        V3        V1        V2
  -0.21161   0.10958  -0.05324  -0.01432   0.00437  -0.00137
    0.0001    0.0001    0.0001    0.0195    0.4757    0.8228
     26610     26610     26610     26610     26610     26610

N_DAYLS2
  N_DAYLS2  N_DAYLST  N_DAYSEX    LN_DAY  RESPONSE  TOT_RCVD
   1.00000   0.95119   0.86207   0.76645   0.67704   0.19900
       0.0    0.0001    0.0001    0.0001    0.0001    0.0001
     38185     38185     38185     26610     22931     38185

  N_INTRST   SEXUNKN    TENURE        V3        V1        V2
   0.02730   0.01862   0.00980  -0.00816   0.00204   0.00102
    0.0001    0.0003    0.0555    0.1109    0.6904    0.8423
     38185     38185     38185     38185     38185     38185
The first line of numbers in the Proc Corr output contains the
correlation coefficients, the second the corresponding p-values, and the
third the number of observations used for each pair.
With hundreds or thousands of variables, this presentation is
uninformative, and the wrap-around effect makes it tedious to
review. It becomes more cumbersome still when the analyst wants to
simplify the task by looking only at correlations with significant
p-values. In this light, we propose the following Timeplot-like output
(which corresponds to the set of correlations associated with LN_DAY),
adapted for visualization:
ELI PLOT: CORRELATIONS BY LN_DAY
WITH:=LN_DAY
VAR_NAME_+_LABEL            min                                                         max
                            -1                                                            1
                            *-------------------------------------------------------------*
N_DAYLS2:                   |                              |                      0       |
N_DAYLST: #_days lst_clkth  |                              |                           0  |
N_DAYSEX:                   |                              |                     0        |
N_INTRST: #_intrsts e_intr  |                              |  0                           |
RESPONSE:                   |                              |                             0|
SEXUNKN:                    |                        0     |                              |
TENURE: # days since bec    |                            0 |                              |
TOT_RCVD: tot rcvd e_rcvd   |                              |      0                       |
V1:                         |                              0                              |
V2:                         |                              0                              |
V3:                         |                             0|                              |
                            *-------------------------------------------------------------*
The previous ELI plot illustrates the correlation patterns among the
variables. ‘0’ marks direct (or zero order) correlations. The plot allows
the ‘stepwise-prone’ analyst interested in variable selection to focus
directly on areas of high correlation, in this case N_DAYLS2,
N_DAYLST, N_DAYSEX, etc. These areas are the ones closest to the -1 and +1
ends of the axis. The midpoint of the plot marks zero correlation.
Further, for every “(with, var)” pair, we can also plot the four (or any
number so desired) largest 1st order partial correlations, denoted by the
numbers 1 through 4. Overlaps are denoted by ‘*’. The printout titled
“DIRECT & PARTIAL VAR NAMES” details the names of the
variables for each of the plotted correlations.
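The selection of the largest first-order partials for one “(with, var)” pair can be sketched as follows in Python. This is not the macro's PROC IML step, whose details are not shown here; the function names are hypothetical. The inputs are the rounded correlations for six of the variables as listed in the output data set shown earlier, so the computed values are approximate.

```python
import math

def symmetric(pairs):
    """Build a symmetric lookup corr[a][b] from {(a, b): r}."""
    corr = {}
    for (a, b), r in pairs.items():
        corr.setdefault(a, {})[b] = r
        corr.setdefault(b, {})[a] = r
    return corr

def top_partials(corr, with_var, var, n=4):
    """The n largest (in absolute value) first-order partial correlations
    of with_var and var, each given one other variable z."""
    r_xy = corr[with_var][var]
    found = []
    for z in corr:
        if z in (with_var, var):
            continue
        r_xz, r_yz = corr[with_var][z], corr[var][z]
        denom = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
        if denom > 0:
            found.append((z, (r_xy - r_xz * r_yz) / denom))
    found.sort(key=lambda item: abs(item[1]), reverse=True)
    return found[:n]

corr = symmetric({
    ("LN_DAY", "N_DAYLS2"): 0.77, ("LN_DAY", "N_DAYLST"): 0.92,
    ("LN_DAY", "N_INTRST"): 0.11, ("LN_DAY", "RESPONSE"): 0.99,
    ("LN_DAY", "SEXUNKN"): -0.21, ("N_DAYLS2", "N_DAYLST"): 0.95,
    ("N_DAYLS2", "N_INTRST"): 0.03, ("N_DAYLS2", "RESPONSE"): 0.68,
    ("N_DAYLS2", "SEXUNKN"): 0.02, ("N_DAYLST", "N_INTRST"): 0.06,
    ("N_DAYLST", "RESPONSE"): 0.87, ("N_DAYLST", "SEXUNKN"): -0.08,
    ("N_INTRST", "RESPONSE"): 0.12, ("N_INTRST", "SEXUNKN"): -0.07,
    ("RESPONSE", "SEXUNKN"): -0.24,
})
for name, value in top_partials(corr, "LN_DAY", "N_DAYLS2"):
    print(name, round(value, 2))
```

The returned names play the role of PART1, PART2, … in the printout, and the values are what would be plotted as 1, 2, … on the row for that pair.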
ELI PLOT: CORRS BY LN_DAY, 4 PARTIALS REQUESTED
WITH:=LN_DAY
[text plot; the left-to-right order of the symbols is preserved, horizontal positions are schematic]
VAR_NAME_+_LABEL            min                     max
                            -1                        1
                            *-------------------------*
N_DAYLS2:                   | 2----------|-------*3-1 |
N_DAYLST: #_days lst_clkth  |            |        *3* |
N_DAYSEX:                   |            |       *--1 |
N_INTRST: #_intrsts e_intr  |            | **         |
RESPONSE:                   |            |          * |
SEXUNKN:                    |  1---*--40 |            |
TENURE: # days since bec    |        *3* |            |
TOT_RCVD: tot rcvd e_rcvd   |            |  *1        |
V1:                         |          1-*            |
V2:                         |            *            |
V3:                         |           *|1           |
                            *-------------------------*
ELI PLOT: CORRS BY N_DAYLS2, 4 PARTIALS REQUESTED
WITH:=N_DAYLS2
[text plot; the left-to-right order of the symbols is preserved, horizontal positions are schematic]
VAR_NAME_+_LABEL            min                     max
                            -1                        1
                            *-------------------------*
LN_DAY:                     | 2----------|-------*3-1 |
N_DAYLST: #_days lst_clkth  |            |         ** |
N_DAYSEX:                   |            |        *-1 |
N_INTRST: #_intrsts e_intr  |         1*4|0           |
RESPONSE:                   | *----------|------*3    |
SEXUNKN:                    |  1---------|0-----*2    |
TENURE: # days since bec    |           40-*          |
TOT_RCVD: tot rcvd e_rcvd   |            |   *        |
V1:                         |           1*            |
V2:                         |            *            |
V3:                         |            *            |
                            *-------------------------*
Let us concentrate on a specific example. The first line of
the first diagram above (shown just below for clarity of exposition) plots
LN_DAY (‘with’ variable) against N_DAYLS2, and four first-order
partials in decreasing order of absolute magnitude. The correlations are
joined by hyphens that allow for a more compact view. ‘1’ in the first
line of the graph corresponds to the correlation between LN_DAY and
N_DAYLS2 after partialling out RESPONSE (which corresponds to
variable PART1 in the first observation of the printout below). ‘2’
corresponds to the next largest absolute partial correlation, obtained
by partialling out N_DAYLST, etc. In the diagram, there is an overlap
between the zero-order correlation and the partial corresponding to
N_INTRST (PART4), denoted by ‘*’. Given the distance of all these
correlations from the mid-point of zero correlation, the analyst might
deem these variables worthy of further study. While p-values for direct
correlations are given in a table below, corresponding p-values for the
partial correlations are not calculated at present.
WITH:=LN_DAY
[text plot; the left-to-right order of the symbols is preserved, horizontal positions are schematic]
VAR_NAME_+_LABEL            min                     max
                            -1                        1
                            *-------------------------*
N_DAYLS2:                   | 2----------|-------*3-1 |
DIRECT & PARTIAL VAR NAMES

WITH=LN_DAY
OBS  VAR       PART1     PART2     PART3     PART4
  1  N_DAYLS2  RESPONSE  N_DAYLST  SEXUNKN   N_INTRST
  2  N_DAYLST  RESPONSE  N_DAYLS2  SEXUNKN   TENURE
  3  N_DAYSEX  SEXUNKN   TENURE    N_INTRST  V1
  4  N_INTRST  N_DAYLS2  N_DAYLST  N_DAYSEX  V3
  5  RESPONSE  N_DAYLST  N_DAYLS2  V1        TENURE
  6  SEXUNKN   N_DAYSEX  N_DAYLST  N_DAYLS2  TOT_RCVD
  7  TENURE    N_DAYLST  N_DAYSEX  N_DAYLS2  RESPONSE
  8  TOT_RCVD  SEXUNKN   TENURE    V3        V1
  9  V1        RESPONSE  N_DAYSEX  N_DAYLST  TOT_RCVD
 10  V2        RESPONSE  N_DAYLS2  N_DAYLST  N_DAYSEX
 11  V3        RESPONSE  TOT_RCVD  N_INTRST  V2

WITH=N_DAYLS2
OBS  VAR       PART1     PART2     PART3     PART4
 12  LN_DAY    RESPONSE  N_DAYLST  SEXUNKN   N_INTRST
 13  N_DAYLST  LN_DAY    RESPONSE  SEXUNKN   N_INTRST
 14  N_DAYSEX  SEXUNKN   TENURE    V1        V2
 15  N_INTRST  N_DAYLST  LN_DAY    RESPONSE  TOT_RCVD
 16  RESPONSE  LN_DAY    N_DAYLST  SEXUNKN   N_INTRST
 17  SEXUNKN   N_DAYSEX  N_DAYLST  LN_DAY    RESPONSE
 18  TENURE    LN_DAY    N_DAYLST  RESPONSE  N_DAYSEX
 19  TOT_RCVD  N_INTRST  V3        V1        V2
 20  V1        RESPONSE  N_DAYSEX  TOT_RCVD  TENURE
 21  V2        N_DAYLST  LN_DAY    N_DAYSEX  RESPONSE
 22  V3        TOT_RCVD  SEXUNKN   TENURE    N_INTRST
ELI plots allow for a different configuration as well. Instead of plotting
the largest first-order partial correlations in addition to the zero order
one, we can plot the largest of the first-order, second largest, third
largest, etc. For the sake of brevity, this excursion is omitted.
Finally, and for documentation purposes, the correlation coefficients and
corresponding p-values are also tabulated:
UPPER TRIANGULAR MATRIX
ALPHABETICALLY ORDERED
+----------------------+------+------+------+------+------+------+------+------+------+
|CORRELATIONS          |      |      |#_days|      |      |      |      |      |      |
|                      |      |      |lst_c-|      |#_int-|      |      |# days|      |
|                      |      |      |lkthru|      | rsts |      |      |since | tot  |
|                      |      |N_DAY-|&_dec-|N_DAY-|e_int-|RESPO-|SEXUN-|became| rcvd |
|                      |LN_DAY| LS2  |.16.99| SEX  | rs2  | NSE  |  KN  |member|e_rcvd|
+----------------------+------+------+------+------+------+------+------+------+------+
|VARIABLE              |      |      |      |      |      |      |      |      |      |
+----------------------+------+------+------+------+------+------+------+------+------+
|LN_DAY                |      |  0.77|  0.92|  0.73|  0.10|  0.99| -0.21| -0.04|  0.23|
|N_DAYLS2              |      |      |  0.95|  0.86|  0.03|  0.68|  0.02|  0.01|  0.20|
|N_DAYLST              |      |      |      |  0.85|  0.06|  0.87| -0.08| -0.01|  0.22|
|N_DAYSEX              |      |      |      |      |  0.03|  0.66|  0.32|  0.03|  0.22|
|N_INTRST              |      |      |      |      |      |  0.12| -0.07| -0.05|  0.29|
|RESPONSE              |      |      |      |      |      |      | -0.24| -0.04|  0.22|
|SEXUNKN               |      |      |      |      |      |      |      |  0.11|  0.06|
|TENURE                |      |      |      |      |      |      |      |      |  0.00|
|TOT_RCVD              |      |      |      |      |      |      |      |      |      |
+----------------------+------+------+------+------+------+------+------+------+------+
+----------------------+------+------+------+------+------+------+------+------+------+
|P_VALS OF CORRS       |      |      |#_days|      |      |      |      |      |      |
|                      |      |      |lst_c-|      |#_int-|      |      |# days|      |
|                      |      |      |lkthru|      | rsts |      |      |since | tot  |
|                      |      |N_DAY-|&_dec-|N_DAY-|e_int-|RESPO-|SEXUN-|became| rcvd |
|                      |LN_DAY| LS2  |.16.99| SEX  | rs2  | NSE  |  KN  |member|e_rcvd|
+----------------------+------+------+------+------+------+------+------+------+------+
|VARIABLE              |      |      |      |      |      |      |      |      |      |
+----------------------+------+------+------+------+------+------+------+------+------+
|LN_DAY                |      | 0.000| 0.000| 0.000| 0.000| 0.000| 0.000| 0.000| 0.000|
|N_DAYLS2              |      |      | 0.000| 0.000| 0.000| 0.000| 0.000| 0.056| 0.000|
|N_DAYLST              |      |      |      | 0.000| 0.000| 0.000| 0.000| 0.028| 0.000|
|N_DAYSEX              |      |      |      |      | 0.000| 0.000| 0.000| 0.000| 0.000|
|N_INTRST              |      |      |      |      |      | 0.000| 0.000| 0.000| 0.000|
|RESPONSE              |      |      |      |      |      |      | 0.000| 0.000| 0.000|
|SEXUNKN               |      |      |      |      |      |      |      | 0.000| 0.000|
|TENURE                |      |      |      |      |      |      |      |      | 0.831|
|TOT_RCVD              |      |      |      |      |      |      |      |      |      |
+----------------------+------+------+------+------+------+------+------+------+------+
Since many correlations may not be significant at, say, the 5%
level (95% confidence), the ELI graphs can be made to portray
significant correlations only. In our example, however, we
presented all possible effects with corresponding partial
correlations.
6. Trademarks.
SAS and all other SAS Institute Inc. product or service names
are registered trademarks or trademarks of SAS Institute Inc.
in the USA and other countries. ® indicates USA registration.
Other brand and product names are registered trademarks or
trademarks of their respective companies.
7. End Notes.
1
Data mining has often been defined as the search for
patterns, interesting or otherwise. Curiously, “interesting” is
in the eye of the beholder, and patterns are not well defined.
Ergo, any tool that purports to find interesting patterns
belongs under the rubric of data mining, which thus cannot
properly define any scientific application, since almost
anything can belong to it. My own preference is “Giga-data
analysis” (as opposed to the more traditional statistician’s
“small data set analysis”). It is in this spirit that I envision
this paper.
Since information from data requires the processes of
summarization, conceptualization, interpretation and
application, the data analyst victorious in all these steps after
successful perusal of reams of pages might require
hospitalization as well.
2
Yes, I am that old. This paper deals only with Pearson
correlation coefficients, but the additional use of other
measures contained in Proc Corr is straightforward.
Programming Timeplot-like diagrams in other software
should not pose an insurmountable task. I created my first
diagram in Basic in 1980.
Additionally, the adjustment necessary for correlations
among continuous and categorical as well as among
categorical variables can be easily added.
3
I consider the name Timeplot a limiting and misleading
denomination. C’est la vie.
5
Partial correlations can also be understood as the
correlation between the residuals of a regression between Y
and X, and between Y and Z. See Cohen and Cohen (1983)
for an overall discussion, and Leahy (1996) for suppression
effects in the area of data base marketing.
6
The skillful programmer might be enticed to utilize Proc
Printto. My preference for a more arduous route is based on
the additional flexibility provided to enhance the overall
procedure, such as including partial correlations in one step,
multiple comparisons of correlations, Drezner’s Multirelation
(1995), etc.
Missing values are excluded from the calculation of
correlations in a pair-wise form. For a proposed solution to
the problem of missing values in the context of large
databases, see Auslender (1997).
7
The macro at present accepts only one ‘with’ variable. It is
a straightforward modification to enhance the code to accept
multiple ‘with’ variables.
8. Bibliography
Auslender L., Missing Value Imputation Methods for Large
DataBases, Proceedings of the 1997 northeastern SAS Users
Group Meeting, 1997.
Cleveland W., Visualizing Data, Hobart Press, USA, 1993.
Cohen J., Cohen P. Applied Multiple Regression/Correlation
Analysis for the Behavioral Sciences, Lawrence Erlbaum
Associates, Publishers, 1983.
Drezner, Z., Multirelation – Correlation among more than
two variables, Computational Statistics and Data Analysis,
1995, March.
Hoaglin D., Mosteller F., Tukey J., Understanding Robust
and Exploratory Data Analysis, John Wiley & Sons, 1983.
Leahy K., Nature, prevalence, and benefits of suppression
effects in direct response segmentation, Proceedings of the
American Statistical Association 1995 Meeting, 1996.
9. Contact Information
Your comments and questions are valued and encouraged.
Contact the author at:
Leonardo E. Auslender
SAS Institute
1545 Rt. 206 N, Suite 270
Bedminster, NJ 07921
908 470 0080 x 8217 (o)
908 470 0081 (f)
leonardo.auslender@sas.com

Eli plots visualizing innumerable number of correlations

  • 1. “On visualizing Direct and Partial Correlations – ELI plots” Leonardo E. Auslender SAS Institute, Inc., Bedminster, NJ 1. Introduction Statisticians and data analysts focus on correlations among pairs of variables to understand the strength of linear relationships in the data. Since correlations measure relations among pairs of variables, the standard output is in matrix form, which tends to be difficult to interpret for a large number of variables. The superlative analyst may also incorporate partial correlations to further deepen the analysis, which at least doubles the standard output. The hapless data-miner who faces hundreds, if not thousands, of variables does not long to wade through reams of outputs of correlations to find “interesting” patterns.1 In this paper, I present a method that enables to visualize any number of Pearson (and partial) correlations by using a Proc-Timeplot-like output I call Exploratory Linear Information (ELI) plots. Proc Timeplot is a procedure available in SAS Base, of the SAS Institute SAS software, since at least version 5.18. 2 Proc Timeplot “plots one or more variables over time intervals” (SAS Procedures Guide, v. 6, 3rd . edition, p. 579); the time interval variable acts as an index for the observations being plotted. Notice that the index variable is itself not plotted and, moreover, that it is not at all necessary to have a time variable as an index (p. 581 of the same manual, ‘date’ variable.). In this paper, our index is a variable that contains the names of the variables being correlated against a ‘with’ variable, and we plot correlations (and partial correlations if so desired) in an overlay fashion. The proposed method, embedded in a SAS macro, allows to: a) Plot correlations of either all variables against each other or against a single 'with' variable, properly sorted by the absolute value of the correlation. 
b) Plot on the same graph described in a) the first ‘nth’ largest absolute value partial correlations, ‘n’ being a chosen parameter dependent upon the desired crowding of information in the plot. c) Print the correlation and p-value matrices in a tabulate fashion. The standard output is usually difficult to read due to the intricacies of conceptualizing of long sequences of numbers. 3 The tabulate presentation, neater but still difficult to interpret, is necessary for documentation. 2. Exploratory data analysis, variable selection and correlation matrices. The typical practice of data analysis includes, at least in principle, exploratory data analysis, as espoused by Tukey (1977). More recently, Cleveland (1993) emphasized visualization techniques, and many research papers investigate the topic. This paper addresses the issue of visualizing correlations, itself a component of EDA, with simple tools available in the SAS System. In addition, the hurried data mining practitioner finds himself/herself in search of selecting variables for a model, a segmentation algorithm or a customer profile, in an environment of hundreds and perhaps thousands of variables. Stepwise methods, however much criticized, are one of the present methodologies used to address variable selection. In addition to variable selection techniques, practitioners also look at correlations among variables to investigate linear dependencies. Less frequently, practitioners look at squared partial (first order) correlation coefficients. Given the linear model Y = α + β X + δ Z + ε with the typical assumptions, these coefficients measure the proportion of variation of a variable Y not estimated by X that is estimated by Z in linear models. Equivalently, they measure the correlation between Y and X holding Z constant. Direct and indirect effects of X and Z on Y can be measured by the partial correlation coefficients. 
In the same vein, second order partial correlation coefficients can be defined by partialling out an additional variable from a first-order partial correlation. And third, fourth, etc. Specifically, given X, Y and Z, the zero order correlation between X and Y is given by: rxy = ( Σ (xi - x’) (yi -y’)) / √ Σ (xi - x’)2 Σ(yi - y’)2 where the apostrophe denotes mean value. The partial correlation of x and y, given z, is: rxy.z = ( rxy - rxz ryz) / √ (1 – rxz 2 ) (1 – ryz 2 ). 3. Programming considerations. The Corr Procedure (with which the reader should be familiar to fully understand this paper) is the basic tool for finding correlations, as in the following code embedded in a macro: PROC CORR DATA = &INDATA. OUTP = &OUTDATA. (WHERE = (_TYPE_ IN (“CORR”, “N)) RENAME = (_NAME_ = WITH)) NOPRINT; %IF %NRBQUOTE(&WITH.) > %THEN WITH &WITH.; %STR(;) VAR %DO K = 1 %TO &NUMVAR.; &&VAR&K. %END; %STR(;) RUN; In this macro-code, we are requesting not to print (NOPRINT) the correlations, but to keep them in the data set &OUTDATA. The rest of the code allows for the use of a ‘with’ variable and of selected VAR variables. The names of the variables have been kept in macro variables var1 through var&numvar. (&numvar. being the number of variables) because we require the variables to be alphabetically ordered to search for missing values later on. The standard output data set referenced by &OUTDATA. provides the correlations but not the number of observations for the ‘with’ variable. This number is critical in determining p-values, and given the prevalence of missing values in large databases, it forces us to re-capture that information. 4 (See section 3 below the typical Proc corr output).
  • 2. OUTDATA AFTER PROC CORROUTDATA AFTER PROC CORROUTDATA AFTER PROC CORROUTDATA AFTER PROC CORR OBS _TYPE_ _WITH LN_DAY N_DAYLS2 N_DAYLST N_DAYSEX N_INTRST RESPONSEOBS _TYPE_ _WITH LN_DAY N_DAYLS2 N_DAYLST N_DAYSEX N_INTRST RESPONSEOBS _TYPE_ _WITH LN_DAY N_DAYLS2 N_DAYLST N_DAYSEX N_INTRST RESPONSEOBS _TYPE_ _WITH LN_DAY N_DAYLS2 N_DAYLST N_DAYSEX N_INTRST RESPONSE 1 N 26610.00 38185.00 38185.00 38185.00 38185.00 22931.001 N 26610.00 38185.00 38185.00 38185.00 38185.00 22931.001 N 26610.00 38185.00 38185.00 38185.00 38185.00 22931.001 N 26610.00 38185.00 38185.00 38185.00 38185.00 22931.00 2 CORR LN_DAY 1.00 0.77 0.92 0.72 0.11 0.992 CORR LN_DAY 1.00 0.77 0.92 0.72 0.11 0.992 CORR LN_DAY 1.00 0.77 0.92 0.72 0.11 0.992 CORR LN_DAY 1.00 0.77 0.92 0.72 0.11 0.99 3 CORR N_DAYLS2 0.77 1.00 0.95 0.86 0.03 0.683 CORR N_DAYLS2 0.77 1.00 0.95 0.86 0.03 0.683 CORR N_DAYLS2 0.77 1.00 0.95 0.86 0.03 0.683 CORR N_DAYLS2 0.77 1.00 0.95 0.86 0.03 0.68 4 CORR N_DAYLST 0.924 CORR N_DAYLST 0.924 CORR N_DAYLST 0.924 CORR N_DAYLST 0.92 0.95 1.00 0.85 0.06 0.870.95 1.00 0.85 0.06 0.870.95 1.00 0.85 0.06 0.870.95 1.00 0.85 0.06 0.87 5 CORR N_DAYSEX 0.72 0.86 0.85 1.00 0.03 0.665 CORR N_DAYSEX 0.72 0.86 0.85 1.00 0.03 0.665 CORR N_DAYSEX 0.72 0.86 0.85 1.00 0.03 0.665 CORR N_DAYSEX 0.72 0.86 0.85 1.00 0.03 0.66 6 CORR N_INTRST 0.11 0.03 0.06 0.03 1.006 CORR N_INTRST 0.11 0.03 0.06 0.03 1.006 CORR N_INTRST 0.11 0.03 0.06 0.03 1.006 CORR N_INTRST 0.11 0.03 0.06 0.03 1.00 0.120.120.120.12 7 CORR RESPONSE 0.99 0.68 0.87 0.66 0.12 1.007 CORR RESPONSE 0.99 0.68 0.87 0.66 0.12 1.007 CORR RESPONSE 0.99 0.68 0.87 0.66 0.12 1.007 CORR RESPONSE 0.99 0.68 0.87 0.66 0.12 1.00 8 CORR SEXUNKN8 CORR SEXUNKN8 CORR SEXUNKN8 CORR SEXUNKN ----0.21 0.020.21 0.020.21 0.020.21 0.02 ----0.08 0.320.08 0.320.08 0.320.08 0.32 ----0.070.070.070.07 ----0.240.240.240.24 9 CORR TENURE9 CORR TENURE9 CORR TENURE9 CORR TENURE ----0.050.050.050.05 0.010.010.010.01 ----0.01 0.030.01 0.030.01 0.030.01 0.03 
----0.050.050.050.05 ----0.040.040.040.04 Due to the likelihood of the presence of missing values, it is necessary to find out the number of non-missing observations for every pair of variables. Since the &outdata. data set provides the number of present observations for individual variables (but not for the ‘with’ variable), it is necessary to obtain the information for those pairs in which at least one variable has missing values. Once the number of non-missing values is determined for every pair of variables, the p-values are computed by: √√√√ (N – 2). Corr ____________ , ∼∼∼∼ t (N - 2). √√√√ (1 – Corr2 ) which can be programmed as: _STAT = ABS (SQRT(_NUMOBS - 2) * _CORR / SQRT ( 1 - (_CORR * _CORR))); IF _NUMOBS > 100 OR _STAT > 40 THEN _P_VAL = ROUND ( 2 * (1 - PROBNORM (_STAT)),.00001); ELSE IF _STAT > . THEN _P_VAL = ROUND ( 2 * (1 - PROBT ( _STAT, _NUMOBS - 2 ,0 )),.00001); ELSE _P_VAL = .; At this point, we have obtained or calculated correlations and p-values that allow us to “timeplot”. Since we have p-value information (in sas data set &SASWORK.7 below), the analyst may desire to plot only significant correlations, usually given by a p-value threshold. The Timeplot code is: PROC TIMEPLOT DATA = &SASWORK.7; PLOT _CORR = "0" %IF &PARTIAL. = Y %THEN %DO K = 1 %TO &N_PRTLS.; MXPART&K. = "&K." %END; / OVERLAY NPP POS = 60 HILOC REF = 0 REFCHAR = '|' OVPCHAR = "*" AXIS = -1 TO 1 BY .02 ; ID _VARLBL ; /* VAR NAME + LABEL */ BY _WITH; /* SET OF WITH VARS */ TITLE2 %IF &PARTIAL. = Y %THEN "CORRS BY #BYVAL1, &N_PRTLS. PARTIALS REQUESTED"; %ELSE "CORRELATIONS BY #BYVAL1"; %STR(;) %IF &SGNFCNT. = Y %THEN TITLE3 "SIGNIFICANT CORRS 95% ONLY"; %STR(;) RUN;
  • 3. In this code, we request at least to plot the correlation between a set of ‘with’ and ‘var’ variables (_WITH, _CORR) identified in the plot by the value 0 (zero level correlation). If partial correlations are requested as well, calculated in a “PROC IML” step, (“%DO K = 1 %TO &N_PRTLS. …”), their values are identified by 1, 2, 3 … &N_prtls. in descending order, where &n_prtls. is a user determined parameter. The names of the variables partialled out corresponding to 1, 2, 3… are found in a later printout under the names PART1, PART2, PART3 … . We use * to denote overprinting (Ovpchar option). 3. Case Study. I present one case, without a ‘with’ variable. 5 The ‘with’ variable case is merely a subset of the more general case. All the variables are continuous and their meaning is unimportant for this exercise. The usual (clipped) printout of Proc Corr and the (clipped) Output data set generated in this case are: LN_DAYLN_DAYLN_DAYLN_DAY LN_DAY RESPONSE N_DAYLST N_DAYLS2 N_DAYSEX TOT_RCVDLN_DAY RESPONSE N_DAYLST N_DAYLS2 N_DAYSEX TOT_RCVDLN_DAY RESPONSE N_DAYLST N_DAYLS2 N_DAYSEX TOT_RCVDLN_DAY RESPONSE N_DAYLST N_DAYLS2 N_DAYSEX TOT_RCVD 1.00000 0.99097 0.92451 0.76645 0.72429 0.224471.00000 0.99097 0.92451 0.76645 0.72429 0.224471.00000 0.99097 0.92451 0.76645 0.72429 0.224471.00000 0.99097 0.92451 0.76645 0.72429 0.22447 0.0 0.0001 0.0001 00.0 0.0001 0.0001 00.0 0.0001 0.0001 00.0 0.0001 0.0001 0.0001 0.0001 0.0001.0001 0.0001 0.0001.0001 0.0001 0.0001.0001 0.0001 0.0001 26610 16057 26610 26610 26610 2661026610 16057 26610 26610 26610 2661026610 16057 26610 26610 26610 2661026610 16057 26610 26610 26610 26610 SEXUNKN N_INTRST TENURE V3 V1 V2SEXUNKN N_INTRST TENURE V3 V1 V2SEXUNKN N_INTRST TENURE V3 V1 V2SEXUNKN N_INTRST TENURE V3 V1 V2 ----0.21161 0.109580.21161 0.109580.21161 0.109580.21161 0.10958 ----0.053240.053240.053240.05324 ----0.01432 0.004370.01432 0.004370.01432 0.004370.01432 0.00437 ----0.001370.001370.001370.00137 0.0001 0.0001 0.0001 0.0195 
0.4757 0.82280.0001 0.0001 0.0001 0.0195 0.4757 0.82280.0001 0.0001 0.0001 0.0195 0.4757 0.82280.0001 0.0001 0.0001 0.0195 0.4757 0.8228 26610 26610 26610 26610 26610 266126610 26610 26610 26610 26610 266126610 26610 26610 26610 26610 266126610 26610 26610 26610 26610 26610000 N_DAYLS2N_DAYLS2N_DAYLS2N_DAYLS2 N_DAYLS2 N_DAYLST N_DAYSEX LN_DAY RESPONSE TOT_RCVDN_DAYLS2 N_DAYLST N_DAYSEX LN_DAY RESPONSE TOT_RCVDN_DAYLS2 N_DAYLST N_DAYSEX LN_DAY RESPONSE TOT_RCVDN_DAYLS2 N_DAYLST N_DAYSEX LN_DAY RESPONSE TOT_RCVD 1.00000 0.95119 0.86207 0.76645 0.67704 0.199001.00000 0.95119 0.86207 0.76645 0.67704 0.199001.00000 0.95119 0.86207 0.76645 0.67704 0.199001.00000 0.95119 0.86207 0.76645 0.67704 0.19900 0.0 0.000.0 0.000.0 0.000.0 0.0001 0.0001 0.0001 0.0001 0.000101 0.0001 0.0001 0.0001 0.000101 0.0001 0.0001 0.0001 0.000101 0.0001 0.0001 0.0001 0.0001 38185 38185 38185 26610 22931 3818538185 38185 38185 26610 22931 3818538185 38185 38185 26610 22931 3818538185 38185 38185 26610 22931 38185 N_INTRST SEXUNKN TENURE V3 V1 V2N_INTRST SEXUNKN TENURE V3 V1 V2N_INTRST SEXUNKN TENURE V3 V1 V2N_INTRST SEXUNKN TENURE V3 V1 V2 0.02730 0.01862 0.009800.02730 0.01862 0.009800.02730 0.01862 0.009800.02730 0.01862 0.00980 ----0.00816 0.00204 0.001020.00816 0.00204 0.001020.00816 0.00204 0.001020.00816 0.00204 0.00102 0.0001 0.0003 0.0555 0.1109 0.6904 0.84230.0001 0.0003 0.0555 0.1109 0.6904 0.84230.0001 0.0003 0.0555 0.1109 0.6904 0.84230.0001 0.0003 0.0555 0.1109 0.6904 0.8423 38185 38185 38185 3818538185 38185 38185 3818538185 38185 38185 3818538185 38185 38185 38185 38185 3818538185 3818538185 3818538185 38185 The first line of numbers in the Proc Corr output is the corresponding correlation coefficients, while the second is the corresponding p-values. For the case of hundreds or thousands of variables, this presentation is non-informative, and the wrapping-around effect will make it tedious to review. 
It becomes more cumbersome when the analyst wants to simplify the task by only looking at correlations with significant p- values. In this light, we propose the following Timeplot-like output (which corresponds to the set of correlations associated with LN_DAY), adapted for visualization: ELI PLOT: CORRELATIONS BY LN_DAYELI PLOT: CORRELATIONS BY LN_DAYELI PLOT: CORRELATIONS BY LN_DAYELI PLOT: CORRELATIONS BY LN_DAY WITH:=LN_DAYWITH:=LN_DAYWITH:=LN_DAYWITH:=LN_DAY VAR_NAME_+_LABEL min maxVAR_NAME_+_LABEL min maxVAR_NAME_+_LABEL min maxVAR_NAME_+_LABEL min max ----1 11 11 11 1 ****------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------**** N_DAYLS2: |N_DAYLS2: |N_DAYLS2: |N_DAYLS2: | | 0 || 0 || 0 || 0 | N_DAYLST: #_days lst_clkth | | 0 |N_DAYLST: #_days lst_clkth | | 0 |N_DAYLST: #_days lst_clkth | | 0 |N_DAYLST: #_days lst_clkth | | 0 | N_DAYSEX: | | 0N_DAYSEX: | | 0N_DAYSEX: | | 0N_DAYSEX: | | 0 |||| N_INTRST: #_intrsts e_intr | | 0 |N_INTRST: #_intrsts e_intr | | 0 |N_INTRST: #_intrsts e_intr | | 0 |N_INTRST: #_intrsts e_intr | | 0 | RESPONSE: | | 0 |RESPONSE: | | 0 |RESPONSE: | | 0 |RESPONSE: | | 0 | SEXUNKN: |SEXUNKN: |SEXUNKN: |SEXUNKN: | 0 | |0 | |0 | |0 | | TENURE: # days since bec | 0 | |TENURE: # days since bec | 0 | |TENURE: # days since bec | 0 | |TENURE: # days since bec | 0 | | TOT_RCVD: tot rcvd e_rcvd | | 0TOT_RCVD: tot rcvd e_rcvd | | 0TOT_RCVD: tot rcvd e_rcvd | | 0TOT_RCVD: tot rcvd e_rcvd | | 0 |||| V1: | 0 |V1: | 0 |V1: | 0 |V1: | 0 | V2: | 0 |V2: | 0 |V2: | 0 |V2: | 0 | V3: |V3: |V3: |V3: | 0| |0| |0| |0| | ****------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------****
  • 4. The previous ELI plot illustrates the correlation patterns among the variables. ‘0’ marks direct (or zero order) correlations. The plot allows the ‘stepwise-prone’ analyst to focus directly on areas of high- correlation if interested in variable selection. In this case, N_Dayls2, N- daylst, N-daysex, etc. These areas will be the ones closer to the (-1, +1) axes. The midpoint of the plot marks the zero correlation mark. Further, for every “(with, var)” pair, we can also plot the four (or any number so desired) largest 1st order partial correlations, denoted by the numbers 1 through 4. Overlaps are denoted by ‘*’. The printout titled “DIRECT & PARTIAL VAR NAMES” details the names of the variables for each of the plotted correlations. ELI PLOT: CORRS BY LN_DAY, 4 PARTIALS REQUESTEDELI PLOT: CORRS BY LN_DAY, 4 PARTIALS REQUESTEDELI PLOT: CORRS BY LN_DAY, 4 PARTIALS REQUESTEDELI PLOT: CORRS BY LN_DAY, 4 PARTIALS REQUESTED WITH:=LN_DAYWITH:=LN_DAYWITH:=LN_DAYWITH:=LN_DAY VAR_NAME_+_LABEL mVAR_NAME_+_LABEL mVAR_NAME_+_LABEL mVAR_NAME_+_LABEL min maxin maxin maxin max ----1 11 11 11 1 ****------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------**** N_DAYLS2: | 2N_DAYLS2: | 2N_DAYLS2: | 2N_DAYLS2: | 2------------------------------------------------------------------------------------------------------------||||------------------------------------------------------------------------------------*3*3*3*3----------------1 |1 |1 |1 | N_DAYLST: #_days lst_clkth | | *3* |N_DAYLST: #_days lst_clkth | | *3* |N_DAYLST: #_days lst_clkth | | *3* |N_DAYLST: #_days lst_clkth | | *3* | N_DAYSEX:N_DAYSEX:N_DAYSEX:N_DAYSEX: | | *| | *| | *| | *------------1 |1 |1 |1 | N_INTRST: #_intrsts e_intr | | ** |N_INTRST: #_intrsts e_intr | | ** |N_INTRST: #_intrsts e_intr | | ** |N_INTRST: #_intrsts e_intr | | 
** | RESPONSE: | |RESPONSE: | |RESPONSE: | |RESPONSE: | | * |* |* |* | SEXUNKN: | 1SEXUNKN: | 1SEXUNKN: | 1SEXUNKN: | 1--------------------------------****------------40 | |40 | |40 | |40 | | TENURE: # days since bec | *3* | |TENURE: # days since bec | *3* | |TENURE: # days since bec | *3* | |TENURE: # days since bec | *3* | | TOT_RCVD: tot rcvd e_rcvdTOT_RCVD: tot rcvd e_rcvdTOT_RCVD: tot rcvd e_rcvdTOT_RCVD: tot rcvd e_rcvd | | *1 || | *1 || | *1 || | *1 | V1: | 1V1: | 1V1: | 1V1: | 1----* |* |* |* | V2: | *V2: | *V2: | *V2: | * |||| V3: | *|1 |V3: | *|1 |V3: | *|1 |V3: | *|1 | ****------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------**** ELI PLOT: CORRS BY N_DAYLS2, 4 PELI PLOT: CORRS BY N_DAYLS2, 4 PELI PLOT: CORRS BY N_DAYLS2, 4 PELI PLOT: CORRS BY N_DAYLS2, 4 PARTIALS REQUESTEDARTIALS REQUESTEDARTIALS REQUESTEDARTIALS REQUESTED WITH:=N_DAYLS2WITH:=N_DAYLS2WITH:=N_DAYLS2WITH:=N_DAYLS2 VAR_NAME_+_LABEL min maxVAR_NAME_+_LABEL min maxVAR_NAME_+_LABEL min maxVAR_NAME_+_LABEL min max ----1 11 11 11 1 ****------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------**** LN_DAY: | 2LN_DAY: | 2LN_DAY: | 2LN_DAY: | 2------------------------------------------------------------------------------------------------------------||||------------------------------------------------------------------------------------*3*3*3*3----------------1 |1 |1 |1 | N_DAYLST: #_days lst_clkth |N_DAYLST: #_days lst_clkth |N_DAYLST: #_days lst_clkth |N_DAYLST: #_days lst_clkth | | ** || ** || ** || ** | N_DAYSEX: | | *N_DAYSEX: | | *N_DAYSEX: | | *N_DAYSEX: | | *----1 |1 |1 |1 | N_INTRST: #_intrsts e_intr | 1*4|0 |N_INTRST: 
#_intrsts e_intr | 1*4|0 |N_INTRST: #_intrsts e_intr | 1*4|0 |N_INTRST: #_intrsts e_intr | 1*4|0 | RESPRESPRESPRESPONSE: | *ONSE: | *ONSE: | *ONSE: | *------------------------------------------------------------------------------------------------------------||||----------------------------------------------------------------------------*3 |*3 |*3 |*3 | SEXUNKN: | 1SEXUNKN: | 1SEXUNKN: | 1SEXUNKN: | 1------------------------------------------------------------|0|0|0|0------------------------*2 |*2 |*2 |*2 | TENURE: # days since bec |TENURE: # days since bec |TENURE: # days since bec |TENURE: # days since bec | 40404040----* |* |* |* | TOT_RCVD: tot rcvd e_rcvd | | * |TOT_RCVD: tot rcvd e_rcvd | | * |TOT_RCVD: tot rcvd e_rcvd | | * |TOT_RCVD: tot rcvd e_rcvd | | * | V1: | 1* |V1: | 1* |V1: | 1* |V1: | 1* | VVVV2: | * |2: | * |2: | * |2: | * | V3: | * |V3: | * |V3: | * |V3: | * | ****------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------**** Let us concentrate on a specific example. For instance, the first line of the first diagram above (shown just below for clarity of exposition) plots LN_DAY (‘with’ variable) against N_DAYLS2, and four first-order partials in decreasing absolute order of magnitude. The correlations are joined by hyphens that allow for a more compact view. ‘1’ in the first line of the graph corresponds to the correlation between LN_DAY and N_DAYLS2 after partialling out RESPONSE (which corresponds to variable PART1 in the first observation of the printout below). ‘2’
  • 5. corresponds to the next largest absolute partial correlation, which corresponds to N_DAYLST, etc. In the diagram, there is an overlap between the zero-order correlation and the partial corresponding to N_INTRST (PART4), denoted by ‘*’. Given the distance of all these correlations from the mid-point of zero correlation, the analyst might deem these variables worth for further study. While p-values for direct correlations are given in a tabulate below, corresponding p-values for the partial correlations are not calculated at present. WITH:=LN_DAYWITH:=LN_DAYWITH:=LN_DAYWITH:=LN_DAY VAR_NAME_+_LABEL min maxVAR_NAME_+_LABEL min maxVAR_NAME_+_LABEL min maxVAR_NAME_+_LABEL min max ----1 11 11 11 1 ****------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------**** N_DAYLS2: | 2N_DAYLS2: | 2N_DAYLS2: | 2N_DAYLS2: | 2------------------------------------------------------------------------------------------------------------||||------------------------------------------------------------------------------------*3*3*3*3----------------1 |1 |1 |1 | DIRECT & PARTIAL VAR NAMESDIRECT & PARTIAL VAR NAMESDIRECT & PARTIAL VAR NAMESDIRECT & PARTIAL VAR NAMES WITH=LN_DAYWITH=LN_DAYWITH=LN_DAYWITH=LN_DAY OBS VAR PART1 PART2 PART3 PART4OBS VAR PART1 PART2 PART3 PART4OBS VAR PART1 PART2 PART3 PART4OBS VAR PART1 PART2 PART3 PART4 1 N_DAYLS2 RESPONSE N_DAYLST SEXUNKN N_INTRST 2 N_DAYLST RESPONSE N_DAYLS2 SEXUNKN TENURE 3 N_DAYSEX SEXUNKN TENURE N_INTRST V1 4 N_INTRST N_DAYLS2 N_DAYLST N_DAYSEX V3 5 RESPONSE N_DAYLST N_DAYLS2 V1 TENURE 6 SEXUNKN N_DAYSEX N_DAYLST N_DAYLS2 TOT_RCVD 7 TENURE N_DAYLST N_DAYSEX N_DAYLS2 RESPONSE 8 TOT_RCVD SEXUNKN TENURE V3 V1 9 V1 RESPONSE N_DAYSEX N_DAYLST TOT_RCVD 10 V2 RESPONSE N_DAYLS2 N_DAYLST N_DAYSEX 11 V3 RESPONSE TOT_RCVD N_INTRST V2 
WITH=N_DAYLS2WITH=N_DAYLS2WITH=N_DAYLS2WITH=N_DAYLS2 OBS VAR PART1 PART2 PART3 PART4OBS VAR PART1 PART2 PART3 PART4OBS VAR PART1 PART2 PART3 PART4OBS VAR PART1 PART2 PART3 PART4 12 LN_DAY RESPONSE N_DAYLST SEXUNKN N_INTRST 13 N_DAYLST LN_DAY RESPONSE SEXUNKN N_INTRST 14 N_DAYSEX SEXUNKN TENURE V1 V2 15 N_INTRST N_DAYLST LN_DAY RESPONSE TOT_RCVD 16 RESPONSE LN_DAY N_DAYLST SEXUNKN N_INTRST 17 SEXUNKN N_DAYSEX N_DAYLST LN_DAY RESPONSE 18 TENURE LN_DAY N_DAYLST RESPONSE N_DAYSEX 19 TOT_RCVD N_INTRST V3 V1 V2 20 V1 RESPONSE N_DAYSEX TOT_RCVD TENURE 21 V2 N_DAYLST LN_DAY N_DAYSEX RESPONSE 22 V3 TOT_RCVD SEXUNKN TENURE N_INTRST ELI plots allow for a different configuration as well. Instead of plotting the largest first-order partial correlations in addition to the zero order one, we can plot the largest of the first-order, second largest, third largest, etc. For the sake of brevity, this excursion is omitted. Finally, and for documentation purposes, the correlation coefficients and corresponding p-values are also tabulated
  • 6. : UPPER TRIANGULAR MATRIX ALPHABETICALLY ORDERED „ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒ…ƒƒƒƒƒƒ…ƒƒƒƒƒƒ…ƒƒƒƒƒƒ…ƒƒƒƒƒƒ…ƒƒƒƒƒƒ…ƒƒƒƒƒƒ…ƒƒƒƒƒƒ…ƒƒƒƒƒƒ† ‚CORRELATIONS ‚ ‚ ‚#_days‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚lst_c-‚ ‚#_int-‚ ‚ ‚# days‚ ‚ ‚ ‚ ‚ ‚lkthru‚ ‚ rsts ‚ ‚ ‚since ‚ tot ‚ ‚ ‚ ‚N_DAY-‚&_dec-‚N_DAY-‚e_int-‚RESPO-‚SEXUN-‚became‚ rcvd ‚ ‚ ‚LN_DAY‚ LS2 ‚.16.99‚ SEX ‚ rs2 ‚ NSE ‚ KN ‚member‚e_rcvd‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒ‰ ‚VARIABLE ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‰ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚LN_DAY ‚ ‚ 0.77‚ 0.92‚ 0.73‚ 0.10‚ 0.99‚ -0.21‚ -0.04‚ 0.23‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒ‰ ‚N_DAYLS2 ‚ ‚ ‚ 0.95‚ 0.86‚ 0.03‚ 0.68‚ 0.02‚ 0.01‚ 0.20‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒ‰ ‚N_DAYLST ‚ ‚ ‚ ‚ 0.85‚ 0.06‚ 0.87‚ -0.08‚ -0.01‚ 0.22‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒ‰ ‚N_DAYSEX ‚ ‚ ‚ ‚ ‚ 0.03‚ 0.66‚ 0.32‚ 0.03‚ 0.22‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒ‰ ‚N_INTRST ‚ ‚ ‚ ‚ ‚ ‚ 0.12‚ -0.07‚ -0.05‚ 0.29‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒ‰ ‚RESPONSE ‚ ‚ ‚ ‚ ‚ ‚ ‚ -0.24‚ -0.04‚ 0.22‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒ‰ ‚SEXUNKN ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ 0.11‚ 0.06‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒ‰ ‚TENURE ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ 0.00‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒ‰ ‚TOT_RCVD ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ Šƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒ‹ƒƒƒƒƒƒ‹ƒƒƒƒƒƒ‹ƒƒƒƒƒƒ‹ƒƒƒƒƒƒ‹ƒƒƒƒƒƒ‹ƒƒƒƒƒƒ‹ƒƒƒƒƒƒ‹ƒƒƒƒƒƒŒ
  • 7. „ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒ…ƒƒƒƒƒƒ…ƒƒƒƒƒƒ…ƒƒƒƒƒƒ…ƒƒƒƒƒƒ…ƒƒƒƒƒƒ…ƒƒƒƒƒƒ…ƒƒƒƒƒƒ…ƒƒƒƒƒƒ† ‚P_VALS OF CORRS ‚ ‚ ‚#_days‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚lst_c-‚ ‚#_int-‚ ‚ ‚# days‚ ‚ ‚ ‚ ‚ ‚lkthru‚ ‚ rsts ‚ ‚ ‚since ‚ tot ‚ ‚ ‚ ‚N_DAY-‚&_dec-‚N_DAY-‚e_int-‚RESPO-‚SEXUN-‚became‚ rcvd ‚ ‚ ‚LN_DAY‚ LS2 ‚.16.99‚ SEX ‚ rs2 ‚ NSE ‚ KN ‚member‚e_rcvd‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒ‰ ‚VARIABLE ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‰ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚LN_DAY ‚ ‚ 0.000‚ 0.000‚ 0.000‚ 0.000‚ 0.000‚ 0.000‚ 0.000‚ 0.000‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒ‰ ‚N_DAYLS2 ‚ ‚ ‚ 0.000‚ 0.000‚ 0.000‚ 0.000‚ 0.000‚ 0.056‚ 0.000‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒ‰ ‚N_DAYLST ‚ ‚ ‚ ‚ 0.000‚ 0.000‚ 0.000‚ 0.000‚ 0.028‚ 0.000‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒ‰ ‚N_DAYSEX ‚ ‚ ‚ ‚ ‚ 0.000‚ 0.000‚ 0.000‚ 0.000‚ 0.000‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒ‰ ‚N_INTRST ‚ ‚ ‚ ‚ ‚ ‚ 0.000‚ 0.000‚ 0.000‚ 0.000‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒ‰ ‚RESPONSE ‚ ‚ ‚ ‚ ‚ ‚ ‚ 0.000‚ 0.000‚ 0.000‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒ‰ ‚SEXUNKN ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ 0.000‚ 0.000‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒ‰ ‚TENURE ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ 0.831‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒˆƒƒƒƒƒƒ‰ ‚TOT_RCVD ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ Šƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒ‹ƒƒƒƒƒƒ‹ƒƒƒƒƒƒ‹ƒƒƒƒƒƒ‹ƒƒƒƒƒƒ‹ƒƒƒƒƒƒ‹ƒƒƒƒƒƒ‹ƒƒƒƒƒƒ‹ƒƒƒƒƒƒŒ Since many correlations may not be significant at an alpha level of, say, 95%, the ELI graphs can be made to portray significant correlations only. In our example however, we presented all possible effects with corresponding partial correlations. 6. Trademarks. 
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.

7. End Notes

1 Data mining has often been defined as the search for patterns, interesting or otherwise. Curiously, “interesting” is in the eye of the beholder, and patterns are not well defined. Ergo, any tool that purports to find interesting patterns belongs under the rubric of data mining, which thus cannot properly define any scientific application, since almost anything can belong to it. My own preference is “Giga-data analysis” (as opposed to the more traditional statistician’s “small data set analysis”). It is in this spirit that I envision this paper. Since information from data requires the processes of summarization, conceptualization, interpretation and application, the data analyst victorious in all these steps after successful perusal of reams of pages might require hospitalization as well.

2 Yes, I am that old. This paper deals only with Pearson correlation coefficients, but the additional use of other measures contained in Proc Corr is straightforward. Programming Timeplot-like diagrams in other software should not pose an insurmountable task. I created my first diagram in Basic in 1980. Additionally, the adjustments necessary for correlations between continuous and categorical variables, as well as among categorical variables, can be easily added.

3 I consider the name Timeplot a limiting and misleading denomination. C’est la vie.

5 Partial correlations can also be understood as the correlation between the residuals of two regressions, one between Y and X and one between Y and Z. See Cohen and Cohen (1983) for an overall discussion, and Leahy (1996) for suppression effects in the area of database marketing.

6 The skillful programmer might be enticed to utilize Proc Printto. My preference for a more arduous route is based on the additional flexibility it provides to enhance the overall procedure, such as including partial correlations in one step, multiple comparisons of correlations, Drezner’s Multirelation (1995), etc. Missing values are excluded from the calculation of correlations in a pairwise fashion. For a proposed solution to the problem of missing values in the context of large databases, see Auslender (1997).
7 The macro at present accepts only one ‘with’ variable. It is a straightforward modification to enhance the code to accept multiple ‘with’ variables.

8. Bibliography

Auslender L., Missing Value Imputation Methods for Large DataBases, Proceedings of the 1997 Northeastern SAS Users Group Meeting, 1997.
Cleveland W., Visualizing Data, Hobart Press, USA, 1993.
Cohen J., Cohen P., Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences, Lawrence Erlbaum Associates, Publishers, 1983.
Drezner Z., Multirelation – Correlation among more than two variables, Computational Statistics and Data Analysis, 1995, March.
Hoaglin D., Mosteller F., Tukey J., Understanding Robust and Exploratory Data Analysis, John Wiley & Sons, 1983.
Leahy K., Nature, prevalence, and benefits of suppression effects in direct response segmentation, Proceedings of the American Statistical Association 1995 Meeting, 1996.

9. Contact Information

Your comments and questions are valued and encouraged. Contact the author at:
Leonardo E. Auslender
SAS Institute
1545 Rt. 206 N, Suite 270
Bedminster, NJ 07921
908 470 0080 x 8217 (o)
908 470 0081 (f)
leonardo.auslender@sas.com