/* __________________________________________________________________________ | | | SAS macro to calculate estimates | | of the heterogeneity linear mixed model | | (i.e. of the linear mixed model with finite normal mixtures | | as random effects distribution), | | version 1.1 | |__________________________________________________________________________| __________________________________________________________________________ | | | Prepared by: Arnost KOMAREK | | Biostatistical Centre | | Katholieke Universiteit Leuven | | Leuven, Belgium | | | | e-mail: arnost.komarek@med.kuleuven.ac.be | | | | | | Geert VERBEKE | | Biostatistical Centre | | Katholieke Universiteit Leuven | | Leuven, Belgium | | | | e-mail: geert.verbeke@med.kuleuven.ac.be | |__________________________________________________________________________| Reference: Verbeke, G. and Molenberghs, G. (2000) Linear Mixed Models for Longitudinal Data. New York: Springer-Verlag, pp. 169-188. Syntax notes: DATA: name of the SAS dataset which contains the data, structured as required by the SAS procedure MIXED (1999, version 8). SUBJECT: the variable with identification numbers of the subjects in the data set. REPEATED: variable indicating ordering of observations within subject (time variable). Note that this variable is treated as the categorical one ('CLASS') inside the macro. Thus if you want to use this time variable in a model in its continuous form, you have to have two replications of such variable. RESPONSE: the response variable. FIXED: the fixed effects in the model. Note that, in contrast to most regression procedures in SAS, no intercept is specified by default. Hence, if an intercept is to be included it has to be specified explicitly as one of the effects in the `fixed=' statement. RANDOM: the random effects in the model. Note that, as is the case of the SAS procedure MIXED, no intercept is specified by default. Hence, if an intercept is to be included it should be specified explicitly as one of the effects in the `random=' statement. IMPORTANT NOTE: the user is encouraged to avoid random effect names that finish by a number, e.g. time2. Such effect labels will cause errors of the macro. TYPEREP: the type of the variance matrix in repeated statement (matrix Sigma_i). All matrices available within proc mixed are allowed. The default value is `simple'. TYPERAND: the type of the variance matrix in random statement (matrix D). All matrices available within proc mixed are allowed. The default value is `un' (unstructured). G: the number of populations. The default value is 1. AMIN: the parameter for multiplication of the second part of the objective function used for the first EM algorithm. AMAX: maximal allowed value of A (the parameter for multiplication of the second part of the objective function. ABY: a step for increase of the A value if one EM algorithm converge. DECISWIT: ("decision within"), when to finish one EM algorithm with one A value. Decision can be based either on absolute difference of values of the objective function in two last iterations (decision=1, default value) or on average absolute difference between estimates of all unknown parameters in two last iterations (decision=2) or on maximal absolute difference between estimates of all unknown parameters in two last iterations (decision=3) or on absolute difference of values of the objective function in two last iterations (decision=3). DECISBET: ("decision between"), when to finish completely. Last iteration of the EM algorith with A and the first iteration of the EM algorithm with A+ABY are compared. The same choices as for DECISWIT are allowed. STOPWIT: ("stoprule within"), one EM algorithm finishes when chosen absolute difference (see DECISWIT) is smaller than STOPWIT. STOPBET: ("stoprule between"), the all iteration process finishes when chosen absolute difference between the last iteration with A and the first iteration with A+ABY (see DECISBET) is smaller than STOPBET. MAXITER: maximal number of iterations of all EM algorithms. MIXEDMI: maximal number of iterations of PROC MIXED within each M step of EM algorithm. The default value is 50 which is also the default value of PROC MIXED. INITPOST: data set of initial posterior probabilities (optional). The data set with one variable corresponding to &subject and &G variables corresponding to number of population. The name of the variable &subject has to be the same in both &data and &initpost data sets. Names of variables with initial probabilities have to be 'POST1','POST2',...,'POSTg'. The data set from endpost created during previous invocation of macro HetMixed can be used. ENDPOST: name of the data set where posterior probabilities after the last iteration of EM algorithm are to be stored (optional). EB: name of the data set where empirical Bayes estimates of random effects are to be saved. If not specified, EB estimates of random effects are not computed. PIEPS: parameter for drawing initial posterior probabilities. See a part of the macro where a matrix waarpost is created for more details. IMPORTANT NOTE 1: dummy variables for categorical covariates have to be formed "by hand" in a separate data step before the macro is invoked. IMPORTANT NOTE 2: in contrast to SAS procedure MIXED, all effects included in `random' statement MUST NOT be included in `fixed' statement. All random effects are assumed implicitely to be present also as fixed effects in the model. Moreover, all effects from the `random' statement must not be expressable as linear combinations of arbitrary effects from the `fixed' statement. As a consequence, only "contrast" parametrization is allowed. E x a m p l e: suppose one wants to distinguish "young" and "old" subjects. It is possible to define two AGE variables. AGE1 which takes value 1 if subject is young and 0 otherwise and AGE2 which takes value 1 if subject is old and 1 otherwise. Let the data set contains also a variable INT which takes always value 1. Suppose one wish to fit a random intercept model with fixed effects that distinguish young and old subjects. There are at least two equivalent parametrizations of such model. a) response = beta0*INT + beta1*AGE1 + b0*INT; b) response = beta1*AGE1 + beta2*AGE2 + b0*INT, where beta's state for fixed effects and b0 for the random effect. The only possible parametrization for this macro is the first one and one has to specify "FIXEDEF = AGE1" and "RANDEF = INT". NOTE 3: number of subjects in the multiplied data set must not exceed 32767 (restriction given by proc mixed for obtaining requested covariance matrices). */ %macro HetMixed(DATA = , SUBJECT = , REPEATED = , RESPONSE = , FIXED = , RANDOM = , TYPEREP = simple, TYPERAND = un, G = 1, AMIN = 10, AMAX = &AMIN, ABY = 10, DECISWIT = 1, DECISBET = &DECISWIT, STOPWIT = 0.00001, STOPBET = 0.0001, MAXITER = 100, MIXEDMI = 50, INITPOST = , ENDPOST = , EB = , PIEPS = 0.1); /* --- default data set --- */ %if %bquote(&data)= %then %let data=&syslast; /* in the following, the data set _usedata will be used instead of &data to save the original data set from changes */ %global error; %let error=0; /* --- Creating the data set _usedata --- */ /* --- Remove all rows with missing response from the original data set --- */ /* --- Sort the data set according to &subject --- */ /* --- Compute numbers of observations per subject --- */ %initdata; /* --- Put all observation of the subject with the most observations to the beginning of the data set _usedata --- */ %reorg; /* --- Compute (take) initial estimates of posterior probabilities --- */ %initpis; /* --- Labels for new random effects are created. --- */ /* --- Also macro variables &numran and &numfix containing numbers of random and fixed effects are created. --- */ %addnumb; /* --- EM algorithm to compute estimates --- */ %global stopmax; %global iterat; %global Qvalue; %global adiffer; %global differ; %global AverPDif; %global MaxPDif; %global stopEM; %global decision; %let decision=&DECISWIT; %global stoprule; %let stoprule=&STOPWIT; %global A; %let A=&AMIN; %firstEM; %nextEM; /* --- Compute final likelihood and log-likelihood. --- */ %Lhood; /* --- Compute final estimates of beta's and mu's paramaters. --- */ %mubetas; /* --- Compute and save empirical Bayes estimates of random effects if the user wants them. --- */ %EmpBayes; /* --- Prepare and print output. --- */ %if %eval(&error=0) %then %output; %else %put Computation was interruped due to errors.; %mend HetMixed; /********************************************************************************/ /*** SubMacros ***/ /********************************************************************************/ /* ##################################################################################### */ /*** --- Create new data set _usedata and remove all observations with missing values --- ***/ %macro initdata; /* --- creating the data set _usedata --- */ /* --- remove all rows with missing response from the original data set --- */ data _usedata; set &data (where=(&response<>.)); keep &subject &repeated &response &fixed &random; run; /* --- sort the data set according to &subject --- */ proc sort data=_usedata; by &subject &repeated; run; /* --- sort initial posterior probabilities (if there are some) according to &subject --- */ %if %bquote(&initpost)= %then %let koe=0; /* macro variable that will never be used */ %else %do; proc sort data=&initpost; by &subject; run; %end; /* --- compute numbers of observations per subject --- */ proc freq data=_usedata; tables &subject / out=_uit noprint; run; /* the data set _uit contains variables &subject, COUNT (number of observations per subject) and PERCENT (obs. per subject/all obs.) */ %mend initdata; /* ################################################################################# */ /*** --- Reorganize data --- ***/ /*** Put the subject with the highest number of observations to the ***/ /*** beginning of the data set. ***/ /* If more subjects have the same maximal number of observations, only the first one is put to the beginning of the data set. */ /* Such operation is useful due to the Output Delivery System to have complete estimates of variance matrices for all subjects from proc mixed. */ /* The same operation has to be done also with the data set &initpost (if there is some). */ %macro reorg; proc iml; reset noprint; use _usedata; read all into datamat; etiketen=contents(_usedata); close _usedata; use _uit; read all var {count} into aantal; read all var {&subject} into idvector; close _uit; /* aantal is a vector with numbers of observations per subject */ individu=nrow(aantal); /* number of subjects */ %if %bquote(&initpost)= %then %do; postmat=j(individu,&G+1,999); %end; %else %do; use &initpost; read all into postmat; postetik=contents(&initpost); close &initpost; %end; hoger=max(aantal); /* maximal number of observations */ indicate=(aantal=hoger); meest=loc(indicate)[1]; /* index of the subject with the most observations */ if (meest>1) then voor=sum(aantal[1:(meest-1)]); else voor=0; /* number of observations that preced all obs. of the subject with the maximal number of obs. */ if (meest0)&(naar>0)) then nieuwdat=datamat[(voor+1):(voor+hoger),]//datamat[1:voor,]//datamat[(voor+hoger+1):(voor+hoger+naar),]; else if ((voor=0)&(naar>0)) then nieuwdat=datamat; else if ((voor>0)&(naar=0)) then nieuwdat=datamat[(voor+1):(voor+hoger),]//datamat[1:voor,]; else nieuwdat=datamat; create _usedata var etiketen; append from nieuwdat; /* --- reorganize numbers of observations per subject --- */ /* --- it is not possible to use proc freqnow! --- */ /* --- also reorganize initial posterior probabilities --- */ nieuwpos=postmat; if (meest=individu) then do; id1=idvector[meest]; id2=idvector[1:(meest-1)]; idvector=id1//id2; aantal1=aantal[meest]; aantal2=aantal[1:(meest-1)]; aantal=aantal1//aantal2; nieuwpos1=postmat[meest,]; nieuwpos2=postmat[1:(meest-1),]; nieuwpos=nieuwpos1//nieuwpos2; end; else if (meest>1) then do; id1=idvector[meest]; id2=idvector[1:(meest-1)]; id3=idvector[(meest+1):individu]; idvector=id1//id2//id3; aantal1=aantal[meest]; aantal2=aantal[1:(meest-1)]; aantal3=aantal[(meest+1):individu]; aantal=aantal1//aantal2//aantal3; nieuwpos1=postmat[meest,]; nieuwpos2=postmat[1:(meest-1),]; nieuwpos3=postmat[(meest+1):individu,]; nieuwpos=nieuwpos1//nieuwpos2//nieuwpos3; end; percvec=j(individu,1,999); /* vector of percents that will be used never more */ nieuwuit=idvector||aantal||percvec; create _uit var {&subject 'Count' 'Percent'}; append from nieuwuit; %if %bquote(&initpost)= %then %let koe=0; /* macro variable that will never be used */ %else %do; create &initpost var postetik; append from nieuwpos; %end; quit; %mend reorg; /* ############################################################################### */ /*** --- Compute initial estimates of posterior probabilities. --- ***/ /* --- Creating matrix waarpost with initial posterior probabilities. --- */ /* --- Initial posterior probabilities are drawn from uniform distribution on interval 1/&G plus minus (1/(1+&pieps))*(1/&G). --- */ /* --- Also create matrix waarpi with probabilities pi, but this matrix is created only due to labels of variables. Real initial estimates of population probabilities pi are computed later using macro %piest. The last matrix that is created is matrix gewicht. It is also created only due to labels of variables. --- */ /* Content of these matrices is subsequently transformed into data sets _postset, _piset, _weigset. */ /* Data set _postset will be always used for current estimates of posterior probabilities, data set _piset for current estimates of population probabilities and data set _weigset for currently used weights, i.e. &A*posterior probabilities. */ %macro initpis; proc iml; reset noprint; use _uit; read all var {count} into aantal; close _uit; /* aantal is a vector with numbers of observations per subject */ individu=nrow(aantal); /* number of subjects */ elkew=1/&G; /* each initial posterior prob. will be approximately equal to 1/&G */ waarpi=j(1,&G,elkew); eenvec=j(&G,1,1); /* column vector of all ones */ hulp=j(individu,&G,999); /* initial value */ hulp=uniform(hulp); epsilon=(1/(1+&pieps))*(1/&G); hulp=(1/&G)+(2*epsilon)#hulp-epsilon; sumvec=hulp*eenvec; /* standardizing constant for each subject (N x 1 column vector) */ sumcol=t(eenvec)@sumvec; /* N x g matrix */ waarpost=hulp/sumcol; %let poststr=%str(post); %let pistr=%str(pi); %let weigstr=%str(weight); labelpos={'POST1'}; labelpi={'PI1'}; labelwei={'WEIGHT1'}; %if %eval(&G>1) %then %do imacro=2 %to &G %by 1; labelpos=labelpos||{%bquote(&poststr&imacro)}; labelpi=labelpi||{%bquote(&pistr&imacro)}; labelwei=labelwei||{%bquote(&weigstr&imacro)}; %end; create _postset var labelpos; append from waarpost; create _piset var labelpi; append from waarpi; create _weigset var labelwei; append from waarpost; quit; /* Check if there are initial posterior probabilities from the user. */ %if %bquote(&initpost)= %then %do; data _postset; set _postset; run; %end; %else %do; data _postset; set &initpost; drop &subject; run; %end; data _subvar; set _uit; keep &subject; run; data _postout1; /* add &subject into data set with initial posterior prob. */ merge _subvar _postset; run; proc sort data=_postout1; by &subject; run; proc print data=_postout1; title1 'The SAS System'; title2 ' '; title3 'The HetMixed Macro'; title4 'Initial Posterior Probabilities'; run; title; %mend initpis; /* ############################################################################ */ /* --- Macro to add numbers (indeces) to variable names of random effects. --- */ /* It will be useful when creating interactions between random effects and population indicators. */ /* It also creates a new macro variable &newrand with labels of new random effects (interactions between original random effects and population indicators). */ /* Also numbers of fixed and random effects are computed. */ %macro addnumb; %global newrand; %global numran; /* number of random effects (not multiplied by &G) */ %global numfix; /* number of fixed effects */ %let newrand=; %do pop=1 %to %eval(&G) %by 1; %let j=1; %do %while(%length(%scan(&random,&j,' '))); %let var=%scan(&random,&j,' '); %let newrand= &newrand &var&pop; %let j=%eval(&j+1); %end; %end; %let numran=%eval(&j-1); %let j=1; %do %while(%length(%scan(&fixed,&j,' '))); %let var=%scan(&fixed,&j,' '); %let j=%eval(&j+1); %end; %let numfix=%eval(&j-1); %mend addnumb; /* ########################################################################### */ /*** --- 1st iteration of EM algorithm --- ***/ %macro firstEM; %multdata; %piest; %put EM algorithm reached the iteration number 1.; %put Currently used value of A is equal to &A%str(.); %gammaest; proc iml; reset noprint; use _objfun1; read all into objfun1; close _objfun1; use _objfun2; read all into objfun2; close _objfun2; oldobjfn=objfun1+objfun2; create _oldobfn var {Qold}; /* data set with value of objective function from */ append from oldobjfn; /* this iteration */ hulp={&A}||objfun1||objfun2||oldobjfn||{.}||{.}||{.}; create _history var {A Q1 Q2 Q QDiff AverPDif MaxPDif}; /* data set with iteration history */ append from hulp; quit; %let Qvalue=.; data _oldobfn; set _oldobfn; call symput('Qvalue',Qold); run; %let differ=.; %let AverPDif=.; %let MaxPDif=.; %mend firstEM; /* ####################################################################################### */ /*** --- Next iterations of EM algorithm --- ***/ %macro nextEM; %let stopmax=0; /* stopmax will be number of iterations if iterations finish due to the stop rule */ %let iterat=2; /* and 0 if iterations finish by reaching &maxiter */ %let skipA=0; /* skipA will be 0 if the previous A is the same as the current A and 1 if the previous A is smaller than the current A, i.e. EM algorithm with a new (higher) A started */ %do %while(&iterat<%eval(&maxiter+1)); data _oldpis; /* store estimates from the previous iteration */ set _piset; run; data _oldmus; set _Muset; run; data _oldcovp; set _CovPset; run; %multdata; %piest; %put ****************************************************************************; %put ; %put EM algorithm reached the iteration number &iterat%str(.); %put Currently used value of A is equal to &A%str(.); %put ; %put The value of the objective function in the previous iteration; %put was equal to &Qvalue%str(.); %put ; %put Difference between values of objective function in previous two iterations; %put was equal to &differ%str(.); %put ; %put Average absolute difference between estimates in previous two iterations; %put was equal to &AverPDif%str(.); %put ; %put Maximal absolute difference between estimates in previous two iterations; %put was equal to &MaxPDif%str(.); %put ; %put ***************************************************************************; %gammaest; proc iml; reset noprint; /*--- Computation of a difference between new and old value of the objective function. ---*/ use _objfun1; read all into objfun1; close _objfun1; use _objfun2; read all into objfun2; close _objfun2; use _oldobfn; read all into oldobjfn; close _oldobfn; newobjfn=objfun1+objfun2; adiffer=abs(newobjfn-oldobjfn); differ=newobjfn-oldobjfn; create _differ var {diff}; append from differ; create _adiffer var {adiff}; append from adiffer; create _oldobfn var {Qold}; /* data set with value of objective function from */ append from newobjfn; /* this iteration */ /*--- Computation of an average absolute difference between new and old estimates ---*/ use _oldpis; read all into oldpis; close _oldpis; use _oldmus; read all var {Estimate} into oldmus; close _oldmus; use _oldcovp; read all var {Estimate} into oldcovp; close _oldcovp; use _piset; read all into newpis; close _piset; use _Muset; read all var {Estimate} into newmus; close _Muset; use _CovPset; read all var {Estimate} into newcovp; close _CovPset; parmaant=ncol(oldpis)+nrow(oldmus)+nrow(oldcovp); /* number of parameters */ oldall=t(oldpis)//oldmus//oldcovp; newall=t(newpis)//newmus//newcovp; adiffpi=abs(sum(newpis-oldpis)); adiffmu=abs(sum(newmus-oldmus)); adiffcov=abs(sum(newcovp-oldcovp)); adiffpar=(1/parmaant)*(adiffpi+adiffmu+adiffcov); create _adifpar var {adiffpar}; append from adiffpar; maxdiffp=max(abs(newall-oldall)); create _mdifpar var {mdiffpar}; append from maxdiffp; /*--- Add information into iteration history. ---*/ hulp={&A}||objfun1||objfun2||newobjfn||differ||adiffpar||maxdiffp; edit _history var {A Q1 Q2 Q QDiff AverPDif MaxPDif}; /* data set with iteration history */ append from hulp; close _history; quit; data _differ; set _differ; call symput('differ',diff); run; data _adiffer; set _adiffer; call symput('adiffer',adiff); run; data _oldobfn; set _oldobfn; call symput('Qvalue',Qold); run; data _adifpar; set _adifpar; call symput('AverPDif',adiffpar); run; data _mdifpar; set _mdifpar; call symput('MaxPDif',mdiffpar); run; %if (&decision = 1) %then %do; /* do number 1 */ proc iml; /* This funny iml proc is necessary */ reset noprint; /* since SAS has problems with */ stopdiff=&stoprule-&adiffer; /* comparison such numbers like */ if stopdiff>0 then stop=1; /* e.g. 5.5E-18 and 0.0001 */ else stop=0; /* in a macro %if statement. */ create _helpset var {stop}; /* For SAS, (5.5E-18<0.0001) */ append from stop; /* is FALSE! */ quit; data _helpset; set _helpset; call symput('stopEM',stop); run; %if %eval(&stopEM=1) /* either EM algorithm with a current A finished */ %then /* or everything finished, it depends on the value of &skipA */ %if %eval(&skipA=1) /* everything finished since the previous A was smaller and the difference between the two Q's is small enough */ %then %do; %let stopmax=%eval(&iterat); %let iterat=%eval(&maxiter+1); %end; %else %do; /* only EM with one current A finished */ %let A=%eval(&A+&ABY); /* increase &A */ %let skipA=1; /* A was changed ==> indicate it using skipA */ %let decision=&DECISBET; /* the next decision will be "between decision" */ %let stoprule=&STOPBET; /* the next stoprule will be "between stoprule" */ %let iterat=%eval(&iterat+1); /* increase &iterat */ %if %eval(&A>&AMAX) %then %do; /* A is already too big ==> finish */ %let stopmax=%eval(&iterat); %let iterat=%eval(&maxiter+1); %end; %end; %else %do; /* the EM algorithm will continue */ %let iterat=%eval(&iterat+1); /* increase &iterat */ %if %eval(&skipA=1) %then %do; /* difference between two Q's for two A's was too big ==> EM algorithm with higher A will continue ==> change everything into within values */ %let skipA=0; %let decision=&DECISWIT; %let stoprule=&STOPWIT; %end; %end; %end; /* end of do number 1 */ %else %do; /* do number 2 */ %if (&decision = 2) %then %do; /* do number 4 */ proc iml; /* This funny iml proc is necessary */ reset noprint; /* since SAS has problems with */ stopdiff=&stoprule-&AverPDif; /* comparison such numbers like */ if stopdiff>0 then stop=1; /* e.g. 5.5E-18 and 0.0001 */ else stop=0; /* in a macro %if statement. */ create _helpset var {stop}; /* For SAS, (5.5E-18<0.0001) */ append from stop; /* is FALSE! */ quit; data _helpset; set _helpset; call symput('stopEM',stop); run; %if %eval(&stopEM=1) /* either EM algorithm with a current A finished */ %then /* or everything finished, it depends on the value of &skipA */ %if %eval(&skipA=1) /* everything finished since the previous A was smaller and the difference between the two Q's is small enough */ %then %do; %let stopmax=%eval(&iterat); %let iterat=%eval(&maxiter+1); %end; %else %do; /* only EM with one current A finished */ %let A=%eval(&A+&ABY); /* increase &A */ %let skipA=1; /* A was changed ==> indicate it using skipA */ %let decision=&DECISBET; /* the next decision will be "between decision" */ %let stoprule=&STOPBET; /* the next stoprule will be "between stoprule" */ %let iterat=%eval(&iterat+1); /* increase &iterat */ %if %eval(&A>&AMAX) %then %do; /* A is already too big ==> finish */ %let stopmax=%eval(&iterat); %let iterat=%eval(&maxiter+1); %end; %end; %else %do; /* the EM algorithm will continue */ %let iterat=%eval(&iterat+1); /* increase &iterat */ %if %eval(&skipA=1) %then %do; /* difference between two Q's for two A's was too big ==> EM algorithm with higher A will continue ==> change everything into within values */ %let skipA=0; %let decision=&DECISWIT; %let stoprule=&STOPWIT; %end; %end; %end; /* end of do number 4 */ %else %do; /* do number 3 */ %if (&decision = 3) %then %do; /* do number 5 */ proc iml; /* This funny iml proc is necessary */ reset noprint; /* since SAS has problems with */ stopdiff=&stoprule-&MaxPDif; /* comparison such numbers like */ if stopdiff>0 then stop=1; /* e.g. 5.5E-18 and 0.0001 */ else stop=0; /* in a macro %if statement. */ create _helpset var {stop}; /* For SAS, (5.5E-18<0.0001) */ append from stop; /* is FALSE! */ quit; data _helpset; set _helpset; call symput('stopEM',stop); run; %if %eval(&stopEM=1) /* either EM algorithm with a current A finished */ %then /* or everything finished, it depends on the value of &skipA */ %if %eval(&skipA=1) /* everything finished since the previous A was smaller and the difference between the two Q's is small enough */ %then %do; %let stopmax=%eval(&iterat); %let iterat=%eval(&maxiter+1); %end; %else %do; /* only EM with one current A finished */ %let A=%eval(&A+&ABY); /* increase &A */ %let skipA=1; /* A was changed ==> indicate it using skipA */ %let decision=&DECISBET; /* the next decision will be "between decision" */ %let stoprule=&STOPBET; /* the next stoprule will be "between stoprule" */ %let iterat=%eval(&iterat+1); /* increase &iterat */ %if %eval(&A>&AMAX) %then %do; /* A is already too big ==> finish */ %let stopmax=%eval(&iterat); %let iterat=%eval(&maxiter+1); %end; %end; %else %do; /* the EM algorithm will continue */ %let iterat=%eval(&iterat+1); /* increase &iterat */ %if %eval(&skipA=1) %then %do; /* difference between two Q's for two A's was too big ==> EM algorithm with higher A will continue ==> change everything into within values */ %let skipA=0; %let decision=&DECISWIT; %let stoprule=&STOPWIT; %end; %end; %end; /* end of do number 5 */ %else %do; /* do number 6 */ %put Bad value of a parameter 'decisbet or deciswit'!; %put You have to choose 1, 2 or 3!; %let stopmax=%eval(&iterat); %let iterat=%eval(&maxiter+1); %let error=1; %end; /* end of do number 6 */ %end; /* end of do number 3 */ %end; /* end of do number 2 */ %end; /* end of %while from the third row of this submacro */ %if %eval(&stopmax=0) %then %put No convergence. Maximum number of iterations was reached.; %else %if %eval(&A>&AMAX) %then %put No convergence. Maximum A was reached.; %else %put Convergence reached after &stopmax iterations.; %mend nextEM; /* ############################################################################# */ /* --- Macro to split estimates of mu into beta's and real mu's. --- */ %macro mubetas; proc iml; reset noprint; use _Muset; read all var {Estimate} into origmus; close _Muset; use _piset; read all into pis; /* population probabilities (vector 1 x G) */ close _piset; random=origmus[&numfix+1:&numfix+(&G*&numran)]; /* only estimates of fixed parts of random effects */ Ig=i(&numran); /* identity matrix */ /* (vector &numfix*&G x 1) */ piig=t(pis)@Ig; betas=t(random)*piig; /* estimates of betas parameters (vector 1 x &numran)*/ Jg=j(&G,1,1); extbeta=Jg@t(betas); /* vector (&numfix*&G) x 1 */ nieuwmu=random-extbeta; /* real mus (&numfix*&G) x 1 */ betas=t(betas); if &numfix>0 then do; fixed=origmus[1:&numfix]; /* only estimates of pure fixed effects */ betas=fixed//betas; end; Effect=t({&fixed &random}); Estimate=betas; create _Betaset var {Effect Estimate}; append; Effect=t({&newrand}); Estimate=nieuwmu; create _reMuset var {Effect Estimate}; append; quit; data _Muset; /* remove unuseful variables from data set _Muset */ set _Muset; keep Effect Estimate; run; %mend mubetas; /* ############################################################################# */ /* --- Macro to compute empirical Bayes estimates of random effects. --- */ %macro EmpBayes; %if %bquote(&EB)= %then %let koe=0; /* macro variable that will never be used */ %else %do; /* first, EB estimates in populations have to be computed */ %put ****************************************************************************; %put ; %put Empirical Bayes estimates of random effects are computed.; %put ; %put ****************************************************************************; ods listing close; proc mixed data=_muldata method=ml noinfo noclprint; class _nieuwid &repeated; model &response = &fixed &newrand / noint; random &random / type=&typerand subject=_nieuwid solution; repeated &repeated / type=&typerep; ods output SolutionR=_EBset; run; ods listing; data _EBset; set _EBset; keep Effect _nieuwid Estimate; /* keep only useful variables */ run; proc freq data=_EBset; /* number of random effects can be found in this way */ tables _nieuwid / out=_EBfreq noprint; run; proc iml; reset noprint; use _EBset; read all var {Estimate} into EBmat; close _EBset; use _uit; read all var {count} into aantal; close _uit; /* aantal is a vector with numbers of observations per subject */ use _weigset; read all into gewicht; close _weigset; /* matrix of type N x g with weights (multiplication factors) */ use _EBfreq; read point 1 var {count} into qrandom; close _EBfreq; /* qrandom is a number of random effects */ individu=nrow(aantal); /* number of subjects */ use _postset; read all into waarpost; /* matrix with posterior probabilities after the last iteration */ close _postset; /* matrix of type N x g */ use _reMuset; read all var {Estimate} into longmu; /* vector (mu1',mu2',...,mug')' */ close _reMuset; /* its type is (g*qrandom) x 1 */ bb=j((individu*qrandom),1,0); /* initial value for vector of EB estimates */ start=1; do j=1 to &G by 1; start2=1; muj=longmu[((j-1)*qrandom+1):(j*qrandom),1]; /* mu of the jth population */ do i=1 to individu by 1; end=start+qrandom-1; /* end of EB for the current subject */ end2=start2+qrandom-1; bbij=EBmat[start:end,1]; bb[start2:end2,1]=bb[start2:end2,1]+waarpost[i,j]*bbij+waarpost[i,j]*muj; start=start+(qrandom*gewicht[i,j]); /* beginning of the next subject */ start2=start2+qrandom; /* beginning of EB for the next subject */ end; end; /* bb is now a long vector with all EB estimates, its type is (N*qrandom) x 1 */ /* create a matrix with N rows from vector bb */ bbmat=t(bb[1:qrandom]); do i=2 to individu by 1; bbmat=bbmat//t(bb[((i-1)*qrandom+1):(i*qrandom)]); end; create _bb1set var {&random}; append from bbmat; quit; data _subvar; set _uit; keep &subject; run; data &EB; merge _subvar _bb1set; run; proc sort data=&EB; by &subject; run; %end; %mend EmpBayes; /* ############################################################################# */ /* --- Macro to compute final likelihood. --- */ %macro Lhood; proc iml; reset noprint; /* --- function to compute a density of multivariate normal distribution --- */ start den(y,mi,Siginv); /* y and mi must be column vectors of the same dimension */ pii=3.141592653589; dim=nrow(Siginv); dSiginv=det(Siginv); scit1=-(dim/2)*log(2*pii); scit2=0.5*log(dSiginv); scit3=-0.5*(t(y-mi)*Siginv*(y-mi)); hulp=scit1+scit2+scit3; value=exp(hulp); return(value); finish den; use _InvVset; read all into InvVmat; close _InvVset; use _Meanset; read all var {_nieuwid _popul &response Pred} into Meanmat; close _Meanset; use _uit; read all var {count} into aantal; close _uit; /* aantal is a vector with numbers of observations per subject */ use _weigset; read all into gewicht; close _weigset; /* matrix of type N x g with weights */ use _piset; read all into pis; close _piset; /* 1 x g vector of population probabilities */ individu=nrow(aantal); /* number of subjects */ contrib=j(individu,&G,0); /* initial value for matrix with f_{i,j}(y_i|gamma) */ start=1; do j=1 to &G by 1; do i=1 to individu by 1; end=start+aantal[i]-1; /* end of the current subject */ muij=Meanmat[start:end,4]; obsij=Meanmat[start:end,3]; Vinvij=InvVmat[start:end,3:(3+end-start)]; contrib[i,j]=den(obsij,muij,Vinvij); start=start+(aantal[i]*gewicht[i,j]); /* beginning of the next subject */ end; end; prodfpi=contrib*t(pis); /* N x 1 vector with \sum_{j=1}^g p_j f_{i,j}(y_i|gamma) */ logprod=log(prodfpi); loglik=sum(logprod); /* log-likelihood */ likhood=exp(loglik); /* likelihood */ hulp=likhood||loglik; create _Lhood var {L LogL}; /* data set with just only one observation and one variable containing the value of the second part of the objective function */ append from hulp; quit; %mend Lhood; /* ########################################################################### */ /*** --- Prepare and print output --- ***/ %macro output; data _subvar; set _uit; keep &subject; run; data _postout; merge _subvar _postset; run; proc sort data=_postout; by &subject; run; data _history; set _history; rename QDIFF=QDiff AVERPDIF=AverPDif MAXPDIF=MaxPDif; run; data _Lhood; set _Lhood; rename L=_Likelihood_ LogL=_Log_Likelihood_; _2Log_Likelihood_=2*LogL; run; data _Betaset; set _Betaset; rename EFFECT=Effect ESTIMATE=Estimate; run; data _reMuset; set _reMuset; rename EFFECT=Effect ESTIMATE=Estimate; run; /* create the data set with final posterior probabilities if the user wants it */ %if %bquote(&endpost)= %then %do; data _postout; set _postout; run; %end; %else %do; data &endpost; set _postout; run; %end; /* split the data set with all covariance parameters into two sets: for D matrix and for Sigma_i matrix */ proc freq data=_CovPset; tables Subject / out=_Covfreq noprint; run; proc iml; reset noprint; use _Covfreq; read point 1 var {Count} into sigmanum; /* number of parameters of Sigma_i matrix */ read point 2 var {Count} into Dnumber; /* number of parameters of D matrix */ close _Covfreq; use _CovPset; read all var {CovParm} into CPSparam; read all var {Estimate} into CPSest; close _CovPset; Dest=CPSest[1:Dnumber,1]; Sest=CPSest[(Dnumber+1):(Dnumber+sigmanum),1]; /* estimates */ Dparam=CPSparam[1:Dnumber,1]; Sparam=CPSparam[(Dnumber+1):(Dnumber+sigmanum),1]; /* parameter labels */ create _Dest1 var {CovParm}; append from Dparam; create _Dest2 var {Estimate}; append from Dest; create _Sest1 var {CovParm}; append from Sparam; create _Sest2 var {Estimate}; append from Sest; quit; data _Destset; merge _Dest1 _Dest2; run; /* estimates of parameters of D matrix */ data _Sestset; merge _Sest1 _Sest2; run; /* estimates of parameters of Sigma_i matrix */ data _Sestset; set _Sestset; rename COVPARM=CovParm ESTIMATE=Estimate; run; data _Destset; set _Destset; rename COVPARM=CovParm ESTIMATE=Estimate; run; title1 'The SAS System'; title2 ' '; title3 'The HetMixed Macro'; proc print data=_history; title4 'Iteration History'; run; proc print data=_Lhood; title4 'Final Likelihood and Log-Likelihood'; run; proc print data=_piset; title4 'Estimates of Component Probabilities'; run; proc print data=_Muset; title4 'Overall Estimates of Component Means'; run; proc print data=_Betaset; title4 'Estimates of Beta Parameters'; run; proc print data=_reMuset; title4 'Estimates of Mu Parameters'; run; proc print data=_Sestset; title4 'Estimates of Elements of Sigma Matrices'; proc print data=_Destset; title4 'Estimates of Elements of D Matrix'; run; title; title1 'The SAS System'; %mend output; /********************************************************************************/ /*** SubSubMacros ***/ /********************************************************************************/ /* ############################################################################ */ /* --- Two macros for computing of posterior probabilities, updated estimates of probabilities of the mixture (pi) and updated estimates of all other estimates (beta, mu, D matrix, Sigma matrix). --- */ /* --- Macro for computation of updated estimates of mixture probabilities (pi). --- */ /* _postset: data set with &individu observations and &g variables containing posterior probabilities. _piset: data set with one observation and &g variables. It will contain updated estimates of pi. */ %macro piest; proc iml; reset noprint; use _postset; read all into waarpost; close _postset; gg=ncol(waarpost); /* number of populations */ nn=nrow(waarpost); /* number of subjects */ eenvec=j(1,nn,1); /* matrix of type 1 x nn with all ones */ waarpi=(1/nn)#(eenvec*waarpost); /* waarpi is a matrix of type 1 x gg */ use _piset; labelpi=contents(_piset); close _piset; create _piset var labelpi; append from waarpi; /* --- compute the first part of objective function --- */ eenvec2=j(nn,1,1); pibig=eenvec2@waarpi; /* N x g matrix with rows equal to waarpi */ lpibig=log(pibig); hulp=waarpost#lpibig; objfun1=sum(hulp); /* the second part of objective function */ create _objfun1 var {Q1}; /* data set with just only one observation and one variable containing the value of the first part of the objective function */ append from objfun1; quit; %mend piest; /* ######################################################################## */ /* --- Macro for computing of updated estimates of beta, mu, D, Sigma. --- */ %macro gammaest; ods listing close; proc mixed data=_muldata method=ml maxiter=&mixedmi noinfo noclprint; class _nieuwid &repeated; model &response = &fixed &newrand / noint solution outpm=_Meanset; random &random / type=&typerand subject=_nieuwid vi=1 to 32767; /* 32767 is a maximum possible value in SAS v. 8.0 */ repeated &repeated / type=&typerep; id _nieuwid _popul &response; ods output InvV=_InvVset; ods output SolutionF=_Muset; ods output CovParms=_CovPset; run; ods listing; /* --- Compute the second part of the objective function --- --- and new posterior probabilities. --- */ proc iml; reset noprint; /* --- function to compute a log-density of multivariate normal distribution --- */ start logden(y,mi,Siginv); /* y and mi must be column vectors of the same dimension */ pii=3.141592653589; dim=nrow(Siginv); dSiginv=det(Siginv); scit1=-(dim/2)*log(2*pii); scit2=0.5*log(dSiginv); scit3=-0.5*(t(y-mi)*Siginv*(y-mi)); value=scit1+scit2+scit3; return(value); finish logden; use _InvVset; read all into InvVmat; close _InvVset; use _Meanset; read all var {_nieuwid _popul &response Pred} into Meanmat; close _Meanset; use _uit; read all var {count} into aantal; close _uit; /* aantal is a vector with numbers of observations per subject */ use _weigset; read all into gewicht; close _weigset; /* matrix of type N x g with weights (multiplication factors) */ individu=nrow(aantal); /* number of subjects */ use _postset; read all into waarpost; /* matrix with posterior probabilities, it will be overwritten */ labelpos=contents(_postset); close _postset; contrib=j(individu,&G,0); /* initial value for matrix with log(f_{i,j}(y_i|gamma)) */ start=1; do j=1 to &G by 1; do i=1 to individu by 1; end=start+aantal[i]-1; /* end of the current subject */ muij=Meanmat[start:end,4]; obsij=Meanmat[start:end,3]; Vinvij=InvVmat[start:end,3:(3+end-start)]; contrib[i,j]=logden(obsij,muij,Vinvij); start=start+(aantal[i]*gewicht[i,j]); /* beginning of the next subject */ end; end; hulp=waarpost#contrib; objfun2=sum(hulp); /* the second part of objective function */ create _objfun2 var {Q2}; /* data set with just only one observation and one variable containing the value of the second part of the objective function */ append from objfun2; /* new posterior probabilities */ use _piset; read all into waarpi; /* row vector with probabilities pi_j */ close _piset; valden=exp(contrib); /* matrix with f_{i,j}(y_i|gamma) */ denom=valden*t(waarpi); /* N x 1 vector with denominators for posterior probabilities */ eenvec1=j(1,&G,1); eenvec2=j(individu,1,1); denombig=eenvec1@denom; /* N x g matrix with columns equal to denom */ pibig=eenvec2@waarpi; /* N x g matrix with rows equal to waarpi */ lpibig=log(pibig); ldenobig=log(denombig); lposter=lpibig+contrib-ldenobig; waarpost=exp(lposter); create _postset var labelpos; append from waarpost; quit; %mend gammaest; /* ############################################################################# */ /* --- Macro for "multiplying" the data set. --- */ /* Multiplied data set will be fed into the proc mixed to produce estimates of beta, mu, D and Sigma parameters (the second part of M step). */ /* This macro also produces new variables - interactions between random effects and population indicators to be able to estimate mu_1:mu_g. */ /* A macro variable NEWRAND is created: string with new random effects (interactions between original random effects and population indicators). */ %macro multdata; proc iml; reset noprint; /* This is a function that rounds to integers and possible resulting zeros change to one. /* Function to round to integers and to change resulting zeros to ones. */ start roundone(M); helpmm1=int(M); helpmm2=M-helpmm1; if ((helpmm2 >= 0.5)|(helpmm1=0)) then helpmm1=helpmm1+1; return(helpmm1); finish roundone; /* Function to change argument M to the nearest HIGHER integer. start roundone(M); helpmm1=int(M)+1; return(helpmm1); finish roundone; */ use _postset; read all into postmat; close _postset; multmat=apply("roundone",&A#postmat); /* matrix of type N x g with values which the observations will be multiplied by */ use _weigset; weighlab=t(contents(_weigset)); close _weigset; create _weigset var weighlab; append from multmat; /* data set with weights is created */ use _usedata; read all into origdata; etiketen=t(contents(_usedata)//{'_popul'}//{'_nieuwid'}); /* variable names for &outdata - row vector */ close _usedata; use _uit; read all var {count} into aantal; close _uit; /* aantal is a vector with numbers of observations per subject */ individu=nrow(aantal); /* number of subjects */ /* the second column from the end of matrix newdata will contain the number of population to that the appropriate part of the matrix belongs, the last column will contain new subject identification numbers that will be used later in proc mixed */ newdata=j(1,ncol(origdata)+2,999); /* the first row of the matrix where multiplied data will be stored, just only for initialization, this row will be removed later */ idtouse=1; /* the first new identification number */ do popul=1 to &G by 1; /* do loop along the populations */ pointer=1; /* the row in the origdata matrix where observations of the first subject start */ do persoon=1 to individu by 1; /* do loop along the subjects */ pomocm1=j(multmat[persoon,popul],1,1); /* column vector of all ones of length equal to the multiplication factor */ popvec=j(aantal[persoon,1],1,popul); /* vector for variable _popul */ pomocm2=origdata[pointer:(pointer+aantal[persoon,1]-1),]||popvec; pomocm3=pomocm1@pomocm2; /* direct product (data multiplication) */ hheellpp=t(do(idtouse,idtouse+multmat[persoon,popul]-1,1)); idvec=hheellpp@j(aantal[persoon,1],1,1); /* vector for variable _nieuwid */ pomocm3=pomocm3||idvec; newdata=newdata//pomocm3; /* add newly multiplied observations to the matrix with previously multiplied observations */ pointer=pointer+aantal[persoon,1]; /* the row in the origdata matrix where observations of the next subject start */ idtouse=idtouse+multmat[persoon,popul]; /* the next new id */ end; end; cijfer=nrow(newdata); /* number of observations in the multiplied data set plus one (the first initial row) */ newdata=newdata[2:cijfer,]; /* remove the first row with all 999 from the newdata matrix */ /* transform the matrix newdata into data set _muldata */ /* the data set _muldata contains first all observations for the first population then all observations for the second population etc. */ create _muldata var etiketen; append from newdata; /* add interactions between random effects and populations */ use _muldata; etikrand={&random}; read all into Mdatamat; read all var etikrand into Zmatr; read all var {'_popul'} into Popmatr; popisky=contents(_muldata); close _muldata; etikrnlg={&newrand}; /* macro var. &newrand was created by macro %addnumb */ Zmatr2=Popmatr; /*Zmatr2 contains interactions between random effects and population indicators */ do popix=1 to &G by 1; helpm=Zmatr#(Popmatr=popix); Zmatr2=Zmatr2||helpm; end; Zmatr2=Zmatr2[,2:ncol(Zmatr2)]; /* remove the first column with variable _popul */ /* --- add matrix with interactions --- */ Mdatamat=Mdatamat||Zmatr2; Infoetik=t(popisky)||etikrnlg; /* --- put the content of matrix Infomat into the data set _muldata --- */ create _muldata var Infoetik; append from Mdatamat; quit; %mend multdata;