2. COMPARISONS OF MAXIMUM LIKELIHOOD AND BIASED ESTIMATION METHODS USING
GENERALIZED LINEAR MODELS
M.Phil. Scholar: Syeda Salma Kazmi
Supervised by: Dr. Atif Abbasi
Department of Statistics
King Abdullah Campus Chatter Kalas
The University of Azad Jammu and Kashmir
03/04/2025 2
4. INTRODUCTION
Normal linear models are based on certain assumptions, such as that the mean of Y
(the explained variable) is a linear combination of the explanatory variables X.
Additionally, the distribution of Y is presumed to be normal with constant variance.
However, there are numerous experimental conditions in which the linearity and/or the
normality of the explained variable Y may not be applicable. For instance, the response
may be a discrete binary random variable with two possible outcomes, success or failure.
GLMs offer a method of unifying several different statistical models, such as
linear regression, binary regression, Poisson regression, gamma, and negative binomial
regressions. GLMs can handle both discrete and continuous responses, and the
standard assumptions of normality and homoscedasticity are not imposed on the
explained variable.
5. CONTINUE
"Generalized linear model" (GLM) is the name of the broad class of models that
McCullagh and Nelder (1983, 2nd edition 1989) proposed. In these models, the
response variable is assumed to follow an exponential family distribution whose
mean μi = E(yi) is a specified (often nonlinear) function of the covariates, which
is why some would refer to the models as "nonlinear." Although the mean may
depend nonlinearly on the covariates, McCullagh and Nelder regard the model as
linear because the covariates affect the distribution of yi only through the linear
combination ηi = xi′β.
6. COMPONENTS OF GLMs
Random component
The random component represents the conditional distribution of the dependent
variable Yi (for the i-th of n independently sampled observations), given the values
of the independent variables in the model. The distribution of Yi, according to
Nelder and Wedderburn's original formulation, is a member of an exponential
family, such as the binomial, normal, gamma, Poisson, or inverse-Gaussian
distributions. The probability density function of exponential family distributions
is commonly represented as
f(yi; θi, φ) = exp{ [yi θi − b(θi)] / a(φ) + c(yi, φ) }.
Systematic component
The linear predictor is constructed as a linear combination of the explanatory
variables,
ηi = β0 + β1 xi1 + β2 xi2 + … + βk xik,
which relates η to a set of independent variables X1, X2, …, Xk.
7. Link Function
A smooth and invertible linearizing link function g(·) transforms the expected
value of the response variable, µi = E(Yi), to the linear predictor (Fox, 2015):
g(µi) = ηi = xi′β,
where β is a q × 1 vector of unknown parameters and xi = (x1, x2, …, xk)′ is the
vector of explanatory variables. The association between the linear predictor and
the expected value of the random component is described by the link function as
µi = g⁻¹(ηi).
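As a sketch of how a link function operates in practice, the logit link of a binomial GLM can be coded in a few lines. This is illustrative code with made-up numbers, not part of the study; the design matrix and coefficients are invented for the example.

```python
import numpy as np

def logit(mu):
    """Link function g: mean in (0, 1) -> linear predictor on the real line."""
    return np.log(mu / (1.0 - mu))

def inv_logit(eta):
    """Inverse link g^{-1}: linear predictor -> mean."""
    return 1.0 / (1.0 + np.exp(-eta))

# Hypothetical design matrix (with intercept column) and coefficient vector.
X = np.array([[1.0, 0.5], [1.0, -1.2], [1.0, 2.0]])
beta = np.array([0.3, 1.1])

eta = X @ beta        # systematic component: eta_i = x_i' beta
mu = inv_logit(eta)   # expected response mu_i = g^{-1}(eta_i)
```

Because g is invertible, applying `logit` to `mu` recovers `eta` exactly, which is what "linearizing" means here.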
9. OBJECTIVES
The present research is conducted to:
Estimate the parameters of GLMs by considering binomial and Poisson regression
models.
Compare the maximum likelihood estimator with biased estimators such as the
ridge and principal component regression estimators.
Explore the effects of multicollinearity on the performance of the estimators.
10. MATERIALS AND METHODS
Various methods used in GLMs for estimating the parameters are described.
Maximum Likelihood Estimation
Newton-Raphson method
Method of Fisher's scoring
BIASED ESTIMATION METHODS
Ridge regression
Generalized principal component regression
Mean squared error
03/04/2025 10
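The iterative ML schemes listed above can be sketched for a logistic model. The code below is a minimal, assumed implementation of Fisher scoring / iteratively reweighted least squares (for the logit link the two coincide); the variable names, tolerance, and synthetic data are our own choices, not the study's code.

```python
import numpy as np

def irls_logistic(X, y, tol=1e-6, max_iter=50):
    """Fisher scoring / IRLS for a logistic regression model."""
    beta = np.zeros(X.shape[1])               # initial estimate
    for _ in range(max_iter):
        eta = X @ beta                        # linear predictor
        mu = 1.0 / (1.0 + np.exp(-eta))       # inverse logit link
        w = mu * (1.0 - mu)                   # working weights w_i = mu_i(1 - mu_i)
        z = eta + (y - mu) / w                # working response
        Xtw = X.T * w                         # X'W with W diagonal
        beta_new = np.linalg.solve(Xtw @ X, Xtw @ z)   # WLS step
        if np.linalg.norm(beta_new - beta) < tol:      # convergence criterion
            return beta_new
        beta = beta_new
    return beta

# Tiny synthetic demonstration: recover coefficients from simulated data.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
true_beta = np.array([0.5, -1.0])
p = 1.0 / (1.0 + np.exp(-(X @ true_beta)))
y = (rng.random(200) < p).astype(float)
beta_hat = irls_logistic(X, y)   # should land near true_beta
```

Each pass solves a weighted least squares problem, which is why the method is called iteratively reweighted least squares.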
11. Non-Linear Regression Models
We describe non-linear regression models such as the Poisson and binomial
regression models.
Binomial Regression
It is assumed that p(x) depends on the regressor values x solely through a linear
combination β′x, where β is an unknown parameter vector. The pdf of the binomial
distribution is
P(Y = y) = C(n, y) p^y (1 − p)^(n−y), y = 0, 1, …, n,
where n is the number of trials and p is the probability of success.
Poisson Regression
When the data are to be treated as Poisson counts, the rate parameter λ is assumed
to depend on the regressors through the linear predictor β′x via the link function.
The pdf of the Poisson distribution is
P(Y = y) = e^(−λ) λ^y / y!, y = 0, 1, 2, ….
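The two pmfs above can be evaluated directly; the following is a small illustrative check using only the standard formulas, with parameter values invented for the example.

```python
import math

def binom_pmf(y, n, p):
    """P(Y = y) = C(n, y) p^y (1 - p)^(n - y)."""
    return math.comb(n, y) * p**y * (1.0 - p)**(n - y)

def poisson_pmf(y, lam):
    """P(Y = y) = e^{-lam} lam^y / y!."""
    return math.exp(-lam) * lam**y / math.factorial(y)

# Each pmf sums to 1 over its support (the Poisson sum is truncated
# far into the tail, where the remaining mass is negligible).
total_binom = sum(binom_pmf(y, 10, 0.3) for y in range(11))
total_pois = sum(poisson_pmf(y, 4.0) for y in range(100))
```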
12. Method for detecting multicollinearity
Multicollinearity
The term multicollinearity was first introduced by Frisch (1934). The full rank
condition in a multiple linear regression model denotes the independence of the
regressors. If this assumption does not hold, multicollinearity becomes a problem
and the least squares estimation breaks down. Multicollinearity is a high degree of
correlation among several independent variables.
There are many criteria for the detection of multicollinearity in GLMs; two of
them are described in the following section.
Condition number
In mathematical analysis, a function's condition number (CN) indicates how much
a small change in the input argument can change the function's output. A high CN
value indicates that multicollinearity is an issue. The CN has the following
mathematical definition:
CN = λmax / λmin,
where λmax and λmin are the largest and smallest eigenvalues. If CN > 30, then
there exists a multicollinearity problem.
Variance inflation factor
One method for the detection of multicollinearity is to calculate the variance
inflation factor (VIF) for each explanatory variable,
VIFj = 1 / (1 − Rj²),
where Rj² is the coefficient of determination from regressing the j-th explanatory
variable on the remaining ones. The range of the VIF signifies the level of
multicollinearity: a VIF of 1 indicates no collinearity, which is considered
negligible; values between 1 and 5 indicate a moderate, medium level of
collinearity; and values greater than 5 indicate high collinearity.
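Both diagnostics can be computed with a few lines of linear algebra. The sketch below uses a deliberately collinear synthetic design; all data here are illustrative assumptions, and only the CN > 30 rule and the VIF bands quoted above come from the text.

```python
import numpy as np

# Synthetic design: x2 is nearly a copy of x1, so collinearity is built in.
rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.05, size=100)
x3 = rng.normal(size=100)
X = np.column_stack([x1, x2, x3])

# Condition number: ratio of extreme eigenvalues of the standardized X'X.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
eig = np.linalg.eigvalsh(Xs.T @ Xs)
cn = eig.max() / eig.min()

def vif(X, j):
    """VIF_j = 1 / (1 - R_j^2), regressing column j on the other columns."""
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])
    resid = X[:, j] - A @ np.linalg.lstsq(A, X[:, j], rcond=None)[0]
    r2 = 1.0 - resid.var() / X[:, j].var()
    return 1.0 / (1.0 - r2)

vifs = [vif(X, j) for j in range(X.shape[1])]
```

On this design the CN is far above 30 and the VIFs of the two near-duplicate columns far exceed 5, while the independent column stays near 1.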
13. Example: Apple juice data
4.1. Numerical Example 1: Binomial Regression (Apple Juice data)
• In this section we illustrate the use of the maximum likelihood (ML),
principal component regression (PCR) and ridge estimators on a real-life
data set. To fit a binomial regression model we considered the apple juice
data used by Pena et al. (2011) and Özkale (2016). There are four explanatory
variables in this data set: pH (x1), nisin concentration (IU/ml) (x2), incubation
temperature (°C) (x3), and soluble solids concentration (°Brix) (x4). The
response variable is the growth of Alicyclobacillus acidoterrestris in apple
juice, where 1 indicates growth and 0 indicates no growth. Prior to calculating
the results, we standardized the independent variables and subsequently
incorporated the intercept term into the model. The logistic regression model
is then
log(pi / (1 − pi)) = β0 + β1 xi1 + β2 xi2 + β3 xi3 + β4 xi4,
• where xij denotes the i-th observation of the j-th explanatory variable and pi
is the probability of growth. The eigenvalues of the weighted cross-product
matrix X′ŴX are obtained as λ1 = 4.2143, λ2 = 0.1774, λ3 = 0.1145, λ4 =
0.0718 and λ5 = 0.0303.
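The condition number implied by these eigenvalues can be checked directly; only the five eigenvalues themselves are taken from the example, and the code is a simple illustrative check.

```python
# Eigenvalues reported for the apple juice example.
eigenvalues = [4.2143, 0.1774, 0.1145, 0.0718, 0.0303]

# Condition number: ratio of largest to smallest eigenvalue.
cn = max(eigenvalues) / min(eigenvalues)
print(round(cn, 2))   # about 139.09, well above the CN > 30 threshold
```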
14. Continue
• The condition number is computed as CN = λmax/λmin = 4.2143/0.0303 ≈ 139.09. As the value
of the condition number is very large, it indicates that there is a multicollinearity issue within
this dataset. First, we obtain the ML estimator by an iterative procedure. For iterative ML
algorithms, we generally choose a small tolerance ε (sufficiently close to zero) as the
convergence criterion: the iteration ends if the norm of the difference in the parameter estimates
between iterations is less than ε, i.e. ||β̂(m+1) − β̂(m)|| < ε, where m represents the iteration step.
The ordinary least squares (OLS) estimator is considered as the initial estimate; the initial
working response variable is defined as ẑ = Xβ̂(0) + Ŵ⁻¹(y − μ̂) and the initial weight matrix is
computed as Ŵ = diag(π̂i(1 − π̂i)). The ridge regression parameter is computed following
Abbasi and Özkale (2021), which gives k1 = 8.0874; another value of k is also selected
randomly. For choosing the number of principal components (PCs) we used the percentage of
total variation (PTV) method, defined as
PTV(r) = (λ1 + λ2 + … + λr) / (λ1 + λ2 + … + λp+1),
• where r denotes the number of PCs retained in the model. In this example the number of PCs
retained in the model is r = 2.
• Table 4.1 shows the results of the iteratively obtained estimators along with their SMSE values.
It is seen that the ridge estimator attains the smallest SMSE, compared to the ML and PCR
estimators, for k1. PCR has the next smallest SMSE value, while the SMSE value of the ML
estimator is the largest, which shows that multicollinearity affects the performance of the ML
estimator. The results show that the performance of the ridge estimator is the best among the
estimators in countering the multicollinearity problem for k1. However, for k2 the PCR
estimator has a smaller SMSE value than the ridge estimator. Thus, the ridge estimator performs
better for large values of k, whereas for small k values PCR performs better than the ridge
estimator. Table 4.1 shows the results only for two values of k, while Figure 4.1 assesses the
performance of the estimators for the remaining values of k. From Figure 4.1 it is clear that
when the k values fall below approximately 0.14, the PCR estimator performs better than the
ridge estimator; when k is greater than approximately 0.14, the ridge estimator outperforms its
counterparts.
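A single ridge step and the PTV rule described above can be sketched as follows. The working response and weights are synthetic placeholders standing in for one IRLS iteration; k1 = 8.0874 is the only value taken from the text, and the 90% PTV cut-off is an assumed illustration.

```python
import numpy as np

# Synthetic stand-ins for one IRLS iteration (not the apple-juice data).
rng = np.random.default_rng(2)
n, p = 50, 5
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
W = np.diag(rng.uniform(0.1, 0.25, size=n))   # working weights, e.g. mu(1 - mu)
z = rng.normal(size=n)                        # working response from IRLS

# Ridge step: beta_ridge = (X'WX + kI)^{-1} X'Wz, with k from the text.
k = 8.0874
XtWX = X.T @ W @ X
beta_ridge = np.linalg.solve(XtWX + k * np.eye(p), X.T @ W @ z)

# PTV rule: retain the smallest r whose leading eigenvalues explain,
# say, 90% of the total variation (the 90% threshold is our assumption).
lam = np.sort(np.linalg.eigvalsh(XtWX))[::-1]
ptv = np.cumsum(lam) / lam.sum()
r = int(np.searchsorted(ptv, 0.90) + 1)
```

Adding kI to X′ŴX before solving is what stabilizes the system when the eigenvalues are nearly zero, at the cost of the bias the study measures through the SMSE.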
15. Table 4.1 Iteratively obtained estimators and their SMSE values for k1 and k2, binomial
response (Apple Juice data)

Coefficient        ML      Ridge (k1 = 8.0874)      PCR      Ridge (k2)
β0             -1.3159         -0.3187            38.5127     -1.2817
β1              8.9941          0.1184            -7.1565      8.7376
β2            -10.7939         -0.1598            -9.0721    -10.4668
β3              6.1903          0.0671            -3.1447      5.9583
β4             -5.8053         -0.0911            -6.2584     -5.6525
SMSE           61.4822          4.2419            10.0057     58.8241
16. Figure 4.1: SMSE values of the estimators for different k values
17. Table 4.2 Iteratively obtained estimators and their SMSE values for k1 and k2

Coefficient        ML      Ridge (k1 = 0.07776)     PCR      Ridge (k2 = 0.255)
β0              0.1262          0.1450            0.9250       0.1811
β1              1.5576          1.5226            0.1673       1.4448
β2              2.6709          2.5805            0.3033       2.4004
β3             -1.4157         -1.3522           -0.1210      -1.2281
β4              3.8847          3.0314           25.8305       2.5819
SMSE            0.1262          0.1450            0.9250       0.1811
18. Figure 4.2: SMSE values of the estimators for different k values
25. Summary and Conclusions
This study aims to give a practical approach to dealing with the problem of
multicollinearity, specifically in generalized linear models, where the response
variable may not be normally distributed. For the detection of this problem, two
methods were discussed in this study, the condition number (CN) and the variance
inflation factor (VIF), which suggest the level of multicollinearity. This problem
can be overcome by using biased estimation methods; the methods considered here
are the ridge and PCR estimators, which are compared with ML estimation.
The study includes two non-linear regression models for the estimation of
parameters, the Poisson and binomial regression models. For ML estimation the
iteratively reweighted least squares technique is used, with two iterative
procedures, Newton-Raphson and Fisher's scoring, used for estimation.
26. Continue
A Monte Carlo simulation experiment was also used in the study, for binomial and
Poisson responses with different sample sizes and different numbers of
independent variables. The performance evaluation criterion of this study is the
expected mean square error (EMSE).
The results show that the ridge estimator obtains the smallest SMSE compared to
the ML and PCR estimators, for the numerical examples as well as for the
simulation studies. It is concluded that the ridge estimator is the best among the
three for large values of k, while for smaller values of k PCR performs better.
27. REFERENCES
Abbasi, A., & Özkale, M. R. (2021). The r-k class estimator in generalized linear models
applicable with simulation and empirical study using a Poisson and Gamma
responses. Hacettepe Journal of Mathematics and Statistics, 50(2), 594-611.
Abdulkabir, M., Edem, U., Tunde, R., & Kemi, B. (2015). An empirical study of
generalized linear model for count data. Journal of Applied and Computational
Mathematics, 4, 253.
Agresti, A. (2015). Foundations of linear and generalized linear models. John Wiley
& Sons.
Akay, K. U., & Ertan, E. (2022). A new improved Liu-type estimator for Poisson
regression models. Hacettepe Journal of Mathematics and Statistics, 1-20.
Le Cessie, S., & Van Houwelingen, J. C. (1992). Ridge estimators in logistic
regression. Journal of the Royal Statistical Society Series C: Applied Statistics, 41(1),
191-201.
Ertan, E., & Akay, K. U. (2022). A new Liu-type estimator in binary logistic
regression models. Communications in Statistics-Theory and Methods, 51(13), 4370-
4394.
Fox, J. (2015). Applied regression analysis and generalized linear models.
Sage Publications.
Hoerl, A. E., & Kennard, R. W. (1970). Ridge regression: Biased
estimation for nonorthogonal problems. Technometrics, 12(1), 55-
67.
Hubert, M. H., & Wijekoon, P. (2006). Improvement of Liu estimator in
linear regression model. Statistical Papers, 47(3), 471-479.
Hussein, S. M., & Yousaf, H. M. (2015). A comparison among some
biased estimators in generalized linear regression model in presence
of multicollinearity. Al-Qadisiyah Journal for Administrative and
Economic Sciences, 17(2).
Kurtoglu, F., & Özkale, M. R. (2016). Liu estimation in generalized linear
models: application on gamma distributed response variable.
Statistical Papers, 57(4), 911-928.
Mackinnon, M. J., & Puterman, M. L. (1989). Collinearity in generalized linear models.
Communications in Statistics - Theory and Methods, 18(9), 3463-3472.
McCullagh, P., & Nelder, J. A. (1989). Generalized linear models (2nd ed.). Chapman &
Hall/CRC Monographs on Statistics & Applied Probability.
McDonald, G. C., & Galarneau, D. I. (1975). A Monte Carlo evaluation of some ridge-
type estimators. Journal of the American Statistical Association, 70(350), 407-
416.
Nelder, J. A., & Wedderburn, R. W. (1972). Generalized linear models. Journal of the
Royal Statistical Society: Series A (General), 135(3), 370-384.
Sellers, K. F., & Shmueli, G. (2010). A flexible regression model for count data. The
Annals of Applied Statistics, 943-961.
Smith, E. P., & Marx, B. D. (1990). Ill-conditioned information matrices, generalized
linear models and estimation of the effects of acid rain. Environmetrics,
1(1), 57-71.
Weissfeld, L. A., & Sereika, S. M. (1991). A multicollinearity diagnostic for generalized
linear models. Communications in Statistics - Theory and Methods, 20, 1183-1198.