Advanced Methods of Statistical Analysis used in Animal Breeding.
DATA
Data collection: gathering or obtaining the desired information under study.
• Primary source
  – Data are collected by the researcher personally.
  – Can be collected through experiments, surveys, questionnaires, interviews, and observations.
• Secondary source
  – Comes from resources that have already been published.
  – Data collected, compiled, or written by other researchers, e.g. books, journals, newspapers.
  – Any reference must be acknowledged.
Variables: any measurable characteristic or quantity that can assume a range of numerical values within certain limits, e.g. income, height, age, weight, price.
• Discrete: a discrete variable may assume only a countable number of values; intermediate values are not meaningful.
  – Examples: mastitis status, disease incidence
• Continuous: a continuous variable may assume any real value within some range, whether fractional or integral.
  – Examples: milk yield, fat yield, body weight
A mathematical model is an equation or a set of equations that represents the behavior of a system.
• Linear model: the parameters enter the model linearly, so a unit change in each regressor term causes a proportionate change in the dependent variable.
• $Y = \beta_0 + \beta_1 X + \beta_2 X^2 + e$ (linear in $\beta_0, \beta_1, \beta_2$, although quadratic in $X$)
• A linear model spells out exactly which effects act on which observation; the different effects (such as breed and feeding regime) are estimated simultaneously and are thereby corrected for each other.
• Linear models are the most common type of statistical model used in animal breeding to predict breeding values from phenotypic observations.
• Non-linear model: if one or more parameters of a model do not appear linearly, the model is known as a nonlinear model.
• $Y = a X^{b} \exp(-cX) + e$
3/11/2016
MODEL
The model usually consists of factors.
• Discrete factors or class variables, such as sex, year, herd
• Continuous factors or covariables, such as age
A model contains 3 components:
• Predictand: the dependent variable
• Predictor: the independent variable
• Error term
Types of Analysis :
(A) Univariate analysis: one variable is used to describe each individual, e.g. animals are grouped on the basis of a single performance trait.
(B) Bivariate analysis: two variables are used.
(C) Multivariate analysis: more than two variables are used.
Univariate Analysis :
1. Linear regression model
2. Least squares model
a. Random effect model: heritability and repeatability estimation
b. Fixed effect model
c. Mixed effect model: BLUP
Linear Regression Model :
$Y_i = \beta_0 + \beta_1 X_i + e_i$
where $Y_i$ = dependent variable
$X_i$ = independent variable
$\beta_0$ and $\beta_1$ = intercept and regression coefficient; they are fixed but unknown, so they cannot be found exactly and must be estimated
$e_i$ = error term
The regression coefficients are estimated by the principle of least squares.
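As a minimal sketch of how the regression coefficients could be estimated by least squares, the snippet below fits the simple model with the closed-form formulas. The data, the variable names, and the function `fit_simple_regression` are invented for illustration and do not come from the slides.

```python
# Simple linear regression Y_i = b0 + b1*X_i + e_i fitted by least squares.
# All numbers are invented purely for illustration.

def fit_simple_regression(x, y):
    """Return (b0, b1) that minimise the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx                 # regression coefficient (slope)
    b0 = mean_y - b1 * mean_x      # intercept
    return b0, b1

age_months = [24.0, 30.0, 36.0, 42.0, 48.0]              # hypothetical X
milk_yield = [1810.0, 1940.0, 2110.0, 2240.0, 2400.0]    # hypothetical Y

b0, b1 = fit_simple_regression(age_months, milk_yield)
residuals = [y - (b0 + b1 * x) for x, y in zip(age_months, milk_yield)]
```

At the least-squares solution the residuals sum to zero and are uncorrelated with X, which is a convenient check on any fit.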
Random Effects
• Effects whose levels are considered to be drawn from an infinitely large population of levels.
• Animal effects are often random.
• In repeated experiments, other animals may be drawn from the population, e.g. sire and dam effects.
Fixed Effects
• Effects for which the defined classes comprise all the possible levels of interest, e.g. herd, season, and year effects.
• Effects can be considered fixed when the number of levels is relatively small and remains confined to this number under repeated sampling.
Predictors: used for prediction of random effects.
Estimators: used for estimation of fixed effects.
Principle of Least Squares Analysis :
The sum of squared differences between the observed and the estimated (predicted) values of the dependent variable must be as small as possible.
If $\hat{Y}_i = b_0 + b_1 X_i$, then $b_0$ and $b_1$ are chosen so that
$$\sum_i (Y_i - \hat{Y}_i)^2 = \sum_i \hat{e}_i^2 \rightarrow \min$$
Random Effect Model :
$Y_{ij} = \mu + S_i + e_{ij}$
where $Y_{ij}$ = jth observation of the ith sire
$\mu$ = general mean effect
$S_i$ = effect of the ith sire
$e_{ij}$ = error term
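The sire model above is the one typically used for heritability estimation. A minimal sketch, assuming balanced data (equal progeny numbers per sire) and paternal half-sib families so that heritability is four times the intraclass correlation: the records and the function name `sire_variance_components` are invented for illustration.

```python
# One-way random (sire) model Y_ij = mu + S_i + e_ij with balanced data.
# Records are invented; a real analysis would need far more sires.

def sire_variance_components(groups):
    """ANOVA estimates (sigma2_s, sigma2_e) for equal-sized progeny groups."""
    s = len(groups)              # number of sires
    n = len(groups[0])           # progeny per sire (assumed equal)
    grand_mean = sum(sum(g) for g in groups) / (s * n)
    sire_means = [sum(g) / n for g in groups]
    ss_between = n * sum((m - grand_mean) ** 2 for m in sire_means)
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, sire_means) for y in g)
    ms_between = ss_between / (s - 1)
    ms_within = ss_within / (s * (n - 1))
    sigma2_e = ms_within                        # E(MS within) = sigma2_e
    sigma2_s = (ms_between - ms_within) / n     # E(MS between) = sigma2_e + n*sigma2_s
    return sigma2_s, sigma2_e

progeny_records = [
    [10.5, 11.5, 11.5, 12.5],   # sire 1
    [11.5, 12.5, 12.5, 13.5],   # sire 2
    [11.0, 12.0, 12.0, 13.0],   # sire 3
]
sigma2_s, sigma2_e = sire_variance_components(progeny_records)
intraclass_t = sigma2_s / (sigma2_s + sigma2_e)
heritability = 4 * intraclass_t   # paternal half-sib assumption
```

With only three sires the estimate is of course very imprecise; the example only shows the arithmetic.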
Fixed Effect Model :
$Y_{ij} = \mu + F_i + e_{ij}$
where $Y_{ij}$ = jth observation of the ith herd
$\mu$ = general mean effect
$F_i$ = effect of the ith herd
$e_{ij}$ = error term
Mixed Effect Models
• 'Mixed models' contain both kinds of effect: fixed effects and breeding values (treated as 'random effects') are estimated jointly.
$Y_{ijk} = \mu + F_i + S_j + e_{ijk}$
where $Y_{ijk}$ = kth observation of the jth sire in the ith farm
$\mu$ = general mean effect
$F_i$ = effect of the ith farm
$S_j$ = effect of the jth sire
$e_{ijk}$ = error term
Interaction effect :
$Y_{ijk} = \mu + F_i + S_j + (FS)_{ij} + e_{ijk}$
where $(FS)_{ij}$ = interaction between the ith farm and the jth sire
BLUP
BLUP (Best Linear Unbiased Prediction) of breeding values is based on a linear mixed model.
• Best: the predictions have minimum mean squared error within the class of linear unbiased estimators; that is, the best prediction is the one that minimises the prediction error.
• Linear: BLUP estimates of the realized values of the random variables u are linear functions of the data, y.
• Unbiased: the average value of the estimate is equal to the average value of the quantity being estimated.
• Predictors: so called to distinguish them from estimators of fixed effects.
(G.K. Robinson, 1991)
• Best: maximizes the correlation between the true and estimated values of effects by minimizing the error variance.
• Linear: the estimates of the required factors are linear functions of the observations.
• Unbiased: estimates of fixed effects and estimable functions are such that $E(T) = \theta$.
3 different kinds of BLUP (Henderson, 1973):
• Henderson model-1: only random effects
• Henderson model-2: both random and fixed effects
• Henderson model-3: random + fixed + interaction effects
The general linear mixed model equation in matrix form is
$y = X\beta + Zu + e$
where
y is an n × 1 vector of n observed records
X is a known incidence matrix of order n × p, which relates the records in y to the fixed effects in β
β is a p × 1 vector of p levels of fixed effects (to be estimated)
Z is a known incidence matrix of order n × q, which relates the records in y to the random effects in u
u is a q × 1 vector of q levels of random effects, such as individual genetic values (to be predicted)
e is an n × 1 vector of random residual terms
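To make the incidence matrices concrete, here is a small sketch that builds X and Z from a list of records. The herd and sire labels, the yields, and the helper `incidence_matrix` are hypothetical; herd is treated as the fixed effect and sire as the random effect purely as an assumption for the example.

```python
import numpy as np

# Hypothetical records: (herd, sire, milk yield). All values are invented.
records = [
    ("H1", "S1", 4.5),
    ("H1", "S2", 5.0),
    ("H2", "S1", 3.9),
    ("H2", "S3", 4.8),
    ("H2", "S2", 5.2),
]

def incidence_matrix(labels):
    """One column per distinct level; a 1 marks the level of each record."""
    levels = sorted(set(labels))
    matrix = np.zeros((len(labels), len(levels)))
    for row, label in enumerate(labels):
        matrix[row, levels.index(label)] = 1.0
    return matrix, levels

y = np.array([r[2] for r in records])                        # n observed records
X, herd_levels = incidence_matrix([r[0] for r in records])   # n x p, fixed effects
Z, sire_levels = incidence_matrix([r[1] for r in records])   # n x q, random effects
```

Each row of X and Z contains exactly one 1, because every record belongs to one herd and one sire.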
Expectations and Variance-Covariance (VCV) Matrices
In general, assuming $E(u) = 0$ and $E(e) = 0$, the expectation of y is
$$E(y) = X\beta$$
which is also known as the 1st moment. The 2nd moments describe the variance-covariance structure of y:
$$\mathrm{Var}(u) = G, \quad \mathrm{Var}(e) = R, \quad \mathrm{Cov}(u, e) = 0$$
• G is the dispersion matrix for the random effects other than errors.
• R is the dispersion matrix of the error terms.
Both are general square matrices assumed to be non-singular and positive definite, with elements that are assumed known.
We usually write $V = \mathrm{Var}(y) = ZGZ^T + R$.
Estimating Fixed Effects and Predicting Random Effects :
• For a mixed model, y, X, and Z are observed; β, u, R, and G are generally unknown.
Two complementary estimation issues:
• Estimation of β and u
  – Estimation of fixed effects: BLUE = Best Linear Unbiased Estimator
    $$\hat{\beta} = (X^T V^{-1} X)^{-1} X^T V^{-1} y$$
  – Prediction of random effects: BLUP = Best Linear Unbiased Predictor
    $$\hat{u} = G Z^T V^{-1} (y - X\hat{\beta})$$
• Estimation of the variance components in G and R
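The BLUE and BLUP formulas can be evaluated directly once G and R are given. The sketch below does this on invented toy data, assuming unrelated sires and known variance components (both are assumptions made only for the example). As a cross-check, it also solves Henderson's mixed model equations, which are not shown on the slides but yield the same solutions without inverting V.

```python
import numpy as np

# Toy data, invented for illustration: 5 records, 2 herds (fixed), 3 sires (random).
y = np.array([4.5, 5.0, 3.9, 4.8, 5.2])
X = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
Z = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0]], dtype=float)

sigma2_s, sigma2_e = 0.1, 0.9        # assumed known variance components
G = sigma2_s * np.eye(3)             # sires assumed unrelated
R = sigma2_e * np.eye(5)
V = Z @ G @ Z.T + R

# BLUE of the fixed effects and BLUP of the random effects.
V_inv = np.linalg.inv(V)
beta_hat = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)
u_hat = G @ Z.T @ V_inv @ (y - X @ beta_hat)

# Cross-check with the mixed model equations.
R_inv, G_inv = np.linalg.inv(R), np.linalg.inv(G)
lhs = np.block([[X.T @ R_inv @ X, X.T @ R_inv @ Z],
                [Z.T @ R_inv @ X, Z.T @ R_inv @ Z + G_inv]])
rhs = np.concatenate([X.T @ R_inv @ y, Z.T @ R_inv @ y])
mme_solution = np.linalg.solve(lhs, rhs)
```

The first two elements of `mme_solution` correspond to `beta_hat` and the remaining three to `u_hat`.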
• BLUP eliminates non-genetic biases in estimating breeding values.
• It also removes genetic biases by taking into account the effects of non-random mating, the genetic merit of dams, and selection.
Advantages:
– Handles unbalanced designs
– Uses information from all measured relatives to improve estimates
BLUP can be used to estimate a variety of genetic values:
– GCA, SCA, line values (i.e., genotypic values of pure lines)
– One can also use BLUP to estimate environmental effects and G x E
REML Variance Component Estimation
REML = Restricted Maximum Likelihood.
Standard ML variance estimation treats the fixed effects as if they were known without error.
• REML is an approach that produces unbiased estimators in certain special cases and less biased estimates than ML in general.
• Depending on whom you ask, REML stands for Residual Maximum Likelihood or Restricted Maximum Likelihood.
• REML estimates variance components from the residuals obtained after fitting the fixed-effects part of the model by ordinary least squares.
• It maximizes a marginal likelihood function.
• It is therefore also called Residual Maximum Likelihood or Marginal Maximum Likelihood.
• For linear mixed effects models with balanced data, the REML estimators of variance components produce the same estimates as the unbiased ANOVA-based estimators (formed by taking appropriate linear combinations of mean squares), provided the latter are positive.
ESTIMATION
Before getting onto iterative algorithms, it is helpful to review the difference between the log-likelihood function $l$ used to calculate maximum likelihood estimates and the function $l_2$ used for REML:
• The term $\log|X^T H^{-1} X|$ in $l_2$ makes the adjustment for the degrees of freedom used in estimating treatment effects, so that REML estimates of variance components are less biased than ML estimates.
• The other major differences are:
  – $l_2$ is not a function of the fixed effects $\tau$
  – The constant in $l_2$ is a function of the fixed design matrix X
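The two functions themselves did not survive in the slide text. For orientation, the standard forms for a Gaussian linear mixed model are sketched here, assuming that $H$ denotes the variance matrix of $y$, $\theta$ the variance parameters, and $c(X)$ a constant that depends only on the design matrix; these symbols are assumptions, not taken from the slides.

```latex
l(\tau, \theta; y) = -\tfrac{1}{2}\left[\, n\log(2\pi) + \log|H|
      + (y - X\tau)^{T} H^{-1} (y - X\tau) \,\right]

l_2(\theta; y) = c(X) - \tfrac{1}{2}\left[\, \log|H| + \log|X^{T} H^{-1} X|
      + y^{T} P y \,\right],
\qquad P = H^{-1} - H^{-1} X (X^{T} H^{-1} X)^{-1} X^{T} H^{-1}
```

Written this way, the three differences listed above are visible directly: the extra $\log|X^T H^{-1} X|$ term, the absence of $\tau$ from $l_2$, and the dependence of the constant on X.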
ML vs REML
• ML estimates of the variance components are biased because no account is taken of the degrees of freedom lost in estimating the fixed effects.
• REML corrects this bias and also avoids negative estimates of variance components.
(Searle et al., 1992)
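The degrees-of-freedom point is easiest to see in the simplest case. A minimal sketch, assuming a model with only fixed herd effects and independent residuals: there the ML estimate of the residual variance divides the residual sum of squares by n, whereas the REML estimate divides by n − p. The records are invented for illustration.

```python
# Residual variance in a model with only fixed herd effects (invented data).
# ML divides the residual sum of squares by n; REML divides by n - p,
# where p is the number of fixed-effect levels that were estimated.

herd_records = {
    "H1": [4.5, 5.0, 4.7],
    "H2": [3.9, 4.8, 5.2, 4.1],
}

n = sum(len(values) for values in herd_records.values())
p = len(herd_records)

rss = 0.0
for values in herd_records.values():
    herd_mean = sum(values) / len(values)     # least-squares herd effect
    rss += sum((y - herd_mean) ** 2 for y in values)

sigma2_ml = rss / n            # biased downward
sigma2_reml = rss / (n - p)    # unbiased in this simple case
```

The ML estimate is always the smaller of the two here, and the gap grows as p approaches n.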
Genetic Evaluation
REML and BLUP applied to multi-trait mixed models have become the standard method for genetic evaluation in all terrestrial animal species.
The main benefits of using this methodology include:
(1) increasing the accuracy of selection;
(2) managing the accumulation of inbreeding;
(3) estimating genetic trend without a control;
(4) the possibility of conducting large-scale genetic evaluation across populations.
(N.H. Nguyen and R.W. Ponzoni, Vol. 29 No. 3 & 4, Jul-Dec 2006)
Bivariate Analysis :
• Tests of statistical significance
• Chi-square
Multivariate analysis consists of a collection of methods
that can be used when several measurements are made
on each individual or object in one or more samples.
Types :
1. Multivariate Regression
2. Discriminant Analysis
3. Principal Component Analysis
4. Genetic Divergence Analysis
5. Canonical Variate Analysis
1. Multivariate Regression :
$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_n X_{in} + e_i$$
$$Y_i = \beta_0 + \sum_{j=1}^{n} \beta_j X_{ij} + e_i$$
Used when the number of populations is more than one and each animal in the population has multiple characters.
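A regression on several independent variables can be fitted with a general least-squares solver. The sketch below uses NumPy's `numpy.linalg.lstsq`; the two regressors (age and body weight) and all numbers are invented for illustration.

```python
import numpy as np

# Invented data: milk yield regressed on age (months) and body weight (kg).
age = np.array([24.0, 30.0, 36.0, 42.0, 48.0, 54.0])
weight = np.array([350.0, 380.0, 365.0, 410.0, 400.0, 430.0])
milk = np.array([1810.0, 1940.0, 2010.0, 2240.0, 2300.0, 2460.0])

# Design matrix with a leading column of ones for the intercept beta_0.
design = np.column_stack([np.ones_like(age), age, weight])
coefficients, _, rank, _ = np.linalg.lstsq(design, milk, rcond=None)

fitted = design @ coefficients
residuals = milk - fitted
```

`coefficients` holds the estimates of β0, β1, and β2 in that order.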
2. Discriminant Analysis :
• Purpose: to find the discriminant function (D) that best separates populations by minimising the variances within populations and maximising the mean differences between populations with respect to the characters.
• $D = \lambda_1 d_1 + \lambda_2 d_2 + \lambda_3 d_3 = \sum_i \lambda_i d_i$
where $d_i$ = mean difference between the populations for the ith character,
$\lambda_i$ = weighting coefficient attached to that difference.
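The slides do not say how the weighting coefficients are obtained. One standard choice, assumed here, is Fisher's linear discriminant, in which the weights solve the pooled within-population covariance system S λ = d. The two populations and their measurements are invented.

```python
import numpy as np

# Two hypothetical populations, three characters per animal (invented data).
pop_a = np.array([[10.0, 4.0, 2.0], [11.0, 5.0, 2.5],
                  [12.0, 4.5, 3.0], [11.5, 5.5, 2.2]])
pop_b = np.array([[13.0, 6.0, 3.5], [14.0, 6.5, 3.0],
                  [15.0, 7.5, 4.0], [14.5, 6.2, 3.8]])

d = pop_a.mean(axis=0) - pop_b.mean(axis=0)     # mean differences d_i

# Pooled within-population covariance matrix.
n_a, n_b = len(pop_a), len(pop_b)
pooled = ((n_a - 1) * np.cov(pop_a, rowvar=False)
          + (n_b - 1) * np.cov(pop_b, rowvar=False)) / (n_a + n_b - 2)

lam = np.linalg.solve(pooled, d)                # weighting coefficients lambda_i
D = float(lam @ d)                              # D = sum(lambda_i * d_i)

# Discriminant scores of individual animals.
scores_a = pop_a @ lam
scores_b = pop_b @ lam
```

The difference between the mean scores of the two populations equals D, which is why a larger D indicates better separation.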
3. Principal Component (Z) Analysis :
Linear combinations of the characters are formed so as to maximise the variance accounted for in the original set of characters.
• $Z_1 = a_1 x_1 + a_2 x_2 + a_3 x_3 + a_4 x_4$
• $Z_2 = b_1 x_1 + b_2 x_2 + b_3 x_3 + b_4 x_4$
• $Z_3 = c_1 x_1 + c_2 x_2 + c_3 x_3 + c_4 x_4$
• $Z_4 = d_1 x_1 + d_2 x_2 + d_3 x_3 + d_4 x_4$
$a_i, b_i, c_i, d_i$ are the relative weighting factors attached to each character, with
$$\sum a_i^2 = \sum b_i^2 = \sum c_i^2 = \sum d_i^2 = 1$$
• It is mainly confined to a single population.
• The principal component with the highest variance is the 1st principal component.
• The principal component with the next highest variance is the 2nd principal component.
In India, this analysis was first reported in large animals by Dhara and Chakravarty (1996), who predicted the breeding value for milk production on the basis of a selected number of principal components.
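A minimal sketch of how principal components could be computed, assuming they are taken from the eigen-decomposition of the covariance matrix of the characters (the slides do not state whether the covariance or the correlation matrix is used). The measurements on six animals are invented.

```python
import numpy as np

# Hypothetical measurements of four characters on six animals (invented data).
traits = np.array([
    [120.0, 55.0, 30.0, 4.1],
    [125.0, 58.0, 32.0, 4.4],
    [118.0, 54.0, 29.0, 3.9],
    [130.0, 60.0, 33.0, 4.8],
    [122.0, 57.0, 31.0, 4.2],
    [127.0, 59.0, 34.0, 4.5],
])

centered = traits - traits.mean(axis=0)
cov = np.cov(centered, rowvar=False)

# eigh returns eigenvalues in ascending order, so sort them descending.
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]
variances = eigenvalues[order]        # variances of Z1, Z2, Z3, Z4
weights = eigenvectors[:, order]      # column k holds the weights of Z_(k+1)

scores = centered @ weights           # principal component scores per animal
explained = variances / variances.sum()
```

Each column of `weights` has unit sum of squares, matching the constraint on the weighting factors, and `explained` gives the share of total variance carried by each component.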
4. Genetic Divergence Analysis / Genetic Distance Analysis / D² Analysis : (given by P. C. Mahalanobis, 1928)
D² is obtained by summing the squares of the differences between the same transformed or untransformed traits of two genotypes, taken in their various pairwise combinations.
$$D^2 = \sum_{i=1}^{p} d_i^2 = \sum_{i=1}^{p} (Y_i^{j} - Y_i^{k})^2$$
where i = trait index, running from 1 to p,
j, k = genotypes, j ≠ k
• D² follows a chi-square distribution with p degrees of freedom.
• The more critical the trait, the greater the number of distinctly different groups that develop.
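The sum-of-squares formula applies to traits that have been transformed to be uncorrelated. A sketch of one way to do this, assuming a common within-genotype covariance matrix and a Cholesky-based transformation (both assumptions for illustration), together with a check against the matrix form of D² on the untransformed means. All numbers are invented.

```python
import numpy as np

# Invented example: mean vectors of two genotypes for p = 3 traits and an
# assumed common within-genotype covariance matrix.
mean_j = np.array([12.0, 4.5, 30.0])
mean_k = np.array([10.0, 5.0, 27.0])
within_cov = np.array([
    [4.0, 0.6, 1.2],
    [0.6, 1.0, 0.3],
    [1.2, 0.3, 9.0],
])

# Transform to uncorrelated traits with unit variance (Cholesky whitening).
chol = np.linalg.cholesky(within_cov)        # within_cov = chol @ chol.T
y_j = np.linalg.solve(chol, mean_j)          # transformed means of genotype j
y_k = np.linalg.solve(chol, mean_k)          # transformed means of genotype k

d = y_j - y_k
d2_sum_of_squares = float(np.sum(d ** 2))    # D^2 = sum of d_i^2

# Same quantity from the matrix form on the untransformed means.
diff = mean_j - mean_k
d2_matrix_form = float(diff @ np.linalg.solve(within_cov, diff))
```

Both routes give the same D², which is why the transformed traits can simply be squared and summed.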
Few Notes on GD :
• Captures the variation between different animals within a population, so that many groups can be formed within the population.
• Describes the extent and range of variability within a population.
• Explains evolutionary divergence.
• Progeny testing programmes need GD to distinguish unrelated sires.
5. Canonical Variate Analysis
• Used in a single population.
• The traits of the animals are divided into two sets, and the relationship between the two sets is evaluated.
• 2 sets of characters:
  – Y set: response characters
  – X set: predictor characters
• Linear combinations are formed so that the Y set is maximally correlated with the X set:
• $M = p_1 y_1 + p_2 y_2$
• $N = q_1 x_1 + q_2 x_2 + q_3 x_3$
The canonical coefficients ($p_1, p_2$ and $q_1, q_2, q_3$) are chosen so that the correlation between the two canonical variates (M and N) is maximum; that correlation is called the canonical correlation.
In India, this analysis was first used on dairy buffaloes by Thomas and Chakravarty (1999).
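One common way to obtain the canonical coefficients, assumed here since the slides do not give a computing method, is to whiten each set of characters and take a singular value decomposition of the cross-covariance. The data are simulated with a fixed seed; the helper `inv_sqrt` is a hypothetical name.

```python
import numpy as np

# Simulated data: 3 predictor characters (X set) and 2 response characters (Y set).
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=(n, 3))
noise = rng.normal(size=(n, 2))
y = np.column_stack([x[:, 0] + 0.5 * noise[:, 0],
                     x[:, 1] - x[:, 2] + noise[:, 1]])

xc = x - x.mean(axis=0)
yc = y - y.mean(axis=0)
sxx = xc.T @ xc / (n - 1)
syy = yc.T @ yc / (n - 1)
sxy = xc.T @ yc / (n - 1)

def inv_sqrt(matrix):
    """Inverse symmetric square root of a positive definite matrix."""
    vals, vecs = np.linalg.eigh(matrix)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

# Singular values of the whitened cross-covariance are the canonical correlations.
k = inv_sqrt(sxx) @ sxy @ inv_sqrt(syy)
u, s, vt = np.linalg.svd(k)

q = inv_sqrt(sxx) @ u[:, 0]      # canonical coefficients q1, q2, q3
p = inv_sqrt(syy) @ vt[0]        # canonical coefficients p1, p2
N = xc @ q                       # first canonical variate of the X set
M = yc @ p                       # first canonical variate of the Y set
canonical_correlation = s[0]
```

The ordinary correlation between `M` and `N` equals `canonical_correlation`, the largest correlation attainable between any linear combinations of the two sets.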
Fundamentals of a Bayesian Analysis :
• Formulate a probability model for the data.
• Decide on a prior distribution, which quantifies the uncertainty in the values of the unknown model parameters before the data are observed.
• Observe the data and combine the likelihood with the prior distribution to obtain the posterior distribution.
• Summarize important features of the posterior distribution, or calculate quantities of interest based on the posterior distribution.
• These quantities constitute statistical outputs, such as point estimates and intervals.
Bayesian inference: a form of inference which regards parameters as random variables possessing prior distributions that reflect the accumulated state of knowledge.
Bayes estimation: the estimation of population parameters by the use of methods of inverse probability.
(A.L. Pretorius and A.J. van der Merwe, 2000)
Bayes' theorem is based on conditional probability.
Conditional probability :
The probability of event B occurring when it is known that some event A has already occurred is denoted by $P(B \mid A)$. When A and B are dependent events,
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$$
so that
$$P(A \cap B) = P(A)\,P(B \mid A)$$
or
$$P(A \cap B) = P(B)\,P(A \mid B)$$
If $B_1, B_2, \dots, B_K$ are mutually disjoint events with $P(B_i) \neq 0$ (i = 1, 2, ..., K), then for any arbitrary event A which is a subset of $\bigcup_{i=1}^{K} B_i$ such that $P(A) \neq 0$, we have
$$P(B_i \mid A) = \frac{P(B_i)\,P(A \mid B_i)}{\sum_{j=1}^{K} P(B_j)\,P(A \mid B_j)}$$
1. The probabilities $P(B_1), P(B_2), \dots, P(B_K)$ are called "a priori probabilities" because they exist before we get any information from the experiment itself.
2. The probabilities $P(A \mid B_i)$, i = 1, 2, ..., K, are called "likelihoods" because they indicate how likely the event A under consideration is to occur given each $B_i$.
3. The probabilities $P(B_i \mid A)$ are called "posterior probabilities" because they are determined after the results of the experiment are known.
Few Important Notes :
• The notions of "prior" and "posterior" in Bayes' theorem are relative to a given sample outcome. That is, if a posterior distribution has been determined from a particular sample, this posterior distribution would be considered the prior distribution relative to a new sample.
Prior → Posterior-1 → Posterior-2
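The prior-to-posterior chain can be sketched in a few lines. The example is hypothetical: a sire is either a carrier or a non-carrier of a recessive allele, and each unaffected progeny from a carrier dam is one piece of evidence. The function name `bayes_posterior` and the 0.5 prior are assumptions made for illustration.

```python
def bayes_posterior(priors, likelihoods):
    """Posterior P(B_i | A) from priors P(B_i) and likelihoods P(A | B_i)."""
    joint = [prior * lik for prior, lik in zip(priors, likelihoods)]
    evidence = sum(joint)                 # P(A), the denominator of the theorem
    return [j / evidence for j in joint]

# B1 = sire is a carrier, B2 = sire is a non-carrier (assumed equally likely).
priors = [0.5, 0.5]
# A = "a progeny out of a carrier dam is unaffected".
likelihoods = [0.75, 1.0]                 # P(A | carrier), P(A | non-carrier)

posterior_1 = bayes_posterior(priors, likelihoods)
# For a second unaffected progeny, the first posterior serves as the prior.
posterior_2 = bayes_posterior(posterior_1, likelihoods)
```

Each additional unaffected progeny lowers the probability that the sire is a carrier, and updating twice in sequence gives the same result as processing both progeny at once.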
References :
 Robin Thompson and Esa Mantysaari , Prospects for Statistical Methods
in Animal Breeding ,Jour. Ind. Soc. Ag. Statistics 57 (Special Volume),
2004: 15-25
 P Narain, Statistics And Its Application To Agriculture And Genetics ,
IARI,New Delhi.
 A.K.Chakravarty, Multivariate Analysis In Animal Breeding, NDRI,Karnal.
 Verbyla, A. P. (1990) A conditional derivation of residual maximum
likelihood. Australian Journal of Statistics, 32, 227-230.
 Henderson CR (1975) Best linear unbiased estimation and prediction
under a selection model. Biometrics 31:423–447.
 Henderson CR (1976) A simple method for the inverse of a numerator
relationship matrix used in prediction of breeding values. Biometrics
32:69–83
Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (2003) Bayesian Data
Analysis, 2nd ed. London, Chapman & Hall.
Carlin, B. P., and Louis, T. A. (2000) Bayes and Empirical Bayes Methods for
Data Analysis, 2nd ed. Boca Raton, Chapman & Hall.
 Duchateau, L., Janssen, P., & Rowlands,G.L., 1998. Linear Mixed Models. An
introduction with applications in veterinary research. ILRI, Nairobi, Kenya,
159-170.
3/11/2016 baradashankar.mohanty@gmail.com

More Related Content

PPTX
Improving accuracy by using information from relatives—The animal model
PPTX
Genetic Parameter.pptx
PDF
The science of genomics and livestock genetic improvement
PPT
Animal breeding and selection
PPTX
Traits of economic Importance of Cattle in Nepal
PPTX
L5 Principles of Farm Animal Management.pptx
Improving accuracy by using information from relatives—The animal model
Genetic Parameter.pptx
The science of genomics and livestock genetic improvement
Animal breeding and selection
Traits of economic Importance of Cattle in Nepal
L5 Principles of Farm Animal Management.pptx

What's hot (20)

PPTX
Correlations
PPTX
Multiple trait selection
PDF
Cattle Feeding Practices
PPTX
Genomic selection in Livestock
PPTX
Mating System and Livestock Breeding Policy
PPTX
Sire evaluation
PPTX
PPTX
Transition cow management
PPT
Dairy farming –process
PPTX
Selectionof dairy animals
PDF
Animal breeding
PDF
Methods Adopted to Assess Nutrient Requirement i Livestock
PPTX
Genomic Selection in Dairy Cattle
PDF
Conservation of farm animal genetic resources
PPTX
Outbreeding
PPTX
PPTX
Forces changing gene frequency
PPTX
Repeatability
PPTX
Progeny testing
Correlations
Multiple trait selection
Cattle Feeding Practices
Genomic selection in Livestock
Mating System and Livestock Breeding Policy
Sire evaluation
Transition cow management
Dairy farming –process
Selectionof dairy animals
Animal breeding
Methods Adopted to Assess Nutrient Requirement i Livestock
Genomic Selection in Dairy Cattle
Conservation of farm animal genetic resources
Outbreeding
Forces changing gene frequency
Repeatability
Progeny testing
Ad

Viewers also liked (20)

PDF
The role of theory in research
PPTX
The role of theory in research on the education and learning of adults
PPTX
Publishing Scientific Research and How to Write High-Impact Research Papers
PPT
Scientific method powerpoint
PDF
Communication for Horizon 2020 research projects
PPTX
Numerical and statistical methods new
PDF
Social Media for Research Communication
PPTX
Scientific Method and Models of Mass Communication Research: By Abid Zafar Ms...
PPTX
Types of research, b usiness research
PPT
Communication Research PPT
PPTX
Observation methods of data collection in behavioral science
PDF
How to do a Scientific research ?
PPTX
Fundamental of Communication Research
PPTX
Communication research ppt
PPTX
Scientific method procedures (Teach)
PDF
Visalia Public Opinion Survey PowerPoint
PPTX
Applied vs basic research - Research Methodology - Manu Melwin Joy
PPTX
The Scientific Method
PPTX
Scientific method Powerpoint
PPTX
Presentation on the characteristic of scientific research 1
The role of theory in research
The role of theory in research on the education and learning of adults
Publishing Scientific Research and How to Write High-Impact Research Papers
Scientific method powerpoint
Communication for Horizon 2020 research projects
Numerical and statistical methods new
Social Media for Research Communication
Scientific Method and Models of Mass Communication Research: By Abid Zafar Ms...
Types of research, b usiness research
Communication Research PPT
Observation methods of data collection in behavioral science
How to do a Scientific research ?
Fundamental of Communication Research
Communication research ppt
Scientific method procedures (Teach)
Visalia Public Opinion Survey PowerPoint
Applied vs basic research - Research Methodology - Manu Melwin Joy
The Scientific Method
Scientific method Powerpoint
Presentation on the characteristic of scientific research 1
Ad

Similar to Advanced Methods of Statistical Analysis used in Animal Breeding. (20)

PPT
Lecture-4 Advanced biostatistics BLUP.ppt
PDF
BlUP and BLUE- REML of linear mixed model
PDF
Mixed Model Analysis for Overdispersion
PPT
Mixed models
PPT
GLM ASFFAFSFSFSAASFASFAFAFAFAFAFAFSAFAFSFAFAFA
PDF
LMM, linear models with random effects, lecture 10
PDF
Subject-3---Bayesian-regression-models-2024.pdf
PDF
Modelo Generalizado
PDF
mix2.pdf
PPTX
A gentle introduction to growth curves using SPSS
PDF
Pittsburgh and Toronto "Halloween US trip" seminars
PPTX
Static Models of Continuous Variables
PPT
logit_probit.ppt
PDF
Overview of statistical tests: Data handling and data quality (Part II)
PDF
Panel Data Models
PDF
The bayesian revolution in genetics
PPTX
Math Exam Help
PDF
ABC in Venezia
PDF
ABC & Empirical Lkd
PDF
ABC and empirical likelihood
Lecture-4 Advanced biostatistics BLUP.ppt
BlUP and BLUE- REML of linear mixed model
Mixed Model Analysis for Overdispersion
Mixed models
GLM ASFFAFSFSFSAASFASFAFAFAFAFAFAFSAFAFSFAFAFA
LMM, linear models with random effects, lecture 10
Subject-3---Bayesian-regression-models-2024.pdf
Modelo Generalizado
mix2.pdf
A gentle introduction to growth curves using SPSS
Pittsburgh and Toronto "Halloween US trip" seminars
Static Models of Continuous Variables
logit_probit.ppt
Overview of statistical tests: Data handling and data quality (Part II)
Panel Data Models
The bayesian revolution in genetics
Math Exam Help
ABC in Venezia
ABC & Empirical Lkd
ABC and empirical likelihood

Recently uploaded (20)

PPTX
Microbial diseases, their pathogenesis and prophylaxis
PPTX
PPH.pptx obstetrics and gynecology in nursing
PPTX
Renaissance Architecture: A Journey from Faith to Humanism
PDF
STATICS OF THE RIGID BODIES Hibbelers.pdf
PDF
FourierSeries-QuestionsWithAnswers(Part-A).pdf
PDF
TR - Agricultural Crops Production NC III.pdf
PPTX
human mycosis Human fungal infections are called human mycosis..pptx
PDF
Saundersa Comprehensive Review for the NCLEX-RN Examination.pdf
PDF
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
PPTX
BOWEL ELIMINATION FACTORS AFFECTING AND TYPES
PPTX
Introduction_to_Human_Anatomy_and_Physiology_for_B.Pharm.pptx
PPTX
Cell Types and Its function , kingdom of life
PDF
Origin of periodic table-Mendeleev’s Periodic-Modern Periodic table
PPTX
The Healthy Child – Unit II | Child Health Nursing I | B.Sc Nursing 5th Semester
PDF
102 student loan defaulters named and shamed – Is someone you know on the list?
PPTX
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
PDF
Module 4: Burden of Disease Tutorial Slides S2 2025
PDF
Pre independence Education in Inndia.pdf
PDF
O5-L3 Freight Transport Ops (International) V1.pdf
PDF
Microbial disease of the cardiovascular and lymphatic systems
Microbial diseases, their pathogenesis and prophylaxis
PPH.pptx obstetrics and gynecology in nursing
Renaissance Architecture: A Journey from Faith to Humanism
STATICS OF THE RIGID BODIES Hibbelers.pdf
FourierSeries-QuestionsWithAnswers(Part-A).pdf
TR - Agricultural Crops Production NC III.pdf
human mycosis Human fungal infections are called human mycosis..pptx
Saundersa Comprehensive Review for the NCLEX-RN Examination.pdf
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
BOWEL ELIMINATION FACTORS AFFECTING AND TYPES
Introduction_to_Human_Anatomy_and_Physiology_for_B.Pharm.pptx
Cell Types and Its function , kingdom of life
Origin of periodic table-Mendeleev’s Periodic-Modern Periodic table
The Healthy Child – Unit II | Child Health Nursing I | B.Sc Nursing 5th Semester
102 student loan defaulters named and shamed – Is someone you know on the list?
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
Module 4: Burden of Disease Tutorial Slides S2 2025
Pre independence Education in Inndia.pdf
O5-L3 Freight Transport Ops (International) V1.pdf
Microbial disease of the cardiovascular and lymphatic systems

Advanced Methods of Statistical Analysis used in Animal Breeding.

  • 4. Gathering or obtaining the desired information under study.  Primary Source  Data is collected by researcher himself  can be collected by using experiments, surveys, questionnaires, interviews, and observations.  Secondary Source  comes from resources that have already been published.  Data collected, compiled or written by other researchers eg. books, journals, newspapers  Any reference must be acknowledged DATA
  • 5. Variables : Any measurable characteristics or quantity which can assume a range of numerical values within certain limits, i.e. income, Height, age, weight ,price etc. • A discrete variable may assume only a countable number of values: intermediate values are not meaningful. • Mastitis, Disease Discrete • A continuous variable may assume any real value within some range. Takes fractional or integral values. • Milk yield ,fat yield,Body wt etc Continuous
  • 7. A mathematical model is an equation or a set of equations which represents the behavior of a system. • Linear Model : Unit increase in independent variable cause a proportionate increase dependent variable. • Y= β0+ β1X + β2X2 + e  A linear model will exactly spell out which effects are affecting which observation and the different effects (such as breed and feeding regime) are estimated simultaneously and during this process they are corrected for each other.  Linear models are the most common type of statistical models used in animal breeding to predict breeding values based on phenotypic observations. • Non-linear Model : If one or all the parameter of a model are not appear • linearly, the model is known as nonlinear model. • Y= a Xb e(-cx) + e 3/11/2016 MODEL
  • 8. The model usually consists of factors. • Discrete factors or class variables such as sex, year, herd • Continuous factors or covariables such as age Model Contains 3 components :  Predictor-Dependent Variable  Predictant- Independent Variable  Error term Model
  • 9. Types of Analysis : (A)Univariate Analysis : we group the individuals on the basis of single performance. When we use one variable to describe a person, place, or thing. (B)Bivariate Analysis : When we use two variable . (C)Multivariate Analysis : When we use more than two variables. 3/11/2016
  • 10. Univariate Analysis : 1. Linear Regression Model 2. Least square model a. Random effect Model- Heritability,Repeatabilty estimation. b. Fixed effect Model- c. Mixed effect Model- BLUP 3/11/2016
  • 11. Linear Regression Model : Where Yi = Dependent Variable Xi = Independent Variable β0 & β1 remains fixed ,we can’t found them exactly. e= Error term The principle of estimation of regression coefficient is based on Least Square Analysis. 3/11/2016 Yi = β0+ β1Xi + e
  • 13. RandomEffects  Effects which have levels that are considered to be drawn from an infinite large population of levels.  Animal effects are often random.  In repeated experiments there maybe other animals drawn from the population. e,g. Sire ,Dam effect FixedEffects Effects for which the defined classes comprise all the possible levels of interest, e.g. Herd , Season ,Year effect. Effects can be considered as fixed when the number of levels are relatively small and is confined to this number after repeated sampling.
  • 14. Predictors- Used for Estimation of Random effect. Estimators- Used for Estimation of Fixed effect. Principles of Least Square Analysis : The Square of difference between observed and estimated/ predicted value of dependent variable must be least or Zero. 3/11/2016 𝒀= b0+ b1Xi + e then (𝒀𝒊 − 𝒀) 𝟐 ≅ ε 𝟐 ≅ 0
  • 15. Random Effect Model : Where , Yij = jth observation of ith Sire μ= General mean effect Si = Effect of ith sire eij = Error term 3/11/2016 Yij = μ+ Si + eij
  • 16. Fixed Effect : Where , Yij = jth observation of ith Herd μ= General mean effect Fi = Effect of ith herd eij = Error term 3/11/2016 Yij = μ+ Fi + eij
  • 17.  To achieve this ‘mixed models’ are used in which fixed effects and breeding values (indicated as ‘random effects’) will be estimated jointly. 3/11/2016 Mixed Effect Models
  • 18. Where , Yijk = kth observation in ith farm of jth Sire μ= General mean effect Sj = Effect of jth sire in ith farm Fi = ith farm effect eijk = Error term Yijk=μ+ Fi+Sj + eijk
  • 19. Interaction effect : Yijk=μ+ Si+Fj + (SF)ij + eijk
  • 20. BLUP estimation of breeding values is based on a mixed model, which is a linear basis of Best Linear Unbiased Prediction. BLUP
  • 21.  Best in the sense that they have minimum mean squared error within the class of linear unbiased estimators; and predictors to distinguish them from estimators of fixed effects.  BLUP estimates of the realized values of the random variables u are linear in the sense that they are linear functions of the data, y;  The Best prediction is that which minimises the prediction error.  Unbiased in the sense that the average value of the estimate is equal to the average value of the quantity being estimated; (G.K Robinson,1991) 3/11/2016
  • 22. Maximizes the correlation between true and estimated value of effects by minimizing the error variance. The factors for which estimates are required linear functions of the observations. Estimates of fixed effects and estimable functions are such that E(T) = 𝜃. 3/11/2016
  • 23. 3 different kinds of BLUP, (Henderson, 1973 ) Henderson model-1 Henderson model-2 Henderson model-3 Only Random Effect Random + Fixed + Interaction Effect Both Random & Fixed Effect
  • 24. The general linear model equation in matrix form is Y = Xβ + Zu + e Where , Y is an n × 1 vector of n observed records Xis a known incidence matrix of order n × p, which relates the records in y to the fixed effects in b β is a p × 1 vector of p levels of fixed effects (to be estimated ) Zis a known incidence matrix of order n × q, which relates the records in y to the random effects in u uis a q × 1 vector of q levels of random effects such as individual genetic value (to be estimated ) eis an n × 1 vector of random, residual terms 3/11/2016
  • 25. Expectations and Variance Covariance (VCV) Matrices In general the expectation of y is > which is also known as the 1st moment. The 2nd moments describe the variance covariance structure of y:  G is a dispersion matrix for random effects other than errors ,  R is the dispersion matrix of error terms, for which both are general square matrices assumed to be non-singular and positive definite, with elements that are assumed known. We usually write V = ZGZT + R 3/11/2016
  • 26. Estimating fixed Effects & Predicting Random Effects :-  For a mixed model, y, X, and Z , 𝛽, u, R, and G are generally unknown Two complementary estimation issues  Estimation of 𝜷 and u  Estimation of fixed effects, BLUE = Best Linear Unbiased Estimator  Prediction of random effects BLUP = Best Linear Unbiased Predictor 3/11/2016 𝜷= (XT 𝑽−𝟏 X) -1 XT 𝑽−𝟏 y 𝒖= GZT 𝑽−𝟏 ( y-X 𝜷 )
  • 27. • The BLUP eliminates the non genetic biases in estimating Breeding Value. • It also removes the genetic biases taking in to account the effects of non- random mating , genetic merit of Dams and selection. 3/11/2016
  • 29. Advantages: – Handles unbalanced designs – Uses information for all relatives measured to improve estimates BLUP can be used to estimate a variety of genetic values – GCA, SCA, line values (i.e., genotypic values of pure lines) – One can also use BLUP to estimate environmental effects, G x E
  • 30. REML = Restricted Maximum Likelihood. Standard ML variance estimation assumes fixed factors are known without error.  REML is an approach that produces unbiased estimators for these special cases and produces less biased estimates than ML in general.  Depending on whom you ask, REML stands for Residual Maximun Likelihood or Restricted Maximum Likelihood 3/11/2016 REML Variance Component Estimation
  • 31. Variance components are estimated by REML from residuals calculated after fitting the fixed-effects part of the model by ordinary least squares. REML maximizes a marginal likelihood function, so it is also called Residual Maximum Likelihood or Marginal Maximum Likelihood. For linear mixed-effects models, the REML estimators of variance components give the same estimates as the unbiased ANOVA-based estimators (formed by taking appropriate linear combinations of mean squares) when the latter are positive and the data are balanced.
  • 36. ML vs REML: Before getting onto iterative algorithms, it is helpful to review the difference between the log-likelihood function $l$ used to calculate maximum likelihood estimates and the function $l_2$ used for REML. The term $\log|X^T H^{-1} X|$ in $l_2$ makes the adjustment for the degrees of freedom used in estimating treatment effects, so that REML estimates of variance components are less biased than ML estimates. The other major differences are: $l_2$ is not a function of the fixed effects $\tau$; the constant in $l_2$ is a function of the fixed design matrix X.
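The two log-likelihoods did not survive on this slide. For reference, their standard textbook forms are sketched below, assuming that H denotes the variance matrix of y (written V in the mixed-model slides) and τ the vector of fixed effects (written β there). The constants are left unspecified.

$$
l = \text{const} - \tfrac{1}{2}\left[\log|H| + (y - X\tau)^T H^{-1} (y - X\tau)\right]
$$

$$
l_2 = \text{const} - \tfrac{1}{2}\left[\log|H| + \log|X^T H^{-1} X| + (y - X\hat{\tau})^T H^{-1} (y - X\hat{\tau})\right],
\qquad \hat{\tau} = (X^T H^{-1} X)^{-1} X^T H^{-1} y
$$

In $l_2$ the fixed effects appear only through their generalised least squares estimate $\hat{\tau}$, which is why $l_2$ depends on the variance parameters alone.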
  • 38. ML vs REML: ML estimates are biased because no account is taken of the degrees of freedom used in estimating the fixed effects when the variance components are estimated. REML corrects for this bias and also avoids negative estimates of the variance components (Searle et al., 1992).
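The degrees-of-freedom point can be seen in the simplest possible case, sketched below: a model with an overall mean as the only fixed effect and independent residuals with a common variance. Under that assumption the ML estimate of the residual variance divides the residual sum of squares by n, while the REML estimate divides by n − p with p = 1. The function name and the data are hypothetical.

```python
def variance_estimates(y):
    """ML and REML estimates of the residual variance for the model y = mean + e."""
    n = len(y)
    mean = sum(y) / n                        # estimated fixed effect (p = 1)
    rss = sum((v - mean) ** 2 for v in y)    # residual sum of squares
    ml = rss / n                             # ML ignores the df used for the mean
    reml = rss / (n - 1)                     # REML divides by n - p
    return ml, reml

# Hypothetical records; here RSS = 17.2, so ML is about 3.44 and REML about 4.30.
ml, reml = variance_estimates([10.0, 12.0, 11.0, 15.0, 14.0])
print(ml, reml)
```

The ML value is always the smaller of the two by the factor (n − p)/n, which is the downward bias that REML removes in this case.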
  • 39. Genetic Evaluation: REML and BLUP applied to multi-trait mixed models have become the standard method for genetic evaluation in all terrestrial animal species. The main benefits of using this methodology include: (1) increasing accuracy of selection; (2) managing accumulation of inbreeding; (3) estimating genetic trend without a control; (4) the possibility of conducting large-scale genetic evaluation across populations (N.H. Nguyen and R.W. Ponzoni, Vol. 29 No. 3 & 4, Jul-Dec 2006).
  • 40. Bivariate Analysis : •Tests of statistical significance. •Chi-square.
  • 41. Multivariate analysis consists of a collection of methods that can be used when several measurements are made on each individual or object in one or more samples.
  • 43. Types: 1. Multivariate Regression 2. Discriminant Analysis 3. Principal Component Analysis 4. Genetic Divergence Analysis 5. Canonical Variate Analysis
  • 44. 1. Multivariate Regression: $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_n X_{in} + e_i = \beta_0 + \sum_{j=1}^{n} \beta_j X_{ij} + e_i$. 2. Discriminant Analysis: Used when there is more than one population and each animal in the population has multiple characters.
  • 45. • Purpose: To find the discriminant function (D) that increases the differences among populations by minimising the variance within populations and maximising the mean differences between populations with respect to the characters. • $D = \lambda_1 d_1 + \lambda_2 d_2 + \lambda_3 d_3 = \sum_i \lambda_i d_i$, where $d_i$ = the ith mean difference between the populations for the character, and $\lambda_i$ = weighting coefficient attached to that difference.
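The slide does not say how the weights λ are obtained. One common choice, assumed here, is Fisher's linear discriminant for two populations, in which λ solves Sλ = d with S the pooled within-population covariance matrix. The sketch below uses that choice; the function name and the trait values are hypothetical.

```python
import numpy as np

def discriminant_function(pop1, pop2):
    """Fisher-type weights (assumed method) and D = sum(lambda_i * d_i)."""
    d = pop1.mean(axis=0) - pop2.mean(axis=0)        # d_i: mean differences
    n1, n2 = len(pop1), len(pop2)
    # Pooled within-population covariance matrix
    S = ((n1 - 1) * np.cov(pop1, rowvar=False)
         + (n2 - 1) * np.cov(pop2, rowvar=False)) / (n1 + n2 - 2)
    lam = np.linalg.solve(S, d)                      # lambda_i: weighting coefficients
    D = float(lam @ d)                               # discriminant function value
    return lam, d, D

# Hypothetical data: rows are animals, columns are two characters.
pop1 = np.array([[4.0, 2.0], [5.0, 3.0], [6.0, 2.5], [5.5, 3.5]])
pop2 = np.array([[7.0, 3.0], [8.0, 4.5], [9.0, 4.0], [8.5, 5.0]])

lam, d, D = discriminant_function(pop1, pop2)
print("weights:", lam, "D:", D)
```

With these weights, D equals the squared generalised distance between the two population means.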
  • 46. 3. Principal Component (Z) Analysis: Linear combinations of the independent characters are formed so as to maximise the variance accounted for in the original set of characters. • $Z_1 = a_1 x_1 + a_2 x_2 + a_3 x_3 + a_4 x_4$ • $Z_2 = b_1 x_1 + b_2 x_2 + b_3 x_3 + b_4 x_4$ • $Z_3 = c_1 x_1 + c_2 x_2 + c_3 x_3 + c_4 x_4$ • $Z_4 = d_1 x_1 + d_2 x_2 + d_3 x_3 + d_4 x_4$, where $a_i, b_i, c_i, d_i$ are the relative weighting factors attached to each character and $\sum a_i^2 = \sum b_i^2 = \sum c_i^2 = \sum d_i^2 = 1$. The analysis is mainly confined to a single population. The component with the highest variance is the 1st principal component; the component with the next highest variance is the 2nd principal component. In India, this analysis was first reported in large animals by Dhara and Chakravarty (1996), for predicting the breeding value of milk production on the basis of a selected number of principal components.
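A minimal sketch of how such components can be computed: the weights are the unit-length eigenvectors of the covariance matrix of the characters, ordered by decreasing eigenvalue. Working from the covariance matrix (rather than the correlation matrix) is an assumption made here, and the data are simulated purely for illustration.

```python
import numpy as np

def principal_components(data):
    """Return (variances, weights, scores) with components in decreasing variance order."""
    centred = data - data.mean(axis=0)
    cov = np.cov(centred, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # reorder: largest variance first
    variances = eigvals[order]
    weights = eigvecs[:, order]                # column k holds the weights of Z_(k+1)
    scores = centred @ weights                 # Z values for every animal
    return variances, weights, scores

# Simulated, hypothetical data: 20 animals measured on 4 characters.
rng = np.random.default_rng(0)
data = rng.normal(size=(20, 4))

variances, weights, scores = principal_components(data)
print("proportion of variance:", variances / variances.sum())
```

Each column of `weights` has a sum of squares equal to 1, matching the constraint on the weighting factors.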
  • 47. 4. Genetic Divergence Analysis / Genetic distance analysis / D² analysis (given by P. C. Mahalanobis, 1928): D² is obtained by summing the squares of the deviations of the same transformed or untransformed traits between two genotypes, taken in various combinations: $D^2 = \sum_{i=1}^{p} d_i^2 = \sum_{i=1}^{p} (Y_{ij} - Y_{ik})^2$, where i indexes the traits (1 to p) and j, k are genotypes with j ≠ k. D² follows a chi-square distribution with p degrees of freedom. The more critical the trait, the greater the number of distinctly different groups that develop.
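The D² sum can be sketched directly from the formula. The example assumes the trait values have already been transformed to uncorrelated variables, as the slide's formula requires; the function name and the values are hypothetical.

```python
def d_squared(genotype_j, genotype_k):
    """Sum of squared differences between two genotypes over p traits."""
    return sum((yj - yk) ** 2 for yj, yk in zip(genotype_j, genotype_k))

# Hypothetical transformed trait values for two genotypes (p = 3 traits).
d2 = d_squared([2.0, 5.0, 1.0], [1.0, 3.0, 4.0])
print(d2)  # 1 + 4 + 9 = 14.0
```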
  • 48. A Few Notes on Genetic Divergence (GD): GD helps to maintain variation among animals within a population, so that many groups can be formed within the population. It describes the extent and range of variability within the population. It explains evolutionary divergence. Progeny testing programmes need GD for distinguishing unrelated sires.
  • 49. 5. Canonical Variate Analysis: Used in a single population. The traits of the animals are divided into two sets and the relationship between the two sets is evaluated: the Y set (response characters) and the X set (predictor characters), combined so that the Y set is maximally correlated with the X set. • M = p1 y1 + p2 y2 • N = q1 x1 + q2 x2 + q3 x3. The canonical coefficients (p1, p2 and q1, q2, q3) are chosen so that the correlation between the two canonical variates (M and N) is maximal; that correlation is called the canonical correlation. In India, the method was first used on dairy buffaloes by Thomas and Chakravarty (1999).
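One way to obtain the canonical coefficients, sketched below, is to whiten each set with the inverse square root of its covariance matrix and take the singular value decomposition of the whitened cross-covariance. This particular computational route is an assumption, not something stated on the slide, and the data are simulated for illustration.

```python
import numpy as np

def inv_sqrt(S):
    """Inverse symmetric square root of a positive definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def first_canonical_pair(Y, X):
    """Return coefficients p (Y set), q (X set) and the first canonical correlation."""
    Yc, Xc = Y - Y.mean(axis=0), X - X.mean(axis=0)
    n = len(Y)
    Syy = Yc.T @ Yc / (n - 1)
    Sxx = Xc.T @ Xc / (n - 1)
    Syx = Yc.T @ Xc / (n - 1)
    Wy, Wx = inv_sqrt(Syy), inv_sqrt(Sxx)
    U, s, Vt = np.linalg.svd(Wy @ Syx @ Wx)    # singular values = canonical correlations
    p = Wy @ U[:, 0]                           # coefficients p1, p2
    q = Wx @ Vt[0]                             # coefficients q1, q2, q3
    return p, q, s[0]

# Simulated, hypothetical data: 50 animals, 3 predictor and 2 response characters.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
Y = np.column_stack([X[:, 0] + 0.5 * rng.normal(size=50), rng.normal(size=50)])

p, q, rho = first_canonical_pair(Y, X)
M = (Y - Y.mean(axis=0)) @ p                   # canonical variate of the Y set
N = (X - X.mean(axis=0)) @ q                   # canonical variate of the X set
print("canonical correlation:", rho)
```

The ordinary correlation between M and N equals `rho`, and no other pair of linear combinations of the two sets is more highly correlated.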
  • 50. Fundamentals of a Bayesian Analysis: (1) Formulate a probability model for the data. (2) Decide on a prior distribution, which quantifies the uncertainty in the values of the unknown model parameters before the data are observed. (3) Combine the likelihood of the observed data with the prior distribution to obtain the posterior distribution. (4) Summarize important features of the posterior distribution, or calculate quantities of interest based on the posterior distribution. These quantities constitute statistical outputs, such as point estimates and intervals.
  • 51. Bayesian inference: A form of inference which regards parameters as being random variables possessed of prior distributions reflecting the accumulated state of knowledge. Bayes estimation: The estimation of population parameters by the use of methods of inverse probability (A.L. Pretorius and A.J. van der Merwe, 2000).
  • 52. Bayes' theorem is based on conditional probability. Conditional probability: the probability of event B occurring when it is known that some event A has already occurred, denoted $P(B \mid A)$. When A and B are dependent events, $P(B \mid A) = \dfrac{P(A \cap B)}{P(A)}$, so that $P(A \cap B) = P(A) \times P(B \mid A)$ or $P(A \cap B) = P(B) \times P(A \mid B)$.
  • 53. If $B_1, B_2, \dots, B_K$ are mutually disjoint events with probability $P(B_i) \neq 0$ (i = 1, 2, …, K), then for any arbitrary event A which is a subset of $\bigcup_{i=1}^{K} B_i$ such that $P(A) \neq 0$, we have $P(B_i \mid A) = \dfrac{P(B_i)\, P(A \mid B_i)}{\sum_{j=1}^{K} P(B_j)\, P(A \mid B_j)}$.
  • 54. 1. The probabilities $P(B_1), P(B_2), \dots, P(B_K)$ are called "a priori probabilities" because they exist before we get any information from the experiment itself. 2. The probabilities $P(A \mid B_i)$, i = 1, 2, …, K, are called "likelihoods" because they indicate how likely the event A under consideration is to occur given each $B_i$. 3. The probabilities $P(B_i \mid A)$ are called "posterior probabilities" because they are determined after the results of the experiment are known. $P(B_i \mid A) = \dfrac{P(B_i)\, P(A \mid B_i)}{\sum_{j=1}^{K} P(B_j)\, P(A \mid B_j)}$
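The three quantities can be sketched for a discrete case as follows. The function name, the three-herd setting and all probabilities are hypothetical values chosen only to illustrate the formula.

```python
def posterior(priors, likelihoods):
    """Posterior P(B_i | A) from priors P(B_i) and likelihoods P(A | B_i)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]   # P(B_i) * P(A | B_i)
    total = sum(joint)                                     # P(A), the denominator
    return [j / total for j in joint]

# Hypothetical example: an animal comes from herd B1, B2 or B3 with prior
# probabilities 0.5, 0.3, 0.2; event A (say, mastitis) has probability
# 0.1, 0.2, 0.3 within each herd.
post = posterior([0.5, 0.3, 0.2], [0.1, 0.2, 0.3])
print(post)  # approximately [0.294, 0.353, 0.353]
```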
  • 55. A Few Important Notes: The notions of "prior" and "posterior" in Bayes' theorem are relative to a given sample outcome. That is, if a posterior distribution has been determined from a particular sample, this posterior distribution would be considered the prior distribution relative to a new sample. Prior → Posterior-1 → Posterior-2
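The chain from prior to Posterior-1 to Posterior-2 can be sketched as two successive updates. The example assumes the two samples are conditionally independent given each hypothesis, in which case updating twice gives the same answer as a single update with the product of the likelihoods. All numbers are hypothetical.

```python
def update(prior, likelihood):
    """One Bayes update over a discrete set of hypotheses."""
    joint = [p * l for p, l in zip(prior, likelihood)]
    total = sum(joint)
    return [j / total for j in joint]

prior = [0.5, 0.5]                               # two hypotheses, equal prior weight
posterior_1 = update(prior, [0.8, 0.3])          # after the first sample
posterior_2 = update(posterior_1, [0.6, 0.4])    # Posterior-1 acts as the new prior

# Same result from a single update with both samples combined
combined = update(prior, [0.8 * 0.6, 0.3 * 0.4])
print(posterior_1, posterior_2, combined)
```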
  • 56. References:
    Thompson, R., and Mantysaari, E. (2004) Prospects for statistical methods in animal breeding. Jour. Ind. Soc. Ag. Statistics 57 (Special Volume): 15-25.
    Narain, P. Statistics and Its Application to Agriculture and Genetics. IARI, New Delhi.
    Chakravarty, A.K. Multivariate Analysis in Animal Breeding. NDRI, Karnal.
    Verbyla, A. P. (1990) A conditional derivation of residual maximum likelihood. Australian Journal of Statistics 32: 227-230.
    Henderson, C.R. (1975) Best linear unbiased estimation and prediction under a selection model. Biometrics 31: 423-447.
    Henderson, C.R. (1976) A simple method for the inverse of a numerator relationship matrix used in prediction of breeding values. Biometrics 32: 69-83.
  • 57. References (continued):
    Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (2003) Bayesian Data Analysis, 2nd ed. London, Chapman & Hall.
    Carlin, B. P., and Louis, T. A. (2000) Bayes and Empirical Bayes Methods for Data Analysis, 2nd ed. Boca Raton, Chapman & Hall.
    Duchateau, L., Janssen, P., and Rowlands, G.L. (1998) Linear Mixed Models. An introduction with applications in veterinary research. ILRI, Nairobi, Kenya, 159-170.