Delivery of the project by the student is authorized:

              Rubén Salgado Fernández




        THE PROJECT DIRECTOR

              Carlos Maté Jiménez




Signed:                          Date: 12/06/2007




APPROVAL OF THE PROJECT COORDINATOR

           Claudia Meseguer Velasco




Signed:                          Date: 12/06/2007
UNIVERSIDAD PONTIFICIA DE COMILLAS


        ESCUELA TÉCNICA SUPERIOR DE INGENIERÍA (ICAI)



           INGENIERO EN ORGANIZACIÓN INDUSTRIAL




         FINAL DEGREE PROJECT


    Bayesian Regression System
      for Interval-Valued Data.
Application to the Spanish Continuous
             Stock Market




                          AUTHOR: Salgado Fernández, Rubén

                                      MADRID, June 2007
Acknowledgements

Firstly, I would like to thank my director, Carlos Maté Jiménez, PhD, for giving me the chance to carry out this project. With him I have learnt not only about Statistics and research, but also how to enjoy them.


   Special thanks to my parents. Their love and everything they have taught me in this life have made me the person I am today.


   Thanks to my brothers, my sister and the rest of my family for their support and for the time I stole from them.


   Thanks to Charo for putting up with my bad moods at difficult times, for supporting me and for giving me the inspiration to keep going.



                                                                                    Madrid, June 2007




Resumen

In recent years, Bayesian methods have spread and come to be used successfully in many varied fields such as marketing, medicine, engineering, econometrics and financial markets. The main characteristic that sets Bayesian data analysis apart from other alternatives is that it takes into account not only the objective information coming from the data of the event under study, but also the knowledge available prior to it. The benefits obtained from this approach are many since, the greater the knowledge of the situation, the more reliably decisions can be taken and the more accurate they will be. But it has not always been all advantages: until a few years ago, Bayesian data analysis presented a series of difficulties that limited its development by researchers. Although the Bayesian methodology has existed as such for quite some time, it did not begin to be employed in a widespread way until the 1990s. This expansion has been favoured to a large extent by advances in computing and by the improvement and refinement of different calculation methods, such as Markov chain Monte Carlo methods.


    In particular, this methodology has proved extraordinarily useful in its application to regression models, which are widely adopted. In practice there are many situations in which the relationship between two quantitative variables needs to be analysed. The two fundamental objectives of this analysis are, on the one hand, to determine whether those variables are associated and in what direction the association runs (that is, whether the values of one of the variables tend to increase, or decrease, as the values of the other increase); and, on the other, to study whether the values of one variable can be used to predict the value of the other. A regression model tries to provide information about one or more events through their relationship with the behaviour of others. The Bayesian methodology makes it possible to incorporate the researcher's knowledge into the analysis, making the results more precise, since they are not confined to the data of one particular sample.


    On the other hand, it is beginning to be accepted that, in the field of statistics, the twenty-first century will be the century of the "statistics of knowledge", in contrast to the previous one, which was that of the "statistics of data". The basic concept for building such statistics is the symbolic datum, and statistical methods have been developed for some types of symbolic data.


    Nowadays, the demands of the market and, in general, of the world keep growing. This implies an ever greater desire to predict the occurrence of an event, or to control the behaviour of certain quantities with the smallest possible error, in order to offer better products and to obtain greater profits, scientific advances and better results.


    Against this reality, this project tries to respond to those needs by providing extensive documentation on several of the most widely used and most advanced techniques of today, namely Bayesian data analysis, regression models and symbolic data, and by proposing different regression techniques. Likewise, a tool is developed that allows all the acquired knowledge to be put into practice. This application is aimed at the Spanish stock market and lets the user operate it in a simple, friendly way. For the development of this tool, one of the newest languages with the greatest projection of the moment is employed: R.


    It is, therefore, a project that combines the newest techniques with the greatest projection, both in theory, with Bayesian regression applied to interval-valued data, and in practice, with the use of the R language.
Abstract

In recent years, Bayesian methods have spread and been successfully used in many different fields such as Marketing, Medicine, Engineering, Econometrics or Financial Markets. The main characteristic that makes Bayesian data analysis remarkable compared with other alternatives is that it takes into account not only the objective information coming from the analysed event, but also the knowledge available prior to it. The benefits obtained from this approach are many, due to the fact that the more knowledge of the situation one has, the more reliable and accurate the decisions that can be taken. However, although the Bayesian methodology was established a long time ago, it was not applied in a general way until the 1990s because of computational difficulties. Its expansion has been favoured mainly by the advances in that field and by the improvement of different calculation methods, such as Markov chain Monte Carlo methods.


    In particular, the Bayesian methodology has proved extraordinarily useful in its application to regression models, which are widely adopted. There are many occasions in real life in which it is necessary to analyse the relationship between two quantitative variables. The two main objectives of this analysis are, on the one hand, to determine whether such variables are associated and in what direction that association comes about (that is, whether the values of one of the variables tend to rise, or to decrease, as the values of the other increase); and, on the other hand, to study whether the values of one variable can be used to predict the value of the other. A regression model offers information about one or more events through their relationship with the behaviour of others. With the Bayesian methodology it is possible to add the researcher's knowledge to the analysis, thus making the results more accurate, since they are not confined to the data of one particular sample.


    On the other hand, in the field of Statistics it is more and more accepted that the twenty-first century will be the century of the "Statistics of knowledge", in contrast to the last one, which was that of the "Statistics of data". The basic concept on which such Statistics is built is the symbolic datum; furthermore, statistical methods have been developed for some types of symbolic data.


   Nowadays, the requirements of the market, and the demands of the world in general, keep growing. This implies a continuously increasing desire to predict the occurrence of an event, or to control the behaviour of certain quantities with the minimum error, with the aim of offering better products and obtaining greater benefits, scientific advances and better outcomes.


   Under this frame, the project tries to respond to such needs by offering extensive documentation on several of the most applied and leading techniques of today, such as Bayesian data analysis, regression models and symbolic data, and by suggesting different regression techniques. Similarly, a tool has been developed that allows the reader to put all the acquired knowledge into practice. This application is aimed at the Spanish Continuous Stock Market and lets the user apply it easily. As far as the development of this tool is concerned, one of the most innovative languages with the greatest projection of the moment has been used: R.


   The project therefore combines the techniques that are most innovative and have the greatest projection, both in theoretical questions, such as Bayesian regression applied to interval-valued data, and in practical questions, such as the use of the R language.
List of Figures

 1.1   Project Work Packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       5

 2.1   Univariate Normal Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .        14

 6.1   Interval time series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   62

 7.1   Classical Regression with single values in training set . . . . . . . . . . . . . . . .    73
 7.2   Classical Regression with single values in testing set . . . . . . . . . . . . . . . . .   74
 7.3   Classical Regression with interval-valued data . . . . . . . . . . . . . . . . . . . .     75
 7.4   Centre Method (2000) in training set . . . . . . . . . . . . . . . . . . . . . . . . . .     75
 7.5   Centre Method (2000) in testing set . . . . . . . . . . . . . . . . . . . . . . . . . .      76
 7.6   Centre and Radius Method in training set . . . . . . . . . . . . . . . . . . . . . . .       77
 7.7   Centre and Radius Method in testing set . . . . . . . . . . . . . . . . . . . . . . . .      78
 7.8   Bayesian Centre and Radius Method in testing set . . . . . . . . . . . . . . . . . .       80
 7.9   Classical Regression with single values in training set . . . . . . . . . . . . . . . .    81
 7.10 Classical Regression with single values in testing set . . . . . . . . . . . . . . . . .    81
 7.11 Centre Method (2000) in training set . . . . . . . . . . . . . . . . . . . . . . . . . .      82
 7.12 Centre Method (2000) in testing set . . . . . . . . . . . . . . . . . . . . . . . . . .       83
 7.13 Centre and Radius Method in training set . . . . . . . . . . . . . . . . . . . . . . .        85
 7.14 Centre and Radius Method in testing set . . . . . . . . . . . . . . . . . . . . . . . .       85
 7.15 Bayesian Centre and Radius Method in testing set . . . . . . . . . . . . . . . . . . .        87

 9.1   BARESIMDA MDI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

 10.1 Interface between BARESIMDA and R . . . . . . . . . . . . . . . . . . . . . . . . 104
 10.2 Interface between BARESIMDA and Excel . . . . . . . . . . . . . . . . . . . . . . 105
 10.3 Logical Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105


                                                vi

  C.1 Load Data Menu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
  C.2 Select File Dialog . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
  C.3 Display Loaded Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
  C.4 Define New Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
  C.5 Enter New Variable Name . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
  C.6 Display New Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
  C.7 Edit Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
  C.8 Select Variable to Be Edited . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
  C.9 Enter New Name . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
  C.10 Confirmation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
  C.11 New Row data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
  C.12 Type Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
  C.13 Look And Feel Menu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
  C.14 Look And Feel Styles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
  C.15 New Look And Feel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
  C.16 Type Of User Menu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
  C.17 Select Type Of User . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
  C.18 Non-Symbolic Classical Regression Menu . . . . . . . . . . . . . . . . . . . . . . . 131
  C.19 Select Non-Symbolic Variables in Simple Regression . . . . . . . . . . . . . . . . . 131
  C.20 Brief Report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
  C.21 Analysis Options in Non-Symbolic Classical Simple Regression . . . . . . . . . . . 132
  C.22 New Prediction in Non-Symbolic Classical Simple Regression . . . . . . . . . . . . 133
  C.23 Graphics options in Non-Symbolic Classical Simple Regression . . . . . . . . . . . 133
  C.24 Save options in Non-Symbolic Classical Simple Regression . . . . . . . . . . . . . . 134
  C.25 Non-Symbolic Classical Multiple Regression Menu . . . . . . . . . . . . . . . . . . 134
  C.26 Select Variables in Non-Symbolic Classical Multiple Regression . . . . . . . . . . . 134
  C.27 Analysis options in Non-Symbolic Classical Multiple Regression . . . . . . . . . . . 135
  C.28 Graphics options in Non-Symbolic Classical Multiple Regression . . . . . . . . . . 135
  C.29 Save options in Non-Symbolic Classical Multiple Regression . . . . . . . . . . . . . 136
  C.30 Intercept in Non-Symbolic Classical Multiple Regression . . . . . . . . . . . . . . . 136
  C.31 Non-Symbolic Bayesian Simple Regression Menu . . . . . . . . . . . . . . . . . . . 136
  C.32 Select Variables in Non-Symbolic Bayesian Simple Regression . . . . . . . . . . . . 137
  C.33 Analysis Options in Non-Symbolic Bayesian Simple Regression . . . . . . . . . . . 137
  C.34 Graphics Options in Non-Symbolic Bayesian Simple Regression . . . . . . . . . . . 138

  C.35 Save Options in Non-Symbolic Bayesian Simple Regression . . . . . . . . . . . . . 138
  C.36 Prior Experienced Specification Options in Non-Symbolic Bayesian Simple Regres-
       sion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
  C.37 Prior Inexperienced Specification in Non-Symbolic Bayesian Simple Regression . . 139
  C.38 Prior Experienced Specification in Non-Symbolic Bayesian Simple Regression . . . 139
  C.39 Non-Symbolic Bayesian Multiple Regression menu . . . . . . . . . . . . . . . . . . 139
  C.40 Analysis Options in Non-Symbolic Bayesian Multiple Regression . . . . . . . . . . 140
  C.41 Graphics Options in Non-Symbolic Bayesian Multiple Regression . . . . . . . . . . 140
  C.42 Save Options in Non-Symbolic Bayesian Multiple Regression . . . . . . . . . . . . 140
  C.43 Model Options in Non-Symbolic Bayesian Multiple Regression . . . . . . . . . . . 141
  C.44 Symbolic Classical Simple Regression Menu . . . . . . . . . . . . . . . . . . . . . 141
  C.45 Select Variables in Symbolic Classical Simple Regression . . . . . . . . . . . . . . . 141
  C.46 Analysis Options in Symbolic Classical Simple Regression . . . . . . . . . . . . . . 142
  C.47 Graphics Options in Symbolic Classical Simple Regression . . . . . . . . . . . . . . 142
  C.48 Symbolic Classical Multiple Regression Menu . . . . . . . . . . . . . . . . . . . . . 143
  C.49 Select Variables in Symbolic Classical Multiple Regression . . . . . . . . . . . . . . 143
  C.50 Analysis Options in Symbolic Classical Multiple Regression . . . . . . . . . . . . . 144
  C.51 Graphics Options in Symbolic Classical Multiple Regression . . . . . . . . . . . . . 144
  C.52 Symbolic Bayesian Simple Regression . . . . . . . . . . . . . . . . . . . . . . . . . 145
  C.53 Select Variables in Symbolic Bayesian Simple Regression . . . . . . . . . . . . . . 145
  C.54 Analysis Options in Symbolic Bayesian Simple Regression . . . . . . . . . . . . . . 145
  C.55 Graphics Options in Symbolic Bayesian Simple Regression . . . . . . . . . . . . . . 146
  C.56 Model Options in Symbolic Bayesian Simple Regression . . . . . . . . . . . . . . . 147
  C.57 Symbolic Bayesian Multiple Regression Menu . . . . . . . . . . . . . . . . . . . . 147
  C.58 Select Variables in Symbolic Bayesian Multiple Regression . . . . . . . . . . . . . . 147
  C.59 Graphics Options in Symbolic Bayesian Multiple Regression . . . . . . . . . . . . . 148
List of Tables

 2.1   Distributions in Bayesian Data Analysis . . . . . . . . . . . . . . . . . . . . . . . .        10
 2.2   Comparison between Univariate and Multivariate Normal . . . . . . . . . . . . . . .            15
 2.3   Conjugate distributions for other likelihood distributions . . . . . . . . . . . . . . .       16

 4.1   Bayes Factor Interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      29
 4.2   Sensitivity Summary I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .        33
 4.3   Sensitivity Summary II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       34

 5.1   Multiple and Simple Regression Comparison . . . . . . . . . . . . . . . . . . . . .            40
 5.2   Sensitivity analysis of parameter β . . . . . . . . . . . . . . . . . . . . . . . . . . .      45
 5.3   Sensitivity analysis of parameter σ² . . . . . . . . . . . . . . . . . . . . . . . . . .   46
 5.4   Classical and Bayesian regression comparison . . . . . . . . . . . . . . . . . . . . .         48
 5.5   Main Prior Distributions Summary . . . . . . . . . . . . . . . . . . . . . . . . . . .         57
 5.6   Main Posterior Distributions Summary . . . . . . . . . . . . . . . . . . . . . . . . .         58
 5.7   Prior and Posterior Parameters Summary . . . . . . . . . . . . . . . . . . . . . . . .         59
 5.8   Main Posterior Predictive Distributions Summary . . . . . . . . . . . . . . . . . . .          60

 6.1   Multivalued Data Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .         63
 6.2   Modal-multivalued Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .          64

 7.1   Error Measures for Classical Regression with single values . . . . . . . . . . . . . .         74
 7.2   Error Measures for Centre Method (2000) . . . . . . . . . . . . . . . . . . . . . .        76
 7.3   Error Measures for Centre Method (2002) . . . . . . . . . . . . . . . . . . . . . .        77
 7.4   Error Measures for Centre and Radius Method . . . . . . . . . . . . . . . . . . . . .          78
 7.5   Error Measures in Bayesian Centre and Radius Method . . . . . . . . . . . . . . . .            80
 7.6   Error Measures for Classical Regression with single values . . . . . . . . . . . . . .         82




  7.7   Error Measures for Centre Method (2000) . . . . . . . . . . . . . . . . . . . . . . .   83
  7.8   Error Measures for Centre Method (2002) . . . . . . . . . . . . . . . . . . . . . . .   84
  7.9   Error Measures for Centre and Radius Method . . . . . . . . . . . . . . . . . . . . .    84
  7.10 Error Measures in Bayesian Centre and Radius Method . . . . . . . . . . . . . . . .       86

  11.1 Estimated material costs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
  11.2 Amortization Costs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
  11.3 Summarized Budget . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
Contents

Acknowledgements                                                                                          i

Resumen                                                                                                  ii

Abstract                                                                                                 iv

List of Figures                                                                                          vi

List of Tables                                                                                           x

Contents                                                                                                xvi

1 Introduction                                                                                           1
   1.1     Project Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      1
   1.2     Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    4
   1.3     Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       4

2 Bayesian Data Analysis                                                                                 6
   2.1     What is Bayesian Data Analysis? . . . . . . . . . . . . . . . . . . . . . . . . . . .         6
   2.2     Bayesian Analysis for Normal and other distributions . . . . . . . . . . . . . . . . .       10
           2.2.1   Univariate Normal distribution . . . . . . . . . . . . . . . . . . . . . . . . .     10
           2.2.2   Multivariate Normal distribution . . . . . . . . . . . . . . . . . . . . . . . .     13
           2.2.3   Other distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    15
   2.3     Hierarchical Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    16
   2.4     Nonparametric Bayesian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       18

3 Posterior Simulation                                                                                  20
   3.1     Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   20


   3.2   Markov chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    21
   3.3   Monte Carlo Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      23
   3.4   Gibbs sampler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    24
   3.5   Metropolis-Hastings sampler and its special cases . . . . . . . . . . . . . . . . . . .      25
         3.5.1   Metropolis-Hastings sampler . . . . . . . . . . . . . . . . . . . . . . . . . .      25
         3.5.2   Metropolis sampler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     26
         3.5.3   Random-walk sampler . . . . . . . . . . . . . . . . . . . . . . . . . . . . .        26
         3.5.4   Independence sampler . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       26
   3.6   Importance sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      27

4 Sensitivity Analysis                                                                                28
   4.1   Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   28
   4.2   Bayes Factor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     29
   4.3   Alternative Stats to Bayes Factor . . . . . . . . . . . . . . . . . . . . . . . . . . . .    30
   4.4   Highest Posterior Density Intervals . . . . . . . . . . . . . . . . . . . . . . . . . . .    31
   4.5   Model Comparison Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .         32

5 Regression Analysis                                                                                 35
   5.1   Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   35
   5.2   Classical Regression Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     36
   5.3   The Bayesian Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      39
   5.4   Normal Linear Regression Model subject to inequality constraints . . . . . . . . . .         48
   5.5   Normal Linear Regression Model with Independent Parameters . . . . . . . . . . . .           49
   5.6   Normal Linear Regression Model with Heteroscedasticity and Correlation . . . . . .           51
         5.6.1   Heteroscedasticity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     53
         5.6.2   Correlation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    54
   5.7   Models Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       56

6 Symbolic Data                                                                                       61
   6.1   What is symbolic data analysis? . . . . . . . . . . . . . . . . . . . . . . . . . . . .      61
   6.2   Interval-valued variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    65
   6.3   Classical regression analysis with Interval-valued data . . . . . . . . . . . . . . . . .    67
   6.4   Bayesian regression analysis with Interval-valued data . . . . . . . . . . . . . . . .       70

7 Results                                                                                              72
   7.1   Spanish Continuous Stock Market data sets . . . . . . . . . . . . . . . . . . . . . .         72
   7.2   Direct Relation between Variables . . . . . . . . . . . . . . . . . . . . . . . . . . .       72
   7.3   Uncorrelated Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      79

8 A Guide to Statistical Software Today                                                                88
   8.1   Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    88
   8.2   Commercial Packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       89
         8.2.1   The SAS System for Statistical Analysis . . . . . . . . . . . . . . . . . . . .       89
         8.2.2   Minitab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     90
         8.2.3   BMDP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .        90
         8.2.4   SPSS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      91
         8.2.5   S-PLUS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      91
         8.2.6   Others . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    92
   8.3   Public License Packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       93
         8.3.1   R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     93
         8.3.2   BUGS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      93
   8.4   Analysis Packages with Statistical Libraries . . . . . . . . . . . . . . . . . . . . . .      94
         8.4.1   Matlab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      94
         8.4.2   Mathematica . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       95
         8.4.3   Others . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    95
   8.5   Some General Languages with Statistical Libraries . . . . . . . . . . . . . . . . . .         95
         8.5.1   Java . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    95
         8.5.2   C++ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     96
   8.6   Developed Software Tool: BARESIMDA . . . . . . . . . . . . . . . . . . . . . . .              96

9 Software Requirements Specification                                                                   98
   9.1   Purpose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     98
   9.2   Intended Audience . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       98
   9.3   Functionality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     99
         9.3.1   Classical Regression with crisp data . . . . . . . . . . . . . . . . . . . . . .      99
          9.3.2   Classical Regression with interval-valued data . . . . . . . . . . . . . . . .       99
         9.3.3   Bayesian Regression with crisp data . . . . . . . . . . . . . . . . . . . . . . 100
          9.3.4   Bayesian Regression with interval-valued data . . . . . . . . . . . . . . . . 100

         9.3.5   Data Manipulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
         9.3.6   Portability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
         9.3.7   Maintainability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
   9.4   External Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
         9.4.1   User Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
         9.4.2   Software Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

10 Software Architecture Study                                                                      103
   10.1 Hardware/Software Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
   10.2 Logical Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

11 Project Budget                                                                                   106
   11.1 Engineering Costs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
   11.2 Investment and Elements Costs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
         11.2.1 Summarized Budget . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

12 Conclusions                                                                                      110
   12.1 Bayesian Regression applied to Symbolic Data . . . . . . . . . . . . . . . . . . . . 110
   12.2 BARESIMDA Software Tool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
   12.3 Future Developments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
   12.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

A Probability Distributions                                                                         113
   A.1 Discrete Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
         A.1.1 Binomial . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
         A.1.2 Geometric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
         A.1.3 Poisson . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
   A.2 Continuous Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
         A.2.1 Uniform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
         A.2.2 Univariate Normal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
         A.2.3 Exponential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
         A.2.4 Gamma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
          A.2.5 Inverse-Gamma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
         A.2.6 Chi-square . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
          A.2.7 Inverse-Chi-square and Inverse-Scaled Chi-square . . . . . . . . . . . . . . 118

         A.2.8 Univariate Student-t . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
        A.2.9 Beta . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
        A.2.10 Multivariate Normal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
         A.2.11 Multivariate Student-t . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
        A.2.12 Wishart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
         A.2.13 Inverse-Wishart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

B Installation Guide                                                                              122
   B.1 From source folder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
   B.2 From installer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122

C User’s Guide                                                                                    123
   C.1 Data Entry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
         C.1.1   Loading an Excel file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
        C.1.2   Defining a new variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
        C.1.3   Editing an existing variable . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
        C.1.4   Deleting an existing variable . . . . . . . . . . . . . . . . . . . . . . . . . . 127
        C.1.5   Typing in a new data row . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
        C.1.6   Deleting an existing data row . . . . . . . . . . . . . . . . . . . . . . . . . . 128
         C.1.7   Modifying an existing data row . . . . . . . . . . . . . . . . . . . . . . . . . 129
   C.2 Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
         C.2.1   Setting the look & feel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
        C.2.2   Selecting the type of user . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
   C.3 Non Symbolic Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
        C.3.1   Simple Classical Regression . . . . . . . . . . . . . . . . . . . . . . . . . . 131
        C.3.2   Multiple Classical Regression . . . . . . . . . . . . . . . . . . . . . . . . . 133
        C.3.3   Simple Bayesian Regression . . . . . . . . . . . . . . . . . . . . . . . . . . 136
        C.3.4   Multiple Bayesian Regression . . . . . . . . . . . . . . . . . . . . . . . . . 139
   C.4 Symbolic Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
        C.4.1   Simple Classical Regression . . . . . . . . . . . . . . . . . . . . . . . . . . 140
        C.4.2   Multiple Classical Regression . . . . . . . . . . . . . . . . . . . . . . . . . 143
        C.4.3   Simple Bayesian Regression . . . . . . . . . . . . . . . . . . . . . . . . . . 144
        C.4.4   Multiple Bayesian Regression . . . . . . . . . . . . . . . . . . . . . . . . . 146

D Obtaining and Installing R                                                                     149
   D.1 Binary distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
   D.2 Installation from source . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
   D.3 Package installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151

E Obtaining and installing Java Runtime Environment                                              152
   E.1 Microsoft Windows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
   E.2 Linux . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
        E.2.1   Installation of Self-Extracting Binary . . . . . . . . . . . . . . . . . . . . . 153
        E.2.2   Installation of RPM File . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
   E.3 UNIX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155

Bibliography                                                                                     157
Chapter 1

Introduction

1.1 Project Motivation

Statistics is primarily concerned with the analysis of data, either to assist in arriving at an improved understanding of some underlying mechanism, or as a means for making informed rational decisions. Both these aspects generally involve some degree of uncertainty. The statistician's task is then to explain such uncertainty, and to reduce it to the extent to which this is possible. Problems of this type occur throughout the physical, social and other sciences. One way of looking at statistics stems from the perception that, ultimately, probability is the only appropriate way to describe and systematically deal with uncertainty, as if it were the language for the logic of uncertainty. Thus, inference statements are precisely framed as probability statements on the possible values of the unknown quantities of interest (parameters or future observations), conditional on the observed, available data. The scientific discipline based on this understanding is called Bayesian Statistics. Moreover, the increasingly needed and sophisticated models, often hierarchical models, used to describe available data are typically too complex for conventional statistics to handle, but can be tackled within Bayesian Statistics. In principle, Bayesian Statistics is designed to handle all situations where uncertainty is found. Since some uncertainty is present in most aspects of life, it may be argued that Bayesian Statistics should be appreciated and used by everyone; it is the logic of contemporary society and science. According to [Rupp04], whether to apply Bayesian methodology is no longer under discussion; the question is when it has to be applied.


    Bayesian methods have matured and improved in several ways during the last fifteen years. They are becoming increasingly attractive to researchers, and successful applications of Bayesian data analysis have appeared in many different fields, including Actuarial Science, Biometrics, Finance, Market Research, Marketing, Medicine, Engineering and Social Science. It is not only that the Bayesian approach produces appropriate answers to many current important problems; there is also an evident need for it, given the inapplicability of conventional statistics to many of them.


    Thus, the main characteristic offered by Bayesian data analysis is the possibility of incorporating the researcher's knowledge about the problem to be handled. The more precise that prior knowledge is, the better and more reliable the results obtained. However, Bayesian Statistics was held back until the mid-1990s by its computational complexity. Since then it has expanded greatly, favoured by the development and improvement of different computational methods in this field, such as Markov chain Monte Carlo.


    This methodology has proved extremely useful in its application to regression models, which are widely accepted. Let us remember that the general purpose of regression analysis is to learn more about the relationship between several independent or predictor variables and a dependent or criterion variable. The Bayesian methodology lets the researcher incorporate his or her knowledge into the analysis, improving the results, since they no longer depend only on the sampled data.


    On the other hand, datasets are increasingly so large that they must be summarized in some fashion, so that the resulting summary dataset is of a more manageable size while still retaining as much of the knowledge inherent in the entire dataset as possible. One consequence of this situation is that data may no longer be formatted as single values, as is the case for classical data, but rather may be represented by lists, intervals, distributions, and the like. These summarized data are examples of symbolic data. This kind of data also lets us better represent the knowledge and beliefs we hold in our minds, which are limited and hard to extract with classical Statistics. According to [Bill02], this responds to the current need to change from a Statistics of data in the past century to a Statistics of knowledge in the twenty-first century.


    Market and demand requirements are increasing continuously over time. This implies a need for better and more accurate methods to forecast new situations and to control different quantities with the minimum error, in order to supply better products and to obtain higher incomes, scientific advances and better results.


    Dealing with this outlook, this project is intended to respond to those requirements by providing wide and exhaustive documentation about some of the most used and advanced current techniques, including Bayesian data analysis, regression models and symbolic data. Different examples related to the Spanish Continuous Stock Market are explained throughout this text, making clear the advantages of employing the described methods. Likewise, a software tool with a user-friendly graphical interface has been developed to put all the acquired knowledge into practice.


    Therefore, this is a project that combines the most recent techniques with major future implications in theoretical issues, such as Bayesian regression applied to interval-valued data, with a technological part dealing with the problem of interconnecting two software programs: one used to show the graphical user interface and the other employed to perform the computations.


    Regarding a more personal motivation, several factors were taken into consideration by the author when accepting this project:

    • A great challenge: it is an ambitious project with high technical complexity in both its theoretical and its technological basis. This represents a very good letter of introduction for entering the labour market.

    • Good timing: this project was designed to be finished before June 2007, which means being able to finish the degree in June and to join the labour market in September.

    • Some very interesting issues: on the one hand, it deals with the ever-present need to forecast and model observations and situations in order to get the best possible results. On the other hand, it focuses on the Stock Market, which matches my personal hobbies.

    • A new programming language: the possibility of learning in depth a new and relatively recent programming language, such as R, was an extra motivating factor.

    • The project director: Carlos Maté is considered a demanding and very competent director by the students of the university.

    • A research scholarship: the possibility of being in the Industrial Organization department of the University, learning from people such as the director mentioned above and other highly recognized professors, was a great factor.






1.2 Objectives

This project pursues the following aims.

    • To provide wide and rigorous documentation about the following issues: Bayesian data analysis, regression models and symbolic data. Building on this, documentation about Bayesian regression will be developed, as well as the software tool designed.

    • To build a software tool to fit Bayesian regression models to interval-valued data, finding out the most efficient way to design the graphical user interface, which must be as user-friendly as possible.

    • To find out, from the tests carried out with the application, the most efficient way to offer the system to future clients.

    • To design a survey to measure the quality of the tool and users’ satisfaction.

    • To explore the possibility of writing an article for a scientific journal.


1.3 Methodology

As the title of the project indicates, the ultimate purpose is the development of an application aimed at stock markets and based on a Bayesian regression system; therefore, some previous knowledge is required.


    The first stage is familiarization with Bayesian data analysis, regression models applied under the Bayesian methodology, and symbolic data.


    Within this phase, Bayesian data analysis will be studied first, trying to synthesize it and extract its most important elements. Special attention will be given to posterior simulation and computational algorithms. Then regression models will be treated, quickly reviewing the classical approach before going deeper into the different Bayesian regression models, applying a great part of what was explained in the Bayesian methodology. Finally, this first stage will be completed with the application to symbolic data, paying special attention to interval-valued data.


    The second stage concerns the development of the software application, employing an incremental methodology of programming and testing iterative prototypes. This methodology has been considered the most suitable for this project, since it will let us introduce successive models into the application.


    The following figure shows the structure of the work packages the project is divided into:




                                 Figure 1.1: Project Work Packages




Chapter 2

Bayesian Data Analysis

2.1     What is Bayesian Data Analysis?

Statistics can be defined as the discipline that provides us with a methodology to collect, to organize,
to summarize and to analyze a set of data.


    Regarding data analysis, it can be divided into two approaches: exploratory data analysis and confirmatory data analysis. The former is used to represent, describe and analyze a set of data through simple methods in the first stages of statistical analysis. The latter is applied to make inferences from data, based on probability models.


    In the same way, confirmatory data analysis is divided into two branches depending on the adopted approach. The first one, known as frequentist, makes inferences from sampled data through classical methods. The second one, known as Bayesian, goes further in the analysis and adds to those data the prior knowledge that the researcher has about the problem at hand. Since it is not worthwhile to explain the frequentist approach in full here, a more extensive review of the classical methods related to it can be found in [Mont02].




                                         
                          Data Analysis
                              • Exploratory
                              • Confirmatory
                                  - Frequentist
                                  - Bayesian




    As far as Bayesian analysis is concerned, and according to [Gelm04], the process can be divided into the following three steps:

    • To set up a full probability model, through a joint probability distribution for all observable and
      unobservable quantities in a problem.

    • To condition on observed data, obtaining the posterior distribution.

    • Finally, to evaluate the fit of the model and the implications of the resulting posterior distribu-
      tion.

    The joint probability distribution f(θ, y), where θ denotes the parameter of the model (or the vector of parameters when there are several; all the expressions below hold unchanged in that case), is obtained by means of

$$f(\theta, y) = f(y \mid \theta)\, f(\theta) \tag{2.1}$$

where y is the set of sampled data. This distribution is thus the product of two densities, referred to as the sampling distribution f(y|θ) and the prior distribution f(θ).


    The sampling distribution, as its name suggests, is the probability model that the researcher assigns to the statistic (or set of statistics) to be studied after the data have been observed. Here an important problem stands out in relation to the parametric approach: the probability model that the researcher chooses may not be adequate. The nonparametric approach overcomes this inconvenience, as will be seen later.


    When y is considered fixed, so that the distribution is a function of θ, the sampling distribution is called the likelihood function and obeys the likelihood principle, which states that, for a given sample of data, any two probability models f(y|θ) with the same likelihood function yield the same inference for θ.


    The prior distribution does not depend upon the data. Accordingly, it contains the information and the knowledge that the researcher has about the situation or problem to be solved. When there is no previous significant population from which the engineer can take his knowledge, that is, when the researcher has no prior information about the problem, a non-informative prior distribution must be used in the analysis in order to let the data speak for themselves. Hence, it is assumed that the prior knowledge will have very little importance in the results. But most non-informative priors are "improper", in that they do not integrate to 1, and this fact can cause problems; in these cases it is necessary to make sure that the posterior distribution is proper. Another possibility is to use an informative prior distribution but with an insignificant weight (around zero) associated with it.


    Though the prior distribution can take any form, it is common to choose particular classes of priors that make computation and interpretation easier: the conjugate priors. A conjugate prior distribution is one which, when combined with the likelihood function, gives a posterior distribution that falls in the same class of distributions as the prior. Furthermore, according to [Koop03], a natural conjugate prior has the additional property of having the same functional form as the likelihood. But it is not always possible to find this kind of distribution, and the researcher has to manage many distributions in order to express his prior knowledge about the problem. This is another handicap that the nonparametric approach reduces.


    In relation to the prior, what distribution should be chosen? There are three different points of
view corresponding to different styles of Bayesians:

    • Classical Bayesians consider that the prior is a necessary evil and priors that interject the least
        information possible should be chosen.

    • Modern parametric Bayesians consider that the prior is a useful convenience and that priors with desirable properties, such as conjugacy, should be chosen. They remark that, given a distributional choice, prior hyperparameters that interject the least information possible should be chosen.

    • Subjective Bayesians give essential importance to the prior, in the sense that they consider it a summary of old beliefs. So prior distributions based on previous knowledge (either the results of earlier studies or non-scientific opinion) should be chosen.

    Returning to the Bayesian data analysis process, simply conditioning on the observed data y and applying Bayes' Theorem, the posterior distribution f(θ|y) yields:

$$f(\theta \mid y) = \frac{f(\theta, y)}{f(y)} = \frac{f(\theta)\, f(y \mid \theta)}{f(y)} \tag{2.2}$$

where

$$f(y) = \int_{0}^{\infty} f(\theta)\, f(y \mid \theta)\, d\theta \tag{2.3}$$

(a multiple integral over all components of the parameter vector in the multiparameter case) is known as the prior predictive distribution, since it is not conditional upon a previous observation of the process and is applied to an observable quantity.


    An equivalent form of the posterior distribution displayed above omits the prior predictive distri-
bution, since it does not involve θ (resp. θ) and the interest lies in learning about θ (resp. θ).
So, with fixed y, it can be said that the posterior distribution is proportional to the joint probability
distribution f(θ, y).


    Once the posterior distribution is calculated, some kind of summary measure will be required to
estimate the uncertainty about the parameter θ (resp. θ). This is due to the fact that the posterior
distribution is a high-dimensional object whose direct use is not practical. The measure that
summarizes the posterior distribution can be the posterior mean, mode, median or variance,
among others. Its choice will depend on the requirements of the problem. So the posterior dis-
tribution has great importance, since it lets the researcher manage the uncertainty about θ (resp. θ)
and provides him with information about it (resp. them), taking into account both his prior knowledge
and the data collected from sampling on that parameter.


    According to [Maté06], it is not difficult to deduce that posterior inference will coincide with the non-
Bayesian one as long as the estimate which the researcher gives to the parameter θ (resp. θ) is the
same as the one resulting from the sampling.


    Once the data y have been observed, a new unknown observable quantity ỹ can be predicted for
the same process through the posterior predictive distribution, namely f(ỹ|y):


            f(ỹ|y) = ∫ f(ỹ, θ|y) dθ = ∫ f(ỹ|θ, y)f(θ|y) dθ = ∫ f(ỹ|θ)f(θ|y) dθ            (2.4)

    To sum up, the basic idea is to update the prior distribution f (θ) through Bayes’ theorem by
observing the data y in order to get a posterior distribution f (θ|y). Then a summary measure or a
prediction for new data can be obtained from f (θ|y). Table 2.1 reflects what has been said.








   Distribution    Expression             Information Required                        Result

   Likelihood      f(y|θ)                 Data distribution                           f(y|θ)
   Prior           f(θ)                   Researcher's knowledge of the parameter     f(θ)
   Joint           f(y|θ)f(θ)             Likelihood and prior distributions          f(θ, y)
   Posterior       f(θ)f(y|θ)             Prior and joint distributions               f(θ|y)
   Predictive      ∫ f(ỹ|θ)f(θ|y) dθ      New data and posterior distributions        f(ỹ|y)


                          Table 2.1: Distributions in Bayesian Data Analysis



2.2 Bayesian Analysis for Normal and other distributions

2.2.1 Univariate Normal distribution

The basic model to be discussed concerns an observable variable y, normally distributed with mean µ
and variance σ²:


                                        y|µ, σ² ∼ N(µ, σ²)                                 (2.5)

    As can be seen in Appendix A, the likelihood function for a single observation is

                    f(y|µ, σ²) ∝ (σ²)^(−1/2) exp(−(y − µ)²/(2σ²))                          (2.6)

    This means that the likelihood function is proportional to a Normal distribution, omitting those
terms that are constant.


    Now let us consider that we have n independent observations y1, y2, . . . , yn. According to the previ-
ous section, the parameters to be estimated θ are µ and σ²:







                                             θ = (θ1 , θ2 ) = (µ, σ 2 )                              (2.7)

    A full probability model must be set up through a joint probability distribution:


                               f (θ, (y1 , y2 , . . . , yn )) = f (θ, y) = f (y|θ)f (θ)              (2.8)

    The likelihood function for a sample of n iid observations in this case is

            f(y|θ) = f(y|µ, σ²) ∝ (σ²)^(−n/2) exp(−(1/(2σ²)) ∑ᵢ₌₁ⁿ (yᵢ − µ)²)              (2.9)

    As was recommended previously, a conjugate prior will be chosen; in fact, it will be a natural
conjugate prior. According to [Gelm04], this likelihood function suggests a conjugate prior distribu-
tion of the form


                            f(θ) = f(µ, σ²) = f(µ|σ²)f(σ²)                                 (2.10)

where the marginal distribution of σ² is the Scaled Inverse-χ² and the conditional distribution of µ
given σ² is Normal (details about these distributions in Appendix A):


                                    µ|σ² ∼ N(µ0, σ²V0)                                     (2.11)
                                    σ² ∼ Inv-χ²(ν0, s0²)                                   (2.12)

    So the joint prior distribution is:


            f(θ) = f(µ, σ²) = f(µ|σ²)f(σ²) ∝ N-Inv-χ²(µ0, s0²V0; ν0, s0²)                  (2.13)

    Its four parameters can be identified as the location and scale of µ and the degrees of freedom and
scale of σ², respectively.


    As a natural conjugate prior was employed, the posterior joint distribution will have the same
form as the prior. So, conditioning on the data, and according to Bayes' Theorem, we have:


        f(θ|y) = f(µ, σ²|y) ∝ f(y|µ, σ²)f(µ, σ²) ∝ N-Inv-χ²(µ1, s1²V1; ν1, s1²)            (2.14)

where it can be shown that





                    µ1 = (V0⁻¹ + n)⁻¹ (V0⁻¹µ0 + n ȳ)                                        (2.15)
                    V1 = (V0⁻¹ + n)⁻¹                                                       (2.16)
                    ν1 = ν0 + n                                                             (2.17)
                    ν1 s1² = ν0 s0² + (n − 1)s² + (V0⁻¹ n / (V0⁻¹ + n)) (ȳ − µ0)²           (2.18)

    All these formulae show that Bayesian inference combines prior and sample information.


    The first formula means that the posterior mean µ1 is a weighted mean of the prior mean µ0 and
the sample mean ȳ, divided by the sum of their respective weights, which are V0⁻¹ and the
sample size n.


    The second formula gives the posterior scale of the mean, which can be seen as a com-
promise between the sample size and the weight given to the prior mean.


    The third formula indicates that the degrees of freedom of the posterior variance are the sum of the prior
degrees of freedom and the sample size. That is, the prior degrees of freedom can be understood as a
fictitious sample size on which the expert's prior information is based.


    The last formula expresses the posterior sum of squared errors as a combination of the prior and empirical
sums of squared errors plus a term that measures the conflict between prior and sample information.


    A more detailed explanation of this last step can be found in [Gelm04], [Koop03] or [Cong06].


    It is obvious that the marginal posterior distributions are:



                                µ|σ², y ∼ N(µ1, σ²V1)                                       (2.19)
                                σ²|y ∼ Inv-χ²(ν1, s1²)                                      (2.20)

    If we integrate out σ², the marginal for µ will be a t-distribution (see Appendix A for details):


                                µ|y ∼ t_ν1(µ1, s1²V1)                                       (2.21)




    Let us see an application to the Spanish stock market. Let us suppose that the monthly close
values associated with the Ibex 35 are normally distributed. If we take the values at which the Span-
ish index closed during the first two weeks of January 2006, it can be shown that the mean was
10893.29 and the standard deviation was 61.66. So the non-Bayesian approach would infer a
Normal distribution with the previous mean and standard deviation. Suppose that we had asked
an analyst about the Ibex 35 evolution in January; he would have affirmed strongly that it would
decrease slightly, that the mean close value at the end of the month would be around 10870 and, hence,
that the standard deviation would be higher, around 100. Then, according to the previous formulas, the
posterior parameters would be



            µ1 = (100 + 10)⁻¹ (100 × 10870 + 10 × 10893.29) = 10872.12
            V1 = (100 + 10)⁻¹ = 0.0091
            ν1 = 100 + 10 = 110
            s1² = (100 × 100² + 9 × 61.66² + (100 × 10/110)(10893.29 − 10870)²)/110 = 9446.8,
            so s1 ≈ 97.2

    This means that there is a difference of almost 20 points between the Bayesian and the
non-Bayesian estimates of the mean close value of January. Once January had passed, we
could compare both results and note that the Bayesian estimates were closer to the
mean close value and standard deviation finally observed: 10871.2 and 112.44. In Figure 2.1 it can be seen that
the blue line representing the Bayesian estimation is closer to the cyan line representing the actual
mean close value than the red line representing the frequentist estimation.


2.2.2 Multivariate Normal distribution

Now, let us consider that we have an observable vector y of d components with the multivariate
Normal distribution:


                                        y ∼ N(µ, Σ)                                         (2.22)

where the first parameter is the mean column vector and the second one is the variance-covariance
matrix.


    Extending what was said above to the multivariate case, we have:





    [Figure 2.1: Univariate Normal Example — density estimates under the frequentist approach (red)
    and the Bayesian approach (blue), together with the real mean close value in January (cyan).]




            f(y|µ, Σ) ∝ |Σ|^(−1/2) exp(−(1/2)(y − µ)′ Σ⁻¹ (y − µ))                          (2.23)

    And for n iid observations:

    f(y1, y2, . . . , yn|µ, Σ) ∝ |Σ|^(−n/2) exp(−(1/2) ∑ᵢ₌₁ⁿ (yᵢ − µ)′ Σ⁻¹ (yᵢ − µ))        (2.24)

    A multivariate generalization of the Scaled Inverse-χ² is the Inverse Wishart distribution (see
details in Appendix A), so the prior joint distribution is

            f(θ) = f(µ, Σ) ∝ N-Inv-Wishart(µ0, Λ0/k0; ν0, Λ0)                               (2.25)

due to the fact that


                                µ|Σ ∼ N(µ0, Σ/k0)                                           (2.26)
                                Σ ∼ Inv-Wishart(ν0, Λ0⁻¹)                                   (2.27)








                              Univariate Normal                        Multivariate Normal

 Expression                   y ∼ N(µ, σ²)                             y ∼ N(µ, Σ)

 Parameters to estimate       µ, σ²                                    µ, Σ

 Prior distributions          µ|σ² ∼ N(µ0, σ0²/k0)                     µ|Σ ∼ N(µ0, Σ/k0)
                              σ² ∼ Inv-χ²(ν0, σ0²)                     Σ ∼ Inv-Wishart(ν0, Λ0⁻¹)
                              µ, σ² ∼ N-Inv-χ²(µ0, σ0²/k0; ν0, σ0²)    µ, Σ ∼ N-Inv-Wishart(µ0, Λ0/k0; ν0, Λ0)

 Posterior distributions      µ|σ², y ∼ N(µ1, σ1²/k1)                  µ|Σ, y ∼ N(µ1, Σ/k1)
                              σ²|y ∼ Inv-χ²(ν1, σ1²)                   Σ|y ∼ Inv-Wishart(ν1, Λ1⁻¹)
                              µ, σ²|y ∼ N-Inv-χ²(µ1, σ1²/k1; ν1, σ1²)  µ, Σ|y ∼ N-Inv-Wishart(µ1, Λ1/k1; ν1, Λ1)


                    Table 2.2: Comparison between Univariate and Multivariate Normal


    The posterior results are the same as those given for the univariate case, but applying these distri-
butions. Interested readers can find more information in [Gelm04] or [Cong06].


    A summary is shown in Table 2.2 in order to collect the most important ideas.


        2.2.3 Other distributions

As has just been done with the Normal distribution, a Bayesian analysis for other distributions could
be carried out. For instance, the exponential distribution is commonly used in reliability analysis. Because
this project will deal with the Normal distribution for the likelihood, the analysis with other distributions
will not be explained in detail. Table 2.3 shows the conjugate prior and posterior distributions




for other likelihood distributions. More details can be found in [Cong06], [Gelm04], or [Rossi06].



   Likelihood       Parameter    Conjugate    Prior Hyperparameters    Posterior Hyperparameters

   Bin(y|n, θ)      θ            Beta         α, β                     α + y, β + n − y
   P(y|θ)           θ            Gamma        α, β                     α + nȳ, β + n
   Exp(y|θ)         θ            Gamma        α, β                     α + 1, β + y
   Geo(y|θ)         θ            Beta         α, β                     α + 1, β + y


                 Table 2.3: Conjugate distributions for other likelihood distributions
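
    As an illustration of the first row of Table 2.3, a minimal Python sketch of the Beta-Binomial
update follows; the prior values and data are hypothetical:

    def beta_binomial_update(alpha, beta, y, n):
        # Posterior hyperparameters for a Beta(alpha, beta) prior
        # combined with a Bin(y | n, theta) likelihood
        return alpha + y, beta + n - y

    # Hypothetical data: a vague Beta(1, 1) prior and y = 7 successes in n = 20 trials
    a1, b1 = beta_binomial_update(1.0, 1.0, y=7, n=20)
    print(a1, b1)              # Beta(8, 14)
    print(a1 / (a1 + b1))      # posterior mean of theta, ~0.36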




2.3 Hierarchical Models

Hierarchical data arise when the observations are structured in groups or related to one another. When
this occurs, standard techniques either assume that these groups belong to entirely different populations
or ignore the aggregate information entirely.


    Hierarchical models provide a way of pooling the information for the disparate groups without
assuming that they belong to precisely the same population.


    Suppose we have collected data about some random variable Y from m different populations with
n observations for each population.


    Let yij represent observation j from population i. Now suppose yij ∼ f(θi), where θi is a vector
of parameters for population i. Furthermore, θi ∼ f(Θ), where Θ may also be a vector. Up to this
point, we have only rewritten what was said previously.






    Now let us extend the model, assume that the parameters Θ that govern the distribution
of the θ's are themselves random variables, and assign a prior distribution to these variables as well:


                                        Θ ∼ f(ψ)                                            (2.28)

where f(ψ) is called the hyperprior. The vector parameter ψ of the hyperprior may be "known" and
represent our prior beliefs about Θ or, in theory, we can also assign a probability distribution to
these quantities as well, and proceed to another layer of hierarchy.


    According to [Gelm04], the idea of exchangeability will be used to create a joint probability
distribution model for all the parameters θ. A formal definition of exchangeability is:
    "The parameters θ1, θ2, . . . , θn are exchangeable in their joint distribution if f(θ1, θ2, . . . , θn) is
invariant to permutations of the indexes 1, 2, . . . , n".


    This means that if no information other than the data is available to distinguish any of the θi from
any of the others, and no ordering of the parameters can be made, one must assume symmetry among
the parameters in the prior distribution. So we can treat the parameters for each sub-population as
exchangeable units. This can be formulated by:


                        f(θ1, θ2, . . . , θn|Θ) = ∏ᵢ₌₁ⁿ f(θi|Θ)                             (2.29)

    The joint prior distribution is now:


                f(θ1, θ2, . . . , θn, Θ) = f(θ1, θ2, . . . , θn|Θ) f(Θ)                     (2.30)

    And conditioning on the data, it yields:


        f(θ1, θ2, . . . , θn, Θ|y) ∝ f(θ1, θ2, . . . , θn, Θ) f(y|θ1, θ2, . . . , θn, Θ)    (2.31)

    Perhaps the most important point in practice is that non-hierarchical models are usually inappro-
priate for hierarchical data, while non-hierarchical data can still be modelled with a hierarchical
structure by assigning concrete values to the hyperprior parameters.


    This kind of model will be used in Bayesian regression models with autocorrelated errors, as
will be seen in the following chapters. A small simulation sketch of the hierarchy is given below.
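
    A minimal Python sketch of this two-level structure may help fix ideas; it merely simulates data
from the hierarchy, with hypothetical values for the hyperparameters:

    import random

    Theta, tau = 0.0, 1.0     # hyperparameters governing the theta_i (hypothetical)
    sigma = 0.5               # observation-level standard deviation (hypothetical)
    m, n = 5, 20              # m populations, n observations each

    data = []
    for i in range(m):
        theta_i = random.gauss(Theta, tau)             # theta_i ~ f(Theta)
        data.append([random.gauss(theta_i, sigma)      # y_ij ~ f(theta_i)
                     for _ in range(n)])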





    For more details about Bayesian hierarchical models, the reader is referred to [Cong06], [Gelm04]
and [Rossi06].




2.4 Nonparametric Bayesian

To overcome the limitations that have been mentioned throughout this chapter, the nonparametric
approach manages to reduce the restrictions of the parametric approach.
This kind of analysis can be performed through the so-called Dirichlet Process, which allows us to
express in a simple way a prior distribution over F, where F is the distri-
bution function of the variable under study. This process has a parameter, called α, a measure which,
once normalized, yields a probability distribution.


    According to [Maté06], a Dirichlet Process for F(t) requires knowing:

    • A prior proposal for F(t), denoted F0(t), corresponding to the distribution function that encodes
      the prior knowledge the engineer has; it is given by

                                        F0(t) = α(t)/M                                      (2.32)

    • A measure of the confidence in the prior proposal, denoted by M, whose values can
      vary between 0 and ∞, depending on whether the confidence is placed entirely in the data or in the
      prior proposal, respectively.



    It can be demonstrated that the posterior estimate of F(t), F̂n(t), after sampling n data,
is given by

                            F̂n(t) = pn F0(t) + (1 − pn) Fn(t)                              (2.33)

where Fn(t) is the empirical distribution function and pn = M/(M + n).
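
    A minimal Python sketch of equation (2.33), with a hypothetical prior proposal F0 and confidence
M, is:

    def dp_posterior_cdf(t, data, F0, M):
        # Equation (2.33): mixture of the prior proposal F0 and the empirical CDF Fn
        n = len(data)
        Fn = sum(1 for x in data if x <= t) / n
        pn = M / (M + n)
        return pn * F0(t) + (1 - pn) * Fn

    # Hypothetical usage: standard uniform prior proposal F0(t) = t and confidence M = 5
    sample = [0.1, 0.4, 0.42, 0.7, 0.9, 0.95]
    print(dp_posterior_cdf(0.5, sample, F0=lambda t: t, M=5.0))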


    More detailed information about the nonparametric approach and how Dirichlet processes are
used can be found in [Mull04] or [Gosh03].





    With this approach, not only is the parametric limitation concerning the probability
model of the variable under study avoided, since no distributional hypothesis is required, but it also allows us to
give a quantified importance to the engineer's prior knowledge, depending on the
confidence in the certainty of that knowledge.




Chapter 3

Posterior Simulation

3.1 Introduction

A practical problem with Bayesian inference is the difficulty of summarizing realistically complex
posterior distributions. In most practical problems, posterior densities will not take the form of any
well-known and understood density, so summary statistics, such as the posterior mean and variance of
parameters of interest, will not be analytically available. It is at this point where the importance of
Bayesian computation arises and computational tools are required to gain meaningful inference
from the posterior distribution. Its importance is such that the computing revolution of the last 20
years has led to a blossoming of Bayesian methods in many fields such as Econometrics, Ecology or
Health.


    In this regard, the most important simulation methods are the Markov chain Monte Carlo
methods (MCMC). MCMC methods date from the original work of [Metr53], who were interested
in methods for the efficient simulation of the energy levels of atoms in a crystalline structure. The
original idea was subsequently generalized by [Hast70], but its true potential was not fully realized
within the statistical literature until [Gelf90] demonstrated its application to the estimation of inte-
grals commonly occurring in the context of Bayesian statistical inference.


    As [Berg05] points out, the underlying principle is simple: if one wishes to sample randomly from
a specific probability distribution, then design a Markov chain whose long-time equilibrium is that
distribution, write a computer program to simulate the Markov chain, run it for a time long enough
to be confident that approximate equilibrium has been attained, then record the state of the Markov




chain as an approximate draw from equilibrium.


    The technique has been developed intensively, in different fields and with rather different emphases:
in the computer science community concerned with the study of random algorithms (where the em-
phasis is on whether the resulting algorithm scales well with increasing size of the problem), in the
spatial statistics community (where one is interested in understanding what kinds of patterns arise
from complex stochastic models), and also in the applied statistics community (where it is applied
largely in Bayesian contexts, enabling researchers to formulate statistical models which would other-
wise be resistant to effective statistical analyses).


    The development of the theoretical work also benefits the development of statistical applications.
The MCMC simulation techniques have been applied to develop practical statistical inferences for
almost all problems in (bio) statistics, for example, the problems in longitudinal data analysis, im-
age analysis, genetics, contagious disease epidemics, random spatial pattern, and financial statistical
models such as GARCH and stochastic volatility.


    The simplicity of the underlying principle of MCMC is a major reason for its success. However,
a substantial complication arises as the underlying target problem becomes more complex; namely,
how long should one run the Markov chain so as to ensure that it is close to equilibrium? According to
[Gelm04], n = 100 independent samples should be enough for reasonable posterior summaries,
but in some cases more samples are needed to ensure more accuracy.




3.2 Markov chains

The essential theory required to develop Monte Carlo methods based on Markov chains is pre-
sented here. The most fundamental result is that certain Markov chains converge to a unique invariant
distribution, and can be used to estimate expectations with respect to this distribution. But in order to
reach this conclusion, some concepts need to be defined first.


    A Markov chain is a series of random variables, X0, . . . , Xn, also called a stochastic process, in
which only the value of Xn−1 influences the distribution of Xn. Formally:


              P (Xn = xn |X0 = x0 , . . . , Xn−1 = xn−1 ) = P (Xn = xn |Xn−1 = xn−1 )              (3.1)



where the Xn−1 have a common range called the state space of the Markov chain.


    The common language used to refer to the different situations in which a Markov chain can be found is
the following. If Xn = i, it is said that the chain is in state i at step n or that it has the value
i at step n. This language confers on the chain a certain dynamic view, which is corroborated by the
main tool used to study it: the transition probabilities P(Xn+1 = j|Xn = i), which are represented by the
transition matrix P = (Pij) with Pij = P(Xn+1 = j|Xn = i). This shows the probability
of moving from state i to state j.


    Due to the fact that in the most interesting applications Markov chains are homogeneous, the transi-
tion matrix can be defined from the initial probability, P0 = P(X1 = j|X0 = i). In this regard, a
Markov chain Xt is homogeneous if P(Xn+1 = j|Xn = i) = P(X1 = j|X0 = i) for all n, i, j.


    Furthermore, using the Chapman-Kolmogorov equation, it can be shown that, given the transition
matrices P and, for step n, Pn of a homogeneous Markov chain, then Pn = Pⁿ.


    On the other hand, we will see the concepts of invariant or stationary distribution, ergodicity and
irreducibility, which are indispensable to reach the main result. It will be assumed that Xt is a ho-
mogeneous Markov chain.


    Then, a vector π is an invariant distribution of the chain Xt if it satisfies:

   a) πj ≥ 0 with Σj πj = 1.

   b) π = πP.

    That is, a stationary distribution over the states of a Markov chain is one that persists forever once
it is reached.


    The concept of an ergodic state requires making other definitions clear, such as recurrence and aperi-
odicity:

    • The state i is recurrent if P(Xn = i for some n ≥ 1 | X0 = i) = 1. Otherwise, it is transient.
       Moreover, i will be positive recurrent if the expected (average) return time is finite, and null
       recurrent if it is not.


    • The period of a state i, denoted by di, is defined as di = gcd(n : [Pn]ii > 0). The state i is
       aperiodic if di = 1, or periodic if it is greater.



    Then a state is ergodic if it is positive recurrent and aperiodic. The last concept to define is
irreducibility. A set of states C ⊆ S, where S is the set of all possible states, is irreducible if for all
i, j ∈ C:

    • i and j have the same period.

    • i is transient if and only if j is transient.

    • i is null recurrent if and only if j is null recurrent.

    Now, having all these concepts in mind, we can know whether a Markov chain has a stationary distribu-
tion with the next lemma:

Lemma 3.2.1. Let Xt be a homogeneous and irreducible Markov chain. The chain will have exactly one
stationary distribution if, and only if, all the states are positive recurrent. In that case, it will have
entries given by πi = µi⁻¹, where µi denotes the expected return time of state i.

    The relation with the long-time behaviour is given by this other lemma:

Lemma 3.2.2. Let Xt be a homogeneous, irreducible and aperiodic Markov chain. Then

                        [Pn]ij → 1/µj    for all i, j ∈ S    as n → ∞                       (3.2)


3.3 Monte Carlo Integration

Monte Carlo integration estimates the integral E[g(θ)] by obtaining samples θt, t = 1, . . . , n from
the posterior distribution p(θ|y) and averaging

                                E[g(θ)] ≈ (1/n) ∑ₜ₌₁ⁿ g(θt)                                 (3.3)

where the function g(θ) represents the function of interest to estimate. Note that the draws
θt, t = 1, . . . , n need not be independent: it is enough that they come from a Markov chain whose
stationary distribution is p(θ|y).
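
    A minimal Python sketch of this estimator, using a target with a known answer so the
approximation can be checked, is:

    import random

    def monte_carlo(g, draws):
        # Equation (3.3): average g over the posterior draws
        return sum(g(t) for t in draws) / len(draws)

    # Hypothetical check: with theta ~ N(0, 1), E[theta^2] = 1 exactly
    draws = [random.gauss(0.0, 1.0) for _ in range(100000)]
    print(monte_carlo(lambda t: t ** 2, draws))   # ~= 1.0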






3.4 Gibbs sampler

In many models, it is not easy to draw directly from the posterior distribution p(θ|y). However, if the
parameter θ is partitioned into p blocks as θ = (θ1, . . . , θp), then the
full conditional posterior distributions, p(θ1|y, θ2, . . . , θp), . . . , p(θp|y, θ1, . . . , θp−1), may be sim-
ple to draw from. For instance, in the Normal linear regression model
it is convenient to set p = 2, with θ1 = β and θ2 = σ², and the full conditional distributions would be
p(θ1 = β|y, θ2 = σ²) and p(θ2 = σ²|y, θ1 = β), which are very useful in the Normal independent
model which will be explained later.


    The Gibbs sampler is defined by iterative sampling from each of those p conditional distributions:

   1. Set a starting value, θ⁰ = (θ2⁰, . . . , θp⁰).

   2. Take random draws:
      - θ1¹ from p(θ1|y, θ2⁰, . . . , θp⁰)
      - θ2¹ from p(θ2|y, θ1¹, θ3⁰, . . . , θp⁰)
      ...
      - θp¹ from p(θp|y, θ1¹, . . . , θp−1¹)

   3. Repeat step 2 as necessary.

   4. Discard those draws affected by the starting value θ⁰ = (θ2⁰, . . . , θp⁰) (the burn-in), and
      average the rest of the draws applying Monte Carlo integration.



    For instance, in the Normal regression model we would have:

   1. Set a starting value, θ⁰ = (θ2⁰ = (σ²)⁰).

   2. Take random draws:
      - θ1¹ = β¹ from p(θ1 = β|y, θ2⁰ = (σ²)⁰)
      - θ2¹ = (σ²)¹ from p(θ2 = σ²|y, θ1¹ = β¹)

   3. Repeat step 2 as necessary.

   4. Eliminate the draws affected by the starting value and average the rest of the draws applying
      Monte Carlo integration.



    The dropped values, which are affected by the starting point, are called the burn-in. Generally,
any set of values discarded in an MCMC simulation is called the burn-in. The size of the
burn-in period is the subject of current research in MCMC methods.


    As the state of each draw depends on the state of the previous one, the sequence is a Markov
chain. More detailed information can be found in [Chen00], [Mart01] or [Rossi06].
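
    A minimal Python sketch of a two-block Gibbs sampler may make the steps concrete. It targets a
Normal model with unknown mean and variance under the standard non-informative prior
p(µ, σ²) proportional to 1/σ² (a simpler setting than the project's conjugate prior), so both full
conditionals are standard distributions:

    import random

    def gibbs_normal(y, n_iter=5000, burn_in=500):
        n = len(y)
        ybar = sum(y) / n
        sigma2 = 1.0                                   # step 1: starting value (sigma^2)^0
        draws = []
        for _ in range(n_iter):                        # steps 2-3: cycle the conditionals
            mu = random.gauss(ybar, (sigma2 / n) ** 0.5)            # mu | sigma^2, y
            sse = sum((yi - mu) ** 2 for yi in y)
            sigma2 = 1.0 / random.gammavariate(n / 2.0, 2.0 / sse)  # sigma^2 | mu, y
            draws.append((mu, sigma2))
        return draws[burn_in:]                         # step 4: discard the burn-in

    y = [random.gauss(10.0, 2.0) for _ in range(200)]  # hypothetical data
    draws = gibbs_normal(y)
    print(sum(m for m, _ in draws) / len(draws))       # posterior mean of mu, ~= 10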




3.5 Metropolis-Hastings sampler and its special cases

3.5.1 Metropolis-Hastings sampler

The Metropolis-Hastings method is adequate for simulating models that are not conditionally conjugate.
Furthermore, it can be combined with the Gibbs sampler to simulate posterior distributions where
some of the conditional posterior distributions are easy to sample from and others are not. Like
the algorithms explained above, it is based on formulating a Markov chain, but using a proposal
distribution, q(·|θt), which depends on the current state θt, to generate a new proposed sample θ∗.
This proposal is accepted as the next state with probability given by

                α(θt, θ∗) = min{1, [p(θ∗|y)q(θt|θ∗)] / [p(θt|y)q(θ∗|θt)]}                   (3.4)

    If the point θ∗ is not accepted, then the chain does not move and θt+1 = θt. According to
[Mart01], the steps to follow are:

   1. Initialize the chain to θ0 and set t=0.

   2. Generate a candidate point θ∗ from q(.|θt ).

   3. Generate U from a uniform (0,1) distribution.

   4. If U ≤ α(θt , θ∗ ) then set θt+1 = θ∗ , else set θt+1 = θt .

   5. Set t = t + 1 and repeat steps 2 through 5.

   6. Take the average of the draws g(θ1 ), . . . , g(θn )

    Note that it is not only recommendable but essential that the proposal distribution
q(·|θt) be easy to sample from.
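
    A minimal Python sketch of these steps, using a Normal random-walk proposal (which is
symmetric, so q cancels in the acceptance ratio; this anticipates the special case of Section 3.5.3), is:

    import math, random

    def metropolis_hastings(log_post, theta0, n_iter=10000, step=1.0):
        theta = theta0                                   # step 1: initialize the chain
        draws = []
        for _ in range(n_iter):
            proposal = theta + random.gauss(0.0, step)   # step 2: candidate theta*
            u = random.random()                          # step 3: U ~ Uniform(0, 1)
            log_alpha = log_post(proposal) - log_post(theta)   # symmetric q cancels
            if u < math.exp(min(0.0, log_alpha)):        # step 4: accept or stay
                theta = proposal
            draws.append(theta)                          # step 5: iterate
        return draws                                     # step 6: average as required

    # Hypothetical target: a standard Normal posterior, log p(theta|y) = -theta^2/2 + const
    draws = metropolis_hastings(lambda t: -0.5 * t * t, theta0=0.0)
    print(sum(draws) / len(draws))   # ~= 0.0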





    There are some special cases of this method. The most important ones are briefly explained below. In
addition, it can be shown, according to [Gelm04], that the Gibbs sampler is another special case of
the Metropolis-Hastings algorithm in which the proposed point is always accepted.



3.5.2 Metropolis sampler

This method is a particular case of the Metropolis-Hastings sampler where the proposal distribution
has to be symmetric. That is,


                                          q(θ∗ |θt ) = q(θt |θ∗ )                             (3.5)

    for all θ∗ and θt. Then, the probability of accepting the new point is

                        α(θt, θ∗) = min{1, p(θ = θ∗|y) / p(θ = θt|y)}                       (3.6)
    The same procedure seen in the Metropolis-Hastings sampler has to be followed.


3.5.3 Random-walk sampler

This special case refers to a proposal distribution of the form


                                q(θ∗|θt) = q(|θt − θ∗|)                                     (3.7)

    And the candidate point is θ∗ = θt + z, where z is called the increment random variable from q.
Then, the probability of accepting the new point is

                        α(θt, θ∗) = min{1, p(θ = θ∗|y) / p(θ = θt|y)}                       (3.8)
    The same procedure seen in the Metropolis-Hastings sampler has to be followed.


3.5.4 Independence sampler

The last variation has a proposal distribution such that


                                           q(θ∗ |θt ) = q(θ∗ )                                (3.9)

    So it does not depend on θt . Then, the probability of accepting the new point is






        α(θt, θ∗) = min{1, [p(θ∗|y)q(θt)] / [p(θt|y)q(θ∗)]} = min{1, w(θ∗)/w(θt)}           (3.10)

    where

                                        w(θ) = p(θ|y)/q(θ)                                  (3.11)

    It is important to remark that, for this method to work well, the proposal distribution q should
be very similar to the posterior distribution p(θ|y).


    The same procedure seen in the Metropolis-Hastings sampler has to be followed.


3.6 Importance sampling

Importance sampling is a variance reduction technique that can be used in the Monte Carlo method.
The idea behind this method is that certain values of the input random variables in a simulation have
more impact on the parameter being estimated than others. So instead of taking a simple average,
importance sampling takes a weighted average.


    Let q(θ) be a density from which it is easy to obtain random draws θ⁽ˢ⁾ for s = 1, . . . , S. Then q(θ)
is called the importance function, and the importance sampling estimator can be defined: the function

        ĝS = [∑ₛ₌₁ˢ w(θ⁽ˢ⁾) g(θ⁽ˢ⁾)] / [∑ₛ₌₁ˢ w(θ⁽ˢ⁾)],   where w(θ⁽ˢ⁾) = p(θ = θ⁽ˢ⁾|y) / q(θ = θ⁽ˢ⁾),

converges to E[g(θ)|y] as S → ∞.

    In fact, w(θ⁽ˢ⁾) can be computed as w(θ⁽ˢ⁾) = p∗(θ|y)/q∗(θ), where the starred densities are
only proportional to the original ones.
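
    A minimal Python sketch of this self-normalized estimator, with a hypothetical target and
importance function chosen so the true answer is known, is:

    import math, random

    def importance_sampling(g, log_p, log_q, sampler, S=100000):
        # Weighted average with w = p(theta|y) / q(theta), both densities
        # known only up to normalizing constants
        draws = [sampler() for _ in range(S)]
        weights = [math.exp(log_p(t) - log_q(t)) for t in draws]
        return sum(w * g(t) for w, t in zip(weights, draws)) / sum(weights)

    # Hypothetical check: target N(0, 1), importance function N(0, 2); E[theta^2 | y] = 1
    est = importance_sampling(
        g=lambda t: t ** 2,
        log_p=lambda t: -0.5 * t * t,             # log target, up to a constant
        log_q=lambda t: -t * t / 8.0,             # log importance function, up to a constant
        sampler=lambda: random.gauss(0.0, 2.0))
    print(est)   # ~= 1.0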


    For more information and details about Markov chain Monte Carlo methods and their application,
the reader is referred to [Chen00], [Gilk95], [Berg05] and [Kend05].




Chapter 4

Sensitivity Analysis

4.1 Introduction

There will be many times when the researcher, having selected a model, wants to consider the pos-
sibility of choosing another model, or simply to compare the two. Some tool is then necessary to
help him compare both models and select one of them. This will also be useful for variable
selection in regression models. In this section, Bayesian model comparison is briefly
discussed, highlighting those methods which will be more useful.


    In the Bayesian field, common methods for model comparison are based on the following: sepa-
rate estimation, comparative estimation and simultaneous estimation.


    Comparative estimation is based on distance measures such as entropy distance, and the underly-
ing idea is that the more parsimonious model may be preferred between two models whose distance
between their posterior or posterior predictive distributions is sufficiently small.


    Simultaneous model estimation lets us compare many models at the same time; the main meth-
ods are reversible jump MCMC (RJMCMC) and birth and death MCMC (BDMCMC).


    Separate estimation compares two models which are not necessarily nested, and the most used tools are the
posterior predictive distributions and the posterior probability of the model. Since the methods of
this type are the most widely accepted, we will explain some of them, highlighting the
most important ones.




4.2 Bayes Factor

This is probably the dominant method of Bayesian model testing. It is the analogue of likelihood ratio
tests within the frequentist framework, and the basic intuition is that prior and posterior information
are combined in a ratio that provides evidence in favour of one model specification versus another.


    Let us suppose we have two models to compare, M1 and M2. Let p(M1) and p(M2) be the
prior probabilities of models M1 and M2, respectively, and p(M1|y) and p(M2|y) their posterior
probabilities. Then the Bayes Factor is:

                B(y) = p(y|M1)/p(y|M2) = [p(M1|y)p(M2)] / [p(M2|y)p(M1)]                    (4.1)

    This means that the Bayes Factor prefers the model for which the marginal likelihood of the
data, namely p(y|Mi), is larger. Therefore, the value of the factor gives evidence of the preference
between the two models.
    According to [Jeff61], the following interpretation is suggested:



                            Bayes Factor              Interpretation

                            B(y) < 1/10               Strong evidence for M2
                            1/10 < B(y) < 1/3         Moderate evidence for M2
                            1/3 < B(y) < 1            Weak evidence for M2
                            1 < B(y) < 3              Weak evidence for M1
                            3 < B(y) < 10             Moderate evidence for M1
                            B(y) > 10                 Strong evidence for M1


                                  Table 4.1: Bayes Factor Interpretation



                                                     29
4. Sensitivity Analysis


    The marginal likelihood usually involves an integral which can be analytically evaluated only for
some special cases. So, while Bayes Factors are rather intuitive, they are often quite difficult or even
impossible to calculate from a practical point of view. Because of this, there are other alternatives to
this method.
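
    In conjugate settings, however, the marginal likelihood is available in closed form and the Bayes
Factor can be computed exactly. A minimal Python sketch for a binomial likelihood with two
competing (hypothetical) Beta priors is:

    from math import lgamma, exp

    def log_betafn(a, b):
        # log of the Beta function via log-gamma, to avoid overflow
        return lgamma(a) + lgamma(b) - lgamma(a + b)

    def log_marginal(y, n, alpha, beta):
        # log p(y | M) for a Bin(y | n, theta) likelihood and Beta(alpha, beta) prior,
        # omitting the binomial coefficient, which cancels in the ratio
        return log_betafn(alpha + y, beta + n - y) - log_betafn(alpha, beta)

    y, n = 7, 20                                   # hypothetical data
    B = exp(log_marginal(y, n, 1.0, 1.0)           # M1: uniform Beta(1, 1) prior
            - log_marginal(y, n, 30.0, 10.0))      # M2: prior concentrated near theta = 0.75
    print(B)   # > 10: strong evidence for M1, since y/n = 0.35 contradicts M2's prior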


4.3 Alternative Stats to Bayes Factor
Let θ̂ be the posterior mean of the posterior distribution and let us assume that the Bayes estimate for
the parameters θ is approximately equal to the maximum likelihood estimate. Then the following
statistics, some of which are also used in frequentist statistics, can be useful diagnostics:

    • The Likelihood Ratio, which will always favour the unrestricted model, and where the ratio is:


                    Ratio = −2[log(p(θ̂Restricted|y)) − log(p(θ̂Full|y))]                    (4.2)

       The ratio is distributed as a χ²p, where p is the number of parameters, including the intercept.

    • The Akaike Information Criterion (AIC), where a ratio between AIC1 (AIC for M1) and AIC2
       (AIC for M2) less than 1 indicates that M1 is better. This method does not require the models to be
       nested, though it tends to favour more complicated models. The statistic is:


                                    AIC = −2 log(p(θ̂|y)) + 2p                              (4.3)

       where p is the number of parameters, including the intercept. It usually performs better than the
       previous one.

    • The Bayesian Information Criterion (BIC), which is also known as the Schwarz Criterion (SC),
       Schwarz Information Criterion (SIC) or Schwarz Bayesian Criterion (SBC). As with
       the AIC, this method can be used for non-nested models. The BIC is:


                                    BIC = −2 log(p(θ̂|y)) + p log(n)                        (4.4)

       where p is the number of parameters, including the intercept, and n is the sample size. Given any
       two estimated models, the model with the lower value of BIC is the one to be preferred. Since
       this method promotes model parsimony by penalizing models with increased complexity
       (larger p) relative to the sample size n, it may be preferred to the AIC.



    • The Deviance Information Criterion (DIC), a newer statistic introduced by the devel-
       opers of the WinBugs software, who explain it in detail in [Spie03]. The main and
       most important difference with the previous methods is that this is not an approximation of the
       Bayes Factor. It is a hierarchical-modelling generalization of the AIC and the BIC, and it is
       particularly useful when the posterior distributions have been obtained by simulation. The DIC
       is:

                    DIC = −(4/L) ∑ₗ₌₁ᴸ log(p(y|θˡ)) + 2 log(p(y|θ̂))                         (4.5)

       where θˡ is the draw obtained in iteration l of the posterior simulation. This method also
       penalizes higher-dimensional models, and it may be preferred to the previous ones, mainly in the
       linear models context. (A computational sketch of these criteria follows this list.)
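
    A minimal Python sketch of the three criteria, assuming a user-supplied log-likelihood function,
posterior draws and a posterior point estimate (all hypothetical names), is:

    import math

    def aic(log_lik, theta_hat, p):
        # Equation (4.3), evaluating the log density at the point estimate
        return -2.0 * log_lik(theta_hat) + 2.0 * p

    def bic(log_lik, theta_hat, p, n):
        # Equation (4.4)
        return -2.0 * log_lik(theta_hat) + p * math.log(n)

    def dic(log_lik, draws, theta_hat):
        # Equation (4.5): average over the L posterior draws plus 2 log p(y | theta_hat)
        mean_term = -4.0 / len(draws) * sum(log_lik(t) for t in draws)
        return mean_term + 2.0 * log_lik(theta_hat)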


4.4 Highest Posterior Density Intervals

All the techniques mentioned above typically require the elicitation of informative priors. However,
there may be Bayesians who are interested in doing model comparison with a non-informative prior.
In such a case, there are other techniques which can be used. Since the most common one in regression
analysis is the Highest Posterior Density Interval (HPDI), we will only explain this method and
refer the interested reader to the citations below.


    Before defining the idea of the HPDI, it is necessary to make the concept of a credible set clear. Let us
assume that ω is the region over which the coefficients β are defined. Then, C ⊆ ω is a 100(1 − α)%
credible set with respect to β if:


                                         p(β ∈ C|y) = 1 − α                                        (4.6)

    Since there are commonly numerous credible intervals, it is usual to choose the one with the smallest
area, namely the Highest Posterior Density Interval.


    Formally, a 100(1 − α)% highest posterior density interval for β is a 100(1 − α)% credible inter-
val for β with the property that it has a smaller area than any other 100(1 − α)% credible interval for β.






    This is the Bayesian analogue of confidence intervals within the frequentist framework, but now the
meaning is more in line with common sense.


    More information about all these methods and other variants of the Bayes Factor can be found in
a more detailed way in [Aitk97], [Berg98], [Chen00], [Cong06] or [Koop03].


4.5 Model Comparison Summary

A model comparison summary can be found in Tables 4.2 and 4.3 where the mark symbols mean:

    • * Good

    • ** Better

    • *** Still better

    • **** Probably the best








   Method             Formulae                                                Interpretation                          Mark

   Bayes Factor       B(y) = p(y|M1)/p(y|M2)                                  B(y) < 1/10: strong evidence for M2     *
                                                                              1/10 < B(y) < 1/3: moderate for M2
                                                                              1/3 < B(y) < 1: weak evidence for M2
                                                                              1 < B(y) < 3: weak evidence for M1
                                                                              3 < B(y) < 10: moderate for M1
                                                                              B(y) > 10: strong evidence for M1

   Likelihood Ratio   Ratio = −2[log p(β̂Restricted|y) − log p(β̂Full|y)]      Ratio > χ²p: reject the restricted      *
                                                                              model; Ratio < χ²p: accept it

   AIC                AIC = −2 log p(β̂|y) + 2p                               AIC1/AIC2 < 1: M1 is better than M2     **
                                                                              AIC1/AIC2 > 1: M2 is better than M1


                                      Table 4.2: Sensitivity Summary I




   Method   Formulae                                              Interpretation                          Mark

   BIC      BIC = −2 log p(β̂|y) + p log(n)                       BIC1/BIC2 < 1: M1 is better than M2     ***
                                                                  BIC1/BIC2 > 1: M2 is better than M1

   DIC      DIC = −(4/L) ∑ₗ₌₁ᴸ log p(y|βˡ) + 2 log p(y|β̂)        DIC1/DIC2 < 1: M1 is better than M2     ****
                                                                  DIC1/DIC2 > 1: M2 is better than M1

   HPDI     p(β ∈ C|y) = 1 − α with the smallest area             There is a probability of               ****
                                                                  100(1 − α)% of β being in the
                                                                  region C


                                      Table 4.3: Sensitivity Summary II
Chapter 5

Regression Analysis

5.1 Introduction

Regression analysis is a statistical tool for the investigation of relationships between variables:
it models the relationship between one or more random variables y, called the response variables,
and one or more independent variables x, called the predictors. That is, it allows us to examine
the conditional distribution of y given x, denoted by p(y|β, x), when the n observations (xi, yi) are
exchangeable.


    Applications of regression analysis exist in almost every field. In economics, the dependent vari-
able might be the Ibex 35 index and the independent variables the Dow Jones and FTSE 100 indexes.
In political science, the dependent variable might be a state's level of welfare spending and the inde-
pendent variables measures of public opinion and institutional variables that would cause the state to
have higher or lower levels of welfare spending. In sociology, the dependent variable might be a mea-
sure of the social status of various occupations and the independent variables characteristics of the
occupations (pay, qualifications, etc.). In psychology, the dependent variable might be an individual's
racial tolerance as measured on a standard scale, with indicators of social background as indepen-
dent variables. In education, the dependent variable might be a student's score on an achievement test
and the independent variables characteristics of the student's family, teachers, or school.


    Before explaining Bayesian regression, the classical regression model will be reviewed,
focusing on those parts which are useful for the former.






5.2 Classical Regression Model

The simplest version of this model is the Normal linear model, where the variable y given X has a
Normal distribution whose mean is a linear function of X:


            E(yi|β, X) = β0 + β1 xi1 + · · · + βp xip        for all i = 1, . . . , n.      (5.1)

    Even though the mean of y is a linear function of X, the real and the observed data do not
coincide exactly, and this is due to a random error, namely ε, so the appropriate way to reach a
probabilistic linear model is through


            yi = β0 + β1 xi1 + · · · + βp xip + εi        for all i = 1, . . . , n.         (5.2)

where εi is the random error term, which has a Normal distribution with mean 0 and variance
σ². Due to the fact that the random variable yi is the result of the addition of a constant (the mean)
and a random variable with a Normal distribution, yi follows a Normal distribution:


            yi ∼ N(β0 + β1 xi1 + · · · + βp xip, σ²)      for all i = 1, . . . , n          (5.3)

    When the variance of y given X and β is assumed to be constant over all observations, the model
is called the ordinary linear regression model.


    In matrix notation, the Normal linear model can be written as

                                  Y = Xβ + ε                                                               (5.4)

and

                                  Y ∼ N(Xβ, σ²I)                                                           (5.5)

where Y = (y_1, y_2, . . . , y_n)', X is the n × (p+1) matrix whose i-th row is (1, x_{i1}, . . . , x_{ip}), β = (β0, β1, . . . , βp)', ε = (ε_1, ε_2, . . . , ε_n)' and I is the identity matrix.
It can be shown that the ordinary least squares estimate of β, namely β̂, is

                                  β̂ = (X'X)^{-1} X'Y = (β̂0, β̂1, . . . , β̂p)'                               (5.6)
where

    X'X = \begin{pmatrix}
            n & \sum_{i=1}^{n} x_{i1} & \cdots & \sum_{i=1}^{n} x_{ik} \\
            \sum_{i=1}^{n} x_{i1} & \sum_{i=1}^{n} x_{i1}^2 & \cdots & \sum_{i=1}^{n} x_{i1} x_{ik} \\
            \vdots & \vdots & \ddots & \vdots \\
            \sum_{i=1}^{n} x_{ik} & \sum_{i=1}^{n} x_{ik} x_{i1} & \cdots & \sum_{i=1}^{n} x_{ik}^2
          \end{pmatrix}
    \qquad
    X'Y = \begin{pmatrix} \sum_{i=1}^{n} y_i \\ \sum_{i=1}^{n} x_{i1} y_i \\ \vdots \\ \sum_{i=1}^{n} x_{ik} y_i \end{pmatrix}

    As well, it can be shown that

                                  E(β̂) = β                                                                 (5.7)

    Furthermore, the variances of β̂ are proportional to the elements of the matrix (X'X)^{-1}, denoted by C, which multiplied by the constant σ² gives the covariance matrix. The elements of the diagonal of that matrix are the variances of the estimates β̂_j:

                                  Var(β̂_j) = σ² C_{jj}        for all j = 0, 1, . . . , p.                 (5.8)

where C = (X'X)^{-1}.


    Likewise, the classical estimate of σ² is given in terms of the sum of squared errors, SSE = Σ_{i=1}^{n}(y_i − ŷ_i)², and equals the mean squared error:

                  σ̂² = MSE = SSE/(n − p) = (Y − Xβ̂)'(Y − Xβ̂)/(n − p) = (Y'Y − β̂'X'Y)/(n − p)              (5.9)

where n is the number of observations and p corresponds to the number of parameters β.


    Regarding the individual regression coefficients β_j, it will sometimes be of interest to test hypotheses about them in order to evaluate the potential value of each regressor variable in the model. The statistic to use in these cases is

                                  T_0 = β̂_j / √(σ̂² C_{jj})                                                (5.10)

where C_{jj} is the element of the diagonal of the matrix (X'X)^{-1} corresponding to β̂_j. So the null hypothesis β_j = 0 will be rejected if |T_0| > t_{n−p, α/2}.




    Finally, once the model has been estimated and validated, one of its most important applications consists of making predictions about the response variable Y when a new explanatory vector X* is observed. In this case, a point estimate would be

                                  Ŷ* = X*β̂                                                                (5.11)

and a confidence interval for this future observation will be

                                  Ŷ* ± t_{n−p, α/2} √(σ̂²(1 + X*'(X'X)^{-1}X*))                            (5.12)


where

                                  X* = [x*_1  x*_2  . . .  x*_k]                                           (5.13)

    These results can be found in a more detailed way in [Mont02], [Zamo01] or [Maté95].
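    As a quick numerical illustration of (5.6)-(5.12), the following sketch (ours, not part of the original text; it uses simulated data, and names such as beta_hat are assumptions) computes the OLS estimates, the t statistics of (5.10) and a 95% prediction interval with numpy and scipy:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    # Simulated data: n observations, an intercept plus k regressors
    n, k = 100, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
    y = X @ np.array([1.0, 0.5, -2.0, 2.0]) + rng.normal(scale=5.0, size=n)

    p = X.shape[1]                          # number of parameters beta
    C = np.linalg.inv(X.T @ X)              # C = (X'X)^{-1}
    beta_hat = C @ X.T @ y                  # equation (5.6)

    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - p)    # equation (5.9)

    t0 = beta_hat / np.sqrt(sigma2_hat * np.diag(C))      # equation (5.10)

    # Point prediction and 95% interval, equations (5.11)-(5.12)
    x_star = np.array([1.0, 0.2, -0.1, 0.3])
    y_star = x_star @ beta_hat
    half = stats.t.ppf(0.975, n - p) * np.sqrt(sigma2_hat * (1 + x_star @ C @ x_star))
    print(y_star - half, y_star + half)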


    To better understand all that has been said above, let us see a practical application to the stock markets. Suppose we are interested in investigating the relationship between the Ibex 35 index and the Dow Jones, FTSE 100 and DAX indexes of the previous day. For this purpose, we have the daily points (taken as the mean of the daily maximum and minimum) from January to October 2006, that is, the first ten months of 2006.


    The model to adjust is:

                  IBEX35_t = β1 DowJones_{t−1} + β2 FTSE100_{t−1} + β3 DAX_{t−1} + ε_t

where

                                  ε_t ∼ N(0, σ²)

    The estimates β̂ are calculated according to what was said before, resulting in:

                                  (β̂1, β̂2, β̂3)' = (1.0147, −2.0085, 2.1082)'




    The estimate of the variance σ² is:

                                  σ̂² = 332.18²

    So the model calculated is:

                  IBEX35_t = 1.0147 × DowJones_{t−1} − 2.0085 × FTSE100_{t−1} + 2.1082 × DAX_{t−1} + ε_t

where

                                  ε_t ∼ N(0, 332.18²)

    This indicates that when the Dow Jones or the DAX goes up, the Ibex 35 will tend to increase the next day too. However, when the FTSE 100 rises, the Ibex 35 will tend to decrease the next day.


    If we use this model to predict the value which the Ibex 35 will have on November 1st, when the previous day's Dow Jones, FTSE 100 and DAX values are known, we have:

                  IBEX35_t = 1.0147 × 12067 − 2.0085 × 6155 + 2.1082 × 6287 = 13137
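    The arithmetic can be checked directly (a sketch of ours; with the rounded coefficients printed above the sum comes out at about 13136.3, so the 13137 in the text reflects the unrounded estimates):

    beta_hat = [1.0147, -2.0085, 2.1082]    # rounded estimates from the text
    x_prev = [12067, 6155, 6287]            # Dow Jones, FTSE 100 and DAX values
    print(sum(b * x for b, x in zip(beta_hat, x_prev)))   # approx. 13136.3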

    Finally, a comparison between the multiple and the simple Normal linear regression models is shown in Table 5.1, indicating the different parameters to use in each case. The goal of this comparison is to make clear that simple Normal regression is a particular case of multiple Normal regression in which there is only one regressor variable or predictor.


5.3 The Bayesian Approach

The main difference between the classical and the Bayesian approaches to regression analysis is that the latter treats the parameters as random variables which have a distribution. The aim of the Bayesian approach is to make inferences through the posterior distribution, based on a prior distribution for the parameters β and σ² of the Normal linear model, and to provide a predictive distribution for the model's predictions.


    As was said in the preceding section, and according to [Rossi06], the Normal linear regression model is given by:





                Multiple Normal Linear Regression                     Simple Normal Linear Regression

 Function       y_i = β0 + β1 x_{i1} + · · · + βp x_{ip} + ε_i        y = β0 + β1 x + ε

 Mean           µ_i = β0 + β1 x_{i1} + · · · + βp x_{ip}              µ = β0 + β1 x

 Variance       σ²                                                    σ²

 Model          Y ∼ N(µ, σ²I)                                         Y ∼ N(µ, σ²)

 β̂              β̂ = (X'X)^{-1}X'Y

 E[β̂]           β                                                     β

 Var(β̂)         Var(β̂_j) = σ² C_{jj}                                  Var(β̂0) = σ²(1/n + x̄²/Σ_{i=1}^{n}(x_i − x̄)²)
                                                                      Var(β̂1) = σ²/Σ_{i=1}^{n}(x_i − x̄)²

 σ̂²             σ̂² = (Y'Y − β̂'X'Y)/(n − p)                            σ̂² = Σ_{i=1}^{n}(y_i − ŷ_i)²/(n − 2)

 Prediction     Ŷ_f ± t_{n−p,α/2} √(σ̂²(1 + X_f'(X'X)^{-1}X_f))        Ŷ_f ± t_{n−2,α/2} √(σ̂²(1 + 1/n + (x_f − x̄)²/Σ_{i=1}^{n}(x_i − x̄)²))

 Limitation     Only applies to data in the same range                Only applies to data in the same range
                as the sampled data                                   as the sampled data


                        Table 5.1: Multiple and Simple Regression Comparison




                                  Y = Xβ + ε                                                               (5.14)

where

                                  ε ∼ N(0, σ²I)                                                            (5.15)

    So

                                  Y|X, β, σ² ∼ N(Xβ, σ²I)                                                  (5.16)

    For simplicity of notation, we will not explicitly include X in our conditioning set for the regression model.


    Using the definition of the multivariate Normal density, the likelihood function is obtained:

                  p(Y|β, σ²) = ((σ²)^{−n/2}/(2π)^{n/2}) exp(−(1/(2σ²)) (Y − Xβ)'(Y − Xβ))                  (5.17)
    It will be convenient to write

                                  (Y − Xβ)'(Y − Xβ)                                                        (5.18)

in terms of the ordinary least squares estimators

                                  v = n − p                                                                (5.19)
                                  β̂ = (X'X)^{-1}X'Y                                                        (5.20)
                                  s² = (Y − Xβ̂)'(Y − Xβ̂)/(n − p)                                           (5.21)

    So

                  (Y − Xβ)'(Y − Xβ) = vs² + (β − β̂)'X'X(β − β̂)                                             (5.22)
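    This identity follows by writing Y − Xβ = (Y − Xβ̂) − X(β − β̂) and expanding: the cross term vanishes because X'(Y − Xβ̂) = X'Y − X'X(X'X)^{-1}X'Y = 0, so

                  (Y − Xβ)'(Y − Xβ) = (Y − Xβ̂)'(Y − Xβ̂) + (β − β̂)'X'X(β − β̂) = vs² + (β − β̂)'X'X(β − β̂)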

    Then

                  p(Y|β, σ²) = (1/(2π)^{n/2}) [(σ²)^{−p/2} exp(−(1/(2σ²))(β − β̂)'(X'X)(β − β̂))] [(σ²)^{−v/2} exp(−vs²/(2σ²))]      (5.23)

    As was said before, n corresponds to the number of observations and p refers to the number of parameters β. This new form of expressing the likelihood function will be more useful for finding a natural conjugate prior distribution, which has the same form as the likelihood.





    The prior distribution for β and σ², denoted by p(β, σ²), can be written in a more convenient way by applying the definition of the joint distribution:

                                  p(β, σ²) = p(β|σ²) p(σ²)                                                 (5.24)

    Note that β and σ² are assumed to be dependent, which will rarely occur in practice. Some authors prefer to work with the error precision, 1/σ² say, instead of the variance σ².


    All this is very similar to what was explained in the Bayesian analysis of the Normal distribution. The first term in brackets in the likelihood function suggests the form of a Normal distribution for the parameter β given σ². So

                  p(β|σ²) ∝ (σ²)^{−p/2} exp(−(1/(2σ²))(β − β0)'V0^{-1}(β − β0))                            (5.25)

and, hence,

                                  β|σ² ∼ N(β0, σ²V0)                                                       (5.26)

    According to [Rossi06], the second term in brackets in the likelihood function suggests the form of an inverse gamma distribution for the parameter σ² (see Appendix A). So

                  p(σ²) ∝ (σ²)^{−(v0/2+1)} exp(−v0 s0²/(2σ²))                                              (5.27)

and, hence,

                                  σ² ∼ Inv-G(v0/2, v0 s0²/2)                                               (5.28)
    Note that there is an extra term (σ²)^{-1} here which is not suggested by the form of the likelihood explained above. This term can be rationalized by viewing the conjugate prior as arising from the posterior of a sample of size v0 with sufficient statistics s0² and β0, formed with the noninformative prior p(β, σ²) ∝ σ^{−2}, which will be briefly explained later.


    So the natural conjugate prior distribution of the parameters β and σ² is:

                  p(β, σ²) ∝ (σ²)^{−((p+v0)/2+1)} exp(−(1/(2σ²))[v0 s0² + (β − β0)'V0^{-1}(β − β0)])       (5.29)

and, hence,

                                  β, σ² ∼ N-Inv-χ²(β0, V0 s0²; v0, s0²)                                    (5.30)

where the prior hyper-parameters β0, V0, v0 and s0² express the knowledge that the researcher has about the problem and his or her confidence in it. Furthermore, the parameter β0 measures the marginal effect of the explanatory variables on the dependent variable. As well, V0 indicates the uncertainty about the prior information and plays the same role as (X'X)^{-1} does in the classical approach; v0 represents the size of a fictitious data set, so it plays a similar role to n, and s0² is an imaginary s² for those fictitious data. In terms of the distribution, β0 and V0σ² represent the location and scale of β, respectively, and v0 and s0² the degrees of freedom and scale of σ², respectively.



    Since a conjugate prior distribution has been used, the posterior distribution will have the same form. That is, the posterior distribution will be a Normal-Scaled Inverse χ² with posterior hyper-parameters β1, V1, v1 and s1². According to [Rossi06] and [Koop03], it can be shown that

                                  β, σ²|y ∼ N-Inv-χ²(β1, V1 s1²; v1, s1²)                                  (5.31)

    The relation between the prior and the posterior hyper-parameters, according to [Koop03], is:

                                  V1 = (V0^{-1} + X'X)^{-1}                                                (5.32)
                                  β1 = V1(V0^{-1}β0 + X'X β̂)                                               (5.33)
                                  v1 = v0 + n                                                              (5.34)
                                  v1 s1² = v0 s0² + v s² + (β̂ − β0)'[V0 + (X'X)^{-1}]^{-1}(β̂ − β0)         (5.35)
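    In code, the updating formulas (5.32)-(5.35) take only a few lines; the following is a minimal sketch (ours, not part of the original text; function and argument names are assumptions):

    import numpy as np

    def conjugate_posterior(X, y, beta0, V0, v0, s0_sq):
        # Posterior hyper-parameters of equations (5.32)-(5.35)
        n, p = X.shape
        XtX = X.T @ X
        beta_hat = np.linalg.solve(XtX, X.T @ y)          # OLS estimate
        v = n - p
        resid = y - X @ beta_hat
        s_sq = resid @ resid / v

        V0_inv = np.linalg.inv(V0)
        V1 = np.linalg.inv(V0_inv + XtX)                  # (5.32)
        beta1 = V1 @ (V0_inv @ beta0 + XtX @ beta_hat)    # (5.33)
        v1 = v0 + n                                       # (5.34)
        d = beta_hat - beta0
        s1_sq = (v0 * s0_sq + v * s_sq
                 + d @ np.linalg.inv(V0 + np.linalg.inv(XtX)) @ d) / v1   # (5.35)
        return beta1, V1, v1, s1_sq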


    As was mentioned in the Bayesian Data Analysis chapter, a measure is needed to summarize the posterior distribution, and this is usually the posterior mean, namely E(β|y). According to what was said in previous chapters, the marginal for β is a multivariate t-distribution (see Appendix A):

                                  β|y ∼ t_{v1}(β1, s1² V1)                                                 (5.36)

where

                                  E(β|y) = β1 = V1(V0^{-1}β0 + X'X β̂)                                      (5.37)

and

                                  Var(β|y) = (v1 s1²/(v1 − 2)) V1                                          (5.38)
    So the posterior mean is a weighted average of the ordinary least squares estimate, β̂, and the prior mean, β0, where the weights are proportional to the data information, X'X, and the importance given to the prior, V0^{-1}, respectively. This should make clear that as the prior variance for β is decreased, greater posterior weight is placed on prior beliefs relative to the data, so the posterior mean will be closer to the prior mean.

    The elements of the diagonal of the matrix (v1 s1²/(v1 − 2)) V1 are the variances of β0, β1, . . . , βp:

                  Var(β_j|y) = (v1 s1²/(v1 − 2)) V1_{jj}        for all j = 0, 1, . . . , p                (5.39)
    Likewise, the marginal posterior for σ² is:

                                  σ²|y ∼ Inv-χ²(v1, s1²)                                                   (5.40)

and, hence,

                                  E(σ²|y) = v1 s1²/(v1 − 2)                                                (5.41)

                                  Var(σ²|y) = 2 v1² s1⁴/((v1 − 2)²(v1 − 4))                                (5.42)

    So, as we increase the amount of fictitious data v0, v1 tends towards v0 and, hence, the posterior estimate of σ² gets closer to s0².


    Tables 5.2 and 5.3 show how the different posterior parameters of interest vary depending on the prior parameters V0 (considering V0 = cI_k) and v0 and on the sample size n.
    Table 5.2 means that if the size of the sample increases towards infinity, the prior information that the researcher provides has very little or almost no importance, as also occurs if the precision of the prior distribution for β decreases (that is, V0 increases) towards 0. The difference between the two cases is that in the former the variance of β is lower than in the latter.

    The number of fictitious data does not seem to affect the posterior mean, but it does affect the posterior variance, increasing it (resp. decreasing it) as the fictitious data increase (resp. decrease).





                      Action          E[β|y]                        Var[β|y]

          n           Increase        Closer to OLS estimates       Closer to 0
                      Decrease        Closer to β0                  Further from 0

          V0          Increase        Closer to OLS estimates       Further from 0
                      Decrease        Closer to β0                  Closer to 0

          ν0          Increase        Not affected                  Increases
                      Decrease        Not affected                  Decreases


                              Table 5.2: Sensitivity analysis of parameter β



    Table 5.3 refers to the parameter σ², and it means that if the fictitious data increase, then the information given by the researcher will have much more weight in the posterior mean of σ² than the real data, and the variance will be lower too. The other way round occurs when the number of real data increases: then the data information will have the most important weight and the prior information will have hardly any value. Another interesting result is that as the precision of the prior distribution for β decreases (that is, V0 increases), the posterior mean of σ² approximates the number of real data times the ordinary least squares estimate.


    Turning to a different issue, the fact that the natural conjugate prior implies that prior information enters in the same manner as data information helps with prior elicitation. When several priors can be applied to the same problem, two strategies can be adopted to surmount possible criticisms. First, a prior sensitivity analysis can be carried out to demonstrate that results are the same with the different priors chosen. But if results are sensitive to the choice of prior, the Bayesian approach allows for the scientifically honest reporting of such a state of affairs. There has been work done on extreme bounds analysis for quantities such as the posterior mean of a parameter. [Poir95] provides a detailed





                      Action          E[σ²|y]                       Var[σ²|y]

          n           Increase        Closer to OLS estimates       Closer to 0
                      Decrease        Closer to s0²                 Further from 0

          V0          Increase        Closer to vs²                 Closer to 0
                      Decrease        Closer to OLS estimates       Further from 0

          ν0          Increase        Closer to s0²                 Closer to 0
                      Decrease        Closer to OLS estimates       Further from 0


                              Table 5.3: Sensitivity analysis of parameter σ²



discussion about this issue. A second strategy is to use a non-informative prior to let the data speak loudly and be predominant over the prior information. For example, let us set v0 = 0 and V0^{-1} = 0. Then

                                  β, σ²|y ∼ N-Inv-χ²(β1, V1 s1²; v1, s1²)                                  (5.43)

where

                                  V1 = (X'X)^{-1}                                                          (5.44)
                                  β1 = β̂                                                                   (5.45)
                                  v1 = n                                                                   (5.46)
                                  v1 s1² = v s²                                                            (5.47)

    With this non-informative prior, all of these formulas involve only data information and equal the ordinary least squares results. Bayesians often write this prior as:






                                  p(β, σ²) ∝ σ^{−2}                                                        (5.48)

    Finally, one of the goals of the Bayesian approach is to provide a predictive model to predict an unobserved data point generated from the same model as the data set with n observations (ε* ∼ N(0, σ²), with the same β). This is denoted by:

                                  Y* = X*β + ε*                                                            (5.49)

    where Y* is not observed and ε* is independent of ε.


    Bayesian prediction is based on calculating

                                  p(y*|y) = ∫∫ p(y*|y, β, σ²) p(β, σ²|y) dβ dσ²                            (5.50)

    The key to getting the prediction is to find out the form of p(y*|y, β, σ²), since the posterior p(β, σ²|y) has already been calculated, and to test whether p(y*|y) is easy to integrate or whether, on the contrary, a posterior simulator has to be employed.


    Since ε* is independent of ε, Y* is independent of Y, and p(y*|y, β, σ²) can be written as p(y*|β, σ²), which is a multivariate Normal, as was seen before.

                  p(y*|β, σ²) = ((σ²)^{−T/2}/(2π)^{T/2}) exp(−(1/(2σ²))(y* − X*β)'(y* − X*β))              (5.51)
    Multiplying this by the posterior obtained previously and integrating yields a multivariate t:

                                  y*|y ∼ t_{v1}(X*β1, s1²(I_T + X*V1X*'))                                  (5.52)

where T is the number of new points X* at which predictions are made.


    It is easy to see that:

                  E(y*|y) = X*β1        Var(y*|y) = s1²(I_T + X*V1X*')                                     (5.53)

    A brief summary comparing the classical and the Bayesian approaches is displayed in Table 5.4 to note the coincidences and differences between them.







                Classical Regression                      Bayesian Regression

                β̂ = (X'X)^{-1}X'Y                         β1 = V1(V0^{-1}β0 + X'X β̂)

                σ̂² = (Y'Y − β̂'X'Y)/(n − p)                s1² = (v0 s0² + v s² + (β̂ − β0)'[V0 + (X'X)^{-1}]^{-1}(β̂ − β0))/v1

                E[β̂] = β                                  E[β|y] = β1

                Var(β̂_j) = σ² C_{jj}                      Var(β_j|y) = (v1 s1²/(v1 − 2)) V1_{jj}

                Y*|y ∼ t_{n−p}(X*β̂, σ̂² I_T)               Y*|y ∼ t_{v1}(X*β1, s1²(I_T + X*V1X*'))


                        Table 5.4: Classical and Bayesian regression comparison



    A very interesting and more exhaustive comparison between these two approaches can be read in the article by [Urba92], where the advantages and disadvantages of using each of them are explained.


5.4 Normal Linear Regression Model subject to inequality constraints

In this section, let us suppose we want to impose inequality constraints on the coefficients of the Normal linear regression model, such as β ∈ A, where A is the region of all valid values of the coefficients. This is quite simple in Bayesian regression, since the constraints are imposed through the prior distribution:

                  β, σ² ∼ N-Inv-χ²(β0, V0 s0²; v0, s0²) 1(β ∈ A)                                           (5.54)

where β0, V0, v0 and s0² are prior hyper-parameters to be chosen and 1(β ∈ A) is the indicator function, which equals 1 if β ∈ A and 0 otherwise.


    Likewise, the posterior distribution for β is now:

                                  p(β|y) ∝ t_{v1}(β1, s1² V1) 1(β ∈ A)                                     (5.55)

where β1, V1, v1 and s1² were defined previously.
    So the only difference when introducing inequality constraints is that we must now add the indicator function.


    This may look very easy, but for a general choice of A neither analytical posterior results nor Gibbs sampling work. The most suitable method is importance sampling, which has already been explained. In this case, according to [Koop03], the importance function is:

                                  q(β) = t_{v1}(β1, s1² V1)                                                (5.56)

    The strategy consists of getting draws y*^{(s)} from p(y*|β^{(s)}, σ^{2(s)}) using the draws β^{(s)} and σ^{2(s)} which were obtained from the posterior distribution. Then, using these draws y*^{(s)} in the importance sampling, the mean and the variance can be calculated.


    Another, simpler way consists of ignoring the constraints until the end of the simulation and then discarding those draws which violate the restrictions. According to [Gelm04], this works reasonably well if the constraints do not eliminate a large portion of the draws.
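    A minimal sketch of this discard strategy (ours; in_A is a user-supplied indicator for the region A, and all names are assumptions):

    import numpy as np

    def constrained_posterior_draws(beta1, V1, v1, s1_sq, in_A, n_draws, seed=0):
        # Sample the unconstrained posterior (5.31) and keep draws with beta in A,
        # which approximates the constrained posterior (5.55)
        rng = np.random.default_rng(seed)
        kept = []
        for _ in range(n_draws):
            sigma2 = v1 * s1_sq / rng.chisquare(v1)        # Inv-chi2(v1, s1^2)
            beta = rng.multivariate_normal(beta1, sigma2 * V1)
            if in_A(beta):                                 # indicator 1(beta in A)
                kept.append(beta)
        return np.array(kept)

    # Example constraint: all coefficients non-negative
    # draws = constrained_posterior_draws(b1, V1, v1, s1, lambda b: np.all(b >= 0), 5000)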


5.5 Normal Linear Regression Model with Independent Parameters

Now, suppose that the parameters β and σ² are independent, so

                                  p(β, σ²) = p(β) p(σ²)                                                    (5.57)

    With the same likelihood function as that used in the previous section, this assumption implies that β follows a multivariate Normal distribution with mean β0, as occurred when β and σ² were dependent, but now with variance V0, and σ² has exactly the same Scaled-Inv-χ² distribution used previously. That is:

                  β ∼ N(β0, V0)        σ² ∼ Inv-χ²(v0, s0²)                                                (5.58)

    The prior joint distribution is

                  p(β, σ²) ∝ exp(−½(β − β0)'V0^{-1}(β − β0)) (σ²)^{−(v0/2+1)} exp(−v0 s0²/(2σ²))           (5.59)

                                  β, σ² ∼ N-Inv-χ²(β0, V0; v0, s0²)                                        (5.60)

    As the posterior joint distribution is proportional to the prior times the likelihood:

                  p(β, σ²|Y) ∝ exp(−½[(Y − Xβ)'(Y − Xβ)/σ² + (β − β0)'V0^{-1}(β − β0)]) ×
                               × (σ²)^{−((n+v0)/2+1)} exp(−v0 s0²/(2σ²))                                   (5.61)

    Since this function does not take the form of any well-known density, it is interesting to find the conditional distributions for β, p(β|Y, σ²), and for σ², p(σ²|Y, β), because with them any information from p(β, σ²|Y) can be obtained through posterior simulation with the Gibbs sampler already explained in previous chapters.


    According to [Koop03], it can be shown that those conditional distributions are:

                  p(β|Y, σ²) ∝ exp(−½(β − β1)'V1^{-1}(β − β1))                                             (5.62)
                  p(σ²|Y, β) ∝ (σ²)^{−((n+v0)/2+1)} exp(−(1/(2σ²))[(Y − Xβ)'(Y − Xβ) + v0 s0²])            (5.63)

    And this all yields:

                                  β|y, σ² ∼ N(β1, V1)                                                      (5.64)
                                  σ²|y, β ∼ Inv-χ²(v1, s1²)                                                (5.65)

where

                                  V1 = (V0^{-1} + (1/σ²) X'X)^{-1}                                         (5.66)
                                  β1 = V1(V0^{-1}β0 + (1/σ²) X'Y)                                          (5.67)
                                  v1 = n + v0                                                              (5.68)
                                  s1² = ((Y − Xβ)'(Y − Xβ) + v0 s0²)/v1                                    (5.69)



    The fact that the posterior distribution has an unknown form affects the prediction for y*, p(y*|y), too. As has already been said for the posterior predictive in the Bayesian Approach chapter, the interest is in p(y*|y, β, σ²). Since y and y* are independent of one another,


                                  p(y*|y, β, σ²) = p(y*|β, σ²)                                             (5.70)

    And hence

                  p(y*|β, σ²) = ((σ²)^{−T/2}/(2π)^{T/2}) exp(−(1/(2σ²))(y* − X*β)'(y* − X*β))              (5.71)

    As the analytical solution of the integral is not trivial, the importance of the Gibbs sampler arises again; combining it with Monte Carlo integration, any posterior and predictive inference can be done. The strategy consists of getting draws y*^{(s)} from p(y*|β^{(s)}, σ^{2(s)}) using the draws β^{(s)}, σ^{2(s)} which were obtained from the posterior distribution. Then, using these draws y*^{(s)} in the Monte Carlo integration, the mean and the variance can be calculated.
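    Continuing the Gibbs sketch given above (ours; it assumes the arrays returned by gibbs_independent_prior), this strategy might look like:

    import numpy as np

    def predictive_draws(X_star, draws_beta, draws_sigma2, seed=1):
        # One y* draw per posterior draw, from the Normal in (5.71)
        rng = np.random.default_rng(seed)
        T = X_star.shape[0]
        out = np.empty((len(draws_beta), T))
        for s, (beta, sigma2) in enumerate(zip(draws_beta, draws_sigma2)):
            out[s] = X_star @ beta + rng.normal(scale=np.sqrt(sigma2), size=T)
        return out   # summarize with out.mean(axis=0) and out.var(axis=0)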


5.6 Normal Linear Regression Model with Heteroscedasticity and Correlation

Until now the variances have been supposed to be equal and uncorrelated, but this is not very realistic. In this section we are going to relax that assumption and consider the following model:

                                  Y = Xβ + ε                                                               (5.72)

where

                                  ε ∼ N(0, Σ)                                                              (5.73)

    That is, we are considering heteroscedasticity and correlation. According to [Koop03], since Σ is a positive definite matrix, a matrix P can be found that verifies PΣP' = I, and it can be shown that

                                  Y* = X*β + ε*                                                            (5.74)

where

                                  ε* ∼ N(0, σ²I)                                                           (5.75)




and

                                  Y* = PY                                                                  (5.76)
                                  X* = PX                                                                  (5.77)
                                  ε* = Pε                                                                  (5.78)

    Then, the likelihood function to consider now is:

                  p(Y|β, σ², Σ) = (1/(2π)^{n/2}) (σ²)^{−p/2} exp(−(1/(2σ²))(β − β̂(Σ))'X'Σ^{-1}X(β − β̂(Σ))) ×
                                  × (σ²)^{−v/2} exp(−v s²(Σ)/(2σ²))                                        (5.79)

where:

                                  v = n − p                                                                (5.80)
                                  β̂(Σ) = (X*'X*)^{-1}X*'Y*                                                 (5.81)
                                  s²(Σ) = (Y* − X*β̂(Σ))'(Y* − X*β̂(Σ))/v                                    (5.82)

which is very similar to that used with equal variances.


    Using the prior distributions described in the previous section, we have:

                                  p(β, σ², Σ) = p(β) p(σ²) p(Σ)                                            (5.83)

where β is normally distributed with prior parameters β0, V0 and σ² is a scaled inverse Chi-square with parameters v0 and s0².



    Hence, knowing that the posterior distribution is proportional to the prior times the likelihood:

                  p(β, σ², Σ|Y) ∝ p(Σ) × exp(−½[(Y* − X*β)'(Y* − X*β)/σ² + (β − β0)'V0^{-1}(β − β0)]) ×
                                  × (σ²)^{−((n+v0)/2+1)} exp(−v0 s0²/(2σ²))                                (5.84)




    This suggests a Normal distribution for the posterior conditional of β and a scaled inverse Chi-square for the posterior conditional of σ², as occurred before. Therefore:

                                  β|Y, σ², Σ ∼ N(β1, V1)                                                   (5.85)
                                  σ²|Y, β, Σ ∼ Inv-χ²(v1, s1²)                                             (5.86)

where

                                  V1 = (V0^{-1} + X'Σ^{-1}X/σ²)^{-1}                                       (5.87)
                                  β1 = V1(V0^{-1}β0 + X'Σ^{-1}X β̂(Σ)/σ²)                                   (5.88)
                                  v1 = n + v0                                                              (5.89)
                                  s1² = ((Y − Xβ)'Σ^{-1}(Y − Xβ) + v0 s0²)/v1                               (5.90)
    According to [Koop03], the posterior conditional for Σ yields:

                  p(Σ|Y, β, σ²) ∝ p(Σ) |Σ|^{−1/2} exp(−(1/(2σ²))(Y − Xβ)'Σ^{-1}(Y − Xβ))                   (5.91)

    So we have come to the point where the form that Σ takes is crucial.


5.6.1 Heteroscedasticity

Let us suppose we suspect that there is no correlation among the errors but that their variances are different. Hence, we will have n variances ω_i for the n errors ε_i.

    It could be that the researcher has an idea of the form of Σ and assumes that

                  ω_i = h(x_i, α) = (1 + α1 x_{i1} + · · · + αp x_{ip})²                                   (5.92)

    That is, the variances are related to some or all of the independent variables. The researcher should choose a prior for α, and then Bayesian inference can be carried out through a Metropolis-Hastings algorithm such as a random walk.
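    A random walk Metropolis step for α might look like the following sketch (ours; log_post is a user-supplied log posterior of α given the data and the remaining parameters, and all names are assumptions):

    import numpy as np

    def rw_metropolis(log_post, alpha0, step, n_iter=5000, seed=0):
        rng = np.random.default_rng(seed)
        alpha = np.asarray(alpha0, dtype=float)
        lp = log_post(alpha)
        draws = []
        for _ in range(n_iter):
            prop = alpha + rng.normal(scale=step, size=alpha.shape)   # RW proposal
            lp_prop = log_post(prop)
            if np.log(rng.uniform()) < lp_prop - lp:                  # accept/reject
                alpha, lp = prop, lp_prop
            draws.append(alpha.copy())
        return np.array(draws)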


    If the researcher knows that the error variances are different but has no idea of their form, then a prior for Σ has to be chosen. According to [Koop03]:

                                  p(Σ) = ∏_{i=1}^{n} p(ω_i)                                                (5.93)

where

                                  ω_i ∼ Inv-χ²(v_ω, 1)                                                     (5.94)

    But now a hyper-prior distribution should be fixed for v_ω, such as

                                  p(Σ) = p(Σ|v_ω) p(v_ω)                                                   (5.95)

    That is, we are using a hierarchical prior to treat the heteroscedasticity. According to [Gelm04], a Metropolis-Hastings algorithm can be used to draw posterior simulations.


5.6.2 Correlation

Now, let us assume that there is some correlation among the errors through a time or space relationship, such that the error in one period depends on that of the previous periods. This is a type of regression called autoregressive, and it can be considered a time series. For example, if we are considering the relation between the Ibex 35 values one day and those of the previous days, we could say that there is a correlation between the errors in the relation between Fridays and the previous days and those in the relation between the values on Thursdays, Wednesdays or Tuesdays and the previous days. That is:

                  ε_t = ρ1 ε_{t−1} + ρ2 ε_{t−2} + · · · + ρp ε_{t−p} + u_t                                 (5.96)

where

                                  u_t ∼ N(0, σ²)                                                           (5.97)

    We will assume that there is stationarity. This means, in a general way, that the probability distribution does not vary through time. Some time series do not seem to be stationary, but their differences do. The main difference to take into account is the first one: the first difference of ε_t is δε_t, and it indicates the variation of ε between period t and the previous periods t − 1, t − 2, . . . , t − p.


    According to [Koop03], the irregular component u_t can be formulated in the following way:

                                  ρ(L) ε_t = u_t                                                           (5.98)

where L is called the lag operator and has the property that L ε_t = ε_{t−1}, and ρ(L) = (1 − ρ1 L − · · · − ρp L^p).


    So, if we have the regression model:

                                  Y_t = X_t β + ε_t                                                        (5.99)

then it is possible to find a model such as

                                  Y*_t = X*_t β + u_t,        u_t ∼ N(0, σ²)                               (5.100)

where

                                  Y*_t = ρ(L) Y_t                                                          (5.101)
                                  X*_t = ρ(L) X_t                                                          (5.102)
    Therefore, using an independent Normal scaled inverse chi-square prior for β and σ², it yields:

                                  β|Y, σ², ρ ∼ N(β1, V1)                                                   (5.103)
                                  σ²|Y, β, ρ ∼ Inv-χ²(v1, s1²)                                             (5.104)

where

                                  V1 = (V0^{-1} + X*'X*/σ²)^{-1}                                           (5.105)
                                  β1 = V1(V0^{-1}β0 + X*'Y*/σ²)                                            (5.106)
                                  v1 = v0 + T − p                                                          (5.107)
                                  s1² = ((Y* − X*β)'(Y* − X*β) + v0 s0²)/v1                                (5.108)
    And now, as occurred with heteroscedasticity, a prior should be selected for ρ. Let us choose a multivariate Normal subject to the constraint ρ ∈ Φ, where Φ is the stationary region. Then,

                                  ρ ∼ N(ρ0, V_{ρ0}) 1(ρ ∈ Φ)                                               (5.109)
                                  ρ|Y, β, σ² ∼ N(ρ1, V_{ρ1}) 1(ρ ∈ Φ)                                      (5.110)



where ρ0 and V_{ρ0} are the prior parameters, which the researcher should establish, and ρ1 and V_{ρ1} are the posterior parameters, with the following relation:

                                  V_{ρ1} = (V_{ρ0}^{-1} + E'E/σ²)^{-1}                                     (5.111)
                                  ρ1 = V_{ρ1}(V_{ρ0}^{-1}ρ0 + E'ε/σ²)                                      (5.112)

where E is the matrix containing the lagged errors from t − 1 to t − p.

    According to [Koop03], a Gibbs sampler can be used to draw posterior simulations.


5.7 Models Summary

Since the main models to be used in the subsequent application are the homoscedastic, non-autocorrelated ones, the main ideas are shown in Tables 5.5, 5.6, 5.7 and 5.8.








 Case                          β                                  σ²                       Joint Prior Distribution

 p(β, σ²) = p(β|σ²)p(σ²)       β|σ² ∼ N(β0, σ²V0)                 σ² ∼ Inv-χ²(v0, s0²)     β, σ² ∼ N-Inv-χ²(β0, V0 s0²; v0, s0²)

 p(β, σ²) = p(β|σ²)p(σ²)       β|σ² ∼ N(β0, σ²V0)1(β ∈ A)         σ² ∼ Inv-χ²(v0, s0²)     β, σ² ∼ N-Inv-χ²(β0, V0 s0²; v0, s0²)1(β ∈ A)

 p(β, σ²) = p(β)p(σ²)          β ∼ N(β0, V0)                      σ² ∼ Inv-χ²(v0, s0²)     β, σ² ∼ N-Inv-χ²(β0, V0; v0, s0²)

 p(β, σ²) = p(β)p(σ²)          β ∼ N(β0, V0)1(β ∈ A)              σ² ∼ Inv-χ²(v0, s0²)     β, σ² ∼ N-Inv-χ²(β0, V0; v0, s0²)1(β ∈ A)


                              Table 5.5: Main Prior Distributions Summary
    Case 1: p(β,σ²|y) ∝ p(y|β,σ²) p(β|σ²) p(σ²)
        Posterior: β,σ²|y ~ N-Inv-χ²(β1, V1 s1²; v1, s1²)
        Key: obtain the marginal distributions, draw directly from them and summarize.

    Case 2: p(β,σ²|y) ∝ p(y|β,σ²) p(β|σ²) p(σ²), with β ∈ A
        Posterior: β,σ²|y ~ N-Inv-χ²(β1, V1 s1²; v1, s1²) 1(β ∈ A)
        Key: obtain the marginal distributions, draw directly from them, discard invalid draws and summarize.

    Case 3: p(β,σ²|y) ∝ p(y|β,σ²) p(β) p(σ²)
        Posterior: ∝ exp{ −(1/2) [ (Y−Xβ)′(Y−Xβ)/σ² + (β−β0)′V0⁻¹(β−β0) ] } × (σ²)^(−((n+v0)/2+1)) exp{ −(v0 s0² + νs²)/(2σ²) }
        Key: obtain the conditional distributions, draw with the Gibbs sampler and summarize.

    Case 4: p(β,σ²|y) ∝ p(y|β,σ²) p(β) p(σ²), with β ∈ A
        Posterior: the same expression multiplied by 1(β ∈ A)
        Key: obtain the conditional distributions, draw with the Gibbs sampler, discard invalid draws and summarize.

                                  Table 5.6: Main Posterior Distributions Summary
    Case: p(β,σ²|y) ∝ p(y|β,σ²) p(β|σ²) p(σ²)
        β0 → β1:     β1 = V1 (V0⁻¹ β0 + X′X β̂)
        V0 → V1:     V1 = (V0⁻¹ + X′X)⁻¹
        v0 → v1:     v1 = v0 + n
        s0² → s1²:   s1² = [ v0 s0² + νs² + (β̂ − β0)′ [V0 + (X′X)⁻¹]⁻¹ (β̂ − β0) ] / v1

    Case: p(β,σ²|y) ∝ p(y|β,σ²) p(β) p(σ²)
        β0 → β1:     β1 = V1 (V0⁻¹ β0 + X′Y/σ²)
        V0 → V1:     V1 = (V0⁻¹ + X′X/σ²)⁻¹
        v0 → v1:     v1 = v0 + n
        s0² → s1²:   s1² = [ v0 s0² + (Y − Xβ)′(Y − Xβ) ] / v1

                                  Table 5.7: Prior and Posterior Parameters Summary
    Case                                                               p(y*|y,β,σ²)    Constraint
    p(y*|y) = ∫∫ p(y*|y,β,σ²) p(β|σ²,y) p(σ²|y) dβ dσ²                 N(β, σ²)        No
    p(y*|y) = ∫∫ p(y*|y,β,σ²) p(β|σ²,y) p(σ²|y) dβ dσ²                 N(β, σ²)        Yes
    p(y*|y) = ∫∫ p(y*|y,β,σ²) p(β|y) p(σ²|y) dβ dσ²                    N(β, σ²)        No
    p(y*|y) = ∫∫ p(y*|y,β,σ²) p(β|y) p(σ²|y) dβ dσ²                    N(β, σ²)        Yes

    Key (all cases): obtain draws y* from p(y*|y,β,σ²) using the previous draws from the posterior simulation, then use Monte Carlo integration to get predictive inferences.

                                  Table 5.8: Main Posterior Predictive Distributions Summary
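The key column of Table 5.8 can be sketched in a few lines; the following hypothetical helper (names are ours) reuses the posterior draws produced by the sampler above, where x_new is the vector of regressors for the new observation:

    import numpy as np

    def predictive_draws(x_new, betas, sigma2s, seed=1):
        # Draw y* from p(y*|y, beta, sigma^2) = N(x_new' beta, sigma^2) for each
        # posterior draw; summaries of y* are the Monte Carlo predictive inferences.
        rng = np.random.default_rng(seed)
        means = betas @ x_new
        return rng.normal(means, np.sqrt(sigma2s))

    # e.g. a 95% predictive interval:
    #   y_star = predictive_draws(x_new, betas, sigma2s)
    #   np.percentile(y_star, [2.5, 97.5])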
Chapter 6

Symbolic Data

6.1 What is symbolic data analysis?

Nowadays there are more and more data available to be analyzed and studied. Technological advances let us gather huge quantities of information about a specific variable, but part of that information is lost because standard statistical methods lack the flexibility to manage such quantities. For example, let us assume we are studying the evolution of the stock prices of a company. At the end of each month we would have the different values that the stock has taken daily. It seems reasonable to think that the researcher would take only the daily close prices, or the daily mean prices, and would not manage all the gathered information.


    Symbolic data analysis (SDA) deals with this problem and lets us analyse vast amounts of information efficiently, extracting the required knowledge and representing it better. Continuing with the same example, symbolic data let the engineer manage the daily maximum and minimum prices of a month, or build a histogram of the monthly prices and work with it. In this way, SDA complements other widely used statistical tools, such as candlesticks. More information about candlesticks and other interesting tools can be found in [Lee 06] and [Irpi05]. For instance, Figure 6.1 illustrates an interval time series for the daily maximum and minimum Ibex 35 values in January 2006.


    So the possibilities of symbolic data are evident. For instance, let us think of an application to warrants. A warrant is a right, without obligation, to buy (a call warrant) or to sell (a put warrant) an asset at an agreed price (the strike). One could thus use a predicted stock price range to choose the best put warrant or the most suitable call warrant, and obtain greater profits.






                                   Figure 6.1: Interval time series




    Underlying the aggregation method used by SDA is the notion of a symbolic object. This is a mathematical model of a concept (see [Dida95]) which, basically, lets us select some individuals from a group. Going further into SDA, and according to [Bill06a], three main kinds of symbolic data can be considered: multi-valued, interval-valued and modal-valued.


    As far as the first kind is concerned, a multi-valued symbolic random variable Y is one that takes one or more values from the list of values in its domain Y. The complete list of possible values in Y is finite, and the values may be well-defined categorical or quantitative values.


    For example, let us take all the companies that have formed the Ibex 35 index since its beginning. Then we could define a variable Y = blue chips in the Ibex 35 with 15 observations wu = year. Thus we have, for instance, that during the first year, 1992 (wu = w1), Telefónica, Repsol, Endesa, SCH and BBVA were considered to be the blue chips. In 2007 (wu = w15) Santander, Telefónica, BBVA, Endesa and Repsol YPF are considered to be the blue chips.


    Likewise, an interval-valued symbolic random variable Y is one that takes values in an interval.






    wu     Year      Y = Blue chips in Ibex 35
    w1     1992      {Telefónica, Repsol, Endesa, SCH, BBVA}
     .       .        .
     .       .        .
    w15    2007      {Telefónica, Repsol YPF, Endesa, BBVA, Santander}

                                  Table 6.1: Multivalued Data Example



That interval can be closed or open at either end. This kind of data is very important in SDA; furthermore, it captures both the central tendency and the dispersion of a dataset. Let us recall the example of the daily stock prices of a company over a month: this information can be recorded as the daily maximum and minimum values during the month. As this is one of the most interesting types of symbolic data for our purpose, we will take it up again below.


    Finally, let a random variable Y take possible values {ηk : k = 1, 2, . . . } over a domain Y. Then a modal-valued outcome is one formed by the value ηk together with an associated measure πk. The latter is usually a weight, probability, relative frequency, or the like, but it can also be a capacity, necessity, possibility, credibility or a related entity.


    A modal multi-valued variable can now be defined. This is a variable whose observed outcome takes values in a subset of the domain, each with its respective measure. For example, we could define a variable Z = Importance of the companies in the Ibex 35 index. Thus we have, for instance, that the most important company in 1992 was Telefónica, while Santander is the company with the highest weight in the index in 2007.


    Another example: let us suppose we define a variable Y = Maximum daily stock price for enterprises in the Spanish Continuous Stock Market. For the enterprise Endesa we could have:








  wu     Year    Z = Importance of an Ibex 35 company

  w1     1992    {Telefónica, 13.7; Repsol, 9.7; Endesa, 9.2; SCH, 8.0; BBVA, 7.2; Iberdrola, 6.9; Santander, 5.9; Banco Popular, 3.8; Banesto, 3.6; Banco Exterior, 3.0; Cepsa, 2.5; Tabacalera, 2.4; Acesa, 2.1; Unión FENOSA, 2.0; Gas Natural, 1.9; Sevillana de Electric, 1.8; Fuerzas E. Cataluña, 1.7; Bankinter, 1.6; Dragados, 1.4; Aguas de Barcelona, 1.3; Mapfre, 1.3; Asland, 1.2; FCC, 1.1; Portland Valderribas, 1.0; Hidrocantábrico, 0.8; Vallehermoso, 0.8; Metrovacesa, 0.8; Acerinox, 0.7; Viscofán, 0.6; Cubiertas y MZOV, 0.5; Sarrió, 0.4; Uralita, 0.4; Huarte, 0.3; Urbis, 0.3; Agromán, 0.2}

   .       .     .
   .       .     .

  w15    2007    {Telefónica, 16.0; Repsol YPF, 5.9; Endesa, 7.5; BBVA, 13.0; Iberdrola, 5.6; Santander, 17.2; Banco Popular, 3.4; Banesto, 0.5; Unión FENOSA, 1.8; Gas Natural, 1.5; Bankinter, 0.9; Cor. Mapfre, 0.7; FCC, 1.2; Sacyr Vallehermoso, 1.0; Metrovacesa, 0.5; Acerinox, 1.0; Inditex, 3.0; ACS Const, 2.9; B. Sabadell, 2.1; Altadis, 2.0; Abertis A, 2.0; G. Ferrovial, 1.6; Acciona, 1.4; FCC, 1.2; Gamesa, 1.0; Enagás, 0.8; REE, 0.8; Cintra, 0.7; Agbar, 0.7; Telecinco, 0.6; Iberia, 0.5; Indra A, 0.5; Fadesa, 0.5; Sogecable, 0.4; Antena 3 TV, 0.4; NH Hoteles, 0.4}

                                  Table 6.2: Modal-multivalued Example




        Y (Endesa) = {38.7, 0.125; 38.75, 0.125; 38.8, 0.250; 38.85, 0.250; 38.9, 0.125; 39, 0.125}

    This means that we assign a probability of 0.125 to the possibility that Endesa's maximum daily price is 38.7, a probability of 0.125 to the possibility that it is 38.75, a probability of 0.25 to the possibility that it is 38.8, and so on.


    Another very interesting variant of this type is the modal interval-valued variable. That is, instead of one value with a probability, the variable can take any value in an interval with a probability. Continuing with the previous example:


            Y (Endesa) = {[38.7, 38.75), 0.125; [38.75, 38.85), 0.125; [38.8, 38.9), 0.25}

    For more information and other types of data, the reader is referred to [Bill06a], [Huiw06] and [Arro06].


6.2 Interval-valued variables

As already mentioned, summarizing a dataset is one of the three possible sources from which interval data may result. According to [Huiw06], the other two sources are the imprecision of measurement and the expert's knowledge, including its uncertainty.


    Now, suppose E is a set of m symbolic objects u with observations Y(u), u = 1, . . . , m. Let us suppose we are interested in a particular random variable Yj ≡ Z, and that the realization of Z for the observation wu is the interval Z(wu) = [au, bu] = ξ. Then, according to [Bill06a], the empirical density function of Z is

    f(\xi) = \frac{1}{m} \sum_{u \in E} \frac{I_u(\xi)}{\|Z(u)\|}, \qquad \xi \in \mathbb{R}        (6.1)

where I_u(·) indicates whether or not ξ lies in the interval Z(u), and ‖Z(u)‖ is the length of that interval.


    Likewise, it can be shown that the symbolic empirical mean is given by

    \bar{Z} = \frac{1}{2m} \sum_{u \in E} (b_u + a_u)        (6.2)

    and the symbolic empirical variance is given by

    S^2 = \frac{1}{3m} \sum_{u \in E} (b_u^2 + b_u a_u + a_u^2) - \frac{1}{4m^2} \left[ \sum_{u \in E} (b_u + a_u) \right]^2.        (6.3)



                                                       65
6. Symbolic Data


    These formulas are consistent with the hypothesis of uniformity within the intervals. While the symbolic mean can be understood intuitively as a centre of gravity, the symbolic variance is not so easy to interpret. In fact, it might seem more reasonable to formulate the variance as

    S^2 = \frac{1}{4m} \sum_{u \in E} (b_u + a_u)^2 - \frac{1}{4m^2} \left[ \sum_{u \in E} (b_u + a_u) \right]^2,        (6.4)

    that is, the variance of the midpoints. But this last formulation does not take into account the internal variation of the intervals, while (6.3) does; hence, (6.3) is higher.


    For example, let us consider the maximum and minimum points for the Ibex 35 during December
2006.


    Then, according to the above, the mean point in that month was

    \bar{Z} = \frac{1}{38} \sum_{u \in E} (\mathrm{high}_u + \mathrm{low}_u) = 14116,

    and the empirical symbolic variance was

    S^2 = \frac{1}{3m} \sum_{u \in E} (b_u^2 + b_u a_u + a_u^2) - \frac{1}{4m^2} \left[ \sum_{u \in E} (b_u + a_u) \right]^2 = 28006.

    If we had calculated the variance taking only the midpoints, the result would have been

    S^2 = \frac{1}{4m} \sum_{u \in E} (b_u + a_u)^2 - \frac{1}{4m^2} \left[ \sum_{u \in E} (b_u + a_u) \right]^2 = 26023,

which is lower than that obtained previously because it does not take into consideration the internal
variation of the intervals.
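As a sketch of these computations, formulas (6.2)-(6.4) can be coded directly; the function below and its toy intervals are ours, for illustration only:

    import numpy as np

    def symbolic_stats(intervals):
        # Symbolic empirical mean (6.2), variance (6.3) and midpoint variance (6.4)
        # of interval-valued data [a_u, b_u], assuming uniformity within intervals.
        a, b = np.asarray(intervals, dtype=float).T
        m = len(a)
        mean = (a + b).sum() / (2 * m)                                   # (6.2)
        var = (b**2 + a * b + a**2).sum() / (3 * m) \
              - (a + b).sum()**2 / (4 * m**2)                            # (6.3)
        var_mid = ((a + b)**2).sum() / (4 * m) \
                  - (a + b).sum()**2 / (4 * m**2)                        # (6.4)
        return mean, var, var_mid

    # symbolic_stats([(13900, 14100), (14050, 14300)]) returns var > var_mid,
    # since (6.3) also counts the internal variation of the intervals.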


    Although interval-valued data may seem to be all advantages, according to [Huiw06] there are two major limitations when applying multivariate analysis to an interval dataset. The first is that the computational work is heavy; the second is that the hyperrectangle may enlarge the range of the original dataset and reduce the accuracy of the analysis.


    The methodology of interval data applied to multivariate analysis involves transforming the symbolic data matrix into a numerical matrix; that is, reducing p-dimensional observations to s-dimensional components (where usually s ≪ p). This is called Principal Component Analysis. There are two main methods to carry it out: the Vertices Method and the Centres Method. The former consists of building a matrix with 2^p rows and p columns from a hyperrectangle in the p-dimensional space, where each row contains the coordinates of one vertex of the hyperrectangle in R^p. The latter is based on the average value of every variable for each category of data. A more extended review of these two methods can be found in [Bill06a]. [Huiw06] point out some limitations of these methods and propose a new type of symbolic data: factor interval data. Since symbolic data is a wide field, the reader is referred to all the above citations.


6.3 Classical regression analysis with Interval-valued data

Regarding classical multiple regression, there are three current approaches to consider, though one of them is just a regression fit. Let us begin with the most intuitive and finish with the most conceptual.


    Since we now have intervals instead of single values, it is natural to take the midpoints and proceed as in classical multiple regression; that is, to use the fitted model to make new predictions from a new interval by applying it to each extreme of the interval. Moreover, [DeCa05] remark on the need to impose the constraint βi ≥ 0 to ensure that the lower extreme of the predicted interval stays below the upper extreme, and suggest the algorithm presented by [Laws74] to handle this constraint. We suggest the alternative of obtaining enough draws from the posterior distribution of β, discarding those that are negative and averaging the rest.
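A minimal sketch of that alternative, reusing posterior draws of β such as those returned by the Gibbs sampler sketched earlier (names are ours):

    import numpy as np

    def constrained_posterior_mean(betas):
        # Estimate beta under beta_i >= 0 by keeping only the posterior
        # draws that satisfy the constraint and averaging them.
        valid = betas[(betas >= 0).all(axis=1)]
        if len(valid) == 0:
            raise ValueError("no draw satisfies the constraint; obtain more draws")
        return valid.mean(axis=0)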


    Let us recall the example shown for classical multiple regression, but taking now the maximum and minimum values that the Ibex 35, Dow Jones, FTSE 100 and DAX took in the first ten months of 2006. We would take the midpoints of those intervals and we would obtain the same result that we got with classical multiple regression:

    IBEX35_t = 1.0102 DowJones_{t−1} − 2.0144 FTSE100_{t−1} + 2.1229 DAX_{t−1} + ε_t

where

    ε_t ~ N(0, 332.71²)



    We could use this model to predict a new observation for November 1st by applying it to each extreme of the intervals:

    max(IBEX35_t) = 1.0102 × 12161 − 2.0144 × 6149.9 + 2.1229 × 6289.7 = 13242.41
    min(IBEX35_t) = 1.0102 × 11986.84 − 2.0144 × 6110.9 + 2.1229 × 6237.55 = 13040.7

    So the prediction would be [13040.7, 13242.41].


    A disadvantage of this approach is that it does not take into account the interval length.


    To solve that problem, [DeCa05] and [DeCa04] suggest another regression for the interval range. They refer to this new approach as the constrained centre and range method (CCRM). In that case the constraint is applied to the interval range regression instead of to the centres regression. We will employ the radii instead of the ranges. So, continuing with the previous example, we would take the radii of the different indexes to build the following model:

    Radius IBEX35_t = 0.35 Radius DowJones_{t−1} + 0.484 Radius FTSE100_{t−1} + 0.272 Radius DAX_{t−1} + ε_t

where

    ε_t ~ N(0, 26.31²)

    With this new approach, the prediction is calculated from the midpoint and the radius of the interval:

    Midpoint IBEX35_t = 1.0102 × 12073.65 − 2.0144 × 6130.4 + 2.1229 × 6262.125 = 13141.3
    Radius IBEX35_t = 0.35 × 86.81 + 0.484 × 19.5 + 0.272 × 24.575 = 46.53

    Now the prediction would be [13094.75, 13187.81].


    Finally, the last approach is to use the symbolic mean, the symbolic variance and the symbolic covariance to make the regression; that is, a symbolic regression is used instead of the classical one. This new approach requires another way of estimating.


    Recall the classical univariate multiple regression model:


    Y = \beta_0 + X_1\beta_1 + \cdots + X_p\beta_p + \epsilon        (6.5)

where

    \epsilon \sim N(0, \sigma^2).        (6.6)

    Taking mean values we have

    \bar{Y} = \beta_0 + \bar{X}_1\beta_1 + \cdots + \bar{X}_p\beta_p + \bar{\epsilon}        (6.7)

from which it can easily be deduced that

    \bar{\epsilon} = 0        (6.8)

whenever the model includes the constant term β0. This means that the mean error is zero if there is a constant term in the model, which is a very important point for what follows.


    Then we can obtain an equivalent model:

    Y - \bar{Y} = (X_1 - \bar{X}_1)\beta_1 + \cdots + (X_p - \bar{X}_p)\beta_p + \epsilon        (6.9)

    where Y − Ȳ is the new dependent variable and X − X̄ is the new matrix of independent variables.


    β can be estimated as

    \hat{\beta} = S_{XX}^{-1} S_{XY}        (6.10)

where

    S_{XX} = \begin{pmatrix} \mathrm{var}(X_1) & \mathrm{cov}(X_1, X_2) & \cdots & \mathrm{cov}(X_1, X_p) \\ \mathrm{cov}(X_1, X_2) & \mathrm{var}(X_2) & \cdots & \mathrm{cov}(X_2, X_p) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{cov}(X_1, X_p) & \mathrm{cov}(X_2, X_p) & \cdots & \mathrm{var}(X_p) \end{pmatrix}

    and

    S_{XY} = \begin{pmatrix} \mathrm{cov}(X_1, Y) \\ \mathrm{cov}(X_2, Y) \\ \vdots \\ \mathrm{cov}(X_p, Y) \end{pmatrix},

where the independent term is not taken into account (so there is no column of ones in the matrix X).


    The independent term β0 is estimated as

    \hat{\beta}_0 = \bar{Y} - \sum_{j=1}^{p} \hat{\beta}_j \bar{X}_j.        (6.11)

    In this way, the symbolic variance, the symbolic covariance and the symbolic mean for interval-valued variables can be used to estimate β.
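Assuming the symbolic means, variances and covariances have already been computed for the interval-valued variables, a sketch of the estimation in (6.10)-(6.11) is straightforward (the names are hypothetical):

    import numpy as np

    def symbolic_fit(S_XX, S_XY, x_means, y_mean):
        # Regression fit from (symbolic) moments: slope vector via (6.10)
        # and independent term via (6.11).
        beta = np.linalg.solve(S_XX, S_XY)        # (6.10)
        beta0 = y_mean - beta @ x_means           # (6.11)
        return beta0, beta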


    But this approach has the limitation that, in order to employ the symbolic statistics and this way of estimating, it is necessary to include the independent term in the regression model. In fact, the most important point is that this last approach, suggested by [Bill06a], is just a regression fit, since no residual term is defined for symbolic data.


6.4 Bayesian regression analysis with Interval-valued data

Once we know how interval-valued data can be employed in classical regression, let us see how they can be included in the Bayesian approach. For this purpose we will employ the CCRM proposed by [DeCa05].


    According to what has been said above and in Bayesian Regression, there is nothing new to be done: the problem reduces to two Bayesian regressions, one for the centres and another for the radii, with the constraint applied to the latter. As we saw in Bayesian Regression, the constraint is much easier to incorporate into the Bayesian approach than into the classical one.


    So, by introducing the Bayesian approach into regression with symbolic data, the engineer will be able to incorporate more information into the problem than with Bayesian regression on traditional data. This is because two regressions are now being made, and the expert can state whether the centre values will increase or decrease, and likewise for the radii. In this sense, an opinion like


    'I think that the Dow Jones will have less importance for the Ibex 35 and the DAX will have more relevance than they have had until now, and there will be more volatility'

would mean, for instance, that the prior mean for the Dow Jones midpoint coefficient is lower than the one indicated by the data, while the prior mean for the DAX midpoint coefficient is greater, and similarly for the prior means of the radii.




Chapter 7

Results

To show the usefulness of the Bayesian Centre and Radius approach proposed in this project, this chapter considers experiments fitting a linear regression model to real symbolic interval-valued data sets from the Spanish Continuous Stock Market.


7.1 Spanish Continuous Stock Market data sets

We have considered two situations in the Spanish Continuous Stock Market. On the one hand, we have used the monthly minimum and maximum prices of BBVA and BSCH from January 2000 to June 2007 in order to show how the classical regression approach applied to interval-valued data can be improved through the Bayesian Centre and Radius approach when the variables are directly related. This will also let us see other advantages of the proposed approach over classical regression with single values.


    On the other hand, we have taken the daily minimum and maximum prices of two other Spanish Continuous Stock Market companies, Dogi and Zardoya, from January 2006 to December 2006, in order to show that the Bayesian Centre and Radius approach is better than the other approaches even when the variables are not related; that is, when they are uncorrelated.


7.2 Direct Relation between Variables

In this case, 66 of the total 89 months are used for the training set; the other 23 months form the testing set.





    Let us begin with the classical regression approach applied to the midpoints of the monthly minimum and maximum prices that BBVA and BSCH took in the considered training period. These data yield the following model:

    BSCH_Midpoint = 1.3008 + 0.6229 × BBVA_Midpoint + ε

where

    ε ~ N(0, 0.5237²)

    Figures 7.1 and 7.2 show that this model fits well enough for both training and testing sets.

                 [Figure 7.1: Classical Regression with single values in the training set (x-axis: Midpoints BBVA prices)]


    If we calculate the different error measures (Mean Error, Mean Absolute Error, Mean Squared Error and Root Mean Squared Error) for each set, we obtain Table 7.1. This suggests a good model, but we are only using the midpoints to fit new data when we have much more information available; therefore we are wasting information we have gathered. This can actually be seen graphically in Figure 7.3, which suggests that the model is not as good as previously believed, since there is too much available information for such a simple result and one could expect more from these data.
    Thus, another approach, known as the Centre Method, could be considered, applying the obtained model to each maximum and minimum price to get predicted maximum and minimum prices. This provides the results displayed in Figures 7.4 and 7.5.
    According to [Bill00], the total deviation is given by:

    \epsilon_{CentreMethod2000} = \epsilon_{lower} + \epsilon_{upper}        (7.1)



                 [Figure 7.2: Classical Regression with single values in the testing set (x-axis: Midpoints BBVA prices)]




    Set         ME        MAE       MSE       RMSE
    Training    0         0.4208    0.2660    0.5157
    Testing     0.2321    0.3831    0.2446    0.4946

               Table 7.1: Error Measures for Classical Regression with single values



    The resulting error measures can be seen in Table 7.2. Now we have a fitted interval for each observed interval; this approach seems to take advantage of the gathered data.

    Now let us see the resulting error measures according to the Centre method proposed by [Bill02], where the sum of squared errors is given by

    SSE_{CentreMethod2002} = \sum_{i=1}^{n} (\epsilon_{lower}^2 + \epsilon_{upper}^2)        (7.2)

    and, thus, the mean absolute error is given by

    MAE_{CentreMethod2002} = \frac{1}{n} \sum_{i=1}^{n} (|\epsilon_{lower}| + |\epsilon_{upper}|)        (7.3)




                 [Figure 7.3: Classical Regression with interval-valued data (x-axis: Minimum and Maximum BBVA prices)]


                 [Figure 7.4: Centre Method (2000) in training set (x-axis: Minimum and Maximum BBVA prices)]



                                                                                         n
    Table 7.3 shows that this new definition of the error does not improve much on the previous one.
    However, let us compare these last approaches with the Centre and Radius Method. In this case we have the following model:

    BSCH_Midpoint = 1.3008 + 0.6299 × BBVA_Midpoint + ε_Midpoint





                 [Figure 7.5: Centre Method (2000) in testing set (x-axis: Minimum and Maximum BBVA prices)]




    Set         ME        MAE       MSE       RMSE
    Training    0         0.8416    1.0638    1.0314
    Testing     0.4643    0.7663    0.9784    0.9891

               Table 7.2: Error Measures for Centre Method (2000)


where

    ε_Midpoint ~ N(0, 0.5237²)

and

    BSCH_Radius = 0.106 + 0.6188 × BBVA_Radius + ε_Radius

where

    ε_Radius ~ N(0, 0.1458²)





    Set         ME        MAE       MSE       RMSE
    Training    0         0.8917    0.5922    0.7695
    Testing     0.4643    0.7717    0.5125    0.7159

               Table 7.3: Error Measures for Centre Method (2002)



    According to [DeCa07], the sum of squared deviations is given by

    SSE_{CentreRadiusMethod} = \sum_{i=1}^{n} (\epsilon_{Midpoint}^2 + \epsilon_{Radius}^2)        (7.4)

    Therefore, the mean absolute error is given by

    MAE = \frac{1}{n} \sum_{i=1}^{n} (|\epsilon_{Midpoint}| + |\epsilon_{Radius}|)        (7.5)
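For reference, a short sketch computing (7.4) and (7.5) from the midpoint and radius residuals (the arrays e_mid and e_rad are hypothetical names):

    import numpy as np

    def centre_radius_errors(e_mid, e_rad):
        # SSE (7.4) and MAE (7.5) from midpoint and radius residuals.
        sse = np.sum(e_mid**2 + e_rad**2)               # (7.4)
        mae = np.mean(np.abs(e_mid) + np.abs(e_rad))    # (7.5)
        return sse, mae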


                 [Figure 7.6: Centre and Radius Method in training set (x-axis: Minimum and Maximum BBVA prices)]


    The results shown in Figures 7.6 and 7.7 and in Table 7.4 clearly show that the error measures are lower with the Centre and Radius Method than with the Centre Method; thus, the former is better than the latter.





                 [Figure 7.7: Centre and Radius Method in testing set (x-axis: Minimum and Maximum BBVA prices)]




    Set         ME        MAE       MSE       RMSE
    Training    0         0.5233    0.2866    0.5353
    Testing     0.1837    0.4712    0.2558    0.5058

               Table 7.4: Error Measures for Centre and Radius Method




    Now let us take into consideration an expert's knowledge about the Spanish Continuous Stock Market and see the results of the Bayesian Centre and Radius Method. Obviously, the Bayesian methodology is mainly useful in the testing set, since that is where the unobserved data are.


    Bearing in mind the previous Centre and Radius model, an expert could think that BSCH will improve slightly with respect to BBVA and assign the prior distribution seen in (5.36) with the following prior parameters for the midpoints:






    β0 = (1.3008, 0.64)′
    V0 = 10⁻⁹
    s0² = 0.5237²
    v0 = 10⁷

    Then the final Midpoints model would be:

    BSCH_Midpoint = 1.3008 + 0.64 × BBVA_Midpoint + ε_Midpoint


    Let us assume that the expert considers that the volatility will not vary, and assigns vague prior parameters to the Radius distribution:

    β0 = (0.106, 0.6188)′
    V0 = 10⁶
    s0² = 0.1458²
    v0 = 4

    Then the final Radius model would be:

    BSCH_Radius = 0.106 + 0.6188 × BBVA_Radius + ε_Radius


    The results for the testing set are shown in Figure 7.8 and in Table 7.5. They show that the proposed Bayesian Centre and Radius Method improves on all the previous approaches, since it lets us manage more information than classical regression and we obtain better results than those of the Centre and the Centre and Radius methods.
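A sketch of how the two Bayesian regressions yield an interval prediction, given posterior draws for the midpoint and radius models (for instance from the Gibbs sampler sketched in Chapter 5; all names are ours):

    import numpy as np

    def bayesian_ccrm_predict(x_mid, x_rad, betas_mid, betas_rad):
        # Interval prediction from the midpoint model and the constrained
        # radius model: interval = midpoint -/+ radius.
        mid = betas_mid.mean(axis=0) @ x_mid
        valid = betas_rad[(betas_rad >= 0).all(axis=1)]   # enforce radius >= 0
        rad = valid.mean(axis=0) @ x_rad
        return mid - rad, mid + rad

    # x_mid and x_rad include a leading 1 for the independent term, e.g.
    #   bayesian_ccrm_predict(np.array([1.0, 12.5]), np.array([1.0, 0.3]), ...)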


7.3 Uncorrelated Variables

In this second case, 170 of the total 255 days are used for the training set; the remaining days form the testing set.






                 [Figure 7.8: Bayesian Centre and Radius Method in testing set (x-axis: Minimum and Maximum BBVA prices)]




    Set        ME        MAE       MSE       RMSE
    Testing    0.0126    0.4409    0.1997    0.4469

               Table 7.5: Error Measures in Bayesian Centre and Radius Method



    The classical regression with the midpoints of the price ranges yields the following model:

    Dogi_Midpoint = 5.6570 − 0.0806 × Zardoya_Midpoint + ε

where

    ε ~ N(0, 0.2882²)

    Figures 7.9 and 7.10 show that this model does not fit well for either the training or the testing set.
    If we calculate the different error measures (Mean Error, Mean Absolute Error, Mean Squared Error and Root Mean Squared Error) for each set, we obtain Table 7.6.




                 [Figure 7.9: Classical Regression with single values in the training set (x-axis: Midpoints Zardoya prices)]


                 [Figure 7.10: Classical Regression with single values in the testing set (x-axis: Midpoints Zardoya prices)]



    The Centre Method could be applied to get predicted maximum and minimum prices. This method yields the following model:

    Dogi_Midpoint = 5.6570 + 0.0792 × Zardoya_Midpoint + ε

where

    ε ~ N(0, 7.2137²)






    Set         ME         MAE       MSE       RMSE
    Training    0          0.4231    0.2268    0.4763
    Testing     -0.3518    0.3651    0.1642    0.4052

               Table 7.6: Error Measures for Classical Regression with single values





    Note that the slope has changed since, according to [DeCa04], it cannot be negative, in order to ensure that the fitted maximum is greater than the fitted minimum.


    This provides the results shown in Figures 7.11 and 7.12.

                 [Figure 7.11: Centre Method (2000) in training set (x-axis: Minimum and Maximum Zardoya prices; y-axis: Minimum and Maximum Dogi prices)]


    Table 7.7 shows the resulting error measures.





                 [Figure 7.12: Centre Method (2000) in testing set (x-axis: Minimum and Maximum Zardoya prices)]


    Set         ME         MAE       MSE        RMSE
    Training    -7.1288    7.1288    51.8315    7.1994
    Testing     -8.0653    8.0653    65.1544    8.0718

               Table 7.7: Error Measures for Centre Method (2000)



    It is very clear that this model is not accurate. This example evidences the main weak point of this approach: the positivity constraint imposed on the coefficients, which makes an inverse relationship between the variables impossible. This is pointed out by the error measures, which are very high.


    Now let us see the resulting error measures according to the Centre method proposed by [Bill02], shown in Table 7.8. This new definition of the error improves on the previous one.


    However, let us compare these last approaches with the Centre and Radius Method. In this case we have the following model:

    Dogi_Midpoint = 5.6570 − 0.086 × Zardoya_Midpoint + ε_Midpoint





    Set         ME         MAE       MSE        RMSE
    Training    -7.1288    7.1288    25.9183    5.0910
    Testing     -8.0653    8.0653    32.5825    5.7081

               Table 7.8: Error Measures for Centre Method (2002)


where

    ε_Midpoint ~ N(0, 0.2882²)

and

    Dogi_Radius = 0.0283 + 0.08 × Zardoya_Radius + ε_Radius

where

    ε_Radius ~ N(0, 0.0259²)

      The results can be seen in Figures 7.13 and 7.14 and Table 7.9.



    Set         ME         MAE       MSE       RMSE
    Training    0          0.4385    0.2273    0.4768
    Testing     -0.3426    0.3882    0.1655    0.4068

               Table 7.9: Error Measures for Centre and Radius Method


    As occurred with a direct relationship between the variables, the error measures are again lower with the Centre and Radius Method than with the Centre Method; thus, the former is better than the latter even when there is no clear relationship between the variables.




                 [Figure 7.13: Centre and Radius Method in training set (x-axis: Minimum and Maximum Zardoya prices)]


                 [Figure 7.14: Centre and Radius Method in testing set (x-axis: Minimum and Maximum Zardoya prices)]





    Now let us see what happens when the Bayesian methodology is introduced. Bearing in mind the previous Centre and Radius model, an expert could think that the situation will change drastically and assign the following prior parameters to the prior distribution explained in (5.36) for the midpoints:








    β0 = (3.1, 0.02)′
    V0 = 10⁻⁸
    s0² = 0.2882²
    v0 = 10⁶

    So the final Midpoint model would be:

    Dogi_Midpoint = 3.1 + 0.02 × Zardoya_Midpoint + ε_Midpoint


    And the following prior parameters to the prior distribution for the Radii:


                                \beta_0 = \begin{pmatrix} 0.0283 \\ 0.08 \end{pmatrix}, \quad
                                V_0 = 10^{6}, \quad
                                s_0^2 = 0.0259^2, \quad
                                v_0 = 4

    So the final Radius model would be:



                        Dogi_{Radius} = 0.0283 + 0.08 \times Zardoya_{Radius} + \epsilon_{Radius}
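    These two prior choices behave very differently, and it is worth making the reason explicit. Assuming
the prior in (5.36) is of the usual conjugate Normal-Inverse-Gamma form, the posterior mean of the
coefficients is a precision-weighted average of the prior mean and the least-squares estimate (a standard
identity, stated here as a reminder rather than derived):

                    \bar{\beta} = \left(V_0^{-1} + X'X\right)^{-1}\left(V_0^{-1}\beta_0 + X'X\,\hat{\beta}\right)

Hence the very small V_0 = 10^{-8} chosen for the Midpoints pins the posterior at the expert's
\beta_0 = (3.1, 0.02)', which is why the final Midpoint model reproduces the prior exactly, while the
diffuse V_0 = 10^{6} chosen for the Radii leaves the posterior essentially at the least-squares fit of the
classical Centre and Radius model.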

    The results for the testing set are shown in Figure 7.15 and Table 7.10.



                              Set       ME       MAE          MSE     RMSE


                            Testing   0.1031    0.2008       0.0443   0.2104


                 Table 7.10: Error Measures in Bayesian Centre and Radius Method







                        [Plot omitted; x-axis: Minimum and Maximum Zardoya prices]

                   Figure 7.15: Bayesian Centre and Radius Method in testing set



    The Bayesian Centre and Radius Method again performs better than the rest of the approaches, even in
unfavourable conditions.


    Therefore, we can conclude that the Bayesian Centre and Radius method has the same advantages
as the Centre and Radius method described by [DeCa07], while adding the advantages of the Bayesian
methodology; together these yield smaller errors in new predictions. An important future development
would be to build a Bayesian symbolic regression model with uniformly distributed errors.




Chapter 8

A Guide to Statistical Software Today

8.1 Introduction

Statistical software blends in one direction with relational database software such as Oracle or Sybase
(software we do not discuss here) and, in the other direction, with mathematical software such as
MATLAB. Mathematical software exhibits not only statistical capabilities flowing from code for matrix
manipulation, but also optimization and symbolic manipulation useful for statistical purposes. This
chapter assesses the state of the art of the statistical software arena as of 2007. It touches upon a few
commercial packages, a few general public license packages, a few analysis packages with statistical
add-ons, and a few general purpose languages with statistical libraries.


    We begin with the most important commercial packages, such as SAS, Minitab, BMDP, SPSS
or S-PLUS, followed by some of the public license statistical and Bayesian software such as R or
BUGS, and then some general purpose mathematical software and some general purpose programming
languages with statistical libraries.


    Finally, we describe the role of the developed application in the current statistical scene, highlighting
its main advantages and disadvantages.






8.2 Commercial Packages

8.2.1 The SAS System for Statistical Analysis

SAS began as a statistical analysis system in the late 1960s, growing out of a project in the Depart-
ment of Experimental Statistics at North Carolina State University. The SAS Institute was founded in
1976. Since that time, the SAS System has expanded to become an evolving system for complete data
management and analysis. This means that SAS is really much more than a simple software system.
As an example of its great potential, it is worth mentioning that it is used by 90 percent of the
companies on the Fortune 500 list. This expansion is probably due to the fact that SAS manage-
ment has aligned itself with the recent "statistical-like" advances within the computer science
community, such as data mining. This clever integration of mathematical/statistical methodologies,
database technology, and business applications has helped propel SAS to the top of the commercial
statistical software arena.


    The architecture for the SAS approach is called the SAS Intelligence Platform, which is really a
closely integrated set of hardware/software components that allow users to fully utilize the business
intelligence (BI) that can be extracted from their client base. Among the products making up the SAS
System are products for: management of large data bases; statistical analysis of time series; statistical
analysis of most classical statistical problems, including multivariate analysis, linear models (as well
as generalized linear models), and clustering; data visualization and plotting. Being more precise, the
SAS Intelligence Platform consists of the following components:

    • The SAS Enterprise ETL Servers

    • The SAS Intelligence Storage

    • The SAS Enterprise BI Server

    • The SAS Analytic Technologies

    One of the strengths of SAS is that the package containing the capabilities one normally associates
with a data analysis package is upgraded with each release to reflect the latest algorithmic
developments in the statistical field.


    The SAS System is available on PC and UNIX based platforms, as well as on mainframe com-
puters, so it covers the main options except Macintosh. As one could guess from what has been said
above, this system is aimed mainly at industrial, scientific and statistical users with high needs and
expertise, who do not mind spending time learning to use this complex system.


    Some useful URL’s are:



    • http://www.sas.com/ which is the main URL for SAS

    • http://is.rice.edu/ radam/prog.html which contains some user-developed tips on using SAS

    Other statistical systems which are of the same general vintage as SAS are MINITAB, BMDP and
SPSS. All of these systems began as mainframe systems, but have evolved to smaller scale systems
as computing has evolved.



8.2.2 Minitab

Minitab Inc. was formed more than 20 years ago around its flagship product, MINITAB statistical
software. MINITAB Statistical Software provides tools to analyze data across a variety of disciplines,
and is targeted for users at every level: scientists, business and industrial users, faculty, and students.


    In relation to the operating system, MINITAB is available on the most widely-used computer
platforms, including Windows, DOS, Macintosh, OpenVMS, and Unix.


    In contrast to SAS, MINITAB is quite easy to learn and use. There is no lengthy learning
process and little need for unwieldy manuals. This may be the main reason why MINITAB
is used so extensively in the educational community.


    For more details about this software visit the URL http://www.minitab.com/.



8.2.3 BMDP

BMDP has its roots as a bio-medical analysis package from the late 1960s. In many ways it has re-
mained true to its origins, as evidenced by its long list of clients, which includes such biomed-
ical giants as Bristol-Myers Squibb, Merck and Glaxo Wellcome. There are three main distributions:
BMDP New System Personal Edition, BMDP Classic for PCs - Release 7, and BMDP New
System Professional Edition. While the BMDP New System has an easy-to-use interface that makes data
analysis possible with simple point-and-click and fill-in-the-blank interactions, the Professional Edi-
tion combines the full suite of BMDP Classic for PCs Release 7 statistics with the powerful data
management and front-end data exploration features of the BMDP New System Personal Edition.


    A reference URL for BMDP is http://www.ppgsoft.com/bmdp00.html.


8.2.4 SPSS

SPSS is a multinational software company founded in the late 1960s that provides statistical product
and service solutions for survey research, marketing and sales analysis, quality improvement, scien-
tific research, government reporting and education.


    SPSS starts with the SPSS Base, which includes the most popular statistics, complete graphics, and
broad data management and reporting capabilities. The SPSS products form a modular system that
includes SPSS Professional Statistics, SPSS Advanced Statistics, SPSS Tables, SPSS Trends, SPSS
Categories, SPSS CHAID, SPSS LISREL 7, SPSS Developer's Kit, SPSS Exact Tests, Teleform, and
MapInfo. Although this software was originally designed for mainframe use, SPSS has adapted to
market demand and has releases for Windows, Mac and UNIX.


    A reference URL for SPSS is http://www.spss.com/.



8.2.5 S-PLUS

While there are many different packages for performing statistical analysis, one that offers some of
the greatest flexibility with regard to the implementation of user-defined functions and the customiza-
tion of one's environment is S-PLUS, one of the two implementations of the S language (R is the
other, reviewed later).


    S is an exceptionally well-developed tool for statistical research and analysis. S is especially
strong for statistical graphics, the output of data analysis through which both raw data and results are
displayed for both analysts and clients. S was originally developed at AT&T Bell Labs (recently split
into AT&T Laboratories and Lucent Bell Labs) by a team of researchers including Richard A. Becker,




John M. Chambers, Allan Wilks, William S. Cleveland and Trevor Hastie. The original description
of the S language, written by Becker, Chambers, and Wilks in 1988, was recognized with the
Association for Computing Machinery (ACM) 1998 Software System Award. The aim of the
language, as expressed by John Chambers, is "to turn ideas into software, quickly and faithfully".


    A good introduction to the application of S to statistical analysis problems is contained in [Cham92]
and [Cham83]. More recent work focusing on the statistical capabilities of the S-PLUS system can
be found in [Vena02].


    S-PLUS is manufactured and supported by the Statistical Sciences Corporation, now a division of
MathSoft. It runs on both PC and UNIX based platforms. In addition the company offers easy links
for the user to call S-PLUS from within C/FORTRAN or for the user to call C/FORTRAN compiled
functions within the S-PLUS environment. Statistical Sciences has made great efforts to keep the
software current with regard to the needs of the statistical community. They have released dedicated
modules which are targeted at specific application areas.


    The S-PLUS home page can be reached at http://www.mathsoft.com/. This site contains an inter-
esting comparison between SAS and S-PLUS.



8.2.6 Others

Other statistically oriented packages enjoying good reputations are SYSTAT, DataDesk, JMP and
StatGraphics. SYSTAT originated as a PC-based package developed by Leland Wilkinson and is now
owned by SPSS; the current version is 6.0, a Microsoft Windows oriented product. By contrast,
DataDesk is a Macintosh-based product authored by Paul Velleman from Cornell University. The
current release is version 5.0.1, a GUI-based product which contains many innovative graphical data
analysis and statistical analysis features. More information about DataDesk can be found at the URL
http://www.lightlink.com/datadesk/. JMP is another SAS product that is highly visualization oriented.
It is a stand-alone product for PC and Macintosh platforms. Information on JMP can be found at
http://www.sas.com/. StatGraphics is education-oriented statistical software used mainly in universities,
offering a user-friendly interface. A good reference showing how to use StatGraphics can be found in
[Maté95].






8.3 Public License Packages

8.3.1 R

R is an Open Source implementation of the well-known S language which originated at the Uni-
versity of Auckland, New Zealand, in the early 1990s. It works on multiple computing platforms such
as Unix systems or Windows, but its most important characteristic is that a software system under
the Open Source paradigm benefits from having "many pairs of eyes" examining the software, which
helps ensure its quality. An example of the rapid development of this software is that in 1997, only
two years after the public release in June 1995, the leading team had to select a Core group of
around 10 members, which was responsible for changes to the source code.


    R software, for the most part, is a command-line based language which is organized into vari-
ous packages. Basic packages are installed by default, and the user can download and install a great
variety of additional packages. There are also several major projects that are "R spin-offs", such as
"Bioconductor", an R package for gene expression analysis, or "Omega", another package focused
on providing a seamless interface between R and a number of other languages (PERL, PYTHON,
MATLAB). Two main packages have to be mentioned because of their importance to this project:
JRI and bayesm. The first deals with the problem of communicating Java with R; this lets us create
a graphical user interface using Swing in Java and make all the statistical calculations with R. The
second, developed by [Rossi06], contains the main functions to be used in Bayesian analysis. It is
precisely in Bayesian data analysis where R can outperform other statistical software.
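
    To make this combination concrete, the following minimal sketch (hypothetical code, not taken from
BARESIMDA; it assumes R, the JRI native library and bayesm are installed and reachable from the JVM)
runs a univariate Bayesian regression with bayesm's runireg function from Java and reads the posterior
means back:

    import org.rosuda.JRI.REXP;
    import org.rosuda.JRI.Rengine;

    public class JriBayesmSketch {
        public static void main(String[] args) {
            // R runs inside the JVM as a single thread; null callbacks means
            // R console output is simply discarded in this sketch.
            Rengine engine = new Rengine(new String[] {"--no-save"}, false, null);
            if (!engine.waitForR()) {
                System.err.println("Cannot start R");
                return;
            }
            engine.eval("library(bayesm)");
            // Toy data built on the R side; in a real application the data
            // would come from the Java GUI instead.
            engine.eval("set.seed(1); x <- rnorm(50); y <- 1 + 2*x + rnorm(50)");
            engine.eval("out <- runireg(Data = list(y = y, X = cbind(1, x)), " +
                        "Mcmc = list(R = 2000))");
            // Bring the posterior means of the coefficients back into Java.
            REXP means = engine.eval("colMeans(out$betadraw)");
            double[] beta = means.asDoubleArray();
            System.out.println("Posterior mean intercept: " + beta[0]);
            System.out.println("Posterior mean slope: " + beta[1]);
            engine.end();
        }
    }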


    More information about R can be found at http://www.r-project.org/.



8.3.2 BUGS

The BUGS (Bayesian inference Using Gibbs Sampling) project is concerned with flexible software
for the Bayesian analysis of complex statistical models using Markov chain Monte Carlo methods.
The project began in 1989 in the MRC Biostatistics Unit and led initially to the "Classic" BUGS
program, and then on to the WinBUGS software developed jointly with the Imperial College School
of Medicine at St. Mary's, London. Development now also includes the OpenBUGS project at the
University of Helsinki, Finland.






    The main advantage of this software is, as with R, the flexibility it offers the researcher to
model whatever he needs, but it is slightly more complex to learn than R. For this reason, Phil
Woodward developed BugsXLA, an Excel add-in that not only allows the user to specify
a model as one would in a package such as SAS or S-PLUS, but also aids the specification of priors
and the control of the MCMC run itself.


    More information can be found at http://www.mrc-bsu.cam.ac.uk/bugs/.




8.4      Analysis Packages with Statistical Libraries

8.4.1 Matlab

MATLAB is an interactive computing environment that can be used for scientific and statistical data
analysis and visualization. The basic data object in MATLAB is the matrix. The user can per-
form numerical analysis, signal processing, image processing and statistics on matrices, thus freeing
the user from programming considerations inherent in other programming languages such as C and
FORTRAN. There are versions of MATLAB for Unix platforms, PCs running Microsoft Windows,
and Macintosh. Because the functions are platform independent, MATLAB provides the user with
maximum reusability of their work.


    MATLAB comes with many functions for basic data analysis and graphics. Most of these are
written as M-file functions, which are basically text files that the user can read and adapt for other
uses. The user also has the ability to create their own M-file functions and script files, thus making
MATLAB a programming language. The recent addition of the MATLAB C-Compiler and C-Math
Library allows the user to generate executable code from their MATLAB library of functions, yielding
faster execution times and stand-alone applications.


    For researchers who need more specific functionality, MATLAB offers several modules or tool-
boxes. These typically focus on areas that might not be of interest to the general scientific community.
Basically, the toolboxes are a collection of M-file functions that implement algorithms and functions
common to an area of interest.






    One of the most useful capabilities of MATLAB is the set of tools available for visualizing data.
MATLAB supports standard two- and three-dimensional scatter plots along with surface plots. In
addition, it provides the user with a graphics property editor. As with R, there is a considerable amount
of contributed MATLAB code available on the internet. One notably useful source of code is the home
page for MATLAB at http://www.mathworks.com/, where more information about this software can
be found.



8.4.2 Mathematica

Mathematica is a computer algebra system developed originally by Stephen Wolfram and sold
by his company, Wolfram Research. It has numerical and graphical features and powerful symbolic
processing capabilities, but is comparatively complex to learn. Information on Mathematica is avail-
able at the URL http://www.wolfram.com/.



8.4.3 Others

Other mathematical software worth noting is MAPLE, with powerful symbolic processing capabili-
ties, and MATHCAD, a package which combines numerical, symbolic, and graphical features. More
information about these packages can be found at their official web sites:

    • http://www.maplesoft.com/

    • http://www.mathsoft.com/


8.5 Some General Languages with Statistical Libraries

8.5.1 Java

It is difficult to assess the state of the art with regard to Java statistical libraries, in that there may be
many custom user-developed packages that we are unaware of. Given this caveat, there are three main
packages to mention.


    The first one is StatCrunch, which provides the user with the capability to perform interactive
exploratory data analysis, logistic regression, nonparametric procedures, regression and regression
diagnostics, and others. The reader is referred to a review that appeared in [West04].


    Another source of Java-based statistics functions is the Apache Software Foundation Jakarta
math project. The math project seeks to provide common mathematical functionality to the Java user
community.


    The final source for Java-based statistical analysis is the Visual Numerics JMSL package. It
provides the user with an integrated set of statistical, visualization, data mining, neural network, and
numerical packages. The reader is referred to http://www.vni.com/products/imsl/jmsl/jmsl.html for
additional details on JMSL.



8.5.2 C++

C++ is another object-oriented programming language, like Java, with various statistical libraries.
Two libraries are worth mentioning: Goose, and Probability and Statistics.


    The first one is dedicated to statistical computation, and provides support for t-tests, F-tests,
Kruskal-Wallis tests, Spearman tests and others, with an implementation of simple linear regression
models. More information can be found at http://www.gnu.org/software/goose/goose.html.


    The second one is aimed at Microsoft Windows developers and consists of five packages: statis-
tics, discrete probability, standard probability distributions, hypothesis testing, and correlation and
regression. A strength of these modules is their support for various interfaces, including C# and C++
.NET. The reader is referred to the URL http://www.webcabcomponents.com/dotNET/dotnet/pss/.




8.6 Developed Software Tool: BARESIMDA

The software tool developed throughout this project, as has been said above, is based on Java and
R, both public license software. It has not been developed with the intention of creating
a complete statistical package that could be an alternative to any of the above software. Evidently,
it is very difficult to incorporate all the facilities that those programs have, much less in a one-year
period with a single developer. In fact, BARESIMDA focuses only on regression analysis




procedures with different approaches and data. In that sense, the developed tool gathers classical and
Bayesian regression and lets the user analyze Normal regression models in a very simple way through
a very intuitive graphical user interface. This is a particularly important feature in the Bayesian
approach, which rests on a complex theoretical basis that many users may not be familiar with.


    Another advantage, maybe the most important one, over the rest of the statistical packages is that
BARESIMDA incorporates regression analysis with interval data in both the classical and Bayesian
approaches. Not only does it display the analytical results, but it also lets us see graphically the
goodness of fit and the centre and radius tendencies.


    With this first version of BARESIMDA, we have wanted to start down the road towards public license
software that takes advantage of both the Java graphical user interface with Swing and the
statistical libraries in R.




Chapter 9

Software Requirements Specification

This chapter provides a complete description of the functions to be performed by the BARESIMDA
software, assisting potential users in determining whether the specified software meets their needs,
or how it must be modified to do so.


   This also reduces the development effort since the preparation of the Software Requirements
Specification (SRS) forces the developer to consider rigorously all of the requirements before de-
sign begins and reduces later redesign, recoding, and retesting. Careful review of the requirements
in the SRS can reveal omissions, misunderstandings, and inconsistencies early in the development
cycle when these problems are easy to correct. Likewise it provides a basis for estimating costs and
schedules and a baseline for verification and validation.


9.1 Purpose

The aim of this system is to provide a tool to build different types of regression analyses and to check
the advantages and disadvantages of each approach that has been developed.


9.2 Intended Audience

The software is intended to be handled by different types of users, such as:

    • Inexperienced people who have minimal knowledge about what regression is and what it con-
      sists of.

    • Students and people with a medium degree of knowledge about regression and minimal infor-
      mation about the Bayesian paradigm.

    • Graduate and experienced people who have deep knowledge about regression and Bayesian
      analysis and want to learn about symbolic regression.


9.3 Functionality

The software must provide the functionality described in the following points.


9.3.1 Classical Regression with crisp data

This refers to analytic and graphical analysis of multiple and simple classical Normal regression
models with crisp data. To be precise, the software has to provide the following facilities:

    • Regression analysis summary with estimated parameters

    • ANOVA table.

    • Normality test.

    • Heteroscedasticity test.

    • Autocorrelated errors test.

    • To predict new data.

    • Complementary graphics to see the fitted model.


9.3.2 Classical Regression with interval-valued data

As with crisp data, regression analysis must be carried out with symbolic data, specifically with
interval-valued data. All the functions described previously must be implemented for the centre and
radius regressions. In addition, the software will display graphically the adequacy of the fitted model
to the original interval-valued data.






9.3.3 Bayesian Regression with crisp data

The user must be capable of creating two different Bayesian models: Normal and Independent Nor-
mal. Since the main characteristic of the Bayesian paradigm is the possibility of introducing subjective
information, the application will provide a very intuitive dialog to retrieve the user's beliefs about the
different parameters. The software will display the estimated parameters, provide a Normality test
for residuals, and offer input fields to make new predictions.


9.3.4 Bayesian Regression with interval-valued data

As with classical regression, it must be possible to carry out Bayesian regression with interval-valued
data, so the user will be able to incorporate prior information about the centres and the radii. The
analysis options are the same as those for crisp data, with additional graphics to assess the adequacy
of the fitted interval-valued data to the observed data.


9.3.5 Data Manipulation

The user will be able to type in new data by hand or to load an existing Excel file into the application.
In the same way, he will be able to save to an Excel file both the source data and the following
resulting data:

    • Residuals

    • Normalized residuals

    • Studentized residuals

    • Fitted values

    • Predicted values


9.3.6 Portability

The application must be able to be executed on the main platforms, such as Windows, Linux and Unix.


9.3.7 Maintainability

In the same way, the tool must be well structured to be easily maintainable since changes and exten-
sions in the future are quite probable.





9.4 External Interfaces

9.4.1 User Interfaces

The application to be developed will have a Multiple Document Interface (MDI) with a high degree
of usability. The former means that its windows will reside under a single parent window, as Figure
9.1 shows.




                                        Figure 9.1: BARESIMDA MDI



    This will avoid filling up the operating system task management interface, as the windows are hierarchi-
cally organized, and it will let the user hide/show/minimize/maximize them as a whole.
    The second characteristic means that the user will not have to think too much about what the
application does or how it does it.
    There will be an option to configure the application look to be able to be adapted to user’s prefer-
ences. The user will have the possibility to set the windows look as:

    • Unix

    • Windows




    • Windows Classic

    • Java

    In the same way, the user will be able to indicate whether he or she is an experienced or an inexperienced
user, which will help him or her specify prior information in Bayesian regression.
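
    As a hypothetical illustration of how Swing supports the MDI behaviour described above (this is
not BARESIMDA's actual code; the class names, titles and sizes are made up), a parent JFrame hosts
a JDesktopPane in which the child JInternalFrame windows live, and the window look is selected
through UIManager, mirroring the look options listed above:

    import javax.swing.JDesktopPane;
    import javax.swing.JFrame;
    import javax.swing.JInternalFrame;
    import javax.swing.UIManager;

    public class MdiSketch {
        public static void main(String[] args) throws Exception {
            // Select one of the installed window looks, as the options above do.
            UIManager.setLookAndFeel(UIManager.getSystemLookAndFeelClassName());
            JFrame parent = new JFrame("MDI sketch");
            JDesktopPane desktop = new JDesktopPane();   // single parent window
            parent.setContentPane(desktop);
            // Child windows live inside the desktop, not on the OS task bar,
            // so they can be hidden/shown/minimized/maximized as a whole.
            JInternalFrame child =
                new JInternalFrame("Regression", true, true, true, true);
            child.setSize(300, 200);
            child.setVisible(true);
            desktop.add(child);
            parent.setSize(600, 400);
            parent.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            parent.setVisible(true);
        }
    }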


9.4.2 Software Interfaces

BARESIMDA will connect to a statistics package which will be responsible for making all compu-
tations and returning the results to BARESIMDA. In this way, all the operations must be transparent
to the end user through an interface which lets both programs interact. This makes the application
more usable.


    Regarding input and output data, an interface will be necessary to read from and write to Excel
files.




Chapter 10

Software Architecture Study

10.1 Hardware/Software Architecture

The application will be programmed in Java and built, executed and tested with SDK version 1.4.2
or later. Specifically, the graphical user interface will be developed using Swing, one of the most
powerful tools for developing user-friendly mechanisms for interacting with an application, giving it
a distinctive "look" and "feel". Its libraries are part of the Java Foundation Classes (JFC), Java's
libraries for cross-platform GUI development. For more information on JFC visit
http://java.sun.com/products/jfc/. This lets us develop the main interface on a particular system
which can then be executed on any platform, allowing users of different operating systems to use the
look and feel of their own platform.


    The software chosen to carry out the statistical processing is R, since it is distributed under a
public license, like Java; it gives the developer a high degree of flexibility to program the models he
wants to build; and it is expanding rapidly among statisticians and scientists.


    The way BARESIMDA and R communicate is through the Java to R Interface, called JRI. This
is a .jar library which can be obtained from http://rosuda.org/JRI/ and allows running R inside Java
applications as a single thread. Basically, it loads the R dynamic library into Java and provides a Java
API to R functionality. JRI uses native code, but it supports all platforms where Sun's Java (or
compatible) is available, including Windows, Mac OS X, Sun and Linux. More information about
this interface can be found at the reference cited above.






                          Figure 10.1: Interface between BARESIMDA and R


    As indicated in the previous chapter, BARESIMDA is required to read and write Excel files. For
this purpose, the POI project consists of various parts that fit together to deliver the data in a Microsoft
file format to the Java application. Specifically, and according to our requirements, HSSF is the POI
project's pure Java implementation of the Excel file format. It provides a way to create, modify, read
and write XLS spreadsheets. More precisely, it offers:

    • Low level structures for those with special needs

    • An event-model API for efficient read-only access.

    • A full user model API for creating, reading and modifying XLS files.

    Visit http://jakarta.apache.org/poi/hssf/index.html for more information.
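
    As an illustration only (a hypothetical sketch, not BARESIMDA's actual code, assuming an input
file data.xls whose first column holds numeric values), the HSSF user model API allows an XLS
round trip in a few lines:

    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import org.apache.poi.hssf.usermodel.HSSFCell;
    import org.apache.poi.hssf.usermodel.HSSFRow;
    import org.apache.poi.hssf.usermodel.HSSFSheet;
    import org.apache.poi.hssf.usermodel.HSSFWorkbook;

    public class XlsRoundTripSketch {
        public static void main(String[] args) throws Exception {
            // Read the first sheet of an existing XLS file.
            HSSFWorkbook book = new HSSFWorkbook(new FileInputStream("data.xls"));
            HSSFSheet sheet = book.getSheetAt(0);
            for (int i = 0; i <= sheet.getLastRowNum(); i++) {
                HSSFRow row = sheet.getRow(i);
                if (row == null) continue;
                HSSFCell observed = row.getCell((short) 0);
                if (observed == null) continue;
                // Write a second column next to the observed values; a real
                // application would store fitted values or residuals here.
                HSSFCell result = row.createCell((short) 1);
                result.setCellValue(observed.getNumericCellValue());
            }
            // Save the modified workbook under a new name.
            FileOutputStream out = new FileOutputStream("results.xls");
            book.write(out);
            out.close();
        }
    }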


10.2 Logical Architecture

The application will be structured in three levels or layers, each of which will have a well-defined
responsibility:





                        Figure 10.2: Interface between BARESIMDA and Excel



    • gui: it will be responsible for showing the graphical user interface and for getting the input
      parameters and requests and passing them to the classes which will process them.

    • action: it will contain the main procedures that treat the information and elaborate the regres-
      sion models and analysis. The results will be given back to the caller process. It will be
      responsible for calling the dao classes too.

    • dao: it will be responsible for accessing permanent data, that is, for loading and saving informa-
      tion.

    Figure 10.3 shows the relation among these packages.




                                  Figure 10.3: Logical Architecture
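
    To make this layering concrete, the following minimal sketch shows the intended call chain between
the three packages (the class and method names are illustrative assumptions only, not BARESIMDA's
actual design):

    public class LayeringSketch {
        // dao layer: the only code allowed to touch permanent data.
        interface DataDao {
            double[][] load(String xlsPath);
            void save(String xlsPath, double[][] data);
        }

        // action layer: builds the regression models and returns the results.
        interface RegressionAction {
            double[] fit(double[][] data);
        }

        // gui layer: collects the request and delegates downwards.
        static void onFitRequested(RegressionAction action, DataDao dao) {
            double[][] data = dao.load("data.xls");
            double[] coefficients = action.fit(data);
            System.out.println("Display " + coefficients.length + " coefficients");
        }
    }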




Chapter 11

Project Budget

Project costs for this system have been divided into two types of costs, which will be commented on
in the following sections:

   • Engineering costs.

   • Investment and Materials Costs.

    There is also a section summarizing the entire expected budget for the project. There is no
commercial cost, since the result is intended to be public license software for free distribution.


11.1 Engineering Costs

A computer engineer working in the environment on which the project is focused is expected to earn
around 2500 €/month. There is an additional extra cost of 30% for Social Security contributions.


    The programmer works 8 hours/day, an average of 22 days/month. This makes an average of 176
hours/month. Thus, the cost per hour is 18.46 €/h.


    The estimated time required for the development of the project is divided into the work packages
explained at the beginning of this project:

   • Bayesian Data Analysis: 168 hours

   • Regression Models: 160 hours.




    • Symbolic Data: 64 hours.

    • Requirements Specification: 40 hours.

    • Architecture Study: 56 hours.

    • Design: 80 hours.

    • Programming: 416 hours.

    • Testing: 40 hours.

    The estimated time required for the project is 1024 hours (5 months and 18 days). Thus, the
estimated engineering cost is 18903.04 €.
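
    Restating the figures above as explicit calculations (no new assumptions, just the numbers already
given):

        \text{cost per hour} = \frac{2500 \times 1.30}{22 \times 8} \approx 18.46 \text{ €/h}, \qquad
        \text{engineering cost} = 1024 \times 18.46 = 18903.04 \text{ €}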


11.2 Investment and Elements Costs

The elements used for the development of this project have been computer and software equipment.
These costs can be seen in Table 11.1.


                     Element                                                 Price

                     Pentium D925 at 3 GHz                                   630 €

                     Other expenses (Internet connection, office materials)   60 €

                     Total                                                   690 €


                                 Table 11.1: Estimated material costs


    The amortization period for this type of element is considered to be complete after 10000 work-
ing hours. Moreover, the usage rate is considered to be about 85% of the engineering work hours,
thus obtaining the results shown in Table 11.2.








                           Concept                                  Total

                           Hours of use of the material          870.4 hours

                           Resources cost/hour                     0.19 €/h

                           Total amortization materials cost      165.38 €


                                     Table 11.2: Amortization Costs


    Thus, the sum of the engineering and material costs is 19068.42 €. It can be assumed that the
investment made is about 5% of the engineering cost, so the investment cost amounts to 945.15 €.


    Therefore, the total cost of the project, which is the sum of the engineering, materials, and invest-
ment costs, is estimated to be 20013.57 €.


11.2.1 Summarized Budget

The overall expected budget can be observed in Table 11.3.








                       Cost                Total

                        Engineering      18903.04 €

                        Material          165.38 €

                        Investment        945.15 €

                       Total             20013.57 €


                     Table 11.3: Summarized Budget




Chapter 12

Conclusions

12.1 Bayesian Regression applied to Symbolic Data

Dealing with a current research topic such as symbolic data requires a high level of English, since it
is the universal language of research. On the other hand, a project like this that depends on ongoing
research progresses with more difficulty, since it does not deal with an established subject.


    Good research requires rigorous documentation and a complete bibliography. There must be
enough well-cited references to let the reader find more information about the points of interest to
him or her.


    Bayesian methodology is called to be a fundamental element in business processes oriented towards
predicting and forecasting new situations and quantities. Although I have really enjoyed this
project, I suspect that, with a more complete prior background in Bayesian data analysis, I could
have saved some of the initial time spent learning concepts that later turn out to be obvious. This would
have let me extend the project to other fields such as regression with hierarchical models or nonparametric
Bayesian regression, where the authentic Bayesian potential resides. However, the more one knows
about a subject, the more one likes it and the more one wants to learn about it, so the problem would
never end. In this regard, the project has met and exceeded my initial personal expectations, arousing
a great interest in the research field and teaching me to value this hard but exciting arena.






    If I could change anything about the project planning, I would have tried to condense the
study stage in order to spend more time applying the software tool to more real problems
and situations. Nevertheless, this would be difficult to carry out, since the project is developed
within an academic year in which other activities also take place.


12.2 BARESIMDA Software Tool

Fortunately, public license software is growing enormously. This gives everybody more options
to choose from.


    In that sense, R is a great tool for programming new models, but it requires, on the one hand, very
high statistical knowledge, since the requirements of people with a low to medium statistics level are
already satisfied by current statistical software. On the other hand, it requires a medium programming
level to be able to carry out one's ideas. Moreover, the way in which R handles data turns out to be
tedious for someone used to working with matrix representations.


    Interconnecting different interfaces or applications is usually a difficult task, especially when there
is very little documentation to establish the connection on both sides. This problem is very important,
and it is not usually taken into consideration when integrating different environments.


    Concerning Java, the possibilities and facilities that this programming language offers are really
incredible. They make the programming task much easier.




12.3 Future Developments

As can be deduced from what has been said above and in previous chapters, the project could
have many different extensions. The most important ones are:

    • Bayesian regression with hierarchical models for interval-valued data.

    • Bayesian time series for interval-valued data.

    • Bayesian linear regression for histogram-valued data.

    • Nonparametric Bayesian regression for interval-valued data.



    • Bayesian vector autoregressive models for interval-valued data.

    • Bayesian regression for functional data.

    • Bayesian symbolic regression with uniformly distributed errors.

    Likewise, the software tool can be improved by adding some conventional statistical functions in
order to obtain public license statistical software with a user-friendly graphical interface.




12.4 Summary

On the one hand, we have built a new Bayesian regression model for interval-valued data which fits
better than other existing approaches, provided that the prior information is accurate. As has been shown,
this works well both for directly related variables and for uncorrelated variables. This is
an important advance in the symbolic data field, since to the best of our knowledge there is no other
Bayesian approach for this kind of data.


    On the other hand, a new software tool letting the user perform Bayesian symbolic regression has been
developed. Again, to the best of our knowledge, there is no other package with the same user-friendly
interface and the same facilities. Furthermore, it offers the possibility of performing both standard and
Bayesian regression with either classical or symbolic data.


    As a result of this project, the author and the director are working together on a paper about the past,
present and future of regression, which is intended to be sent to ANALES. In the same way, another
article about Bayesian symbolic regression is in mind for a more prominent journal such as
Computational Statistics and Data Analysis (CSDA).




Appendix A

Probability Distributions

A number of probability distributions, together with their density or probability mass functions, means
and variances, have been used or mentioned previously. For ease of reference, their definitions are
regrouped in this appendix, together with a short discussion of their key properties. More information
about these distributions in a Bayesian context can be found in [Gelm04] or [Maté93].


A.1 Discrete Distributions

A.1.1 Binomial

The Binomial distribution is perhaps the most commonly encountered discrete distribution in Statis-
tics, and it is used in quality control by attributes and sampling techniques with replacement. Consider
a sequence of n independent trials, each of which can result in one of just two possible outcomes,
namely success and failure. Further assume that the probability of success, p, is the same for each
trial. Let Y denote the number of successes observed in the n trials. Then Y has a Binomial distri-
bution with parameters n and p. Properly, a discrete random variable, Y , has a Binomial distribution
with parameters n and p, denoted Y \sim Bin(n, p), if its probability mass function is given by:


                                      f(y | n, p) = \binom{n}{y} p^{y} (1 - p)^{n-y}                (A.1)

    where n > 0, y = 0, 1, \ldots, n and 0 \le p \le 1.
    Likewise, the mean and variance are given by:








                                                   E(Y) = np                                        (A.2)
                                       Var(Y) = np(1 - p)                                           (A.3)


A.1.2 Geometric

The Geometric distribution is related in a certain way to the previous one. Consider the same
situation as before: a sequence of independent trials with a constant success probability p
in each trial. In this case the number of trials varies until the first success is obtained; that is, the
distribution models the number of trials until the first success, and it is common in reliability
analysis. Formally, a discrete random variable, Y, has a Geometric distribution with parameter p,
denoted Y \sim Geo(p), if its probability mass function is given by:


                                       f(y | p) = (1 - p)^{y-1} p                                   (A.4)

where 0 < p \le 1 and y = 1, 2, \ldots


    In the same way, the mean and variance are given by:

                                       E(Y) = \frac{1}{p}                                           (A.5)
                                       Var(Y) = \frac{1-p}{p^2}                                     (A.6)

A.1.3 Poisson

The Poisson distribution is commonly used to represent count data, such as the number of shares sold
in a fixed time period. It is also usual to see it in reliability analysis. Strictly, a discrete
random variable, Y, has a Poisson distribution with parameter \lambda, denoted Y \sim P(\lambda), if its probability
mass function is given by:


                                       f(y | \lambda) = \frac{e^{-\lambda} \lambda^{y}}{y!}         (A.7)

where \lambda \ge 0 and y = 0, 1, 2, \ldots

    In the same way, the mean and variance are given by:

                                       E(Y) = \lambda                                               (A.8)
                                       Var(Y) = \lambda                                             (A.9)


A.2 Continuous Distributions

A.2.1 Uniform

The Uniform distribution is used to represent a variable that is known to lie in an interval and is equally
likely to be found anywhere in the interval. A key property is that if a variable, X, has a continuous
distribution function F(x), then the variable Y = F(X) is uniform on the interval [0, 1]. Properly,
a continuous random variable, Y, has a Uniform distribution over the interval [a, b], denoted Y \sim
U(a, b), if its probability density function is given by:


                                 f(y | a, b) = \begin{cases} \frac{1}{b-a} & a \le y \le b \\ 0 & \text{otherwise} \end{cases}          (A.10)

where -\infty < a < b < \infty.
    The mean and variance are specified alike by:

                                       E(Y) = \frac{a+b}{2}                                         (A.11)
                                       Var(Y) = \frac{(b-a)^2}{12}                                  (A.12)

A.2.2 Univariate Normal

The Normal distribution, also called Gaussian distribution, is ubiquitous in statistical work. It is a
family of distributions of the same general form, differing in their location and scale parameters: the
mean and standard deviation, respectively. The standard normal distribution is the normal distribution
with a mean of zero and a variance of one. Formally, a continuous random variable, Y, has a Normal
distribution with mean \mu and variance \sigma^2, denoted Y \sim N(\mu, \sigma^2), if its probability density function
is given by:


                                 f(y | \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y-\mu)^2}{2\sigma^2}\right)     (A.13)

where \sigma^2 > 0, -\infty < \mu < \infty and y \in \mathbb{R}.


    Likewise, the mean and variance are formulated by:


                                       E(Y) = \mu                                                   (A.14)
                                       Var(Y) = \sigma^2                                            (A.15)


A.2.3 Exponential

This distribution is used to model the time, t, between independent events that happen at a constant
rate, \lambda. Therefore, it is the distribution of waiting times for the next event in a Poisson process, and
it is a special case of the Gamma distribution with \alpha = 1. Formally, a continuous random variable,
Y, has an Exponential distribution with parameter \lambda, denoted Y \sim Exp(\lambda), if its probability density
function is given by:


                                       f(y | \lambda) = \lambda e^{-\lambda y}                      (A.16)

where \lambda \ge 0 and y \ge 0.
    Similarly, the mean and variance are identified by:

                                       E(Y) = \frac{1}{\lambda}                                     (A.17)
                                       Var(Y) = \frac{1}{\lambda^2}                                 (A.18)
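
    As a quick check of the claim above, substituting \alpha = 1 and \beta = 1/\lambda into the Gamma
density (A.19) of the next subsection recovers (A.16):

        f(y | \alpha = 1, \beta = 1/\lambda) = \frac{y^{1-1} \exp(-y / (1/\lambda))}{(1/\lambda)^{1}\,\Gamma(1)} = \lambda e^{-\lambda y}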


A.2.4 Gamma

A Gamma distribution is a general type of statistical distribution that is related to the Beta distribution
and arises naturally in processes for which the waiting times between Poisson distributed events are
relevant.


    In the Bayesian context, the Gamma distribution is the conjugate prior distribution for the inverse of
the normal variance and for the mean parameter of the Poisson distribution.






    In a formal way, a continuous random variable, Y, has a Gamma distribution with shape and scale
parameters \alpha and \beta, respectively, denoted Y \sim Gamma(\alpha, \beta), if its probability density function is
given by:

                                 f(y | \alpha, \beta) = \frac{y^{\alpha-1} \exp(-y/\beta)}{\beta^{\alpha}\,\Gamma(\alpha)}              (A.19)

where \alpha > 0, \beta > 0 and y > 0.
    Similarly, the mean and variance are identified by:


                                       E(Y) = \alpha\beta                                           (A.20)
                                       Var(Y) = \alpha\beta^2                                       (A.21)


A.2.5 Inverse-Gamma

If Y^{-1} has a Gamma distribution with parameters \alpha and \beta, then Y has the Inverse-Gamma distribu-
tion. In a Bayesian context, this distribution is the conjugate prior distribution for the normal variance.


    Formally, a continuous random variable, Y, has an Inverse-Gamma distribution with shape and
scale parameters \alpha and \beta, respectively, denoted Y \sim Inv\text{-}Gamma(\alpha, \beta), if its probability density
function is given by:


                                 f(y | \alpha, \beta) = \frac{\beta^{\alpha} y^{-\alpha-1} \exp(-\beta/y)}{\Gamma(\alpha)}              (A.22)

where \alpha > 0, \beta > 0 and y > 0.
    Similarly, the mean and variance are identified by:

                                       E(Y) = \frac{\beta}{\alpha-1}, \quad \alpha > 1              (A.23)
                                       Var(Y) = \frac{\beta^2}{(\alpha-1)^2(\alpha-2)}, \quad \alpha > 2        (A.24)

A.2.6 Chi-square

It is an essential distribution in inferential Statistics and in goodness-of-fit tests. The \chi^2_v distribution is a
special case of the Gamma distribution, with shape parameter \alpha = v/2 and scale parameter \beta = 2.
Since it is a special case, we need not define the density function, mean and variance again, as they
can be deduced easily from the Gamma distribution.


A.2.7 Inverse Chi-square and Scaled Inverse Chi-square

As the \chi^2 distribution is a special case of the Gamma distribution, the inverse \chi^2 distribution is a
special case of the Inverse-Gamma distribution, with shape parameter \alpha = v/2 and scale parameter
\beta = 1/2; so, for its density function, mean and variance, see the Inverse-Gamma distribution. We also
define the scaled inverse \chi^2 distribution, which is useful for variance parameters in normal models.
A continuous random variable, Y, has a scaled inverse \chi^2 distribution with v degrees of freedom and
scale s, denoted Y \sim Scaled\,Inv\text{-}\chi^2(v, s^2), if its probability density function is given by:

                                 f(y | v, s) = \frac{(v/2)^{v/2}}{\Gamma(v/2)} s^{v} y^{-(v/2+1)} \exp\left(-\frac{v s^2}{2y}\right)    (A.25)

    The mean and variance are defined by:

                                       E(Y) = \frac{v}{v-2} s^2, \quad v > 2                        (A.26)
                                       Var(Y) = \frac{2v^2}{(v-2)^2(v-4)} s^4, \quad v > 4          (A.27)

    Note that this is the same as Inv\text{-}Gamma(\alpha = v/2, \beta = v s^2/2).


A.2.8 Univariate Student-t

The Student's t-distribution is a probability distribution that arises in the problem of estimating the
mean of a normally distributed population when the sample size is small. In regression analysis, it is
used to represent the posterior predictive distribution in Normal regression. As an anecdote, it is worth
mentioning that this distribution was published by William Gosset in 1908, but he was not allowed
to bring it out under his own name, so the paper was written under the pseudonym Student. Strictly,
a continuous random variable, Y, has a Student's t-distribution with v degrees of freedom, denoted
Y \sim t(v), if its probability density function is given by:


                                 f(y | v) = \frac{\Gamma\left(\frac{v+1}{2}\right)}{\sqrt{v\pi}\,\Gamma\left(\frac{v}{2}\right)} \left(1 + \frac{y^2}{v}\right)^{-\frac{v+1}{2}}        (A.28)

where v > 0 and y \in \mathbb{R}.


    In the same way, the mean and variance are identified by:


                                       E(Y) = 0, \quad v > 1                                        (A.29)
                                       Var(Y) = \frac{v}{v-2}, \quad v > 2                          (A.30)

A.2.9 Beta

In probability theory and statistics, the Beta distribution is a family of continuous distributions de-
fined on the interval [0, 1], differing in the values of their two non-negative shape parameters, \alpha and
\beta. In the Bayesian context, the Beta is the conjugate prior distribution for the binomial probability. A
continuous random variable, Y, has a Beta distribution with parameters \alpha and \beta, denoted Y \sim Beta(\alpha, \beta),
if its probability density function is given by:


                                 f(y | \alpha, \beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)} y^{\alpha-1} (1-y)^{\beta-1}          (A.31)

where \alpha > 0 and \beta > 0.


    The mean and variance are identified by:

                                       E(Y) = \frac{\alpha}{\alpha+\beta}                           (A.32)
                                       Var(Y) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}        (A.33)

A.2.10 Multivariate Normal

The multivariate Normal distribution extends the univariate Normal distribution to vector
observations. A p-dimensional vector of continuous random variables, Y = (Y_1, Y_2, \ldots, Y_p)', is said
to have a multivariate Normal distribution with mean vector \mu and variance-covariance matrix \Sigma
if its probability density function is given by:


                                 f(y) = (2\pi)^{-p/2} |\Sigma|^{-1/2} \exp\left[-\frac{1}{2}(y-\mu)'\Sigma^{-1}(y-\mu)\right]           (A.34)

    Likewise, the mean and variance are formulated by:

                                       E(Y) = \mu                                                   (A.35)
                                       Var(Y) = \Sigma                                              (A.36)


A.2.11 Multivariate Student- t

It is the multivariate generalization of the Student's t-distribution. Rigorously, a continuous random variable, Y, has a multivariate Student's t-distribution with v degrees of freedom, location µ = (µ₁, . . . , µ_d) and symmetric, positive definite d × d scale matrix Σ, denoted Y ∼ t(v, µ, Σ), if its probability density function is given by:


$$f(y \mid v, \mu, \Sigma) = \frac{\Gamma\!\left(\frac{v+d}{2}\right)}{\Gamma\!\left(\frac{v}{2}\right)v^{d/2}\pi^{d/2}}\,|\Sigma|^{-1/2}\left(1+\frac{1}{v}(y-\mu)'\Sigma^{-1}(y-\mu)\right)^{-\frac{v+d}{2}} \tag{A.37}$$
    In the same way, the mean and variance are given by:


$$E(Y) = \mu, \qquad v > 1 \tag{A.38}$$
$$\operatorname{Var}(Y) = \frac{v}{v-2}\,\Sigma, \qquad v > 2 \tag{A.39}$$
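
    For numerical work, the density (A.37) is implemented in the contributed mvtnorm package (not part of base R, so it must be installed first); a minimal sketch with illustrative values:

    # Multivariate Student's t density via the contributed mvtnorm package
    library(mvtnorm)                 # assumes install.packages("mvtnorm") was run
    mu <- c(0, 0)
    Sigma <- diag(2)
    dmvt(c(0.5, -0.5), delta = mu, sigma = Sigma, df = 4, log = FALSE)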
                                                        v−2

A.2.12 Wishart

The Wishart is the conjugate prior distribution for the inverse covariance matrix in a multivariate Normal distribution. It is a multivariate generalization of the Gamma distribution. The integral is finite if the degrees of freedom parameter, v, is greater than or equal to the dimension, k.


    Formally, a continuous random variable, Y, has a Wishart distribution with v degrees of freedom and symmetric, positive definite k × k scale matrix S, denoted Y ∼ Wishart_v(S), if its probability density function is given by (W positive definite):

$$f(W \mid v, S) = \left(2^{vk/2}\,\pi^{k(k-1)/4}\prod_{i=1}^{k}\Gamma\!\left(\frac{v+1-i}{2}\right)\right)^{-1}|S|^{-v/2}\,|W|^{(v-k-1)/2}\exp\!\left[-\tfrac{1}{2}\operatorname{tr}(S^{-1}W)\right] \tag{A.40}$$
    Similarly, the mean is given by:


$$E(Y) = vS \tag{A.41}$$



A.2.13 Inverse-Wishart

If W⁻¹ ∼ Wishart_v(S), then W has the inverse-Wishart distribution. This is the conjugate prior distribution for the multivariate Normal covariance matrix. Formally, a continuous random variable, Y, has an inverse-Wishart distribution with v degrees of freedom and symmetric, positive definite k × k scale matrix S, denoted Y ∼ Inv-Wishart_v(S⁻¹), if its probability density function is given by (W positive definite):


$$f(W \mid v, S) = \left(2^{vk/2}\,\pi^{k(k-1)/4}\prod_{i=1}^{k}\Gamma\!\left(\frac{v+1-i}{2}\right)\right)^{-1}|S|^{v/2}\,|W|^{-(v+k+1)/2}\exp\!\left[-\tfrac{1}{2}\operatorname{tr}(S\,W^{-1})\right] \tag{A.42}$$

    Similarly, the mean is given by:



$$E(Y) = (v-k-1)^{-1}S \tag{A.43}$$
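
    The Wishart mean (A.41) is easy to illustrate by simulation, and inverse-Wishart draws can then be obtained by inverting Wishart draws. A minimal sketch using stats::rWishart, which exists in recent versions of R (it was not available in the R 2.4.1 referred to elsewhere in this document):

    # Monte Carlo illustration of (A.41); rWishart requires a recent R version
    set.seed(1)
    v <- 10
    S <- matrix(c(2, 0.3, 0.3, 1), nrow = 2)     # illustrative scale matrix
    draws <- rWishart(10000, df = v, Sigma = S)  # 2 x 2 x 10000 array of draws
    apply(draws, c(1, 2), mean)                  # close to v * S, cf. (A.41)
    v * S
    # Inverse-Wishart draws are obtained by inverting the Wishart draws:
    inv.draws <- apply(draws, 3, solve)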




Appendix B

Installation Guide

B.1 From source folder

The source folder contains the following files and folders:

   • BARESIMDA.jar: the executable application file. Java Runtime Environment 1.4.2 or later, R 2.4.1 or later, and the libraries provided in the folder must be installed.

   • R Libraries: contains the libraries to be moved into the R software library %R_HOME%\library.

   • Java Library: it contains the file to be moved into %JAVA_HOME%\lib\ext.

%R_HOME% and %JAVA_HOME% refer to the paths in which R and Java are installed, respectively. For instance, in Windows, if you have installed them into the root directory C:\ you should have C:\R\R-2.4.1\library and C:\Java\lib\ext.
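
    If in doubt about these locations on a particular machine, R itself can report them; for example:

    R.home("library")   # absolute path of the main R library directory
    .libPaths()         # all library locations that R searches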


B.2 From installer

An installer will be provided to make the installation process much easier. No previously installed programs are required, since the installer will install the Java Runtime Environment and R itself. As a result of executing this installer, a new folder and a shortcut icon will be created.




Appendix C

User’s Guide

C.1 Data Entry

C.1.1 Loading an Excel file

  1. Select the File menu item in the menu bar.




                                  Figure C.1: Load Data Menu



  2. Put the mouse over the Load element and click on it.

  3. A dialog box, shown in Figure C.2, will be displayed. Click on the Search button to select the
     Excel file to load and indicate the sheet number in the field with that label. If the first row in
     the data sheet is a header with the variable names, then click OK to load the data. Otherwise,
     deselect the variable names option and click OK.

  4. Then, the data will be displayed in the Data window as in an Excel sheet (see Figure C.3).


C.1.2 Defining a new variable

  1. Ensure that Data window is the active window.





                                  Figure C.2: Select File Dialog




                                 Figure C.3: Display Loaded Data



   2. Define the new variable by clicking on the New Variable button (see Figure C.4).

   3. You will be required to type in the name of the new variable. Type it in and click OK (see
      Figure C.5).

   4. A new column will be added to the spreadsheet with the new variable as header (see Figure
      C.6).

   5. If you want to define several new variables, repeat from step 2 as necessary.








                    Figure C.4: Define New Variable




                  Figure C.5: Enter New Variable Name




                   Figure C.6: Display New Variable






C.1.3 Editing an existing variable

   1. Ensure that Data window is the active window.

   2. Click on the Edit Variable button (see Figure C.7).




                                      Figure C.7: Edit Variable



   3. A dialog will be displayed. Select the variable to edit and go on (see Figure C.8).




                              Figure C.8: Select Variable to Be Edited



   4. A new dialog will be shown and you will be required to type in the new name of the variable.
      Type it in and the variable will be stored with the new name (see Figure C.9).






                                   Figure C.9: Enter New Name



C.1.4 Deleting an existing variable

   1. Ensure that Data window is the active window.

   2. Click on the Delete Variable button and a dialog will be displayed.

   3. Select the variable to delete and go on. A confirmation dialog will be shown. Confirm that it is
      the variable to be deleted, and the variable and its data will be removed from the application
      (see Figure C.10).




                                     Figure C.10: Confirmation




C.1.5 Typing in a new data row

   1. Ensure that Data window is the active window.

   2. Click on the New Row button. If any variables have been defined previously, a row will be added
      to the spreadsheet with as many columns as there are defined variables (see Figure C.11).

   3. Double-click on the cell to edit and enter the new value. When you finish, press Enter (see
      Figure C.12).

   4. Repeat steps 2 and 3 as necessary.






                                   Figure C.11: New Row data




                                     Figure C.12: Type Data



C.1.6 Deleting an existing data row

   1. Ensure that Data window is the active window.

   2. Select the data row or rows to be deleted. Then click on the Delete Row button. A confirmation
      dialog will be displayed.

   3. Confirm and all data in those rows will be removed.






C.1.7 Modifying existing data

   1. Ensure that Data window is the active window.

   2. Select the data cell to be modified and double-click on it. You will be able to edit the cell
      value. When you finish, press Enter.


C.2 Configuration

C.2.1 Setting the Look&Feel

   1. Select the Look&Feel item in the Configuration element of the menu bar (see Figure C.13).




                                Figure C.13: Look And Feel Menu



   2. Select the Look&Feel style you want. The available options are: Metal (Java style), CDE/Motif
      (Unix/Linux style), Windows and Windows Classic (see Figure C.14).




                                Figure C.14: Look And Feel Styles



   3. When you have selected your option (for instance, CDE/Motif), the application appearance will
      be modified (see Figure C.15).


C.2.2 Selecting the type of user

   1. Select the Type Of User item in the Configuration element of the menu bar (see Figure C.16).

   2. A dialog will be displayed. Select the type of user you are and accept (see Figure C.17). This
      will be useful to define prior information in Bayesian regression.






                  Figure C.15: New Look And Feel




                  Figure C.16: Type Of User Menu




                  Figure C.17: Select Type Of User






C.3 Non Symbolic Regression

C.3.1 Simple Classical Regression

   1. Select the Classical menu in the Non-Symbolic Regression element of the menu bar. Then,
      select Simple Regression (see Figure C.18).




                       Figure C.18: Non-Symbolic Classical Regression Menu



   2. You will be required to select the independent and dependent variables from the defined vari-
      ables. Select them and go on (see Figure C.19).




                  Figure C.19: Select Non-Symbolic Variables in Simple Regression



   3. A brief report will be displayed in the Classical Simple Regression window, indicating that
      more details are available through the Analysis Options in the ToolBar (see Figure C.20).

   4. From this point, you can:

        (a) Change dependent and independent variables in the Variables Options, by selecting them
            again as it was done before.






                                     Figure C.20: Brief Report



        (b) Select tests and analyses in the Analysis Options by clicking on the desired analysis
            options. The available analysis options are shown in Figure C.21.




            Figure C.21: Analysis Options in Non-Symbolic Classical Simple Regression


            To make new predictions, you will have to select the predict option, introduce the new
            observed value and press OK (see Figure C.22).
        (c) Select graphics in the Graphics Options by clicking on the desired graphics options. The
            available graphics options are shown in Figure C.23.
        (d) Save some results in the Save Options by clicking on the desired save options and
            selecting the file where they are to be saved. The available save options are shown in
            Figure C.24.








             Figure C.22: New Prediction in Non-Symbolic Classical Simple Regression




            Figure C.23: Graphics options in Non-Symbolic Classical Simple Regression


C.3.2 Multiple Classical Regression

   1. Select the Classical menu in the Non-Symbolic Regression element of the menu bar. Then,
      select Multiple Regression (see Figure C.25).

   2. You will be required to select the dependent and independent variables from the defined vari-
      ables. Select them and go on (see Figure C.26).





              Figure C.24: Save options in Non-Symbolic Classical Simple Regression




                  Figure C.25: Non-Symbolic Classical Multiple Regression Menu




            Figure C.26: Select Variables in Non-Symbolic Classical Multiple Regression



   3. From this point a new Multiple Classical Regression window is created, and the procedure is
      similar to that described in Simple Classical Regression. The reader is therefore referred to
      that section to see how to select variable, analysis, graphics and save options.

        (a) Available Analysis Options can be seen in Figure C.27.
            There are two new analysis options: backward and forward selection. These will let you






           Figure C.27: Analysis options in Non-Symbolic Classical Multiple Regression


            identify the independent variables that really influence the dependent variable.
        (b) Available Graphics Options are shown in Figure C.28.




           Figure C.28: Graphics options in Non-Symbolic Classical Multiple Regression


        (c) Available Save Options can be seen in Figure C.29.

   4. You will be able to select if there is an intercept in the model or not by clicking on the Model
      option (see Figure C.30).








              Figure C.29: Save options in Non-Symbolic Classical Multiple Regression




                  Figure C.30: Intercept in Non-Symbolic Classical Multiple Regression



C.3.3 Simple Bayesian Regression

   1. Select the Bayesian menu in the Non-Symbolic Regression element of the menu bar. Then,
      select Simple Regression (see Figure C.31).




                     Figure C.31: Non-Symbolic Bayesian Simple Regression Menu



   2. You will be required to select the dependent and independent variables from the defined vari-
      ables as it was done in Simple Classical Regression. Select them and go on (see Figure C.32).

   3. A new Bayesian Simple Regression window will be created. The estimated mean and standard
      deviation of the parameters will be displayed, as well as the 95% highest posterior density
      interval and the numerical standard error (a sketch of the underlying computation in R is
      given at the end of this section).

   4. You will be able to select variable, analysis, graphics and save options as it was done in Simple






             Figure C.32: Select Variables in Non-Symbolic Bayesian Simple Regression



      Classical Regression, although for Bayesian regression, these options are more limited. How-
      ever, the procedure is the same.

        (a) Available Analysis Options are shown in Figure C.33.




            Figure C.33: Analysis Options in Non-Symbolic Bayesian Simple Regression


        (b) Available Graphics Options can be seen in Figure C.34.
        (c) Available Save Options are shown in Figure C.35.


   5. In Bayesian regression, new options are available in the ToolBar:

        (a) Specifying Prior Information, by clicking on the Prior Information item in the ToolBar. A
            new input dialog will be displayed, where you will be able to specify prior information.
            If you have selected Experienced User in the Type Of User option in the Configuration
            menu, you will see a dialog like that shown in Figure C.38.





            Figure C.34: Graphics Options in Non-Symbolic Bayesian Simple Regression




              Figure C.35: Save Options in Non-Symbolic Bayesian Simple Regression




Figure C.36: Prior Experienced Specification Options in Non-Symbolic Bayesian Simple Regression






            Otherwise, you will see the one shown in Figure C.37.




   Figure C.37: Prior Inexperienced Specification in Non-Symbolic Bayesian Simple Regression


        (b) Selecting the Bayesian regression model, by clicking on the Model option in the ToolBar.




   Figure C.38: Prior Experienced Specification in Non-Symbolic Bayesian Simple Regression
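
    For reference, the quantities reported in this window are the kind of posterior summaries produced by the bayesm package on which BARESIMDA relies (see Appendix D for its installation). The following is a minimal, illustrative sketch on synthetic data, not the application's actual code; runireg fits a univariate regression model by sampling from the posterior under a conjugate prior:

    # Illustrative Bayesian simple regression with bayesm::runireg
    library(bayesm)
    set.seed(1)
    n <- 100
    x <- rnorm(n)
    y <- 1 + 2 * x + rnorm(n)                 # synthetic data
    X <- cbind(1, x)                          # intercept plus regressor
    out <- runireg(Data = list(y = y, X = X), Mcmc = list(R = 2000))
    colMeans(out$betadraw)                    # posterior means of the coefficients
    apply(out$betadraw, 2, sd)                # posterior standard deviations
    apply(out$betadraw, 2, quantile, probs = c(0.025, 0.975))

    The quantile-based intervals above are equal-tailed rather than highest posterior density intervals, but they play the same role for illustration.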



C.3.4 Multiple Bayesian Regression

   1. Select the Bayesian menu in the Non-Symbolic Regression element of the menu bar. Then,
      select Multiple Regression (see Figure C.39).




                  Figure C.39: Non-Symbolic Bayesian Multiple Regression menu



   2. You will be required to select the dependent and independent variables from the defined vari-
      ables as it was done in Multiple Classical Regression. Select them and go on.

   3. A new Bayesian Multiple Regression window will be created. From this point the procedure is
      the same as in Bayesian Simple Regression.

        (a) Analysis Options are shown in Figure C.40.





           Figure C.40: Analysis Options in Non-Symbolic Bayesian Multiple Regression


        (b) Graphics options can be seen in Figure C.41.




           Figure C.41: Graphics Options in Non-Symbolic Bayesian Multiple Regression


        (c) Save Options are shown in Figure C.42.




             Figure C.42: Save Options in Non-Symbolic Bayesian Multiple Regression


        (d) Model Options are those shown in Figure C.43.


C.4 Symbolic Regression

C.4.1 Simple Classical Regression

   1. Select the Classical menu in the Symbolic Regression element of the menu bar. Then, select
      Simple Regression (see Figure C.44).





            Figure C.43: Model Options in Non-Symbolic Bayesian Multiple Regression




                        Figure C.44: Symbolic Classical Simple Regression Menu



   2. You will be required to select the minimum and maximum dependent and minimum and max-
      imum independent variables from the defined variables. Select them and go on (see Figure
      C.45).




                  Figure C.45: Select Variables in Symbolic Classical Simple Regression



   3. A brief report will be displayed for the midpoints and radii analyses. This is very similar to the
      Non-Symbolic Regression case, but now you will have one analysis for the midpoints and another
      one for the radii (see the sketch after this list). In this case, there are more graphics options.

        (a) Analysis Options are shown in Figure C.46.
        (b) Graphics Options can be seen in Figure C.47.






                  Figure C.46: Analysis Options in Symbolic Classical Simple Regression




                  Figure C.47: Graphics Options in Symbolic Classical Simple Regression


        (c) Save Options are the same as in Non-Symbolic Regression.
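
    Conceptually, the centre-and-radius approach underlying these analyses fits one regression to the interval midpoints and another to the interval radii. A minimal sketch of the idea in R, with hypothetical variable names for the interval bounds (this is not the application's actual code):

    # Centre-and-radius idea: separate regressions for midpoints and radii
    xmin <- c(1, 2, 3, 4, 5); xmax <- c(2, 4, 5, 7, 8)   # hypothetical bounds
    ymin <- c(0, 1, 2, 3, 3); ymax <- c(1, 3, 4, 6, 7)
    xmid <- (xmin + xmax) / 2;  xrad <- (xmax - xmin) / 2
    ymid <- (ymin + ymax) / 2;  yrad <- (ymax - ymin) / 2
    fit.mid <- lm(ymid ~ xmid)  # model for the interval midpoints
    fit.rad <- lm(yrad ~ xrad)  # model for the interval radii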





C.4.2 Multiple Classical Regression

   1. Select the Classical menu in the Symbolic Regression element of the menu bar. Then, select
      Multiple Regression (see Figure C.48).




                        Figure C.48: Symbolic Classical Multiple Regression Menu



   2. You will be required to select the minimum and maximum dependent and minimum and max-
      imum independent variables from the defined variables (see Figure C.49). Ensure that the first
      maximum independent variable selected is the adequate one for the minimum independent
      variable chosen.




                  Figure C.49: Select Variables in Symbolic Classical Multiple Regression



   3. A brief report will be displayed for the midpoints and radii analyses. This is very similar to the
      Non-Symbolic Regression case, but now you will have one analysis for the midpoints and another
      one for the radii. In this case, there are more graphics options.

        (a) Analysis Options are shown in Figure C.50.
        (b) Graphics Options can be seen in Figure C.51.
        (c) Save Options are the same as in Non-Symbolic Regression.








              Figure C.50: Analysis Options in Symbolic Classical Multiple Regression




              Figure C.51: Graphics Options in Symbolic Classical Multiple Regression


C.4.3 Simple Bayesian Regression

   1. Select the Bayesian menu in the Symbolic Regression element of the menu bar. Then, select
      Simple Regression (see Figure C.52).

   2. You will be required to select the minimum and maximum dependent and minimum and maxi-
      mum independent variables from the defined variables (see Figure C.53).





                           Figure C.52: Symbolic Bayesian Simple Regression




                  Figure C.53: Select Variables in Symbolic Bayesian Simple Regression



   3. A new Bayesian Simple Regression window will be created. The estimated mean and standard
      deviation of the midpoints and radii parameters will be displayed, as well as the 95% highest
      posterior density interval and the numerical standard error.

   4. You will be able to select variable, analysis, graphics and save options as it was done in Non-
      Symbolic Regression.

        (a) Available Analysis Options are shown in Figure C.54.




                  Figure C.54: Analysis Options in Symbolic Bayesian Simple Regression


        (b) Available Graphics Options can be seen in Figure C.55.





              Figure C.55: Graphics Options in Symbolic Bayesian Simple Regression


        (c) Save Options are the same as in Non-Symbolic Regression.

   5. As in Non-Symbolic Regression, in Bayesian analysis new options are available in the ToolBar:

        (a) Specifying Midpoints and Radii Prior Information, by clicking on the Midpoints or Radii
            Prior Information item in the ToolBar. A new input dialog will be displayed, where you
            will be able to specify prior information.
        (b) Selecting the Bayesian regression model, by clicking on the Model option in the ToolBar
            (see Figure C.56).


C.4.4 Multiple Bayesian Regression

   1. Select the Bayesian menu in the Symbolic Regression element of the menu bar. Then, select
      Multiple Regression (see Figure C.57).

   2. You will be required to select the minimum and maximum dependent and minimum and max-
      imum independent variables from the defined variables (see Figure C.58). Ensure that the first





                  Figure C.56: Model Options in Symbolic Bayesian Simple Regression




                      Figure C.57: Symbolic Bayesian Multiple Regression Menu



      maximum independent variable selected is the adequate one for the minimum independent
      variable chosen.




                Figure C.58: Select Variables in Symbolic Bayesian Multiple Regression



   3. A new Bayesian Multiple Regression window will be created. The estimated mean and standard
      deviation of the midpoints and radii parameters will be displayed, as well as the 95% highest
      posterior density interval and the numerical standard error.

   4. You will be able to select variable, analysis, graphics and save options as it was done in Non-
      Symbolic Regression.

        (a) Analysis Options are the same as in Non-Symbolic Regression.



        (b) Graphics Options are shown in Figure C.59.




              Figure C.59: Graphics Options in Symbolic Bayesian Multiple Regression


        (c) Save Options are the same as in Non-Symbolic Regression.

   5. As in Non-Symbolic Regression, in Bayesian analysis new options are available in the ToolBar:

        (a) Specifying Midpoints and Radii Prior Information, by clicking on the Midpoints or Radii
            Prior Information item in the ToolBar. A new input dialog will be displayed, where you
            will be able to specify prior information.
        (b) Selecting the Bayesian regression model, by clicking on the Model option in the ToolBar.




Appendix D

Obtaining and Installing R

The way to obtain R is to download it from one of the CRAN (Comprehensive R Archive Network) sites. The main site is http://cran.r-project.org. It has a number of mirror sites worldwide, which may be closer to you and give faster download times.


   Installation details tend to vary over time, so you should read the accompanying documents and
any other information offered on CRAN.


D.1 Binary distributions

The version for recent variants of Microsoft Windows comes as a single SetupR.exe file, on which you simply double-click and then follow the on-screen instructions. When the process is completed, you will have an entry under Programs on the Start menu for invoking R, as well as a desktop icon.


   For Linux distributions that use the RPM package format (RedHat, Mandrake, LinuxRPC and SuSE) and also for Alpha Unix (OSF/Tru64), .rpm files of R and the recommended add-on packages can be installed using the rpm command. Packages for the Debian APT package manager are also available.


   For the Macintosh platforms there are two different binary distributions: the "Carbon" R and the "Darwin" R. The first version is intended to run natively on MacOS systems from 8.6 to OS X, and the second one as a usual Unix command under OS X. The Darwin R also requires an X window manager such as XDarwin to use the X11 graphics device.






    Carbon R comes in a single .sit archive file that you simply decompress by dragging the file onto Stuffit Expander, and move the resulting folder rmxyz into your favourite applications folder. The Darwin version is a .tgz archive, which can be installed, after decompression, with some (fairly trivial) manual adjustments.


    Darwin R can also be installed using "fink". Fink installs all dynamic libraries that might be needed, and it can update R to newer versions when available.


D.2 Installation from source

Installation from source code is possible on all supported platforms, although nontrivial on Macintosh
and Windows, mainly because the build environment is not part of the system. On Unix-like systems
(Macintosh OS X included), the process can be as simple as unpacking the sources and writing


    ./configure


    make


    make install


    and then you would unpack the recommended package bundle, change to its directory and enter


    R CMD INSTALL *.tar.gz


    The above works on widely used platforms, provided that the relevant compilers and support li-
braries are installed. If your system is more esoteric or you want to use special compilers or libraries,
then you may need to dig deeper.


    For Windows and Carbon Macintosh, the directories src/gnuwin32 and src/macintosh have an INSTALL file with detailed information about the procedure to follow.






D.3 Package installation

To install R packages such as bayesm under Unix/Linux or Windows, you can connect to the Internet,
start R, and enter


    install.packages("bayesm", .libPaths()[1])


    The Windows version provides a convenient menu interface for the operation.


    If your R machine is not connected to the Internet, you can also download the package as a file and install that. For Windows and the Carbon version of Macintosh, you need to get the binary package (.zip or .sit extension). For Windows, installation from a local .zip file is possible via a menu entry. For Macintosh users, the procedure is described in the Macintosh FAQ. For Unix and Linux, you can issue the following at the shell prompt (the -l option allows you to give a private library):


    R CMD INSTALL bayesm


    On Unix and Linux systems you will need superuser permissions to install. Otherwise you can set up a private library directory and install into that. Use the R_LIBS environment variable to have your private library searched subsequently. A similar issue arises if R is installed on a read-only file system in a Windows environment. Further details can be found in the help page for library.
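
    The same can also be done from within R without touching environment variables; a minimal sketch, where the directory name ~/Rlib is illustrative:

    # Installing a package into, and loading it from, a private library
    dir.create("~/Rlib")
    install.packages("bayesm", lib = "~/Rlib")
    library(bayesm, lib.loc = "~/Rlib")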


    Information and further Internet resources for R can be obtained from CRAN and the R homepage at http://www.r-project.org. Notice in particular the mailing lists, the user-contributed documents and the FAQs.




Appendix E

Obtaining and installing Java Runtime
Environment

The way to obtain the Java Runtime Environment (JRE) is to download it from Sun Microsystems' official site. The main site is http://java.sun.com, from where you can select the version to be downloaded. The link to download the current version, which is J2SE v1.4.2_14 JRE, is http://java.sun.com/j2se/1.4.2/download.html.


E.1 Microsoft Windows

You must have administrative permissions in order to install the Java 2 Runtime Environment on Mi-
crosoft Windows 2000 and XP. The download page provides the following two choices of installation.
Continue based on your choice.



   1. Windows Installation - After clicking the "Download" link for the JRE, a dialog box pops up.
      Choose the open option to start a small program which then prompts you for more information
      about what you want to install.

   2. Windows Offline Installation - After clicking the JRE "Download" link for the "Windows Of-
      fline Installation", a dialog box pops up. Choose the save option to save the downloaded file
      without installing it. Run this file by double-clicking on the installer's icon. Then follow the
      instructions the installer provides. When done with the installation, you can delete the
      downloaded file to recover disk space.




E.2 Linux

Java 2 Runtime Environment 1.4.2 is available in two installation formats:

   1. Self-extracting Binary File - This file can be used to install the Java 2 Runtime Environment in
      a location chosen by the user. It can be installed by anyone (not only root users), and
      it can easily be installed in any location. As long as you are not the root user, it cannot displace
      the system version of the Java platform supplied by Linux. To use this file, see Installation of
      Self-Extracting Binary below.

   2. RPM Packages - An rpm.bin file which contains RPM packages, installed with the rpm utility.
      It requires root access to install, and installs by default in a location that replaces the system
      version of the Java platform supplied by Linux. To use this bundle, see Installation of RPM
      File below.

    Choose the install format that is most suitable to your needs.


E.2.1 Installation of Self-Extracting Binary

Use these instructions if you want to use the self-extracting binary file to install the Java 2 Runtime
Environment. If you want to install RPM packages instead, see Installation of RPM File.

   1. Download and check the download file size to ensure that you have downloaded the full, uncor-
      rupted software bundle. You can download to any directory you choose; it does not have to be
      the directory where you want to install the Java 2 Runtime Environment. Before you download
      the file, notice its byte size provided on the download page on the web site. Once the download
      has completed, compare that file size to the size of the downloaded file to make sure they are
      equal.

   2. Make sure that execute permissions are set on the self-extracting binary. Run this command:
      chmod +x j2re-1_4_2_14-linux-i586.bin.

   3. Change directory to the location where you would like the files to be installed. The next step
      installs the Java 2 Runtime Environment into the current directory.

   4. Run the self-extracting binary. Execute the downloaded file, prepended by the path to it. For
      example, if the file is in the current directory, prepend it with "./" (necessary if "." is not in the





      PATH environment variable):


      ./j2re-1_4_2_14-linux-i586.bin


      The binary code license is displayed, and you are prompted to agree to its terms. The Java
      2 Runtime Environment files are installed in a directory called j2re1.4.2_14 in the current
      directory.


E.2.2 Installation of RPM File

Use these instructions if you want to install Java 2 Runtime Environment in the form of RPM pack-
ages. If you want to use the self-extracting binary file instead, see Installation of Self-Extracting
Binary.



   1. Download and check the file size. You can download to any directory you choose. Before you
      download the file, notice its byte size provided on the download page on the web site. Once the
      download has completed, compare that file size to the size of the downloaded file to make sure
      they are equal.

   2. Extract the contents of the downloaded file. Change directory to where the downloaded file is
      located and run these commands to first set the executable permissions and then run the binary
      to extract the RPM file:


      chmod a+x j2re-1_4_2_14-linux-i586-rpm.bin


      ./j2re-1_4_2_14-linux-i586-rpm.bin


      Note that the initial "./" is required if you do not have "." in your PATH environment variable.

    The script displays a binary license agreement, which you are asked to agree to before installation
can proceed. Once you have agreed to the license, the install script creates the file
j2re-1_4_2_14-linux-i586.rpm in the current directory.

   1. Become root by running the su command and entering the super-user password.



   2. Run the rpm command to install the packages that comprise the Java 2 Runtime Environment:


      rpm -iv j2re-1_4_2_14-linux-i586.rpm



   3. Delete the .bin and .rpm files if you want to save disk space.

   4. Exit the root shell.


E.3 UNIX

   1. Check the download file size. You can download to any directory you choose; it does not have
      to be the directory where you want to install the J2RE. Before you download the file, notice its
      byte size provided on the download page on the web site. Once the download has completed,
      compare that file size to the size of the downloaded file to make sure they are equal.

   2. Make sure that execute permissions are set on the self-extracting binary:


      On SPARC processors: chmod +x j2re-1_4_2_14-solaris-sparc.sh


      On x86 processors: chmod +x j2re-1_4_2_14-solaris-i586.sh

   3. Change directory to the location where you would like the files to be installed. The next step
      installs the J2RE into the current directory.

   4. Run the self-extracting binary. Execute the downloaded file, prepending the path to it. For
      example, if the downloaded file is in the current directory, prepend it with "./":


      On SPARC processors: ./j2re-1_4_2_14-solaris-sparc.sh


      On x86 processors: ./j2re-1_4_2_14-solaris-i586.sh


      The binary code license is displayed, and you are prompted to agree to its terms. The J2RE
      files are installed in a directory called j2re1.4.2_14 in the current directory.





    More information about the installation process on different operating systems can be found at the Sun Microsystems official site mentioned above.




                                                       156
Bibliography

[Aitk97] Aitkin, M., The calibration of P-values, posterior Bayes factors and the AIC from the pos-
     terior distribution of the likelihood, Statistics and Computing 7 (4), 253-261. 1997

[Arro06] Arroyo, J. and Mat´ , C., Introducing interval time series: accuracy measures, COMPSTAT,
                           e
     Rome 2006.

[Berg05] Berg, B.A., Introduction to Markov Chain Monte Carlo Simulations and their Statistical
     Analysis, NATIONAL UNIVERSITY OF SINGAPORE 7. 2005

[Berg98] Berger, J. and Pericchi, L., Accurate and stable Bayesian model selection: the median
     intrinsic Bayes Factor, The Indian Journal Of Statistics 60 (1), 1-18. 1998

[Bill00] Billard, L. and Diday, E., Regression Analysis for Interval-Valued Data, Data Analysis,
     Classification and Related Methods: Proceedings of the Seventh Conference of the International
     Federation of Classification Societies, Namur, Belgium 2000.

[Bill02] Billard, L. and Diday, E., From the Statistics of Data to the Statistics of Knowledge: Sym-
     bolic Data Analysis, Journal of the American Statistical Association 98 (462), 470-487. 2002.

[Bill06a] Billard, L. and Diday, E., Symbolic Data Analysis: Conceptual Statistics and Data Mining,
     Wiley ,England 2006.

[Bill06b] Billard, L. and Diday, E., Symbolic Data Analysis: what is it?, COMPSTAT, Rome 2006.

[Cham83] Chambers, J.M., Cleveland, W.S., Kleiner, B. and Tukey, P.A., Graphics Methods for Data
     Analysis, Wadsworth, 1983.

[Cham92] Chambers, J.M. and Hastie, T.J., Statistical Models in S, Hall/CRC, 1992.

[Chen00] Chen, M., Shao, Q. and Ibrahim, J.G., Monte Carlo Methods in Bayesian Computation,
     Springer, New York 2000.

                                                157
[Chen03] Cheng, R. and Sahu, S., A fast distance based approach for determining the number of
     components in mixtures, Canadian Journal of Statistics 31, 3-22, 2003.

[Cong06] Congdon, P., Bayesian Statistical Modelling, Wiley, England 2006.

[Dalg02] Dalgaard, P., Introductory Statistics with R, Springer, New York 2002.

[DeCa04] De Carvalho, F.A.T., Freire, E.S. and Lima Neto, E.A. A New Method to Fit a Linear
     Regression Model for Interval-Valued Data, KI 2004: Advances in Artificial Intelligence: 27th
     Annual German Conference in AI, 295-306, Springer, Ulm, Germany, 2004.

[DeCa05] De Carvalho, F.A.T., Freire, E.S. and Lima Neto, E.A. Applying Constrained Linear Re-
     gression Models to Predict Interval-Valued Data , KI 2005: Advances in Artificial Intelligence
     3698, 92-106, Springer, Koblenz, Germany 2005.

[DeCa07] De Carvalho, F.A.T. and Lima Neto, E.A., Centre and Range method for fitting a linear
     regression model to symbolic interval data, Computational Statistics and Data Analysis, 2007.

[Dida95] Diday, E., Probabilist, Possibilist and Belief Objects for Knowledge Analysis, Annals of
     Operations Research, 55, 227-276, 1995.

[Gelf90] Gelfand, A.E. and Smith, A.F.M., Sampling-based approaches to calculating marginal den-
     sities, Journal of the American Statistical Association 85, 398-409, 1990.

[Gelm04] Gelman, A., Carlin, J.B., Stern, H.S. and Rubin, D.B., Bayesian Data Analysis, Hall/CRC,
     Boca Raton, Florida 2004.

[Gilk95] Gilks, W.R., Best, N. and Tan, K.K.C., Adaptive rejection Metropolis sampling within Gibbs
     sampling, Applied Statistics 44, 455-472, 1995.

[Gosh03] Gosh, J.K. and Ramamoorthi, R.V., Bayesian Nonparametrics, Spriger, New York 2003.

[Hast70] Hastings, W.K., Monte Carlo sampling methods using Markov chains and their applica-
     tions, Biometrika 57, 97-109, 1970.

[Huiw06] Huiwen, W., Mok, H.M.K. and Dapeng, L., Factor interval data analysis and its applica-
     tion, COMPSTAT, Rome 2006.

[Irpi05] Irpino, A., ”Spaghetti” PCA analysis: An extension of principal component analysis to time
     dependent interval data, Pattern Recognition Letters, 2005.


                                                158
[Jeff61] Jeffreys, H., Theory of Probability, Oxford University Press, 1961.

[Kend05] Kendall, W. S., Liang, F. and Wang, J-S., Markov chain Monte Carlo: Innovations and
     Applications, National University of Singapore 7, 2005.

[Koop03] Koop, G., Bayesian Econometrics, Wiley, England 2003.

[Laws74] Lawson, C.l. and Hanson, R.J., Solving Least Squares Problem, Prentice-Hall, New York
     1974.

[Lee 06] Lee, C-H.L., Liu, A. and Chen, W-S., Pattern Discovery of Fuzzy Time Series for Financial
     Prediction, IEEE 18, (5), 2006.

[Mart01] Martinez, W.L. and Martinez, A.R., Computational Statistics Handbook with MATLAB ,
     Hall/CRC, Boca Raton, Florida 2001.

    e        e                   ´
[Mat´ 93] Mat´ , C. and Sarabia, A., Problemas de Probabilidad y Estad´stica, CLAGSA, Madrid
                                                                      ı
     1993.

[Mat´ 95] Mat´ , C., Curso General sobre StatGraphics II, Universidad Pontifica Comillas, Madrid
    e        e
     1995.

[Mat´ 06] Mat´ , C., An´ lisis Bayesiano de Datos, Asociaci´ n Espaola para la Calidad, Madrid 2006.
    e        e         a                                   o

[Metr53] Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H. and Teller, E., Equation
     of state calculations by fast computing machines, Journal of Chemical Physics 21, 1087-1092,
     1953.

[Mont02] Montgomery, D.C. and Runger, G.C., Probabilidad y Estad´stica Aplicadas a la Inge-
                                                                ı
     nier´a, Wiley, 2002.
         ı

[Mull04] Muller, P. and Quintana, F.A., Nonparametric Bayesian Data Analysis, Statistical Science
     19, 95-110, 2004.

[Poir95] Poirier, D., Intermediate Statistics and Econometrics: A Comparative Approach., The MIT
     Press, Cambridge 1995.

[Rossi06] Rossi, P.E., Allenby, G. and McCulloch, R., Bayesian Statistics and Marketing, Wiley,
     New York 2006.



                                                159
[Rupp04] Rupp, A.A., Dey, D.K. and Zumbo, B.D., To Bayes or Not to Bayes, From Whether to
     When: Applications of Bayesian Methodology to Modeling, Structural Equation Modeling: A
     Multidisciplinary Journal 11 (3), 424-451. 2004.

[Spie03] Spiegelhalter, D., Thomas, A., Best, N., Gilks, W. and Lunn, D., BUGS: Bayesian inference
     using Gibbs sampling, 2003.

[Urba92] Urbach, P., Regression Analysis: Classical and Bayesian , The British Journal for the Phi-
     losophy of Science 43 (3), 311-342, 1992.

[Vena02] Venables, W.N. and Ripley, B.D., Modern Applied Statistics with S, Springer, New York
     2002.

[West04] West, R.W., Wu,T. and Heydt, D., An introduction to StatCrunch 3.0, Journal of Statistical
     Software 9 (6), 2004.

[Zamo01] Zamora, MM. and Estavillo, J., Modelo de regresi´ n normal cl´ sico, 2001.
                                                         o            a




                                                 160

More Related Content

PDF
Rent To Own
PDF
Daily news later of equity market by marketmagnify
PDF
Equity Market News Letter for Trading
PDF
Marketing Campaign focused on Loyalty
PDF
Today's Equity News Letter By Marketmagnify
PDF
Beachbody Coach
PDF
National AIDS Program under UHC
PDF
64 ginga brasil
Rent To Own
Daily news later of equity market by marketmagnify
Equity Market News Letter for Trading
Marketing Campaign focused on Loyalty
Today's Equity News Letter By Marketmagnify
Beachbody Coach
National AIDS Program under UHC
64 ginga brasil

Similar to Bayesian Regression System for Interval-valued data (20)

PDF
As Analytics Subsumes O.R., will INFORMS Subsume Analytics?
DOCX
Data mining BY Zubair Yaseen
PDF
850 keynote siegel
PPTX
What is A/B-testing? An Introduction
PDF
DMBAR - Data Mining and Business Analytics with R - Johannes Ledolter.pdf
PDF
What Is Statistics
PDF
Data mining in support of fraud management
PDF
Data Mining In Support Of Fraud Management
PPT
1. research intro.
PDF
Applied Statistical Inference with MINITAB Sally Lesik
PDF
Applied Statistical Inference with MINITAB Sally Lesik
PPT
BRM Consolidated.ppt BRM Consolidated.ppt
PPT
Paradigm shifts in wildlife and biodiversity management through machine learning
PDF
Brm unit i - cheet sheet
PDF
Anderson%20and%20 gerbing%201988
PDF
Time Series Analysis
PDF
BRM.pdf
PDF
Data Anayltics: How to predict anything
PDF
Analytics: The widening divide
DOCX
internal Assign no 206 ( JAIPUR NATIONAL UNI)
As Analytics Subsumes O.R., will INFORMS Subsume Analytics?
Data mining BY Zubair Yaseen
850 keynote siegel
What is A/B-testing? An Introduction
DMBAR - Data Mining and Business Analytics with R - Johannes Ledolter.pdf
What Is Statistics
Data mining in support of fraud management
Data Mining In Support Of Fraud Management
1. research intro.
Applied Statistical Inference with MINITAB Sally Lesik
Applied Statistical Inference with MINITAB Sally Lesik
BRM Consolidated.ppt BRM Consolidated.ppt
Paradigm shifts in wildlife and biodiversity management through machine learning
Brm unit i - cheet sheet
Anderson%20and%20 gerbing%201988
Time Series Analysis
BRM.pdf
Data Anayltics: How to predict anything
Analytics: The widening divide
internal Assign no 206 ( JAIPUR NATIONAL UNI)
Ad

Bayesian Regression System for Interval-valued data

  • 1. Autorizada la entrega del proyecto del alumno: Rub´ n Salgado Fern´ ndez e a EL DIRECTOR DEL PROYECTO Carlos Mat´ Jim´ nez e e Fdo.: Fecha: 12/06/2007 Vo Bo DEL COORDINADOR DE PROYECTOS Claudia Meseguer Velasco Fdo.: Fecha: 12/06/2007
  • 2. UNIVERSIDAD PONTIFICIA DE COMILLAS ESCUELA TECNICA SUPERIOR DE INGENIER´ (ICAI) ´ IA ´ INGENIERO EN ORGANIZACION INDUSTRIAL PROYECTO FIN DE CARRERA Bayesian Regression System for Interval-Valued Data. Application to the Spanish Continuous Stock Market AUTOR : Salgado Fern´ ndez, Rub´ n a e M ADRID , Junio 2007
  • 3. Acknowlegdements Firstly, I would like to thank my director, Carlos Mat´ Jim´ nez, PhD, for giving me the chance of e e making this project. With him, I have learnt, not only about Statistics and investigation, but also about how to enjoy with them. Special thanks to my parents. Their love and all they have taught me in this life are the things what have made possible being the person I am now. Thanks to my brothers, my sister and the rest of my family for their support and for the stolen time. Thanks to Charo for standing my bad mood in the bad moments, for supporting me and for giving me the inspiration to go ahead. Madrid, June 2007 i
  • 4. Resumen ´ En los ultimos a˜ os los m´ todos Bayesianos se han extendido y se han venido utilizando de forma n e exitosa en muchos y variados campos tales como marketing, medicina, ingenier´a, econometr´a o mer- ı ı cados financieros. La principal caracter´stica que hace destacar al an´ lisis Bayesiano de datos (AN- ı a BAD) frente a otras alternativas es que, no s´ lo tiene en cuenta la informaci´ n objetiva procedente de o o los datos del suceso en estudio, sino tambi´ n el conocimiento anterior al mismo. Los beneficios que e se obtienen de este enfoque son m´ ltiples ya que, cuanto mayor sea el conocimiento de la situaci´ n, u o a ´ con mayor fiabilidad se podr´ n tomar las decisiones y estas ser´ n m´ s acertadas. Pero no siempre todo a a han sido ventajas. El ANBAD, hasta hace unos a˜ os, presentaba una serie de dificultades que limita- n ban el desarrollo del mismo a los investigadores. Si bien la metodolog´a Bayesiana existe como tal ı desde hace bastante tiempo, no se ha empezado emplear de manera generalizada hasta los 90’s. Esta expansi´ n ha sido propiciada en gran parte por el avance en el desarrollo computacional y la mejora y o perfeccionamiento de distintos m´ todos de c´ lculo como los m´ todos de cadenas de Markov-Monte e a e Carlo. ı ´ En especial, esta metodolog´a se ha mostrado extraordinariamente util en la aplicaci´ n a los mod- o elos de regresi´ n, ampliamente adoptados. En m´ ltiples ocasiones en la pr´ ctica, se dan situaciones o u a en las que se requiere analizar la relaci´ n entre dos variables cuantitativas. Los dos objetivos fun- o damentales de este an´ lisis ser´ n, por un lado, determinar si dichas variables est´ n asociadas y en a a a qu´ sentido se da dicha asociaci´ n (es decir, si los valores de una de las variables tienden a aumentar e o -o disminuir- al aumentar los valores de la otra); y por otro, estudiar si los valores de una variable pueden ser utilizados para predecir el valor de la otra. Un modelo de regresi´ n trata de proporcionar o informaci´ n sobre uno o varios sucesos a trav´ s de su relaci´ n con el comportamiento de otros. Con o e o la metodolog´a Bayesiana se permite incorporar el conocimiento del investigador al an´ lisis, haciendo ı a los resultados m´ s precisos, ya que no se a´slan los resultados a los datos de una determinada muestra. a ı ii
  • 5. iii a ´ Por otro lado, se est´ empezando a aceptar que el siglo XXI en el ambito de la estad´stica va a ı ser el siglo de la ”estad´stica del conocimiento” a diferencia del anterior que fue el de la ”estad´stica ı ı de los datos”. El concepto b´ sico para construir dicha estad´stica es el de dato simb´ lico y se han a ı o desarrollado m´ todos estad´sticos para algunos tipos de datos simb´ licos. e ı o En la actualidad, la exigencia del mercado, la demanda y, en general, del mundo crece. Esto implica que cada vez sea mayor el deseo de predecir la ocurrencia de un evento o poder controlar el comportamiento de ciertas cantidades con el menor error posible con el fin de ofrecer mejores pro- ductos, obtener mayores beneficios o adelantos cient´ficos y mejores resultados. ı Sobre esta realidad, este proyecto trata de responder a dichas necesidades proporcionando una amplia documentaci´ n sobre varias de las t´ cnicas m´ s utilizadas y m´ s punteras a d´a de hoy, como o e a a ı son el an´ lisis Bayesiano de datos, los modelos de regresi´ n y los datos simb´ licos, y proponiendo a o o diferentes t´ cnicas de regresi´ n. De igual forma se desarrollar´ una herramienta que permita poner e o a en pr´ ctica todos los conocimientos adquiridos. Dicha aplicaci´ n estar´ dirigida al mercado burs´ til a o a a espa˜ ol y permitir´ al usuario utilizarla de manera sencilla y amigable. En cuanto al desarrollo de esta n a herramienta se emplear´ uno de los lenguajes m´ s novedosos y con m´ s proyecci´ n del momento: R. a a a o Se trata, por tanto, de un proyecto que combina las t´ cnicas m´ s novedosas y con mayor proyecci´ n e a o tanto en materia te´ rica, como es la regresi´ n Bayesiana aplicada a datos de tipo intervalo, como en o o materia pr´ ctica, como es el empleo del lenguaje R. a
  • 6. Abstract In the recent years, Bayesian methods have been spread and successfully used in many and several fields such as Marketing, Medicine, Engineering, Econometrics or Financial Markets. The main char- acteristic that makes Bayesian Data Analysis (BADAN) remarkable compared with other alternatives is that not only does it take into account the objective information coming from the analyzed event, but also the pre-event knowledge. The benefits obtained from this approach are innumerable due to the fact that the more knowledge of the situation one has, the more reliable and accurate decisions could be taken. However, although Bayesian methodology was set long time ago, it has not been applied in a general way until the 90’s because of the computational difficulties. Such expansion has been mainly favoured by the advances in that field and the improvement on different calculus meth- ods, such as Markov-chain Monte Carlo methods. Particularly, this Bayesian methodology has been resulted in an extraordinary useful application for the regression models, which have been adopted by large. There are many times in real life in which it is necessary to analyse the situation between two quantitive variables. The two main objec- tives of this analysis would be, on the one hand, to determine whether such variables are associated and in what sense that association comes about (that is, whether the value of one of the variables tends to rise- or to decrease- when augmented the value of the other); and on the other hand, to study whether the values of one variable can be used to predict the value of the other. A regression model offers information about one or more events through their relationship with the behaviour of the oth- ers. With the Bayesian methodology it is possible to add the researcher’s knowledge to the analysis, making thus the results be more accurate due to the fact that the results are not isolated from the data of one determined sample. On the other hand, in the Statistics field, it has been more and more accepted the fact that the XXI century will be the century of the ”Statistics of knowledge” contrary to the last one, which was the iv
  • 7. v one of the ”Statistics of data”. The most basic concept to constitute such Statistics is the symbolic data; furthermore, there have been developed more statistics methods for some types of symbolic data. Nowadays, the requirements of the market, and the demands of the world in general, are growing up. This implies the continuous increase of the desire for predicting the occurrence of an event or for the ability of controlling the behaviour of certain quantities with the minimum error with the aim of offering better products, obtaining more benefits or scientific improvements and better outcomes. Under this frame, this project tries to responds such needs by offering a large documentation about several of the most applied and leading nowadays techniques, such as Bayesian data analysis, regression models, and symbolic data, and suggesting different regression techniques. Similarly, it has been developed a tool that allow the reader to put all the acquired knowledge into practice. Such application will be aimed to the Spanish Continuous Stock Market and it will let the user apply it eas- ily. As far as the development of this tool is concerned, it has been used one of the more innovative and with more projection languages of the moment: R. So, the project is about a combination of the techniques that are most innovative and with the most projection both in theoretical questions such as Bayesian regression applied to interval- valued data and in practical questions such us the employment of the R language.
  • 8. List of Figures 1.1 Project Work Packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2.1 Univariate Normal Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 6.1 Interval time series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 7.1 Classical Regression with single values in training test . . . . . . . . . . . . . . . . 73 7.2 Classical Regression with single values in testing test . . . . . . . . . . . . . . . . . 74 7.3 Classical Regression with interval- valued data . . . . . . . . . . . . . . . . . . . . 75 7.4 Centre Method (2000) in training set . . . . . . . . . . . . . . . . . . . . . . . . . . 75 7.5 Centre Method (2000) in testing set . . . . . . . . . . . . . . . . . . . . . . . . . . 76 7.6 Centre and Radius Method in training set . . . . . . . . . . . . . . . . . . . . . . . 77 7.7 Centre and Radius Method in testing set . . . . . . . . . . . . . . . . . . . . . . . . 78 7.8 Bayesian Centre and Radius Method in testing test . . . . . . . . . . . . . . . . . . 80 7.9 Classical Regression with single values in training test . . . . . . . . . . . . . . . . 81 7.10 Classical Regression with single values in testing test . . . . . . . . . . . . . . . . . 81 7.11 Centre Method (2000) in training set . . . . . . . . . . . . . . . . . . . . . . . . . . 82 7.12 Centre Method (2000) in testing set . . . . . . . . . . . . . . . . . . . . . . . . . . 83 7.13 Centre and Radius Method in training set . . . . . . . . . . . . . . . . . . . . . . . 85 7.14 Centre and Radius Method in testing set . . . . . . . . . . . . . . . . . . . . . . . . 85 7.15 Bayesian Centre and Radius Method in testing set . . . . . . . . . . . . . . . . . . . 87 9.1 BARESIMDA MDI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 10.1 Interface between BARESIMDA and R . . . . . . . . . . . . . . . . . . . . . . . . 104 10.2 Interface between BARESIMDA and Excel . . . . . . . . . . . . . . . . . . . . . . 105 10.3 Logical Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105 vi
  • 9. LIST OF FIGURES vii C.1 Load Data Menu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123 C.2 Select File Dialog . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124 C.3 Display Loaded Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124 C.4 Define New Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125 C.5 Enter New Variable Name . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125 C.6 Display New Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125 C.7 Edit Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126 C.8 Select Variable to Be Editted . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126 C.9 Enter New Name . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127 C.10 Confirmation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127 C.11 New Row data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128 C.12 Type Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128 C.13 Look And Feel Menu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129 C.14 Look And Feel Styles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129 C.15 New Look And Feel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130 C.16 Type Of User Menu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130 C.17 Select Type Of User . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130 C.18 Non-Symbolic Classical Regression Menu . . . . . . . . . . . . . . . . . . . . . . . 131 C.19 Select Non-Symbolic Variables in Simple Regression . . . . . . . . . . . . . . . . . 131 C.20 Brief Report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132 C.21 Analysis Options in Non-Symbolic Classical Simple Regression . . . . . . . . . . . 132 C.22 New Prediction in Non-Symbolic Classical Simple Regression . . . . . . . . . . . . 133 C.23 Graphics options in Non-Symbolic Classical Simple Regression . . . . . . . . . . . 133 C.24 Save options in Non-Symbolic Classical Simple Regression . . . . . . . . . . . . . . 134 C.25 Non-Symbolic Classical Multiple Regression Menu . . . . . . . . . . . . . . . . . . 134 C.26 Select Variables in Non-Symbolic Classical Multiple Regression . . . . . . . . . . . 134 C.27 Analysis options in Non-Symbolic Classical Multiple Regression . . . . . . . . . . . 135 C.28 Graphics options in Non-Symbolic Classical Multiple Regression . . . . . . . . . . 135 C.29 Save options in Non-Symbolic Classical Multiple Regression . . . . . . . . . . . . . 136 C.30 Intercept in Non-Symbolic Classical Multiple Regression . . . . . . . . . . . . . . . 136 C.31 Non-Symbolic Bayesian Simple Regression Menu . . . . . . . . . . . . . . . . . . . 136 C.32 Select Variables in Non-Symbolic Bayesian Simple Regression . . . . . . . . . . . . 137 C.33 Analysis Options in Non-Symbolic Bayesian Simple Regression . . . . . . . . . . . 137 C.34 Graphics Options in Non-Symbolic Bayesian Simple Regression . . . . . . . . . . . 138
C.35 Save Options in Non-Symbolic Bayesian Simple Regression
C.36 Prior Experienced Specification Options in Non-Symbolic Bayesian Simple Regression
C.37 Prior Inexperienced Specification in Non-Symbolic Bayesian Simple Regression
C.38 Prior Experienced Specification in Non-Symbolic Bayesian Simple Regression
C.39 Non-Symbolic Bayesian Multiple Regression Menu
C.40 Analysis Options in Non-Symbolic Bayesian Multiple Regression
C.41 Graphics Options in Non-Symbolic Bayesian Multiple Regression
C.42 Save Options in Non-Symbolic Bayesian Multiple Regression
C.43 Model Options in Non-Symbolic Bayesian Multiple Regression
C.44 Symbolic Classical Simple Regression Menu
C.45 Select Variables in Symbolic Classical Simple Regression
C.46 Analysis Options in Symbolic Classical Simple Regression
C.47 Graphics Options in Symbolic Classical Simple Regression
C.48 Symbolic Classical Multiple Regression Menu
C.49 Select Variables in Symbolic Classical Multiple Regression
C.50 Analysis Options in Symbolic Classical Multiple Regression
C.51 Graphics Options in Symbolic Classical Multiple Regression
C.52 Symbolic Bayesian Simple Regression
C.53 Select Variables in Symbolic Bayesian Simple Regression
C.54 Analysis Options in Symbolic Bayesian Simple Regression
C.55 Graphics Options in Symbolic Bayesian Simple Regression
C.56 Model Options in Symbolic Bayesian Simple Regression
C.57 Symbolic Bayesian Multiple Regression Menu
C.58 Select Variables in Symbolic Bayesian Multiple Regression
C.59 Graphics Options in Symbolic Bayesian Multiple Regression
List of Tables

2.1 Distributions in Bayesian Data Analysis
2.2 Comparison between Univariate and Multivariate Normal
2.3 Conjugate distributions for other likelihood distributions
4.1 Bayes Factor Interpretation
4.2 Sensitivity Summary I
4.3 Sensitivity Summary II
5.1 Multiple and Simple Regression Comparison
5.2 Sensitivity analysis of parameter β
5.3 Sensitivity analysis of parameter σ²
5.4 Classical and Bayesian regression comparison
5.5 Main Prior Distributions Summary
5.6 Main Posterior Distributions Summary
5.7 Prior and Posterior Parameters Summary
5.8 Main Posterior Predictive Distributions Summary
6.1 Multivalued Data Example
6.2 Modal-multivalued Example
7.1 Error Measures for Classical Regression with single values
7.2 Error Measure for Centre Method (2000)
7.3 Error Measure for Centre Method (2002)
7.4 Error Measures for Centre and Radius Method
7.5 Error Measures in Bayesian Centre and Radius Method
7.6 Error Measures for Classical Regression with single values
7.7  Error Measure for Centre Method (2000)
7.8  Error Measure for Centre Method (2002)
7.9  Error Measures for Centre and Radius Method
7.10 Error Measures in Bayesian Centre and Radius Method
11.1 Estimated material costs
11.2 Amortization Costs
11.3 Summarized Budget
Contents

Acknowledgements
Resumen
Abstract
List of Figures
List of Tables
Contents

1 Introduction
  1.1 Project Motivation
  1.2 Objectives
  1.3 Methodology

2 Bayesian Data Analysis
  2.1 What is Bayesian Data Analysis?
  2.2 Bayesian Analysis for Normal and other distributions
    2.2.1 Univariate Normal distribution
    2.2.2 Multivariate Normal distribution
    2.2.3 Other distributions
  2.3 Hierarchical Models
  2.4 Nonparametric Bayesian

3 Posterior Simulation
  3.1 Introduction
  3.2 Markov chains
  3.3 Monte Carlo Integration
  3.4 Gibbs sampler
  3.5 Metropolis-Hastings sampler and its special cases
    3.5.1 Metropolis-Hastings sampler
    3.5.2 Metropolis sampler
    3.5.3 Random-walk sampler
    3.5.4 Independence sampler
  3.6 Importance sampling

4 Sensitivity Analysis
  4.1 Introduction
  4.2 Bayes Factor
  4.3 Alternative Stats to Bayes Factor
  4.4 Highest Posterior Density Intervals
  4.5 Model Comparison Summary

5 Regression Analysis
  5.1 Introduction
  5.2 Classical Regression Model
  5.3 The Bayesian Approach
  5.4 Normal Linear Regression Model subject to inequality constraints
  5.5 Normal Linear Regression Model with Independent Parameters
  5.6 Normal Linear Regression Model with Heteroscedasticity and Correlation
    5.6.1 Heteroscedasticity
    5.6.2 Correlation
  5.7 Models Summary

6 Symbolic Data
  6.1 What is symbolic data analysis?
  6.2 Interval-valued variables
  6.3 Classical regression analysis with Interval-valued data
  6.4 Bayesian regression analysis with Interval-valued data

7 Results
  7.1 Spanish Continuous Stock Market data sets
  7.2 Direct Relation between Variables
  7.3 Uncorrelated Variables

8 A Guide to Statistical Software Today
  8.1 Introduction
  8.2 Commercial Packages
    8.2.1 The SAS System for Statistical Analysis
    8.2.2 Minitab
    8.2.3 BMDP
    8.2.4 SPSS
    8.2.5 S-PLUS
    8.2.6 Others
  8.3 Public License Packages
    8.3.1 R
    8.3.2 BUGS
  8.4 Analysis Packages with Statistical Libraries
    8.4.1 Matlab
    8.4.2 Mathematica
    8.4.3 Others
  8.5 Some General Languages with Statistical Libraries
    8.5.1 Java
    8.5.2 C++
  8.6 Developed Software Tool: BARESIMDA

9 Software Requirements Specification
  9.1 Purpose
  9.2 Intended Audience
  9.3 Functionality
    9.3.1 Classical Regression with crisp data
    9.3.2 Classical Regression with interval-valued data
    9.3.3 Bayesian Regression with crisp data
    9.3.4 Bayesian Regression with interval-valued data
    9.3.5 Data Manipulation
    9.3.6 Portability
    9.3.7 Maintainability
  9.4 External Interfaces
    9.4.1 User Interfaces
    9.4.2 Software Interfaces

10 Software Architecture Study
  10.1 Hardware/Software Architecture
  10.2 Logical Architecture

11 Project Budget
  11.1 Engineering Costs
  11.2 Investment and Elements Costs
    11.2.1 Summarized Budget

12 Conclusions
  12.1 Bayesian Regression applied to Symbolic Data
  12.2 BARESIMDA Software Tool
  12.3 Future Developments
  12.4 Summary

A Probability Distributions
  A.1 Discrete Distributions
    A.1.1 Binomial
    A.1.2 Geometric
    A.1.3 Poisson
  A.2 Continuous Distributions
    A.2.1 Uniform
    A.2.2 Univariate Normal
    A.2.3 Exponential
    A.2.4 Gamma
    A.2.5 Inverse-Gamma
    A.2.6 Chi-square
    A.2.7 Inverse-Chi-square and Inverse-Scaled Chi-square
    A.2.8 Univariate Student-t
    A.2.9 Beta
    A.2.10 Multivariate Normal
    A.2.11 Multivariate Student-t
    A.2.12 Wishart
    A.2.13 Inverse-Wishart

B Installation Guide
  B.1 From source folder
  B.2 From installer

C User's Guide
  C.1 Data Entry
    C.1.1 Loading an Excel file
    C.1.2 Defining a new variable
    C.1.3 Editing an existing variable
    C.1.4 Deleting an existing variable
    C.1.5 Typing in a new data row
    C.1.6 Deleting an existing data row
    C.1.7 Modifying existing data
  C.2 Configuration
    C.2.1 Setting the look & feel
    C.2.2 Selecting the type of user
  C.3 Non-Symbolic Regression
    C.3.1 Simple Classical Regression
    C.3.2 Multiple Classical Regression
    C.3.3 Simple Bayesian Regression
    C.3.4 Multiple Bayesian Regression
  C.4 Symbolic Regression
    C.4.1 Simple Classical Regression
    C.4.2 Multiple Classical Regression
    C.4.3 Simple Bayesian Regression
    C.4.4 Multiple Bayesian Regression

D Obtaining and Installing R
  D.1 Binary distributions
  D.2 Installation from source
  D.3 Package installation

E Obtaining and installing Java Runtime Environment
  E.1 Microsoft Windows
  E.2 Linux
    E.2.1 Installation of Self-Extracting Binary
    E.2.2 Installation of RPM File
  E.3 UNIX

Bibliography
Chapter 1

Introduction

1.1 Project Motivation

Statistics is primarily concerned with the analysis of data, either to assist in arriving at an improved understanding of some underlying mechanism, or as a means for making informed rational decisions. Both these aspects generally involve some degree of uncertainty. The statistician's task is then to explain such uncertainty, and to reduce it to the extent possible. Problems of this type occur throughout the physical, social and other sciences. One way of looking at statistics stems from the perception that, ultimately, probability is the only appropriate way to describe and systematically deal with uncertainty, as if it were the language for the logic of uncertainty. Thus, inference statements are precisely framed as probability statements on the possible values of the unknown quantities of interest (parameters or future observations), conditional on the observed, available data. The scientific discipline based on this understanding is called Bayesian Statistics. Moreover, the increasingly needed and sophisticated models used to describe available data, often hierarchical models, are typically too complex for conventional statistics to handle, but can be tackled within Bayesian Statistics.

In principle, Bayesian Statistics is designed to handle all situations where uncertainty is found. Since some uncertainty is present in most aspects of life, it may be argued that Bayesian Statistics should be appreciated and used by everyone. It is the logic of contemporary society and science. According to [Rupp04], whether to apply Bayesian methodology is no longer under discussion; the question is when it has to be done.

Bayesian methods have matured and improved in several ways during the last fifteen years. They are becoming increasingly attractive to researchers, and successful applications of Bayesian
data analysis have appeared in many different fields, including Actuarial Science, Biometrics, Finance, Market Research, Marketing, Medicine, Engineering and Social Science. It is not only that the Bayesian approach produces appropriate answers to many current important problems; there is also an evident need for it, given the inapplicability of conventional statistics to many of them. Thus, the main feature offered by Bayesian data analysis is the possibility of incorporating the researcher's knowledge about the problem at hand: the more precise the prior knowledge, the better and more reliable the results. But Bayesian Statistics was held back until the mid-1990s by its computational complexity. Since then, it has expanded greatly, favoured by the development and improvement of computational methods in this field, such as Markov chain Monte Carlo.

This methodology has proven extremely useful in its application to regression models, which are widely accepted. Let us remember that the general purpose of regression analysis is to learn more about the relationship between several independent or predictor variables and a dependent or criterion variable. Bayesian methodology lets the researcher incorporate her or his knowledge into the analysis, improving the results since they no longer depend solely on the sampled data.

On the other hand, datasets are increasingly so large that they must be summarized in some fashion, so that the resulting summary dataset is of a more manageable size while still retaining as much of the knowledge inherent to the entire dataset as possible. One consequence of this situation is that data may no longer be formatted as single values, as is the case for classical data, but rather may be represented by lists, intervals, distributions, and the like. These summarized data are examples of symbolic data. This kind of data also lets us better represent the knowledge and beliefs we hold in mind, which are limited and hard to extract with classical Statistics. According to [Bill02], this responds to the current need to move from a Statistics of data in the past century to a Statistics of knowledge in the twenty-first century.

Market and demand requirements are increasing continuously over time. This implies a need for better and more accurate methods to forecast new situations and to control different quantities with the minimum error, in order to supply better products and to obtain higher incomes, scientific advances and better results.

Dealing with this outlook, this project is intended to respond to those requirements by providing a
wide and exhaustive documentation about some of the most used and advanced current techniques, including Bayesian data analysis, regression models and symbolic data. Different examples related to the Spanish Continuous Stock Market are explained throughout this text, making clear the advantages of employing the described methods. Likewise, a software tool with a user-friendly graphical interface has been developed to practise and check all the acquired knowledge.

Therefore, this is a project combining the most recent techniques with major future implications in theoretical issues, as Bayesian regression applied to interval-valued data is, with a technological part dealing with the problem of interconnecting two software programs: one used to show the graphical user interface and the other one employed to make the computations.

Regarding a more personal motivation, when accepting this project, several factors were taken into consideration by the author:

• A great challenge: it is an ambitious project with a high technical complexity related to both its theoretical basis and its technological basis. This represents a very good letter of introduction to the labour market.

• A good time plan: this project was designed to be finished before June 2007, which means being able to finish the degree in June and enter the labour market in September.

• Some very interesting issues: on one hand, it deals with the ever-present need of forecasting and modelling observations and situations in order to get the best possible results. On the other hand, it focuses on the Stock Market, which matches my personal hobbies.

• A new programming language: the possibility of learning in depth a new and relatively recent programming language, such as R, was an extra motivating factor.

• The project director: Carlos Maté is considered a demanding and very competent director by the students of the university.

• An investigation scholarship: the possibility of being in the Industrial Organization department of the University, learning from people such as the director mentioned above and other highly recognized professors, was a great factor.
1.2 Objectives

This project aims to achieve the following objectives:

• To provide a wide and rigorous documentation about the following issues: Bayesian data analysis, regression models and symbolic data. From this point, documentation about Bayesian regression will be developed, as well as the software tool designed.

• To build a software tool in order to fit Bayesian regression models to interval-valued data, finding out the most efficient way to design the graphical user interface. This must be as user-friendly as possible.

• To find out the most efficient way to offer that system to future clients, based on the tests carried out with the application.

• To design a survey to measure the quality of the tool and users' satisfaction.

• If possible, to write an article for a scientific journal.

1.3 Methodology

As the title of the project indicates, the final purpose is the development of an application aimed towards stock markets, based on a Bayesian regression system; therefore, some previous knowledge is required.

The first stage is familiarization with Bayesian data analysis, regression models applied to Bayesian methodology, and symbolic data. Within this phase, Bayesian data analysis will be studied first, trying to synthesize and extract its most important elements. Special dedication will be given to posterior simulation and computational algorithms. Then, regression models will be treated, quickly reviewing the classical approach before delving into the different Bayesian regression models, applying a great part of what was explained in Bayesian methodology. Finally, this first stage will be completed with the application to symbolic data, paying special attention to interval-valued data.

The second stage refers to the development of the software application, employing an incremental methodology for programming and testing iterative prototypes. This methodology has been
considered the most suitable for this project since it lets us introduce successive models into the application.

The following figure shows the structure of the work packages the project is divided into:

Figure 1.1: Project Work Packages
Chapter 2

Bayesian Data Analysis

2.1 What is Bayesian Data Analysis?

Statistics can be defined as the discipline that provides us with a methodology to collect, organize, summarize and analyze a set of data. Data analysis can be divided into two kinds: exploratory data analysis and confirmatory data analysis. The former is used to represent, describe and analyze a set of data through simple methods in the first stages of statistical analysis. The latter is applied to make inferences from data, based on probability models.

In the same way, confirmatory data analysis is divided into two branches depending on the adopted approach. The first one, known as frequentist, makes inferences from the data resulting from a sampling through classical methods. The second branch, known as Bayesian, goes further in the analysis and adds to those data the prior knowledge the researcher has about the problem being treated. Since it is not worthwhile to explain the frequentist approach in full here, a more extended review of the different classical methods related to it can be found in [Mont02].

Data Analysis
  • Exploratory
  • Confirmatory
      - Frequentist
      - Bayesian
As far as Bayesian analysis is concerned, and according to [Gelm04], the process can be divided into the following three steps:

• To set up a full probability model, through a joint probability distribution for all observable and unobservable quantities in the problem.

• To condition on observed data, obtaining the posterior distribution.

• Finally, to evaluate the fit of the model and the implications of the resulting posterior distribution.

The joint probability distribution f(θ, y) is obtained by means of

$$ f(\theta, y) = f(y \mid \theta)\, f(\theta) \tag{2.1} $$

where y is the set of sampled data. (Throughout this chapter the formulas are written for a single parameter θ; when there are several parameters, θ denotes the vector of parameters and the same expressions apply.) So this distribution is the product of two densities that are referred to as the sampling distribution f(y|θ) and the prior distribution f(θ).

The sampling distribution, as its name suggests, is the probability model that the researcher assigns to the statistic (or set of statistics) to be studied after the data have been observed. Here, an important problem arises in relation to the parametric approach: the probability model that the researcher chooses may not be adequate. The nonparametric approach overcomes this inconvenience, as will be seen later.

When y is considered fixed, so that the sampling distribution is a function of θ, it is called the likelihood function and obeys the likelihood principle, which states that, for a given sample of data, any two probability models f(y|θ) with the same likelihood function yield the same inference for θ.

The prior distribution does not depend upon the data. Accordingly, it contains the information and the knowledge that the researcher has about the situation or problem to be solved. When there is no previous significant population from which the engineer can take his knowledge, that is, when the researcher has no prior information about the problem, a non-informative prior distribution must be used in the analysis in order to let the data speak for themselves. Hence, it is assumed that the prior knowledge will have very little importance in the results. But most non-informative priors
are "improper", in that they do not integrate to 1, and this fact can cause problems. In these cases it is necessary to be sure that the posterior distribution is proper. Another possibility is to use an informative prior distribution but with an insignificant weight (around zero) associated to it.

Though the prior distribution can take any form, it is common to choose particular classes of priors that make computation and interpretation easier. These are the conjugate priors. A conjugate prior distribution is one which, when combined with the likelihood function, gives a distribution that falls in the same class of distributions as the prior. Furthermore, according to [Koop03], a natural conjugate prior has the additional property that it has the same form as the likelihood does. But it is not always possible to find this kind of distribution, and the researcher then has to manage a lot of distributions to be able to give expression to his prior knowledge about the problem. This is another handicap that the nonparametric approach reduces.

In relation to the prior, what distribution should be chosen? There are three different points of view corresponding to different styles of Bayesians:

• Classical Bayesians consider that the prior is a necessary evil, and priors that interject the least information possible should be chosen.

• Modern parametric Bayesians consider that the prior is a useful convenience, and priors with desirable properties such as conjugacy should be chosen. They remark that, given a distributional choice, prior hyperparameters that interject the least information possible should be chosen.

• Subjective Bayesians give essential importance to the prior, in the sense that they consider it a summary of old beliefs. So prior distributions based on previous knowledge (either the results of earlier studies or non-scientific opinion) should be chosen.

Returning to the Bayesian data analysis process, simply conditioning on the observed data y and applying Bayes' Theorem, the posterior distribution f(θ|y) yields:

$$ f(\theta \mid y) = \frac{f(\theta, y)}{f(y)} = \frac{f(\theta)\, f(y \mid \theta)}{f(y)} \tag{2.2} $$

where

$$ f(y) = \int f(\theta)\, f(y \mid \theta)\, d\theta \tag{2.3} $$
is known as the prior predictive distribution, since it is not conditional upon a previous observation of the process and is applied to an observable quantity.

An equivalent form of the posterior distribution displayed above omits the prior predictive distribution, since it does not involve θ and the interest lies in learning about θ. So, with fixed y, it can be said that the posterior distribution is proportional to the joint probability distribution f(θ, y).

Once the posterior distribution is calculated, some kind of summary measure will be required to estimate the uncertainty about the parameter θ. This is due to the fact that the posterior distribution is a high-dimensional object whose direct use is not practical for a problem. The measure that summarizes the posterior distribution can be the posterior mean, mode, median or variance, among others; its choice will depend on the requirements of the problem. So the posterior distribution has a great importance, since it lets the researcher manage the uncertainty about θ and provides him with information about it, taking into account both his prior knowledge and the data collected by sampling on that parameter.

According to [Maté06], it is not difficult to deduce that posterior inference will coincide with the non-Bayesian one as long as the estimate which the researcher gives to the parameter θ is the same as the one resulting from the sampling.

Once the data y have been observed, a new unknown observable quantity ỹ can be predicted for the same process through the posterior predictive distribution f(ỹ|y):

$$ f(\tilde{y} \mid y) = \int f(\tilde{y}, \theta \mid y)\, d\theta = \int f(\tilde{y} \mid \theta, y)\, f(\theta \mid y)\, d\theta = \int f(\tilde{y} \mid \theta)\, f(\theta \mid y)\, d\theta \tag{2.4} $$

To sum up, the basic idea is to update the prior distribution f(θ) through Bayes' theorem by observing the data y, in order to get a posterior distribution f(θ|y). Then a summary measure or a prediction for new data can be obtained from f(θ|y). Table 2.1 reflects what has been said.
Distribution   Expression               Information Required                  Result
Likelihood     f(y|θ)                   Data                                  Distribution f(y|θ)
Prior          f(θ)                     Researcher's knowledge                Parameter distribution f(θ)
Joint          f(y|θ)f(θ)               Likelihood and prior distributions    f(θ, y)
Posterior      f(θ)f(y|θ)/f(y)          Prior and joint distributions         f(θ|y)
Predictive     ∫ f(ỹ|θ)f(θ|y)dθ         New data and posterior distribution   f(ỹ|y)

Table 2.1: Distributions in Bayesian Data Analysis

2.2 Bayesian Analysis for Normal and other distributions

2.2.1 Univariate Normal distribution

The basic model to be discussed concerns an observable variable y, normally distributed with mean µ and unknown variance σ²:

$$ y \mid \mu, \sigma^2 \sim N(\mu, \sigma^2) \tag{2.5} $$

As can be seen in Appendix A, the likelihood function for a single observation is

$$ f(y \mid \mu, \sigma^2) \propto (\sigma^2)^{-1/2} \exp\!\left(-\frac{1}{2\sigma^2}(y - \mu)^2\right) \tag{2.6} $$

This means that the likelihood function is proportional to a Normal distribution, omitting those terms that are constant.

Now let us consider we have n independent observations y1, y2, . . . , yn. According to the previous section, the parameters to be estimated are µ and σ²:
$$ \theta = (\theta_1, \theta_2) = (\mu, \sigma^2) \tag{2.7} $$

A full probability model must be set up through a joint probability distribution:

$$ f(\theta, (y_1, y_2, \ldots, y_n)) = f(\theta, y) = f(y \mid \theta)\, f(\theta) \tag{2.8} $$

The likelihood function for a sample of n iid observations is in this case

$$ f(y \mid \theta) = f(y \mid \mu, \sigma^2) \propto (\sigma^2)^{-n/2} \exp\!\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \mu)^2\right) \tag{2.9} $$

As recommended previously, a conjugate prior will be chosen; in fact, it will be a natural conjugate prior. According to [Gelm04], this likelihood function suggests a conjugate prior distribution of the form

$$ f(\theta) = f(\mu, \sigma^2) = f(\mu \mid \sigma^2)\, f(\sigma^2) \tag{2.10} $$

where the marginal distribution of σ² is the Scaled Inverse-χ² and the conditional distribution of µ given σ² is Normal (details about these distributions in Appendix A):

$$ \mu \mid \sigma^2 \sim N(\mu_0, \sigma^2 V_0) \tag{2.11} $$

$$ \sigma^2 \sim \text{Inv-}\chi^2(\nu_0, s_0^2) \tag{2.12} $$

So the joint prior distribution is:

$$ f(\theta) = f(\mu, \sigma^2) = f(\mu \mid \sigma^2)\, f(\sigma^2) \propto N\text{-Inv-}\chi^2(\mu_0, s_0^2 V_0, \nu_0, s_0^2) \tag{2.13} $$

Its four parameters can be identified as the location and scale of µ and the degrees of freedom and scale of σ², respectively.

As a natural conjugate prior was employed, the posterior joint distribution will have the same form as the prior. So, conditioning on the data, and according to Bayes' Theorem, we have:

$$ f(\theta \mid y) = f(\mu, \sigma^2 \mid y) \propto f(y \mid \mu, \sigma^2)\, f(\mu, \sigma^2) \propto N\text{-Inv-}\chi^2(\mu_1, s_1^2 V_1, \nu_1, s_1^2) \tag{2.14} $$

where it can be shown that
$$ \mu_1 = (V_0^{-1} + n)^{-1} (V_0^{-1} \mu_0 + n\bar{y}) \tag{2.15} $$

$$ V_1 = (V_0^{-1} + n)^{-1} \tag{2.16} $$

$$ \nu_1 = \nu_0 + n \tag{2.17} $$

$$ \nu_1 s_1^2 = \nu_0 s_0^2 + (n-1)s^2 + \frac{V_0^{-1} n}{V_0^{-1} + n} (\bar{y} - \mu_0)^2 \tag{2.18} $$

All these formulae show that Bayesian inference combines prior and sample information. The first formula means that the posterior mean µ1 is a weighted mean of the prior mean µ0 and the empirical mean, divided by the sum of their respective weights, which are V0⁻¹ and the sample size n.

The second formula represents the scale of the posterior mean, and it can be seen as a compromise between the sample size and the significance given to the prior mean.

The third formula indicates that the degrees of freedom of the posterior variance are the sum of the prior degrees of freedom and the sample size. That is, the prior degrees of freedom can be understood as a fictitious sample size on which the expert's prior information is based.

The last formula explains the posterior sum of squared errors as a combination of the prior and empirical sums of squared errors, plus a term that measures the conflict between prior and sample information. A more detailed explanation of this last step can be found in [Gelm04], [Koop03] or [Cong06].

The marginal posterior distributions are:

$$ \mu \mid \sigma^2, y \sim N(\mu_1, \sigma^2 V_1) \tag{2.19} $$

$$ \sigma^2 \mid y \sim \text{Inv-}\chi^2(\nu_1, s_1^2) \tag{2.20} $$

If we integrate out σ², the marginal for µ will be a t-distribution (see Appendix A for details):

$$ \mu \mid y \sim t_{\nu_1}(\mu_1, s_1^2 V_1) \tag{2.21} $$
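As a quick illustration, the following R sketch implements the update formulas (2.15)-(2.18). The function name and the simulated data are assumptions of this example only, not part of the developed tool.

```r
# Conjugate update for the Normal model with unknown mean and variance,
# following equations (2.15)-(2.18).
normal_conjugate_update <- function(y, mu0, V0, nu0, s0_sq) {
  n     <- length(y)
  ybar  <- mean(y)
  V0inv <- 1 / V0
  mu1   <- (V0inv * mu0 + n * ybar) / (V0inv + n)          # (2.15)
  V1    <- 1 / (V0inv + n)                                 # (2.16)
  nu1   <- nu0 + n                                         # (2.17)
  s1_sq <- (nu0 * s0_sq + (n - 1) * var(y) +               # (2.18)
            (V0inv * n / (V0inv + n)) * (ybar - mu0)^2) / nu1
  list(mu1 = mu1, V1 = V1, nu1 = nu1, s1_sq = s1_sq)
}

# Illustrative data: 10 draws from N(5, 2^2), with a weak prior centred at 4.
set.seed(1)
y <- rnorm(10, mean = 5, sd = 2)
normal_conjugate_update(y, mu0 = 4, V0 = 1, nu0 = 2, s0_sq = 4)
```

Note how, with only ten observations, the returned µ1 sits between the prior mean and the sample mean, exactly as (2.15) prescribes.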
Let us see an application to the Spanish Stock Market. Let us suppose that the monthly close values associated with the Ibex 35 are normally distributed. If we take the values at which the Spanish index closed during the first two weeks of January 2006, it can be shown that the mean was 10893.29 and the standard deviation was 61.66. So the non-Bayesian approach would infer a Normal distribution with the previous mean and standard deviation. Suppose now that we had asked an analyst about the Ibex 35 evolution in January: he would have strongly affirmed that it would decrease slightly, that the mean close value at the end of the month would be around 10870 and that, hence, the standard deviation would be higher, around 100. Then, according to the previous formulas, the posterior parameters would be

µ1 = (100 + 10)⁻¹ (100 × 10870 + 10 × 10893.29) = 10872.12

V1 = (100 + 10)⁻¹ = 0.0091

ν1 = 100 + 10 = 110

s1 = √[(100 × 100² + 9 × 61.66 + (1000/110) × (10893.29 − 10870)²) / 110] = 95.60

This means that there is a difference of almost 20 points between the Bayesian estimate and the non-Bayesian one for the mean close value of January. Once the month of January had passed, we could compare both results and note that the Bayesian estimates were closer to the finally realized mean close value and standard deviation: 10871.2 and 112.44. In Figure 2.1 it can be seen how the blue line representing the Bayesian estimation is closer to the cyan line representing the final real mean close value than the red line representing the frequentist estimation:
[Figure 2.1: Univariate Normal Example. Densities of the frequentist and Bayesian estimates, together with the real mean close value in January.]

2.2.2 Multivariate Normal distribution

Now let us consider that we have an observable vector y of d components with the multivariate Normal distribution:

$$ \mathbf{y} \sim N(\boldsymbol{\mu}, \Sigma) \tag{2.22} $$

where the first parameter is the mean column vector and the second one is the variance-covariance matrix. Extending what was said above to the multivariate case, we have:

$$ f(\mathbf{y} \mid \boldsymbol{\mu}, \Sigma) \propto |\Sigma|^{-1/2} \exp\!\left(-\frac{1}{2} (\mathbf{y}-\boldsymbol{\mu})' \Sigma^{-1} (\mathbf{y}-\boldsymbol{\mu})\right) \tag{2.23} $$

And for n iid observations:

$$ f(\mathbf{y}_1, \ldots, \mathbf{y}_n \mid \boldsymbol{\mu}, \Sigma) \propto |\Sigma|^{-n/2} \exp\!\left(-\frac{1}{2} \sum_{i=1}^{n} (\mathbf{y}_i-\boldsymbol{\mu})' \Sigma^{-1} (\mathbf{y}_i-\boldsymbol{\mu})\right) \tag{2.24} $$

A multivariate generalization of the Scaled Inverse-χ² is the Inverse-Wishart distribution (see details in Appendix A), so the joint prior distribution is

$$ f(\boldsymbol{\mu}, \Sigma) \propto N\text{-Inv-Wishart}\!\left(\boldsymbol{\mu}_0, \frac{\Lambda_0}{k_0}, \nu_0, \Lambda_0\right) \tag{2.25} $$

due to the fact that

$$ \boldsymbol{\mu} \mid \Sigma \sim N\!\left(\boldsymbol{\mu}_0, \frac{\Sigma}{k_0}\right) \tag{2.26} $$

$$ \Sigma \sim \text{Inv-Wishart}(\nu_0, \Lambda_0^{-1}) \tag{2.27} $$
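For readers who want to experiment, the following R sketch draws (µ, Σ) from the prior (2.26)-(2.27), under one common reading of the Inverse-Wishart convention, namely that Σ ~ Inv-Wishart(ν0, Λ0⁻¹) means Σ⁻¹ ~ Wishart(ν0, Λ0⁻¹). All numeric settings are illustrative assumptions.

```r
# One draw of (mu, Sigma) from the Normal-Inverse-Wishart prior, d = 2.
set.seed(4)
d       <- 2
mu0     <- c(0, 0); k0 <- 1; nu0 <- 5
Lambda0 <- diag(d)                        # prior scale matrix

Sigma_inv <- rWishart(1, df = nu0, Sigma = solve(Lambda0))[, , 1]
Sigma     <- solve(Sigma_inv)             # Sigma ~ Inv-Wishart(nu0, Lambda0^{-1})

z  <- rnorm(d)
mu <- mu0 + t(chol(Sigma / k0)) %*% z     # mu | Sigma ~ N(mu0, Sigma / k0)
list(mu = as.vector(mu), Sigma = Sigma)
```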
                     Univariate Normal                               Multivariate Normal
Expression           y ~ N(µ, σ²)                                    y ~ N(µ, Σ)
Parameters           µ, σ²                                           µ, Σ
Prior                µ|σ² ~ N(µ0, σ0²/k0)                            µ|Σ ~ N(µ0, Σ/k0)
                     σ² ~ Inv-χ²(ν0, σ0²)                            Σ ~ Inv-Wishart(ν0, Λ0⁻¹)
                     µ,σ² ~ N-Inv-χ²(µ0, σ0²/k0, ν0, σ0²)            µ,Σ ~ N-Inv-Wishart(µ0, Λ0/k0, ν0, Λ0)
Posterior            µ|σ²,y ~ N(µ1, σ1²/k1)                          µ|Σ,y ~ N(µ1, Σ/k1)
                     σ²|y ~ Inv-χ²(ν1, σ1²)                          Σ|y ~ Inv-Wishart(ν1, Λ1⁻¹)
                     µ,σ²|y ~ N-Inv-χ²(µ1, σ1²/k1, ν1, σ1²)          µ,Σ|y ~ N-Inv-Wishart(µ1, Λ1/k1, ν1, Λ1)

Table 2.2: Comparison between Univariate and Multivariate Normal

The posterior results are the same as those given for the univariate case, but applying these distributions. Interested readers can find more information in [Gelm04] or [Cong06]. A summary is shown in Table 2.2 in order to capture the most important ideas.

2.2.3 Other distributions

Just as has been done with the Normal distribution, a Bayesian analysis could be carried out for other distributions. For instance, the exponential distribution is commonly used in reliability analysis. Because this project deals with the Normal distribution for the likelihood, the analysis with other distributions will not be explained in detail. Table 2.3 shows the conjugate prior and posterior distributions
for other likelihood distributions. More details can be found in [Cong06], [Gelm04] or [Rossi06].

Likelihood     Parameter   Conjugate Prior   Prior Hyperparameters   Posterior Hyperparameters
Bin(y|n, θ)    θ           Beta              α, β                    α + y, β + n − y
P(y|θ)         θ           Gamma             α, β                    α + nȳ, β + n
Exp(y|θ)       θ           Gamma             α, β                    α + 1, β + y
Geo(y|θ)       θ           Beta              α, β                    α + 1, β + y

Table 2.3: Conjugate distributions for other likelihood distributions

2.3 Hierarchical Models

Hierarchical data arise when observations are structured or related among themselves. When this occurs, standard techniques either assume that these groups belong to entirely different populations or ignore the aggregate information entirely. Hierarchical models provide a way of pooling the information for the disparate groups without assuming that they belong to precisely the same population.

Suppose we have collected data about some random variable Y from m different populations, with n observations for each population. Let y_ij represent observation j from population i. Now suppose y_ij ~ f(θ_i), where θ_i is a vector of parameters for population i. Furthermore, θ_i ~ f(Θ), where Θ may also be a vector. Until this point, we have only rewritten what was said previously.
Now let us extend the model and assume that the parameters Θ that govern the distribution of the θ's are themselves random variables, and assign a prior distribution to these variables as well:

$$ \Theta \sim f(\psi) \tag{2.28} $$

where f(ψ) is called the hyperprior. The vector parameter ψ of the hyperprior may be "known" and represent our prior beliefs about Θ or, in theory, we can also assign a probability distribution to these quantities as well, and proceed to another layer of hierarchy.

According to [Gelm04], the idea of exchangeability will be used to create a joint probability distribution model for all the parameters θ. A formal definition of exchangeability is: "The parameters θ1, θ2, . . . , θn are exchangeable in their joint distribution if f(θ1, θ2, . . . , θn) is invariant to permutations of the indexes 1, 2, . . . , n." This means that if no information other than the data is available to distinguish any of the θi from any of the others, and no ordering of the parameters can be made, one must assume symmetry among the parameters in the prior distribution. So we can treat the parameters of each sub-population as exchangeable units. This can be formulated by:

$$ f(\theta_1, \theta_2, \ldots, \theta_n \mid \Theta) = \prod_{i=1}^{n} f(\theta_i \mid \Theta) \tag{2.29} $$

The joint prior distribution is now:

$$ f(\theta_1, \ldots, \theta_n, \Theta) = f(\theta_1, \ldots, \theta_n \mid \Theta)\, f(\Theta) \tag{2.30} $$

And conditioning on the data, it yields:

$$ f(\theta_1, \ldots, \theta_n, \Theta \mid y) \propto f(\theta_1, \ldots, \theta_n, \Theta)\, f(y \mid \theta_1, \ldots, \theta_n, \Theta) \tag{2.31} $$

Perhaps the most important point in practice is that non-hierarchical models are usually inappropriate for hierarchical data, while non-hierarchical data can be modelled following the hierarchical structure by assigning concrete values to the hyperprior parameters. This kind of model will be used in Bayesian regression models with autocorrelated errors, as will be seen in the following chapters.
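The structure in (2.28)-(2.30) can be made concrete with a short forward simulation. The following R sketch, with entirely illustrative settings (the between-group standard deviation is held fixed for simplicity), draws group parameters exchangeably from a common distribution and then data within each group, and shows the simplest form of the partial pooling that hierarchical models formalize.

```r
# Forward simulation of a two-level hierarchical Normal model:
# hyperparameters -> exchangeable group parameters -> data.
set.seed(42)
m <- 8; n <- 20                          # m populations, n observations each

mu0 <- rnorm(1, mean = 0, sd = 5)        # hyperprior draw for the overall mean
tau <- 2                                 # between-group sd (fixed here)

theta <- rnorm(m, mean = mu0, sd = tau)  # exchangeable group means, as in (2.29)
y <- sapply(theta, function(th) rnorm(n, mean = th, sd = 1))   # y_ij | theta_i

# Simplest partial pooling: shrink each group average toward the grand mean,
# with weights implied by the within- and between-group variance components.
w <- (n / 1^2) / (n / 1^2 + 1 / tau^2)
shrunk <- w * colMeans(y) + (1 - w) * mean(y)
round(cbind(raw = colMeans(y), shrunk = shrunk), 2)
```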
For more details about Bayesian hierarchical models, the reader is referred to [Cong06], [Gelm04] and [Rossi06].

2.4 Nonparametric Bayesian

To overcome the limitations that have been mentioned throughout this chapter, it is the nonparametric approach which manages to reduce the restrictions of the parametric approach. This kind of analysis can be performed through the so-called Dirichlet Process, which allows us to express in a simple way the prior distribution, or the distribution family, of F, where F is the distribution function of the studied variable. This process has a parameter, called α, which is transformed into a probability distribution. According to [Maté06], a Dirichlet Process for F(t) requires knowing:

• A prior proposal for F(t), F0(t), corresponding to the distribution function that represents the prior knowledge the engineer has, given by

$$ F_0(t) = \frac{\alpha(t)}{M} \tag{2.32} $$

• A measure of the confidence in the prior proposal, denoted by M, whose values can vary between 0 and ∞, depending on whether there is total confidence in the data or in the prior proposal, respectively.

It can be shown that the posterior estimate of F(t), F̂n(t), after sampling n data points, is given by

$$ \hat{F}_n(t) = p_n F_0(t) + (1 - p_n) F_n(t) \tag{2.33} $$

where Fn(t) is the empirical distribution function and pn = M/(M + n). More detailed information about the nonparametric approach and how Dirichlet processes are used can be found in [Mull04] or [Gosh03].
With this approach, not only is the limitation of the parametric approach related to the probability model of the variable under study avoided, since no such hypothesis is required, but it also allows us to confer a quantified importance on the prior knowledge which the engineer gives, depending on the confidence in the certainty of this knowledge.
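A minimal R sketch of the posterior estimate (2.33) follows; the prior proposal F0, the confidence M and the sample below are illustrative assumptions.

```r
# Posterior estimate of F under a Dirichlet process prior, equation (2.33):
# a mixture of the prior proposal F0 and the empirical cdf, weighted by
# p_n = M / (M + n).
set.seed(7)
y  <- rexp(30, rate = 1/2)                     # observed sample, n = 30
M  <- 10                                       # confidence in the prior proposal
F0 <- function(t) pnorm(t, mean = 2, sd = 1)   # prior proposal for F
Fn <- ecdf(y)                                  # empirical distribution function

p_n  <- M / (M + length(y))
Fhat <- function(t) p_n * F0(t) + (1 - p_n) * Fn(t)

t <- seq(0, 8, by = 0.1)
plot(t, Fhat(t), type = "l", ylab = "posterior estimate of F(t)")
lines(t, F0(t), lty = 2)                       # prior proposal
lines(t, Fn(t), lty = 3)                       # empirical cdf
```

Increasing M pulls the estimate toward F0; M near 0 leaves essentially the empirical cdf, which is the quantified trade-off described above.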
Chapter 3

Posterior Simulation

3.1 Introduction

A practical problem with Bayesian inference is the difficulty of summarizing realistically complex posterior distributions. In most practical problems, posterior densities will not take the form of any well-known and understood density, so summary statistics, such as the posterior mean and variance of parameters of interest, will not be analytically available. It is at this point that the importance of Bayesian computation arises, and computational tools are required to gain meaningful inference from the posterior distribution. Its importance is such that the computing revolution of the last 20 years has led to a blossoming of Bayesian methods in many fields such as Econometrics, Ecology or Health.

In this respect, the most transcendent simulation methods are the Markov chain Monte Carlo (MCMC) methods. MCMC methods date from the original work of [Metr53], who were interested in methods for the efficient simulation of the energy levels of atoms in a crystalline structure. The original idea was subsequently generalized by [Hast70], but its true potential was not fully realized within the statistical literature until [Gelf90] demonstrated its application to the estimation of integrals commonly occurring in the context of Bayesian statistical inference.

As [Berg05] points out, the underlying principle is simple: if one wishes to sample randomly from a specific probability distribution, then design a Markov chain whose long-time equilibrium is that distribution, write a computer program to simulate the Markov chain, run it for a time long enough to be confident that approximate equilibrium has been attained, then record the state of the Markov
chain as an approximate draw from equilibrium.

The technique has been developed strongly in different fields and with rather different emphases: in the computer science community concerned with the study of random algorithms (where the emphasis is on whether the resulting algorithm scales well with increasing size of the problem), in the spatial statistics community (where one is interested in understanding what kinds of patterns arise from complex stochastic models), and also in the applied statistics community (where it is applied largely in Bayesian contexts, enabling researchers to formulate statistical models which would otherwise be resistant to effective statistical analyses).

The development of the theoretical work also benefits the development of statistical applications. MCMC simulation techniques have been applied to develop practical statistical inferences for almost all problems in (bio)statistics, for example, problems in longitudinal data analysis, image analysis, genetics, contagious disease epidemics, random spatial patterns, and financial statistical models such as GARCH and stochastic volatility.

The simplicity of the underlying principle of MCMC is a major reason for its success. However, a substantial complication arises as the underlying target problem becomes more complex; namely, how long should one run the Markov chain so as to ensure that it is close to equilibrium? According to [Gelm04], n = 100 independent samples should be enough for reasonable posterior summaries, but in some cases more samples are needed to ensure more accuracy.

3.2 Markov chains

The essential theory required in developing Monte Carlo methods based on Markov chains is presented here. The most fundamental result is that certain Markov chains converge to a unique invariant distribution, and can be used to estimate expectations with respect to this distribution. But in order to reach this conclusion, some concepts need to be defined first.

A Markov chain is a series of random variables, X0, . . . , Xn, also called a stochastic process, in which only the value of Xn−1 influences the distribution of Xn. Formally:

$$ P(X_n = x_n \mid X_0 = x_0, \ldots, X_{n-1} = x_{n-1}) = P(X_n = x_n \mid X_{n-1} = x_{n-1}) \tag{3.1} $$
where the Xn have a common range called the state space of the Markov chain.

The common language used to refer to the different situations in which a Markov chain can be found is the following. If Xn = i, it is said that the chain is in state i at step n, or that it has the value i at step n. This language confers on the chain a certain dynamic view, which is corroborated by the main tool used to study it: the transition probabilities P(Xn+1 = j | Xn = i), which are represented by the transition matrix P = (Pij) with Pij = P(Xn+1 = j | Xn = i). This is used to show the probability of moving from state i to state j.

Due to the fact that in the most interesting applications Markov chains are homogeneous, the transition matrix can be defined from the initial probability P0 = P(X1 = j | X0 = i). In this respect, a Markov chain Xt is homogeneous if P(Xn+1 = j | Xn = i) = P(X1 = j | X0 = i) for all n, i, j. Furthermore, using the Chapman-Kolmogorov equation, it can be shown that, given the transition matrices P and, for step n, Pn of a homogeneous Markov chain, then Pn = Pⁿ.

On the other hand, we will see the concepts of invariant or stationary distribution, ergodicity and irreducibility, which are indispensable to reach the main result. It will be assumed that Xt is a homogeneous Markov chain. Then a vector π is an invariant distribution of the chain Xt if it satisfies:

a) πj ≥ 0 for all j, with Σj πj = 1.

b) π = πP.

That is, a stationary distribution over the states of a Markov chain is one that persists forever once it is reached.

The concept of ergodic state requires making other definitions clear, such as recurrence and aperiodicity:

• The state i is recurrent if P(Xn = i for some n ≥ 1 | X0 = i) = 1. Otherwise, it is transient. Moreover, i will be positive recurrent if the expected (average) return time is finite, and null recurrent if it is not.
• The period of a state i, denoted di, is defined as di = gcd{n ≥ 1 : [Pⁿ]ii > 0}. The state i is aperiodic if di = 1, and periodic if di is greater.

Then a state is ergodic if it is positive recurrent and aperiodic.

The last concept to define is irreducibility. A set of states C ⊆ S, where S is the set of all possible states, is irreducible if for all i, j ∈ C:

• i and j have the same period.

• i is transient if and only if j is transient.

• i is null recurrent if and only if j is null recurrent.

Now, having all these concepts in mind, we can know whether a Markov chain has a stationary distribution with the next lemma:

Lemma 3.2.1. Let Xt be a homogeneous and irreducible Markov chain. The chain will have exactly one stationary distribution if, and only if, all the states are positive recurrent. In that case, it will have entries given by πi = µi⁻¹, where µi denotes the expected return time of state i.

The relation with the long-time behaviour is given by this other lemma:

Lemma 3.2.2. Let Xt be a homogeneous, irreducible and aperiodic Markov chain. Then

$$ [P^n]_{ij} \longrightarrow \frac{1}{\mu_j} \quad \text{for all } i, j \in S \text{ as } n \to \infty \tag{3.2} $$

3.3 Monte Carlo Integration

Monte Carlo integration estimates the integral E[g(θ)] by obtaining samples θ⁽ᵗ⁾, t = 1, . . . , n from the posterior distribution p(θ|y) and averaging

$$ \widehat{E[g(\theta)]} = \frac{1}{n} \sum_{t=1}^{n} g(\theta^{(t)}) \tag{3.3} $$

where g(θ) represents the function of interest to estimate. Note that if the samples θ⁽ᵗ⁾, t = 1, . . . , n, have p(θ|y) as their stationary distribution, the θ⁽ᵗ⁾ form a Markov chain.
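As a sanity check of (3.3), the following R sketch averages g over draws from a known "posterior", so that the Monte Carlo estimate can be compared with the exact expectation. The Gamma target and the choice of g are illustrative assumptions.

```r
# Monte Carlo integration, equation (3.3): estimate E[g(theta) | y] by
# averaging g over posterior draws. Here the posterior is Gamma(3, 2)
# and g(theta) = theta^2, so the exact answer is known in closed form.
set.seed(123)
n     <- 10000
theta <- rgamma(n, shape = 3, rate = 2)   # draws from p(theta | y)
g     <- function(th) th^2

mc_estimate <- mean(g(theta))
exact       <- 3/4 + (3/2)^2              # E[theta^2] = Var + mean^2 = 3
c(mc_estimate = mc_estimate, exact = exact)
```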
3.4 Gibbs sampler

In many models, it is not easy to draw directly from the posterior distribution p(θ|y). However, if the parameter θ is partitioned into several blocks as θ = (θ1, . . . , θp), then the full conditional posterior distributions, p(θ1 | y, θ2, . . . , θp), . . . , p(θp | y, θ1, . . . , θp−1), may be simple to draw from. For instance, in the Normal linear regression model it is convenient to set p = 2, with θ1 = β and θ2 = σ², and the full conditional distributions are p(β | y, σ²) and p(σ² | y, β), which are very useful in the Normal independent model that will be explained later.

The Gibbs sampler is defined by iterative sampling from each of those p conditional distributions:

1. Set a starting value, θ⁽⁰⁾ = (θ2⁽⁰⁾, . . . , θp⁽⁰⁾).

2. Take random draws:
   - θ1⁽¹⁾ from p(θ1 | y, θ2⁽⁰⁾, . . . , θp⁽⁰⁾)
   - θ2⁽¹⁾ from p(θ2 | y, θ1⁽¹⁾, θ3⁽⁰⁾, . . . , θp⁽⁰⁾)
   - . . .
   - θp⁽¹⁾ from p(θp | y, θ1⁽¹⁾, . . . , θp−1⁽¹⁾)

3. Repeat step 2 as necessary.

4. Discard the draws still affected by the starting value θ⁽⁰⁾, and average the rest of the draws applying Monte Carlo integration.

For instance, in the Normal regression model we would have:

1. Set a starting value, θ2⁽⁰⁾ = (σ²)⁽⁰⁾.

2. Take random draws:
   - β⁽¹⁾ from p(β | y, (σ²)⁽⁰⁾)
   - (σ²)⁽¹⁾ from p(σ² | y, β⁽¹⁾)

3. Repeat step 2 as necessary.

4. Discard the early draws affected by the starting value and average the rest, applying Monte Carlo integration (a minimal sketch in R follows).
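The sketch below implements this two-block scheme for the simpler Normal model with unknown mean and variance, assuming independent Normal and Inverse-Gamma priors so that the two full conditionals take the standard forms shown in the comments; the prior hyperparameters and data are illustrative assumptions.

```r
# Gibbs sampler for the Normal model, cycling through two full conditionals.
set.seed(1)
y <- rnorm(50, mean = 3, sd = 2)
n <- length(y); ybar <- mean(y)
m0 <- 0; v0 <- 100; a0 <- 2; b0 <- 2           # prior hyperparameters

S <- 5000; burn_in <- 500
mu <- numeric(S); sig2 <- numeric(S)
sig2[1] <- var(y)                              # starting value (step 1)

for (s in 2:S) {
  # mu | sigma^2, y ~ Normal with precision-weighted mean
  prec  <- 1 / v0 + n / sig2[s - 1]
  mu[s] <- rnorm(1, (m0 / v0 + n * ybar / sig2[s - 1]) / prec, sqrt(1 / prec))
  # sigma^2 | mu, y ~ Inverse-Gamma(a0 + n/2, b0 + sum((y - mu)^2)/2)
  sig2[s] <- 1 / rgamma(1, a0 + n / 2, b0 + sum((y - mu[s])^2) / 2)
}

keep <- (burn_in + 1):S                        # discard the burn-in (step 4)
c(mu_hat = mean(mu[keep]), sig2_hat = mean(sig2[keep]))
```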
The values dropped because they are affected by the starting point are called the burn-in. Generally, any set of values discarded in an MCMC simulation is called the burn-in, and the size of the burn-in period is the subject of current research in MCMC methods. As the state of each draw depends on the state of the previous one, the sequence is a Markov chain. More detailed information can be found in [Chen00], [Mart01] or [Rossi06].

3.5 Metropolis-Hastings sampler and its special cases

3.5.1 Metropolis-Hastings sampler

The Metropolis-Hastings method is adequate for simulating models that are not conditionally conjugate. Furthermore, it can be combined with the Gibbs sampler to simulate posterior distributions where some of the conditional posterior distributions are easy to sample from and others are not. Like the algorithms explained above, it is based on formulating a Markov chain, but it uses a proposal distribution, $q(\cdot|\theta^t)$, which depends on the current state $\theta^t$, to generate a new proposed sample $\theta^*$. This proposal is accepted as the next state with probability given by

$\alpha(\theta^t, \theta^*) = \min\left(1,\ \dfrac{p(\theta^*|y)\,q(\theta^t|\theta^*)}{p(\theta^t|y)\,q(\theta^*|\theta^t)}\right)$   (3.4)

If the point $\theta^*$ is not accepted, the chain does not move and $\theta^{t+1} = \theta^t$. According to [Mart01], the steps to follow are:

1. Initialize the chain to $\theta^0$ and set $t = 0$.
2. Generate a candidate point $\theta^*$ from $q(\cdot|\theta^t)$.
3. Generate $U$ from a uniform $(0,1)$ distribution.
4. If $U \leq \alpha(\theta^t, \theta^*)$ then set $\theta^{t+1} = \theta^*$; else set $\theta^{t+1} = \theta^t$.
5. Set $t = t+1$ and repeat steps 2 through 5.
6. Take the average of the draws $g(\theta^1), \ldots, g(\theta^n)$.

Note that it is not only recommendable but essential that the proposal distribution $q(\cdot|\theta^t)$ be easy to sample from.
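A minimal sketch of the algorithm follows, specialized for simplicity to a symmetric random-walk proposal, so that the $q$ terms in (3.4) cancel (this anticipates Section 3.5.3); `log_post` is any user-supplied, possibly unnormalized, log posterior. The toy example at the end is an illustrative assumption.

import numpy as np

def metropolis_hastings(log_post, theta0, n_iter=10_000, step=0.5, seed=0):
    """Random-walk Metropolis-Hastings. With a symmetric Normal proposal,
    the acceptance probability (3.4) reduces to a ratio of posteriors,
    evaluated here on the log scale for numerical stability."""
    rng = np.random.default_rng(seed)
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    draws = np.empty((n_iter, theta.size))
    lp = log_post(theta)
    for t in range(n_iter):
        proposal = theta + step * rng.standard_normal(theta.size)  # step 2
        lp_prop = log_post(proposal)
        u = rng.uniform()                                          # step 3
        if np.log(u) <= lp_prop - lp:                              # step 4
            theta, lp = proposal, lp_prop
        draws[t] = theta                                           # step 5
    return draws

# Toy usage: sample a N(0, 1) "posterior" and average after a burn-in (step 6)
draws = metropolis_hastings(lambda th: -0.5 * th @ th, theta0=[3.0])
print(draws[2000:].mean())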
There are some special cases of this method; the most important ones are briefly explained below. In addition, according to [Gelm04], it can be shown that the Gibbs sampler is another special case of the Metropolis-Hastings algorithm, one in which the proposal point is always accepted.

3.5.2 Metropolis sampler

This method is a particular case of the Metropolis-Hastings sampler in which the proposal distribution has to be symmetric. That is,

$q(\theta^*|\theta^t) = q(\theta^t|\theta^*)$   (3.5)

for all $\theta^*$ and $\theta^t$. Then the probability of accepting the new point is

$\alpha(\theta^t, \theta^*) = \min\left(1,\ \dfrac{p(\theta^*|y)}{p(\theta^t|y)}\right)$   (3.6)

The same procedure seen in the Metropolis-Hastings sampler has to be followed.

3.5.3 Random-walk sampler

This special case refers to a proposal distribution of the form

$q(\theta^*|\theta^t) = q(|\theta^t - \theta^*|)$   (3.7)

and the candidate point is $\theta^* = \theta^t + z$, where $z$, called the increment random variable, is drawn from $q$. Then the probability of accepting the new point is

$\alpha(\theta^t, \theta^*) = \min\left(1,\ \dfrac{p(\theta^*|y)}{p(\theta^t|y)}\right)$   (3.8)

The same procedure seen in the Metropolis-Hastings sampler has to be followed.

3.5.4 Independence sampler

The last variation has a proposal distribution such that

$q(\theta^*|\theta^t) = q(\theta^*)$   (3.9)

so it does not depend on $\theta^t$. Then the probability of accepting the new point is
$\alpha(\theta^t, \theta^*) = \min\left(1,\ \dfrac{p(\theta^*|y)\,q(\theta^t)}{p(\theta^t|y)\,q(\theta^*)}\right) = \min\left(1,\ \dfrac{w(\theta^*)}{w(\theta^t)}\right)$   (3.10)

where

$w(\theta) = \dfrac{p(\theta|y)}{q(\theta)}$   (3.11)

It is important to remark that, for this method to work well, the proposal distribution $q$ should be very similar to the posterior distribution $p(\theta|y)$. The same procedure seen in the Metropolis-Hastings sampler has to be followed.

3.6 Importance sampling

Importance sampling is a variance reduction technique that can be used within the Monte Carlo method. The idea behind it is that certain values of the input random variables in a simulation have more impact on the parameter being estimated than others, so instead of taking a simple average, importance sampling takes a weighted average.

Let $q(\theta)$ be a density from which it is easy to obtain random draws $\theta^{(s)}$, $s = 1, \ldots, S$. Then $q(\theta)$ is called the importance function, and the importance sampling estimator can be defined as

$\hat{g}_S = \dfrac{\sum_{s=1}^{S} w(\theta^{(s)})\,g(\theta^{(s)})}{\sum_{s=1}^{S} w(\theta^{(s)})}$, where $w(\theta^{(s)}) = \dfrac{p(\theta = \theta^{(s)}|y)}{q(\theta = \theta^{(s)})}$,

which converges to $E[g(\theta)|y]$ as $S \to \infty$. In fact, $w(\theta^{(s)})$ can be computed as $w(\theta^{(s)}) = p^*(\theta|y)/q^*(\theta|y)$, where the starred densities are merely proportional to the original ones.

For more information and details about Markov chain Monte Carlo methods and their application, the reader is referred to [Chen00], [Gilk95], [Berg05] and [Kend05]. A short numerical sketch of importance sampling closes this chapter.
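In the sketch below, the Normal "posterior" and the heavier-tailed Student-t importance function are illustrative assumptions, chosen so that the weights $w = p/q$ stay well behaved; only the weighted average $\hat{g}_S$ itself is the point being demonstrated.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Illustrative assumption: the "posterior" p(theta|y) is N(1, 1); the
# importance function q is a Student-t centred nearby with heavier tails.
S = 50_000
theta = stats.t.rvs(df=5, loc=0.0, scale=2.0, size=S, random_state=rng)
w = (stats.norm.pdf(theta, loc=1.0, scale=1.0)
     / stats.t.pdf(theta, df=5, loc=0.0, scale=2.0))

g = theta**2                              # function of interest g(theta)
g_hat = np.sum(w * g) / np.sum(w)         # the weighted average g_S
print(f"estimate of E[theta^2|y]: {g_hat:.3f} (exact: 2.000)")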
Chapter 4

Sensitivity Analysis

4.1 Introduction

There will be many occasions on which the researcher, having selected a model, wants to consider the possibility of choosing another model, or simply to compare the selected model with an alternative. A tool is then needed to compare both models and to select one of them. This will also be useful for variable selection in regression models. In this chapter, Bayesian model comparison is briefly discussed, highlighting those methods which will be most useful.

In the Bayesian field, common methods for model comparison are based on the following: separate estimation, comparative estimation and simultaneous estimation.

Comparative estimation is based on distance measures such as the entropy distance, and the underlying idea is that the more parsimonious model may be preferred between two models whose posterior or posterior predictive distributions are sufficiently close.

Simultaneous model estimation lets us compare many models at the same time; the main methods are reversible jump MCMC (RJMCMC) and birth and death MCMC (BDMCMC).

Separate estimation compares two models that are not necessarily nested, and the most used tools are the posterior predictive distributions and the posterior probability of the model. Since the methods of this type are the most widely accepted, we will explain some of them, highlighting the most important ones.
4.2 Bayes Factor

This is probably the dominant method of Bayesian model testing. It is the analogue of likelihood ratio tests within the frequentist framework, and the basic intuition is that prior and posterior information are combined in a ratio that provides evidence in favour of one model specification versus another. Let us suppose we have two models to compare, $M_1$ and $M_2$. Let $p(M_1)$ and $p(M_2)$ be the prior probabilities of models $M_1$ and $M_2$, respectively, and $p(M_1|y)$ and $p(M_2|y)$ their posterior probabilities. Then the Bayes Factor is:

$B(y) = \dfrac{p(y|M_1)}{p(y|M_2)} = \dfrac{p(M_1|y)/p(M_1)}{p(M_2|y)/p(M_2)}$   (4.1)

This means that the Bayes Factor favours the model for which the marginal likelihood of the data, namely $p(y|M_i)$, is larger. Therefore, the value of the factor gives evidence of the preference between two models. According to [Jeff61], the following interpretation is suggested:

Bayes Factor | Interpretation
$B(y) < 1/10$ | Strong evidence for $M_2$
$1/10 < B(y) < 1/3$ | Moderate evidence for $M_2$
$1/3 < B(y) < 1$ | Weak evidence for $M_2$
$1 < B(y) < 3$ | Weak evidence for $M_1$
$3 < B(y) < 10$ | Moderate evidence for $M_1$
$B(y) > 10$ | Strong evidence for $M_1$

Table 4.1: Bayes Factor Interpretation
The marginal likelihood usually involves an integral which can be evaluated analytically only in some special cases. So, while Bayes Factors are rather intuitive, they are often quite difficult or even impossible to calculate in practice. Because of this, there are alternatives to this method.

4.3 Alternative Statistics to the Bayes Factor

Let $\hat{\theta}$ be the mean of the posterior distribution, and let us assume that the Bayes estimate for the parameters $\theta$ is approximately equal to the maximum likelihood estimate. Then the following statistics, some of which are also used in frequentist statistics, can be useful diagnostics:

• The Likelihood Ratio, which will always favour the unrestricted model, where the ratio is:

$\text{Ratio} = -2\left[\log(p(\hat{\theta}_{Restricted}|y)) - \log(p(\hat{\theta}_{Full}|y))\right]$   (4.2)

The ratio is distributed as a $\chi^2_p$, where $p$ is the number of parameters, including the intercept.

• The Akaike Information Criterion (AIC), where a ratio between $AIC_1$ (AIC for $M_1$) and $AIC_2$ (AIC for $M_2$) less than 1 indicates that $M_1$ is better. This method does not require the models to be nested, and it tends to favour more complicated models. The statistic is:

$AIC = -2\log(p(\hat{\theta}|y)) + 2p$   (4.3)

where $p$ is the number of parameters, including the intercept. It tends to perform better than the previous one.

• The Bayesian Information Criterion (BIC), also known as the Schwarz Criterion (SC), Schwarz Information Criterion (SIC) or Schwarz Bayesian Criterion (SBC). As with the AIC, this method can be used for non-nested models. The BIC is:

$BIC = -2\log(p(\hat{\theta}|y)) + p\log(n)$   (4.4)

where $p$ is the number of parameters, including the intercept, and $n$ is the sample size. Given any two estimated models, the model with the lower value of BIC is the one to be preferred. Since this method promotes model parsimony by penalizing increased model complexity (larger $p$) more heavily as the sample size $n$ grows, it may be preferred to the AIC.
• The Deviance Information Criterion (DIC), a newer statistic introduced by the developers of the WinBUGS software, who explain it in detail in [Spie03]. The main and most important difference from the previous methods is that it is not an approximation of the Bayes Factor. It is a hierarchical-modelling generalization of the AIC and the BIC, and it is particularly useful when the posterior distributions have been obtained by simulation. The DIC is:

$DIC = -\dfrac{4}{L}\sum_{l=1}^{L}\log(p(y|\theta^{(l)})) + 2\log(p(y|\hat{\theta}))$   (4.5)

where $\theta^{(l)}$ is the draw obtained by simulating the posterior distribution in iteration $l$. This method also penalizes higher-dimensional models, and it may be preferred to the previous ones, mainly in the linear models context.

4.4 Highest Posterior Density Intervals

All the techniques mentioned above typically require the elicitation of informative priors. However, there may be Bayesians who are interested in model comparison with a non-informative prior. In such a case, other techniques can be used. Since the most common one in regression analysis is the Highest Posterior Density Interval (HPDI), we will explain only this method and refer the interested reader to the citations below.

Before defining the idea of the HPDI, the concept of a credible set must be made clear. Let us assume that $\omega$ is the region over which the coefficients $\beta$ are defined. Then $C \subseteq \omega$ is a $100(1-\alpha)\%$ credible set with respect to $\beta$ if:

$p(\beta \in C|y) = 1 - \alpha$   (4.6)

Since there are commonly numerous credible intervals, it is usual to choose the one with the smallest area, namely the Highest Posterior Density Interval. Formally, a $100(1-\alpha)\%$ highest posterior density interval for $\beta$ is a $100(1-\alpha)\%$ credible interval for $\beta$ with the property that it has a smaller area than any other $100(1-\alpha)\%$ credible interval for $\beta$.
This is the Bayesian analogue of the confidence interval within the frequentist framework, but its meaning is more in line with common sense. More information about all these methods and other variants of the Bayes Factor can be found, in greater detail, in [Aitk97], [Berg98], [Chen00], [Cong06] or [Koop03].

4.5 Model Comparison Summary

A model comparison summary can be found in Tables 4.2 and 4.3, where the mark symbols mean:

• * Good
• ** Better
• *** Still better
• **** Probably the best
Method | Formulae | Interpretation | Mark
Bayes Factor | $B(y) = \dfrac{p(y|M_1)}{p(y|M_2)}$ | $B(y) < 1/10$: Strong evidence for $M_2$; $1/10 < B(y) < 1/3$: Moderate evidence for $M_2$; $1/3 < B(y) < 1$: Weak evidence for $M_2$; $1 < B(y) < 3$: Weak evidence for $M_1$; $3 < B(y) < 10$: Moderate evidence for $M_1$; $B(y) > 10$: Strong evidence for $M_1$ | *
Likelihood Ratio | $\text{Ratio} = -2\left[\log p(\hat{\beta}_{Restricted}|y) - \log p(\hat{\beta}_{Full}|y)\right]$ | $\text{Ratio} > \chi^2_p$: Reject the restricted model; $\text{Ratio} < \chi^2_p$: Do not reject the restricted model | *
AIC | $AIC = -2\log p(\hat{\beta}|y) + 2p$ | $AIC_1/AIC_2 < 1$: $M_1$ is better than $M_2$; $AIC_1/AIC_2 > 1$: $M_2$ is better than $M_1$ | **

Table 4.2: Sensitivity Summary I
Method | Formulae | Interpretation | Mark
BIC | $BIC = -2\log p(\hat{\beta}|y) + p\log(n)$ | $BIC_1/BIC_2 < 1$: $M_1$ is better than $M_2$; $BIC_1/BIC_2 > 1$: $M_2$ is better than $M_1$ | ***
DIC | $DIC = -\dfrac{4}{L}\sum_{l=1}^{L}\log p(y|\beta^{(l)}) + 2\log p(y|\hat{\beta})$ | $DIC_1/DIC_2 < 1$: $M_1$ is better than $M_2$; $DIC_1/DIC_2 > 1$: $M_2$ is better than $M_1$ | ****
HPDI | $p(\beta \in C|y) = 1 - \alpha$, with $C$ the region of smallest area | There is a probability of $100(1-\alpha)\%$ of $\beta$ being in the region $C$ | ****

Table 4.3: Sensitivity Summary II
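As a complement to the tables, the following sketch shows how the criteria of Section 4.3 and the HPDI of Section 4.4 might be computed in practice. It assumes, as in the text, that the Bayes estimate is plugged in like a maximum likelihood estimate, and it takes the HPDI as the shortest interval over sorted posterior draws; it is illustrative, not the project's implementation.

import numpy as np

def gaussian_loglik(y, X, beta, sigma2):
    """Log likelihood of the Normal linear model at the plugged-in estimates."""
    resid = y - X @ beta
    n = y.size
    return -0.5 * n * np.log(2 * np.pi * sigma2) - 0.5 * resid @ resid / sigma2

def aic_bic(loglik, p, n):
    """AIC (4.3) and BIC (4.4) from a maximized (or plug-in) log likelihood."""
    return -2 * loglik + 2 * p, -2 * loglik + p * np.log(n)

def hpdi(draws, alpha=0.05):
    """Shortest 100(1-alpha)% interval from a 1-D array of posterior draws."""
    d = np.sort(draws)
    m = int(np.floor((1 - alpha) * d.size))
    widths = d[m:] - d[:-m]          # widths of all candidate intervals
    lo = np.argmin(widths)           # index of the narrowest one
    return d[lo], d[lo + m]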
Chapter 5

Regression Analysis

5.1 Introduction

Regression analysis is a statistical tool for investigating relationships between variables: it models the relationship between one or more random variables $y$, called the response variables, and one or more independent variables $x$, called the predictors. That is, it allows us to examine the conditional distribution of $y$ given $x$, denoted by $p(y|\beta, x)$, when the $n$ observations $(x_i, y_i)$ are exchangeable.

Applications of regression analysis exist in almost every field. In economics, the dependent variable might be the Ibex 35 index and the independent variables the Dow Jones and FTSE 100 indexes. In political science, the dependent variable might be a state's level of welfare spending, and the independent variables measures of public opinion and institutional variables that would cause the state to have higher or lower levels of welfare spending. In sociology, the dependent variable might be a measure of the social status of various occupations, and the independent variables characteristics of the occupations (pay, qualifications, etc.). In psychology, the dependent variable might be an individual's racial tolerance as measured on a standard scale, with indicators of social background as independent variables. In education, the dependent variable might be a student's score on an achievement test, and the independent variables characteristics of the student's family, teachers, or school.

Before explaining Bayesian regression, the classical regression model will be reviewed, focusing on those parts useful for the former.
5.2 Classical Regression Model

The simplest version of this model is the Normal linear model, where the variable $y$ given $X$ has a Normal distribution whose mean is a linear function of $X$:

$E(y_i|\beta, X) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}$ for all $i = 1, \ldots, n$.   (5.1)

Even though the mean of $y$ is a linear function of $X$, the observed data do not fit it exactly, owing to a random error, namely $\epsilon$; so the appropriate way to reach a probabilistic linear model is through

$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \epsilon_i$ for all $i = 1, \ldots, n$,   (5.2)

where $\epsilon_i$ is the random error term, which has a Normal distribution with mean 0 and variance $\sigma^2$. Since the random variable $y_i$ is the sum of a constant (the mean) and a Normally distributed random variable, $y_i$ follows a Normal distribution:

$y_i \sim N(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip},\ \sigma^2)$ for all $i = 1, \ldots, n$.   (5.3)

When the variance of $y$ given $X, \beta$ is assumed to be constant over all observations, the model is called the ordinary linear regression model. In matrix notation, the Normal linear model can be written as

$Y = X\beta + \epsilon$   (5.4)

and

$Y \sim N(X\beta, \sigma^2 I)$   (5.5)

where $Y = (y_1, \ldots, y_n)'$, $\beta = (\beta_0, \beta_1, \ldots, \beta_p)'$, $\epsilon = (\epsilon_1, \ldots, \epsilon_n)'$, $X$ is the $n \times (p+1)$ matrix whose $i$-th row is $(1, x_{i1}, \ldots, x_{ip})$, and $I$ is the identity matrix.

It can be shown that the ordinary least squares estimate of $\beta$, namely $\hat{\beta}$, is
$\hat{\beta} = (X'X)^{-1}X'Y = (\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_p)'$   (5.6)

where

$X'X = \begin{pmatrix} n & \sum_i x_{i1} & \cdots & \sum_i x_{ip} \\ \sum_i x_{i1} & \sum_i x_{i1}^2 & \cdots & \sum_i x_{i1}x_{ip} \\ \vdots & \vdots & \ddots & \vdots \\ \sum_i x_{ip} & \sum_i x_{ip}x_{i1} & \cdots & \sum_i x_{ip}^2 \end{pmatrix}, \qquad X'Y = \begin{pmatrix} \sum_i y_i \\ \sum_i x_{i1}y_i \\ \vdots \\ \sum_i x_{ip}y_i \end{pmatrix}.$

As well, it can be shown that

$E(\hat{\beta}) = \beta$   (5.7)

Furthermore, the variances of $\hat{\beta}$ are proportional to the elements of the matrix $(X'X)^{-1}$, denoted by $C$, which multiplied by the constant $\sigma^2$ gives the covariance matrix. The elements of the diagonal of that matrix are the variances

$Var(\hat{\beta}_j) = \sigma^2 C_{jj}$ for all $j = 0, 1, \ldots, p$,   (5.8)

where $C = (X'X)^{-1}$.

Likewise, the classical estimate of $\sigma^2$ is given in terms of the sum of squared errors, $SSE = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$, and equals the mean squared error:

$\hat{\sigma}^2 = MSE = \dfrac{SSE}{n-p} = \dfrac{(Y - X\hat{\beta})'(Y - X\hat{\beta})}{n-p} = \dfrac{Y'Y - \hat{\beta}'X'Y}{n-p}$   (5.9)

where $n$ is the number of observations and $p$ corresponds to the number of parameters $\beta$.

Regarding the individual regression coefficients $\beta_j$, it will sometimes be interesting to test hypotheses about them in order to evaluate the potential value of each regressor variable in the model. The statistic to use in these cases is

$T_0 = \dfrac{\hat{\beta}_j}{\sqrt{\hat{\sigma}^2 C_{jj}}}$   (5.10)
where $C_{jj}$ is the diagonal element of the matrix $(X'X)^{-1}$ corresponding to $\hat{\beta}_j$. The null hypothesis will be rejected if $|T_0| > t_{n-p,\alpha/2}$.

Finally, once the model has been estimated and validated, one of its more important applications consists of making new predictions about the response variable $Y$ when a new explanatory vector $X^*$ is observed. In this case, a point estimate would be

$\hat{Y}^* = X^*\hat{\beta}$   (5.11)

and a confidence interval for this future observation will be

$\hat{Y}^* \pm t_{n-p,\alpha/2}\,\sqrt{\hat{\sigma}^2\left(1 + X^*(X'X)^{-1}X^{*\prime}\right)}$   (5.12)

where

$X^* = [x_1^*\ x_2^*\ \ldots\ x_k^*]$   (5.13)

These results can be found in more detail in [Mont02], [Zamo01] or [Maté95].

To understand better all that has been said above, let us see a practical application to the stock markets. Suppose we are interested in investigating the relationship between the Ibex 35 index and the previous day's Dow Jones, FTSE 100 and DAX indexes. For this purpose, we have the points (taken as the mean of the daily maximum and minimum points) from January to October 2006, that is, the first ten months of 2006. The model to fit is:

$IBEX35_t = \beta_1\,DowJones_{t-1} + \beta_2\,FTSE100_{t-1} + \beta_3\,DAX_{t-1} + \epsilon_t$

where $\epsilon_t \sim N(0, \sigma^2)$.

The estimates $\hat{\beta}$, calculated as described above, are:

$\hat{\beta}_1 = 1.0147$, $\quad \hat{\beta}_2 = -2.0085$, $\quad \hat{\beta}_3 = 2.1082$.
The estimate of the variance $\sigma^2$ is $\hat{\sigma}^2 = 332.18^2$. So the fitted model is:

$IBEX35_t = 1.0147 \times DowJones_{t-1} - 2.0085 \times FTSE100_{t-1} + 2.1082 \times DAX_{t-1} + \epsilon_t$

where $\epsilon_t \sim N(0, 332.18^2)$.

This indicates that when the Dow Jones or the DAX goes up, the Ibex 35 will increase the next day too; however, when the FTSE 100 rises, the Ibex 35 will decrease the next day. If we use this model to predict the value of the Ibex 35 on November 1st, when the previous day's Dow Jones, FTSE 100 and DAX values are known, we have:

$IBEX35_t = 1.0147 \times 12067 - 2.0085 \times 6155 + 2.1082 \times 6287 = 13137$
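The computations of this example reduce to a few lines of matrix algebra. The sketch below implements (5.6)-(5.11); the index data used in the project are not reproduced here, so any numbers fed to it are purely illustrative.

import numpy as np

def ols(y, X):
    """Ordinary least squares: estimates (5.6), error variance (5.9)
    and t statistics (5.10). X is assumed to already contain any
    intercept column the model requires."""
    n, p = X.shape
    C = np.linalg.inv(X.T @ X)              # (X'X)^{-1}
    beta_hat = C @ X.T @ y                  # (5.6)
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - p)    # MSE, (5.9)
    se = np.sqrt(sigma2_hat * np.diag(C))
    t_stats = beta_hat / se                 # (5.10)
    return beta_hat, sigma2_hat, t_stats

# Point prediction for a new row x_star, as in (5.11):
#   y_star = x_star @ beta_hat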
Finally, a comparison between the multiple and the simple Normal linear regression models is shown in Table 5.1, indicating the different parameters to use in each case. The goal of this comparison is to make clear that simple Normal regression is a particular case of multiple Normal regression in which there is only one regressor variable or predictor.

Quantity | Multiple Normal Linear Regression | Simple Normal Linear Regression
Function | $y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \epsilon_i$ | $y = \beta_0 + \beta_1 x + \epsilon$
Mean | $\mu_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}$ | $\mu = \beta_0 + \beta_1 x$
Variance | $\sigma^2$ | $\sigma^2$
Model | $Y \sim N(\mu, \sigma^2 I)$ | $Y \sim N(\mu, \sigma^2)$
$\hat{\beta}$ | $\hat{\beta} = (X'X)^{-1}X'Y$ | $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$, $\hat{\beta}_1 = \dfrac{\sum_i (x_i-\bar{x})(y_i-\bar{y})}{\sum_i (x_i-\bar{x})^2}$
$E[\hat{\beta}]$ | $\beta$ | $\beta$
$Var(\hat{\beta})$ | $Var(\hat{\beta}_j) = \sigma^2 C_{jj}$ | $Var(\hat{\beta}_0) = \sigma^2\left(\dfrac{1}{n} + \dfrac{\bar{x}^2}{\sum_i (x_i-\bar{x})^2}\right)$, $Var(\hat{\beta}_1) = \dfrac{\sigma^2}{\sum_i (x_i-\bar{x})^2}$
$\hat{\sigma}^2$ | $\dfrac{Y'Y - \hat{\beta}'X'Y}{n-p}$ | $\dfrac{\sum_i (y_i - \hat{y}_i)^2}{n-2}$
Prediction | $\hat{Y}_f \pm t_{n-p,\alpha/2}\sqrt{\hat{\sigma}^2\left(1 + X_f(X'X)^{-1}X_f'\right)}$ | $\hat{Y}_f \pm t_{n-2,\alpha/2}\sqrt{\hat{\sigma}^2\left(1 + \dfrac{1}{n} + \dfrac{(x_f-\bar{x})^2}{\sum_i (x_i-\bar{x})^2}\right)}$
Limitation | Only applies to data in the same range as the sampled data | Only applies to data in the same range as the sampled data

Table 5.1: Multiple and Simple Regression Comparison

5.3 The Bayesian Approach

The main difference between the classical and the Bayesian approach to regression analysis is that the latter treats the parameters as random variables which have a distribution. The aim of the Bayesian approach is to make inferences through the posterior distribution, based on a prior distribution for the parameters $\beta$ and $\sigma^2$ of the Normal linear model, and to provide a predictive distribution for the model's predictions.

As was said in the preceding section, and according to [Rossi06], the Normal linear regression model is given by:

$Y = X\beta + \epsilon$   (5.14)

where
$\epsilon \sim N(0, \sigma^2 I)$   (5.15)

so

$Y|X, \beta, \sigma^2 \sim N(X\beta, \sigma^2 I)$   (5.16)

For simplicity of notation, we will not explicitly include $X$ in the conditioning set of the regression model. Using the definition of the multivariate Normal density, the likelihood function is obtained:

$p(Y|\beta, \sigma^2) = \dfrac{(\sigma^2)^{-n/2}}{(2\pi)^{n/2}}\exp\left(\dfrac{-1}{2\sigma^2}(Y - X\beta)'(Y - X\beta)\right)$   (5.17)

It will be convenient to write

$(Y - X\beta)'(Y - X\beta)$   (5.18)

in terms of the ordinary least squares quantities

$v = n - p$   (5.19)

$\hat{\beta} = (X'X)^{-1}X'Y$   (5.20)

$s^2 = \dfrac{(Y - X\hat{\beta})'(Y - X\hat{\beta})}{n - p}$   (5.21)

so that

$(Y - X\beta)'(Y - X\beta) = vs^2 + (\beta - \hat{\beta})'X'X(\beta - \hat{\beta})$   (5.22)

Then

$p(Y|\beta, \sigma^2) = \dfrac{1}{(2\pi)^{n/2}}\left[\sigma^{-p}\exp\left(\dfrac{-1}{2\sigma^2}(\beta - \hat{\beta})'(X'X)(\beta - \hat{\beta})\right)\right]\left[(\sigma^2)^{-v/2}\exp\left(\dfrac{-vs^2}{2\sigma^2}\right)\right]$   (5.23)

As was said before, $n$ corresponds to the number of observations and $p$ refers to the number of parameters $\beta$. This form of expressing the likelihood function is more useful for finding a natural conjugate prior distribution, which has the same form as the likelihood itself.
The prior distribution for $\beta$ and $\sigma^2$, denoted by $p(\beta, \sigma^2)$, can be written in a more convenient way by applying the definition of the joint distribution:

$p(\beta, \sigma^2) = p(\beta|\sigma^2)p(\sigma^2)$   (5.24)

Note that $\beta$ and $\sigma^2$ are here supposed to be dependent a priori, an assumption that will rarely hold in practice. Some authors prefer to work with the error precision, $1/\sigma^2$, instead of the variance $\sigma^2$. All this is very similar to what was explained in the Bayesian analysis of the Normal distribution.

The first bracketed term in the likelihood function suggests the form of a Normal distribution for the parameter $\beta$ given $\sigma^2$. So

$p(\beta|\sigma^2) \propto (\sigma^2)^{-p/2}\exp\left(\dfrac{-1}{2\sigma^2}(\beta - \beta_0)'V_0^{-1}(\beta - \beta_0)\right)$   (5.25)

and, hence,

$\beta|\sigma^2 \sim N(\beta_0, \sigma^2 V_0)$   (5.26)

According to [Rossi06], the second bracketed term in the likelihood function suggests the form of an inverse gamma distribution for the parameter $\sigma^2$ (see Appendix A). So

$p(\sigma^2) \propto (\sigma^2)^{-(v_0/2 + 1)}\exp\left(\dfrac{-v_0 s_0^2}{2\sigma^2}\right)$   (5.27)

and, hence,

$\sigma^2 \sim Inv\text{-}G\left(\dfrac{v_0}{2}, \dfrac{v_0 s_0^2}{2}\right)$   (5.28)

Note that there is an extra term $(\sigma^2)^{-1}$ here which is not suggested by the form of the likelihood explained above. This term can be rationalized by viewing the conjugate prior as arising from the posterior of a sample of size $v_0$ with sufficient statistics $s_0^2, \beta_0$, formed with the noninformative prior $p(\beta, \sigma^2) \propto \sigma^{-2}$, which will be briefly explained later.

So the natural conjugate prior distribution of the parameters $\beta$ and $\sigma^2$ is:

$p(\beta, \sigma^2) \propto (\sigma^2)^{-\left(\frac{p+v_0}{2} + 1\right)}\exp\left(\dfrac{-1}{2\sigma^2}\left[v_0 s_0^2 + (\beta - \beta_0)'V_0^{-1}(\beta - \beta_0)\right]\right)$   (5.29)

and, hence,
$\beta, \sigma^2 \sim N\text{-}Inv\text{-}\chi^2(\beta_0, V_0 s_0^2;\ v_0, s_0^2)$   (5.30)

where the prior hyper-parameters $\beta_0, V_0, v_0$ and $s_0^2$ express the knowledge that the researcher has about the problem and her or his confidence in it. In particular, $\beta_0$ measures the marginal effect of the explanatory variables on the dependent variable; $V_0$ indicates the uncertainty about the prior information and plays the same role as $(X'X)^{-1}$ does in the classical approach; $v_0$ represents a fictitious sample size, so it plays a role similar to $n$; and $s_0^2$ is an imaginary $s^2$ for those fictitious data. In terms of the distribution, $\beta_0$ and $V_0\sigma^2$ represent the location and scale of $\beta$, respectively, and $v_0$ and $s_0^2$ the degrees of freedom and scale of $\sigma^2$, respectively.

Since a conjugate prior distribution has been used, the posterior distribution will have the same form. That is, the posterior distribution will be a Normal-Scaled Inverse $\chi^2$ with posterior hyper-parameters $\beta_1, V_1, v_1$ and $s_1^2$. According to [Rossi06] and [Koop03], it can be shown that

$\beta, \sigma^2|y \sim N\text{-}Inv\text{-}\chi^2(\beta_1, V_1 s_1^2;\ v_1, s_1^2)$   (5.31)

The relation between the prior and the posterior hyper-parameters, according to [Koop03], is:

$V_1 = (V_0^{-1} + X'X)^{-1}$   (5.32)

$\beta_1 = V_1(V_0^{-1}\beta_0 + X'X\hat{\beta})$   (5.33)

$v_1 = v_0 + n$   (5.34)

$v_1 s_1^2 = v_0 s_0^2 + vs^2 + (\hat{\beta} - \beta_0)'\left[V_0 + (X'X)^{-1}\right]^{-1}(\hat{\beta} - \beta_0)$   (5.35)

As was mentioned in the Bayesian Data Analysis chapter, a measure is needed to summarize the posterior distribution, and this is usually the posterior mean, namely $E(\beta|y)$. According to what was said in previous chapters, the marginal for $\beta$ is a multivariate t-distribution (see Appendix A):

$\beta|y \sim t_{v_1}(\beta_1, s_1^2 V_1)$   (5.36)

where

$E(\beta|y) = \beta_1 = V_1(V_0^{-1}\beta_0 + X'X\hat{\beta})$   (5.37)

and
$Var(\beta|y) = \dfrac{v_1 s_1^2}{v_1 - 2}V_1$   (5.38)

So the posterior mean is a weighted average of the ordinary least squares estimate, $\hat{\beta}$, and the prior mean, $\beta_0$, where the weights are proportional to the observed data information, $X'X$, and the importance given to the prior, $V_0^{-1}$, respectively. This should make clear that as the prior variance for $\beta$ is decreased, greater posterior weight is placed on prior beliefs relative to the data, so the posterior mean moves closer to the prior mean.

The elements of the diagonal of the matrix $\frac{v_1 s_1^2}{v_1 - 2}V_1$ are the variances of $\beta_0, \beta_1, \ldots, \beta_p$:

$Var(\beta_j|y) = \dfrac{v_1 s_1^2}{v_1 - 2}V_{1,jj}$ for all $j = 0, 1, \ldots, p$   (5.39)

Likewise, the marginal posterior for $\sigma^2$ is:

$\sigma^2|y \sim Inv\text{-}\chi^2(v_1, s_1^2)$   (5.40)

and, hence,

$E(\sigma^2|y) = \dfrac{v_1 s_1^2}{v_1 - 2}$   (5.41)

$Var(\sigma^2|y) = \dfrac{2v_1^2 s_1^4}{(v_1 - 2)^2(v_1 - 4)}$   (5.42)

So, as we increase the number of fictitious data $v_0$, $v_1$ tends towards $v_0$ and, hence, $\sigma^2$ gets closer to $s_0^2$.

Tables 5.2 and 5.3 show how the different posterior parameters of interest vary depending on the prior parameters $V_0$ (considering $V_0$ as $cI_k$) and $v_0$ and the sample size $n$. Table 5.2 means that if the size of the sample increases towards infinity, the prior information given by the researcher has very little or almost no importance, as also occurs if the precision of the prior distribution for $\beta$ decreases towards 0 (that is, $V_0$ increases). The difference between the two cases is that in the former the variance of $\beta$ is lower than in the latter. The number of fictitious data does not seem to affect the posterior mean, but it does affect the posterior variance, increasing it (resp. decreasing it) as the fictitious data increase (resp. decrease).
Action | $E[\beta|y]$ | $Var[\beta|y]$
$n$ increases | Closer to OLS estimates | Closer to 0
$n$ decreases | Closer to $\beta_0$ | Further from 0
$V_0$ increases | Closer to OLS estimates | Further from 0
$V_0$ decreases | Closer to $\beta_0$ | Closer to 0
$\nu_0$ increases | Not affected | Increases
$\nu_0$ decreases | Not affected | Decreases

Table 5.2: Sensitivity analysis of parameter $\beta$

Table 5.3 refers to the parameter $\sigma^2$, and it means that if the fictitious data increase, the information given by the researcher will have much more weight in the posterior mean of $\sigma^2$ than the real data have, and the variance will be lower too. The opposite occurs when the number of real data increases: then the data information will have the most important weight and the prior information will hardly have any value. Another interesting result is that, as the precision of the prior distribution for $\beta$ decreases (that is, $V_0$ increases), the posterior mean of $\sigma^2$ approaches $vs^2$, the sum of squared ordinary least squares residuals.
Action | $E[\sigma^2|y]$ | $Var[\sigma^2|y]$
$n$ increases | Closer to OLS estimates | Closer to 0
$n$ decreases | Closer to $s_0^2$ | Further from 0
$V_0$ increases | Closer to $vs^2$ | Closer to 0
$V_0$ decreases | Closer to OLS estimates | Further from 0
$\nu_0$ increases | Closer to $s_0^2$ | Closer to 0
$\nu_0$ decreases | Closer to OLS estimates | Further from 0

Table 5.3: Sensitivity analysis of parameter $\sigma^2$

Turning to a different issue, the fact that the natural conjugate prior implies that prior information enters in the same manner as data information helps with prior elicitation. When several priors can be applied to the same problem, two strategies can be adopted to forestall possible criticisms. First, a prior sensitivity analysis can be carried out to demonstrate that the results are the same under the different priors chosen. But if the results are sensitive to the choice of prior, the Bayesian approach allows for the scientifically honest reporting of such a state of affairs. There has been work on extreme bounds analysis for quantities such as the posterior mean of a parameter; [Poir95] provides a detailed discussion of this issue. A second strategy is to use a non-informative prior to let the data speak loudly and predominate over the prior information. For example, set $v_0 = 0$ and $V_0^{-1} = 0$. Then

$\beta, \sigma^2|y \sim N\text{-}Inv\text{-}\chi^2(\beta_1, V_1 s_1^2;\ v_1, s_1^2)$   (5.43)

where

$V_1 = (X'X)^{-1}$   (5.44)

$\beta_1 = \hat{\beta}$   (5.45)

$v_1 = n$   (5.46)

$v_1 s_1^2 = vs^2$   (5.47)

With this non-informative prior, all of these formulas involve only data information and equal the ordinary least squares results. Bayesians often write this prior as:
$p(\beta, \sigma^2) \propto \sigma^{-2}$   (5.48)

Finally, one of the goals of the Bayesian approach is to provide a predictive model for an unobserved data point generated from the same model as the data set with $n$ observations (errors $N(0, \sigma^2)$ with the same $\beta$). This is denoted by:

$Y^* = X^*\beta + \epsilon^*$   (5.49)

where $Y^*$ is not observed and $\epsilon^*$ is independent of $\epsilon$. Bayesian prediction is based on calculating

$p(y^*|y) = \int p(y^*|y, \beta, \sigma^2)\,p(\beta, \sigma^2|y)\,d\beta\,d\sigma^2$   (5.50)

The key to obtaining the prediction is to find the form of $p(y^*|y, \beta, \sigma^2)$, since the posterior $p(\beta, \sigma^2|y)$ has already been calculated, and to check whether $p(y^*|y)$ is easy to integrate or whether, on the contrary, a posterior simulator has to be employed. Since $\epsilon^*$ is independent of $\epsilon$, $Y^*$ is independent of $Y$, and $p(y^*|y, \beta, \sigma^2)$ can be written as $p(y^*|\beta, \sigma^2)$, which is multivariate Normal, as seen before:

$p(y^*|\beta, \sigma^2) = \dfrac{(\sigma^2)^{-T/2}}{(2\pi)^{T/2}}\exp\left(-\dfrac{1}{2\sigma^2}(y^* - X^*\beta)'(y^* - X^*\beta)\right)$   (5.51)

Multiplying this by the posterior obtained previously and integrating yields a multivariate t:

$y^*|y \sim t_{v_1}\left(X^*\beta_1,\ s_1^2(I_T + X^*V_1X^{*\prime})\right)$   (5.52)

where $T$ is the number of observed $X^*$. It is easy to see that:

$E(y^*|y) = X^*\beta_1, \qquad Var(y^*|y) = s_1^2(I_T + X^*V_1X^{*\prime})$   (5.53)
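The conjugate updating formulas (5.32)-(5.35) and the predictive moments above are straightforward to code. The following is a sketch, not the project's implementation; the variable names mirror the hyper-parameters of the text.

import numpy as np

def conjugate_update(y, X, beta0, V0, v0, s0_sq):
    """Natural conjugate update for the Normal linear model with a
    N-Inv-chi^2 prior, following (5.32)-(5.35)."""
    n, p = X.shape
    XtX = X.T @ X
    V0_inv = np.linalg.inv(V0)
    beta_hat = np.linalg.solve(XtX, X.T @ y)
    v = n - p
    s_sq = (y - X @ beta_hat) @ (y - X @ beta_hat) / v

    V1 = np.linalg.inv(V0_inv + XtX)                        # (5.32)
    beta1 = V1 @ (V0_inv @ beta0 + XtX @ beta_hat)          # (5.33)
    v1 = v0 + n                                             # (5.34)
    diff = beta_hat - beta0
    M = np.linalg.inv(V0 + np.linalg.inv(XtX))
    s1_sq = (v0 * s0_sq + v * s_sq + diff @ M @ diff) / v1  # (5.35)
    return beta1, V1, v1, s1_sq

# Posterior summaries then follow (5.37)-(5.38):
#   E[beta|y] = beta1,  Var[beta|y] = v1 * s1_sq / (v1 - 2) * V1
# and the predictive mean for a new X* is X_star @ beta1, as in (5.53).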
A brief summary comparing the classical and the Bayesian approaches is displayed in Table 5.4 to note the coincidences and differences between them.

Classical Regression | Bayesian Regression
$\hat{\beta} = (X'X)^{-1}X'Y$ | $\beta_1 = V_1(V_0^{-1}\beta_0 + X'X\hat{\beta})$
$\hat{\sigma}^2 = \dfrac{Y'Y - \hat{\beta}'X'Y}{n-p}$ | $s_1^2 = \dfrac{\nu_0 s_0^2 + \nu s^2 + (\hat{\beta}-\beta_0)'[V_0 + (X'X)^{-1}]^{-1}(\hat{\beta}-\beta_0)}{\nu_1}$
$E[\hat{\beta}] = \beta$ | $E[\beta|y] = \beta_1$
$Var(\hat{\beta}_j) = \sigma^2 C_{jj}$ | $Var(\beta_j|y) = \dfrac{\nu_1 s_1^2}{\nu_1 - 2}V_{1,jj}$
$Y^*|y \sim t_{n-p}(X^*\hat{\beta},\ \hat{\sigma}^2 I_T)$ | $Y^*|y \sim t_{\nu_1}(X^*\beta_1,\ s_1^2(I_T + X^*V_1X^{*\prime}))$

Table 5.4: Classical and Bayesian regression comparison

A very interesting and more exhaustive comparison between these two approaches can be read in the article by [Urba92], where the advantages and disadvantages of each are explained.

5.4 Normal Linear Regression Model subject to inequality constraints

In this section, suppose we want to impose inequality constraints on the coefficients of the Normal linear regression model, such as $\beta \in A$, where $A$ is the region of all valid values of the coefficients. This is quite simple in Bayesian regression, since the constraints are imposed through the prior distribution:

$p(\beta, \sigma^2) \sim N\text{-}Inv\text{-}\chi^2(\beta_0, V_0 s_0^2;\ v_0, s_0^2)\,1(\beta \in A)$   (5.54)

where $\beta_0, V_0, v_0$ and $s_0^2$ are prior hyper-parameters to be chosen and $1(\beta \in A)$ is the indicator function, which equals 1 if $\beta \in A$ and 0 otherwise. Likewise, the posterior distribution for $\beta$ is now:
$p(\beta|y) \propto t_{v_1}(\beta_1, s_1^2 V_1)\,1(\beta \in A)$   (5.55)

where $\beta_1, V_1, v_1$ and $s_1^2$ were defined previously.

So the only difference introduced by the inequality constraints is that we must now add the indicator function. This may look very easy, but for a general choice of $A$ neither analytical posterior results nor Gibbs sampling work. The most suitable method is importance sampling, which has already been explained. In this case, according to [Koop03], the importance function is:

$q(\beta) = t_{v_1}(\beta_1, s_1^2 V_1)$   (5.56)

The strategy consists of obtaining draws $y^{*(s)}$ from $p(y^*|\beta^{(s)}, \sigma^{2(s)})$ using the draws $\beta^{(s)}$ and $\sigma^{2(s)}$ obtained from the posterior distribution; then, using these draws $y^{*(s)}$ in the importance sampling, the mean and the variance can be calculated.

Another, simpler way consists of ignoring the constraints until the end of the simulation and then discarding those draws which violate the restrictions, as sketched below. According to [Gelm04], this works reasonably well as long as the constraints do not eliminate a large portion of the draws.
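A sketch of this discard strategy follows: it draws from the unconstrained multivariate-t posterior (5.55) and keeps only the draws falling in $A$. The constraint test is supplied by the caller; the nonnegativity example in the final comment anticipates the interval-valued application of Chapter 6.

import numpy as np

def constrained_beta_draws(beta1, V1, v1, s1_sq, constraint,
                           n_draws=20_000, seed=0):
    """Sample the unconstrained multivariate-t posterior of beta and keep
    only the draws satisfying the constraint ('ignore, then discard').
    A t_v(mu, Sigma) draw is built as mu + z / sqrt(chi2_v / v)."""
    rng = np.random.default_rng(seed)
    beta1 = np.asarray(beta1, dtype=float)
    p = beta1.size
    z = rng.multivariate_normal(np.zeros(p), s1_sq * np.asarray(V1),
                                size=n_draws)
    chi = rng.chisquare(v1, size=n_draws)
    draws = beta1 + z / np.sqrt(chi / v1)[:, None]
    kept = draws[np.apply_along_axis(constraint, 1, draws)]
    return kept          # average these draws for posterior summaries

# e.g. kept = constrained_beta_draws(b1, V1, v1, s2, lambda b: np.all(b >= 0))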
5.5 Normal Linear Regression Model with Independent Parameters

Now suppose that the parameters $\beta$ and $\sigma^2$ are independent, so

$p(\beta, \sigma^2) = p(\beta)p(\sigma^2)$   (5.57)

With the same likelihood function as in the previous section, this assumption implies that $\beta$ follows a multivariate Normal distribution with mean $\beta_0$, as in the dependent case, but with variance $V_0$, and that $\sigma^2$ has exactly the same Scaled-Inv-$\chi^2$ distribution used previously. That is:

$\beta \sim N(\beta_0, V_0), \qquad \sigma^2 \sim Inv\text{-}\chi^2(v_0, s_0^2)$   (5.58)

The prior joint distribution is

$p(\beta, \sigma^2) \propto \exp\left(-\dfrac{1}{2}(\beta - \beta_0)'V_0^{-1}(\beta - \beta_0)\right)(\sigma^2)^{-(v_0/2+1)}\exp\left(-\dfrac{v_0 s_0^2}{2\sigma^2}\right)$   (5.59)

$\beta, \sigma^2 \sim N\text{-}Inv\text{-}\chi^2(\beta_0, V_0;\ v_0, s_0^2)$   (5.60)

As the posterior joint distribution is proportional to the prior times the likelihood:

$p(\beta, \sigma^2|Y) \propto \exp\left(-\dfrac{1}{2}\left[\dfrac{(Y - X\beta)'(Y - X\beta)}{\sigma^2} + (\beta - \beta_0)'V_0^{-1}(\beta - \beta_0)\right]\right)(\sigma^2)^{-\left(\frac{n+v_0}{2}+1\right)}\exp\left(-\dfrac{v_0 s_0^2}{2\sigma^2}\right)$   (5.61)

Since this function does not take the form of any well-known density, it is useful to find the conditional distributions for $\beta$, $p(\beta|Y, \sigma^2)$, and for $\sigma^2$, $p(\sigma^2|Y, \beta)$, because with them any information from $p(\beta, \sigma^2|Y)$ can be obtained through posterior simulation with the Gibbs sampler already explained in previous chapters. According to [Koop03], those conditional distributions are:

$p(\beta|Y, \sigma^2) \propto \exp\left(-\dfrac{1}{2}(\beta - \beta_1)'V_1^{-1}(\beta - \beta_1)\right)$   (5.62)

$p(\sigma^2|Y, \beta) \propto (\sigma^2)^{-\left(\frac{n+v_0}{2}+1\right)}\exp\left(-\dfrac{1}{2\sigma^2}\left[(Y - X\beta)'(Y - X\beta) + v_0 s_0^2\right]\right)$   (5.63)

And this all yields:

$\beta|y, \sigma^2 \sim N(\beta_1, V_1)$   (5.64)

$\sigma^2|y, \beta \sim Inv\text{-}\chi^2(v_1, s_1^2)$   (5.65)

where

$V_1 = \left(V_0^{-1} + \dfrac{1}{\sigma^2}X'X\right)^{-1}$   (5.66)

$\beta_1 = V_1\left(V_0^{-1}\beta_0 + \dfrac{1}{\sigma^2}X'Y\right)$   (5.67)

$v_1 = n + v_0$   (5.68)

$s_1^2 = \dfrac{(Y - X\beta)'(Y - X\beta) + v_0 s_0^2}{v_1}$   (5.69)
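These two conditionals are exactly what the Gibbs sampler of Chapter 3 needs. The following sketch alternates (5.64) and (5.65); it is an illustrative implementation of the scheme, with the scaled inverse-$\chi^2$ draw obtained as $v_1 s_1^2 / \chi^2_{v_1}$.

import numpy as np

def gibbs_independent_prior(y, X, beta0, V0, v0, s0_sq,
                            n_iter=10_000, burn_in=2_000, seed=0):
    """Gibbs sampler for the Normal linear model with the independent
    N(beta0, V0) x Inv-chi^2(v0, s0^2) prior of Section 5.5."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    V0_inv = np.linalg.inv(V0)
    XtX, Xty = X.T @ X, X.T @ y
    v1 = n + v0                                          # (5.68)
    sigma2 = s0_sq                                       # starting value
    betas, sigma2s = [], []
    for _ in range(n_iter):
        # beta | y, sigma^2 ~ N(beta1, V1), via (5.66)-(5.67)
        V1 = np.linalg.inv(V0_inv + XtX / sigma2)
        beta1 = V1 @ (V0_inv @ beta0 + Xty / sigma2)
        beta = rng.multivariate_normal(beta1, V1)
        # sigma^2 | y, beta ~ Inv-chi^2(v1, s1^2), via (5.69)
        resid = y - X @ beta
        s1_sq = (resid @ resid + v0 * s0_sq) / v1
        sigma2 = v1 * s1_sq / rng.chisquare(v1)
        betas.append(beta); sigma2s.append(sigma2)
    betas = np.array(betas)[burn_in:]
    sigma2s = np.array(sigma2s)[burn_in:]
    return betas, sigma2s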
The fact that the posterior distribution has an unknown form also affects the prediction for $y^*$, $p(y^*|y)$. As was already said about the posterior predictive in the Bayesian approach chapter, the interest lies in $p(y^*|y, \beta, \sigma^2)$. Since $y$ and $y^*$ are independent of one another,

$p(y^*|y, \beta, \sigma^2) = p(y^*|\beta, \sigma^2)$   (5.70)

and hence

$p(y^*|\beta, \sigma^2) = \dfrac{(\sigma^2)^{-T/2}}{(2\pi)^{T/2}}\exp\left(-\dfrac{1}{2\sigma^2}(y^* - X^*\beta)'(y^* - X^*\beta)\right)$   (5.71)

As the analytical solution of the required integral is not trivial, the importance of the Gibbs sampler arises again and, combining it with Monte Carlo integration, any posterior and predictive inference can be done. The strategy consists of obtaining draws $y^{*(s)}$ from $p(y^*|\beta^{(s)}, \sigma^{2(s)})$ using the draws $\beta^{(s)}, \sigma^{2(s)}$ obtained from the posterior distribution. Then, using these draws $y^{*(s)}$ in the Monte Carlo integration, the mean and the variance can be calculated.

5.6 Normal Linear Regression Model with Heteroscedasticity and Correlation

Until now the error variances have been supposed equal and uncorrelated, but this is not very realistic. In this section we are going to relax that assumption and consider the model:

$Y = X\beta + \epsilon$   (5.72)

where

$\epsilon \sim N(0, \sigma^2\Sigma)$   (5.73)

That is, we are considering heteroscedasticity and correlation. According to [Koop03], since $\Sigma$ is a positive definite matrix, a matrix $P$ can be found that satisfies $P\Sigma P' = I$, and it can be shown that

$Y^* = X^*\beta + \epsilon^*$   (5.74)

where

$\epsilon^* \sim N(0, \sigma^2 I)$   (5.75)
and

$Y^* = PY$   (5.76)

$X^* = PX$   (5.77)

$\epsilon^* = P\epsilon$   (5.78)

Then the likelihood function to consider now is:

$p(Y|\beta, \sigma^2, \Sigma) = \dfrac{1}{(2\pi)^{n/2}}\left[\sigma^{-p}\exp\left(-\dfrac{1}{2\sigma^2}(\beta - \hat{\beta}(\Sigma))'X'\Sigma^{-1}X(\beta - \hat{\beta}(\Sigma))\right)\right]\left[(\sigma^2)^{-v/2}\exp\left(\dfrac{-v s^2(\Sigma)}{2\sigma^2}\right)\right]$   (5.79)

where:

$v = n - p$   (5.80)

$\hat{\beta}(\Sigma) = (X^{*\prime}X^*)^{-1}X^{*\prime}Y^*$   (5.81)

$s^2(\Sigma) = \dfrac{(Y^* - X^*\hat{\beta}(\Sigma))'(Y^* - X^*\hat{\beta}(\Sigma))}{v}$   (5.82)

which is very similar to the likelihood used with equal variances. Using the prior distributions described in the previous section, we have:

$p(\beta, \sigma^2, \Sigma) = p(\beta)p(\sigma^2)p(\Sigma)$   (5.83)

where $\beta$ is Normally distributed with prior parameters $\beta_0, V_0$, and $\sigma^2$ is a scaled inverse Chi-square with parameters $v_0$ and $s_0^2$. Hence, knowing that the posterior distribution is proportional to the prior times the likelihood:

$p(\beta, \sigma^2, \Sigma|Y) \propto p(\Sigma)\exp\left(-\dfrac{1}{2}\left[\dfrac{(Y^* - X^*\beta)'(Y^* - X^*\beta)}{\sigma^2} + (\beta - \beta_0)'V_0^{-1}(\beta - \beta_0)\right]\right)(\sigma^2)^{-\left(\frac{n+v_0}{2}+1\right)}\exp\left(-\dfrac{v_0 s_0^2}{2\sigma^2}\right)$   (5.84)
This suggests a Normal distribution for the posterior conditional of $\beta$ and a scaled inverse Chi-square for the posterior conditional of $\sigma^2$, as occurred before. Therefore:

$\beta|Y, \sigma^2, \Sigma \sim N(\beta_1, V_1)$   (5.85)

$\sigma^2|Y, \beta, \Sigma \sim Inv\text{-}\chi^2(v_1, s_1^2)$   (5.86)

where

$V_1 = \left(V_0^{-1} + \dfrac{X'\Sigma^{-1}X}{\sigma^2}\right)^{-1}$   (5.87)

$\beta_1 = V_1\left(V_0^{-1}\beta_0 + \dfrac{X'\Sigma^{-1}X\,\hat{\beta}(\Sigma)}{\sigma^2}\right)$   (5.88)

$v_1 = n + v_0$   (5.89)

$s_1^2 = \dfrac{(Y - X\beta)'\Sigma^{-1}(Y - X\beta) + v_0 s_0^2}{v_1}$   (5.90)

According to [Koop03], the posterior conditional for $\Sigma$ is:

$p(\Sigma|Y, \beta, \sigma^2) \propto p(\Sigma)\,|\Sigma|^{-1/2}\exp\left(-\dfrac{1}{2\sigma^2}(Y - X\beta)'\Sigma^{-1}(Y - X\beta)\right)$   (5.91)

So we have come to the point where the form that $\Sigma$ takes is crucial.

5.6.1 Heteroscedasticity

Let us suppose we suspect that there is no correlation among the errors but that their variances are different. Hence, we will have $n$ variances $\omega_i$ for the $n$ errors $\epsilon_i$. It could be that the researcher has an idea of the form of $\Sigma$ and assumes that

$\omega_i = h(x_i, \alpha) = (1 + \alpha_1 x_{i1} + \cdots + \alpha_p x_{ip})^2$   (5.92)

That is, the variances are related to some or all of the independent variables. The researcher should then choose a prior for $\alpha$, and Bayesian inference can be carried out through a Metropolis-Hastings algorithm such as the random walk.

If the researcher knows that the error variances are different but has no idea of their form, then a prior for $\Sigma$ has to be chosen. According to [Koop03]:
$p(\Sigma) = \prod_{i=1}^{n} p(\omega_i)$   (5.93)

where

$\omega_i \sim Inv\text{-}\chi^2(v_\omega, 1)$   (5.94)

But now a hyper-prior distribution should be fixed for $v_\omega$, such that

$p(\Sigma) = p(\Sigma|v_\omega)p(v_\omega)$   (5.95)

That is, we are using a hierarchical prior to treat the heteroscedasticity. According to [Gelm04], a Metropolis-Hastings algorithm can be used to draw posterior simulations.

5.6.2 Correlation

Now, let us assume that there is some correlation among the errors through a time relationship, such that the error in one period depends on those in previous periods. This is a type of regression called autoregressive, and it can be considered a time series. For example, if we are considering the relation between the Ibex 35 values on one day and the previous ones, we could say that the correlation between errors that exists in the relation between Fridays and the previous days also exists in the relation between the values on Thursdays, Wednesdays or Tuesdays and their previous days. That is:

$\epsilon_t = \rho_1\epsilon_{t-1} + \rho_2\epsilon_{t-2} + \cdots + \rho_p\epsilon_{t-p} + u_t$   (5.96)

where

$u_t \sim N(0, \sigma^2)$   (5.97)

We will assume stationarity. This means, in a general way, that the probability distribution does not vary through time. Some time series do not seem to be stationary, but their differences do. The main difference to take into account is the first one: the first difference of $\epsilon_t$ is $\Delta\epsilon_t$, which indicates the variation of $\epsilon$ between periods $t$ and $t-1$. According to [Koop03], the irregular component $u_t$ can be formulated in the following way:

$\rho(L)\,\epsilon_t = u_t$   (5.98)
where $L$ is the lag operator, which has the property $L\epsilon_t = \epsilon_{t-1}$, and $\rho(L) = (1 - \rho_1 L - \cdots - \rho_p L^p)$. So, if we have the regression model:

$Y_t = X_t\beta + \epsilon_t$   (5.99)

then it is possible to find a model such that

$Y_t^* = X_t^*\beta + u_t, \qquad u_t \sim N(0, \sigma^2)$   (5.100)

where

$Y_t^* = \rho(L)Y_t$   (5.101)

$X_t^* = \rho(L)X_t$   (5.102)

Therefore, using an independent Normal scaled inverse chi-square prior for $\beta$ and $\sigma^2$, this yields:

$\beta|Y, \sigma^2, \rho \sim N(\beta_1, V_1)$   (5.103)

$\sigma^2|Y, \beta, \rho \sim Inv\text{-}\chi^2(v_1, s_1^2)$   (5.104)

where

$V_1 = \left(V_0^{-1} + \dfrac{X^{*\prime}X^*}{\sigma^2}\right)^{-1}$   (5.105)

$\beta_1 = V_1\left(V_0^{-1}\beta_0 + \dfrac{X^{*\prime}Y^*}{\sigma^2}\right)$   (5.106)

$v_1 = v_0 + T - p$   (5.107)

$s_1^2 = \dfrac{(Y^* - X^*\beta)'(Y^* - X^*\beta) + v_0 s_0^2}{v_1}$   (5.108)

And now, as occurred with heteroscedasticity, a prior should be selected for $\rho$. Let us choose a multivariate Normal subject to the constraint $\rho \in \phi$, where $\phi$ is the stationary region. Then,

$p(\rho) \sim N(\rho_0, V_{\rho_0})\,1(\rho \in \phi)$   (5.109)

$p(\rho|Y, \beta, \sigma^2) \sim N(\rho_1, V_{\rho_1})\,1(\rho \in \phi)$   (5.110)
where $\rho_0$ and $V_{\rho_0}$ are the prior parameters that the researcher should establish, and $\rho_1$ and $V_{\rho_1}$ are the posterior parameters, related as follows:

$V_{\rho_1} = \left(V_{\rho_0}^{-1} + \dfrac{E'E}{\sigma^2}\right)^{-1}$   (5.111)

$\rho_1 = V_{\rho_1}\left(V_{\rho_0}^{-1}\rho_0 + \dfrac{E'\epsilon}{\sigma^2}\right)$   (5.112)

where $\epsilon$ is the vector of errors and $E$ is the matrix of its lags, from $t-1$ to $t-p$. According to [Koop03], a Gibbs sampler can be used to draw posterior simulations.

5.7 Models Summary

Since the main models to be used in the subsequent application are the homoscedastic and non-autocorrelated ones, their main ideas are shown in Tables 5.5, 5.6, 5.7 and 5.8.
Case | Prior for $\beta$ | Prior for $\sigma^2$ | Joint Prior Distribution
$p(\beta, \sigma^2) = p(\beta|\sigma^2)p(\sigma^2)$ | $\beta|\sigma^2 \sim N(\beta_0, \sigma^2 V_0)$ | $\sigma^2 \sim Inv\text{-}\chi^2(v_0, s_0^2)$ | $\beta, \sigma^2 \sim N\text{-}Inv\text{-}\chi^2(\beta_0, V_0 s_0^2;\ v_0, s_0^2)$
$p(\beta, \sigma^2) = p(\beta|\sigma^2)p(\sigma^2)$ | $\beta|\sigma^2 \sim N(\beta_0, \sigma^2 V_0)\,1(\beta \in A)$ | $\sigma^2 \sim Inv\text{-}\chi^2(v_0, s_0^2)$ | $\beta, \sigma^2 \sim N\text{-}Inv\text{-}\chi^2(\beta_0, V_0 s_0^2;\ v_0, s_0^2)\,1(\beta \in A)$
$p(\beta, \sigma^2) = p(\beta)p(\sigma^2)$ | $\beta \sim N(\beta_0, V_0)$ | $\sigma^2 \sim Inv\text{-}\chi^2(v_0, s_0^2)$ | $\beta, \sigma^2 \sim N\text{-}Inv\text{-}\chi^2(\beta_0, V_0;\ v_0, s_0^2)$
$p(\beta, \sigma^2) = p(\beta)p(\sigma^2)$ | $\beta \sim N(\beta_0, V_0)\,1(\beta \in A)$ | $\sigma^2 \sim Inv\text{-}\chi^2(v_0, s_0^2)$ | $\beta, \sigma^2 \sim N\text{-}Inv\text{-}\chi^2(\beta_0, V_0;\ v_0, s_0^2)\,1(\beta \in A)$

Table 5.5: Main Prior Distributions Summary
Case | Joint Posterior Distribution | Key
$p(\beta, \sigma^2|y) \propto p(y|\beta, \sigma^2)p(\beta|\sigma^2)p(\sigma^2)$ | $\beta, \sigma^2|y \sim N\text{-}Inv\text{-}\chi^2(\beta_1, V_1 s_1^2;\ v_1, s_1^2)$ | Obtain the marginal distributions, draw directly from them and summarize
$p(\beta, \sigma^2|y) \propto p(y|\beta, \sigma^2)p(\beta|\sigma^2)p(\sigma^2)$ | $\beta, \sigma^2|y \sim N\text{-}Inv\text{-}\chi^2(\beta_1, V_1 s_1^2;\ v_1, s_1^2)\,1(\beta \in A)$ | Obtain the marginal distributions, draw directly from them, discard invalid draws and summarize
$p(\beta, \sigma^2|y) \propto p(y|\beta, \sigma^2)p(\beta)p(\sigma^2)$ | $\propto \exp\left(-\frac{1}{2}\left[\frac{(Y-X\beta)'(Y-X\beta)}{\sigma^2} + (\beta-\beta_0)'V_0^{-1}(\beta-\beta_0)\right]\right)(\sigma^2)^{-(\frac{n+v_0}{2}+1)}\exp\left(-\frac{v_0 s_0^2}{2\sigma^2}\right)$ | Obtain the conditional distributions, draw with the Gibbs sampler and summarize
$p(\beta, \sigma^2|y) \propto p(y|\beta, \sigma^2)p(\beta)p(\sigma^2)\,1(\beta \in A)$ | the previous expression $\times\ 1(\beta \in A)$ | Obtain the conditional distributions, draw with the Gibbs sampler, discard invalid draws and summarize

Table 5.6: Main Posterior Distributions Summary
Case | Prior Parameter | Posterior Parameter | Relation
$p(\beta, \sigma^2|y) \propto p(y|\beta, \sigma^2)p(\beta|\sigma^2)p(\sigma^2)$ | $\beta_0$ | $\beta_1$ | $\beta_1 = V_1(V_0^{-1}\beta_0 + X'X\hat{\beta})$
 | $V_0$ | $V_1$ | $V_1 = (V_0^{-1} + X'X)^{-1}$
 | $v_0$ | $v_1$ | $v_1 = v_0 + n$
 | $s_0^2$ | $s_1^2$ | $s_1^2 = \dfrac{v_0 s_0^2 + vs^2 + (\hat{\beta}-\beta_0)'[V_0 + (X'X)^{-1}]^{-1}(\hat{\beta}-\beta_0)}{v_1}$
$p(\beta, \sigma^2|y) \propto p(y|\beta, \sigma^2)p(\beta)p(\sigma^2)$ | $\beta_0$ | $\beta_1$ | $\beta_1 = V_1(V_0^{-1}\beta_0 + X'Y/\sigma^2)$
 | $V_0$ | $V_1$ | $V_1 = (V_0^{-1} + X'X/\sigma^2)^{-1}$
 | $v_0$ | $v_1$ | $v_1 = v_0 + n$
 | $s_0^2$ | $s_1^2$ | $s_1^2 = \dfrac{v_0 s_0^2 + (Y-X\beta)'(Y-X\beta)}{v_1}$

Table 5.7: Prior and Posterior Parameters Summary
Case | $p(y^*|y, \beta, \sigma^2)$ | Key | Constraint
$p(y^*|y) = \int p(y^*|y, \beta, \sigma^2)p(\beta|\sigma^2, y)p(\sigma^2|y)\,d\beta\,d\sigma^2$ | $N(X^*\beta, \sigma^2 I)$ | Obtain draws $y^*$ from $p(y^*|y, \beta, \sigma^2)$ using the previous draws from the posterior simulation; use Monte Carlo integration to get predictive inferences | No
$p(y^*|y) = \int p(y^*|y, \beta, \sigma^2)p(\beta|\sigma^2, y)p(\sigma^2|y)\,d\beta\,d\sigma^2$ | $N(X^*\beta, \sigma^2 I)$ | As above, discarding the posterior draws that violate the constraint before simulating $y^*$; use Monte Carlo integration | Yes
$p(y^*|y) = \int p(y^*|y, \beta, \sigma^2)p(\beta|y)p(\sigma^2|y)\,d\beta\,d\sigma^2$ | $N(X^*\beta, \sigma^2 I)$ | Obtain draws $y^*$ from $p(y^*|y, \beta, \sigma^2)$ using the previous draws from the posterior simulation; use Monte Carlo integration | No
$p(y^*|y) = \int p(y^*|y, \beta, \sigma^2)p(\beta|y)p(\sigma^2|y)\,d\beta\,d\sigma^2$ | $N(X^*\beta, \sigma^2 I)$ | As above, discarding the posterior draws that violate the constraint before simulating $y^*$; use Monte Carlo integration | Yes

Table 5.8: Main Posterior Predictive Distributions Summary
Chapter 6

Symbolic Data

6.1 What is symbolic data analysis?

Nowadays, more and more data are susceptible to being analyzed and studied. Technological advances let us gather huge quantities of information about a specific variable, but part of that information is lost because standard statistical methods do not have the flexibility to manage such quantities. For example, suppose we are studying the evolution of the stock prices of a company. At the end of each month we would have the different values that the stock has taken daily. It seems reasonable to think that the researcher would take only the daily close prices, or the daily mean prices, and so would not use all the gathered information.

Symbolic data analysis (SDA) deals with this problem and lets us analyse vast amounts of information efficiently, in order to extract the required knowledge and represent it better. Continuing with the same example, symbolic data will let the engineer manage the daily maximum and minimum prices of a month, or manage a histogram of monthly prices and work with it. In this way, SDA complements other widely used statistical tools, such as candlesticks. More information about candlesticks and other interesting tools can be found in [Lee 06] and [Irpi05]. For instance, Figure 6.1 illustrates an interval time series for the daily maximum and minimum Ibex 35 values in January 2006.

So the possibilities with symbolic data are evident. For instance, let us think of an application to warrants. A warrant is a right, without obligation, to buy (a call warrant) or to sell (a put warrant) something at an agreed price (the strike). With a predicted stock price range one could decide on the best put warrant or the most suitable call warrant, and obtain larger profits.
Figure 6.1: Interval time series

Behind the aggregation method used by SDA lies the notion of a symbolic object. This is a mathematical model of a concept (see [Dida95]) which, basically, lets us select some individuals from a group. Going further with SDA, and according to [Bill06a], three main kinds of symbolic data can be considered: multi-valued, interval-valued and modal-valued.

As far as the first is concerned, a multi-valued symbolic random variable $Y$ is one whose possible value takes one or more values from the list of values in its domain $\mathcal{Y}$. The complete list of possible values in $\mathcal{Y}$ is finite, and the values may be well-defined categorical or quantitative values.

For example, consider all the companies which have formed the Ibex 35 index since its beginning. We could define a variable $Y$ = blue chips in the Ibex 35, with 15 observations $w_u$ = year. Thus we have, for instance, that during the first year, 1992 ($w_u = w_1$), Telefónica, Repsol, Endesa, SCH and BBVA were considered the blue chips, while in 2007 ($w_u = w_{15}$) Santander, Telefónica, BBVA, Endesa and Repsol YPF are considered the blue chips.

Likewise, an interval-valued symbolic random variable $Y$ is one that takes values in an interval.
$w_u$ | Year | $Z$ = Blue chips in Ibex 35
$w_1$ | 1992 | {Telefónica, Repsol, Endesa, SCH, BBVA}
... | ... | ...
$w_{15}$ | 2007 | {Telefónica, Repsol YPF, Endesa, BBVA, Santander}

Table 6.1: Multi-valued Data Example

That interval can be closed or open at either end. This type is very important in SDA; furthermore, it can capture both the central tendency and the dispersion of a dataset. Let us recall the example of the daily stock prices of a company over a month: this information can be recorded as the daily maximum and minimum values during the month. As this is one of the most interesting types of symbolic data for our purpose, we will take it up again below.

Finally, let a random variable $Y$ take possible values $\{\eta_k : k = 1, 2, \ldots\}$ over a domain $\mathcal{Y}$. Then a modal-valued outcome is one formed by the values $\eta_k$ together with associated measures $\pi_k$. The measure is usually a weight, probability, relative frequency, or the like, but it can also be a capacity, necessity, possibility, credibility or some related entity.

A modal multi-valued variable can now be defined: it is a variable whose observed outcome takes values that are a subset of the domain, each with its respective measure. For example, we could define a variable $Z$ = Importance of the companies in the Ibex 35 index. Thus, for instance, the most important company in 1992 was Telefónica, while Santander is currently the company with the highest weight in the index in 2007.

Another example: suppose we define a variable $Y$ = Maximum daily stock price for enterprises in the Spanish Continuous Stock Market. For the company Endesa we could have:
$w_u$ | Year | $Z$ = Importance of an Ibex 35 company
$w_1$ | 1992 | {Telefónica, 13.7; Repsol, 9.7; Endesa, 9.2; SCH, 8.0; BBVA, 7.2; Iberdrola, 6.9; Santander, 5.9; Banco Popular, 3.8; Banesto, 3.6; Banco Exterior, 3.0; Cepsa, 2.5; Tabacalera, 2.4; Acesa, 2.1; Unión FENOSA, 2.0; Gas Natural, 1.9; Sevillana de Electric., 1.8; Fuerzas E. Cataluña, 1.7; Bankinter, 1.6; Dragados, 1.4; Aguas de Barcelona, 1.3; Mapfre, 1.3; Asland, 1.2; FCC, 1.1; Portland Valderribas, 1.0; Hidrocantábrico, 0.8; Vallehermoso, 0.8; Metrovacesa, 0.8; Acerinox, 0.7; Viscofán, 0.6; Cubiertas y MZOV, 0.5; Sarrió, 0.4; Uralita, 0.4; Huarte, 0.3; Urbis, 0.3; Agromán, 0.2}
... | ... | ...
$w_{15}$ | 2007 | {Telefónica, 16.0; Repsol YPF, 5.9; Endesa, 7.5; BBVA, 13.0; Iberdrola, 5.6; Santander, 17.2; Banco Popular, 3.4; Banesto, 0.5; Unión FENOSA, 1.8; Gas Natural, 1.5; Bankinter, 0.9; Cor. Mapfre, 0.7; FCC, 1.2; Sacyr Vallehermoso, 1.0; Metrovacesa, 0.5; Acerinox, 1.0; Inditex, 3.0; ACS Const., 2.9; B. Sabadell, 2.1; Altadis, 2.0; Abertis A, 2.0; G. Ferrovial, 1.6; Acciona, 1.4; FCC, 1.2; Gamesa, 1.0; Enagás, 0.8; REE, 0.8; Cintra, 0.7; Agbar, 0.7; Telecinco, 0.6; Iberia, 0.5; Indra A, 0.5; Fadesa, 0.5; Sogecable, 0.4; Antena 3 TV, 0.4; NH Hoteles, 0.4}

Table 6.2: Modal Multi-valued Example

$Y(\text{Endesa}) = \{38.7, 0.125;\ 38.75, 0.125;\ 38.8, 0.250;\ 38.85, 0.250;\ 38.9, 0.125;\ 39, 0.125\}$

This means that we assign a probability of 0.125 to the possibility that Endesa's maximum daily price is 38.7, a probability of 0.125 to the possibility that it is 38.75, a probability of
0.25 to the possibility that it is 38.8, and so on.

Another very interesting variant of this type is the modal interval-valued variable. That is, instead of a value with a probability, the variable can take any value in an interval with a probability. Continuing with the previous example:

$Y(\text{Endesa}) = \{[38.7, 38.75), 0.125;\ [38.75, 38.85), 0.125;\ [38.8, 38.9), 0.25\}$

For more information and other types of data, the reader is referred to [Bill06a], [Huiw06] and [Arro06].

6.2 Interval-valued variables

As has already been mentioned, summarizing a dataset is one of the three possible sources or reasons from which interval data may result. According to [Huiw06], the other two sources are the imprecision of measurement and expert knowledge including uncertainty.

Now, suppose $E$ is a set of $m$ symbolic objects $u$ with observations $Y(u)$, $u = 1, \ldots, m$. Suppose we are interested in a particular random variable $Y_j \equiv Z$, and that the realization of $Z$ for the observation $w_u$ is the interval $Z(w_u) = [a_u, b_u] = \xi_u$. Then, according to [Bill06a], the empirical density function of $Z$ is

$f(\xi) = \dfrac{1}{m}\sum_{u \in E}\dfrac{I_u(\xi)}{\|Z(u)\|}, \qquad \xi \in \mathbb{R}$   (6.1)

where $I_u(\xi)$ indicates whether or not $\xi$ is in the interval $Z(u)$, and $\|Z(u)\|$ is the length of that interval. Likewise, it can be shown that the symbolic empirical mean is given by

$\bar{Z} = \dfrac{1}{2m}\sum_{u \in E}(b_u + a_u)$   (6.2)

and the symbolic empirical variance by

$S^2 = \dfrac{1}{3m}\sum_{u \in E}\left(b_u^2 + b_u a_u + a_u^2\right) - \dfrac{1}{4m^2}\left[\sum_{u \in E}(b_u + a_u)\right]^2.$   (6.3)
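A direct implementation of (6.2) and (6.3) may help to clarify the formulas; the midpoint-only variance (6.4), discussed next, is included for comparison.

import numpy as np

def symbolic_mean_var(a, b):
    """Empirical symbolic mean (6.2) and variance (6.3) of interval data
    [a_u, b_u], u = 1..m, under uniformity within each interval."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    m = a.size
    mean = np.sum(b + a) / (2 * m)
    var = (np.sum(b**2 + b * a + a**2) / (3 * m)
           - np.sum(b + a) ** 2 / (4 * m**2))
    return mean, var

def midpoint_var(a, b):
    """Variance of the interval midpoints only, formula (6.4); it ignores
    the internal variation of the intervals and is therefore smaller."""
    mid = (np.asarray(a, float) + np.asarray(b, float)) / 2.0
    return np.mean(mid**2) - np.mean(mid) ** 2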
Formulas (6.2) and (6.3) are coherent with the hypothesis of uniformity within the intervals. While the symbolic mean can be understood intuitively as a centre of gravity, the symbolic variance is not so easy to grasp. In fact, it might seem more reasonable to formulate the variance as

$S^2 = \dfrac{1}{4m}\sum_{u \in E}(b_u + a_u)^2 - \dfrac{1}{4m^2}\left[\sum_{u \in E}(b_u + a_u)\right]^2,$   (6.4)

that is, the variance of the midpoints. But this last formulation does not take into account the internal variation of the intervals, while the former does, and hence the former is larger.

For example, let us consider the maximum and minimum points of the Ibex 35 during December 2006. According to what was said above, the mean point in that month was

$\bar{Z} = \dfrac{1}{2m}\sum_{u \in E}(\text{high}_u + \text{low}_u) = 14116.38$

and the empirical symbolic variance is

$S^2 = \dfrac{1}{3m}\sum_{u \in E}\left(b_u^2 + b_u a_u + a_u^2\right) - \dfrac{1}{4m^2}\left[\sum_{u \in E}(b_u + a_u)\right]^2 = 28006.$

If we had calculated the variance taking only the midpoints, the result would have been

$S^2 = \dfrac{1}{4m}\sum_{u \in E}(b_u + a_u)^2 - \dfrac{1}{4m^2}\left[\sum_{u \in E}(b_u + a_u)\right]^2 = 26023,$

which is lower than the value obtained previously because it does not take into consideration the internal variation of the intervals.

Although it may seem that everything related to interval-valued data is an advantage, according to [Huiw06] there are two major limitations when applying multivariate analysis to an interval dataset. The first is that the computing work is hard, and the second is that the hyperrectangle may enlarge the range of the original dataset and reduce analysis accuracy.

The usual methodology for applying interval data to multivariate analysis involves transforming the symbolic data matrix into a numerical matrix, that is, reducing $p$-dimensional observations to $s$-dimensional
components (where usually $s \ll p$). This is called Principal Component Analysis. There are two main methods that carry this out: the Vertices Method and the Centres Method. The former consists of building, from each hyperrectangle in $p$-dimensional space, a matrix with $2^p$ rows and $p$ columns, where each row contains the coordinates of one vertex of the hyperrectangle in $\mathbb{R}^p$. The latter works with the average value of every variable for each category of data. A more extended review of these two methods can be found in [Bill06a]. [Huiw06] point out some limitations of these methods and propose a new type of symbolic data: factor interval data. Since symbolic data is a wide field, the reader is referred to all the above citations.

6.3 Classical regression analysis with interval-valued data

Regarding classical multiple regression, there are three current approaches to be considered, though one of them is just a regression fit. Let us begin with the most intuitive and finish with the most conceptual.

Since we now have intervals instead of single values, it is natural to take midpoints and proceed as with multiple classical regression; that is, to use the result to make new predictions from a new interval, applying it to each extreme of the interval. Moreover, [DeCa05] remark on the need to establish the constraint $\beta_i \geq 0$ to ensure that the lower extreme of the predicted interval is lower than the upper extreme, and suggest the algorithm presented by [Laws74] to handle this constraint. We suggest the alternative of getting enough draws from the posterior distribution of $\beta$, discarding those which are negative, and averaging.

Let us recall the same example shown for classical multiple regression, but taking now the maximum and minimum values that the Ibex 35, Dow Jones, FTSE 100 and DAX took in the first ten months of 2006. Taking the midpoints of those intervals and proceeding exactly as in classical multiple regression, we obtain:

$IBEX35_t = 1.0102\,DowJones_{t-1} - 2.0144\,FTSE100_{t-1} + 2.1229\,DAX_{t-1} + \epsilon_t$

where $\epsilon_t \sim N(0, 332.71^2)$.
We could use this model to predict a new observation for November 1st, applying it to each extreme of the intervals:

$\max(IBEX35_t) = 1.0102 \times 12161 - 2.0144 \times 6149.9 + 2.1229 \times 6289.7 = 13242.41$

$\min(IBEX35_t) = 1.0102 \times 11986.84 - 2.0144 \times 6110.9 + 2.1229 \times 6237.55 = 13040.7$

So the prediction would be the interval $[13040.7, 13242.41]$.

A disadvantage of this approach is that it does not take into account the interval length. To solve that problem, [DeCa05] and [DeCa04] suggest fitting another regression for the interval range. They refer to this approach as the constrained centre and range method (CCRM); in that case the constraint is applied to the interval range regression instead of to the centres regression. We will employ the radii instead of the ranges. So, continuing with the previous example, we would use the radii of the different indexes to build the next model:

$Radius\,IBEX35_t = 0.35\,Radius\,DowJones_{t-1} + 0.484\,Radius\,FTSE100_{t-1} + 0.272\,Radius\,DAX_{t-1} + \epsilon_t$

where $\epsilon_t \sim N(0, 26.31^2)$.

With this new approach, the prediction can be calculated from the midpoint and the radius of the interval:

$Midpoint\,IBEX35_t = 1.0102 \times 12073.65 - 2.0144 \times 6130.4 + 2.1229 \times 6262.125 = 13141.3$

$Radius\,IBEX35_t = 0.35 \times 86.81 + 0.484 \times 19.5 + 0.272 \times 24.575 = 46.53$

Now the prediction would be $[13094.75, 13187.81]$. A sketch of this centre-and-radius scheme is given below.
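The following sketch assembles the centre-and-radius steps of this example. It uses plain least squares on midpoints and radii and simply clips negative radius coefficients, a crude stand-in for the constrained algorithm of [Laws74]; it illustrates the mechanics rather than reproducing [DeCa05] exactly.

import numpy as np

def fit_centre_radius(y_lo, y_hi, X_lo, X_hi):
    """Centre-and-radius style fit: one regression for interval midpoints
    and one for radii, with a crude nonnegativity constraint on the
    radius coefficients so predicted intervals stay well ordered."""
    yc, yr = (y_hi + y_lo) / 2.0, (y_hi - y_lo) / 2.0
    Xc, Xr = (X_hi + X_lo) / 2.0, (X_hi - X_lo) / 2.0
    beta_c = np.linalg.lstsq(Xc, yc, rcond=None)[0]
    beta_r = np.linalg.lstsq(Xr, yr, rcond=None)[0]
    beta_r = np.clip(beta_r, 0.0, None)   # crude stand-in for [Laws74]
    return beta_c, beta_r

def predict_interval(x_lo, x_hi, beta_c, beta_r):
    """Predicted interval [c - r, c + r] for a new interval-valued row."""
    c = (x_hi + x_lo) / 2.0 @ beta_c      # predicted midpoint
    r = (x_hi - x_lo) / 2.0 @ beta_r      # predicted radius
    return c - r, c + r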
For this new approach another way of estimating is needed. Recall the classical univariate multiple regression model:

Y = \beta_0 + X_1\beta_1 + \cdots + X_p\beta_p + \epsilon   (6.5)

where \epsilon \sim N(0, \sigma^2)   (6.6)

Taking mean values, we have:

\bar{Y} = \beta_0 + \bar{X}_1\beta_1 + \cdots + \bar{X}_p\beta_p + \bar{\epsilon}   (6.7)

from which, using the least squares condition on \beta_0, it can easily be deduced that:

\frac{\partial}{\partial\beta_0}\sum_i \epsilon_i^2 = 0 \;\Rightarrow\; \bar{\epsilon} = 0   (6.8)

This means that the mean error is zero whenever there is a constant term in the model, a very important point for what follows. We can then obtain an equivalent model:

Y - \bar{Y} = (X_1 - \bar{X}_1)\beta_1 + \cdots + (X_p - \bar{X}_p)\beta_p + \epsilon   (6.9)

where Y - \bar{Y} is the new dependent variable and X - \bar{X} is the new matrix of independent variables. \beta can then be estimated as:

\hat{\beta} = S_{XX}^{-1} S_{XY}   (6.10)

where

S_{XX} = \begin{bmatrix} var(X_1) & cov(X_1, X_2) & \cdots & cov(X_1, X_p) \\ cov(X_1, X_2) & var(X_2) & \cdots & cov(X_2, X_p) \\ \vdots & \vdots & \ddots & \vdots \\ cov(X_1, X_p) & cov(X_2, X_p) & \cdots & var(X_p) \end{bmatrix}

and

S_{XY} = \begin{bmatrix} cov(X_1, Y) \\ cov(X_2, Y) \\ \vdots \\ cov(X_p, Y) \end{bmatrix}

where the independent term is not taken into account (so there is no column of ones in the matrix X). The independent term \beta_0 is estimated as:

\hat{\beta}_0 = \bar{Y} - \sum_{j=1}^{p} \hat{\beta}_j \bar{X}_j   (6.11)

In this way, the symbolic variance, the symbolic covariance and the symbolic mean for interval-valued variables can be used to estimate \beta. This approach has the limitation that, in order to employ the symbolic statistics and this way of estimating, it is necessary to include the independent term in the regression model. In fact, the most important point is that this last approach, suggested by [Bill06a], is just a regression fit, since no residual term is defined for symbolic data.
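With ordinary numeric data the estimator (6.10)–(6.11) is simply least squares rewritten in terms of variances and covariances, as the following R sketch verifies; for interval-valued data the symbolic variances and covariances of [Bill06a] would be plugged into the same two formulas:

```r
## Covariance-based estimator (6.10)-(6.11): solve S_XX beta = S_XY on the
## centred data, then recover the intercept from the means.
cov_regression <- function(X, y) {
  Sxx <- cov(X)                                # var/cov matrix of regressors
  Sxy <- cov(X, y)                             # vector of cov(X_j, Y)
  beta <- drop(solve(Sxx, Sxy))                # (6.10)
  beta0 <- mean(y) - sum(beta * colMeans(X))   # (6.11)
  c(intercept = beta0, beta)
}

## Check against lm() on arbitrary data: the two fits coincide exactly.
set.seed(1)
X <- matrix(rnorm(300), ncol = 3)
y <- drop(2 + X %*% c(1, -0.5, 0.3) + rnorm(100))
cov_regression(X, y)
coef(lm(y ~ X))
```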
6.4 Bayesian regression analysis with Interval-valued data

Once we know how interval-valued data can be employed in classical regression, let us see how they can be included in the Bayesian approach. For this purpose we will employ the CCRM proposed by [DeCa05]. According to what has been said above and in the chapter on Bayesian regression, there is nothing new to be done: the problem is reduced to two Bayesian regressions, one for the centres and another for the radii, with the constraint applied to the coefficients of the latter. As we saw in the Bayesian regression chapter, the constraint is much easier to incorporate into the Bayesian approach than into the classical one. So, by introducing the Bayesian approach into regression with symbolic data, the engineer will be able to incorporate more information into the problem than he could with Bayesian regression and traditional data. This is because two regressions are now being made, and the expert can state whether the centres will increase or decrease, and likewise for the radii.
In this sense, an opinion like "I think that the Dow Jones will have less influence on the Ibex 35, that the DAX will have more relevance than it has had until now, and that there will be more volatility" would mean, for instance, that the prior mean for the Dow Jones midpoint coefficient is set below the value indicated by the data. Conversely, the prior mean for the DAX midpoint coefficient would be set above it, and similarly for the prior means of the radii.
Chapter 7

Results

To show the usefulness of the Bayesian Centre and Radius approach proposed in this project, experiments fitting a linear regression model to real interval-valued data sets from the Spanish Continuous Stock Market are considered in this section.

7.1 Spanish Continuous Stock Market data sets

We have considered two situations in the Spanish Continuous Stock Market. On the one hand, we have used the monthly minimum and maximum prices of BBVA and BSCH from January 2000 to June 2007 in order to show how the classical regression approach applied to interval-valued data can be improved through the Bayesian Centre and Radius approach when the variables are directly related. This will also let us see other advantages of the proposed approach over classical regression with single values. On the other hand, we have taken the daily minimum and maximum prices of two other Spanish Continuous Stock Market companies, Dogi and Zardoya, from January 2006 to December 2006, in order to show that the Bayesian Centre and Radius approach outperforms the other approaches even when the variables are not related, that is, when they are uncorrelated.

7.2 Direct Relation between Variables

In this case 66 of the total 89 months are assigned to the training set. The other 23 months form the testing set.
Let us begin with the classical regression approach applied to the midpoints of the monthly minimum and maximum prices of BBVA and BSCH in the considered training period. These data yield the following model:

BSCH_Midpoint = 1.3008 + 0.6229 × BBVA_Midpoint + \epsilon, where \epsilon \sim N(0, 0.5237^2)

Figures 7.1 and 7.2 show that this model fits well enough for both the training and testing sets.

[Figure 7.1: Classical regression with single values in the training set (midpoints of BBVA prices).]

If we calculate the different error measures (Mean Error, Mean Absolute Error, Mean Squared Error and Root Mean Squared Error) for each set, we obtain Table 7.1. This means that it is a good model, but we are only using the midpoints to fit new data when we have much more data available; therefore we are wasting information we have gathered. This can be seen graphically in Figure 7.3, which suggests that the model is not as good as previously believed, since there is too much available information for such a simple result, and one could expect more from those data. Thus, another approach, known as the Centre Method, could be considered: the fitted model is applied to each maximum and minimum price to obtain predicted maximum and minimum prices. This provides the results displayed in Figures 7.4 and 7.5. According to [Bill00], the total deviation is given by:

\epsilon_{CentreMethod2000} = \epsilon_{lower} + \epsilon_{upper}   (7.1)
[Figure 7.2: Classical regression with single values in the testing set (midpoints of BBVA prices).]

Set      | ME     | MAE    | MSE    | RMSE
Training | 0      | 0.4208 | 0.2660 | 0.5157
Testing  | 0.2321 | 0.3831 | 0.2446 | 0.4946
Table 7.1: Error measures for classical regression with single values

The resulting error measures can be seen in Table 7.2. Now we have a fitted interval for each observed interval, so this approach seems to take advantage of the extracted data. Let us now see the resulting error measures according to the Centre Method proposed by [Bill02], where the sum of squared errors is given by:

SSE_{CentreMethod2002} = \sum_{i=1}^{n} \left( \epsilon_{lower}^2 + \epsilon_{upper}^2 \right)   (7.2)

and, thus, the mean absolute error is given by:

MAE_{CentreMethod2002} = \frac{\sum_{i=1}^{n} \left( |\epsilon_{lower}| + |\epsilon_{upper}| \right)}{n}   (7.3)
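As a minimal sketch, the error measures (7.2)–(7.3) can be computed as follows, assuming hypothetical vectors of observed and fitted interval bounds; note that (7.3) divides by the number of intervals n, not by the 2n pooled residuals:

```r
## Centre Method errors per (7.2)-(7.3), from interval bounds.
centre_method_errors <- function(obs.low, obs.up, fit.low, fit.up) {
  e.low <- obs.low - fit.low
  e.up  <- obs.up  - fit.up
  n <- length(e.low)
  c(SSE = sum(e.low^2 + e.up^2),            # (7.2)
    MAE = sum(abs(e.low) + abs(e.up)) / n)  # (7.3)
}

## Hypothetical intervals:
centre_method_errors(obs.low = c(10.1, 10.8), obs.up = c(11.2, 11.9),
                     fit.low = c(10.3, 10.6), fit.up = c(11.0, 12.1))
```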
[Figure 7.3: Classical regression with interval-valued data (minimum and maximum BBVA prices).]

[Figure 7.4: Centre Method (2000) in the training set (minimum and maximum BBVA prices).]

Table 7.3 shows that this new definition of the error does not improve much on the previous one. However, let us compare these last approaches with the Centre and Radius Method. In this case we have the following model:

BSCH_Midpoint = 1.3008 + 0.6229 × BBVA_Midpoint + \epsilon_{Midpoint}
where \epsilon_{Midpoint} \sim N(0, 0.5237^2), and

BSCH_Radius = 0.106 + 0.6188 × BBVA_Radius + \epsilon_{Radius}

where \epsilon_{Radius} \sim N(0, 0.1458^2).

[Figure 7.5: Centre Method (2000) in the testing set (minimum and maximum BBVA prices).]

Set      | ME     | MAE    | MSE    | RMSE
Training | 0      | 0.8416 | 1.0638 | 1.0314
Testing  | 0.4643 | 0.7663 | 0.9784 | 0.9891
Table 7.2: Error measures for Centre Method (2000)
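Operationally, the Centre and Radius Method amounts to two ordinary least squares fits, one on the interval centres and one on the radii; a minimal R sketch with hypothetical monthly price bounds:

```r
## Two-fit view of the Centre and Radius Method (hypothetical BBVA/BSCH data).
centres <- function(lo, hi) (lo + hi) / 2
radii   <- function(lo, hi) (hi - lo) / 2

set.seed(2)                       # simulated monthly min/max prices
bbva.lo <- rnorm(66, 11, 2); bbva.hi <- bbva.lo + runif(66, 0.5, 2)
bsch.lo <- 1.3 + 0.62 * bbva.lo + rnorm(66, 0, 0.3)
bsch.hi <- bsch.lo + 0.6 * (bbva.hi - bbva.lo) + runif(66, 0.1, 0.4)

fit.mid <- lm(centres(bsch.lo, bsch.hi) ~ centres(bbva.lo, bbva.hi))
fit.rad <- lm(radii(bsch.lo, bsch.hi)   ~ radii(bbva.lo, bbva.hi))
coef(fit.mid); coef(fit.rad)      # centre and radius models, respectively
```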
Set      | ME     | MAE    | MSE    | RMSE
Training | 0      | 0.8917 | 0.5922 | 0.7695
Testing  | 0.4643 | 0.7717 | 0.5125 | 0.7159
Table 7.3: Error measures for Centre Method (2002)

According to [DeCa07], the sum of squares of deviations is given by:

SSE_{CentreRadiusMethod} = \sum_{i=1}^{n} \left( \epsilon_{Midpoint}^2 + \epsilon_{Radius}^2 \right)   (7.4)

Therefore, the mean absolute error is given by:

MAE = \frac{\sum_{i=1}^{n} \left( |\epsilon_{Midpoint}| + |\epsilon_{Radius}| \right)}{n}   (7.5)

[Figure 7.6: Centre and Radius Method in the training set (minimum and maximum BBVA prices).]

The results shown in Figures 7.6 and 7.7 and in Table 7.4 clearly show that the error measures are lower with the Centre and Radius Method than with the Centre Method; thus, the former is better than the latter.
[Figure 7.7: Centre and Radius Method in the testing set (minimum and maximum BBVA prices).]

Set      | ME     | MAE    | MSE    | RMSE
Training | 0      | 0.5233 | 0.2866 | 0.5353
Testing  | 0.1837 | 0.4712 | 0.2558 | 0.5058
Table 7.4: Error measures for Centre and Radius Method

Now, let us take into consideration an expert's knowledge about the Spanish Continuous Stock Market and see the results of the Bayesian Centre and Radius Method. Obviously, the Bayesian methodology is mainly useful in the testing set, since that is where the unobserved data are. Bearing in mind the previous Centre and Radius model, an expert could think that BSCH will improve slightly with respect to BBVA and assign the prior distribution given in (5.36) with the following prior parameters for the Midpoints:
\beta_0 = (1.3008, 0.64)^T, \quad V_0 = 10^{-9} I, \quad s_0^2 = 0.5237^2, \quad v_0 = 10^7

Then the final Midpoints model would be:

BSCH_Midpoint = 1.3008 + 0.64 × BBVA_Midpoint + \epsilon_{Midpoint}

Let us assume that the expert considers that the volatility will not vary, and assigns vague prior parameters to the Radius distribution:

\beta_0 = (0.106, 0.6188)^T, \quad V_0 = 10^{6} I, \quad s_0^2 = 0.1458^2, \quad v_0 = 4

Then the final Radius model would be:

BSCH_Radius = 0.106 + 0.6188 × BBVA_Radius + \epsilon_{Radius}
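A sketch of how these two Bayesian fits could be run with bayesm's runireg follows; its conjugate prior takes the precision A = V_0^{-1} rather than V_0 itself, and the data here are simulated stand-ins for the BBVA/BSCH series:

```r
## Bayesian Centre and Radius fits: near-dogmatic prior on the Midpoint
## coefficients, vague prior on the Radius coefficients.
library(bayesm)

set.seed(1)                                   # hypothetical training data
X.mid <- rnorm(66, 12, 2);        y.mid <- 1.3 + 0.62 * X.mid + rnorm(66, 0, 0.52)
X.rad <- abs(rnorm(66, 0.5, 0.2)); y.rad <- 0.1 + 0.62 * X.rad + rnorm(66, 0, 0.15)

fit_bayes <- function(y, x, betabar, V0, ssq, nu, R = 10000) {
  runireg(Data  = list(y = y, X = cbind(1, x)),
          Prior = list(betabar = betabar,
                       A  = diag(1 / V0, length(betabar)),  # prior precision
                       nu = nu, ssq = ssq),
          Mcmc  = list(R = R))
}

mid <- fit_bayes(y.mid, X.mid, betabar = c(1.3008, 0.64),
                 V0 = 1e-9, ssq = 0.5237^2, nu = 1e7)
rad <- fit_bayes(y.rad, X.rad, betabar = c(0.106, 0.6188),
                 V0 = 1e6,  ssq = 0.1458^2, nu = 4)

colMeans(mid$betadraw)   # ~ (1.3008, 0.64): the data barely move this prior
colMeans(rad$betadraw)   # close to the least squares fit, as the prior is vague
```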
The results for the testing set are shown in Figure 7.8 and in Table 7.5.

[Figure 7.8: Bayesian Centre and Radius Method in the testing set (minimum and maximum BBVA prices).]

Set     | ME     | MAE    | MSE    | RMSE
Testing | 0.0126 | 0.4409 | 0.1997 | 0.4469
Table 7.5: Error measures for Bayesian Centre and Radius Method

This shows that the proposed Bayesian Centre and Radius Method improves on all the previous approaches, since it lets us handle more information than classical regression and obtains better results than the Centre and the Centre and Radius methods.

7.3 Uncorrelated Variables

In this other case, 170 of the total 255 days are assigned to the training set. The remaining days form the testing set. The classical regression with the midpoints of the price ranges yields the following model:

Dogi_Midpoint = 5.6570 − 0.0806 × Zardoya_Midpoint + \epsilon, where \epsilon \sim N(0, 0.2882^2)

Figures 7.9 and 7.10 show that this model does not fit well for either the training or the testing set. If we calculate the different error measures (Mean Error, Mean Absolute Error, Mean Squared Error and Root Mean Squared Error) for each set, we obtain Table 7.6.
[Figure 7.9: Classical regression with single values in the training set (midpoints of Zardoya prices).]

[Figure 7.10: Classical regression with single values in the testing set (midpoints of Zardoya prices).]

The Centre Method could be applied to obtain predicted maximum and minimum prices. This method yields the following model:

Dogi_Midpoint = 5.6570 + 0.0792 × Zardoya_Midpoint + \epsilon, where \epsilon \sim N(0, 7.2137^2)
Set      | ME      | MAE    | MSE    | RMSE
Training | 0       | 0.4231 | 0.2268 | 0.4763
Testing  | −0.3518 | 0.3651 | 0.1642 | 0.4052
Table 7.6: Error measures for classical regression with single values

Note that the slope has changed sign since, according to [DeCa04], it cannot be negative, in order to ensure that the fitted maximum is greater than the fitted minimum. This provides the results shown in Figures 7.11 and 7.12.

[Figure 7.11: Centre Method (2000) in the training set (minimum and maximum Zardoya prices vs minimum and maximum Dogi prices).]

Table 7.7 shows the resulting error measures.
[Figure 7.12: Centre Method (2000) in the testing set (minimum and maximum Zardoya prices).]

Set      | ME      | MAE    | MSE     | RMSE
Training | −7.1288 | 7.1288 | 51.8315 | 7.1994
Testing  | −8.0653 | 8.0653 | 65.1544 | 8.0718
Table 7.7: Error measures for Centre Method (2000)

It is very clear that this model is not accurate. This example evidences the main weak point of this approach: the positive constraint imposed on the coefficients makes an inverse relationship between the variables impossible, which is reflected in the very high error measures; the sketch below reproduces this behaviour.
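The constrained fit underlying this approach is a nonnegative least squares problem in the sense of [Laws74]; the following R sketch, using the nnls package (which implements the Lawson-Hanson algorithm) on simulated inversely related data, shows the constrained slope collapsing towards zero while the unconstrained fit recovers the inverse relationship:

```r
## Nonnegative least squares [Laws74] vs ordinary least squares on
## hypothetical, inversely related Zardoya/Dogi midpoints.
library(nnls)

set.seed(1)
zar  <- rnorm(100, 23, 1)                         # Zardoya midpoints
dogi <- 5.66 - 0.08 * zar + rnorm(100, 0, 0.29)   # inverse relationship

A <- cbind(1, zar)        # design matrix with intercept
nnls(A, dogi)$x           # constrained coefficients: slope forced >= 0
coef(lm(dogi ~ zar))      # unconstrained fit: negative slope recovered
```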
Now let us see the resulting error measures according to the Centre Method proposed by [Bill02], shown in Table 7.8:

Set      | ME      | MAE    | MSE     | RMSE
Training | −7.1288 | 7.1288 | 25.9183 | 5.0910
Testing  | −8.0653 | 8.0653 | 32.5825 | 5.7081
Table 7.8: Error measures for Centre Method (2002)

This new definition of the error improves on the previous one. However, let us compare these last approaches with the Centre and Radius Method. In this case we have the following model:

Dogi_Midpoint = 5.6570 − 0.0806 × Zardoya_Midpoint + \epsilon_{Midpoint}, where \epsilon_{Midpoint} \sim N(0, 0.2882^2)

and

Dogi_Radius = 0.0283 + 0.08 × Zardoya_Radius + \epsilon_{Radius}, where \epsilon_{Radius} \sim N(0, 0.0259^2)

The results can be seen in Figures 7.13 and 7.14 and in Table 7.9.

Set      | ME      | MAE    | MSE    | RMSE
Training | 0       | 0.4385 | 0.2273 | 0.4768
Testing  | −0.3426 | 0.3882 | 0.1655 | 0.4068
Table 7.9: Error measures for Centre and Radius Method

As occurred with a direct relationship between the variables, the error measures are again lower with the Centre and Radius Method than with the Centre Method; thus, the former is better than the latter even when there is no clear relationship.
[Figure 7.13: Centre and Radius Method in the training set (minimum and maximum Zardoya prices).]

[Figure 7.14: Centre and Radius Method in the testing set (minimum and maximum Zardoya prices).]

Now, let us see what happens when the Bayesian methodology is introduced. Bearing in mind the previous Centre and Radius model, an expert could think that the situation will change drastically and assign the following prior parameters to the prior distribution given in (5.36) for the Midpoints:
\beta_0 = (3.1, 0.02)^T, \quad V_0 = 10^{-8} I, \quad s_0^2 = 0.2882^2, \quad v_0 = 10^6

So the final Midpoint model would be:

Dogi_Midpoint = 3.1 + 0.02 × Zardoya_Midpoint + \epsilon_{Midpoint}

And the following prior parameters to the prior distribution for the Radii:

\beta_0 = (0.0283, 0.08)^T, \quad V_0 = 10^{6} I, \quad s_0^2 = 0.0259^2, \quad v_0 = 4

So the final Radius model would be:

Dogi_Radius = 0.0283 + 0.08 × Zardoya_Radius + \epsilon_{Radius}

The results for the testing set are shown in Figure 7.15 and in Table 7.10.

Set     | ME     | MAE    | MSE    | RMSE
Testing | 0.1031 | 0.2008 | 0.0443 | 0.2104
Table 7.10: Error measures for Bayesian Centre and Radius Method
[Figure 7.15: Bayesian Centre and Radius Method in the testing set (minimum and maximum Zardoya prices).]

The Bayesian Centre and Radius Method again performs better than the rest of the approaches, even under unfavourable conditions. Therefore, we can conclude that the Bayesian Centre and Radius Method has the same advantages as the Centre and Radius Method, as described by [DeCa07], plus the great advantages of the Bayesian methodology. All this leads to smaller errors in new predictions. An important future development could be to build a Bayesian symbolic regression model with uniformly distributed errors.
Chapter 8

A Guide to Statistical Software Today

8.1 Introduction

Statistical software is beginning to blend, in one direction, with relational database software such as Oracle or Sybase (software we do not discuss here) and, in the other direction, with mathematical software such as MATLAB. Mathematical software exhibits not only statistical capabilities flowing from code for matrix manipulation, but also optimization and symbolic manipulation useful for statistical purposes. This chapter is an assessment of the state of the art of the statistical software arena as of 2007. It touches upon a few commercial packages, a few general public license packages, a few analysis packages with statistical add-ons, and a few general purpose languages with statistical libraries. We begin with the most important commercial packages, such as SAS, Minitab, BMDP, SPSS and S-PLUS, followed by some of the public license statistical and Bayesian software, such as R or BUGS, and then by some general purpose mathematical software and some general programming languages with statistical libraries. Finally, the role of the developed application in the current statistical scene is presented, remarking its main advantages and disadvantages.
8.2 Commercial Packages

8.2.1 The SAS System for Statistical Analysis

SAS began as a statistical analysis system in the late 1960s, growing out of a project in the Department of Experimental Statistics at North Carolina State University. The SAS Institute was founded in 1976. Since that time, the SAS System has expanded to become an evolving system for complete data management and analysis; SAS is really much more than a simple software system. As an example of its great potential, it is worth mentioning that it is used by 90 percent of the companies on the Fortune 500 list. This expansion is probably due to the fact that SAS management has aligned itself with the recent "statistical-like" advances within the computer science community, such as data mining. This clever integration of mathematical/statistical methodologies, database technology, and business applications has helped propel SAS to the top of the commercial statistical software arena. The architecture of the SAS approach is called the SAS Intelligence Platform, a closely integrated set of hardware/software components that allow users to fully utilize the business intelligence (BI) that can be extracted from their client base. Among the products making up the SAS System are products for: management of large databases; statistical analysis of time series; statistical analysis of most classical statistical problems, including multivariate analysis, linear models (as well as generalized linear models) and clustering; and data visualization and plotting. More precisely, the SAS Intelligence Platform consists of the following components:

• The SAS Enterprise ETL Servers
• The SAS Intelligence Storage
• The SAS Enterprise BI Server
• The SAS Analytic Technologies

One of the strengths of SAS is the fact that the package containing the capabilities one normally associates with a data analysis package is constantly upgraded with each release, in order to reflect the latest algorithmic developments in the statistical field. The SAS System is available on PC and UNIX based platforms, as well as on mainframe computers, so it covers almost all the main options, except Macintosh.
As one could guess from what has been said above, this system is aimed mainly at industrial, scientific and statistician users with very high needs and knowledge, who do not mind spending time on the learning process required by this complex system. Some useful URLs are:

• http://www.sas.com/, which is the main URL for SAS
• http://is.rice.edu/~radam/prog.html, which contains some user-developed tips on using SAS

Other statistical systems of the same general vintage as SAS are MINITAB, BMDP and SPSS. All of these systems began as mainframe systems but have evolved to smaller scale systems as computing has evolved.

8.2.2 Minitab

Minitab Inc. was formed more than 20 years ago around its flagship product, MINITAB statistical software. MINITAB provides tools to analyze data across a variety of disciplines and is targeted at users at every level: scientists, business and industrial users, faculty, and students. Regarding the operating system, MINITAB is available on the most widely used computer platforms, including Windows, DOS, Macintosh, OpenVMS, and Unix. In contrast to SAS, MINITAB is quite easy to learn and use: there is no lengthy learning process and little need for unwieldy manuals. This may be the main reason why MINITAB is used so extensively in the educational community. For more details about this software visit http://www.minitab.com/.

8.2.3 BMDP

BMDP has its roots as a bio-medical analysis package from the late 1960s. In many ways it has remained true to its origins, as evidenced by its long list of clients, which includes such biomedical giants as Bristol-Myers Squibb, Merck and Glaxo Wellcome. There are three main distributions:
BMDP New System Personal Edition, BMDP Classic for PCs – Release 7, and BMDP New System Professional Edition. While BMDP New System has an easy-to-use interface that makes data analysis possible with simple point-and-click and fill-in-the-blank interactions, the Professional Edition combines the full suite of BMDP Classic for PCs Release 7 statistics with the powerful data management and front-end data exploration features of the BMDP New System Personal Edition. A reference URL for BMDP is http://www.ppgsoft.com/bmdp00.html.

8.2.4 SPSS

SPSS is a multinational software company, founded in the late 1960s, that provides statistical product and service solutions for survey research, marketing and sales analysis, quality improvement, scientific research, government reporting and education. The starting point is the SPSS Base, which includes the most popular statistics, complete graphics, and broad data management and reporting capabilities. The SPSS products form a modular system that includes SPSS Professional Statistics, SPSS Advanced Statistics, SPSS Tables, SPSS Trends, SPSS Categories, SPSS CHAID, SPSS LISREL 7, SPSS Developer's Kit, SPSS Exact Tests, Teleform, and MapInfo. Although this software was originally designed for mainframe use, SPSS has adapted to market demand and has releases for Windows, Mac and UNIX. A reference URL for SPSS is http://www.spss.com/.

8.2.5 S-PLUS

While there are many different packages for performing statistical analysis, one that offers some of the greatest flexibility with regard to the implementation of user-defined functions and the customization of one's environment is S-PLUS, which is one of the two implementations of the S language (R is the other, reviewed later). S is an exceptionally well-developed tool for statistical research and analysis, and it is especially strong for statistical graphics, the output of data analysis through which both raw data and results are displayed for both analysts and clients. S was originally developed at AT&T Bell Labs (recently split into AT&T Laboratories and Lucent Bell Labs) by a team of researchers including Richard A. Becker,
John M. Chambers, Allan Wilks, William S. Cleveland and Trevor Hastie. The original description of the S language, written by Becker, Chambers and Wilks in 1988, received the 1998 Software System Award from the Association for Computing Machinery (ACM). The aim of the language, as expressed by John Chambers, is "to turn ideas into software, quickly and faithfully". A good introduction to the application of S to statistical analysis problems is contained in [Cham92] and [Cham83]. More recent work focusing on the statistical capabilities of the S-PLUS system can be found in [Vena02]. S-PLUS is manufactured and supported by the Statistical Sciences Corporation, now a division of MathSoft. It runs on both PC and UNIX based platforms. In addition, the company offers easy links for the user to call S-PLUS from within C/FORTRAN, or to call compiled C/FORTRAN functions within the S-PLUS environment. Statistical Sciences has made great efforts to keep the software current with regard to the needs of the statistical community, releasing dedicated modules targeted at specific application areas. The S-PLUS home page can be reached at http://www.mathsoft.com/. This site contains an interesting comparison between SAS and S-PLUS.

8.2.6 Others

Other statistically oriented packages enjoying good reputations are SYSTAT, DataDesk, JMP and StatGraphics. SYSTAT originated as a PC-based package developed by Leland Wilkinson and is now owned by SPSS; the current version is 6.0, a Microsoft Windows oriented product. DataDesk, on the other hand, is a Macintosh-based product authored by Paul Velleman of Cornell University. The currently released version is 5.0.1; it is a GUI-based product containing many innovative graphical data analysis and statistical analysis features. More information about DataDesk can be found at http://www.lightlink.com/datadesk/. JMP is another SAS product that is highly visualization oriented; it is a stand-alone product for PC and Macintosh platforms. Information on JMP can be found at http://www.sas.com/. StatGraphics is an education-oriented statistical software package, used mainly in universities, which offers a user-friendly interface. A good reference showing how to use StatGraphics can be found in [Maté95].
8.3 Public License Packages

8.3.1 R

R is an Open Source implementation of the well-known S language, which originated at the University of Auckland, New Zealand, in the early 1990s. It works on multiple computing platforms, such as Unix systems or Windows, but its most important characteristic is that a software system existing under the Open Source paradigm benefits from having "many pairs of eyes" examine the software, helping to ensure its quality. An example of the rapid development of this software is that in 1997, only two years after the public release in June 1995, the project leaders had to select a core group of around ten members, which became responsible for changes to the source code. R is, for the most part, a command-line based language organized into various packages. Basic packages are installed by default, and the user can download and install a great variety of additional packages. There are also several major projects that are "R spin-offs", such as "Bioconductor", an R package for gene expression analysis, or "Omega", another project focused on providing a seamless interface between R and a number of other languages (PERL, PYTHON, MATLAB). Two packages have to be mentioned because of their importance and implications for this project: JRI and bayesm. The first deals with the problem of communicating between Java and R; this lets us create a graphical user interface using Swing in Java and perform all the statistical calculations with R. The second, developed by [Rossi06], contains the main functions used in Bayesian analysis. Bayesian data analysis is precisely where R can outperform the other statistical software packages. More information about R can be found at http://www.r-project.org/.

8.3.2 BUGS

The BUGS (Bayesian inference Using Gibbs Sampling) project is concerned with flexible software for the Bayesian analysis of complex statistical models using Markov chain Monte Carlo methods. The project began in 1989 in the MRC Biostatistics Unit and led initially to the "Classic" BUGS program, and then to the WinBUGS software, developed jointly with the Imperial College School of Medicine at St. Mary's, London. Development now also includes the OpenBUGS project at the University of Helsinki, Finland.
The main advantage of this software is, as with R, the flexibility it offers the researcher to model whatever he needs, but it is slightly more complex to learn than R. For this reason, Phil Woodward developed BugsXLA, an Excel add-in that not only allows the user to specify a model as one would in a package such as SAS or S-PLUS, but also aids the specification of priors and the control of the MCMC run itself. More information can be found at http://www.mrc-bsu.cam.ac.uk/bugs/.

8.4 Analysis Packages with Statistical Libraries

8.4.1 Matlab

MATLAB is an interactive computing environment that can be used for scientific and statistical data analysis and visualization. The basic data object in MATLAB is the matrix. The user can perform numerical analysis, signal processing, image processing and statistics on matrices, thus being freed from programming considerations inherent in other programming languages such as C and FORTRAN. There are versions of MATLAB for Unix platforms, PCs running Microsoft Windows, and Macintosh. Because the functions are platform independent, MATLAB provides the user with maximum reusability of their work. MATLAB comes with many functions for basic data analysis and graphics. Most of these are written as M-file functions, which are basically text files that the user can read and adapt for other uses. The user also has the ability to create their own M-file functions and script files, thus making MATLAB a programming language. The addition of the MATLAB C-Compiler and C-Math Library allows the user to produce executable code from their MATLAB library of functions, yielding faster execution times and stand-alone applications. For researchers who need more specific functionality, MATLAB offers several modules or toolboxes. These typically focus on areas that might not be of interest to the general scientific community. Basically, the toolboxes are collections of M-file functions that implement algorithms and functions common to an area of interest.
One of the most useful capabilities of MATLAB is its set of tools for visualizing data: it supports standard two- and three-dimensional scatter plots along with surface plots, and it provides the user with a graphics property editor. As with R, there is a considerable amount of contributed MATLAB code available on the internet. One notably useful source of code is available via the MATLAB home page at http://www.mathworks.com/, where more information about this software can be found.

8.4.2 Mathematica

Mathematica is a computer algebra system originally developed by Stephen Wolfram and sold by his company, Wolfram Research. It has numerical and graphical features and powerful symbolic processing capabilities, but it is comparatively complex to learn. Information on Mathematica is available at http://www.wolfram.com/.

8.4.3 Others

Other mathematical software worth noting includes MAPLE, with powerful symbolic processing capabilities, and MATHCAD, a package which combines numerical, symbolic and graphical features. More information about these packages can be found at their official web sites:

• http://www.maplesoft.com/
• http://www.mathsoft.com/

8.5 Some General Languages with Statistical Libraries

8.5.1 Java

It is difficult to assess the state of the art with regard to Java statistical libraries, in that there may be many custom user-developed packages that we are unaware of. Given this caveat, there are three main packages to mention. The first is StatCrunch, which provides the user with the capability to perform interactive exploratory data analysis, logistic regression, nonparametric procedures, regression and regression diagnostics,
and others. The reader is referred to the review that appeared in [West04]. Another source of Java-based statistics functions is the Apache Software Foundation's Jakarta math project, which seeks to provide common mathematical functionality to the Java user community. The final source for Java-based statistical analysis is the Visual Numerics JMSL package, which provides the user with an integrated set of statistical, visualization, data mining, neural network and numerical packages. The reader is referred to http://www.vni.com/products/imsl/jmsl/jmsl.html for additional discussion of JMSL.

8.5.2 C++

C++ is another object-oriented programming language, like Java, with various statistical libraries. Two libraries are worth mentioning: Goose, and Probability and Statistics. The first is dedicated to statistical computation and provides support for t-tests, F-tests, Kruskal-Wallis tests, Spearman tests and others, together with an implementation of simple linear regression models. More information can be found at http://www.gnu.org/software/goose/goose.html. The second is aimed at Microsoft Windows developers and consists of five packages: statistics, discrete probability, standard probability distributions, hypothesis testing, and correlation and regression. A strength of these modules is their support for various interfaces, including C# and C++ .NET. The reader is referred to http://www.webcabcomponents.com/dotNET/dotnet/pss/.

8.6 Developed Software Tool: BARESIMDA

The software tool developed throughout this project is, as said above, based on Java and R, both of them public license software. It has not been developed with the intention of creating a complete statistical package that could be an alternative to any of the software above. Evidently, it is very difficult to incorporate all the facilities those programs have, much more so in a one-year period with only one developer. In fact, BARESIMDA focuses only on regression analysis
procedures with different approaches and data. In that sense, the developed tool gathers classical and Bayesian regression and lets the user analyze Normal regression models in a very simple way, with a very intuitive graphical user interface. This is a very important feature for the Bayesian approach, where there is a complex theoretical basis with which many users may not be familiar. Another advantage, maybe the most important one, over the rest of the statistical packages is that BARESIMDA incorporates regression analysis with interval data in both the classical and the Bayesian approaches. Not only does it display the analytical results, but it also lets us assess graphically the goodness of fit and the tendencies of the centres and radii. With this first version of BARESIMDA, we have wanted to start down the road towards public license software that can take advantage of the Java graphical user interface with Swing and of the statistical libraries in R.
Chapter 9

Software Requirements Specification

This chapter provides the complete description of the functions to be performed by the BARESIMDA software; it will assist potential users in determining whether the software meets their needs, or how it must be modified to do so. It also reduces the development effort, since the preparation of the Software Requirements Specification (SRS) forces the developer to consider rigorously all of the requirements before design begins, reducing later redesign, recoding and retesting. Careful review of the requirements in the SRS can reveal omissions, misunderstandings and inconsistencies early in the development cycle, when these problems are easy to correct. Likewise, it provides a basis for estimating costs and schedules, and a baseline for verification and validation.

9.1 Purpose

The aim of this system is to provide a tool to build different types of regression analysis and to check the advantages and disadvantages of each approach that has been developed.

9.2 Intended Audience

The software is intended to be handled by different types of users, such as:

• Inexperienced people who have minimal knowledge about what regression is and what it consists of.
• Students and people with a medium degree of knowledge about regression and minimal information about the Bayesian paradigm.

• Graduates and experienced people who have a deep knowledge of regression and Bayesian analysis and want to learn about symbolic regression.

9.3 Functionality

The software is expected to provide the features described in the following points.

9.3.1 Classical Regression with crisp data

This refers to the analytic and graphical analysis of multiple and simple classical Normal regression models with crisp data. To be precise, the software has to provide the following facilities (a sketch of how these diagnostics map onto the underlying statistical engine is given after the next subsection):

• Regression analysis summary with estimated parameters.
• ANOVA table.
• Normality test.
• Heteroscedasticity test.
• Autocorrelated errors test.
• Prediction of new data.
• Complementary graphics to inspect the fitted model.

9.3.2 Classical Regression with interval-valued data

Just as with crisp data, a regression analysis must be possible with symbolic data, specifically with interval-valued data. All the functions described previously must be implemented for the centres and radii regressions. In addition, the software will display graphically the adequacy of the fitted model to the original interval-valued data.
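As an illustrative sketch (not the tool's actual implementation), the diagnostics listed above map directly onto functions of the R engine that BARESIMDA delegates to; the lmtest package is an assumption here, since the thesis does not specify which R functions the tool wraps:

```r
## Classical regression diagnostics of 9.3.1 on simulated crisp data.
library(lmtest)   # assumed add-on package for bptest() and dwtest()

set.seed(1)
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50)
fit <- lm(y ~ x)

summary(fit)                  # regression summary with estimated parameters
anova(fit)                    # ANOVA table
shapiro.test(residuals(fit))  # normality test on the residuals
bptest(fit)                   # Breusch-Pagan test for heteroscedasticity
dwtest(fit)                   # Durbin-Watson test for autocorrelated errors
predict(fit, newdata = data.frame(x = 0.5),  # predicting new data
        interval = "prediction")
```

For interval-valued data (9.3.2), the same calls would simply be issued twice, once on the centres and once on the radii.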
9.3.3 Bayesian Regression with crisp data

The user must be capable of creating two different Bayesian models: Normal and Independent Normal. Since the main characteristic of the Bayesian paradigm is the possibility of introducing subjective information, the application will provide a very intuitive dialog to retrieve the user's beliefs about the different parameters. The software will display the estimated parameters, provide a normality test for the residuals, and offer input fields to make new predictions.

9.3.4 Bayesian Regression with interval-valued data

As with classical regression, it must be possible to carry out Bayesian regression with interval-valued data, so that the user can incorporate prior information about the centres and the radii. The analysis options are the same as those for crisp data, with additional graphics to assess the adequacy of the fitted interval-valued data to the observed data.

9.3.5 Data Manipulation

The user will be able to type in new data by hand or to load an existing Excel file into the application. Likewise, he will be able to save to an Excel file both the source data and the following resulting data:

• Residuals
• Normalized residuals
• Studentized residuals
• Fitted values
• Predicted values

9.3.6 Portability

The application must be executable on the main platforms, such as Windows, Linux and Unix.

9.3.7 Maintainability

Likewise, the tool must be well structured so as to be easily maintainable, since changes and extensions in the future are quite probable.
9.4 External Interfaces

9.4.1 User Interfaces

The application will have a Multiple Document Interface (MDI) with a high degree of usability. The former means that its windows will reside under a single parent window, as Figure 9.1 shows.

[Figure 9.1: BARESIMDA MDI]

This will avoid filling up the operating system's task management interface, as the windows are hierarchically organized, and it will let the user hide, show, minimize or maximize them as a whole. The second characteristic means that the user will not have to think too much about what the application does or how it does it. There will be an option to configure the application's look so that it can be adapted to the user's preferences. The user will be able to set the window look and feel to:

• Unix
• Windows
• Windows Classic
• Java

Likewise, the user will be able to indicate whether she or he is an experienced or an inexperienced user, which will help her or him specify prior information in Bayesian regression.

9.4.2 Software Interfaces

BARESIMDA will connect to a statistics package which will be responsible for performing all computations and returning the results to BARESIMDA. All these operations must be transparent to the end user, through an interface that lets the two programs interact; this makes the application more usable. Regarding input and output data, an interface will be necessary to read from and write to Excel files.
Chapter 10

Software Architecture Study

10.1 Hardware/Software Architecture

The application will be programmed in Java and built, executed and tested with SDK version 1.4.2 or later. Specifically, the graphical user interface will be developed using Swing, one of the most powerful tools for developing user-friendly mechanisms for interacting with an application, giving it a distinctive "look" and "feel". Its libraries are part of the Java Foundation Classes (JFC), Java's libraries for cross-platform GUI development. For more information on JFC visit http://java.sun.com/products/jfc/. This will let us develop the main interface on a particular system and then execute it on any platform, allowing users of different operating systems to keep the look and feel of their own platform. The software chosen to carry out the statistical processing is R, since it is distributed under a public license, like Java; it gives the developer a high degree of flexibility to program the models he wants to build; and it is expanding greatly among statisticians and scientists. BARESIMDA communicates with R through the Java-to-R interface, JRI. This is a .jar library which can be obtained from http://rosuda.org/JRI/ and allows running R inside Java applications as a single thread. Basically, it loads the R dynamic library into Java and provides a Java API to R functionality. JRI uses native code, but it supports all platforms where Sun's Java (or a compatible implementation) is available, including Windows, Mac OS X, Sun and Linux. More information about this interface can be found at the reference cited above.
[Figure 10.1: Interface between BARESIMDA and R]

As indicated in the previous chapter, BARESIMDA is required to read and write Excel files. For this purpose, the POI project consists of various parts that fit together to deliver the data in a Microsoft file format to the Java application. Specifically, and according to our requirements, HSSF is the POI project's pure Java implementation of the Excel file format. It provides a way to create, modify, read and write XLS spreadsheets. More precisely, it offers:

• Low level structures for those with special needs.
• An event-model API for efficient read-only access.
• A full user-model API for creating, reading and modifying XLS files.

Visit http://jakarta.apache.org/poi/hssf/index.html for more information.

10.2 Logical Architecture

The application will be structured in three levels or layers, each of them with a well defined responsibility:
[Figure 10.2: Interface between BARESIMDA and Excel]

• gui: responsible for showing the graphical user interface, getting the input parameters and requests, and passing them to the classes that process them.

• action: contains the main procedures that treat the information and elaborate the regression models and analyses. The results are given back to the caller process. It is also responsible for calling the dao classes.

• dao: responsible for accessing permanent data, that is, loading and saving information.

Figure 10.3 shows the relations among these packages.

[Figure 10.3: Logical Architecture]
Chapter 11

Project Budget

Project costs for this system have been divided into two types, discussed in the following sections:

• Engineering costs.
• Investment and materials costs.

There is also a section summarizing the entire expected budget for the project. There are no commercial costs, since the tool is intended to be public license software for free distribution.

11.1 Engineering Costs

A computer engineer working in the environment on which this project is focused is expected to earn around 2500 €/month. There is an additional extra cost of 30% for Social Security payments. The programmer works 8 hours/day, a mean of 22 days/month, which makes a mean of 176 hours/month. Thus, the cost per hour is 18.46 €/h. The estimated time required for the development of the project is divided into the work packages explained at the beginning of this project:

• Bayesian Data Analysis: 168 hours.
• Regression Models: 160 hours.
• Symbolic Data: 64 hours.
• Requirements Specification: 40 hours.
• Architecture Study: 56 hours.
• Design: 80 hours.
• Programming: 416 hours.
• Testing: 40 hours.

The estimated time required for the project is therefore 1024 hours (5 months and 18 days), and the estimated engineering cost is 18903.04 €.
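For clarity, the hourly rate and the engineering cost above follow from:

\frac{2500\ \text{€/month} \times 1.30}{176\ \text{h/month}} = 18.46\ \text{€/h}, \qquad 1024\ \text{h} \times 18.46\ \text{€/h} = 18903.04\ \text{€}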
11.2 Investment and Elements Costs

The elements used for the development of this project have been computer and software equipment. These costs can be seen in Table 11.1.

Element                                                | Price
Pentium D925 at 3 GHz                                  | 630 €
Other expenses (Internet connection, office materials) | 60 €
Total                                                  | 690 €
Table 11.1: Estimated material costs

The amortization period for this type of equipment is considered complete after 10000 working hours. Moreover, the usage rate is considered to be about 85% of the engineering work hours, giving the results shown in Table 11.2.
Concept                           | Total
Hours of use of the material      | 870.4 hours
Resources cost/hour               | 0.19 €/h
Total amortization materials cost | 165.38 €
Table 11.2: Amortization Costs

Thus, the sum of the engineering and material costs is 19068.42 €. It can be assumed that the investment made is about 5% of the engineering cost, so the investment cost amounts to 945.15 €. Therefore, the total cost of the project, which is the sum of the engineering, materials and investment costs, is estimated to be 20013.57 €.

11.2.1 Summarized Budget

The overall expected budget can be observed in Table 11.3.
Cost        | Total
Engineering | 18903.04 €
Material    | 165.38 €
Investment  | 945.15 €
Total       | 20013.57 €
Table 11.3: Summarized Budget

Chapter 12

Conclusions

12.1 Bayesian Regression applied to Symbolic Data

Dealing with a current research topic such as symbolic data requires a high level of English, since it is the universal language of research. Moreover, a project like this, which depends on ongoing research, is harder to advance, since it does not deal with an established subject. A good research task requires rigorous documentation and a complete bibliography: there must be enough well-cited references to let the reader find more information about the points of interest to her or him. Bayesian methodology is called to become a fundamental element in business processes oriented towards predicting and forecasting new situations and quantities. Although I have really enjoyed this project, I suspect that, with a more complete previous training in Bayesian data analysis, I could have saved some of the initial time spent learning concepts that later turn out to be obvious. This would have let me extend the project to other fields, such as regression with hierarchical models or nonparametric Bayesian regression, where the authentic Bayesian potential resides. However, the more one knows about a subject, the more one likes it and the more one wants to learn about it, so the problem would never end. In this respect, the project has met and exceeded the initial personal expectations, arousing a great interest in the research field and teaching me to value this hard but exciting arena.
If I could change anything about the project planning, I would have tried to condense the study stage in order to spend more time applying the software tool to more real problems and situations. Nevertheless, this would have been difficult to carry out, since the project is developed within an academic year in which other activities also take place.

12.2 BARESIMDA Software Tool

Fortunately, public license software is growing enormously, which gives everybody more options to choose from. In that sense, R is a great tool for programming new models, but it requires, on the one hand, very high statistical knowledge, since the needs of people with a low-to-medium level of Statistics are already satisfied by the current statistical packages. On the other hand, it requires a medium programming level to be able to carry out one's ideas. Moreover, the way in which R handles data proves tedious for someone used to working with matrix representations. Interconnecting different interfaces or applications is usually a difficult task, especially when there is very little documentation about establishing the connection on both sides. This problem is very important and is not usually taken into consideration when integrating different environments. Concerning Java, the possibilities and facilities this programming language offers are really incredible, and they make the programming task much easier.

12.3 Future Developments

As can be deduced from what has been said above and in previous chapters, the project could have many different extensions. The most important ones are:

• Bayesian regression with hierarchical models for interval-valued data.
• Bayesian time series for interval-valued data.
• Bayesian linear regression for histogram-valued data.
• Nonparametric Bayesian regression for interval-valued data.
• Bayesian Vector Autoregression for interval-valued data.
• Bayesian regression for functional data.
• Bayesian symbolic regression with uniformly distributed errors.

Likewise, the software tool can be improved by adding some conventional statistical functions, in order to obtain public license statistical software with a user-friendly graphical interface.

12.4 Summary

On the one hand, we have built a new Bayesian regression model for interval-valued data that fits better than the other existing approaches, provided that the prior information is accurate. As has been shown, this works well both for directly related variables and for uncorrelated variables. This is an important advance in the symbolic data field since, to the best of our knowledge, there is no other Bayesian approach for this kind of data. On the other hand, a new software tool letting the user perform Bayesian symbolic regression has been developed. Again, to the best of our knowledge, there is no other package with the same user-friendly interface and the same facilities. Furthermore, it offers the possibility of performing standard and Bayesian regression with classical and symbolic data individually. As a result of this project, the author and the director are working together on a paper about the past, present and future of regression, which is intended to be submitted to ANALES. Likewise, another possible article about Bayesian symbolic regression is borne in mind for a more prominent journal, such as Computational Statistics and Data Analysis (CSDA).
Appendix A

Probability Distributions

A number of probability distributions, together with their density or probability mass functions, means and variances, have been used or mentioned previously. For ease of reference, their definitions are regrouped in this appendix, together with a short discussion of their key properties. More information about these distributions in a Bayesian context can be found in [Gelm04] or [Maté93].

A.1 Discrete Distributions

A.1.1 Binomial

The Binomial distribution is perhaps the most commonly encountered discrete distribution in Statistics, and it is used in quality control by attributes and in sampling techniques with replacement. Consider a sequence of n independent trials, each of which can result in one of just two possible outcomes, namely success and failure, and assume that the probability of success, p, is the same for each trial. Let Y denote the number of successes observed in the n trials; then Y has a Binomial distribution with parameters n and p. Properly, a discrete random variable Y has a Binomial distribution with parameters n and p, denoted Y \sim Bin(n, p), if its probability mass function is given by:

f(y|n, p) = \binom{n}{y} p^y (1 - p)^{n-y}   (A.1)

where n > 0, y = 0, 1, \ldots, n and 0 \leq p \leq 1. Likewise, the mean and variance are:
E(Y) = np   (A.2)
Var(Y) = np(1 - p)   (A.3)

A.1.2 Geometric

The Geometric distribution is related in a certain way to the previous one. Consider the same situation as before: a sequence of independent trials with a constant success probability p in each trial. In this case the number of trials varies until the first success is obtained; that is, the distribution models the number of trials until the first success, and it is common in reliability analysis. Formally, a discrete random variable Y has a Geometric distribution with parameter p, denoted Y \sim Geo(p), if its probability mass function is given by:

f(y|p) = (1 - p)^{y-1} p   (A.4)

where p \geq 0 and y = 1, 2, \ldots In the same way, the mean and variance are:

E(Y) = \frac{1}{p}   (A.5)
Var(Y) = \frac{1-p}{p^2}   (A.6)

A.1.3 Poisson

The Poisson distribution is commonly used to represent count data, such as the number of shares sold in a fixed time period. It is also usual to see it in reliability analysis. Strictly, a discrete random variable Y has a Poisson distribution with parameter \lambda, denoted Y \sim P(\lambda), if its probability mass function is given by:

f(y|\lambda) = \frac{\exp(-\lambda)\lambda^y}{y!}   (A.7)

where \lambda \geq 0 and y = 0, 1, 2, \ldots
In the same way, the mean and variance are:

E(Y) = \lambda   (A.8)
Var(Y) = \lambda   (A.9)

A.2 Continuous Distributions

A.2.1 Uniform

The Uniform distribution is used to represent a variable that is known to lie in an interval and is equally likely to be found anywhere in the interval. Its main characteristic is that if a variable X has probability distribution F(x), then the variable Y = F(X) is uniform on the interval [0, 1]. Properly, a continuous random variable Y has a Uniform distribution over the interval [a, b], denoted Y \sim U(a, b), if its probability density function is given by:

f(y|a, b) = \begin{cases} \frac{1}{b-a} & a \leq y \leq b \\ 0 & \text{otherwise} \end{cases}   (A.10)

where -\infty < a < b < \infty. The mean and variance are likewise specified by:

E(Y) = \frac{a+b}{2}   (A.11)
Var(Y) = \frac{(b-a)^2}{12}   (A.12)

A.2.2 Univariate Normal

The Normal distribution, also called the Gaussian distribution, is ubiquitous in statistical work. It is a family of distributions of the same general form, differing in their location and scale parameters: the mean and standard deviation, respectively. The standard Normal distribution is the Normal distribution with a mean of zero and a variance of one. Formally, a continuous random variable Y has a Normal distribution with mean \mu and variance \sigma^2, denoted Y \sim N(\mu, \sigma^2), if its probability density function is given by:

f(y|\mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y-\mu)^2}{2\sigma^2}\right)   (A.13)
where \sigma^2 \geq 0, -\infty < \mu < \infty and y \in \mathbb{R}. Likewise, the mean and variance are:

E(Y) = \mu   (A.14)
Var(Y) = \sigma^2   (A.15)

A.2.3 Exponential

This distribution is used to model the time t between independent events that happen at a constant rate \lambda. Therefore, this is the distribution of waiting times for the next event in a Poisson process, and it is a special case of the Gamma distribution with \alpha = 1. Formally, a continuous random variable Y has an Exponential distribution with parameter \lambda, denoted Y \sim Exp(\lambda), if its probability density function is given by:

f(y|\lambda) = \lambda \exp(-\lambda y)   (A.16)

where \lambda \geq 0 and y \geq 0. Similarly, the mean and variance are:

E(Y) = \frac{1}{\lambda}   (A.17)
Var(Y) = \frac{1}{\lambda^2}   (A.18)

A.2.4 Gamma

The Gamma distribution is a general type of statistical distribution that is related to the Beta distribution and arises naturally in processes for which the waiting times between Poisson distributed events are relevant. In a Bayesian context, the Gamma distribution is the conjugate prior distribution for the inverse of the Normal variance and for the mean parameter of the Poisson distribution.
A.2.4 Gamma

The Gamma distribution is a general type of statistical distribution that is related to the Beta distribution and arises naturally in processes for which the waiting times between Poisson-distributed events are relevant. In a Bayesian context, the Gamma distribution is the conjugate prior distribution for the inverse of the normal variance and for the mean parameter of the Poisson distribution.

In a formal way, a continuous random variable Y has a Gamma distribution with shape and scale parameters α and β, respectively, denoted Y ∼ Gamma(α, β), if its probability density function is given by:

f(y \mid \alpha, \beta) = \frac{y^{\alpha-1} \exp(-y/\beta)}{\beta^{\alpha}\,\Gamma(\alpha)}    (A.19)

where α > 0, β > 0 and y > 0. Similarly, the mean and variance are identified by:

E(Y) = \alpha\beta    (A.20)

Var(Y) = \alpha\beta^{2}    (A.21)

A.2.5 Inverse-Gamma

If Y⁻¹ has a Gamma distribution with parameters α and β, then Y has the Inverse-Gamma distribution. In a Bayesian context, this distribution is the conjugate prior distribution for the normal variance. Formally, a continuous random variable Y has an Inverse-Gamma distribution with shape and scale parameters α and β, respectively, denoted Y ∼ Inv-Gamma(α, β), if its probability density function is given by:

f(y \mid \alpha, \beta) = \frac{\beta^{\alpha}\, y^{-\alpha-1} \exp(-\beta/y)}{\Gamma(\alpha)}    (A.22)

where α > 0, β > 0 and y > 0. Similarly, the mean and variance are identified by:

E(Y) = \frac{\beta}{\alpha-1}, \quad \alpha > 1    (A.23)

Var(Y) = \frac{\beta^{2}}{(\alpha-1)^{2}(\alpha-2)}, \quad \alpha > 2    (A.24)
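As a quick numerical check of this scale parameterization (an illustrative sketch only), R's Gamma functions accept either a rate or a scale argument:

alpha <- 3; beta <- 2
y <- rgamma(100000, shape = alpha, scale = beta)
c(mean(y), var(y))                        # close to alpha*beta and alpha*beta^2, (A.20)-(A.21)
dgamma(1.7, shape = alpha, scale = beta)  # density (A.19) at y = 1.7
# Draws from the Inverse-Gamma of A.2.5 can be obtained as 1 / rgamma(...).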
A.2.6 Chi-square

It is an essential distribution in statistical inference and in goodness-of-fit tests. The χ²_v distribution is a special case of the Gamma distribution, with shape parameter α = v/2 and scale parameter β = 2. Since it is a special case, we need not define the density function, mean and variance again, as they can be deduced easily from the Gamma distribution.

A.2.7 Inverse-Chi-square and Scaled Inverse-Chi-square

As the χ²_v distribution is a special case of the Gamma distribution, the inverse-χ²_v distribution is a special case of the Inverse-Gamma distribution, with shape parameter α = v/2 and scale parameter β = 1/2. So, for the density function, mean and variance, see the Inverse-Gamma distribution.

We also define the scaled inverse-χ² distribution, which is useful for variance parameters in normal models. A continuous random variable Y has a scaled inverse-χ² distribution with v degrees of freedom and scale s, denoted Y ∼ Scaled-Inv-χ²(v, s²), if its probability density function is given by:

f(y \mid v, s) = \frac{(v/2)^{v/2}}{\Gamma(v/2)}\, s^{v}\, y^{-(v/2+1)} \exp\left(-\frac{v s^{2}}{2y}\right)    (A.25)

The mean and variance are defined by:

E(Y) = \frac{v}{v-2}\, s^{2}, \quad v > 2    (A.26)

Var(Y) = \frac{2v^{2}}{(v-2)^{2}(v-4)}\, s^{4}, \quad v > 4    (A.27)

Note that this is the same as Inv-Gamma(α = v/2, β = (v/2) s²).
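A scaled inverse-χ² variate is easy to simulate from a χ²_v draw, since v s²/X has this distribution when X ∼ χ²_v; the following R sketch (illustrative only) checks the mean in (A.26):

v <- 10; s2 <- 4                       # degrees of freedom and scale s^2
y <- v * s2 / rchisq(100000, df = v)   # standard construction of Scaled-Inv-chi^2 draws
c(mean(y), v * s2 / (v - 2))           # simulated mean vs. the exact value of (A.26)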
A.2.8 Univariate Student-t

The Student's t-distribution is a probability distribution that arises in the problem of estimating the mean of a normally distributed population when the sample size is small. In regression analysis, it is used to represent the posterior predictive distribution in Normal regression. As an anecdote, it is worth mentioning that this distribution was published by William Gosset in 1908, but he was not allowed to bring it out under his own name, so the paper was written under the pseudonym Student. Strictly, a continuous random variable Y has a Student's t-distribution with v degrees of freedom, denoted Y ∼ t(v), if its probability density function is given by:

f(y \mid v) = \frac{\Gamma\left(\frac{v+1}{2}\right)}{\sqrt{v\pi}\,\Gamma\left(\frac{v}{2}\right)} \left(1 + \frac{y^{2}}{v}\right)^{-\frac{v+1}{2}}    (A.28)

where v > 0 and y ∈ ℝ. In the same way, the mean and variance are identified by:

E(Y) = 0, \quad v > 1    (A.29)

Var(Y) = \frac{v}{v-2}, \quad v > 2    (A.30)

A.2.9 Beta

In probability theory and statistics, the Beta distribution is a family of continuous distributions defined on the interval [0, 1], differing in the values of their two positive shape parameters, α and β. In a Bayesian context, the Beta is the conjugate prior distribution for the binomial probability. A continuous random variable Y has a Beta distribution with parameters α and β, denoted Y ∼ Beta(α, β), if its probability density function is given by:

f(y \mid \alpha, \beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, y^{\alpha-1} (1-y)^{\beta-1}    (A.31)

where α > 0 and β > 0. The mean and variance are identified by:

E(Y) = \frac{\alpha}{\alpha+\beta}    (A.32)

Var(Y) = \frac{\alpha\beta}{(\alpha+\beta)^{2}(\alpha+\beta+1)}    (A.33)
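Because of this conjugacy, the posterior for a binomial probability has closed form: with a Beta(α, β) prior and y successes in n trials, the posterior is Beta(α + y, β + n − y). A minimal illustrative R sketch, with hypothetical numbers:

alpha <- 2; beta <- 2                      # hypothetical prior
n <- 20; y <- 14                           # hypothetical data: 14 successes in 20 trials
post_a <- alpha + y; post_b <- beta + n - y
post_a / (post_a + post_b)                 # posterior mean of the binomial probability
qbeta(c(0.025, 0.975), post_a, post_b)     # 95% equal-tail posterior interval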
A.2.10 Multivariate Normal

The multivariate Normal distribution extends the univariate Normal distribution to vector observations. A p-dimensional vector of continuous random variables, Y = (Y₁, Y₂, ..., Y_p), is said to have a multivariate Normal distribution with mean vector µ and variance-covariance matrix Σ if its probability density function is given by:

f(y \mid \mu, \Sigma) = (2\pi)^{-p/2}\, |\Sigma|^{-1/2} \exp\left[-\frac{1}{2}(y-\mu)'\,\Sigma^{-1}(y-\mu)\right]    (A.34)

Likewise, the mean and variance are formulated by:

E(Y) = \mu    (A.35)

Var(Y) = \Sigma    (A.36)

A.2.11 Multivariate Student-t

It is the multivariate generalization of the Student's t-distribution. Rigorously, a continuous random vector Y has a multivariate Student's t-distribution with v degrees of freedom, location µ = (µ₁, ..., µ_d) and symmetric, positive definite d × d scale matrix Σ, denoted Y ∼ t(v, µ, Σ), if its probability density function is given by:

f(y \mid v, \mu, \Sigma) = \frac{\Gamma\left(\frac{v+d}{2}\right)}{\Gamma\left(\frac{v}{2}\right) v^{d/2} \pi^{d/2}}\, |\Sigma|^{-1/2} \left(1 + \frac{1}{v}(y-\mu)'\,\Sigma^{-1}(y-\mu)\right)^{-\frac{v+d}{2}}    (A.37)

In the same way, the mean and variance are defined by:

E(Y) = \mu, \quad v > 1    (A.38)

Var(Y) = \frac{v}{v-2}\,\Sigma, \quad v > 2    (A.39)

A.2.12 Wishart

The Wishart is the conjugate prior distribution for the inverse covariance matrix in a multivariate Normal distribution. It is a multivariate generalization of the Gamma distribution. The integral is finite if the degrees-of-freedom parameter, v, is greater than or equal to the dimension, k. Formally, a continuous random matrix W has a Wishart distribution with v degrees of freedom and symmetric, positive definite k × k scale matrix S, denoted W ∼ Wishart_v(S), if its probability density function is given by (for W positive definite):

f(W \mid v, S) = \left(2^{vk/2}\, \pi^{k(k-1)/4} \prod_{i=1}^{k} \Gamma\left(\frac{v+1-i}{2}\right)\right)^{-1} |S|^{-v/2}\, |W|^{(v-k-1)/2} \exp\left[-\frac{1}{2}\,\mathrm{tr}(S^{-1}W)\right]    (A.40)

Similarly, the mean is defined by:

E(W) = vS    (A.41)
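The mean in (A.41) can be checked by simulation; note that rWishart() ships with the stats package of modern R releases (newer than the R 2.4.1 assumed elsewhere in this document), so this is an illustrative sketch only:

S <- diag(2); v <- 7
W <- rWishart(10000, df = v, Sigma = S)   # a 2 x 2 x 10000 array of Wishart draws
apply(W, c(1, 2), mean)                   # elementwise mean, close to v * S, cf. (A.41)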
A.2.13 Inverse-Wishart

If W⁻¹ ∼ Wishart_v(S), then W has the inverse-Wishart distribution. This is the conjugate prior distribution for the multivariate Normal covariance matrix. Formally, a continuous random matrix W has an inverse-Wishart distribution with v degrees of freedom and symmetric, positive definite k × k scale matrix S, denoted W ∼ Inv-Wishart_v(S⁻¹), if its probability density function is given by (for W positive definite):

f(W \mid v, S) = \left(2^{vk/2}\, \pi^{k(k-1)/4} \prod_{i=1}^{k} \Gamma\left(\frac{v+1-i}{2}\right)\right)^{-1} |S|^{v/2}\, |W|^{-(v+k+1)/2} \exp\left[-\frac{1}{2}\,\mathrm{tr}(S W^{-1})\right]    (A.42)

Similarly, the mean is defined by:

E(W) = (v-k-1)^{-1} S, \quad v > k+1    (A.43)
Appendix B

Installation Guide

B.1 From the source folder

The source folder contains the following files and folders:

• BARESIMDA.jar: the executable application file. Java Runtime Environment 1.4.2 or later, R 2.4.1 or later and the libraries provided in the folder must be installed.

• R Libraries: the libraries to be moved into the R software library folder, %R_HOME%\library.

• Java Library: it contains the file to be moved into %JAVA_HOME%\lib\ext.

%R_HOME% and %JAVA_HOME% refer to the paths in which R and Java are installed, respectively. For instance, in Windows, if you have installed them into the root directory C:\, you should have C:\R\R-2.4.1\library and C:\Java\lib\ext. (A scripted alternative for this manual copy is sketched at the end of this appendix.)

B.2 From the installer

An installer will be provided to make the installation process much easier. No previously installed programs are necessary, since the installer will install the Java Runtime Environment and R. As a result of executing this installer, a new folder and a shortcut icon will be created.
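As an illustrative sketch of the manual copy described in B.1, the step can be scripted from R itself. The folder names "R Libraries" and "Java Library" and the target paths are assumptions based on the default locations mentioned above, and file.copy(recursive = TRUE) needs a reasonably recent R version:

# Run from the source folder; paths assume the default Windows locations above.
file.copy(list.files("R Libraries", full.names = TRUE),
          "C:/R/R-2.4.1/library", recursive = TRUE)
file.copy(list.files("Java Library", full.names = TRUE),
          "C:/Java/lib/ext")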
Appendix C

User's Guide

C.1 Data Entry

C.1.1 Loading an Excel file

1. Select the File menu item in the menu bar.

Figure C.1: Load Data Menu

2. Put the mouse over the Load element and click on it.

3. The dialog box shown in Figure C.2 will be displayed. Click on the Search button to select the Excel file to load and indicate the sheet number in the field with that label. If the first row in the data sheet is a header with the variable names, click OK to load the data. Otherwise, deselect the variable names option and then click OK.

4. The data will then be displayed in the Data window as in an Excel sheet (see Figure C.3).

C.1.2 Defining a new variable

1. Ensure that the Data window is the active window.
Figure C.2: Select File Dialog

Figure C.3: Display Loaded Data

2. Define the new variable by clicking on the New Variable button (see Figure C.4).

3. You will be required to type in the name of the new variable. Type it in and click OK (see Figure C.5).

4. A new column will be added to the spreadsheet with the new variable as header (see Figure C.6).

5. If you want to define several new variables, repeat from step 2 as necessary.
Figure C.4: Define New Variable

Figure C.5: Enter New Variable Name

Figure C.6: Display New Variable
C.1.3 Editing an existing variable

1. Ensure that the Data window is the active window.

2. Click on the Edit Variable button (see Figure C.7).

Figure C.7: Edit Variable

3. A dialog will be displayed. Select the variable to edit and continue (see Figure C.8).

Figure C.8: Select Variable to Be Edited

4. A new dialog will be shown and you will be required to type in the new name of the variable. Type it in, and the variable will be stored with the new name (see Figure C.9).
Figure C.9: Enter New Name

C.1.4 Deleting an existing variable

1. Ensure that the Data window is the active window.

2. Click on the Delete Variable button and a dialog will be displayed.

3. Select the variable to delete and continue. A confirmation dialog will be shown. Confirm that it is the variable to be deleted, and the variable and its data will be removed from the application (see Figure C.10).

Figure C.10: Confirmation

C.1.5 Typing in a new data row

1. Ensure that the Data window is the active window.

2. Click on the New Row button. If any variables have been defined previously, a row will be added to the spreadsheet with as many columns as there are defined variables (see Figure C.11).

3. Double-click on the cell to edit and enter the new value. When you finish, press Enter (see Figure C.12).

4. Repeat steps 2 and 3 as necessary.
Figure C.11: New Row Data

Figure C.12: Type Data

C.1.6 Deleting an existing data row

1. Ensure that the Data window is the active window.

2. Select the data row or rows to be deleted. Then click on the Delete Row button. A confirmation dialog will be displayed.

3. Confirm, and all data in those rows will be removed.
C.1.7 Modifying existing data

1. Ensure that the Data window is the active window.

2. Select the data cell to be modified and double-click on it. You will be able to edit the cell value. When you finish, press Enter.

C.2 Configuration

C.2.1 Setting the Look & Feel

1. Select the Look&Feel item in the Configuration element of the menu bar (see Figure C.13).

Figure C.13: Look And Feel Menu

2. Select the Look&Feel style you want. The available options are: Metal (Java style), CDE/Motif (Unix/Linux style), Windows and Windows Classic (see Figure C.14).

Figure C.14: Look And Feel Styles

3. When you have selected your option (for instance, CDE/Motif), the application appearance will be modified (see Figure C.15).

C.2.2 Selecting the type of user

1. Select the Type Of User item in the Configuration element of the menu bar (see Figure C.16).

2. A dialog will be displayed. Select the type of user you are and accept (see Figure C.17). This will be useful for defining prior information in Bayesian regression.
Figure C.15: New Look And Feel

Figure C.16: Type Of User Menu

Figure C.17: Select Type Of User
C.3 Non-Symbolic Regression

C.3.1 Simple Classical Regression

1. Select the Classical menu in the Non-Symbolic Regression element of the menu bar. Then select Simple Regression (see Figure C.18).

Figure C.18: Non-Symbolic Classical Regression Menu

2. You will be required to select the independent and dependent variables from the defined variables. Select them and continue (see Figure C.19).

Figure C.19: Select Non-Symbolic Variables in Simple Regression

3. A brief report will be displayed in the Classical Simple Regression window, indicating that for more details you should see the Analysis Options in the ToolBar (see Figure C.20).

4. From this point, you can:

(a) Change the dependent and independent variables in the Variables Options, by selecting them again as before.
Figure C.20: Brief Report

(b) Select tests and analyses in the Analysis Options, by clicking on the desired analysis options. The available analysis options are shown in Figure C.21.

Figure C.21: Analysis Options in Non-Symbolic Classical Simple Regression

To make new predictions, select the predict option, introduce the newly observed value and press OK (see Figure C.22).

(c) Select graphics in the Graphics Options, by clicking on the desired graphics options. The available graphics options are shown in Figure C.23.

(d) Save some results in the Save Options, by clicking on the desired save options and selecting the file where they are going to be saved. The available save options are shown in Figure C.24.
Figure C.22: New Prediction in Non-Symbolic Classical Simple Regression

Figure C.23: Graphics Options in Non-Symbolic Classical Simple Regression

C.3.2 Multiple Classical Regression

1. Select the Classical menu in the Non-Symbolic Regression element of the menu bar. Then select Multiple Regression (see Figure C.25).

2. You will be required to select the dependent and independent variables from the defined variables. Select them and continue (see Figure C.26).
Figure C.24: Save Options in Non-Symbolic Classical Simple Regression

Figure C.25: Non-Symbolic Classical Multiple Regression Menu

Figure C.26: Select Variables in Non-Symbolic Classical Multiple Regression

3. From this point a new Multiple Classical Regression window is created, and the procedure is similar to that described for Simple Classical Regression. The reader is therefore referred to that section to see how to select variable, analysis, graphics and save options.

(a) The available Analysis Options can be seen in Figure C.27. There are two new analysis options: backward and forward selection.
Figure C.27: Analysis Options in Non-Symbolic Classical Multiple Regression

These will let you know which independent variables really influence the dependent variable.

(b) The available Graphics Options are shown in Figure C.28.

Figure C.28: Graphics Options in Non-Symbolic Classical Multiple Regression

(c) The available Save Options can be seen in Figure C.29.

4. You will be able to select whether there is an intercept in the model or not by clicking on the Model option (see Figure C.30).
Figure C.29: Save Options in Non-Symbolic Classical Multiple Regression

Figure C.30: Intercept in Non-Symbolic Classical Multiple Regression

C.3.3 Simple Bayesian Regression

1. Select the Bayesian menu in the Non-Symbolic Regression element of the menu bar. Then select Simple Regression (see Figure C.31).

Figure C.31: Non-Symbolic Bayesian Simple Regression Menu

2. You will be required to select the dependent and independent variables from the defined variables, as was done in Simple Classical Regression. Select them and continue (see Figure C.32).

3. A new Bayesian Simple Regression window will be created. The estimated mean and standard deviation of the parameters will be displayed, as well as the 95% highest posterior density interval and the standard numerical error.

4. You will be able to select variable, analysis, graphics and save options as was done in Simple Classical Regression, although for Bayesian regression these options are more limited. However, the procedure is the same.
Figure C.32: Select Variables in Non-Symbolic Bayesian Simple Regression

(a) The available Analysis Options are shown in Figure C.33.

Figure C.33: Analysis Options in Non-Symbolic Bayesian Simple Regression

(b) The available Graphics Options can be seen in Figure C.34.

(c) The available Save Options are shown in Figure C.35.

5. In Bayesian regression, new options are available in the ToolBar:

(a) Specifying Prior Information, by clicking on the Prior Information item in the ToolBar. A new input dialog will be displayed, where you will be able to specify prior information. If you have selected Experienced User in the Type Of User option of the Configuration menu, you will see a dialog like that shown in Figure C.38.
Figure C.34: Graphics Options in Non-Symbolic Bayesian Simple Regression

Figure C.35: Save Options in Non-Symbolic Bayesian Simple Regression

Figure C.36: Prior Experienced Specification Options in Non-Symbolic Bayesian Simple Regression
Otherwise, you will see the dialog shown in Figure C.37.

Figure C.37: Prior Inexperienced Specification in Non-Symbolic Bayesian Simple Regression

(b) Selecting the Bayesian regression model, by clicking on the Model option in the ToolBar.

Figure C.38: Prior Experienced Specification in Non-Symbolic Bayesian Simple Regression

C.3.4 Multiple Bayesian Regression

1. Select the Bayesian menu in the Non-Symbolic Regression element of the menu bar. Then select Multiple Regression (see Figure C.39).

Figure C.39: Non-Symbolic Bayesian Multiple Regression Menu

2. You will be required to select the dependent and independent variables from the defined variables, as was done in Multiple Classical Regression. Select them and continue.

3. A new Bayesian Multiple Regression window will be created. From this point the procedure is the same as in Bayesian Simple Regression.

(a) The Analysis Options are shown in Figure C.40.
Figure C.40: Analysis Options in Non-Symbolic Bayesian Multiple Regression

(b) The Graphics Options can be seen in Figure C.41.

Figure C.41: Graphics Options in Non-Symbolic Bayesian Multiple Regression

(c) The Save Options are shown in Figure C.42.

Figure C.42: Save Options in Non-Symbolic Bayesian Multiple Regression

(d) The Model Options are those shown in Figure C.43.

C.4 Symbolic Regression

C.4.1 Simple Classical Regression

1. Select the Classical menu in the Symbolic Regression element of the menu bar. Then select Simple Regression (see Figure C.44).
Figure C.43: Model Options in Non-Symbolic Bayesian Multiple Regression

Figure C.44: Symbolic Classical Simple Regression Menu

2. You will be required to select the minimum and maximum dependent and minimum and maximum independent variables from the defined variables. Select them and continue (see Figure C.45).

Figure C.45: Select Variables in Symbolic Classical Simple Regression

3. A brief report will be displayed for the midpoints and radii analyses. This is very similar to the Non-Symbolic Regression case, but now you will have one analysis for the midpoints and another one for the radii. In this case, there are more graphics options.

(a) The Analysis Options are shown in Figure C.46.

(b) The Graphics Options can be seen in Figure C.47.
Figure C.46: Analysis Options in Symbolic Classical Simple Regression

Figure C.47: Graphics Options in Symbolic Classical Simple Regression

(c) The Save Options are the same as in Non-Symbolic Regression.
C.4.2 Multiple Classical Regression

1. Select the Classical menu in the Symbolic Regression element of the menu bar. Then select Multiple Regression (see Figure C.48).

Figure C.48: Symbolic Classical Multiple Regression Menu

2. You will be required to select the minimum and maximum dependent and minimum and maximum independent variables from the defined variables (see Figure C.49). Ensure that the first maximum independent variable selected is the one that corresponds to the first minimum independent variable chosen.

Figure C.49: Select Variables in Symbolic Classical Multiple Regression

3. A brief report will be displayed for the midpoints and radii analyses. This is very similar to the Non-Symbolic Regression case, but now you will have one analysis for the midpoints and another one for the radii. In this case, there are more graphics options.

(a) The Analysis Options are shown in Figure C.50.

(b) The Graphics Options can be seen in Figure C.51.

(c) The Save Options are the same as in Non-Symbolic Regression.
Figure C.50: Analysis Options in Symbolic Classical Multiple Regression

Figure C.51: Graphics Options in Symbolic Classical Multiple Regression

C.4.3 Simple Bayesian Regression

1. Select the Bayesian menu in the Symbolic Regression element of the menu bar. Then select Simple Regression (see Figure C.52).

2. You will be required to select the minimum and maximum dependent and minimum and maximum independent variables from the defined variables (see Figure C.53).
Figure C.52: Symbolic Bayesian Simple Regression

Figure C.53: Select Variables in Symbolic Bayesian Simple Regression

3. A new Bayesian Simple Regression window will be created. The estimated mean and standard deviation of the midpoints and radii parameters will be displayed, as well as the 95% highest posterior density interval and the standard numerical error.

4. You will be able to select variable, analysis, graphics and save options as was done in Non-Symbolic Regression.

(a) The available Analysis Options are shown in Figure C.54.

Figure C.54: Analysis Options in Symbolic Bayesian Simple Regression

(b) The available Graphics Options can be seen in Figure C.55.
Figure C.55: Graphics Options in Symbolic Bayesian Simple Regression

(c) The Save Options are the same as in Non-Symbolic Regression.

5. As occurred in Non-Symbolic Regression, in Bayesian analysis new options are available in the ToolBar:

(a) Specifying Midpoints and Radii Prior Information, by clicking on the Midpoints or Radii Prior Information item in the ToolBar. A new input dialog will be displayed, where you will be able to specify prior information.

(b) Selecting the Bayesian regression model, by clicking on the Model option in the ToolBar (see Figure C.56).

C.4.4 Multiple Bayesian Regression

1. Select the Bayesian menu in the Symbolic Regression element of the menu bar. Then select Multiple Regression (see Figure C.57).

2. You will be required to select the minimum and maximum dependent and minimum and maximum independent variables from the defined variables (see Figure C.58).
Figure C.56: Model Options in Symbolic Bayesian Simple Regression

Figure C.57: Symbolic Bayesian Multiple Regression Menu

Ensure that the first maximum independent variable selected is the one that corresponds to the first minimum independent variable chosen.

Figure C.58: Select Variables in Symbolic Bayesian Multiple Regression

3. A new Bayesian Multiple Regression window will be created. The estimated mean and standard deviation of the midpoints and radii parameters will be displayed, as well as the 95% highest posterior density interval and the standard numerical error.

4. You will be able to select variable, analysis, graphics and save options as was done in Non-Symbolic Regression.

(a) The Analysis Options are the same as in Non-Symbolic Regression.
(b) The Graphics Options are shown in Figure C.59.

Figure C.59: Graphics Options in Symbolic Bayesian Multiple Regression

(c) The Save Options are the same as in Non-Symbolic Regression.

5. As occurred in Non-Symbolic Regression, in Bayesian analysis new options are available in the ToolBar:

(a) Specifying Midpoints and Radii Prior Information, by clicking on the Midpoints or Radii Prior Information item in the ToolBar. A new input dialog will be displayed, where you will be able to specify prior information.

(b) Selecting the Bayesian regression model, by clicking on the Model option in the ToolBar.
Appendix D

Obtaining and Installing R

The way to obtain R is to download it from one of the CRAN (Comprehensive R Archive Network) sites. The main site is http://cran.r-project.org. It has a number of mirror sites worldwide, which may be closer to you and give faster download times. Installation details tend to vary over time, so you should read the accompanying documents and any other information offered on CRAN.

D.1 Binary distributions

The version for recent variants of Microsoft Windows comes as a single SetupR.exe file, on which you simply double-click with the mouse and then follow the on-screen instructions. When the process is completed, you will have an entry under Programs on the Start menu for invoking R, as well as a desktop icon.

For Linux distributions that use the RPM package format (RedHat, Mandrake, LinuxPPC and SuSE) and also for Alpha Unix (OSF/Tru64), .rpm files of R and the recommended add-on packages can be installed using the rpm command. Packages for the Debian APT package manager are also available.

For the Macintosh platforms there are two different binary distributions: the "Carbon" R and the "Darwin" R. The first version is intended to run natively on MacOS systems from 8.6 to OS X, and the second one as a usual Unix command under OS X. The Darwin R also requires an X window manager like XDarwin to use the X11 graphics device.
Carbon R comes in a single .sit archive file that you simply decompress by dragging the file onto StuffIt Expander, and you then move the resulting folder rmxyz into your favourite applications folder. The Darwin version is a .tgz archive, which can be installed, after decompression, with some (fairly trivial) manual adjustments. Darwin R can also be installed using "fink". Fink installs all dynamic libraries that might be needed, and it can update R to newer versions when available.

D.2 Installation from source

Installation from source code is possible on all supported platforms, although it is nontrivial on Macintosh and Windows, mainly because the build environment is not part of the system. On Unix-like systems (Macintosh OS X included), the process can be as simple as unpacking the sources and writing

./configure
make
make install

and then you would unpack the recommended package bundle, change to its directory and enter

R CMD INSTALL *.tar.gz

The above works on widely used platforms, provided that the relevant compilers and support libraries are installed. If your system is more esoteric or you want to use special compilers or libraries, then you may need to dig deeper.

For Windows and Carbon Macintosh, the directories src/gnuwin32 and src/macintosh have an INSTALL file with detailed information about the procedure to follow.
D.3 Package installation

To install R packages such as bayesm under Unix/Linux or Windows, you can connect to the Internet, start R, and enter

install.packages("bayesm", .libPaths()[1])

The Windows version provides a convenient menu interface for the operation.

If your R machine is not connected to the Internet, you can also download the package as a file and install that. For Windows and the Carbon version of Macintosh, you need to get the binary package (.zip or .sit extension). For Windows, installation from a local .zip file is possible via a menu entry. For Macintosh users, the procedure is described in the Macintosh FAQ. For Unix and Linux, you can issue the following at the shell prompt (the -l option allows you to give a private library):

R CMD INSTALL bayesm

On Unix and Linux systems you will need superuser permissions to install. Otherwise, you can set up a private library directory and install into that; a short sketch of this variant is given at the end of this appendix. Use the R_LIBS environment variable to use your private library subsequently. A similar issue arises if R is installed on a read-only file system in a Windows environment. Further details can be found in the help page for library.

Information and further Internet resources for R can be obtained from CRAN and the R homepage at http://www.r-project.org. Notice in particular the mailing lists, the user-contributed documents, and the FAQs.
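As an illustrative sketch of the private-library route just described (the directory name ~/Rlibs is an assumption, and the exact bayesm file name depends on the version you download), the whole procedure can also be done from within R:

# Assuming the directory ~/Rlibs already exists (its name is hypothetical):
install.packages("bayesm", lib = "~/Rlibs")   # install into the private library
.libPaths(c("~/Rlibs", .libPaths()))          # make it visible in this session
library(bayesm)                               # load the package from there
# For future sessions, set the R_LIBS environment variable to ~/Rlibs instead.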
Appendix E

Obtaining and Installing the Java Runtime Environment

The way to obtain the Java Runtime Environment (JRE) is to download it from the Sun Microsystems official site. The main site is http://java.sun.com, from where you can select the version to be downloaded. The link to download the current version, which is the J2SE v1.4.2_14 JRE, is http://java.sun.com/j2se/1.4.2/download.html.

E.1 Microsoft Windows

You must have administrative permissions in order to install the Java 2 Runtime Environment on Microsoft Windows 2000 and XP. The download page provides the following two choices of installation. Continue based on your choice.

1. Windows Installation - After clicking the "Download" link for the JRE, a dialog box pops up. Choose the open option to start a small program which then prompts you for more information about what you want to install.

2. Windows Offline Installation - After clicking the JRE "Download" link for the "Windows Offline Installation", a dialog box pops up. Choose the save option to save the downloaded file without installing it. Run this file by double-clicking on the installer's icon. Then follow the instructions the installer provides. When done with the installation, you can delete the downloaded file to recover disk space.
E.2 Linux

The Java 2 Runtime Environment 1.4.2 is available in two installation formats:

1. Self-extracting Binary File - This file can be used to install the Java 2 Runtime Environment in a location chosen by the user. It can be installed by anyone (not only root users), and it can easily be installed in any location. As long as you are not the root user, it cannot displace the system version of the Java platform supplied by Linux. To use this file, see Installation of Self-Extracting Binary below.

2. RPM Packages - An rpm.bin file which contains RPM packages, installed with the rpm utility. It requires root access to install, and it installs by default in a location that replaces the system version of the Java platform supplied by Linux. To use this bundle, see Installation of RPM File below.

Choose the installation format that is most suitable to your needs.

E.2.1 Installation of Self-Extracting Binary

Use these instructions if you want to use the self-extracting binary file to install the Java 2 Runtime Environment. If you want to install RPM packages instead, see Installation of RPM File.

1. Download and check the download file size to ensure that you have downloaded the full, uncorrupted software bundle. You can download to any directory you choose; it does not have to be the directory where you want to install the Java 2 Runtime Environment. Before you download the file, notice its byte size provided on the download page on the web site. Once the download has completed, compare that file size to the size of the downloaded file to make sure they are equal.

2. Make sure that execute permissions are set on the self-extracting binary. Run this command: chmod +x j2re-1_4_2_14-linux-i586.bin

3. Change directory to the location where you would like the files to be installed. The next step installs the Java 2 Runtime Environment into the current directory.

4. Run the self-extracting binary. Execute the downloaded file, prepended by the path to it. For example, if the file is in the current directory, prepend it with "./" (necessary if "." is not in the PATH environment variable):
./j2re-1_4_2_14-linux-i586.bin

The binary code license is displayed, and you are prompted to agree to its terms. The Java 2 Runtime Environment files are installed in a directory called j2re1.4.2_14 in the current directory.

E.2.2 Installation of RPM File

Use these instructions if you want to install the Java 2 Runtime Environment in the form of RPM packages. If you want to use the self-extracting binary file instead, see Installation of Self-Extracting Binary.

1. Download and check the file size. You can download to any directory you choose. Before you download the file, notice its byte size provided on the download page on the web site. Once the download has completed, compare that file size to the size of the downloaded file to make sure they are equal.

2. Extract the contents of the downloaded file. Change directory to where the downloaded file is located and run these commands to first set the executable permissions and then run the binary to extract the RPM file:

chmod a+x j2re-1_4_2_14-linux-i586-rpm.bin
./j2re-1_4_2_14-linux-i586-rpm.bin

Note that the initial "./" is required if you do not have "." in your PATH environment variable. The script displays a binary license agreement, which you are asked to agree to before installation can proceed. Once you have agreed to the license, the install script creates the file j2re-1_4_2_14-linux-i586.rpm in the current directory. Then install the packages as follows:

1. Become root by running the su command and entering the super-user password.
2. Run the rpm command to install the packages that comprise the Java 2 Runtime Environment:

rpm -iv j2re-1_4_2_14-linux-i586.rpm

3. Delete the .bin and .rpm files if you want to save disk space.

4. Exit the root shell.

E.3 UNIX

1. Check the download file size. You can download to any directory you choose; it does not have to be the directory where you want to install the J2RE. Before you download the file, notice its byte size provided on the download page on the web site. Once the download has completed, compare that file size to the size of the downloaded file to make sure they are equal.

2. Make sure that execute permissions are set on the self-extracting binary:

On SPARC processors: chmod +x j2re-1_4_2_14-solaris-sparc.sh
On x86 processors: chmod +x j2re-1_4_2_14-solaris-i586.sh

3. Change directory to the location where you would like the files to be installed. The next step installs the J2RE into the current directory.

4. Run the self-extracting binary. Execute the downloaded file, prepending the path to it. For example, if the downloaded file is in the current directory, prepend it with "./":

On SPARC processors: ./j2re-1_4_2_14-solaris-sparc.sh
On x86 processors: ./j2re-1_4_2_14-solaris-i586.sh

The binary code license is displayed, and you are prompted to agree to its terms. The J2RE files are installed in a directory called j2re1.4.2_14 in the current directory.
More information about the installation process on different kinds of operating systems can be found on the Sun Microsystems official site mentioned above.