Delivery of the project by the student is authorized:

              Rubén Salgado Fernández




        THE PROJECT DIRECTOR

              Carlos Maté Jiménez




Signed:                          Date: 12/06/2007




APPROVAL OF THE PROJECT COORDINATOR

           Claudia Meseguer Velasco




Signed:                          Date: 12/06/2007
UNIVERSIDAD PONTIFICIA DE COMILLAS


        ESCUELA TÉCNICA SUPERIOR DE INGENIERÍA (ICAI)



           INGENIERO EN ORGANIZACIÓN INDUSTRIAL




         FINAL DEGREE PROJECT


    Bayesian Regression System
      for Interval-Valued Data.
Application to the Spanish Continuous
             Stock Market




                          AUTHOR: Salgado Fernández, Rubén

                                      MADRID, June 2007
Acknowledgements

Firstly, I would like to thank my director, Carlos Maté Jiménez, PhD, for giving me the chance to carry out this project. With him I have learnt not only about Statistics and research, but also how to enjoy them.


   Special thanks to my parents. Their love and everything they have taught me in this life have made me the person I am today.


   Thanks to my brothers, my sister and the rest of my family for their support and for the time I stole from them.


   Thanks to Charo for putting up with my bad moods at difficult times, for supporting me and for giving me the inspiration to keep going.



                                                                                    Madrid, June 2007




Resumen

In recent years, Bayesian methods have spread and come to be used successfully in many varied fields such as marketing, medicine, engineering, econometrics and financial markets. The main characteristic that sets Bayesian data analysis apart from other alternatives is that it takes into account not only the objective information coming from the data of the event under study, but also the knowledge available prior to it. The benefits obtained from this approach are many since, the greater the knowledge of the situation, the more reliably decisions can be taken and the more accurate they will be. But it has not always been all advantages: until a few years ago, Bayesian data analysis presented a series of difficulties that limited its development by researchers. Although the Bayesian methodology has existed as such for quite some time, it did not begin to be employed in a widespread way until the 1990s. This expansion has been favoured to a large extent by advances in computing and by the improvement and refinement of different calculation methods, such as Markov chain Monte Carlo methods.


    In particular, this methodology has proved extraordinarily useful in its application to regression models, which are widely adopted. In practice there are many situations in which the relationship between two quantitative variables needs to be analysed. The two fundamental objectives of this analysis are, on the one hand, to determine whether those variables are associated and in what direction the association runs (that is, whether the values of one of the variables tend to increase, or decrease, as the values of the other increase); and, on the other, to study whether the values of one variable can be used to predict the value of the other. A regression model tries to provide information about one or more events through their relationship with the behaviour of others. The Bayesian methodology makes it possible to incorporate the researcher's knowledge into the analysis, making the results more precise, since they are not confined to the data of one particular sample.


    On the other hand, it is beginning to be accepted that, in the field of statistics, the twenty-first century will be the century of the "statistics of knowledge", in contrast to the previous one, which was that of the "statistics of data". The basic concept for building such statistics is the symbolic datum, and statistical methods have been developed for some types of symbolic data.


    Nowadays, the demands of the market and, in general, of the world keep growing. This implies an ever greater desire to predict the occurrence of an event, or to control the behaviour of certain quantities with the smallest possible error, in order to offer better products and to obtain greater profits, scientific advances and better results.


    Against this reality, this project tries to respond to those needs by providing extensive documentation on several of the most widely used and most advanced techniques of today, namely Bayesian data analysis, regression models and symbolic data, and by proposing different regression techniques. Likewise, a tool is developed that allows all the acquired knowledge to be put into practice. This application is aimed at the Spanish stock market and lets the user operate it in a simple, friendly way. For the development of this tool, one of the newest languages with the greatest projection of the moment is employed: R.


    It is, therefore, a project that combines the newest techniques with the greatest projection, both in theory, with Bayesian regression applied to interval-valued data, and in practice, with the use of the R language.
Abstract

In recent years, Bayesian methods have spread and been successfully used in many different fields such as Marketing, Medicine, Engineering, Econometrics or Financial Markets. The main characteristic that makes Bayesian data analysis remarkable compared with other alternatives is that it takes into account not only the objective information coming from the analysed event, but also the knowledge available prior to it. The benefits obtained from this approach are many, due to the fact that the more knowledge of the situation one has, the more reliable and accurate the decisions that can be taken. However, although the Bayesian methodology was established a long time ago, it was not applied in a general way until the 1990s because of computational difficulties. Its expansion has been favoured mainly by the advances in that field and by the improvement of different calculation methods, such as Markov chain Monte Carlo methods.


    In particular, the Bayesian methodology has proved extraordinarily useful in its application to regression models, which are widely adopted. There are many occasions in real life in which it is necessary to analyse the relationship between two quantitative variables. The two main objectives of this analysis are, on the one hand, to determine whether such variables are associated and in what direction that association comes about (that is, whether the values of one of the variables tend to rise, or to decrease, as the values of the other increase); and, on the other hand, to study whether the values of one variable can be used to predict the value of the other. A regression model offers information about one or more events through their relationship with the behaviour of others. With the Bayesian methodology it is possible to add the researcher's knowledge to the analysis, thus making the results more accurate, since they are not confined to the data of one particular sample.


    On the other hand, in the field of Statistics it is more and more accepted that the twenty-first century will be the century of the "Statistics of knowledge", in contrast to the last one, which was that of the "Statistics of data". The basic concept on which such Statistics is built is the symbolic datum; furthermore, statistical methods have been developed for some types of symbolic data.


   Nowadays, the requirements of the market, and the demands of the world in general, keep growing. This implies a continuously increasing desire to predict the occurrence of an event, or to control the behaviour of certain quantities with the minimum error, with the aim of offering better products and obtaining greater benefits, scientific advances and better outcomes.


   Under this frame, the project tries to respond to such needs by offering extensive documentation on several of the most applied and leading techniques of today, such as Bayesian data analysis, regression models and symbolic data, and by suggesting different regression techniques. Similarly, a tool has been developed that allows the reader to put all the acquired knowledge into practice. This application is aimed at the Spanish Continuous Stock Market and lets the user apply it easily. As far as the development of this tool is concerned, one of the most innovative languages with the greatest projection of the moment has been used: R.


   The project therefore combines the techniques that are most innovative and have the greatest projection, both in theoretical questions, such as Bayesian regression applied to interval-valued data, and in practical questions, such as the use of the R language.
List of Figures

 1.1   Project Work Packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       5

 2.1   Univariate Normal Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .        14

 6.1   Interval time series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   62

 7.1   Classical Regression with single values in training set . . . . . . . . . . . . . . . .    73
 7.2   Classical Regression with single values in testing set . . . . . . . . . . . . . . . . .   74
 7.3   Classical Regression with interval-valued data . . . . . . . . . . . . . . . . . . . .     75
 7.4   Centre Method (2000) in training set . . . . . . . . . . . . . . . . . . . . . . . . . .     75
 7.5   Centre Method (2000) in testing set . . . . . . . . . . . . . . . . . . . . . . . . . .      76
 7.6   Centre and Radius Method in training set . . . . . . . . . . . . . . . . . . . . . . .       77
 7.7   Centre and Radius Method in testing set . . . . . . . . . . . . . . . . . . . . . . . .      78
 7.8   Bayesian Centre and Radius Method in testing set . . . . . . . . . . . . . . . . . .       80
 7.9   Classical Regression with single values in training set . . . . . . . . . . . . . . . .    81
 7.10 Classical Regression with single values in testing set . . . . . . . . . . . . . . . . .    81
 7.11 Centre Method (2000) in training set . . . . . . . . . . . . . . . . . . . . . . . . . .      82
 7.12 Centre Method (2000) in testing set . . . . . . . . . . . . . . . . . . . . . . . . . .       83
 7.13 Centre and Radius Method in training set . . . . . . . . . . . . . . . . . . . . . . .        85
 7.14 Centre and Radius Method in testing set . . . . . . . . . . . . . . . . . . . . . . . .       85
 7.15 Bayesian Centre and Radius Method in testing set . . . . . . . . . . . . . . . . . . .        87

 9.1   BARESIMDA MDI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

 10.1 Interface between BARESIMDA and R . . . . . . . . . . . . . . . . . . . . . . . . 104
 10.2 Interface between BARESIMDA and Excel . . . . . . . . . . . . . . . . . . . . . . 105
 10.3 Logical Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105


                                                vi

  C.1 Load Data Menu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
  C.2 Select File Dialog . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
  C.3 Display Loaded Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
  C.4 Define New Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
  C.5 Enter New Variable Name . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
  C.6 Display New Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
  C.7 Edit Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
  C.8 Select Variable to Be Edited . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
  C.9 Enter New Name . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
  C.10 Confirmation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
  C.11 New Row data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
  C.12 Type Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
  C.13 Look And Feel Menu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
  C.14 Look And Feel Styles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
  C.15 New Look And Feel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
  C.16 Type Of User Menu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
  C.17 Select Type Of User . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
  C.18 Non-Symbolic Classical Regression Menu . . . . . . . . . . . . . . . . . . . . . . . 131
  C.19 Select Non-Symbolic Variables in Simple Regression . . . . . . . . . . . . . . . . . 131
  C.20 Brief Report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
  C.21 Analysis Options in Non-Symbolic Classical Simple Regression . . . . . . . . . . . 132
  C.22 New Prediction in Non-Symbolic Classical Simple Regression . . . . . . . . . . . . 133
  C.23 Graphics options in Non-Symbolic Classical Simple Regression . . . . . . . . . . . 133
  C.24 Save options in Non-Symbolic Classical Simple Regression . . . . . . . . . . . . . . 134
  C.25 Non-Symbolic Classical Multiple Regression Menu . . . . . . . . . . . . . . . . . . 134
  C.26 Select Variables in Non-Symbolic Classical Multiple Regression . . . . . . . . . . . 134
  C.27 Analysis options in Non-Symbolic Classical Multiple Regression . . . . . . . . . . . 135
  C.28 Graphics options in Non-Symbolic Classical Multiple Regression . . . . . . . . . . 135
  C.29 Save options in Non-Symbolic Classical Multiple Regression . . . . . . . . . . . . . 136
  C.30 Intercept in Non-Symbolic Classical Multiple Regression . . . . . . . . . . . . . . . 136
  C.31 Non-Symbolic Bayesian Simple Regression Menu . . . . . . . . . . . . . . . . . . . 136
  C.32 Select Variables in Non-Symbolic Bayesian Simple Regression . . . . . . . . . . . . 137
  C.33 Analysis Options in Non-Symbolic Bayesian Simple Regression . . . . . . . . . . . 137
  C.34 Graphics Options in Non-Symbolic Bayesian Simple Regression . . . . . . . . . . . 138

  C.35 Save Options in Non-Symbolic Bayesian Simple Regression . . . . . . . . . . . . . 138
  C.36 Prior Experienced Specification Options in Non-Symbolic Bayesian Simple Regres-
       sion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
  C.37 Prior Inexperienced Specification in Non-Symbolic Bayesian Simple Regression . . 139
  C.38 Prior Experienced Specification in Non-Symbolic Bayesian Simple Regression . . . 139
  C.39 Non-Symbolic Bayesian Multiple Regression menu . . . . . . . . . . . . . . . . . . 139
  C.40 Analysis Options in Non-Symbolic Bayesian Multiple Regression . . . . . . . . . . 140
  C.41 Graphics Options in Non-Symbolic Bayesian Multiple Regression . . . . . . . . . . 140
  C.42 Save Options in Non-Symbolic Bayesian Multiple Regression . . . . . . . . . . . . 140
  C.43 Model Options in Non-Symbolic Bayesian Multiple Regression . . . . . . . . . . . 141
  C.44 Symbolic Classical Simple Regression Menu . . . . . . . . . . . . . . . . . . . . . 141
  C.45 Select Variables in Symbolic Classical Simple Regression . . . . . . . . . . . . . . . 141
  C.46 Analysis Options in Symbolic Classical Simple Regression . . . . . . . . . . . . . . 142
  C.47 Graphics Options in Symbolic Classical Simple Regression . . . . . . . . . . . . . . 142
  C.48 Symbolic Classical Multiple Regression Menu . . . . . . . . . . . . . . . . . . . . . 143
  C.49 Select Variables in Symbolic Classical Multiple Regression . . . . . . . . . . . . . . 143
  C.50 Analysis Options in Symbolic Classical Multiple Regression . . . . . . . . . . . . . 144
  C.51 Graphics Options in Symbolic Classical Multiple Regression . . . . . . . . . . . . . 144
  C.52 Symbolic Bayesian Simple Regression . . . . . . . . . . . . . . . . . . . . . . . . . 145
  C.53 Select Variables in Symbolic Bayesian Simple Regression . . . . . . . . . . . . . . 145
  C.54 Analysis Options in Symbolic Bayesian Simple Regression . . . . . . . . . . . . . . 145
  C.55 Graphics Options in Symbolic Bayesian Simple Regression . . . . . . . . . . . . . . 146
  C.56 Model Options in Symbolic Bayesian Simple Regression . . . . . . . . . . . . . . . 147
  C.57 Symbolic Bayesian Multiple Regression Menu . . . . . . . . . . . . . . . . . . . . 147
  C.58 Select Variables in Symbolic Bayesian Multiple Regression . . . . . . . . . . . . . . 147
  C.59 Graphics Options in Symbolic Bayesian Multiple Regression . . . . . . . . . . . . . 148
List of Tables

 2.1   Distributions in Bayesian Data Analysis . . . . . . . . . . . . . . . . . . . . . . . .        10
 2.2   Comparison between Univariate and Multivariate Normal . . . . . . . . . . . . . . .            15
 2.3   Conjugate distributions for other likelihood distributions . . . . . . . . . . . . . . .       16

 4.1   Bayes Factor Interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      29
 4.2   Sensitivity Summary I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .        33
 4.3   Sensitivity Summary II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       34

 5.1   Multiple and Simple Regression Comparison . . . . . . . . . . . . . . . . . . . . .            40
 5.2   Sensitivity analysis of parameter β . . . . . . . . . . . . . . . . . . . . . . . . . . .      45
 5.3   Sensitivity analysis of parameter σ² . . . . . . . . . . . . . . . . . . . . . . . . . .   46
 5.4   Classical and Bayesian regression comparison . . . . . . . . . . . . . . . . . . . . .         48
 5.5   Main Prior Distributions Summary . . . . . . . . . . . . . . . . . . . . . . . . . . .         57
 5.6   Main Posterior Distributions Summary . . . . . . . . . . . . . . . . . . . . . . . . .         58
 5.7   Prior and Posterior Parameters Summary . . . . . . . . . . . . . . . . . . . . . . . .         59
 5.8   Main Posterior Predictive Distributions Summary . . . . . . . . . . . . . . . . . . .          60

 6.1   Multivalued Data Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .         63
 6.2   Modal-multivalued Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .          64

 7.1   Error Measures for Classical Regression with single values . . . . . . . . . . . . . .         74
 7.2   Error Measures for Centre Method (2000) . . . . . . . . . . . . . . . . . . . . . .        76
 7.3   Error Measures for Centre Method (2002) . . . . . . . . . . . . . . . . . . . . . .        77
 7.4   Error Measures for Centre and Radius Method . . . . . . . . . . . . . . . . . . . . .          78
 7.5   Error Measures in Bayesian Centre and Radius Method . . . . . . . . . . . . . . . .            80
 7.6   Error Measures for Classical Regression with single values . . . . . . . . . . . . . .         82




  7.7   Error Measures for Centre Method (2000) . . . . . . . . . . . . . . . . . . . . . . .   83
  7.8   Error Measures for Centre Method (2002) . . . . . . . . . . . . . . . . . . . . . . .   84
  7.9   Error Measures for Centre and Radius Method . . . . . . . . . . . . . . . . . . . . .    84
  7.10 Error Measures in Bayesian Centre and Radius Method . . . . . . . . . . . . . . . .       86

  11.1 Estimated material costs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
  11.2 Amortization Costs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
  11.3 Summarized Budget . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
Contents

Acknowledgements                                                                                          i

Resumen                                                                                                  ii

Abstract                                                                                                 iv

List of Figures                                                                                          vi

List of Tables                                                                                           x

Contents                                                                                                xvi

1 Introduction                                                                                           1
   1.1     Project Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      1
   1.2     Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    4
   1.3     Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       4

2 Bayesian Data Analysis                                                                                 6
   2.1     What is Bayesian Data Analysis? . . . . . . . . . . . . . . . . . . . . . . . . . . .         6
   2.2     Bayesian Analysis for Normal and other distributions . . . . . . . . . . . . . . . . .       10
           2.2.1   Univariate Normal distribution . . . . . . . . . . . . . . . . . . . . . . . . .     10
           2.2.2   Multivariate Normal distribution . . . . . . . . . . . . . . . . . . . . . . . .     13
           2.2.3   Other distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    15
   2.3     Hierarchical Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    16
   2.4     Nonparametric Bayesian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       18

3 Posterior Simulation                                                                                  20
   3.1     Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   20


   3.2   Markov chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    21
   3.3   Monte Carlo Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      23
   3.4   Gibbs sampler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    24
   3.5   Metropolis-Hastings sampler and its special cases . . . . . . . . . . . . . . . . . . .      25
         3.5.1   Metropolis-Hastings sampler . . . . . . . . . . . . . . . . . . . . . . . . . .      25
         3.5.2   Metropolis sampler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     26
         3.5.3   Random-walk sampler . . . . . . . . . . . . . . . . . . . . . . . . . . . . .        26
         3.5.4   Independence sampler . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       26
   3.6   Importance sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      27

4 Sensitivity Analysis                                                                                28
   4.1   Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   28
   4.2   Bayes Factor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     29
   4.3   Alternative Stats to Bayes Factor . . . . . . . . . . . . . . . . . . . . . . . . . . . .    30
   4.4   Highest Posterior Density Intervals . . . . . . . . . . . . . . . . . . . . . . . . . . .    31
   4.5   Model Comparison Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .         32

5 Regression Analysis                                                                                 35
   5.1   Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   35
   5.2   Classical Regression Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     36
   5.3   The Bayesian Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      39
   5.4   Normal Linear Regression Model subject to inequality constraints . . . . . . . . . .         48
   5.5   Normal Linear Regression Model with Independent Parameters . . . . . . . . . . . .           49
   5.6   Normal Linear Regression Model with Heteroscedasticity and Correlation . . . . . .           51
         5.6.1   Heteroscedasticity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     53
         5.6.2   Correlation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    54
   5.7   Models Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       56

6 Symbolic Data                                                                                       61
   6.1   What is symbolic data analysis? . . . . . . . . . . . . . . . . . . . . . . . . . . . .      61
   6.2   Interval-valued variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    65
   6.3   Classical regression analysis with Interval-valued data . . . . . . . . . . . . . . . . .    67
   6.4   Bayesian regression analysis with Interval-valued data . . . . . . . . . . . . . . . .       70

7 Results                                                                                              72
   7.1   Spanish Continuous Stock Market data sets . . . . . . . . . . . . . . . . . . . . . .         72
   7.2   Direct Relation between Variables . . . . . . . . . . . . . . . . . . . . . . . . . . .       72
   7.3   Uncorrelated Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      79

8 A Guide to Statistical Software Today                                                                88
   8.1   Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    88
   8.2   Commercial Packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       89
         8.2.1   The SAS System for Statistical Analysis . . . . . . . . . . . . . . . . . . . .       89
         8.2.2   Minitab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     90
         8.2.3   BMDP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .        90
         8.2.4   SPSS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      91
         8.2.5   S-PLUS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      91
         8.2.6   Others . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    92
   8.3   Public License Packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       93
         8.3.1   R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     93
         8.3.2   BUGS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      93
   8.4   Analysis Packages with Statistical Libraries . . . . . . . . . . . . . . . . . . . . . .      94
         8.4.1   Matlab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      94
         8.4.2   Mathematica . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       95
         8.4.3   Others . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    95
   8.5   Some General Languages with Statistical Libraries . . . . . . . . . . . . . . . . . .         95
         8.5.1   Java . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    95
         8.5.2   C++ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     96
   8.6   Developed Software Tool: BARESIMDA . . . . . . . . . . . . . . . . . . . . . . .              96

9 Software Requirements Specification                                                                   98
   9.1   Purpose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     98
   9.2   Intended Audience . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       98
   9.3   Functionality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     99
         9.3.1   Classical Regression with crisp data . . . . . . . . . . . . . . . . . . . . . .      99
          9.3.2   Classical Regression with interval-valued data . . . . . . . . . . . . . . . .       99
         9.3.3   Bayesian Regression with crisp data . . . . . . . . . . . . . . . . . . . . . . 100
          9.3.4   Bayesian Regression with interval-valued data . . . . . . . . . . . . . . . . 100

         9.3.5   Data Manipulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
         9.3.6   Portability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
         9.3.7   Maintainability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
   9.4   External Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
         9.4.1   User Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
         9.4.2   Software Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

10 Software Architecture Study                                                                      103
   10.1 Hardware/Software Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
   10.2 Logical Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

11 Project Budget                                                                                   106
   11.1 Engineering Costs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
   11.2 Investment and Elements Costs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
         11.2.1 Summarized Budget . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

12 Conclusions                                                                                      110
   12.1 Bayesian Regression applied to Symbolic Data . . . . . . . . . . . . . . . . . . . . 110
   12.2 BARESIMDA Software Tool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
   12.3 Future Developments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
   12.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

A Probability Distributions                                                                         113
   A.1 Discrete Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
         A.1.1 Binomial . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
         A.1.2 Geometric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
         A.1.3 Poisson . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
   A.2 Continuous Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
         A.2.1 Uniform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
         A.2.2 Univariate Normal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
         A.2.3 Exponential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
         A.2.4 Gamma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
          A.2.5 Inverse-Gamma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
         A.2.6 Chi-square . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
          A.2.7 Inverse-Chi-square and Inverse-Scaled Chi-square . . . . . . . . . . . . . . 118

         A.2.8 Univariate Student-t . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
        A.2.9 Beta . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
        A.2.10 Multivariate Normal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
         A.2.11 Multivariate Student-t . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
        A.2.12 Wishart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
         A.2.13 Inverse-Wishart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

B Installation Guide                                                                              122
   B.1 From source folder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
   B.2 From installer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122

C User’s Guide                                                                                    123
   C.1 Data Entry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
         C.1.1   Loading an Excel file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
        C.1.2   Defining a new variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
        C.1.3   Editing an existing variable . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
        C.1.4   Deleting an existing variable . . . . . . . . . . . . . . . . . . . . . . . . . . 127
        C.1.5   Typing in a new data row . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
        C.1.6   Deleting an existing data row . . . . . . . . . . . . . . . . . . . . . . . . . . 128
         C.1.7   Modifying an existing data row . . . . . . . . . . . . . . . . . . . . . . . . . 129
   C.2 Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
         C.2.1   Setting the look & feel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
        C.2.2   Selecting the type of user . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
   C.3 Non Symbolic Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
        C.3.1   Simple Classical Regression . . . . . . . . . . . . . . . . . . . . . . . . . . 131
        C.3.2   Multiple Classical Regression . . . . . . . . . . . . . . . . . . . . . . . . . 133
        C.3.3   Simple Bayesian Regression . . . . . . . . . . . . . . . . . . . . . . . . . . 136
        C.3.4   Multiple Bayesian Regression . . . . . . . . . . . . . . . . . . . . . . . . . 139
   C.4 Symbolic Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
        C.4.1   Simple Classical Regression . . . . . . . . . . . . . . . . . . . . . . . . . . 140
        C.4.2   Multiple Classical Regression . . . . . . . . . . . . . . . . . . . . . . . . . 143
        C.4.3   Simple Bayesian Regression . . . . . . . . . . . . . . . . . . . . . . . . . . 144
        C.4.4   Multiple Bayesian Regression . . . . . . . . . . . . . . . . . . . . . . . . . 146

D Obtaining and Installing R                                                                     149
   D.1 Binary distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
   D.2 Installation from source . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
   D.3 Package installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151

E Obtaining and installing Java Runtime Environment                                              152
   E.1 Microsoft Windows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
   E.2 Linux . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
        E.2.1   Installation of Self-Extracting Binary . . . . . . . . . . . . . . . . . . . . . 153
        E.2.2   Installation of RPM File . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
   E.3 UNIX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155

Bibliography                                                                                     157
Chapter 1

Introduction

1.1 Project Motivation

Statistics is primarily concerned with the analysis of data, either to assist in arriving at an improved understanding of some underlying mechanism, or as a means for making informed rational decisions. Both these aspects generally involve some degree of uncertainty. The statistician's task is then to explain such uncertainty, and to reduce it to the extent to which this is possible. Problems of this type occur throughout the physical, social and other sciences. One way of looking at statistics stems from the perception that, ultimately, probability is the only appropriate way to describe and systematically deal with uncertainty, as if it were the language for the logic of uncertainty. Thus, inference statements are precisely framed as probability statements on the possible values of the unknown quantities of interest (parameters or future observations), conditional on the observed, available data. The scientific discipline based on this understanding is called Bayesian Statistics. Moreover, the increasingly needed and sophisticated models, often hierarchical models, used to describe available data are typically too complex for conventional statistics to handle, but can be tackled within Bayesian Statistics. In principle, Bayesian Statistics is designed to handle all situations where uncertainty is found. Since some uncertainty is present in most aspects of life, it may be argued that Bayesian Statistics should be appreciated and used by everyone; it is the logic of contemporary society and science. According to [Rupp04], whether to apply Bayesian methodology is no longer under discussion; the question is when it has to be applied.


    Bayesian methods have matured and improved in several ways during the last fifteen years. They are becoming increasingly attractive to researchers, and successful applications of Bayesian data analysis have appeared in many different fields, including Actuarial Science, Biometrics, Finance, Market Research, Marketing, Medicine, Engineering and Social Science. It is not only that the Bayesian approach produces appropriate answers to many current important problems; there is also an evident need for it, given the inapplicability of conventional statistics to many of them.


    Thus, the main characteristic offered by Bayesian data analysis is the possibility of incorporating the researcher's knowledge about the problem to be handled. The more precise that prior knowledge is, the better and more reliable the results obtained. However, Bayesian Statistics was held back until the mid-1990s by its computational complexity. Since then it has expanded greatly, favoured by the development and improvement of different computational methods in this field, such as Markov chain Monte Carlo.


    This methodology has proved extremely useful in its application to regression models, which are widely accepted. Let us remember that the general purpose of regression analysis is to learn more about the relationship between several independent or predictor variables and a dependent or criterion variable. The Bayesian methodology lets the researcher incorporate his or her knowledge into the analysis, improving the results, since they no longer depend only on the sampled data.


    On the other hand, datasets are increasingly so large that they must be summarized in some fashion, so that the resulting summary dataset is of a more manageable size while still retaining as much of the knowledge inherent in the entire dataset as possible. One consequence of this situation is that data may no longer be formatted as single values, as is the case for classical data, but rather may be represented by lists, intervals, distributions, and the like. These summarized data are examples of symbolic data. This kind of data also lets us better represent the knowledge and beliefs we hold in our minds, which are limited and hard to extract with classical Statistics. According to [Bill02], this responds to the current need to change from a Statistics of data in the past century to a Statistics of knowledge in the twenty-first century.


    Market and demand requirements are increasing continuously over time. This implies a need for better and more accurate methods to forecast new situations and to control different quantities with the minimum error, in order to supply better products and to obtain higher incomes, scientific advances and better results.


    Dealing with this outlook, this project is intended to respond to those requirements by providing wide and exhaustive documentation about some of the most used and advanced current techniques, including Bayesian data analysis, regression models and symbolic data. Different examples related to the Spanish Continuous Stock Market are explained throughout this text, making clear the advantages of employing the described methods. Likewise, a software tool with a user-friendly graphical interface has been developed to put all the acquired knowledge into practice.


    Therefore, this is a project that combines the most recent techniques with major future implications in theoretical issues, such as Bayesian regression applied to interval-valued data, with a technological part dealing with the problem of interconnecting two software programs: one used to show the graphical user interface and the other employed to perform the computations.


    Regarding a more personal motivation, several factors were taken into consideration by the author when accepting this project:

    • A great challenge: it is an ambitious project with high technical complexity in both its theoretical and its technological basis. This represents a very good letter of introduction for entering the labour market.

    • Good timing: this project was designed to be finished before June 2007, which means being able to finish the degree in June and to join the labour market in September.

    • Some very interesting issues: on the one hand, it deals with the ever-present need to forecast and model observations and situations in order to get the best possible results. On the other hand, it focuses on the Stock Market, which matches my personal hobbies.

    • A new programming language: the possibility of learning in depth a new and relatively recent programming language, such as R, was an extra motivating factor.

    • The project director: Carlos Maté is considered a demanding and very competent director by the students of the university.

    • A research scholarship: the possibility of being in the Industrial Organization department of the University, learning from people such as the director mentioned above and other highly recognized professors, was a great factor.






1.2 Objectives

This project pursues the following aims.

    • To provide wide and rigorous documentation about the following issues: Bayesian data analysis, regression models and symbolic data. Building on this, documentation about Bayesian regression will be developed, as well as the software tool designed.

    • To build a software tool to fit Bayesian regression models to interval-valued data, finding out the most efficient way to design the graphical user interface, which must be as user-friendly as possible.

    • To find out, from the tests carried out with the application, the most efficient way to offer the system to future clients.

    • To design a survey to measure the quality of the tool and users’ satisfaction.

    • To explore the possibility of writing an article for a scientific journal.


1.3 Methodology

As the title of the project indicates, the ultimate purpose is the development of an application aimed at stock markets and based on a Bayesian regression system; therefore, some previous knowledge is required.


    The first stage is familiarization with Bayesian data analysis, regression models applied under the Bayesian methodology, and symbolic data.


    Within this phase, Bayesian data analysis will be studied first, trying to synthesize it and extract its most important elements. Special attention will be given to posterior simulation and computational algorithms. Then regression models will be treated, quickly reviewing the classical approach before going deeper into the different Bayesian regression models, applying a great part of what was explained in the Bayesian methodology. Finally, this first stage will be completed with the application to symbolic data, paying special attention to interval-valued data.


    The second stage concerns the development of the software application, employing an incremental methodology of programming and testing iterative prototypes. This methodology has been considered the most suitable for this project, since it will let us introduce successive models into the application.


    The following figure shows the structure of the work packages the project is divided into:




                                 Figure 1.1: Project Work Packages




Chapter 2

Bayesian Data Analysis

2.1     What is Bayesian Data Analysis?

Statistics can be defined as the discipline that provides us with a methodology to collect, to organize,
to summarize and to analyze a set of data.


    Regarding data analysis, it can be divided into two approaches: exploratory data analysis and confirmatory data analysis. The former is used to represent, describe and analyze a set of data through simple methods in the first stages of statistical analysis. The latter is applied to make inferences from data, based on probability models.


    In the same way, confirmatory data analysis is divided into two branches depending on the adopted approach. The first one, known as frequentist, makes inferences from sampled data through classical methods. The second one, known as Bayesian, goes further in the analysis and adds to those data the prior knowledge that the researcher has about the problem at hand. Since it is not worthwhile to explain the frequentist approach in full here, a more extensive review of the classical methods related to it can be found in [Mont02].




                                         
                          Data Analysis
                              • Exploratory
                              • Confirmatory
                                  - Frequentist
                                  - Bayesian




    As far as Bayesian analysis is concerned, and according to [Gelm04], the process can be divided into the following three steps:

    • To set up a full probability model, through a joint probability distribution for all observable and
      unobservable quantities in a problem.

    • To condition on observed data, obtaining the posterior distribution.

    • Finally, to evaluate the fit of the model and the implications of the resulting posterior distribu-
      tion.

    The joint probability distribution f(θ, y), where θ denotes the parameter of the model (or the vector of parameters when there are several; all the expressions below hold unchanged in that case), is obtained by means of

$$f(\theta, y) = f(y \mid \theta)\, f(\theta) \tag{2.1}$$

where y is the set of sampled data. This distribution is thus the product of two densities, referred to as the sampling distribution f(y|θ) and the prior distribution f(θ).


    The sampling distribution, as its name suggests, is the probability model that the researcher assigns to the statistic (or set of statistics) to be studied after the data have been observed. Here an important problem stands out in relation to the parametric approach: the probability model that the researcher chooses may not be adequate. The nonparametric approach overcomes this inconvenience, as will be seen later.


    When y is considered fixed, so that the distribution is a function of θ, the sampling distribution is called the likelihood function and obeys the likelihood principle, which states that, for a given sample of data, any two probability models f(y|θ) with the same likelihood function yield the same inference for θ.


    The prior distribution does not depend upon the data. Accordingly, it contains the information and the knowledge that the researcher has about the situation or problem to be solved. When there is no previous significant population from which the engineer can take his knowledge, that is, when the researcher has no prior information about the problem, a non-informative prior distribution must be used in the analysis in order to let the data speak for themselves. Hence, it is assumed that the prior knowledge will have very little importance in the results. But most non-informative priors are "improper", in that they do not integrate to 1, and this fact can cause problems; in these cases it is necessary to make sure that the posterior distribution is proper. Another possibility is to use an informative prior distribution but with an insignificant weight (around zero) associated with it.


    Though the prior distribution can take any form, it is common to choose particular classes of priors that make computation and interpretation easier: the conjugate priors. A conjugate prior distribution is one which, when combined with the likelihood function, gives a posterior distribution that falls in the same class of distributions as the prior. Furthermore, according to [Koop03], a natural conjugate prior has the additional property of having the same functional form as the likelihood. But it is not always possible to find this kind of distribution, and the researcher has to manage many distributions in order to express his prior knowledge about the problem. This is another handicap that the nonparametric approach reduces.


    In relation to the prior, what distribution should be chosen? There are three different points of
view corresponding to different styles of Bayesians:

    • Classical Bayesians consider that the prior is a necessary evil and priors that interject the least
        information possible should be chosen.

    • Modern parametric Bayesians consider that the prior is a useful convenience and that priors with desirable properties, such as conjugacy, should be chosen. They remark that, given a distributional choice, prior hyperparameters that interject the least information possible should be chosen.

    • Subjective Bayesians give essential importance to the prior, in the sense that they consider it a summary of old beliefs. So prior distributions based on previous knowledge (either the results of earlier studies or non-scientific opinion) should be chosen.

    Returning to the Bayesian data analysis process, simply conditioning on the observed data y and applying Bayes' Theorem, the posterior distribution f(θ|y) yields:

$$f(\theta \mid y) = \frac{f(\theta, y)}{f(y)} = \frac{f(\theta)\, f(y \mid \theta)}{f(y)} \tag{2.2}$$

where

$$f(y) = \int_{0}^{\infty} f(\theta)\, f(y \mid \theta)\, d\theta \tag{2.3}$$

(a multiple integral over all components of the parameter vector in the multiparameter case) is known as the prior predictive distribution, since it is not conditional upon a previous observation of the process and is applied to an observable quantity.


    An equivalent form of the posterior distribution displayed above omits the prior predictive distri-
bution, since it does not involve θ (resp. θ) and the interest lies in learning about θ (resp. θ).
So, with fixed y, it can be said that the posterior distribution is proportional to the joint probability
distribution f(θ, y).


    Once the posterior distribution is calculated, some kind of summary measure will be required to
estimate the uncertainty about the parameter θ (resp. θ). This is due to the fact that the posterior
distribution is a high-dimensional object whose direct use is not practical. The measure that
summarizes the posterior distribution can be the posterior mean, mode, median or variance,
among others. Its choice will depend on the requirements of the problem. So the posterior dis-
tribution has great importance, since it lets the researcher manage the uncertainty about θ (resp. θ)
and provides him with information about it (resp. them), taking into account both his prior knowledge
and the data collected from sampling on that parameter.


    According to [Maté06], it is not difficult to deduce that posterior inference will coincide with the non-
Bayesian one as long as the estimate which the researcher gives to the parameter θ (resp. θ) is the
same as the one resulting from the sampling.


    Once the data y have been observed, a new unknown observable quantity ỹ can be predicted for
the same process through the posterior predictive distribution, namely f(ỹ|y):


            f(ỹ|y) = ∫ f(ỹ, θ|y) dθ = ∫ f(ỹ|θ, y)f(θ|y) dθ = ∫ f(ỹ|θ)f(θ|y) dθ            (2.4)

    To sum up, the basic idea is to update the prior distribution f (θ) through Bayes’ theorem by
observing the data y in order to get a posterior distribution f (θ|y). Then a summary measure or a
prediction for new data can be obtained from f (θ|y). Table 2.1 reflects what has been said.








   Distribution    Expression             Information Required                        Result

   Likelihood      f(y|θ)                 Data distribution                           f(y|θ)
   Prior           f(θ)                   Researcher's knowledge of the parameter     f(θ)
   Joint           f(y|θ)f(θ)             Likelihood and prior distributions          f(θ, y)
   Posterior       f(θ)f(y|θ)             Prior and joint distributions               f(θ|y)
   Predictive      ∫ f(ỹ|θ)f(θ|y) dθ      New data and posterior distributions        f(ỹ|y)


                          Table 2.1: Distributions in Bayesian Data Analysis



2.2 Bayesian Analysis for Normal and other distributions

2.2.1 Univariate Normal distribution

The basic model to be discussed concerns an observable variable y, normally distributed with mean µ
and variance σ²:


                                        y|µ, σ² ∼ N(µ, σ²)                                 (2.5)

    As can be seen in Appendix A, the likelihood function for a single observation is

                    f(y|µ, σ²) ∝ (σ²)^(−1/2) exp(−(y − µ)²/(2σ²))                          (2.6)

    This means that the likelihood function is proportional to a Normal distribution, omitting those
terms that are constant.


    Now let us consider that we have n independent observations y1, y2, . . . , yn. According to the previ-
ous section, the parameters to be estimated θ are µ and σ²:







                                             θ = (θ1 , θ2 ) = (µ, σ 2 )                              (2.7)

    A full probability model must be set up through a joint probability distribution:


                               f (θ, (y1 , y2 , . . . , yn )) = f (θ, y) = f (y|θ)f (θ)              (2.8)

    The likelihood function for a sample of n iid observations in this case is

            f(y|θ) = f(y|µ, σ²) ∝ (σ²)^(−n/2) exp(−(1/(2σ²)) ∑ᵢ₌₁ⁿ (yᵢ − µ)²)              (2.9)

    As was recommended previously, a conjugate prior will be chosen; in fact, it will be a natural
conjugate prior. According to [Gelm04], this likelihood function suggests a conjugate prior distribu-
tion of the form


                            f(θ) = f(µ, σ²) = f(µ|σ²)f(σ²)                                 (2.10)

where the marginal distribution of σ² is the Scaled Inverse-χ² and the conditional distribution of µ
given σ² is Normal (details about these distributions in Appendix A):


                                    µ|σ² ∼ N(µ0, σ²V0)                                     (2.11)
                                    σ² ∼ Inv-χ²(ν0, s0²)                                   (2.12)

    So the joint prior distribution is:


            f(θ) = f(µ, σ²) = f(µ|σ²)f(σ²) ∝ N-Inv-χ²(µ0, s0²V0; ν0, s0²)                  (2.13)

    Its four parameters can be identified as the location and scale of µ and the degrees of freedom and
scale of σ², respectively.


    As a natural conjugate prior was employed, the posterior joint distribution will have the same
form as the prior. So, conditioning on the data, and according to Bayes' Theorem, we have:


        f(θ|y) = f(µ, σ²|y) ∝ f(y|µ, σ²)f(µ, σ²) ∝ N-Inv-χ²(µ1, s1²V1; ν1, s1²)            (2.14)

where it can be shown that





                    µ1 = (V0⁻¹ + n)⁻¹ (V0⁻¹µ0 + n ȳ)                                        (2.15)
                    V1 = (V0⁻¹ + n)⁻¹                                                       (2.16)
                    ν1 = ν0 + n                                                             (2.17)
                    ν1 s1² = ν0 s0² + (n − 1)s² + (V0⁻¹ n / (V0⁻¹ + n)) (ȳ − µ0)²           (2.18)

    All these formulae show that Bayesian inference combines prior and sample information.


    The first formula means that the posterior mean µ1 is a weighted mean of the prior mean µ0 and
the sample mean ȳ, divided by the sum of their respective weights, which are V0⁻¹ and the
sample size n.


    The second formula gives the posterior scale of the mean, which can be seen as a com-
promise between the sample size and the weight given to the prior mean.


    The third formula indicates that the degrees of freedom of the posterior variance are the sum of the prior
degrees of freedom and the sample size. That is, the prior degrees of freedom can be understood as a
fictitious sample size on which the expert's prior information is based.


    The last formula expresses the posterior sum of squared errors as a combination of the prior and empirical
sums of squared errors plus a term that measures the conflict between prior and sample information.


    A more detailed explanation of this last step can be found in [Gelm04], [Koop03] or [Cong06].


    It is obvious that the marginal posterior distributions are:



                                µ|σ², y ∼ N(µ1, σ²V1)                                       (2.19)
                                σ²|y ∼ Inv-χ²(ν1, s1²)                                      (2.20)

    If we integrate out σ², the marginal for µ will be a t-distribution (see Appendix A for details):


                                µ|y ∼ t_ν1(µ1, s1²V1)                                       (2.21)




    Let us see an application to the Spanish stock market. Let us suppose that the monthly close
values associated with the Ibex 35 are normally distributed. If we take the values at which the Span-
ish index closed during the first two weeks of January 2006, it can be shown that the mean was
10893.29 and the standard deviation was 61.66. So the non-Bayesian approach would infer a
Normal distribution with the previous mean and standard deviation. Suppose that we had asked
an analyst about the Ibex 35 evolution in January; he would have affirmed strongly that it would
decrease slightly, that the mean close value at the end of the month would be around 10870 and, hence,
that the standard deviation would be higher, around 100. Then, according to the previous formulas, the
posterior parameters would be



            µ1 = (100 + 10)⁻¹ (100 × 10870 + 10 × 10893.29) = 10872.12
            V1 = (100 + 10)⁻¹ = 0.0091
            ν1 = 100 + 10 = 110
            s1² = (100 × 100² + 9 × 61.66² + (100 × 10/110)(10893.29 − 10870)²)/110 = 9446.8,
            so s1 ≈ 97.2

    This means that there is a difference of almost 20 points between the Bayesian and the
non-Bayesian estimates of the mean close value of January. Once January had passed, we
could compare both results and note that the Bayesian estimates were closer to the
mean close value and standard deviation finally observed: 10871.2 and 112.44. In Figure 2.1 it can be seen that
the blue line representing the Bayesian estimation is closer to the cyan line representing the actual
mean close value than the red line representing the frequentist estimation.


2.2.2 Multivariate Normal distribution

Now, let us consider that we have an observable vector y of d components with the multivariate
Normal distribution:


                                        y ∼ N(µ, Σ)                                         (2.22)

where the first parameter is the mean column vector and the second one is the variance-covariance
matrix.


    Extending what was said above to the multivariate case, we have:





    [Figure 2.1: Univariate Normal Example — density estimates under the frequentist approach (red)
    and the Bayesian approach (blue), together with the real mean close value in January (cyan).]




            f(y|µ, Σ) ∝ |Σ|^(−1/2) exp(−(1/2)(y − µ)′ Σ⁻¹ (y − µ))                          (2.23)

    And for n iid observations:

    f(y1, y2, . . . , yn|µ, Σ) ∝ |Σ|^(−n/2) exp(−(1/2) ∑ᵢ₌₁ⁿ (yᵢ − µ)′ Σ⁻¹ (yᵢ − µ))        (2.24)

    A multivariate generalization of the Scaled Inverse-χ² is the Inverse Wishart distribution (see
details in Appendix A), so the prior joint distribution is

            f(θ) = f(µ, Σ) ∝ N-Inv-Wishart(µ0, Λ0/k0; ν0, Λ0)                               (2.25)

due to the fact that


                                µ|Σ ∼ N(µ0, Σ/k0)                                           (2.26)
                                Σ ∼ Inv-Wishart(ν0, Λ0⁻¹)                                   (2.27)








                              Univariate Normal                        Multivariate Normal

 Expression                   y ∼ N(µ, σ²)                             y ∼ N(µ, Σ)

 Parameters to estimate       µ, σ²                                    µ, Σ

 Prior distributions          µ|σ² ∼ N(µ0, σ0²/k0)                     µ|Σ ∼ N(µ0, Σ/k0)
                              σ² ∼ Inv-χ²(ν0, σ0²)                     Σ ∼ Inv-Wishart(ν0, Λ0⁻¹)
                              µ, σ² ∼ N-Inv-χ²(µ0, σ0²/k0; ν0, σ0²)    µ, Σ ∼ N-Inv-Wishart(µ0, Λ0/k0; ν0, Λ0)

 Posterior distributions      µ|σ², y ∼ N(µ1, σ1²/k1)                  µ|Σ, y ∼ N(µ1, Σ/k1)
                              σ²|y ∼ Inv-χ²(ν1, σ1²)                   Σ|y ∼ Inv-Wishart(ν1, Λ1⁻¹)
                              µ, σ²|y ∼ N-Inv-χ²(µ1, σ1²/k1; ν1, σ1²)  µ, Σ|y ∼ N-Inv-Wishart(µ1, Λ1/k1; ν1, Λ1)


                    Table 2.2: Comparison between Univariate and Multivariate Normal


    The posterior results are the same as those given for the univariate case, but applying these distri-
butions. Interested readers can find more information in [Gelm04] or [Cong06].


    A summary is shown in Table 2.2 in order to collect the most important ideas.


        2.2.3 Other distributions

As has just been done with the Normal distribution, a Bayesian analysis for other distributions could
be carried out. For instance, the exponential distribution is commonly used in reliability analysis. Because
this project will deal with the Normal distribution for the likelihood, the analysis with other distributions
will not be explained in detail. Table 2.3 shows the conjugate prior and posterior distributions




for other likelihood distributions. More details can be found in [Cong06], [Gelm04], or [Rossi06].



   Likelihood       Parameter    Conjugate    Prior Hyperparameters    Posterior Hyperparameters

   Bin(y|n, θ)      θ            Beta         α, β                     α + y, β + n − y
   P(y|θ)           θ            Gamma        α, β                     α + nȳ, β + n
   Exp(y|θ)         θ            Gamma        α, β                     α + 1, β + y
   Geo(y|θ)         θ            Beta         α, β                     α + 1, β + y


                 Table 2.3: Conjugate distributions for other likelihood distributions
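
    As an illustration of the first row of Table 2.3, a minimal Python sketch of the Beta-Binomial
update follows; the prior values and data are hypothetical:

    def beta_binomial_update(alpha, beta, y, n):
        # Posterior hyperparameters for a Beta(alpha, beta) prior
        # combined with a Bin(y | n, theta) likelihood
        return alpha + y, beta + n - y

    # Hypothetical data: a vague Beta(1, 1) prior and y = 7 successes in n = 20 trials
    a1, b1 = beta_binomial_update(1.0, 1.0, y=7, n=20)
    print(a1, b1)              # Beta(8, 14)
    print(a1 / (a1 + b1))      # posterior mean of theta, ~0.36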




2.3 Hierarchical Models

Hierarchical data arise when the observations are structured in groups or related to one another. When
this occurs, standard techniques either assume that these groups belong to entirely different populations
or ignore the aggregate information entirely.


    Hierarchical models provide a way of pooling the information for the disparate groups without
assuming that they belong to precisely the same population.


    Suppose we have collected data about some random variable Y from m different populations with
n observations for each population.


    Let yij represent observation j from population i. Now suppose yij ∼ f(θi), where θi is a vector
of parameters for population i. Furthermore, θi ∼ f(Θ), where Θ may also be a vector. Up to this
point, we have only rewritten what was said previously.






    Now let us extend the model, assume that the parameters Θ that govern the distribution
of the θ's are themselves random variables, and assign a prior distribution to these variables as well:


                                        Θ ∼ f(ψ)                                            (2.28)

where f(ψ) is called the hyperprior. The vector parameter ψ of the hyperprior may be "known" and
represent our prior beliefs about Θ or, in theory, we can also assign a probability distribution to
these quantities as well, and proceed to another layer of hierarchy.


    According to [Gelm04], the idea of exchangeability will be used to create a joint probability
distribution model for all the parameters θ. A formal definition of exchangeability is:
    "The parameters θ1, θ2, . . . , θn are exchangeable in their joint distribution if f(θ1, θ2, . . . , θn) is
invariant to permutations of the indexes 1, 2, . . . , n".


    This means that if no information other than the data is available to distinguish any of the θi from
any of the others, and no ordering of the parameters can be made, one must assume symmetry among
the parameters in the prior distribution. So we can treat the parameters for each sub-population as
exchangeable units. This can be formulated by:


                        f(θ1, θ2, . . . , θn|Θ) = ∏ᵢ₌₁ⁿ f(θi|Θ)                             (2.29)

    The joint prior distribution is now:


                f(θ1, θ2, . . . , θn, Θ) = f(θ1, θ2, . . . , θn|Θ) f(Θ)                     (2.30)

    And conditioning on the data, it yields:


        f(θ1, θ2, . . . , θn, Θ|y) ∝ f(θ1, θ2, . . . , θn, Θ) f(y|θ1, θ2, . . . , θn, Θ)    (2.31)

    Perhaps the most important point in practice is that non-hierarchical models are usually inappro-
priate for hierarchical data, while non-hierarchical data can still be modelled with a hierarchical
structure by assigning concrete values to the hyperprior parameters.


    This kind of model will be used in Bayesian regression models with autocorrelated errors, as
will be seen in the following chapters. A small simulation sketch of the hierarchy is given below.
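
    A minimal Python sketch of this two-level structure may help fix ideas; it merely simulates data
from the hierarchy, with hypothetical values for the hyperparameters:

    import random

    Theta, tau = 0.0, 1.0     # hyperparameters governing the theta_i (hypothetical)
    sigma = 0.5               # observation-level standard deviation (hypothetical)
    m, n = 5, 20              # m populations, n observations each

    data = []
    for i in range(m):
        theta_i = random.gauss(Theta, tau)             # theta_i ~ f(Theta)
        data.append([random.gauss(theta_i, sigma)      # y_ij ~ f(theta_i)
                     for _ in range(n)])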





    For more details about Bayesian hierarchical models, the reader is referred to [Cong06], [Gelm04]
and [Rossi06].




2.4 Nonparametric Bayesian

To overcome the limitations that have been mentioned throughout this chapter, the nonparametric
approach manages to reduce the restrictions of the parametric approach.
This kind of analysis can be performed through the so-called Dirichlet Process, which allows us to
express in a simple way a prior distribution over F, where F is the distri-
bution function of the variable under study. This process has a parameter, called α, a measure which,
once normalized, yields a probability distribution.


    According to [Maté06], a Dirichlet Process for F(t) requires knowing:

    • A prior proposal for F(t), denoted F0(t), corresponding to the distribution function that encodes
      the prior knowledge the engineer has; it is given by

                                        F0(t) = α(t)/M                                      (2.32)

    • A measure of the confidence in the prior proposal, denoted by M, whose values can
      vary between 0 and ∞, depending on whether the confidence is placed entirely in the data or in the
      prior proposal, respectively.



    It can be demonstrated that the posterior estimate of F(t), F̂n(t), after sampling n data,
is given by

                            F̂n(t) = pn F0(t) + (1 − pn) Fn(t)                              (2.33)

where Fn(t) is the empirical distribution function and pn = M/(M + n).
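
    A minimal Python sketch of equation (2.33), with a hypothetical prior proposal F0 and confidence
M, is:

    def dp_posterior_cdf(t, data, F0, M):
        # Equation (2.33): mixture of the prior proposal F0 and the empirical CDF Fn
        n = len(data)
        Fn = sum(1 for x in data if x <= t) / n
        pn = M / (M + n)
        return pn * F0(t) + (1 - pn) * Fn

    # Hypothetical usage: standard uniform prior proposal F0(t) = t and confidence M = 5
    sample = [0.1, 0.4, 0.42, 0.7, 0.9, 0.95]
    print(dp_posterior_cdf(0.5, sample, F0=lambda t: t, M=5.0))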


    More detailed information about the nonparametric approach and how Dirichlet processes are
used can be found in [Mull04] or [Gosh03].





    With this approach, not only is the parametric limitation concerning the probability
model of the variable under study avoided, since no distributional hypothesis is required, but it also allows us to
give a quantified importance to the engineer's prior knowledge, depending on the
confidence in the certainty of that knowledge.




Chapter 3

Posterior Simulation

3.1 Introduction

A practical problem with Bayesian inference is the difficulty of summarizing realistically complex
posterior distributions. In most practical problems, posterior densities will not take the form of any
well-known and understood density, so summary statistics, such as the posterior mean and variance of
parameters of interest, will not be analytically available. It is at this point where the importance of
Bayesian computation arises and computational tools are required to gain meaningful inference
from the posterior distribution. Its importance is such that the computing revolution of the last 20
years has led to a blossoming of Bayesian methods in many fields such as Econometrics, Ecology or
Health.


    In this regard, the most important simulation methods are the Markov chain Monte Carlo
methods (MCMC). MCMC methods date from the original work of [Metr53], who were interested
in methods for the efficient simulation of the energy levels of atoms in a crystalline structure. The
original idea was subsequently generalized by [Hast70], but its true potential was not fully realized
within the statistical literature until [Gelf90] demonstrated its application to the estimation of inte-
grals commonly occurring in the context of Bayesian statistical inference.


    As [Berg05] points out, the underlying principle is simple: if one wishes to sample randomly from
a specific probability distribution, then design a Markov chain whose long-time equilibrium is that
distribution, write a computer program to simulate the Markov chain, run it for a time long enough
to be confident that approximate equilibrium has been attained, then record the state of the Markov




chain as an approximate draw from equilibrium.


    The technique has been developed intensively, in different fields and with rather different emphases:
in the computer science community concerned with the study of random algorithms (where the em-
phasis is on whether the resulting algorithm scales well with increasing size of the problem), in the
spatial statistics community (where one is interested in understanding what kinds of patterns arise
from complex stochastic models), and also in the applied statistics community (where it is applied
largely in Bayesian contexts, enabling researchers to formulate statistical models which would other-
wise be resistant to effective statistical analyses).


    The development of the theoretical work also benefits the development of statistical applications.
The MCMC simulation techniques have been applied to develop practical statistical inferences for
almost all problems in (bio) statistics, for example, the problems in longitudinal data analysis, im-
age analysis, genetics, contagious disease epidemics, random spatial pattern, and financial statistical
models such as GARCH and stochastic volatility.


    The simplicity of the underlying principle of MCMC is a major reason for its success. However,
a substantial complication arises as the underlying target problem becomes more complex; namely,
how long should one run the Markov chain so as to ensure that it is close to equilibrium? According to
[Gelm04], n = 100 independent samples should be enough for reasonable posterior summaries,
but in some cases more samples are needed to ensure more accuracy.




3.2 Markov chains

The essential theory required to develop Monte Carlo methods based on Markov chains is pre-
sented here. The most fundamental result is that certain Markov chains converge to a unique invariant
distribution, and can be used to estimate expectations with respect to this distribution. But in order to
reach this conclusion, some concepts need to be defined first.


    A Markov chain is a series of random variables, X0, . . . , Xn, also called a stochastic process, in
which only the value of Xn−1 influences the distribution of Xn. Formally:


              P (Xn = xn |X0 = x0 , . . . , Xn−1 = xn−1 ) = P (Xn = xn |Xn−1 = xn−1 )              (3.1)



where the Xn−1 have a common range called the state space of the Markov chain.


    The common language used to refer to the different situations in which a Markov chain can be found is
the following. If Xn = i, it is said that the chain is in state i at step n or that it has the value
i at step n. This language confers on the chain a certain dynamic view, which is corroborated by the
main tool used to study it: the transition probabilities P(Xn+1 = j|Xn = i), which are represented by the
transition matrix P = (Pij) with Pij = P(Xn+1 = j|Xn = i). This shows the probability
of moving from state i to state j.


    Due to the fact that in the most interesting applications Markov chains are homogeneous, the transi-
tion matrix can be defined from the initial probability, P0 = P(X1 = j|X0 = i). In this regard, a
Markov chain Xt is homogeneous if P(Xn+1 = j|Xn = i) = P(X1 = j|X0 = i) for all n, i, j.


    Furthermore, using the Chapman-Kolmogorov equation, it can be shown that, given the transition
matrices P and, for step n, Pn of a homogeneous Markov chain, then Pn = Pⁿ.


    On the other hand, we will see the concepts of invariant or stationary distribution, ergodicity and
irreducibility, which are indispensable to reach the main result. It will be assumed that Xt is a ho-
mogeneous Markov chain.


    Then, a vector π is an invariant distribution of the chain Xt if it satisfies:

   a) πj ≥ 0 with Σj πj = 1.

   b) π = πP.

    That is, a stationary distribution over the states of a Markov chain is one that persists forever once
it is reached.


    The concept of an ergodic state requires making other definitions clear, such as recurrence and aperi-
odicity:

    • The state i is recurrent if P(Xn = i for some n ≥ 1 | X0 = i) = 1. Otherwise, it is transient.
       Moreover, i will be positive recurrent if the expected (average) return time is finite, and null
       recurrent if it is not.


    • The period of a state i, denoted by di, is defined as di = gcd(n : [Pn]ii > 0). The state i is
       aperiodic if di = 1, or periodic if it is greater.



    Then a state is ergodic if it is positive recurrent and aperiodic. The last concept to define is
irreducibility. A set of states C ⊆ S, where S is the set of all possible states, is irreducible if for all
i, j ∈ C:

    • i and j have the same period.

    • i is transient if and only if j is transient.

    • i is null recurrent if and only if j is null recurrent.

    Now, having all these concepts in mind, we can know whether a Markov chain has a stationary distribu-
tion with the next lemma:

Lemma 3.2.1. Let Xt be a homogeneous and irreducible Markov chain. The chain will have exactly one
stationary distribution if, and only if, all the states are positive recurrent. In that case, it will have
entries given by πi = µi⁻¹, where µi denotes the expected return time of state i.

    The relation with the long-time behaviour is given by this other lemma:

Lemma 3.2.2. Let Xt be a homogeneous, irreducible and aperiodic Markov chain. Then

                        [Pn]ij → 1/µj    for all i, j ∈ S    as n → ∞                       (3.2)


3.3 Monte Carlo Integration

Monte Carlo integration estimates the integral E[g(θ)] by obtaining samples θt, t = 1, . . . , n from
the posterior distribution p(θ|y) and averaging

                                E[g(θ)] ≈ (1/n) ∑ₜ₌₁ⁿ g(θt)                                 (3.3)

where the function g(θ) represents the function of interest to estimate. Note that the draws
θt, t = 1, . . . , n need not be independent: it is enough that they come from a Markov chain whose
stationary distribution is p(θ|y).
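
    A minimal Python sketch of this estimator, using a target with a known answer so the
approximation can be checked, is:

    import random

    def monte_carlo(g, draws):
        # Equation (3.3): average g over the posterior draws
        return sum(g(t) for t in draws) / len(draws)

    # Hypothetical check: with theta ~ N(0, 1), E[theta^2] = 1 exactly
    draws = [random.gauss(0.0, 1.0) for _ in range(100000)]
    print(monte_carlo(lambda t: t ** 2, draws))   # ~= 1.0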






3.4 Gibbs sampler

In many models, it is not easy to draw directly from the posterior distribution p(θ|y). However, if the
parameter θ is partitioned into p blocks as θ = (θ1, . . . , θp), then the
full conditional posterior distributions, p(θ1|y, θ2, . . . , θp), . . . , p(θp|y, θ1, . . . , θp−1), may be sim-
ple to draw from. For instance, in the Normal linear regression model
it is convenient to set p = 2, with θ1 = β and θ2 = σ², and the full conditional distributions would be
p(θ1 = β|y, θ2 = σ²) and p(θ2 = σ²|y, θ1 = β), which are very useful in the Normal independent
model which will be explained later.


    The Gibbs sampler is defined by iterative sampling from each of those p conditional distributions:

   1. Set a starting value, θ⁰ = (θ2⁰, . . . , θp⁰).

   2. Take random draws:
      - θ1¹ from p(θ1|y, θ2⁰, . . . , θp⁰)
      - θ2¹ from p(θ2|y, θ1¹, θ3⁰, . . . , θp⁰)
      ...
      - θp¹ from p(θp|y, θ1¹, . . . , θp−1¹)

   3. Repeat step 2 as necessary.

   4. Discard those draws affected by the starting value θ⁰ = (θ2⁰, . . . , θp⁰) (the burn-in), and
      average the rest of the draws applying Monte Carlo integration.



    For instance, in the Normal regression model we would have:

   1. Set a starting value, θ⁰ = (θ2⁰ = (σ²)⁰).

   2. Take random draws:
      - θ1¹ = β¹ from p(θ1 = β|y, θ2⁰ = (σ²)⁰)
      - θ2¹ = (σ²)¹ from p(θ2 = σ²|y, θ1¹ = β¹)

   3. Repeat step 2 as necessary.

   4. Eliminate the draws affected by the starting value and average the rest of the draws applying
      Monte Carlo integration.



    The dropped values, which are affected by the starting point, are called the burn-in. Generally,
any set of values discarded in an MCMC simulation is called the burn-in. The size of the
burn-in period is the subject of current research in MCMC methods.


    As the state of each draw depends on the state of the previous one, the sequence is a Markov
chain. More detailed information can be found in [Chen00], [Mart01] or [Rossi06].
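
    A minimal Python sketch of a two-block Gibbs sampler may make the steps concrete. It targets a
Normal model with unknown mean and variance under the standard non-informative prior
p(µ, σ²) proportional to 1/σ² (a simpler setting than the project's conjugate prior), so both full
conditionals are standard distributions:

    import random

    def gibbs_normal(y, n_iter=5000, burn_in=500):
        n = len(y)
        ybar = sum(y) / n
        sigma2 = 1.0                                   # step 1: starting value (sigma^2)^0
        draws = []
        for _ in range(n_iter):                        # steps 2-3: cycle the conditionals
            mu = random.gauss(ybar, (sigma2 / n) ** 0.5)            # mu | sigma^2, y
            sse = sum((yi - mu) ** 2 for yi in y)
            sigma2 = 1.0 / random.gammavariate(n / 2.0, 2.0 / sse)  # sigma^2 | mu, y
            draws.append((mu, sigma2))
        return draws[burn_in:]                         # step 4: discard the burn-in

    y = [random.gauss(10.0, 2.0) for _ in range(200)]  # hypothetical data
    draws = gibbs_normal(y)
    print(sum(m for m, _ in draws) / len(draws))       # posterior mean of mu, ~= 10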




3.5 Metropolis-Hastings sampler and its special cases

3.5.1 Metropolis-Hastings sampler

The Metropolis-Hastings method is adequate for simulating models that are not conditionally conjugate.
Furthermore, it can be combined with the Gibbs sampler to simulate posterior distributions where
some of the conditional posterior distributions are easy to sample from and others are not. Like
the algorithms explained above, it is based on formulating a Markov chain, but using a proposal
distribution, q(·|θt), which depends on the current state θt, to generate a new proposed sample θ∗.
This proposal is accepted as the next state with probability given by

                α(θt, θ∗) = min{1, [p(θ∗|y)q(θt|θ∗)] / [p(θt|y)q(θ∗|θt)]}                   (3.4)

    If the point θ∗ is not accepted, then the chain does not move and θt+1 = θt. According to
[Mart01], the steps to follow are:

   1. Initialize the chain to θ0 and set t=0.

   2. Generate a candidate point θ∗ from q(.|θt ).

   3. Generate U from a uniform (0,1) distribution.

   4. If U ≤ α(θt , θ∗ ) then set θt+1 = θ∗ , else set θt+1 = θt .

   5. Set t = t + 1 and repeat steps 2 through 5.

   6. Take the average of the draws g(θ1 ), . . . , g(θn )

    Note that it is not only recommendable but essential that the proposal distribution
q(·|θt) be easy to sample from.
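
    A minimal Python sketch of these steps, using a Normal random-walk proposal (which is
symmetric, so q cancels in the acceptance ratio; this anticipates the special case of Section 3.5.3), is:

    import math, random

    def metropolis_hastings(log_post, theta0, n_iter=10000, step=1.0):
        theta = theta0                                   # step 1: initialize the chain
        draws = []
        for _ in range(n_iter):
            proposal = theta + random.gauss(0.0, step)   # step 2: candidate theta*
            u = random.random()                          # step 3: U ~ Uniform(0, 1)
            log_alpha = log_post(proposal) - log_post(theta)   # symmetric q cancels
            if u < math.exp(min(0.0, log_alpha)):        # step 4: accept or stay
                theta = proposal
            draws.append(theta)                          # step 5: iterate
        return draws                                     # step 6: average as required

    # Hypothetical target: a standard Normal posterior, log p(theta|y) = -theta^2/2 + const
    draws = metropolis_hastings(lambda t: -0.5 * t * t, theta0=0.0)
    print(sum(draws) / len(draws))   # ~= 0.0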





    There are some special cases of this method. The most important ones are briefly explained below. In
addition, it can be shown, according to [Gelm04], that the Gibbs sampler is another special case of
the Metropolis-Hastings algorithm in which the proposed point is always accepted.



3.5.2 Metropolis sampler

This method is a particular case of the Metropolis-Hastings sampler where the proposal distribution
has to be symmetric. That is,


                                          q(θ∗ |θt ) = q(θt |θ∗ )                             (3.5)

    for all θ∗ and θt. Then, the probability of accepting the new point is

                        α(θt, θ∗) = min{1, p(θ = θ∗|y) / p(θ = θt|y)}                       (3.6)
    The same procedure seen in the Metropolis-Hastings sampler has to be followed.


3.5.3 Random-walk sampler

This special case refers to a proposal distribution of the form


                                q(θ∗|θt) = q(|θt − θ∗|)                                     (3.7)

    And the candidate point is θ∗ = θt + z, where z is called the increment random variable from q.
Then, the probability of accepting the new point is

                        α(θt, θ∗) = min{1, p(θ = θ∗|y) / p(θ = θt|y)}                       (3.8)
    The same procedure seen in the Metropolis-Hastings sampler has to be followed.


3.5.4 Independence sampler

The last variation has a proposal distribution such that


                                           q(θ∗ |θt ) = q(θ∗ )                                (3.9)

    So it does not depend on θt . Then, the probability of accepting the new point is






        α(θt, θ∗) = min{1, [p(θ∗|y)q(θt)] / [p(θt|y)q(θ∗)]} = min{1, w(θ∗)/w(θt)}           (3.10)

    where

                                        w(θ) = p(θ|y)/q(θ)                                  (3.11)

    It is important to remark that, for this method to work well, the proposal distribution q should
be very similar to the posterior distribution p(θ|y).


    The same procedure seen in the Metropolis-Hastings sampler has to be followed.


3.6 Importance sampling

Importance sampling is a variance reduction technique that can be used in the Monte Carlo method.
The idea behind this method is that certain values of the input random variables in a simulation have
more impact on the parameter being estimated than others. So instead of taking a simple average,
importance sampling takes a weighted average.


    Let q(θ) be a density from which it is easy to obtain random draws θ⁽ˢ⁾ for s = 1, . . . , S. Then q(θ)
is called the importance function, and the importance sampling estimator can be defined: the function

        ĝS = [∑ₛ₌₁ˢ w(θ⁽ˢ⁾) g(θ⁽ˢ⁾)] / [∑ₛ₌₁ˢ w(θ⁽ˢ⁾)],   where w(θ⁽ˢ⁾) = p(θ = θ⁽ˢ⁾|y) / q(θ = θ⁽ˢ⁾),

converges to E[g(θ)|y] as S → ∞.

    In fact, w(θ⁽ˢ⁾) can be computed as w(θ⁽ˢ⁾) = p∗(θ|y)/q∗(θ), where the starred densities are
only proportional to the original ones.
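
    A minimal Python sketch of this self-normalized estimator, with a hypothetical target and
importance function chosen so the true answer is known, is:

    import math, random

    def importance_sampling(g, log_p, log_q, sampler, S=100000):
        # Weighted average with w = p(theta|y) / q(theta), both densities
        # known only up to normalizing constants
        draws = [sampler() for _ in range(S)]
        weights = [math.exp(log_p(t) - log_q(t)) for t in draws]
        return sum(w * g(t) for w, t in zip(weights, draws)) / sum(weights)

    # Hypothetical check: target N(0, 1), importance function N(0, 2); E[theta^2 | y] = 1
    est = importance_sampling(
        g=lambda t: t ** 2,
        log_p=lambda t: -0.5 * t * t,             # log target, up to a constant
        log_q=lambda t: -t * t / 8.0,             # log importance function, up to a constant
        sampler=lambda: random.gauss(0.0, 2.0))
    print(est)   # ~= 1.0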


    For more information and details about Markov chain Monte Carlo methods and their application,
the reader is referred to [Chen00], [Gilk95], [Berg05] and [Kend05].




Chapter 4

Sensitivity Analysis

4.1 Introduction

There will be many times when the researcher, having selected a model, wants to consider the pos-
sibility of choosing another model, or simply to compare the two. Some tool is then necessary to
help him compare both models and select one of them. This will also be useful for variable
selection in regression models. In this section, Bayesian model comparison is briefly
discussed, highlighting those methods which will be more useful.


    In the Bayesian field, common methods for model comparison are based on the following: sepa-
rate estimation, comparative estimation and simultaneous estimation.


    Comparative estimation is based on distance measures such as entropy distance, and the underly-
ing idea is that the more parsimonious model may be preferred between two models whose distance
between their posterior or posterior predictive distributions is sufficiently small.


    Simultaneous model estimation lets us compare many models at the same time; the main meth-
ods are reversible jump MCMC (RJMCMC) and birth and death MCMC (BDMCMC).


    Separate estimation compares two models which are not necessarily nested, and the most used tools are the
posterior predictive distributions and the posterior probability of the model. Since the methods of
this type are the most widely accepted, we will explain some of them, highlighting the
most important ones.




4.2 Bayes Factor

This is probably the dominant method of Bayesian model testing. It is the analogue of likelihood ratio
tests within the frequentist framework, and the basic intuition is that prior and posterior information
are combined in a ratio that provides evidence in favour of one model specification versus another.


    Let us suppose we have two models to compare, M1 and M2. Let p(M1) and p(M2) be the
prior probabilities of models M1 and M2, respectively, and p(M1|y) and p(M2|y) their posterior
probabilities. Then the Bayes Factor is:

                B(y) = p(y|M1)/p(y|M2) = [p(M1|y)p(M2)] / [p(M2|y)p(M1)]                    (4.1)

    This means that the Bayes Factor prefers the model for which the marginal likelihood of the
data, namely p(y|Mi), is larger. Therefore, the value of the factor gives evidence of the preference
between the two models.
    According to [Jeff61], the following interpretation is suggested:



                            Bayes Factor              Interpretation

                            B(y) < 1/10               Strong evidence for M2
                            1/10 < B(y) < 1/3         Moderate evidence for M2
                            1/3 < B(y) < 1            Weak evidence for M2
                            1 < B(y) < 3              Weak evidence for M1
                            3 < B(y) < 10             Moderate evidence for M1
                            B(y) > 10                 Strong evidence for M1


                                  Table 4.1: Bayes Factor Interpretation



                                                     29
4. Sensitivity Analysis


    The marginal likelihood usually involves an integral which can be analytically evaluated only for
some special cases. So, while Bayes Factors are rather intuitive, they are often quite difficult or even
impossible to calculate from a practical point of view. Because of this, there are other alternatives to
this method.
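
    In conjugate settings, however, the marginal likelihood is available in closed form and the Bayes
Factor can be computed exactly. A minimal Python sketch for a binomial likelihood with two
competing (hypothetical) Beta priors is:

    from math import lgamma, exp

    def log_betafn(a, b):
        # log of the Beta function via log-gamma, to avoid overflow
        return lgamma(a) + lgamma(b) - lgamma(a + b)

    def log_marginal(y, n, alpha, beta):
        # log p(y | M) for a Bin(y | n, theta) likelihood and Beta(alpha, beta) prior,
        # omitting the binomial coefficient, which cancels in the ratio
        return log_betafn(alpha + y, beta + n - y) - log_betafn(alpha, beta)

    y, n = 7, 20                                   # hypothetical data
    B = exp(log_marginal(y, n, 1.0, 1.0)           # M1: uniform Beta(1, 1) prior
            - log_marginal(y, n, 30.0, 10.0))      # M2: prior concentrated near theta = 0.75
    print(B)   # > 10: strong evidence for M1, since y/n = 0.35 contradicts M2's prior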


4.3 Alternative Stats to Bayes Factor
Let θ̂ be the posterior mean of the posterior distribution and let us assume that the Bayes estimate for
the parameters θ is approximately equal to the maximum likelihood estimate. Then the following
statistics, some of which are also used in frequentist statistics, can be useful diagnostics:

    • The Likelihood Ratio, which will always favour the unrestricted model, and where the ratio is:


                    Ratio = −2[log(p(θ̂Restricted|y)) − log(p(θ̂Full|y))]                    (4.2)

       The ratio is distributed as a χ²p, where p is the number of parameters, including the intercept.

    • The Akaike Information Criterion (AIC), where a ratio between AIC1 (AIC for M1) and AIC2
       (AIC for M2) less than 1 indicates that M1 is better. This method does not require the models to be
       nested, though it tends to favour more complicated models. The statistic is:


                                    AIC = −2 log(p(θ̂|y)) + 2p                              (4.3)

       where p is the number of parameters, including the intercept. It usually performs better than the
       previous one.

    • The Bayesian Information Criterion (BIC), which is also known as the Schwarz Criterion (SC),
       Schwarz Information Criterion (SIC) or Schwarz Bayesian Criterion (SBC). As with
       the AIC, this method can be used for non-nested models. The BIC is:


                                    BIC = −2 log(p(θ̂|y)) + p log(n)                        (4.4)

       where p is the number of parameters, including the intercept, and n is the sample size. Given any
       two estimated models, the model with the lower value of BIC is the one to be preferred. Since
       this method promotes model parsimony by penalizing models with increased complexity
       (larger p) relative to the sample size n, it may be preferred to the AIC.



    • The Deviance Information Criterion (DIC), a newer statistic introduced by the devel-
       opers of the WinBugs software, who explain it in detail in [Spie03]. The main and
       most important difference with the previous methods is that this is not an approximation of the
       Bayes Factor. It is a hierarchical-modelling generalization of the AIC and the BIC, and it is
       particularly useful when the posterior distributions have been obtained by simulation. The DIC
       is:

                    DIC = −(4/L) ∑ₗ₌₁ᴸ log(p(y|θˡ)) + 2 log(p(y|θ̂))                         (4.5)

       where θˡ is the draw obtained in iteration l of the posterior simulation. This method also
       penalizes higher-dimensional models, and it may be preferred to the previous ones, mainly in the
       linear models context. (A computational sketch of these criteria follows this list.)
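
    A minimal Python sketch of the three criteria, assuming a user-supplied log-likelihood function,
posterior draws and a posterior point estimate (all hypothetical names), is:

    import math

    def aic(log_lik, theta_hat, p):
        # Equation (4.3), evaluating the log density at the point estimate
        return -2.0 * log_lik(theta_hat) + 2.0 * p

    def bic(log_lik, theta_hat, p, n):
        # Equation (4.4)
        return -2.0 * log_lik(theta_hat) + p * math.log(n)

    def dic(log_lik, draws, theta_hat):
        # Equation (4.5): average over the L posterior draws plus 2 log p(y | theta_hat)
        mean_term = -4.0 / len(draws) * sum(log_lik(t) for t in draws)
        return mean_term + 2.0 * log_lik(theta_hat)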


4.4 Highest Posterior Density Intervals

All the techniques mentioned above typically require the elicitation of informative priors. However,
there may be Bayesians who are interested in doing model comparison with a non-informative prior.
In such a case, there are other techniques which can be used. Since the most common one in regression
analysis is the Highest Posterior Density Interval (HPDI), we will only explain this method and
refer the interested reader to the citations below.


    Before defining the idea of the HPDI, it is necessary to make the concept of a credible set clear. Let us
assume that ω is the region over which the coefficients β are defined. Then, C ⊆ ω is a 100(1 − α)%
credible set with respect to β if:


                                         p(β ∈ C|y) = 1 − α                                        (4.6)

    Since there are commonly numerous credible intervals, it is usual to choose the one with the smallest
area, namely the Highest Posterior Density Interval.


    Formally, a 100(1 − α)% highest posterior density interval for β is a 100(1 − α)% credible inter-
val for β with the property that it has a smaller area than any other 100(1 − α)% credible interval for β.






    This is the Bayesian analogue of confidence intervals within the frequentist framework, but now the
meaning is more in line with common sense.


    More information about all these methods and other variants of the Bayes Factor can be found in
a more detailed way in [Aitk97], [Berg98], [Chen00], [Cong06] or [Koop03].


4.5 Model Comparison Summary

A model comparison summary can be found in Tables 4.2 and 4.3 where the mark symbols mean:

    • * Good

    • ** Better

    • *** Still better

    • **** Probably the best








   Method             Formulae                                                Interpretation                          Mark

   Bayes Factor       B(y) = p(y|M1)/p(y|M2)                                  B(y) < 1/10: strong evidence for M2     *
                                                                              1/10 < B(y) < 1/3: moderate for M2
                                                                              1/3 < B(y) < 1: weak evidence for M2
                                                                              1 < B(y) < 3: weak evidence for M1
                                                                              3 < B(y) < 10: moderate for M1
                                                                              B(y) > 10: strong evidence for M1

   Likelihood Ratio   Ratio = −2[log p(β̂Restricted|y) − log p(β̂Full|y)]      Ratio > χ²p: reject the restricted      *
                                                                              model; Ratio < χ²p: accept it

   AIC                AIC = −2 log p(β̂|y) + 2p                               AIC1/AIC2 < 1: M1 is better than M2     **
                                                                              AIC1/AIC2 > 1: M2 is better than M1


                                      Table 4.2: Sensitivity Summary I




   Method   Formulae                                              Interpretation                          Mark

   BIC      BIC = −2 log p(β̂|y) + p log(n)                       BIC1/BIC2 < 1: M1 is better than M2     ***
                                                                  BIC1/BIC2 > 1: M2 is better than M1

   DIC      DIC = −(4/L) ∑ₗ₌₁ᴸ log p(y|βˡ) + 2 log p(y|β̂)        DIC1/DIC2 < 1: M1 is better than M2     ****
                                                                  DIC1/DIC2 > 1: M2 is better than M1

   HPDI     p(β ∈ C|y) = 1 − α with the smallest area             There is a probability of               ****
                                                                  100(1 − α)% of β being in the
                                                                  region C


                                      Table 4.3: Sensitivity Summary II
Chapter 5

Regression Analysis

5.1 Introduction

Regression analysis is a statistical tool for the investigation of relationships between variables:
it models the relationship between one or more random variables y, called the response variables,
and one or more independent variables x, called the predictors. That is, it allows us to examine
the conditional distribution of y given x, denoted by p(y|β, x), when the n observations (xi, yi) are
exchangeable.


    Applications of regression analysis exist in almost every field. In economics, the dependent vari-
able might be the Ibex 35 index and the independent variables the Dow Jones and FTSE 100 indexes.
In political science, the dependent variable might be a state's level of welfare spending and the inde-
pendent variables measures of public opinion and institutional variables that would cause the state to
have higher or lower levels of welfare spending. In sociology, the dependent variable might be a mea-
sure of the social status of various occupations and the independent variables characteristics of the
occupations (pay, qualifications, etc.). In psychology, the dependent variable might be an individual's
racial tolerance as measured on a standard scale, with indicators of social background as indepen-
dent variables. In education, the dependent variable might be a student's score on an achievement test
and the independent variables characteristics of the student's family, teachers, or school.


    Before explaining Bayesian regression, the classical regression model will be reviewed,
focusing on those parts which are useful for the former.






5.2 Classical Regression Model

The simplest version of this model is the Normal linear model, where the variable y given X has a
Normal distribution whose mean is a linear function of X:


            E(yi|β, X) = β0 + β1 xi1 + · · · + βp xip        for all i = 1, . . . , n.      (5.1)

    Even though the mean of y is a linear function of X, the real and the observed data do not
coincide exactly, and this is due to a random error, namely ε, so the appropriate way to reach a
probabilistic linear model is through


            yi = β0 + β1 xi1 + · · · + βp xip + εi        for all i = 1, . . . , n.         (5.2)

where εi is the random error term, which has a Normal distribution with mean 0 and variance
σ². Due to the fact that the random variable yi is the result of the addition of a constant (the mean)
and a random variable with a Normal distribution, yi follows a Normal distribution:


            yi ∼ N(β0 + β1 xi1 + · · · + βp xip, σ²)      for all i = 1, . . . , n          (5.3)

    When the variance of y given X and β is assumed to be constant over all observations, the model
is called the ordinary linear regression model.


    In matrix notation, the Normal linear model can be written as

                                  Y = Xβ + ε                                                               (5.4)

and

                                  Y ∼ N(Xβ, σ²I)                                                           (5.5)

where Y = (y_1, y_2, . . . , y_n)', X is the n × (p+1) matrix whose i-th row is (1, x_{i1}, . . . , x_{ip}), β = (β0, β1, . . . , βp)', ε = (ε_1, ε_2, . . . , ε_n)' and I is the identity matrix.
It can be shown that the ordinary least squares estimate of β, namely β̂, is

                                  β̂ = (X'X)^{-1} X'Y = (β̂0, β̂1, . . . , β̂p)'                               (5.6)
where

    X'X = \begin{pmatrix}
            n & \sum_{i=1}^{n} x_{i1} & \cdots & \sum_{i=1}^{n} x_{ik} \\
            \sum_{i=1}^{n} x_{i1} & \sum_{i=1}^{n} x_{i1}^2 & \cdots & \sum_{i=1}^{n} x_{i1} x_{ik} \\
            \vdots & \vdots & \ddots & \vdots \\
            \sum_{i=1}^{n} x_{ik} & \sum_{i=1}^{n} x_{ik} x_{i1} & \cdots & \sum_{i=1}^{n} x_{ik}^2
          \end{pmatrix}
    \qquad
    X'Y = \begin{pmatrix} \sum_{i=1}^{n} y_i \\ \sum_{i=1}^{n} x_{i1} y_i \\ \vdots \\ \sum_{i=1}^{n} x_{ik} y_i \end{pmatrix}

    As well, it can be shown that

                                  E(β̂) = β                                                                 (5.7)

    Furthermore, the variances of β̂ are proportional to the elements of the matrix (X'X)^{-1}, denoted by C, which multiplied by the constant σ² gives the covariance matrix. The elements of the diagonal of that matrix are the variances of the estimates β̂_j:

                                  Var(β̂_j) = σ² C_{jj}        for all j = 0, 1, . . . , p.                 (5.8)

where C = (X'X)^{-1}.


    Likewise, the classical estimate of σ² is given in terms of the sum of squared errors, SSE = Σ_{i=1}^{n}(y_i − ŷ_i)², and equals the mean squared error:

                  σ̂² = MSE = SSE/(n − p) = (Y − Xβ̂)'(Y − Xβ̂)/(n − p) = (Y'Y − β̂'X'Y)/(n − p)              (5.9)

where n is the number of observations and p corresponds to the number of parameters β.


    Regarding the individual regression coefficients β_j, it will sometimes be of interest to test hypotheses about them in order to evaluate the potential value of each regressor variable in the model. The statistic to use in these cases is

                                  T_0 = β̂_j / √(σ̂² C_{jj})                                                (5.10)

where C_{jj} is the element of the diagonal of the matrix (X'X)^{-1} corresponding to β̂_j. So the null hypothesis β_j = 0 will be rejected if |T_0| > t_{n−p, α/2}.




    Finally, once the model has been estimated and validated, one of its most important applications consists of making predictions about the response variable Y when a new explanatory vector X* is observed. In this case, a point estimate would be

                                  Ŷ* = X*β̂                                                                (5.11)

and a confidence interval for this future observation will be

                                  Ŷ* ± t_{n−p, α/2} √(σ̂²(1 + X*'(X'X)^{-1}X*))                            (5.12)


where

                                  X* = [x*_1  x*_2  . . .  x*_k]                                           (5.13)

    These results can be found in a more detailed way in [Mont02], [Zamo01] or [Maté95].
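    As a quick numerical illustration of (5.6)-(5.12), the following sketch (ours, not part of the original text; it uses simulated data, and names such as beta_hat are assumptions) computes the OLS estimates, the t statistics of (5.10) and a 95% prediction interval with numpy and scipy:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    # Simulated data: n observations, an intercept plus k regressors
    n, k = 100, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
    y = X @ np.array([1.0, 0.5, -2.0, 2.0]) + rng.normal(scale=5.0, size=n)

    p = X.shape[1]                          # number of parameters beta
    C = np.linalg.inv(X.T @ X)              # C = (X'X)^{-1}
    beta_hat = C @ X.T @ y                  # equation (5.6)

    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - p)    # equation (5.9)

    t0 = beta_hat / np.sqrt(sigma2_hat * np.diag(C))      # equation (5.10)

    # Point prediction and 95% interval, equations (5.11)-(5.12)
    x_star = np.array([1.0, 0.2, -0.1, 0.3])
    y_star = x_star @ beta_hat
    half = stats.t.ppf(0.975, n - p) * np.sqrt(sigma2_hat * (1 + x_star @ C @ x_star))
    print(y_star - half, y_star + half)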


    To better understand all that has been said above, let us see a practical application to the stock markets. Suppose we are interested in investigating the relationship between the Ibex 35 index and the Dow Jones, FTSE 100 and DAX indexes of the previous day. For this purpose, we have the daily points (taken as the mean of the daily maximum and minimum) from January to October 2006, that is, the first ten months of 2006.


    The model to adjust is:

                  IBEX35_t = β1 DowJones_{t−1} + β2 FTSE100_{t−1} + β3 DAX_{t−1} + ε_t

where

                                  ε_t ∼ N(0, σ²)

    The estimates β̂ are calculated according to what was said before, resulting in:

                                  (β̂1, β̂2, β̂3)' = (1.0147, −2.0085, 2.1082)'




    The estimate of the variance σ² is:

                                  σ̂² = 332.18²

    So the model calculated is:

                  IBEX35_t = 1.0147 × DowJones_{t−1} − 2.0085 × FTSE100_{t−1} + 2.1082 × DAX_{t−1} + ε_t

where

                                  ε_t ∼ N(0, 332.18²)

    This indicates that when the Dow Jones or the DAX goes up, the Ibex 35 will tend to increase the next day too. However, when the FTSE 100 rises, the Ibex 35 will tend to decrease the next day.


    If we use this model to predict the value which the Ibex 35 will have on November 1st, when the previous day's Dow Jones, FTSE 100 and DAX values are known, we have:

                  IBEX35_t = 1.0147 × 12067 − 2.0085 × 6155 + 2.1082 × 6287 = 13137
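    The arithmetic can be checked directly (a sketch of ours; with the rounded coefficients printed above the sum comes out at about 13136.3, so the 13137 in the text reflects the unrounded estimates):

    beta_hat = [1.0147, -2.0085, 2.1082]    # rounded estimates from the text
    x_prev = [12067, 6155, 6287]            # Dow Jones, FTSE 100 and DAX values
    print(sum(b * x for b, x in zip(beta_hat, x_prev)))   # approx. 13136.3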

    Finally, a comparison between the multiple and the simple Normal linear regression models is shown in Table 5.1, indicating the different parameters to use in each case. The goal of this comparison is to make clear that simple Normal regression is a particular case of multiple Normal regression in which there is only one regressor variable or predictor.


5.3 The Bayesian Approach

The main difference between the classical and the Bayesian approaches to regression analysis is that the latter treats the parameters as random variables which have a distribution. The aim of the Bayesian approach is to make inferences through the posterior distribution, based on a prior distribution for the parameters β and σ² of the Normal linear model, and to provide a predictive distribution for the model's predictions.


    As was said in the preceding section, and according to [Rossi06], the Normal linear regression model is given by:





                Multiple Normal Linear Regression                     Simple Normal Linear Regression

 Function       y_i = β0 + β1 x_{i1} + · · · + βp x_{ip} + ε_i        y = β0 + β1 x + ε

 Mean           µ_i = β0 + β1 x_{i1} + · · · + βp x_{ip}              µ = β0 + β1 x

 Variance       σ²                                                    σ²

 Model          Y ∼ N(µ, σ²I)                                         Y ∼ N(µ, σ²)

 β̂              β̂ = (X'X)^{-1}X'Y

 E[β̂]           β                                                     β

 Var(β̂)         Var(β̂_j) = σ² C_{jj}                                  Var(β̂0) = σ²(1/n + x̄²/Σ_{i=1}^{n}(x_i − x̄)²)
                                                                      Var(β̂1) = σ²/Σ_{i=1}^{n}(x_i − x̄)²

 σ̂²             σ̂² = (Y'Y − β̂'X'Y)/(n − p)                            σ̂² = Σ_{i=1}^{n}(y_i − ŷ_i)²/(n − 2)

 Prediction     Ŷ_f ± t_{n−p,α/2} √(σ̂²(1 + X_f'(X'X)^{-1}X_f))        Ŷ_f ± t_{n−2,α/2} √(σ̂²(1 + 1/n + (x_f − x̄)²/Σ_{i=1}^{n}(x_i − x̄)²))

 Limitation     Only applies to data in the same range                Only applies to data in the same range
                as the sampled data                                   as the sampled data


                        Table 5.1: Multiple and Simple Regression Comparison




                                  Y = Xβ + ε                                                               (5.14)

where

                                  ε ∼ N(0, σ²I)                                                            (5.15)

    So

                                  Y|X, β, σ² ∼ N(Xβ, σ²I)                                                  (5.16)

    For simplicity of notation, we will not explicitly include X in our conditioning set for the regression model.


    Using the definition of the multivariate Normal density, the likelihood function is obtained:

                  p(Y|β, σ²) = ((σ²)^{−n/2}/(2π)^{n/2}) exp(−(1/(2σ²)) (Y − Xβ)'(Y − Xβ))                  (5.17)
    It will be convenient to write

                                  (Y − Xβ)'(Y − Xβ)                                                        (5.18)

in terms of the ordinary least squares estimators

                                  v = n − p                                                                (5.19)
                                  β̂ = (X'X)^{-1}X'Y                                                        (5.20)
                                  s² = (Y − Xβ̂)'(Y − Xβ̂)/(n − p)                                           (5.21)

    So

                  (Y − Xβ)'(Y − Xβ) = vs² + (β − β̂)'X'X(β − β̂)                                             (5.22)
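    This identity follows by writing Y − Xβ = (Y − Xβ̂) − X(β − β̂) and expanding: the cross term vanishes because X'(Y − Xβ̂) = X'Y − X'X(X'X)^{-1}X'Y = 0, so

                  (Y − Xβ)'(Y − Xβ) = (Y − Xβ̂)'(Y − Xβ̂) + (β − β̂)'X'X(β − β̂) = vs² + (β − β̂)'X'X(β − β̂)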

    Then

                  p(Y|β, σ²) = (1/(2π)^{n/2}) [(σ²)^{−p/2} exp(−(1/(2σ²))(β − β̂)'(X'X)(β − β̂))] [(σ²)^{−v/2} exp(−vs²/(2σ²))]      (5.23)

    As was said before, n corresponds to the number of observations and p refers to the number of parameters β. This new form of expressing the likelihood function will be more useful for finding a natural conjugate prior distribution, which has the same form as the likelihood.





    The prior distribution for β and σ², denoted by p(β, σ²), can be written in a more convenient way by applying the definition of the joint distribution:

                                  p(β, σ²) = p(β|σ²) p(σ²)                                                 (5.24)

    Note that β and σ² are assumed to be dependent, which will rarely occur in practice. Some authors prefer to work with the error precision, 1/σ² say, instead of the variance σ².


    All this is very similar to what was explained in the Bayesian analysis of the Normal distribution. The first term in brackets in the likelihood function suggests the form of a Normal distribution for the parameter β given σ². So

                  p(β|σ²) ∝ (σ²)^{−p/2} exp(−(1/(2σ²))(β − β0)'V0^{-1}(β − β0))                            (5.25)

and, hence,

                                  β|σ² ∼ N(β0, σ²V0)                                                       (5.26)

    According to [Rossi06], the second term in brackets in the likelihood function suggests the form of an inverse gamma distribution for the parameter σ² (see Appendix A). So

                  p(σ²) ∝ (σ²)^{−(v0/2+1)} exp(−v0 s0²/(2σ²))                                              (5.27)

and, hence,

                                  σ² ∼ Inv-G(v0/2, v0 s0²/2)                                               (5.28)
    Note that there is an extra term (σ²)^{-1} here which is not suggested by the form of the likelihood explained above. This term can be rationalized by viewing the conjugate prior as arising from the posterior of a sample of size v0 with sufficient statistics s0² and β0, formed with the noninformative prior p(β, σ²) ∝ σ^{−2}, which will be briefly explained later.


    So the natural conjugate prior distribution of the parameters β and σ² is:

                  p(β, σ²) ∝ (σ²)^{−((p+v0)/2+1)} exp(−(1/(2σ²))[v0 s0² + (β − β0)'V0^{-1}(β − β0)])       (5.29)

and, hence,

                                  β, σ² ∼ N-Inv-χ²(β0, V0 s0²; v0, s0²)                                    (5.30)

where the prior hyper-parameters β0, V0, v0 and s0² express the knowledge that the researcher has about the problem and his or her confidence in it. Furthermore, the parameter β0 measures the marginal effect of the explanatory variables on the dependent variable. As well, V0 indicates the uncertainty about the prior information and plays the same role as (X'X)^{-1} does in the classical approach; v0 represents the size of a fictitious data set, so it plays a similar role to n, and s0² is an imaginary s² for those fictitious data. In terms of the distribution, β0 and V0σ² represent the location and scale of β, respectively, and v0 and s0² the degrees of freedom and scale of σ², respectively.



    Since a conjugate prior distribution has been used, the posterior distribution will have the same form. That is, the posterior distribution will be a Normal-Scaled Inverse χ² with posterior hyper-parameters β1, V1, v1 and s1². According to [Rossi06] and [Koop03], it can be shown that

                                  β, σ²|y ∼ N-Inv-χ²(β1, V1 s1²; v1, s1²)                                  (5.31)

    The relation between the prior and the posterior hyper-parameters, according to [Koop03], is:

                                  V1 = (V0^{-1} + X'X)^{-1}                                                (5.32)
                                  β1 = V1(V0^{-1}β0 + X'X β̂)                                               (5.33)
                                  v1 = v0 + n                                                              (5.34)
                                  v1 s1² = v0 s0² + v s² + (β̂ − β0)'[V0 + (X'X)^{-1}]^{-1}(β̂ − β0)         (5.35)
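    In code, the updating formulas (5.32)-(5.35) take only a few lines; the following is a minimal sketch (ours, not part of the original text; function and argument names are assumptions):

    import numpy as np

    def conjugate_posterior(X, y, beta0, V0, v0, s0_sq):
        # Posterior hyper-parameters of equations (5.32)-(5.35)
        n, p = X.shape
        XtX = X.T @ X
        beta_hat = np.linalg.solve(XtX, X.T @ y)          # OLS estimate
        v = n - p
        resid = y - X @ beta_hat
        s_sq = resid @ resid / v

        V0_inv = np.linalg.inv(V0)
        V1 = np.linalg.inv(V0_inv + XtX)                  # (5.32)
        beta1 = V1 @ (V0_inv @ beta0 + XtX @ beta_hat)    # (5.33)
        v1 = v0 + n                                       # (5.34)
        d = beta_hat - beta0
        s1_sq = (v0 * s0_sq + v * s_sq
                 + d @ np.linalg.inv(V0 + np.linalg.inv(XtX)) @ d) / v1   # (5.35)
        return beta1, V1, v1, s1_sq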


    As was mentioned in the Bayesian Data Analysis chapter, a measure is needed to summarize the posterior distribution, and this is usually the posterior mean, namely E(β|y). According to what was said in previous chapters, the marginal for β is a multivariate t-distribution (see Appendix A):

                                  β|y ∼ t_{v1}(β1, s1² V1)                                                 (5.36)

where

                                  E(β|y) = β1 = V1(V0^{-1}β0 + X'X β̂)                                      (5.37)

and

                                  Var(β|y) = (v1 s1²/(v1 − 2)) V1                                          (5.38)
    So the posterior mean is a weighted average of the ordinary least squares estimate, β̂, and the prior mean, β0, where the weights are proportional to the data information, X'X, and the importance given to the prior, V0^{-1}, respectively. This should make clear that as the prior variance for β is decreased, greater posterior weight is placed on prior beliefs relative to the data, so the posterior mean will be closer to the prior mean.

    The elements of the diagonal of the matrix (v1 s1²/(v1 − 2)) V1 are the variances of β0, β1, . . . , βp:

                  Var(β_j|y) = (v1 s1²/(v1 − 2)) V1_{jj}        for all j = 0, 1, . . . , p                (5.39)
    Likewise, the marginal posterior for σ² is:

                                  σ²|y ∼ Inv-χ²(v1, s1²)                                                   (5.40)

and, hence,

                                  E(σ²|y) = v1 s1²/(v1 − 2)                                                (5.41)

                                  Var(σ²|y) = 2 v1² s1⁴/((v1 − 2)²(v1 − 4))                                (5.42)

    So, as we increase the amount of fictitious data v0, v1 tends towards v0 and, hence, the posterior estimate of σ² gets closer to s0².


    Tables 5.2 and 5.3 show how the different posterior parameters of interest vary depending on the prior parameters V0 (considering V0 = cI_k) and v0 and on the sample size n.
    Table 5.2 means that if the size of the sample increases towards infinity, the prior information that the researcher provides has very little or almost no importance, as also occurs if the precision of the prior distribution for β decreases (that is, V0 increases) towards 0. The difference between the two cases is that in the former the variance of β is lower than in the latter.

    The number of fictitious data does not seem to affect the posterior mean, but it does affect the posterior variance, increasing it (resp. decreasing it) as the fictitious data increase (resp. decrease).





                      Action          E[β|y]                        Var[β|y]

          n           Increase        Closer to OLS estimates       Closer to 0
                      Decrease        Closer to β0                  Further from 0

          V0          Increase        Closer to OLS estimates       Further from 0
                      Decrease        Closer to β0                  Closer to 0

          ν0          Increase        Not affected                  Increases
                      Decrease        Not affected                  Decreases


                              Table 5.2: Sensitivity analysis of parameter β



    Table 5.3 refers to the parameter σ², and it means that if the fictitious data increase, then the information given by the researcher will have much more weight in the posterior mean of σ² than the real data, and the variance will be lower too. The other way round occurs when the number of real data increases: then the data information will have the most important weight and the prior information will have hardly any value. Another interesting result is that as the precision of the prior distribution for β decreases (that is, V0 increases), the posterior mean of σ² approximates the number of real data times the ordinary least squares estimate.


    Turning to a different issue, the fact that the natural conjugate prior implies that prior information enters in the same manner as data information helps with prior elicitation. When several priors can be applied to the same problem, two strategies can be adopted to surmount possible criticisms. First, a prior sensitivity analysis can be carried out to demonstrate that results are the same with the different priors chosen. But if results are sensitive to the choice of prior, the Bayesian approach allows for the scientifically honest reporting of such a state of affairs. There has been work done on extreme bounds analysis for quantities such as the posterior mean of a parameter. [Poir95] provides a detailed





                      Action          E[σ²|y]                       Var[σ²|y]

          n           Increase        Closer to OLS estimates       Closer to 0
                      Decrease        Closer to s0²                 Further from 0

          V0          Increase        Closer to vs²                 Closer to 0
                      Decrease        Closer to OLS estimates       Further from 0

          ν0          Increase        Closer to s0²                 Closer to 0
                      Decrease        Closer to OLS estimates       Further from 0


                              Table 5.3: Sensitivity analysis of parameter σ²



discussion about this issue. A second strategy is to use a non-informative prior to let the data speak loudly and be predominant over the prior information. For example, let us set v0 = 0 and V0^{-1} = 0. Then

                                  β, σ²|y ∼ N-Inv-χ²(β1, V1 s1²; v1, s1²)                                  (5.43)

where

                                  V1 = (X'X)^{-1}                                                          (5.44)
                                  β1 = β̂                                                                   (5.45)
                                  v1 = n                                                                   (5.46)
                                  v1 s1² = v s²                                                            (5.47)

    With this non-informative prior, all of these formulas involve only data information and equal the ordinary least squares results. Bayesians often write this prior as:






                                  p(β, σ²) ∝ σ^{−2}                                                        (5.48)

    Finally, one of the goals of the Bayesian approach is to provide a predictive model to predict an unobserved data point generated from the same model as the data set with n observations (ε* ∼ N(0, σ²), with the same β). This is denoted by:

                                  Y* = X*β + ε*                                                            (5.49)

    where Y* is not observed and ε* is independent of ε.


    Bayesian prediction is based on calculating

                                  p(y*|y) = ∫∫ p(y*|y, β, σ²) p(β, σ²|y) dβ dσ²                            (5.50)

    The key to getting the prediction is to find out the form of p(y*|y, β, σ²), since the posterior p(β, σ²|y) has already been calculated, and to test whether p(y*|y) is easy to integrate or whether, on the contrary, a posterior simulator has to be employed.


    Since ε* is independent of ε, Y* is independent of Y, and p(y*|y, β, σ²) can be written as p(y*|β, σ²), which is a multivariate Normal, as was seen before.

                  p(y*|β, σ²) = ((σ²)^{−T/2}/(2π)^{T/2}) exp(−(1/(2σ²))(y* − X*β)'(y* − X*β))              (5.51)
    Multiplying this by the posterior obtained previously and integrating yields a multivariate t:

                                  y*|y ∼ t_{v1}(X*β1, s1²(I_T + X*V1X*'))                                  (5.52)

where T is the number of new points X* at which predictions are made.


    It is easy to see that:

                  E(y*|y) = X*β1        Var(y*|y) = s1²(I_T + X*V1X*')                                     (5.53)

    A brief summary comparing the classical and the Bayesian approaches is displayed in Table 5.4 to note the coincidences and differences between them.







                Classical Regression                      Bayesian Regression

                β̂ = (X'X)^{-1}X'Y                         β1 = V1(V0^{-1}β0 + X'X β̂)

                σ̂² = (Y'Y − β̂'X'Y)/(n − p)                s1² = (v0 s0² + v s² + (β̂ − β0)'[V0 + (X'X)^{-1}]^{-1}(β̂ − β0))/v1

                E[β̂] = β                                  E[β|y] = β1

                Var(β̂_j) = σ² C_{jj}                      Var(β_j|y) = (v1 s1²/(v1 − 2)) V1_{jj}

                Y*|y ∼ t_{n−p}(X*β̂, σ̂² I_T)               Y*|y ∼ t_{v1}(X*β1, s1²(I_T + X*V1X*'))


                        Table 5.4: Classical and Bayesian regression comparison



    A very interesting and more exhaustive comparison between these two approaches can be read in the article by [Urba92], where the advantages and disadvantages of using each of them are explained.


5.4 Normal Linear Regression Model subject to inequality constraints

In this section, let us suppose we want to impose inequality constraints on the coefficients of the Normal linear regression model, such as β ∈ A, where A is the region of all valid values of the coefficients. This is quite simple in Bayesian regression, since the constraints are imposed through the prior distribution:

                  β, σ² ∼ N-Inv-χ²(β0, V0 s0²; v0, s0²) 1(β ∈ A)                                           (5.54)

where β0, V0, v0 and s0² are prior hyper-parameters to be chosen and 1(β ∈ A) is the indicator function, which equals 1 if β ∈ A and 0 otherwise.


    Likewise, the posterior distribution for β is now:

                                  p(β|y) ∝ t_{v1}(β1, s1² V1) 1(β ∈ A)                                     (5.55)

where β1, V1, v1 and s1² were defined previously.
    So the only difference when introducing inequality constraints is that we must now add the indicator function.


    This may look very easy, but for a general choice of A neither analytical posterior results nor Gibbs sampling work. The most suitable method is importance sampling, which has already been explained. In this case, according to [Koop03], the importance function is:

                                  q(β) = t_{v1}(β1, s1² V1)                                                (5.56)

    The strategy consists of getting draws y*^{(s)} from p(y*|β^{(s)}, σ^{2(s)}) using the draws β^{(s)} and σ^{2(s)} which were obtained from the posterior distribution. Then, using these draws y*^{(s)} in the importance sampling, the mean and the variance can be calculated.


    Another, simpler way consists of ignoring the constraints until the end of the simulation and then discarding those draws which violate the restrictions. According to [Gelm04], this works reasonably well if the constraints do not eliminate a large portion of the draws.
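    A minimal sketch of this discard strategy (ours; in_A is a user-supplied indicator for the region A, and all names are assumptions):

    import numpy as np

    def constrained_posterior_draws(beta1, V1, v1, s1_sq, in_A, n_draws, seed=0):
        # Sample the unconstrained posterior (5.31) and keep draws with beta in A,
        # which approximates the constrained posterior (5.55)
        rng = np.random.default_rng(seed)
        kept = []
        for _ in range(n_draws):
            sigma2 = v1 * s1_sq / rng.chisquare(v1)        # Inv-chi2(v1, s1^2)
            beta = rng.multivariate_normal(beta1, sigma2 * V1)
            if in_A(beta):                                 # indicator 1(beta in A)
                kept.append(beta)
        return np.array(kept)

    # Example constraint: all coefficients non-negative
    # draws = constrained_posterior_draws(b1, V1, v1, s1, lambda b: np.all(b >= 0), 5000)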


5.5 Normal Linear Regression Model with Independent Parameters

Now, suppose that the parameters β and σ² are independent, so

                                  p(β, σ²) = p(β) p(σ²)                                                    (5.57)

    With the same likelihood function as that used in the previous section, this assumption implies that β follows a multivariate Normal distribution with mean β0, as occurred when β and σ² were dependent, but now with variance V0, and σ² has exactly the same Scaled-Inv-χ² distribution used previously. That is:

                  β ∼ N(β0, V0)        σ² ∼ Inv-χ²(v0, s0²)                                                (5.58)

    The prior joint distribution is

                  p(β, σ²) ∝ exp(−½(β − β0)'V0^{-1}(β − β0)) (σ²)^{−(v0/2+1)} exp(−v0 s0²/(2σ²))           (5.59)

                                  β, σ² ∼ N-Inv-χ²(β0, V0; v0, s0²)                                        (5.60)

    As the posterior joint distribution is proportional to the prior times the likelihood:

                  p(β, σ²|Y) ∝ exp(−½[(Y − Xβ)'(Y − Xβ)/σ² + (β − β0)'V0^{-1}(β − β0)]) ×
                               × (σ²)^{−((n+v0)/2+1)} exp(−v0 s0²/(2σ²))                                   (5.61)

    Since this function does not take the form of any well-known density, it is interesting to find the conditional distributions for β, p(β|Y, σ²), and for σ², p(σ²|Y, β), because with them any information from p(β, σ²|Y) can be obtained through posterior simulation with the Gibbs sampler already explained in previous chapters.


    According to [Koop03], it can be shown that those conditional distributions are:

                  p(β|Y, σ²) ∝ exp(−½(β − β1)'V1^{-1}(β − β1))                                             (5.62)
                  p(σ²|Y, β) ∝ (σ²)^{−((n+v0)/2+1)} exp(−(1/(2σ²))[(Y − Xβ)'(Y − Xβ) + v0 s0²])            (5.63)

    And this all yields:

                                  β|y, σ² ∼ N(β1, V1)                                                      (5.64)
                                  σ²|y, β ∼ Inv-χ²(v1, s1²)                                                (5.65)

where

                                  V1 = (V0^{-1} + (1/σ²) X'X)^{-1}                                         (5.66)
                                  β1 = V1(V0^{-1}β0 + (1/σ²) X'Y)                                          (5.67)
                                  v1 = n + v0                                                              (5.68)
                                  s1² = ((Y − Xβ)'(Y − Xβ) + v0 s0²)/v1                                    (5.69)



    The fact that the posterior distribution has an unknown form affects the prediction for y*, p(y*|y), too. As has already been said for the posterior predictive in the Bayesian Approach chapter, the interest is in p(y*|y, β, σ²). Since y and y* are independent of one another,


                                  p(y*|y, β, σ²) = p(y*|β, σ²)                                             (5.70)

    And hence

                  p(y*|β, σ²) = ((σ²)^{−T/2}/(2π)^{T/2}) exp(−(1/(2σ²))(y* − X*β)'(y* − X*β))              (5.71)

    As the analytical solution of the integral is not trivial, the importance of the Gibbs sampler arises again; combining it with Monte Carlo integration, any posterior and predictive inference can be done. The strategy consists of getting draws y*^{(s)} from p(y*|β^{(s)}, σ^{2(s)}) using the draws β^{(s)}, σ^{2(s)} which were obtained from the posterior distribution. Then, using these draws y*^{(s)} in the Monte Carlo integration, the mean and the variance can be calculated.
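    Continuing the Gibbs sketch given above (ours; it assumes the arrays returned by gibbs_independent_prior), this strategy might look like:

    import numpy as np

    def predictive_draws(X_star, draws_beta, draws_sigma2, seed=1):
        # One y* draw per posterior draw, from the Normal in (5.71)
        rng = np.random.default_rng(seed)
        T = X_star.shape[0]
        out = np.empty((len(draws_beta), T))
        for s, (beta, sigma2) in enumerate(zip(draws_beta, draws_sigma2)):
            out[s] = X_star @ beta + rng.normal(scale=np.sqrt(sigma2), size=T)
        return out   # summarize with out.mean(axis=0) and out.var(axis=0)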


5.6 Normal Linear Regression Model with Heteroscedasticity and Correlation

Until now the variances have been supposed to be equal and uncorrelated, but this is not very realistic. In this section we are going to relax that assumption and consider the following model:

                                  Y = Xβ + ε                                                               (5.72)

where

                                  ε ∼ N(0, Σ)                                                              (5.73)

    That is, we are considering heteroscedasticity and correlation. According to [Koop03], since Σ is a positive definite matrix, a matrix P can be found that verifies PΣP' = I, and it can be shown that

                                  Y* = X*β + ε*                                                            (5.74)

where

                                  ε* ∼ N(0, σ²I)                                                           (5.75)




and

                                  Y* = PY                                                                  (5.76)
                                  X* = PX                                                                  (5.77)
                                  ε* = Pε                                                                  (5.78)

    Then, the likelihood function to consider now is:

                  p(Y|β, σ², Σ) = (1/(2π)^{n/2}) (σ²)^{−p/2} exp(−(1/(2σ²))(β − β̂(Σ))'X'Σ^{-1}X(β − β̂(Σ))) ×
                                  × (σ²)^{−v/2} exp(−v s²(Σ)/(2σ²))                                        (5.79)

where:

                                  v = n − p                                                                (5.80)
                                  β̂(Σ) = (X*'X*)^{-1}X*'Y*                                                 (5.81)
                                  s²(Σ) = (Y* − X*β̂(Σ))'(Y* − X*β̂(Σ))/v                                    (5.82)

which is very similar to that used with equal variances.


    Using the prior distributions described in the previous section, we have:

                                  p(β, σ², Σ) = p(β) p(σ²) p(Σ)                                            (5.83)

where β is normally distributed with prior parameters β0, V0 and σ² is a scaled inverse Chi-square with parameters v0 and s0².



    Hence, knowing that the posterior distribution is proportional to the prior times the likelihood:

                  p(β, σ², Σ|Y) ∝ p(Σ) × exp(−½[(Y* − X*β)'(Y* − X*β)/σ² + (β − β0)'V0^{-1}(β − β0)]) ×
                                  × (σ²)^{−((n+v0)/2+1)} exp(−v0 s0²/(2σ²))                                (5.84)




    This suggests a Normal distribution for the posterior conditional of β and a scaled inverse Chi-square for the posterior conditional of σ², as occurred before. Therefore:

                                  β|Y, σ², Σ ∼ N(β1, V1)                                                   (5.85)
                                  σ²|Y, β, Σ ∼ Inv-χ²(v1, s1²)                                             (5.86)

where

                                  V1 = (V0^{-1} + X'Σ^{-1}X/σ²)^{-1}                                       (5.87)
                                  β1 = V1(V0^{-1}β0 + X'Σ^{-1}X β̂(Σ)/σ²)                                   (5.88)
                                  v1 = n + v0                                                              (5.89)
                                  s1² = ((Y − Xβ)'Σ^{-1}(Y − Xβ) + v0 s0²)/v1                               (5.90)
    According to [Koop03], the posterior conditional for Σ yields:

                  p(Σ|Y, β, σ²) ∝ p(Σ) |Σ|^{−1/2} exp(−(1/(2σ²))(Y − Xβ)'Σ^{-1}(Y − Xβ))                   (5.91)

    So we have come to the point where the form that Σ takes is crucial.


5.6.1 Heteroscedasticity

Let us suppose we suspect that there is no correlation among the errors but that their variances are different. Hence, we will have n variances ω_i for the n errors ε_i.

    It could be that the researcher has an idea of the form of Σ and assumes that

                  ω_i = h(x_i, α) = (1 + α1 x_{i1} + · · · + αp x_{ip})²                                   (5.92)

    That is, the variances are related to some or all of the independent variables. The researcher should choose a prior for α, and then Bayesian inference can be carried out through a Metropolis-Hastings algorithm such as a random walk.
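    A random walk Metropolis step for α might look like the following sketch (ours; log_post is a user-supplied log posterior of α given the data and the remaining parameters, and all names are assumptions):

    import numpy as np

    def rw_metropolis(log_post, alpha0, step, n_iter=5000, seed=0):
        rng = np.random.default_rng(seed)
        alpha = np.asarray(alpha0, dtype=float)
        lp = log_post(alpha)
        draws = []
        for _ in range(n_iter):
            prop = alpha + rng.normal(scale=step, size=alpha.shape)   # RW proposal
            lp_prop = log_post(prop)
            if np.log(rng.uniform()) < lp_prop - lp:                  # accept/reject
                alpha, lp = prop, lp_prop
            draws.append(alpha.copy())
        return np.array(draws)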


    If the researcher knows that the error variances are different but has no idea of their form, then a prior for Σ has to be chosen. According to [Koop03]:

                                  p(Σ) = ∏_{i=1}^{n} p(ω_i)                                                (5.93)

where

                                  ω_i ∼ Inv-χ²(v_ω, 1)                                                     (5.94)

    But now a hyper-prior distribution should be fixed for v_ω, such as

                                  p(Σ) = p(Σ|v_ω) p(v_ω)                                                   (5.95)

    That is, we are using a hierarchical prior to treat the heteroscedasticity. According to [Gelm04], a Metropolis-Hastings algorithm can be used to draw posterior simulations.


5.6.2 Correlation

Now, let us assume that there is some correlation among the errors through a time or space relationship, such that the error in one period depends on that of the previous periods. This is a type of regression called autoregressive, and it can be considered a time series. For example, if we are considering the relation between the Ibex 35 values one day and those of the previous days, we could say that there is a correlation between the errors in the relation between Fridays and the previous days and those in the relation between the values on Thursdays, Wednesdays or Tuesdays and the previous days. That is:

                  ε_t = ρ1 ε_{t−1} + ρ2 ε_{t−2} + · · · + ρp ε_{t−p} + u_t                                 (5.96)

where

                                  u_t ∼ N(0, σ²)                                                           (5.97)

    We will assume that there is stationarity. This means, in a general way, that the probability distribution does not vary through time. Some time series do not seem to be stationary, but their differences do. The main difference to take into account is the first one: the first difference of ε_t is δε_t, and it indicates the variation of ε between period t and the previous periods t − 1, t − 2, . . . , t − p.


    According to [Koop03], the irregular component u_t can be formulated in the following way:

                                  ρ(L) ε_t = u_t                                                           (5.98)

where L is called the lag operator and has the property that L ε_t = ε_{t−1}, and ρ(L) = (1 − ρ1 L − · · · − ρp L^p).


    So, if we have the regression model:

                                  Y_t = X_t β + ε_t                                                        (5.99)

then it is possible to find a model such as

                                  Y*_t = X*_t β + u_t,        u_t ∼ N(0, σ²)                               (5.100)

where

                                  Y*_t = ρ(L) Y_t                                                          (5.101)
                                  X*_t = ρ(L) X_t                                                          (5.102)
    Therefore, using an independent Normal scaled inverse chi-square prior for β and σ², it yields:

                                  β|Y, σ², ρ ∼ N(β1, V1)                                                   (5.103)
                                  σ²|Y, β, ρ ∼ Inv-χ²(v1, s1²)                                             (5.104)

where

                                  V1 = (V0^{-1} + X*'X*/σ²)^{-1}                                           (5.105)
                                  β1 = V1(V0^{-1}β0 + X*'Y*/σ²)                                            (5.106)
                                  v1 = v0 + T − p                                                          (5.107)
                                  s1² = ((Y* − X*β)'(Y* − X*β) + v0 s0²)/v1                                (5.108)
    And now, as occurred with heteroscedasticity, a prior should be selected for ρ. Let us choose a multivariate Normal subject to the constraint ρ ∈ Φ, where Φ is the stationary region. Then,

                                  ρ ∼ N(ρ0, V_{ρ0}) 1(ρ ∈ Φ)                                               (5.109)
                                  ρ|Y, β, σ² ∼ N(ρ1, V_{ρ1}) 1(ρ ∈ Φ)                                      (5.110)



where ρ0 and V_{ρ0} are the prior parameters, which the researcher should establish, and ρ1 and V_{ρ1} are the posterior parameters, with the following relation:

                                  V_{ρ1} = (V_{ρ0}^{-1} + E'E/σ²)^{-1}                                     (5.111)
                                  ρ1 = V_{ρ1}(V_{ρ0}^{-1}ρ0 + E'ε/σ²)                                      (5.112)

where E is the matrix containing the lagged errors from t − 1 to t − p.

    According to [Koop03], a Gibbs sampler can be used to draw posterior simulations.


5.7 Models Summary

Since the main models to be used in the subsequent application are the homoscedastic, non-autocorrelated ones, the main ideas are shown in Tables 5.5, 5.6, 5.7 and 5.8.








 Case                          β                                  σ²                       Joint Prior Distribution

 p(β, σ²) = p(β|σ²)p(σ²)       β|σ² ∼ N(β0, σ²V0)                 σ² ∼ Inv-χ²(v0, s0²)     β, σ² ∼ N-Inv-χ²(β0, V0 s0²; v0, s0²)

 p(β, σ²) = p(β|σ²)p(σ²)       β|σ² ∼ N(β0, σ²V0)1(β ∈ A)         σ² ∼ Inv-χ²(v0, s0²)     β, σ² ∼ N-Inv-χ²(β0, V0 s0²; v0, s0²)1(β ∈ A)

 p(β, σ²) = p(β)p(σ²)          β ∼ N(β0, V0)                      σ² ∼ Inv-χ²(v0, s0²)     β, σ² ∼ N-Inv-χ²(β0, V0; v0, s0²)

 p(β, σ²) = p(β)p(σ²)          β ∼ N(β0, V0)1(β ∈ A)              σ² ∼ Inv-χ²(v0, s0²)     β, σ² ∼ N-Inv-χ²(β0, V0; v0, s0²)1(β ∈ A)


                              Table 5.5: Main Prior Distributions Summary
    Case 1: p(β,σ²|y) ∝ p(y|β,σ²) p(β|σ²) p(σ²)
        Posterior: β,σ²|y ~ N-Inv-χ²(β1, V1 s1²; v1, s1²)
        Key: obtain the marginal distributions, draw directly from them and summarize.

    Case 2: p(β,σ²|y) ∝ p(y|β,σ²) p(β|σ²) p(σ²), with β ∈ A
        Posterior: β,σ²|y ~ N-Inv-χ²(β1, V1 s1²; v1, s1²) 1(β ∈ A)
        Key: obtain the marginal distributions, draw directly from them, discard invalid draws and summarize.

    Case 3: p(β,σ²|y) ∝ p(y|β,σ²) p(β) p(σ²)
        Posterior: ∝ exp{ −(1/2) [ (Y−Xβ)′(Y−Xβ)/σ² + (β−β0)′V0⁻¹(β−β0) ] } × (σ²)^(−((n+v0)/2+1)) exp{ −(v0 s0² + νs²)/(2σ²) }
        Key: obtain the conditional distributions, draw with the Gibbs sampler and summarize.

    Case 4: p(β,σ²|y) ∝ p(y|β,σ²) p(β) p(σ²), with β ∈ A
        Posterior: the same expression multiplied by 1(β ∈ A)
        Key: obtain the conditional distributions, draw with the Gibbs sampler, discard invalid draws and summarize.

                                  Table 5.6: Main Posterior Distributions Summary
    Case: p(β,σ²|y) ∝ p(y|β,σ²) p(β|σ²) p(σ²)
        β0 → β1:     β1 = V1 (V0⁻¹ β0 + X′X β̂)
        V0 → V1:     V1 = (V0⁻¹ + X′X)⁻¹
        v0 → v1:     v1 = v0 + n
        s0² → s1²:   s1² = [ v0 s0² + νs² + (β̂ − β0)′ [V0 + (X′X)⁻¹]⁻¹ (β̂ − β0) ] / v1

    Case: p(β,σ²|y) ∝ p(y|β,σ²) p(β) p(σ²)
        β0 → β1:     β1 = V1 (V0⁻¹ β0 + X′Y/σ²)
        V0 → V1:     V1 = (V0⁻¹ + X′X/σ²)⁻¹
        v0 → v1:     v1 = v0 + n
        s0² → s1²:   s1² = [ v0 s0² + (Y − Xβ)′(Y − Xβ) ] / v1

                                  Table 5.7: Prior and Posterior Parameters Summary
    Case                                                               p(y*|y,β,σ²)    Constraint
    p(y*|y) = ∫∫ p(y*|y,β,σ²) p(β|σ²,y) p(σ²|y) dβ dσ²                 N(β, σ²)        No
    p(y*|y) = ∫∫ p(y*|y,β,σ²) p(β|σ²,y) p(σ²|y) dβ dσ²                 N(β, σ²)        Yes
    p(y*|y) = ∫∫ p(y*|y,β,σ²) p(β|y) p(σ²|y) dβ dσ²                    N(β, σ²)        No
    p(y*|y) = ∫∫ p(y*|y,β,σ²) p(β|y) p(σ²|y) dβ dσ²                    N(β, σ²)        Yes

    Key (all cases): obtain draws y* from p(y*|y,β,σ²) using the previous draws from the posterior simulation, then use Monte Carlo integration to get predictive inferences.

                                  Table 5.8: Main Posterior Predictive Distributions Summary
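The key column of Table 5.8 can be sketched in a few lines; the following hypothetical helper (names are ours) reuses the posterior draws produced by the sampler above, where x_new is the vector of regressors for the new observation:

    import numpy as np

    def predictive_draws(x_new, betas, sigma2s, seed=1):
        # Draw y* from p(y*|y, beta, sigma^2) = N(x_new' beta, sigma^2) for each
        # posterior draw; summaries of y* are the Monte Carlo predictive inferences.
        rng = np.random.default_rng(seed)
        means = betas @ x_new
        return rng.normal(means, np.sqrt(sigma2s))

    # e.g. a 95% predictive interval:
    #   y_star = predictive_draws(x_new, betas, sigma2s)
    #   np.percentile(y_star, [2.5, 97.5])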
Chapter 6

Symbolic Data

6.1 What is symbolic data analysis?

Nowadays there are more and more data available to be analyzed and studied. Technological advances let us gather huge quantities of information about a specific variable, but part of that information is lost because standard statistical methods lack the flexibility to manage such quantities. For example, let us assume we are studying the evolution of the stock prices of a company. At the end of each month we would have the different values that the stock has taken daily. It seems reasonable to think that the researcher would take only the daily close prices, or the daily mean prices, and would not manage all the gathered information.


    Symbolic data analysis (SDA) deals with this problem and lets us analyse vast amounts of information efficiently, extracting the required knowledge and representing it better. Continuing with the same example, symbolic data let the engineer manage the daily maximum and minimum prices of a month, or build a histogram of the monthly prices and work with it. In this way, SDA complements other widely used statistical tools, such as candlesticks. More information about candlesticks and other interesting tools can be found in [Lee 06] and [Irpi05]. For instance, Figure 6.1 illustrates an interval time series for the daily maximum and minimum Ibex 35 values in January 2006.


    So the possibilities of symbolic data are evident. For instance, let us think of an application to warrants. A warrant is a right, without obligation, to buy (a call warrant) or to sell (a put warrant) an asset at an agreed price (the strike). One could thus use a predicted stock price range to choose the best put warrant or the most suitable call warrant, and obtain greater profits.






                                   Figure 6.1: Interval time series




    Underlying the aggregation method used by SDA is the notion of a symbolic object. This is a mathematical model of a concept (see [Dida95]) which, basically, lets us select some individuals from a group. Going further into SDA, and according to [Bill06a], three main kinds of symbolic data can be considered: multi-valued, interval-valued and modal-valued.


    As far as the first kind is concerned, a multi-valued symbolic random variable Y is one that takes one or more values from the list of values in its domain Y. The complete list of possible values in Y is finite, and the values may be well-defined categorical or quantitative values.


    For example, let us take all the companies that have formed the Ibex 35 index since its beginning. Then we could define a variable Y = blue chips in the Ibex 35 with 15 observations wu = year. Thus we have, for instance, that during the first year, 1992 (wu = w1), Telefónica, Repsol, Endesa, SCH and BBVA were considered to be the blue chips. In 2007 (wu = w15) Santander, Telefónica, BBVA, Endesa and Repsol YPF are considered to be the blue chips.


    Likewise, an interval-valued symbolic random variable Y is one that takes values in an interval.






    wu     Year      Y = Blue chips in Ibex 35
    w1     1992      {Telefónica, Repsol, Endesa, SCH, BBVA}
     .       .        .
     .       .        .
    w15    2007      {Telefónica, Repsol YPF, Endesa, BBVA, Santander}

                                  Table 6.1: Multivalued Data Example



That interval can be closed or open at either end. This kind of data is very important in SDA; furthermore, it captures both the central tendency and the dispersion of a dataset. Let us recall the example of the daily stock prices of a company over a month: this information can be recorded as the daily maximum and minimum values during the month. As this is one of the most interesting types of symbolic data for our purpose, we will take it up again below.


    Finally, let a random variable Y take possible values {ηk : k = 1, 2, . . . } over a domain Y. Then a modal-valued outcome is one formed by the value ηk together with an associated measure πk. The latter is usually a weight, probability, relative frequency, or the like, but it can also be a capacity, necessity, possibility, credibility or a related entity.


    A modal multi-valued variable can now be defined. This is a variable whose observed outcome takes values in a subset of the domain, each with its respective measure. For example, we could define a variable Z = Importance of the companies in the Ibex 35 index. Thus we have, for instance, that the most important company in 1992 was Telefónica, while Santander is the company with the highest weight in the index in 2007.


    Another example: let us suppose we define a variable Y = Maximum daily stock price for enterprises in the Spanish Continuous Stock Market. For the enterprise Endesa we could have:








  wu     Year    Z = Importance of an Ibex 35 company

  w1     1992    {Telefónica, 13.7; Repsol, 9.7; Endesa, 9.2; SCH, 8.0; BBVA, 7.2; Iberdrola, 6.9; Santander, 5.9; Banco Popular, 3.8; Banesto, 3.6; Banco Exterior, 3.0; Cepsa, 2.5; Tabacalera, 2.4; Acesa, 2.1; Unión FENOSA, 2.0; Gas Natural, 1.9; Sevillana de Electric, 1.8; Fuerzas E. Cataluña, 1.7; Bankinter, 1.6; Dragados, 1.4; Aguas de Barcelona, 1.3; Mapfre, 1.3; Asland, 1.2; FCC, 1.1; Portland Valderribas, 1.0; Hidrocantábrico, 0.8; Vallehermoso, 0.8; Metrovacesa, 0.8; Acerinox, 0.7; Viscofán, 0.6; Cubiertas y MZOV, 0.5; Sarrió, 0.4; Uralita, 0.4; Huarte, 0.3; Urbis, 0.3; Agromán, 0.2}

   .       .     .
   .       .     .

  w15    2007    {Telefónica, 16.0; Repsol YPF, 5.9; Endesa, 7.5; BBVA, 13.0; Iberdrola, 5.6; Santander, 17.2; Banco Popular, 3.4; Banesto, 0.5; Unión FENOSA, 1.8; Gas Natural, 1.5; Bankinter, 0.9; Cor. Mapfre, 0.7; FCC, 1.2; Sacyr Vallehermoso, 1.0; Metrovacesa, 0.5; Acerinox, 1.0; Inditex, 3.0; ACS Const, 2.9; B. Sabadell, 2.1; Altadis, 2.0; Abertis A, 2.0; G. Ferrovial, 1.6; Acciona, 1.4; FCC, 1.2; Gamesa, 1.0; Enagás, 0.8; REE, 0.8; Cintra, 0.7; Agbar, 0.7; Telecinco, 0.6; Iberia, 0.5; Indra A, 0.5; Fadesa, 0.5; Sogecable, 0.4; Antena 3 TV, 0.4; NH Hoteles, 0.4}

                                  Table 6.2: Modal-multivalued Example




        Y (Endesa) = {38.7, 0.125; 38.75, 0.125; 38.8, 0.250; 38.85, 0.250; 38.9, 0.125; 39, 0.125}

    This means that we assign a probability of 0.125 to the possibility that Endesa's maximum daily price is 38.7, a probability of 0.125 to the possibility that it is 38.75, a probability of 0.25 to the possibility that it is 38.8, and so on.


    Another very interesting variant of this type is the modal interval-valued variable. That is, instead of one value with a probability, the variable can take any value in an interval with a probability. Continuing with the previous example:


            Y (Endesa) = {[38.7, 38.75), 0.125; [38.75, 38.85), 0.125; [38.8, 38.9), 0.25}

    For more information and other types of data, the reader is referred to [Bill06a], [Huiw06] and [Arro06].


6.2 Interval-valued variables

As already mentioned, summarizing a dataset is one of the three possible sources from which interval data may result. According to [Huiw06], the other two sources are the imprecision of measurement and the expert's knowledge, including its uncertainty.


    Now, suppose E is a set of m symbolic objects u with observations Y(u), u = 1, . . . , m. Let us suppose we are interested in a particular random variable Yj ≡ Z, and that the realization of Z for the observation wu is the interval Z(wu) = [au, bu] = ξ. Then, according to [Bill06a], the empirical density function of Z is

    f(\xi) = \frac{1}{m} \sum_{u \in E} \frac{I_u(\xi)}{\|Z(u)\|}, \qquad \xi \in \mathbb{R}        (6.1)

where I_u(·) indicates whether or not ξ lies in the interval Z(u), and ‖Z(u)‖ is the length of that interval.


    Likewise, it can be shown that the symbolic empirical mean is given by

    \bar{Z} = \frac{1}{2m} \sum_{u \in E} (b_u + a_u)        (6.2)

    and the symbolic empirical variance is given by

    S^2 = \frac{1}{3m} \sum_{u \in E} (b_u^2 + b_u a_u + a_u^2) - \frac{1}{4m^2} \left[ \sum_{u \in E} (b_u + a_u) \right]^2.        (6.3)



                                                       65
6. Symbolic Data


    These formulas are consistent with the hypothesis of uniformity within the intervals. While the symbolic mean can be understood intuitively as a centre of gravity, the symbolic variance is not so easy to interpret. In fact, it might seem more reasonable to formulate the variance as

    S^2 = \frac{1}{4m} \sum_{u \in E} (b_u + a_u)^2 - \frac{1}{4m^2} \left[ \sum_{u \in E} (b_u + a_u) \right]^2,        (6.4)

    that is, the variance of the midpoints. But this last formulation does not take into account the internal variation of the intervals, while (6.3) does; hence, (6.3) is higher.


    For example, let us consider the maximum and minimum points for the Ibex 35 during December
2006.


    Then, according to the above, the mean point in that month was

    \bar{Z} = \frac{1}{38} \sum_{u \in E} (\mathrm{high}_u + \mathrm{low}_u) = 14116,

    and the empirical symbolic variance was

    S^2 = \frac{1}{3m} \sum_{u \in E} (b_u^2 + b_u a_u + a_u^2) - \frac{1}{4m^2} \left[ \sum_{u \in E} (b_u + a_u) \right]^2 = 28006.

    If we had calculated the variance taking only the midpoints, the result would have been

    S^2 = \frac{1}{4m} \sum_{u \in E} (b_u + a_u)^2 - \frac{1}{4m^2} \left[ \sum_{u \in E} (b_u + a_u) \right]^2 = 26023,

which is lower than that obtained previously because it does not take into consideration the internal
variation of the intervals.
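As a sketch of these computations, formulas (6.2)-(6.4) can be coded directly; the function below and its toy intervals are ours, for illustration only:

    import numpy as np

    def symbolic_stats(intervals):
        # Symbolic empirical mean (6.2), variance (6.3) and midpoint variance (6.4)
        # of interval-valued data [a_u, b_u], assuming uniformity within intervals.
        a, b = np.asarray(intervals, dtype=float).T
        m = len(a)
        mean = (a + b).sum() / (2 * m)                                   # (6.2)
        var = (b**2 + a * b + a**2).sum() / (3 * m) \
              - (a + b).sum()**2 / (4 * m**2)                            # (6.3)
        var_mid = ((a + b)**2).sum() / (4 * m) \
                  - (a + b).sum()**2 / (4 * m**2)                        # (6.4)
        return mean, var, var_mid

    # symbolic_stats([(13900, 14100), (14050, 14300)]) returns var > var_mid,
    # since (6.3) also counts the internal variation of the intervals.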


    Although interval-valued data may seem to be all advantages, according to [Huiw06] there are two major limitations when applying multivariate analysis to an interval dataset. The first is that the computational work is heavy; the second is that the hyperrectangle may enlarge the range of the original dataset and reduce the accuracy of the analysis.


    The methodology of interval data applied to multivariate analysis involves transforming the symbolic data matrix into a numerical matrix; that is, reducing p-dimensional observations to s-dimensional components (where usually s ≪ p). This is called Principal Component Analysis. There are two main methods to carry it out: the Vertices Method and the Centres Method. The former consists of building a matrix with 2^p rows and p columns from a hyperrectangle in the p-dimensional space, where each row contains the coordinates of one vertex of the hyperrectangle in R^p. The latter is based on the average value of every variable for each category of data. A more extended review of these two methods can be found in [Bill06a]. [Huiw06] point out some limitations of these methods and propose a new type of symbolic data: factor interval data. Since symbolic data is a wide field, the reader is referred to all the above citations.


6.3 Classical regression analysis with Interval-valued data

Regarding classical multiple regression, there are three current approaches to consider, though one of them is just a regression fit. Let us begin with the most intuitive and finish with the most conceptual.


    Since we now have intervals instead of single values, it is natural to take the midpoints and proceed as in classical multiple regression; that is, to use the fitted model to make new predictions from a new interval by applying it to each extreme of the interval. Moreover, [DeCa05] remark on the need to impose the constraint βi ≥ 0 to ensure that the lower extreme of the predicted interval stays below the upper extreme, and suggest the algorithm presented by [Laws74] to handle this constraint. We suggest the alternative of obtaining enough draws from the posterior distribution of β, discarding those that are negative and averaging the rest.
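A minimal sketch of that alternative, reusing posterior draws of β such as those returned by the Gibbs sampler sketched earlier (names are ours):

    import numpy as np

    def constrained_posterior_mean(betas):
        # Estimate beta under beta_i >= 0 by keeping only the posterior
        # draws that satisfy the constraint and averaging them.
        valid = betas[(betas >= 0).all(axis=1)]
        if len(valid) == 0:
            raise ValueError("no draw satisfies the constraint; obtain more draws")
        return valid.mean(axis=0)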


    Let us recall the example shown for classical multiple regression, but taking now the maximum and minimum values that the Ibex 35, Dow Jones, FTSE 100 and DAX took in the first ten months of 2006. We would take the midpoints of those intervals and we would obtain the same result that we got with classical multiple regression:

    IBEX35_t = 1.0102 DowJones_{t−1} − 2.0144 FTSE100_{t−1} + 2.1229 DAX_{t−1} + ε_t

where

    ε_t ~ N(0, 332.71²)



    We could use this model to predict a new observation for November 1st by applying it to each extreme of the intervals:

    max(IBEX35_t) = 1.0102 × 12161 − 2.0144 × 6149.9 + 2.1229 × 6289.7 = 13242.41
    min(IBEX35_t) = 1.0102 × 11986.84 − 2.0144 × 6110.9 + 2.1229 × 6237.55 = 13040.7

    So the prediction would be [13040.7, 13242.41].


    A disadvantage of this approach is that it does not take into account the interval length.


    To solve that problem, [DeCa05] and [DeCa04] suggest another regression for the interval range. They refer to this new approach as the constrained centre and range method (CCRM). In that case the constraint is applied to the interval range regression instead of to the centres regression. We will employ the radii instead of the ranges. So, continuing with the previous example, we would take the radii of the different indexes to build the following model:

    Radius IBEX35_t = 0.35 Radius DowJones_{t−1} + 0.484 Radius FTSE100_{t−1} + 0.272 Radius DAX_{t−1} + ε_t

where

    ε_t ~ N(0, 26.31²)

    With this new approach, the prediction is calculated from the midpoint and the radius of the interval:

    Midpoint IBEX35_t = 1.0102 × 12073.65 − 2.0144 × 6130.4 + 2.1229 × 6262.125 = 13141.3
    Radius IBEX35_t = 0.35 × 86.81 + 0.484 × 19.5 + 0.272 × 24.575 = 46.53

    Now the prediction would be [13094.75, 13187.81].


    Finally, the last approach is to use the symbolic mean, the symbolic variance and the symbolic covariance to make the regression; that is, a symbolic regression is used instead of the classical one. This new approach requires another way of estimating.


    Recall the classical univariate multiple regression model:


    Y = \beta_0 + X_1\beta_1 + \cdots + X_p\beta_p + \epsilon        (6.5)

where

    \epsilon \sim N(0, \sigma^2).        (6.6)

    Taking mean values we have

    \bar{Y} = \beta_0 + \bar{X}_1\beta_1 + \cdots + \bar{X}_p\beta_p + \bar{\epsilon}        (6.7)

from which it can easily be deduced that

    \bar{\epsilon} = 0        (6.8)

whenever the model includes the constant term β0. This means that the mean error is zero if there is a constant term in the model, which is a very important point for what follows.


    Then we can obtain an equivalent model:

    Y - \bar{Y} = (X_1 - \bar{X}_1)\beta_1 + \cdots + (X_p - \bar{X}_p)\beta_p + \epsilon        (6.9)

    where Y − Ȳ is the new dependent variable and X − X̄ is the new matrix of independent variables.


    β can be estimated as

    \hat{\beta} = S_{XX}^{-1} S_{XY}        (6.10)

where

    S_{XX} = \begin{pmatrix} \mathrm{var}(X_1) & \mathrm{cov}(X_1, X_2) & \cdots & \mathrm{cov}(X_1, X_p) \\ \mathrm{cov}(X_1, X_2) & \mathrm{var}(X_2) & \cdots & \mathrm{cov}(X_2, X_p) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{cov}(X_1, X_p) & \mathrm{cov}(X_2, X_p) & \cdots & \mathrm{var}(X_p) \end{pmatrix}

    and

    S_{XY} = \begin{pmatrix} \mathrm{cov}(X_1, Y) \\ \mathrm{cov}(X_2, Y) \\ \vdots \\ \mathrm{cov}(X_p, Y) \end{pmatrix},

where the independent term is not taken into account (so there is no column of ones in the matrix X).


    The independent term β0 is estimated as

    \hat{\beta}_0 = \bar{Y} - \sum_{j=1}^{p} \hat{\beta}_j \bar{X}_j.        (6.11)

    In this way, the symbolic variance, the symbolic covariance and the symbolic mean for interval-valued variables can be used to estimate β.
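Assuming the symbolic means, variances and covariances have already been computed for the interval-valued variables, a sketch of the estimation in (6.10)-(6.11) is straightforward (the names are hypothetical):

    import numpy as np

    def symbolic_fit(S_XX, S_XY, x_means, y_mean):
        # Regression fit from (symbolic) moments: slope vector via (6.10)
        # and independent term via (6.11).
        beta = np.linalg.solve(S_XX, S_XY)        # (6.10)
        beta0 = y_mean - beta @ x_means           # (6.11)
        return beta0, beta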


    But this approach has the limitation that, in order to employ the symbolic statistics and this way of estimating, it is necessary to include the independent term in the regression model. In fact, the most important point is that this last approach, suggested by [Bill06a], is just a regression fit, since no residual term is defined for symbolic data.


6.4 Bayesian regression analysis with Interval-valued data

Once we know how interval-valued data can be employed in classical regression, let us see how they can be included in the Bayesian approach. For this purpose we will employ the CCRM proposed by [DeCa05].


    According to what has been said above and in Bayesian Regression, there is nothing new to be done: the problem reduces to two Bayesian regressions, one for the centres and another for the radii, with the constraint applied to the latter. As we saw in Bayesian Regression, the constraint is much easier to incorporate into the Bayesian approach than into the classical one.


    So, by introducing the Bayesian approach into regression with symbolic data, the engineer will be able to incorporate more information into the problem than with Bayesian regression on traditional data. This is because two regressions are now being made, and the expert can state whether the centre values will increase or decrease, and likewise for the radii. In this sense, an opinion like


    'I think that the Dow Jones will have less importance for the Ibex 35 and the DAX will have more relevance than they have had until now, and there will be more volatility'

would mean, for instance, that the prior mean for the Dow Jones midpoint coefficient is lower than the one indicated by the data, while the prior mean for the DAX midpoint coefficient is greater, and similarly for the prior means of the radii.




Chapter 7

Results

To show the usefulness of the Bayesian Centre and Radius approach proposed in this project, this chapter considers experiments fitting a linear regression model to real symbolic interval-valued data sets from the Spanish Continuous Stock Market.


7.1 Spanish Continuous Stock Market data sets

We have considered two situations in the Spanish Continuous Stock Market. On the one hand, we have used the monthly minimum and maximum prices of BBVA and BSCH from January 2000 to June 2007 in order to show how the classical regression approach applied to interval-valued data can be improved through the Bayesian Centre and Radius approach when the variables are directly related. This will also let us see other advantages of the proposed approach over classical regression with single values.


    On the other hand, we have taken the daily minimum and maximum prices of two other Spanish Continuous Stock Market companies, Dogi and Zardoya, from January 2006 to December 2006, in order to show that the Bayesian Centre and Radius approach is better than the other approaches even when the variables are not related; that is, when they are uncorrelated.


7.2 Direct Relation between Variables

In this case, 66 of the total 89 months are used for the training set; the other 23 months form the testing set.





    Let us begin with the classical regression approach applied to the midpoints of the monthly minimum and maximum prices that BBVA and BSCH took in the considered training period. These data yield the following model:

    BSCH_Midpoint = 1.3008 + 0.6229 × BBVA_Midpoint + ε

where

    ε ~ N(0, 0.5237²)

    Figures 7.1 and 7.2 show that this model fits well enough for both training and testing sets.

                 [Figure 7.1: Classical Regression with single values in the training set (x-axis: Midpoints BBVA prices)]


    If we calculate the different error measures (Mean Error, Mean Absolute Error, Mean Squared Error and Root Mean Squared Error) for each set, we obtain Table 7.1. This suggests a good model, but we are only using the midpoints to fit new data when we have much more information available; therefore we are wasting information we have gathered. This can actually be seen graphically in Figure 7.3, which suggests that the model is not as good as previously believed, since there is too much available information for such a simple result and one could expect more from these data.
    Thus, another approach, known as the Centre Method, could be considered, applying the obtained model to each maximum and minimum price to get predicted maximum and minimum prices. This provides the results displayed in Figures 7.4 and 7.5.
    According to [Bill00], the total deviation is given by:

    \epsilon_{CentreMethod2000} = \epsilon_{lower} + \epsilon_{upper}        (7.1)



                 [Figure 7.2: Classical Regression with single values in the testing set (x-axis: Midpoints BBVA prices)]




    Set         ME        MAE       MSE       RMSE
    Training    0         0.4208    0.2660    0.5157
    Testing     0.2321    0.3831    0.2446    0.4946

               Table 7.1: Error Measures for Classical Regression with single values



    The resulting error measures can be seen in Table 7.2. Now we have a fitted interval for each observed interval; this approach seems to take advantage of the gathered data.

    Now let us see the resulting error measures according to the Centre method proposed by [Bill02], where the sum of squared errors is given by

    SSE_{CentreMethod2002} = \sum_{i=1}^{n} (\epsilon_{lower}^2 + \epsilon_{upper}^2)        (7.2)

    and, thus, the mean absolute error is given by

    MAE_{CentreMethod2002} = \frac{1}{n} \sum_{i=1}^{n} (|\epsilon_{lower}| + |\epsilon_{upper}|)        (7.3)




                 [Figure 7.3: Classical Regression with interval-valued data (x-axis: Minimum and Maximum BBVA prices)]


                 [Figure 7.4: Centre Method (2000) in training set (x-axis: Minimum and Maximum BBVA prices)]



                                                                                         n
    Table 7.3 shows that this new definition of the error does not improve much on the previous one.
    However, let us compare these last approaches with the Centre and Radius Method. In this case we have the following model:

    BSCH_Midpoint = 1.3008 + 0.6299 × BBVA_Midpoint + ε_Midpoint





                 [Figure 7.5: Centre Method (2000) in testing set (x-axis: Minimum and Maximum BBVA prices)]




    Set         ME        MAE       MSE       RMSE
    Training    0         0.8416    1.0638    1.0314
    Testing     0.4643    0.7663    0.9784    0.9891

               Table 7.2: Error Measures for Centre Method (2000)


where

    ε_Midpoint ~ N(0, 0.5237²)

and

    BSCH_Radius = 0.106 + 0.6188 × BBVA_Radius + ε_Radius

where

    ε_Radius ~ N(0, 0.1458²)





    Set         ME        MAE       MSE       RMSE
    Training    0         0.8917    0.5922    0.7695
    Testing     0.4643    0.7717    0.5125    0.7159

               Table 7.3: Error Measures for Centre Method (2002)



    According to [DeCa07], the sum of squared deviations is given by

    SSE_{CentreRadiusMethod} = \sum_{i=1}^{n} (\epsilon_{Midpoint}^2 + \epsilon_{Radius}^2)        (7.4)

    Therefore, the mean absolute error is given by

    MAE = \frac{1}{n} \sum_{i=1}^{n} (|\epsilon_{Midpoint}| + |\epsilon_{Radius}|)        (7.5)
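For reference, a short sketch computing (7.4) and (7.5) from the midpoint and radius residuals (the arrays e_mid and e_rad are hypothetical names):

    import numpy as np

    def centre_radius_errors(e_mid, e_rad):
        # SSE (7.4) and MAE (7.5) from midpoint and radius residuals.
        sse = np.sum(e_mid**2 + e_rad**2)               # (7.4)
        mae = np.mean(np.abs(e_mid) + np.abs(e_rad))    # (7.5)
        return sse, mae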


                 [Figure 7.6: Centre and Radius Method in training set (x-axis: Minimum and Maximum BBVA prices)]


    The results shown in Figures 7.6 and 7.7 and in Table 7.4 clearly show that the error measures are lower with the Centre and Radius Method than with the Centre Method; thus, the former is better than the latter.





                 [Figure 7.7: Centre and Radius Method in testing set (x-axis: Minimum and Maximum BBVA prices)]




    Set         ME        MAE       MSE       RMSE
    Training    0         0.5233    0.2866    0.5353
    Testing     0.1837    0.4712    0.2558    0.5058

               Table 7.4: Error Measures for Centre and Radius Method




    Now let us take into consideration an expert's knowledge about the Spanish Continuous Stock Market and see the results of the Bayesian Centre and Radius Method. Obviously, the Bayesian methodology is mainly useful in the testing set, since that is where the unobserved data are.


    Bearing in mind the previous Centre and Radius model, an expert could think that BSCH will improve slightly with respect to BBVA and assign the prior distribution seen in (5.36) with the following prior parameters for the midpoints:






    β0 = (1.3008, 0.64)′
    V0 = 10⁻⁹
    s0² = 0.5237²
    v0 = 10⁷

    Then the final Midpoints model would be:

    BSCH_Midpoint = 1.3008 + 0.64 × BBVA_Midpoint + ε_Midpoint


    Let us assume that the expert considers that the volatility will not vary, and assigns vague prior parameters to the Radius distribution:

    β0 = (0.106, 0.6188)′
    V0 = 10⁶
    s0² = 0.1458²
    v0 = 4

    Then the final Radius model would be:

    BSCH_Radius = 0.106 + 0.6188 × BBVA_Radius + ε_Radius


    The results for the testing set are shown in Figure 7.8 and in Table 7.5. They show that the proposed Bayesian Centre and Radius Method improves on all the previous approaches, since it lets us manage more information than classical regression and we obtain better results than those of the Centre and the Centre and Radius methods.
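A sketch of how the two Bayesian regressions yield an interval prediction, given posterior draws for the midpoint and radius models (for instance from the Gibbs sampler sketched in Chapter 5; all names are ours):

    import numpy as np

    def bayesian_ccrm_predict(x_mid, x_rad, betas_mid, betas_rad):
        # Interval prediction from the midpoint model and the constrained
        # radius model: interval = midpoint -/+ radius.
        mid = betas_mid.mean(axis=0) @ x_mid
        valid = betas_rad[(betas_rad >= 0).all(axis=1)]   # enforce radius >= 0
        rad = valid.mean(axis=0) @ x_rad
        return mid - rad, mid + rad

    # x_mid and x_rad include a leading 1 for the independent term, e.g.
    #   bayesian_ccrm_predict(np.array([1.0, 12.5]), np.array([1.0, 0.3]), ...)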


7.3 Uncorrelated Variables

In this second case, 170 of the total 255 days are used for the training set; the remaining days form the testing set.






                 [Figure 7.8: Bayesian Centre and Radius Method in testing set (x-axis: Minimum and Maximum BBVA prices)]




    Set        ME        MAE       MSE       RMSE
    Testing    0.0126    0.4409    0.1997    0.4469

               Table 7.5: Error Measures in Bayesian Centre and Radius Method



    The classical regression with the midpoints of the price ranges yields the following model:

    Dogi_Midpoint = 5.6570 − 0.0806 × Zardoya_Midpoint + ε

where

    ε ~ N(0, 0.2882²)

    Figures 7.9 and 7.10 show that this model does not fit well for either the training or the testing set.
    If we calculate the different error measures (Mean Error, Mean Absolute Error, Mean Squared Error and Root Mean Squared Error) for each set, we obtain Table 7.6.




                 [Figure 7.9: Classical Regression with single values in the training set (x-axis: Midpoints Zardoya prices)]


                 [Figure 7.10: Classical Regression with single values in the testing set (x-axis: Midpoints Zardoya prices)]



    The Centre Method could be applied to get predicted maximum and minimum prices. This method yields the following model:

    Dogi_Midpoint = 5.6570 + 0.0792 × Zardoya_Midpoint + ε

where

    ε ~ N(0, 7.2137²)






    Set         ME         MAE       MSE       RMSE
    Training    0          0.4231    0.2268    0.4763
    Testing     -0.3518    0.3651    0.1642    0.4052

               Table 7.6: Error Measures for Classical Regression with single values





    Note that the slope has changed since, according to [DeCa04], it cannot be negative, in order to ensure that the fitted maximum is greater than the fitted minimum.


    This provides the results shown in Figures 7.11 and 7.12.

                 [Figure 7.11: Centre Method (2000) in training set (x-axis: Minimum and Maximum Zardoya prices; y-axis: Minimum and Maximum Dogi prices)]


    Table 7.7 shows the resulting error measures.





                 [Figure 7.12: Centre Method (2000) in testing set (x-axis: Minimum and Maximum Zardoya prices)]


    Set         ME         MAE       MSE        RMSE
    Training    -7.1288    7.1288    51.8315    7.1994
    Testing     -8.0653    8.0653    65.1544    8.0718

               Table 7.7: Error Measures for Centre Method (2000)



    It is very clear that this model is not accurate. This example evidences the main weak point of this approach: the positivity constraint imposed on the coefficients, which makes an inverse relationship between the variables impossible. This is pointed out by the error measures, which are very high.


    Now let us see the resulting error measures according to the Centre method proposed by [Bill02], shown in Table 7.8. This new definition of the error improves on the previous one.


    However, let us compare these last approaches with the Centre and Radius Method. In this case we have the following model:

    Dogi_Midpoint = 5.6570 − 0.086 × Zardoya_Midpoint + ε_Midpoint





    Set         ME         MAE       MSE        RMSE
    Training    -7.1288    7.1288    25.9183    5.0910
    Testing     -8.0653    8.0653    32.5825    5.7081

               Table 7.8: Error Measures for Centre Method (2002)


where

    ε_Midpoint ~ N(0, 0.2882²)

and

    Dogi_Radius = 0.0283 + 0.08 × Zardoya_Radius + ε_Radius

where

    ε_Radius ~ N(0, 0.0259²)

      The results can be seen in Figures 7.13 and 7.14 and Table 7.9.



    Set         ME         MAE       MSE       RMSE
    Training    0          0.4385    0.2273    0.4768
    Testing     -0.3426    0.3882    0.1655    0.4068

               Table 7.9: Error Measures for Centre and Radius Method


    As occurred with a direct relationship between the variables, the error measures are again lower with the Centre and Radius Method than with the Centre Method; thus, the former is better than the latter even when there is no clear relationship between the variables.




                 [Figure 7.13: Centre and Radius Method in training set (x-axis: Minimum and Maximum Zardoya prices)]


                 [Figure 7.14: Centre and Radius Method in testing set (x-axis: Minimum and Maximum Zardoya prices)]





    Now let us see what happens when the Bayesian methodology is introduced. Bearing in mind the previous Centre and Radius model, an expert could think that the situation will change drastically and assign the following prior parameters to the prior distribution explained in (5.36) for the midpoints:








    β0 = (3.1, 0.02)′
    V0 = 10⁻⁸
    s0² = 0.2882²
    v0 = 10⁶

    So the final Midpoint model would be:

    Dogi_Midpoint = 3.1 + 0.02 × Zardoya_Midpoint + ε_Midpoint


    And the following prior parameters to the prior distribution for the Radii:


                                \beta_0 = \begin{pmatrix} 0.0283 \\ 0.08 \end{pmatrix}, \quad
                                V_0 = 10^{6}, \quad
                                s_0^2 = 0.0259^2, \quad
                                v_0 = 4

    So the final Radius model would be:



                        Dogi_{Radius} = 0.0283 + 0.08 \times Zardoya_{Radius} + \epsilon_{Radius}
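    These two prior choices behave very differently, and it is worth making the reason explicit. Assuming
the prior in (5.36) is of the usual conjugate Normal-Inverse-Gamma form, the posterior mean of the
coefficients is a precision-weighted average of the prior mean and the least-squares estimate (a standard
identity, stated here as a reminder rather than derived):

                    \bar{\beta} = \left(V_0^{-1} + X'X\right)^{-1}\left(V_0^{-1}\beta_0 + X'X\,\hat{\beta}\right)

Hence the very small V_0 = 10^{-8} chosen for the Midpoints pins the posterior at the expert's
\beta_0 = (3.1, 0.02)', which is why the final Midpoint model reproduces the prior exactly, while the
diffuse V_0 = 10^{6} chosen for the Radii leaves the posterior essentially at the least-squares fit of the
classical Centre and Radius model.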

    The results for the testing set are shown in Figure 7.15 and Table 7.10.



                              Set       ME       MAE          MSE     RMSE


                            Testing   0.1031    0.2008       0.0443   0.2104


                 Table 7.10: Error Measures in Bayesian Centre and Radius Method







                        [Plot omitted; x-axis: Minimum and Maximum Zardoya prices]

                   Figure 7.15: Bayesian Centre and Radius Method in testing set



    The Bayesian Centre and Radius Method again performs better than the rest of the approaches, even in
unfavourable conditions.


    Therefore, we can conclude that the Bayesian Centre and Radius method has the same advantages
as the Centre and Radius method described by [DeCa07], while adding the advantages of the Bayesian
methodology; together these yield smaller errors in new predictions. An important future development
would be to build a Bayesian symbolic regression model with uniformly distributed errors.




Chapter 8

A Guide to Statistical Software Today

8.1 Introduction

Statistical software blends in one direction with relational database software such as Oracle or Sybase
(software we do not discuss here) and, in the other direction, with mathematical software such as
MATLAB. Mathematical software exhibits not only statistical capabilities flowing from code for matrix
manipulation, but also optimization and symbolic manipulation useful for statistical purposes. This
chapter assesses the state of the art of the statistical software arena as of 2007. It touches upon a few
commercial packages, a few general public license packages, a few analysis packages with statistical
add-ons, and a few general purpose languages with statistical libraries.


    We begin with the most important commercial packages, such as SAS, Minitab, BMDP, SPSS
or S-PLUS, followed by some of the public license statistical and Bayesian software such as R or
BUGS, and then some general purpose mathematical software and some general purpose programming
languages with statistical libraries.


    Finally, we describe the role of the developed application in the current statistical scene, highlighting
its main advantages and disadvantages.






8.2 Commercial Packages

8.2.1 The SAS System for Statistical Analysis

SAS began as a statistical analysis system in the late 1960s, growing out of a project in the Depart-
ment of Experimental Statistics at North Carolina State University. The SAS Institute was founded in
1976. Since that time, the SAS System has expanded to become an evolving system for complete data
management and analysis. This means that SAS is really much more than a simple software system.
As an example of its great potential, it is worth mentioning that it is used by 90 percent of the
companies on the Fortune 500 list. This expansion is probably due to the fact that SAS manage-
ment has aligned itself with the recent "statistical-like" advances within the computer science
community, such as data mining. This clever integration of mathematical/statistical methodologies,
database technology, and business applications has helped propel SAS to the top of the commercial
statistical software arena.


    The architecture for the SAS approach is called the SAS Intelligence Platform, which is really a
closely integrated set of hardware/software components that allow users to fully utilize the business
intelligence (BI) that can be extracted from their client base. Among the products making up the SAS
System are products for: management of large data bases; statistical analysis of time series; statistical
analysis of most classical statistical problems, including multivariate analysis, linear models (as well
as generalized linear models), and clustering; data visualization and plotting. Being more precise, the
SAS Intelligence Platform consists of the following components:

    • The SAS Enterprise ETL Servers

    • The SAS Intelligence Storage

    • The SAS Enterprise BI Server

    • The SAS Analytic Technologies

    One of the strengths of SAS is that the package containing the capabilities one normally associates
with a data analysis package is upgraded with each release to reflect the latest algorithmic
developments in the statistical field.


    The SAS System is available on PC and UNIX based platforms, as well as on mainframe com-
puters, so it covers the main options except Macintosh. As one could guess from what has been said
above, this system is aimed mainly at industrial, scientific and statistical users with high needs and
expertise, who do not mind spending time learning to use this complex system.


    Some useful URL’s are:



    • http://www.sas.com/ which is the main URL for SAS

    • http://is.rice.edu/ radam/prog.html which contains some user-developed tips on using SAS

    Other statistical systems which are of the same general vintage as SAS are MINITAB, BMDP and
SPSS. All of these systems began as mainframe systems, but have evolved to smaller scale systems
as computing has evolved.



8.2.2 Minitab

Minitab Inc. was formed more than 20 years ago around its flagship product, MINITAB statistical
software. MINITAB Statistical Software provides tools to analyze data across a variety of disciplines,
and is targeted for users at every level: scientists, business and industrial users, faculty, and students.


    In relation to the operating system, MINITAB is available on the most widely-used computer
platforms, including Windows, DOS, Macintosh, OpenVMS, and Unix.


    In contrast to SAS, MINITAB is quite easy to learn and use. There is no lengthy learning
process and little need for unwieldy manuals. This may be the main reason why MINITAB
is used so extensively in the educational community.


    For more details about this software visit the URL http://www.minitab.com/.



8.2.3 BMDP

BMDP has its roots as a bio-medical analysis package from the late 1960s. In many ways it has re-
mained true to its origins, as evidenced by its long list of clients, which includes such biomed-
ical giants as Bristol-Myers Squibb, Merck and Glaxo Wellcome. There are three main distributions:
BMDP New System Personal Edition, BMDP Classic for PCs - Release 7, and BMDP New
System Professional Edition. While the BMDP New System has an easy-to-use interface that makes data
analysis possible with simple point-and-click and fill-in-the-blank interactions, the Professional Edi-
tion combines the full suite of BMDP Classic for PCs Release 7 statistics with the powerful data
management and front-end data exploration features of the BMDP New System Personal Edition.


    A reference URL for BMDP is http://www.ppgsoft.com/bmdp00.html.


8.2.4 SPSS

SPSS is a multinational software company founded in the late 1960s that provides statistical product
and service solutions for survey research, marketing and sales analysis, quality improvement, scien-
tific research, government reporting and education.


    SPSS starts with the SPSS Base, which includes the most popular statistics, complete graphics, and
broad data management and reporting capabilities. The SPSS products form a modular system that
includes SPSS Professional Statistics, SPSS Advanced Statistics, SPSS Tables, SPSS Trends, SPSS
Categories, SPSS CHAID, SPSS LISREL 7, SPSS Developer's Kit, SPSS Exact Tests, Teleform, and
MapInfo. Although this software was originally designed for mainframe use, SPSS has adapted to
market demand and has releases for Windows, Mac and UNIX.


    A reference URL for SPSS is http://www.spss.com/.



8.2.5 S-PLUS

While there are many different packages for performing statistical analysis, one that offers some of
the greatest flexibility with regard to the implementation of user-defined functions and the customiza-
tion of one's environment is S-PLUS, one of the two implementations of the S language (R is the
other, reviewed later).


    S is an exceptionally well-developed tool for statistical research and analysis. S is especially
strong for statistical graphics, the output of data analysis through which both raw data and results are
displayed for both analysts and clients. S was originally developed at AT&T Bell Labs (recently split
into AT&T Laboratories and Lucent Bell Labs) by a team of researchers including Richard A. Becker,




John M. Chambers, Allan Wilks, William S. Cleveland and Trevor Hastie. The original description
of the S language, written by Becker, Chambers, and Wilks in 1988, was recognized with the
Association for Computing Machinery (ACM) 1998 Software System Award. The aim of the
language, as expressed by John Chambers, is "to turn ideas into software, quickly and faithfully".


    A good introduction to the application of S to statistical analysis problems is contained in [Cham92]
and [Cham83]. More recent work focusing on the statistical capabilities of the S-PLUS system can
be found in [Vena02].


    S-PLUS is manufactured and supported by the Statistical Sciences Corporation, now a division of
MathSoft. It runs on both PC and UNIX based platforms. In addition the company offers easy links
for the user to call S-PLUS from within C/FORTRAN or for the user to call C/FORTRAN compiled
functions within the S-PLUS environment. Statistical Sciences has made great efforts to keep the
software current with regard to the needs of the statistical community. They have released dedicated
modules which are targeted at specific application areas.


    The S-PLUS home page can be reached at http://www.mathsoft.com/. This site contains an inter-
esting comparison between SAS and S-PLUS.



8.2.6 Others

Other statistically oriented packages enjoying good reputations are SYSTAT, DataDesk, JMP and
StatGraphics. SYSTAT originated as a PC-based package developed by Leland Wilkinson and is now
owned by SPSS; the current version is 6.0, a Microsoft Windows oriented product. By contrast,
DataDesk is a Macintosh-based product authored by Paul Velleman from Cornell University. The
current release is version 5.0.1, a GUI-based product which contains many innovative graphical data
analysis and statistical analysis features. More information about DataDesk can be found at the URL
http://www.lightlink.com/datadesk/. JMP is another SAS product that is highly visualization oriented.
It is a stand-alone product for PC and Macintosh platforms. Information on JMP can be found at
http://www.sas.com/. StatGraphics is education-oriented statistical software used mainly in universities,
offering a user-friendly interface. A good reference showing how to use StatGraphics can be found in
[Maté95].






8.3 Public License Packages

8.3.1 R

R is an Open Source implementation of the well-known S language which originated at the Uni-
versity of Auckland, New Zealand, in the early 1990s. It works on multiple computing platforms such
as Unix systems or Windows, but its most important characteristic is that a software system under
the Open Source paradigm benefits from having "many pairs of eyes" examining the software, which
helps ensure its quality. An example of the rapid development of this software is that in 1997, only
two years after the public release in June 1995, the leading team had to select a Core group of
around 10 members, which was responsible for changes to the source code.


    R software, for the most part, is a command-line based language which is organized into vari-
ous packages. Basic packages are installed by default, and the user can download and install a great
variety of additional packages. There are also several major projects that are "R spin-offs", such as
"Bioconductor", an R package for gene expression analysis, or "Omega", another package focused
on providing a seamless interface between R and a number of other languages (PERL, PYTHON,
MATLAB). Two main packages have to be mentioned because of their importance to this project:
JRI and bayesm. The first deals with the problem of communicating Java with R; this lets us create
a graphical user interface using Swing in Java and make all the statistical calculations with R. The
second, developed by [Rossi06], contains the main functions to be used in Bayesian analysis. It is
precisely in Bayesian data analysis where R can outperform other statistical software.
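
    To make this combination concrete, the following minimal sketch (hypothetical code, not taken from
BARESIMDA; it assumes R, the JRI native library and bayesm are installed and reachable from the JVM)
runs a univariate Bayesian regression with bayesm's runireg function from Java and reads the posterior
means back:

    import org.rosuda.JRI.REXP;
    import org.rosuda.JRI.Rengine;

    public class JriBayesmSketch {
        public static void main(String[] args) {
            // R runs inside the JVM as a single thread; null callbacks means
            // R console output is simply discarded in this sketch.
            Rengine engine = new Rengine(new String[] {"--no-save"}, false, null);
            if (!engine.waitForR()) {
                System.err.println("Cannot start R");
                return;
            }
            engine.eval("library(bayesm)");
            // Toy data built on the R side; in a real application the data
            // would come from the Java GUI instead.
            engine.eval("set.seed(1); x <- rnorm(50); y <- 1 + 2*x + rnorm(50)");
            engine.eval("out <- runireg(Data = list(y = y, X = cbind(1, x)), " +
                        "Mcmc = list(R = 2000))");
            // Bring the posterior means of the coefficients back into Java.
            REXP means = engine.eval("colMeans(out$betadraw)");
            double[] beta = means.asDoubleArray();
            System.out.println("Posterior mean intercept: " + beta[0]);
            System.out.println("Posterior mean slope: " + beta[1]);
            engine.end();
        }
    }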


    More information about R can be found at http://www.r-project.org/.



8.3.2 BUGS

The BUGS (Bayesian inference Using Gibbs Sampling) project is concerned with flexible software
for the Bayesian analysis of complex statistical models using Markov chain Monte Carlo methods.
The project began in 1989 in the MRC Biostatistics Unit and led initially to the "Classic" BUGS
program, and then on to the WinBUGS software developed jointly with the Imperial College School
of Medicine at St. Mary's, London. Development now also includes the OpenBUGS project at the
University of Helsinki, Finland.






    The main advantage of this software is, as with R, the flexibility it offers the researcher to
model whatever he needs, but it is slightly more complex to learn than R. For this reason, Phil
Woodward developed BugsXLA, an Excel add-in that not only allows the user to specify
a model as one would in a package such as SAS or S-PLUS, but also aids the specification of priors
and the control of the MCMC run itself.


    More information can be found at http://www.mrc-bsu.cam.ac.uk/bugs/.




8.4      Analysis Packages with Statistical Libraries

8.4.1 Matlab

MATLAB is an interactive computing environment that can be used for scientific and statistical data
analysis and visualization. The basic data object in MATLAB is the matrix. The user can per-
form numerical analysis, signal processing, image processing and statistics on matrices, thus freeing
the user from programming considerations inherent in other programming languages such as C and
FORTRAN. There are versions of MATLAB for Unix platforms, PCs running Microsoft Windows,
and Macintosh. Because the functions are platform independent, MATLAB provides the user with
maximum reusability of their work.


    MATLAB comes with many functions for basic data analysis and graphics. Most of these are
written as M-file functions, which are basically text files that the user can read and adapt for other
uses. The user also has the ability to create their own M-file functions and script files, thus making
MATLAB a programming language. The recent addition of the MATLAB C-Compiler and C-Math
Library allows the user to generate executable code from their MATLAB library of functions, yielding
faster execution times and stand-alone applications.


    For researchers who need more specific functionality, MATLAB offers several modules or tool-
boxes. These typically focus on areas that might not be of interest to the general scientific community.
Basically, the toolboxes are a collection of M-file functions that implement algorithms and functions
common to an area of interest.






    One of the most useful capabilities of MATLAB is the set of tools available for visualizing data.
MATLAB supports standard two- and three-dimensional scatter plots along with surface plots. In
addition, it provides the user with a graphics property editor. As with R, there is a considerable amount
of contributed MATLAB code available on the internet. One notably useful source of code is the home
page for MATLAB at http://www.mathworks.com/, where more information about this software can
be found.



8.4.2 Mathematica

Mathematica is a computer algebra system developed originally by Stephen Wolfram and sold
by his company, Wolfram Research. It has numerical and graphical features and powerful symbolic
processing capabilities, but is comparatively complex to learn. Information on Mathematica is avail-
able at the URL http://www.wolfram.com/.



8.4.3 Others

Other mathematical software worth noting is MAPLE, with powerful symbolic processing capabili-
ties, and MATHCAD, a package which combines numerical, symbolic, and graphical features. More
information about these packages can be found at their official web sites:

    • http://www.maplesoft.com/

    • http://www.mathsoft.com/


8.5 Some General Languages with Statistical Libraries

8.5.1 Java

It is difficult to assess the state of the art with regard to Java statistical libraries, in that there may be
many custom user-developed packages that we are unaware of. Given this caveat, there are three main
packages to mention.


    The first one is StatCrunch, which provides the user with the capability to perform interactive
exploratory data analysis, logistic regression, nonparametric procedures, regression and regression
diagnostics, and others. The reader is referred to a review that appeared in [West04].


    Another source of Java-based statistics functions is the Apache Software Foundation Jakarta
math project. The math project seeks to provide common mathematical functionality to the Java user
community.


    The final source for Java-based statistical analysis is the Visual Numerics JMSL package. It
provides the user with an integrated set of statistical, visualization, data mining, neural network, and
numerical packages. The reader is referred to http://www.vni.com/products/imsl/jmsl/jmsl.html for
additional details on JMSL.



8.5.2 C++

C++ is another object-oriented programming language, like Java, with various statistical libraries.
Two libraries are worth mentioning: Goose, and Probability and Statistics.


    The first one is dedicated to statistical computation, and provides support for t-tests, F-tests,
Kruskal-Wallis tests, Spearman tests and others, with an implementation of simple linear regression
models. More information can be found at http://www.gnu.org/software/goose/goose.html.


    The second one is aimed at Microsoft Windows developers and consists of five packages: statis-
tics, discrete probability, standard probability distributions, hypothesis testing, and correlation and
regression. A strength of these modules is their support for various interfaces, including C# and C++
.NET. The reader is referred to the URL http://www.webcabcomponents.com/dotNET/dotnet/pss/.




8.6 Developed Software Tool: BARESIMDA

The software tool developed throughout this project, as has been said above, is based on Java and
R, both public license software. It has not been developed with the intention of creating
a complete statistical package that could be an alternative to any of the above software. Evidently,
it is very difficult to incorporate all the facilities that those programs have, much less in a one-year
period with a single developer. In fact, BARESIMDA focuses only on regression analysis




procedures with different approaches and data. In that sense, the developed tool gathers classical and
Bayesian regression and lets the user analyze Normal regression models in a very simple way through
a very intuitive graphical user interface. This is a particularly important feature in the Bayesian
approach, which rests on a complex theoretical basis that many users may not be familiar with.


    Another advantage, maybe the most important one, over the rest of the statistical packages is that
BARESIMDA incorporates regression analysis with interval data in both the classical and Bayesian
approaches. Not only does it display the analytical results, but it also lets us see graphically the
goodness of fit and the centre and radius tendencies.


    With this first version of BARESIMDA, we have wanted to start down the road towards public license
software that takes advantage of both the Java graphical user interface with Swing and the
statistical libraries in R.




Chapter 9

Software Requirements Specification

This chapter provides a complete description of the functions to be performed by the BARESIMDA
software, assisting potential users in determining whether the specified software meets their needs,
or how it must be modified to do so.


   This also reduces the development effort since the preparation of the Software Requirements
Specification (SRS) forces the developer to consider rigorously all of the requirements before de-
sign begins and reduces later redesign, recoding, and retesting. Careful review of the requirements
in the SRS can reveal omissions, misunderstandings, and inconsistencies early in the development
cycle when these problems are easy to correct. Likewise it provides a basis for estimating costs and
schedules and a baseline for verification and validation.


9.1 Purpose

The aim of this system is to provide a tool to build different types of regression analyses and to check
the advantages and disadvantages of each approach that has been developed.


9.2 Intended Audience

The software is intended to be handled by different types of users, such as:

    • Inexperienced people who have minimal knowledge about what regression is and what it con-
      sists of.

    • Students and people with a medium degree of knowledge about regression and minimal infor-
      mation about the Bayesian paradigm.

    • Graduate and experienced people who have deep knowledge about regression and Bayesian
      analysis and want to learn about symbolic regression.


9.3 Functionality

The software must provide the functionality described in the following points.


9.3.1 Classical Regression with crisp data

This refers to analytic and graphical analysis of multiple and simple classical Normal regression
models with crisp data. To be precise, the software has to provide the following facilities:

    • Regression analysis summary with estimated parameters

    • ANOVA table.

    • Normality test.

    • Heteroscedasticity test.

    • Autocorrelated errors test.

    • To predict new data.

    • Complementary graphics to see the fitted model.


9.3.2 Classical Regression with interval-valued data

As with crisp data, regression analysis must be carried out with symbolic data, specifically with
interval-valued data. All the functions described previously must be implemented for the centre and
radius regressions. In addition, the software will display graphically the adequacy of the fitted model
to the original interval-valued data.






9.3.3 Bayesian Regression with crisp data

The user must be capable of creating two different Bayesian models: Normal and Independent Nor-
mal. Since the main characteristic of the Bayesian paradigm is the possibility of introducing subjective
information, the application will provide a very intuitive dialog to retrieve the user's beliefs about the
different parameters. The software will display the estimated parameters, provide a Normality test
for residuals, and offer input fields to make new predictions.


9.3.4 Bayesian Regression with interval-valued data

As with classical regression, it must be possible to carry out Bayesian regression with interval-valued
data, so the user will be able to incorporate prior information about the centres and the radii. The
analysis options are the same as those for crisp data, with additional graphics to assess the adequacy
of the fitted interval-valued data to the observed data.


9.3.5 Data Manipulation

The user will be able to type in new data by hand or to load an existing Excel file into the application.
In the same way, he will be able to save to an Excel file both the source data and the following
resulting data:

    • Residuals

    • Normalized residuals

    • Studentized residuals

    • Fitted values

    • Predicted values


9.3.6 Portability

The application must be able to be executed on the main platforms, such as Windows, Linux and Unix.


9.3.7 Maintainability

In the same way, the tool must be well structured to be easily maintainable since changes and exten-
sions in the future are quite probable.





9.4 External Interfaces

9.4.1 User Interfaces

The application to be developed will have a Multiple Document Interface (MDI) with a high degree
of usability. The former means that its windows will reside under a single parent window, as Figure
9.1 shows.




                                        Figure 9.1: BARESIMDA MDI



    This will avoid filling up the operating system task management interface, as the windows are hierarchi-
cally organized, and it will let the user hide/show/minimize/maximize them as a whole.
    The second characteristic means that the user will not have to think too much about what the
application does or how it does it.
    There will be an option to configure the application look to be able to be adapted to user’s prefer-
ences. The user will have the possibility to set the windows look as:

    • Unix

    • Windows




    • Windows Classic

    • Java

    In the same way, the user will be able to indicate whether he or she is an experienced or an inexperienced
user, which will help him or her specify prior information in Bayesian regression.
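
    As a hypothetical illustration of how Swing supports the MDI behaviour described above (this is
not BARESIMDA's actual code; the class names, titles and sizes are made up), a parent JFrame hosts
a JDesktopPane in which the child JInternalFrame windows live, and the window look is selected
through UIManager, mirroring the look options listed above:

    import javax.swing.JDesktopPane;
    import javax.swing.JFrame;
    import javax.swing.JInternalFrame;
    import javax.swing.UIManager;

    public class MdiSketch {
        public static void main(String[] args) throws Exception {
            // Select one of the installed window looks, as the options above do.
            UIManager.setLookAndFeel(UIManager.getSystemLookAndFeelClassName());
            JFrame parent = new JFrame("MDI sketch");
            JDesktopPane desktop = new JDesktopPane();   // single parent window
            parent.setContentPane(desktop);
            // Child windows live inside the desktop, not on the OS task bar,
            // so they can be hidden/shown/minimized/maximized as a whole.
            JInternalFrame child =
                new JInternalFrame("Regression", true, true, true, true);
            child.setSize(300, 200);
            child.setVisible(true);
            desktop.add(child);
            parent.setSize(600, 400);
            parent.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            parent.setVisible(true);
        }
    }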


9.4.2 Software Interfaces

BARESIMDA will connect to a statistics package which will be responsible for making all compu-
tations and returning the results to BARESIMDA. In this way, all the operations must be transparent
to the end user through an interface which lets both programs interact. This makes the application
more usable.


    Regarding input and output data, an interface will be necessary to read from and write to Excel
files.




Chapter 10

Software Architecture Study

10.1 Hardware/Software Architecture

The application will be programmed in Java and built, executed and tested with SDK version 1.4.2
or later. Specifically, the graphical user interface will be developed using Swing, one of the most
powerful tools for developing user-friendly mechanisms for interacting with an application, giving it
a distinctive "look" and "feel". Its libraries are part of the Java Foundation Classes (JFC), Java's
libraries for cross-platform GUI development. For more information on JFC visit
http://java.sun.com/products/jfc/. This lets us develop the main interface on a particular system
which can then be executed on any platform, allowing users of different operating systems to use the
look and feel of their own platform.


    The software chosen to carry out the statistical processing is R, since it is distributed under a
public license, like Java; it gives the developer a high degree of flexibility to program the models he
wants to build; and it is expanding rapidly among statisticians and scientists.


    The way BARESIMDA and R communicate is through the Java to R Interface, called JRI. This
is a .jar library which can be obtained from http://rosuda.org/JRI/ and allows running R inside Java
applications as a single thread. Basically, it loads the R dynamic library into Java and provides a Java
API to R functionality. JRI uses native code, but it supports all platforms where Sun's Java (or
compatible) is available, including Windows, Mac OS X, Sun and Linux. More information about
this interface can be found at the reference cited above.






                          Figure 10.1: Interface between BARESIMDA and R


    As indicated in the previous chapter, BARESIMDA is required to read and write Excel files. For
this purpose, the POI project consists of various parts that fit together to deliver the data in a Microsoft
file format to the Java application. Specifically, and according to our requirements, HSSF is the POI
project's pure Java implementation of the Excel file format. It provides a way to create, modify, read
and write XLS spreadsheets. More precisely, it offers:

    • Low level structures for those with special needs

    • An event-model API for efficient read-only access.

    • A full user model API for creating, reading and modifying XLS files.

    Visit http://jakarta.apache.org/poi/hssf/index.html for more information.
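
    As an illustration only (a hypothetical sketch, not BARESIMDA's actual code, assuming an input
file data.xls whose first column holds numeric values), the HSSF user model API allows an XLS
round trip in a few lines:

    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import org.apache.poi.hssf.usermodel.HSSFCell;
    import org.apache.poi.hssf.usermodel.HSSFRow;
    import org.apache.poi.hssf.usermodel.HSSFSheet;
    import org.apache.poi.hssf.usermodel.HSSFWorkbook;

    public class XlsRoundTripSketch {
        public static void main(String[] args) throws Exception {
            // Read the first sheet of an existing XLS file.
            HSSFWorkbook book = new HSSFWorkbook(new FileInputStream("data.xls"));
            HSSFSheet sheet = book.getSheetAt(0);
            for (int i = 0; i <= sheet.getLastRowNum(); i++) {
                HSSFRow row = sheet.getRow(i);
                if (row == null) continue;
                HSSFCell observed = row.getCell((short) 0);
                if (observed == null) continue;
                // Write a second column next to the observed values; a real
                // application would store fitted values or residuals here.
                HSSFCell result = row.createCell((short) 1);
                result.setCellValue(observed.getNumericCellValue());
            }
            // Save the modified workbook under a new name.
            FileOutputStream out = new FileOutputStream("results.xls");
            book.write(out);
            out.close();
        }
    }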


10.2 Logical Architecture

The application will be structured in three levels or layers, each of which will have a well-defined
responsibility:





                        Figure 10.2: Interface between BARESIMDA and Excel



    • gui: it will be responsible for showing the graphical user interface and for getting the input
      parameters and requests and passing them to the classes which will process them.

    • action: it will contain the main procedures that treat the information and elaborate the regres-
      sion models and analysis. The results will be given back to the caller process. It will be
      responsible for calling the dao classes too.

    • dao: it will be responsible for accessing permanent data, that is, for loading and saving informa-
      tion.

    Figure 10.3 shows the relation among these packages.




                                  Figure 10.3: Logical Architecture
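
    To make this layering concrete, the following minimal sketch shows the intended call chain between
the three packages (the class and method names are illustrative assumptions only, not BARESIMDA's
actual design):

    public class LayeringSketch {
        // dao layer: the only code allowed to touch permanent data.
        interface DataDao {
            double[][] load(String xlsPath);
            void save(String xlsPath, double[][] data);
        }

        // action layer: builds the regression models and returns the results.
        interface RegressionAction {
            double[] fit(double[][] data);
        }

        // gui layer: collects the request and delegates downwards.
        static void onFitRequested(RegressionAction action, DataDao dao) {
            double[][] data = dao.load("data.xls");
            double[] coefficients = action.fit(data);
            System.out.println("Display " + coefficients.length + " coefficients");
        }
    }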




Chapter 11

Project Budget

Project costs for this system have been divided into two types of costs, which will be commented on
in the following sections:

   • Engineering costs.

   • Investment and Materials Costs.

    There is also a section summarizing the entire expected budget for the project. There is no
commercial cost, since the result is intended to be public license software for free distribution.


11.1 Engineering Costs

A computer engineer working in the environment on which the project is focused is expected to earn
around 2500 €/month. There is an additional extra cost of 30% for Social Security contributions.


    The programmer works 8 hours/day, an average of 22 days/month. This makes an average of 176
hours/month. Thus, the cost per hour is 18.46 €/h.


    The estimated time required for the development of the project is divided into the work packages
explained at the beginning of this project:

   • Bayesian Data Analysis: 168 hours

   • Regression Models: 160 hours.




    • Symbolic Data: 64 hours.

    • Requirements Specification: 40 hours.

    • Architecture Study: 56 hours.

    • Design: 80 hours.

    • Programming: 416 hours.

    • Testing: 40 hours.

    The estimated time required for the project is 1024 hours (5 months and 18 days). Thus, the
estimated engineering cost is 18903.04 €.
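
    Restating the figures above as explicit calculations (no new assumptions, just the numbers already
given):

        \text{cost per hour} = \frac{2500 \times 1.30}{22 \times 8} \approx 18.46 \text{ €/h}, \qquad
        \text{engineering cost} = 1024 \times 18.46 = 18903.04 \text{ €}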


11.2 Investment and Elements Costs

The elements used for the development of this project have been computer and software equipment.
These costs can be seen in Table 11.1.


                     Element                                                 Price

                     Pentium D925 at 3 GHz                                   630 €

                     Other expenses (Internet connection, office materials)   60 €

                     Total                                                   690 €


                                 Table 11.1: Estimated material costs


    The amortization period for this type of element is considered to be complete after 10000 work-
ing hours. Moreover, the usage rate is considered to be about 85% of the engineering work hours,
thus obtaining the results shown in Table 11.2.








                           Concept                                  Total

                           Hours of use of the material          870.4 hours

                           Resources cost/hour                     0.19 €/h

                           Total amortization materials cost      165.38 €


                                     Table 11.2: Amortization Costs


    Thus, the sum of the engineering and material costs is 19068.42 €. It can be assumed that the
investment made is about 5% of the engineering cost, so the investment cost amounts to 945.15 €.


    Therefore, the total cost of the project, which is the sum of the engineering, materials, and invest-
ment costs, is estimated to be 20013.57 €.


11.2.1 Summarized Budget

The overall expected budget can be observed in Table 11.3.








                       Cost                Total

                        Engineering      18903.04 €

                        Material          165.38 €

                        Investment        945.15 €

                       Total             20013.57 €


                     Table 11.3: Summarized Budget




Chapter 12

Conclusions

12.1 Bayesian Regression applied to Symbolic Data

Dealing with a current research topic such as symbolic data requires a high level of English, since it
is the universal language of research. On the other hand, a project like this that depends on ongoing
research progresses with more difficulty, since it does not deal with an established subject.


    Good research requires rigorous documentation and a complete bibliography. There must be
enough well-cited references to let the reader find more information about the points of interest to
him or her.


    Bayesian methodology is called to be a fundamental element in business processes oriented towards
predicting and forecasting new situations and quantities. Although I have really enjoyed this
project, I suspect that, with a more complete prior background in Bayesian data analysis, I could
have saved some of the initial time spent learning concepts that later turn out to be obvious. This would
have let me extend the project to other fields such as regression with hierarchical models or nonparametric
Bayesian regression, where the authentic Bayesian potential resides. However, the more one knows
about a subject, the more one likes it and the more one wants to learn about it, so the problem would
never end. In this regard, the project has met and exceeded my initial personal expectations, arousing
a great interest in the research field and teaching me to value this hard but exciting arena.






    If I could change anything about the project planning, I would have tried to condense the
study stage in order to spend more time applying the software tool to more real problems
and situations. Nevertheless, this would be difficult to carry out, since the project is developed
within an academic year in which other activities also take place.


12.2 BARESIMDA Software Tool

Fortunately, public license software is growing enormously. This gives everybody more options
to choose from.


    In that sense, R is a great tool for programming new models, but it requires, on the one hand, very
high statistical knowledge, since the requirements of people with a low to medium statistics level are
already satisfied by current statistical software. On the other hand, it requires a medium programming
level to be able to carry out one's ideas. Moreover, the way in which R handles data turns out to be
tedious for someone used to working with matrix representations.


    Interconnecting different interfaces or applications is usually a difficult task, especially when there
is very little documentation to establish the connection on both sides. This problem is very important,
and it is not usually taken into consideration when integrating different environments.


    Concerning Java, the possibilities and facilities that this programming language offers are really
incredible. They make the programming task much easier.




12.3 Future Developments

As can be deduced from what has been said above and in previous chapters, the project could
have many different extensions. The most important ones are:

    • Bayesian regression with hierarchical models for interval-valued data.

    • Bayesian time series for interval-valued data.

    • Bayesian linear regression for histogram-valued data.

    • Nonparametric Bayesian regression for interval-valued data.



    • Bayesian vector autoregressive models for interval-valued data.

    • Bayesian regression for functional data.

    • Bayesian symbolic regression with uniformly distributed errors.

    Likewise, the software tool can be improved by adding some conventional statistical functions in
order to obtain public license statistical software with a user-friendly graphical interface.




12.4 Summary

On the one hand, we have built a new Bayesian regression model for interval-valued data which fits
better than other existing approaches, provided that the prior information is accurate. As has been shown,
this works well both for directly related variables and for uncorrelated variables. This is
an important advance in the symbolic data field, since to the best of our knowledge there is no other
Bayesian approach for this kind of data.


    On the other hand, a new software tool letting the user perform Bayesian symbolic regression has been
developed. Again, to the best of our knowledge, there is no other package with the same user-friendly
interface and the same facilities. Furthermore, it offers the possibility of performing both standard and
Bayesian regression with either classical or symbolic data.


    As a result of this project, the author and the director are working together on a paper about the past,
present and future of regression, which is intended to be sent to ANALES. In the same way, another
article about Bayesian symbolic regression is in mind for a more prominent journal such as
Computational Statistics and Data Analysis (CSDA).




Appendix A

Probability Distributions

A number of probability distributions, together with their density or probability mass functions, means
and variances, have been used or mentioned previously. For ease of reference, their definitions are
regrouped in this appendix, together with a short discussion of their key properties. More information
about these distributions in a Bayesian context can be found in [Gelm04] or [Maté93].


A.1 Discrete Distributions

A.1.1 Binomial

The Binomial distribution is perhaps the most commonly encountered discrete distribution in Statis-
tics, and it is used in quality control by attributes and sampling techniques with replacement. Consider
a sequence of n independent trials, each of which can result in one of just two possible outcomes,
namely success and failure. Further assume that the probability of success, p, is the same for each
trial. Let Y denote the number of successes observed in the n trials. Then Y has a Binomial distri-
bution with parameters n and p. Properly, a discrete random variable, Y , has a Binomial distribution
with parameters n and p, denoted Y \sim Bin(n, p), if its probability mass function is given by:


                                      f(y | n, p) = \binom{n}{y} p^{y} (1 - p)^{n-y}                (A.1)

    where n > 0, y = 0, 1, \ldots, n and 0 \le p \le 1.
    Likewise, the mean and variance are given by:








                                                   E(Y) = np                                        (A.2)
                                       Var(Y) = np(1 - p)                                           (A.3)


A.1.2 Geometric

The Geometric distribution is related in a certain way to the previous one. Consider the same
situation as before: a sequence of independent trials with a constant success probability p
in each trial. In this case the number of trials varies until the first success is obtained; that is, the
distribution models the number of trials until the first success, and it is common in reliability
analysis. Formally, a discrete random variable, Y, has a Geometric distribution with parameter p,
denoted Y \sim Geo(p), if its probability mass function is given by:


                                       f(y | p) = (1 - p)^{y-1} p                                   (A.4)

where 0 < p \le 1 and y = 1, 2, \ldots


    In the same way, the mean and variance are given by:

                                       E(Y) = \frac{1}{p}                                           (A.5)
                                       Var(Y) = \frac{1-p}{p^2}                                     (A.6)

A.1.3 Poisson

The Poisson distribution is commonly used to represent count data, such as the number of shares sold
in a fixed time period. It is also usual to see it in reliability analysis. Strictly, a discrete
random variable, Y, has a Poisson distribution with parameter \lambda, denoted Y \sim P(\lambda), if its probability
mass function is given by:


                                       f(y | \lambda) = \frac{e^{-\lambda} \lambda^{y}}{y!}         (A.7)

where \lambda \ge 0 and y = 0, 1, 2, \ldots

    In the same way, the mean and variance are given by:

                                       E(Y) = \lambda                                               (A.8)
                                       Var(Y) = \lambda                                             (A.9)


A.2 Continuous Distributions

A.2.1 Uniform

The Uniform distribution is used to represent a variable that is known to lie in an interval and is equally
likely to be found anywhere in the interval. A key property is that if a variable, X, has a continuous
distribution function F(x), then the variable Y = F(X) is uniform on the interval [0, 1]. Properly,
a continuous random variable, Y, has a Uniform distribution over the interval [a, b], denoted Y \sim
U(a, b), if its probability density function is given by:


                                 f(y | a, b) = \begin{cases} \frac{1}{b-a} & a \le y \le b \\ 0 & \text{otherwise} \end{cases}          (A.10)

where -\infty < a < b < \infty.
    The mean and variance are specified alike by:

                                       E(Y) = \frac{a+b}{2}                                         (A.11)
                                       Var(Y) = \frac{(b-a)^2}{12}                                  (A.12)

A.2.2 Univariate Normal

The Normal distribution, also called Gaussian distribution, is ubiquitous in statistical work. It is a
family of distributions of the same general form, differing in their location and scale parameters: the
mean and standard deviation, respectively. The standard normal distribution is the normal distribution
with a mean of zero and a variance of one. Formally, a continuous random variable, Y, has a Normal
distribution with mean \mu and variance \sigma^2, denoted Y \sim N(\mu, \sigma^2), if its probability density function
is given by:


                                 f(y | \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y-\mu)^2}{2\sigma^2}\right)     (A.13)

where \sigma^2 > 0, -\infty < \mu < \infty and y \in \mathbb{R}.


    Likewise, the mean and variance are formulated by:


                                       E(Y) = \mu                                                   (A.14)
                                       Var(Y) = \sigma^2                                            (A.15)


A.2.3 Exponential

This distribution is used to model the time, t, between independent events that happen at a constant
rate, \lambda. Therefore, it is the distribution of waiting times for the next event in a Poisson process, and
it is a special case of the Gamma distribution with \alpha = 1. Formally, a continuous random variable,
Y, has an Exponential distribution with parameter \lambda, denoted Y \sim Exp(\lambda), if its probability density
function is given by:


                                       f(y | \lambda) = \lambda e^{-\lambda y}                      (A.16)

where \lambda \ge 0 and y \ge 0.
    Similarly, the mean and variance are identified by:

                                       E(Y) = \frac{1}{\lambda}                                     (A.17)
                                       Var(Y) = \frac{1}{\lambda^2}                                 (A.18)
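
    As a quick check of the claim above, substituting \alpha = 1 and \beta = 1/\lambda into the Gamma
density (A.19) of the next subsection recovers (A.16):

        f(y | \alpha = 1, \beta = 1/\lambda) = \frac{y^{1-1} \exp(-y / (1/\lambda))}{(1/\lambda)^{1}\,\Gamma(1)} = \lambda e^{-\lambda y}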


A.2.4 Gamma

A Gamma distribution is a general type of statistical distribution that is related to the Beta distribution
and arises naturally in processes for which the waiting times between Poisson distributed events are
relevant.


    In the Bayesian context, the Gamma distribution is the conjugate prior distribution for the inverse of
the normal variance and for the mean parameter of the Poisson distribution.






    In a formal way, a continuous random variable, Y, has a Gamma distribution with shape and scale
parameters \alpha and \beta, respectively, denoted Y \sim Gamma(\alpha, \beta), if its probability density function is
given by:

                                 f(y | \alpha, \beta) = \frac{y^{\alpha-1} \exp(-y/\beta)}{\beta^{\alpha}\,\Gamma(\alpha)}              (A.19)

where \alpha > 0, \beta > 0 and y > 0.
    Similarly, the mean and variance are identified by:


                                       E(Y) = \alpha\beta                                           (A.20)
                                       Var(Y) = \alpha\beta^2                                       (A.21)


A.2.5 Inverse-Gamma

If Y^{-1} has a Gamma distribution with parameters \alpha and \beta, then Y has the Inverse-Gamma distribu-
tion. In a Bayesian context, this distribution is the conjugate prior distribution for the normal variance.


    Formally, a continuous random variable, Y, has an Inverse-Gamma distribution with shape and
scale parameters \alpha and \beta, respectively, denoted Y \sim Inv\text{-}Gamma(\alpha, \beta), if its probability density
function is given by:


                                 f(y | \alpha, \beta) = \frac{\beta^{\alpha} y^{-\alpha-1} \exp(-\beta/y)}{\Gamma(\alpha)}              (A.22)

where \alpha > 0, \beta > 0 and y > 0.
    Similarly, the mean and variance are identified by:

                                       E(Y) = \frac{\beta}{\alpha-1}, \quad \alpha > 1              (A.23)
                                       Var(Y) = \frac{\beta^2}{(\alpha-1)^2(\alpha-2)}, \quad \alpha > 2        (A.24)

A.2.6 Chi-square

It is an essential distribution in inferential Statistics and in goodness-of-fit tests. The \chi^2_v distribution is a
special case of the Gamma distribution, with shape parameter \alpha = v/2 and scale parameter \beta = 2.
Since it is a special case, we need not define the density function, mean and variance again, as they
can be deduced easily from the Gamma distribution.


A.2.7 Inverse Chi-square and Scaled Inverse Chi-square

As the \chi^2 distribution is a special case of the Gamma distribution, the inverse \chi^2 distribution is a
special case of the Inverse-Gamma distribution, with shape parameter \alpha = v/2 and scale parameter
\beta = 1/2; so, for its density function, mean and variance, see the Inverse-Gamma distribution. We also
define the scaled inverse \chi^2 distribution, which is useful for variance parameters in normal models.
A continuous random variable, Y, has a scaled inverse \chi^2 distribution with v degrees of freedom and
scale s, denoted Y \sim Scaled\,Inv\text{-}\chi^2(v, s^2), if its probability density function is given by:

                                 f(y | v, s) = \frac{(v/2)^{v/2}}{\Gamma(v/2)} s^{v} y^{-(v/2+1)} \exp\left(-\frac{v s^2}{2y}\right)    (A.25)

    The mean and variance are defined by:

                                       E(Y) = \frac{v}{v-2} s^2, \quad v > 2                        (A.26)
                                       Var(Y) = \frac{2v^2}{(v-2)^2(v-4)} s^4, \quad v > 4          (A.27)

    Note that this is the same as Inv\text{-}Gamma(\alpha = v/2, \beta = v s^2/2).


A.2.8 Univariate Student-t

The Student's t-distribution is a probability distribution that arises in the problem of estimating the
mean of a normally distributed population when the sample size is small. In regression analysis, it is
used to represent the posterior predictive distribution in Normal regression. As an anecdote, it is worth
mentioning that this distribution was published by William Gosset in 1908, but he was not allowed
to bring it out under his own name, so the paper was written under the pseudonym Student. Strictly,
a continuous random variable, Y, has a Student's t-distribution with v degrees of freedom, denoted
Y \sim t(v), if its probability density function is given by:


                                 f(y | v) = \frac{\Gamma\left(\frac{v+1}{2}\right)}{\sqrt{v\pi}\,\Gamma\left(\frac{v}{2}\right)} \left(1 + \frac{y^2}{v}\right)^{-\frac{v+1}{2}}        (A.28)

where v > 0 and y \in \mathbb{R}.


    In the same way, the mean and variance are identified by:


                                       E(Y) = 0, \quad v > 1                                        (A.29)
                                       Var(Y) = \frac{v}{v-2}, \quad v > 2                          (A.30)

A.2.9 Beta

In probability theory and statistics, the Beta distribution is a family of continuous distributions de-
fined on the interval [0, 1], differing in the values of their two non-negative shape parameters, \alpha and
\beta. In the Bayesian context, the Beta is the conjugate prior distribution for the binomial probability. A
continuous random variable, Y, has a Beta distribution with parameters \alpha and \beta, denoted Y \sim Beta(\alpha, \beta),
if its probability density function is given by:


                                 f(y | \alpha, \beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)} y^{\alpha-1} (1-y)^{\beta-1}          (A.31)

where \alpha > 0 and \beta > 0.


    The mean and variance are identified by:

                                       E(Y) = \frac{\alpha}{\alpha+\beta}                           (A.32)
                                       Var(Y) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}        (A.33)

A.2.10 Multivariate Normal

The multivariate Normal distribution extends the univariate Normal distribution to vector
observations. A p-dimensional vector of continuous random variables, Y = (Y_1, Y_2, \ldots, Y_p)', is said
to have a multivariate Normal distribution with mean vector \mu and variance-covariance matrix \Sigma
if its probability density function is given by:


                                 f(y) = (2\pi)^{-p/2} |\Sigma|^{-1/2} \exp\left[-\frac{1}{2}(y-\mu)'\Sigma^{-1}(y-\mu)\right]           (A.34)

    Likewise, the mean and variance are formulated by:

                                       E(Y) = \mu                                                   (A.35)
                                       Var(Y) = \Sigma                                              (A.36)


A.2.11 Multivariate Student- t

It is the multivariate generalization of the Student's t-distribution. Rigorously, a continuous random variable, Y, has a multivariate Student's t-distribution with v degrees of freedom, location µ = (µ₁, . . . , µ_d) and symmetric, positive definite d × d scale matrix Σ, denoted Y ∼ t(v, µ, Σ), if its probability density function is given by:


$$f(y \mid v, \mu, \Sigma) = \frac{\Gamma\!\left(\frac{v+d}{2}\right)}{\Gamma\!\left(\frac{v}{2}\right)v^{d/2}\pi^{d/2}}\,|\Sigma|^{-1/2}\left(1+\frac{1}{v}(y-\mu)'\Sigma^{-1}(y-\mu)\right)^{-\frac{v+d}{2}} \tag{A.37}$$
    In the same way, the mean and variance are given by:


$$E(Y) = \mu, \qquad v > 1 \tag{A.38}$$
$$\operatorname{Var}(Y) = \frac{v}{v-2}\,\Sigma, \qquad v > 2 \tag{A.39}$$
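
    For numerical work, the density (A.37) is implemented in the contributed mvtnorm package (not part of base R, so it must be installed first); a minimal sketch with illustrative values:

    # Multivariate Student's t density via the contributed mvtnorm package
    library(mvtnorm)                 # assumes install.packages("mvtnorm") was run
    mu <- c(0, 0)
    Sigma <- diag(2)
    dmvt(c(0.5, -0.5), delta = mu, sigma = Sigma, df = 4, log = FALSE)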
                                                        v−2

A.2.12 Wishart

The Wishart is the conjugate prior distribution for the inverse covariance matrix in a multivariate Normal distribution. It is a multivariate generalization of the Gamma distribution. The integral is finite if the degrees of freedom parameter, v, is greater than or equal to the dimension, k.


    Formally, a continuous random variable, Y, has a Wishart distribution with v degrees of freedom and symmetric, positive definite k × k scale matrix S, denoted Y ∼ Wishart_v(S), if its probability density function is given by (W positive definite):

$$f(W \mid v, S) = \left(2^{vk/2}\,\pi^{k(k-1)/4}\prod_{i=1}^{k}\Gamma\!\left(\frac{v+1-i}{2}\right)\right)^{-1}|S|^{-v/2}\,|W|^{(v-k-1)/2}\exp\!\left[-\tfrac{1}{2}\operatorname{tr}(S^{-1}W)\right] \tag{A.40}$$
    Similarly, the mean is given by:


$$E(Y) = vS \tag{A.41}$$



A.2.13 Inverse-Wishart

If W⁻¹ ∼ Wishart_v(S), then W has the inverse-Wishart distribution. This is the conjugate prior distribution for the multivariate Normal covariance matrix. Formally, a continuous random variable, Y, has an inverse-Wishart distribution with v degrees of freedom and symmetric, positive definite k × k scale matrix S, denoted Y ∼ Inv-Wishart_v(S⁻¹), if its probability density function is given by (W positive definite):


$$f(W \mid v, S) = \left(2^{vk/2}\,\pi^{k(k-1)/4}\prod_{i=1}^{k}\Gamma\!\left(\frac{v+1-i}{2}\right)\right)^{-1}|S|^{v/2}\,|W|^{-(v+k+1)/2}\exp\!\left[-\tfrac{1}{2}\operatorname{tr}(S\,W^{-1})\right] \tag{A.42}$$

    Similarly, the mean is given by:



$$E(Y) = (v-k-1)^{-1}S \tag{A.43}$$
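
    The Wishart mean (A.41) is easy to illustrate by simulation, and inverse-Wishart draws can then be obtained by inverting Wishart draws. A minimal sketch using stats::rWishart, which exists in recent versions of R (it was not available in the R 2.4.1 referred to elsewhere in this document):

    # Monte Carlo illustration of (A.41); rWishart requires a recent R version
    set.seed(1)
    v <- 10
    S <- matrix(c(2, 0.3, 0.3, 1), nrow = 2)     # illustrative scale matrix
    draws <- rWishart(10000, df = v, Sigma = S)  # 2 x 2 x 10000 array of draws
    apply(draws, c(1, 2), mean)                  # close to v * S, cf. (A.41)
    v * S
    # Inverse-Wishart draws are obtained by inverting the Wishart draws:
    inv.draws <- apply(draws, 3, solve)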




Appendix B

Installation Guide

B.1 From source folder

The source folder contains the following files and folders:

   • BARESIMDA.jar: the executable application file. Java Runtime Environment 1.4.2 or later, R 2.4.1 or later, and the libraries provided in the folder must be installed.

   • R Libraries: contains the libraries to be moved into the R software library %R_HOME%\library.

   • Java Library: it contains the file to be moved into %JAVA_HOME%\lib\ext.

%R_HOME% and %JAVA_HOME% refer to the paths in which R and Java are installed, respectively. For instance, in Windows, if you have installed them into the root directory C:\ you should have C:\R\R-2.4.1\library and C:\Java\lib\ext.
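
    If in doubt about these locations on a particular machine, R itself can report them; for example:

    R.home("library")   # absolute path of the main R library directory
    .libPaths()         # all library locations that R searches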


B.2 From installer

An installer will be provided to make the installation process much easier. No previously installed programs are required, since the installer will install the Java Runtime Environment and R itself. As a result of executing this installer, a new folder and a shortcut icon will be created.




Appendix C

User’s Guide

C.1 Data Entry

C.1.1 Loading an Excel file

  1. Select the File menu item in the menu bar.




                                  Figure C.1: Load Data Menu



  2. Put the mouse over the Load element and click on it.

  3. A dialog box, shown in Figure C.2, will be displayed. Click on the Search button to select the
     Excel file to load and indicate the sheet number in the field with that label. If the first row in
     the data sheet is a header with the variable names, then click OK to load the data. Otherwise,
     deselect the variable names option and click OK.

  4. Then, the data will be displayed in the Data window as in an Excel sheet (see Figure C.3).


C.1.2 Defining a new variable

  1. Ensure that Data window is the active window.





                                  Figure C.2: Select File Dialog




                                 Figure C.3: Display Loaded Data



   2. Define the new variable by clicking on the New Variable button (see Figure C.4).

   3. You will be required to type in the name of the new variable. Type it in and click OK (see
      Figure C.5).

   4. A new column will be added to the spreadsheet with the new variable as header (see Figure
      C.6).

   5. If you want to define several new variables, repeat from step 2 as necessary.








                    Figure C.4: Define New Variable




                  Figure C.5: Enter New Variable Name




                   Figure C.6: Display New Variable






C.1.3 Editing an existing variable

   1. Ensure that Data window is the active window.

   2. Click on the Edit Variable button (see Figure C.7).




                                      Figure C.7: Edit Variable



   3. A dialog will be displayed. Select the variable to edit and go on (see Figure C.8).




                              Figure C.8: Select Variable to Be Edited



   4. A new dialog will be shown and you will be required to type in the new name of the variable.
      Type it in and the variable will be stored with the new name (see Figure C.9).






                                   Figure C.9: Enter New Name



C.1.4 Deleting an existing variable

   1. Ensure that Data window is the active window.

   2. Click on the Delete Variable button and a dialog will be displayed.

   3. Select the variable to delete and go on. A confirmation dialog will be shown. Confirm that it is
      the variable to be deleted, and the variable and its data will be removed from the application
      (see Figure C.10).




                                     Figure C.10: Confirmation




C.1.5 Typing in a new data row

   1. Ensure that Data window is the active window.

   2. Click on the New Row button. If any variables have been defined previously, a row will be added
      to the spreadsheet with as many columns as there are defined variables (see Figure C.11).

   3. Double-click on the cell to edit and enter the new value. When you finish, press Enter (see
      Figure C.12).

   4. Repeat steps 2 and 3 as necessary.






                                   Figure C.11: New Row data




                                     Figure C.12: Type Data



C.1.6 Deleting an existing data row

   1. Ensure that Data window is the active window.

   2. Select the data row or rows to be deleted. Then click on the Delete Row button. A confirmation
      dialog will be displayed.

   3. Confirm and all data in those rows will be removed.






C.1.7 Modifying existing data

   1. Ensure that Data window is the active window.

   2. Select the data cell to be modified and double-click on it. You will be able to edit the cell
      value. When you finish, press Enter.


C.2 Configuration

C.2.1 Setting the Look&Feel

   1. Select the Look&Feel item in the Configuration element of the menu bar (see Figure C.13).




                                Figure C.13: Look And Feel Menu



   2. Select the Look&Feel style you want. The available options are: Metal (Java style), CDE/Motif
      (Unix/Linux style), Windows and Windows Classic (see Figure C.14).




                                Figure C.14: Look And Feel Styles



   3. When you have selected your option (for instance, CDE/Motif), the application appearance will
      be modified (see Figure C.15).


C.2.2 Selecting the type of user

   1. Select the Type Of User item in the Configuration element of the menu bar (see Figure C.16).

   2. A dialog will be displayed. Select the type of user you are and accept (see Figure C.17). This
      will be useful to define prior information in Bayesian regression.






                  Figure C.15: New Look And Feel




                  Figure C.16: Type Of User Menu




                  Figure C.17: Select Type Of User






C.3 Non Symbolic Regression

C.3.1 Simple Classical Regression

   1. Select the Classical menu in the Non-Symbolic Regression element of the menu bar. Then,
      select Simple Regression (see Figure C.18).




                       Figure C.18: Non-Symbolic Classical Regression Menu



   2. You will be required to select the independent and dependent variables from the defined vari-
      ables. Select them and go on (see Figure C.19).




                  Figure C.19: Select Non-Symbolic Variables in Simple Regression



   3. A brief report will be displayed in the Classical Simple Regression window, indicating that
      more details are available through the Analysis Options in the ToolBar (see Figure C.20).

   4. From this point, you can:

        (a) Change dependent and independent variables in the Variables Options, by selecting them
            again as it was done before.






                                     Figure C.20: Brief Report



        (b) Select tests and analyses in the Analysis Options by clicking on the desired analysis
            options. The available analysis options are shown in Figure C.21.




            Figure C.21: Analysis Options in Non-Symbolic Classical Simple Regression


            To make new predictions, you will have to select the predict option, introduce the new
            observed value and press OK (see Figure C.22).
        (c) Select graphics in the Graphics Options by clicking on the desired graphics options. The
            available graphics options are shown in Figure C.23.
        (d) Save some results in the Save Options by clicking on the desired save options and
            selecting the file where they are to be saved. The available save options are shown in
            Figure C.24.








             Figure C.22: New Prediction in Non-Symbolic Classical Simple Regression




            Figure C.23: Graphics options in Non-Symbolic Classical Simple Regression


C.3.2 Multiple Classical Regression

   1. Select the Classical menu in the Non-Symbolic Regression element of the menu bar. Then,
      select Multiple Regression (see Figure C.25).

   2. You will be required to select the dependent and independent variables from the defined vari-
      ables. Select them and go on (see Figure C.26).





              Figure C.24: Save options in Non-Symbolic Classical Simple Regression




                  Figure C.25: Non-Symbolic Classical Multiple Regression Menu




            Figure C.26: Select Variables in Non-Symbolic Classical Multiple Regression



   3. From this point a new Multiple Classical Regression window is created, and the procedure is
      similar to that described in Simple Classical Regression. The reader is therefore referred to
      that section to see how to select variable, analysis, graphics and save options.

        (a) Available Analysis Options can be seen in Figure C.27.
            There are two new analysis options: backward and forward selection. These will let you






           Figure C.27: Analysis options in Non-Symbolic Classical Multiple Regression


            identify the independent variables that really influence the dependent variable.
        (b) Available Graphics Options are shown in Figure C.28.




           Figure C.28: Graphics options in Non-Symbolic Classical Multiple Regression


        (c) Available Save Options can be seen in Figure C.29.

   4. You will be able to select if there is an intercept in the model or not by clicking on the Model
      option (see Figure C.30).








              Figure C.29: Save options in Non-Symbolic Classical Multiple Regression




                  Figure C.30: Intercept in Non-Symbolic Classical Multiple Regression



C.3.3 Simple Bayesian Regression

   1. Select the Bayesian menu in the Non-Symbolic Regression element of the menu bar. Then,
      select Simple Regression (see Figure C.31).




                     Figure C.31: Non-Symbolic Bayesian Simple Regression Menu



   2. You will be required to select the dependent and independent variables from the defined vari-
      ables as it was done in Simple Classical Regression. Select them and go on (see Figure C.32).

   3. A new Bayesian Simple Regression window will be created. The estimated mean and standard
      deviation of the parameters will be displayed, as well as the 95% highest posterior density
      interval and the numerical standard error (a sketch of the underlying computation in R is
      given at the end of this section).

   4. You will be able to select variable, analysis, graphics and save options as it was done in Simple






             Figure C.32: Select Variables in Non-Symbolic Bayesian Simple Regression



      Classical Regression, although for Bayesian regression, these options are more limited. How-
      ever, the procedure is the same.

        (a) Available Analysis Options are shown in Figure C.33.




            Figure C.33: Analysis Options in Non-Symbolic Bayesian Simple Regression


        (b) Available Graphics Options can be seen in Figure C.34.
        (c) Available Save Options are shown in Figure C.35.


   5. In Bayesian regression, new options are available in the ToolBar:

        (a) Specifying Prior Information, by clicking on the Prior Information item in the ToolBar. A
            new input dialog will be displayed, where you will be able to specify prior information.
            If you have selected Experienced User in the Type Of User option in the Configuration
            menu, you will see a dialog like that shown in Figure C.38.





            Figure C.34: Graphics Options in Non-Symbolic Bayesian Simple Regression




              Figure C.35: Save Options in Non-Symbolic Bayesian Simple Regression




Figure C.36: Prior Experienced Specification Options in Non-Symbolic Bayesian Simple Regression






            Otherwise, you will see the one shown in Figure C.37.




   Figure C.37: Prior Inexperienced Specification in Non-Symbolic Bayesian Simple Regression


        (b) Selecting the Bayesian regression model, by clicking on the Model option in the ToolBar.




   Figure C.38: Prior Experienced Specification in Non-Symbolic Bayesian Simple Regression
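
    For reference, the quantities reported in this window are the kind of posterior summaries produced by the bayesm package on which BARESIMDA relies (see Appendix D for its installation). The following is a minimal, illustrative sketch on synthetic data, not the application's actual code; runireg fits a univariate regression model by sampling from the posterior under a conjugate prior:

    # Illustrative Bayesian simple regression with bayesm::runireg
    library(bayesm)
    set.seed(1)
    n <- 100
    x <- rnorm(n)
    y <- 1 + 2 * x + rnorm(n)                 # synthetic data
    X <- cbind(1, x)                          # intercept plus regressor
    out <- runireg(Data = list(y = y, X = X), Mcmc = list(R = 2000))
    colMeans(out$betadraw)                    # posterior means of the coefficients
    apply(out$betadraw, 2, sd)                # posterior standard deviations
    apply(out$betadraw, 2, quantile, probs = c(0.025, 0.975))

    The quantile-based intervals above are equal-tailed rather than highest posterior density intervals, but they play the same role for illustration.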



C.3.4 Multiple Bayesian Regression

   1. Select the Bayesian menu in the Non-Symbolic Regression element of the menu bar. Then,
      select Multiple Regression (see Figure C.39).




                  Figure C.39: Non-Symbolic Bayesian Multiple Regression menu



   2. You will be required to select the dependent and independent variables from the defined vari-
      ables as it was done in Multiple Classical Regression. Select them and go on.

   3. A new Bayesian Multiple Regression window will be created. From this point the procedure is
      the same as in Bayesian Simple Regression.

        (a) Analysis Options are shown in Figure C.40.





           Figure C.40: Analysis Options in Non-Symbolic Bayesian Multiple Regression


        (b) Graphics options can be seen in Figure C.41.




           Figure C.41: Graphics Options in Non-Symbolic Bayesian Multiple Regression


        (c) Save Options are shown in Figure C.42.




             Figure C.42: Save Options in Non-Symbolic Bayesian Multiple Regression


        (d) Model Options are those shown in Figure C.43.


C.4 Symbolic Regression

C.4.1 Simple Classical Regression

   1. Select the Classical menu in the Symbolic Regression element of the menu bar. Then, select
      Simple Regression (see Figure C.44).





            Figure C.43: Model Options in Non-Symbolic Bayesian Multiple Regression




                        Figure C.44: Symbolic Classical Simple Regression Menu



   2. You will be required to select the minimum and maximum dependent and minimum and max-
      imum independent variables from the defined variables. Select them and go on (see Figure
      C.45).




                  Figure C.45: Select Variables in Symbolic Classical Simple Regression



   3. A brief report will be displayed for the midpoints and radii analyses. This is very similar to the
      Non-Symbolic Regression case, but now you will have one analysis for the midpoints and another
      one for the radii (see the sketch after this list). In this case, there are more graphics options.

        (a) Analysis Options are shown in Figure C.46.
        (b) Graphics Options can be seen in Figure C.47.






                  Figure C.46: Analysis Options in Symbolic Classical Simple Regression




                  Figure C.47: Graphics Options in Symbolic Classical Simple Regression


        (c) Save Options are the same as in Non-Symbolic Regression.
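
    Conceptually, the centre-and-radius approach underlying these analyses fits one regression to the interval midpoints and another to the interval radii. A minimal sketch of the idea in R, with hypothetical variable names for the interval bounds (this is not the application's actual code):

    # Centre-and-radius idea: separate regressions for midpoints and radii
    xmin <- c(1, 2, 3, 4, 5); xmax <- c(2, 4, 5, 7, 8)   # hypothetical bounds
    ymin <- c(0, 1, 2, 3, 3); ymax <- c(1, 3, 4, 6, 7)
    xmid <- (xmin + xmax) / 2;  xrad <- (xmax - xmin) / 2
    ymid <- (ymin + ymax) / 2;  yrad <- (ymax - ymin) / 2
    fit.mid <- lm(ymid ~ xmid)  # model for the interval midpoints
    fit.rad <- lm(yrad ~ xrad)  # model for the interval radii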





C.4.2 Multiple Classical Regression

   1. Select the Classical menu in the Symbolic Regression element of the menu bar. Then, select
      Multiple Regression (see Figure C.48).




                        Figure C.48: Symbolic Classical Multiple Regression Menu



   2. You will be required to select the minimum and maximum dependent and minimum and max-
      imum independent variables from the defined variables (see Figure C.49). Ensure that the first
      maximum independent variable selected is the adequate one for the minimum independent
      variable chosen.




                  Figure C.49: Select Variables in Symbolic Classical Multiple Regression



   3. A brief report will be displayed for the midpoints and radii analyses. This is very similar to the
      Non-Symbolic Regression case, but now you will have one analysis for the midpoints and another
      one for the radii. In this case, there are more graphics options.

        (a) Analysis Options are shown in Figure C.50.
        (b) Graphics Options can be seen in Figure C.51.
        (c) Save Options are the same as in Non-Symbolic Regression.








              Figure C.50: Analysis Options in Symbolic Classical Multiple Regression




              Figure C.51: Graphics Options in Symbolic Classical Multiple Regression


C.4.3 Simple Bayesian Regression

   1. Select the Bayesian menu in the Symbolic Regression element of the menu bar. Then, select
      Simple Regression (see Figure C.52).

   2. You will be required to select the minimum and maximum dependent and minimum and maxi-
      mum independent variables from the defined variables (see Figure C.53).





                           Figure C.52: Symbolic Bayesian Simple Regression




                  Figure C.53: Select Variables in Symbolic Bayesian Simple Regression



   3. A new Bayesian Simple Regression window will be created. The estimated mean and standard
      deviation of the midpoints and radii parameters will be displayed, as well as the 95% highest
      posterior density interval and the numerical standard error.

   4. You will be able to select variable, analysis, graphics and save options as it was done in Non-
      Symbolic Regression.

        (a) Available Analysis Options are shown in Figure C.54.




                  Figure C.54: Analysis Options in Symbolic Bayesian Simple Regression


        (b) Available Graphics Options can be seen in Figure C.55.





              Figure C.55: Graphics Options in Symbolic Bayesian Simple Regression


        (c) Save Options are the same as in Non-Symbolic Regression.

   5. As in Non-Symbolic Regression, in Bayesian analysis new options are available in the ToolBar:

        (a) Specifying Midpoints and Radii Prior Information, by clicking on the Midpoints or Radii
            Prior Information item in the ToolBar. A new input dialog will be displayed, where you
            will be able to specify prior information.
        (b) Selecting the Bayesian regression model, by clicking on the Model option in the ToolBar
            (see Figure C.56).


C.4.4 Multiple Bayesian Regression

   1. Select the Bayesian menu in the Symbolic Regression element of the menu bar. Then, select
      Multiple Regression (see Figure C.57).

   2. You will be required to select the minimum and maximum dependent and minimum and max-
      imum independent variables from the defined variables (see Figure C.58). Ensure that the first





                  Figure C.56: Model Options in Symbolic Bayesian Simple Regression




                      Figure C.57: Symbolic Bayesian Multiple Regression Menu



      maximum independent variable selected is the adequate one for the minimum independent
      variable chosen.




                Figure C.58: Select Variables in Symbolic Bayesian Multiple Regression



   3. A new Bayesian Multiple Regression window will be created. The estimated mean and standard
      deviation of the midpoints and radii parameters will be displayed, as well as the 95% highest
      posterior density interval and the numerical standard error.

   4. You will be able to select variable, analysis, graphics and save options as it was done in Non-
      Symbolic Regression.

        (a) Analysis Options are the same as in Non-Symbolic Regression.



        (b) Graphics Options are shown in Figure C.59.




              Figure C.59: Graphics Options in Symbolic Bayesian Multiple Regression


        (c) Save Options are the same as in Non-Symbolic Regression.

   5. As in Non-Symbolic Regression, in Bayesian analysis new options are available in the ToolBar:

        (a) Specifying Midpoints and Radii Prior Information, by clicking on the Midpoints or Radii
            Prior Information item in the ToolBar. A new input dialog will be displayed, where you
            will be able to specify prior information.
        (b) Selecting the Bayesian regression model, by clicking on the Model option in the ToolBar.




Appendix D

Obtaining and Installing R

The way to obtain R is to download it from one of the CRAN (Comprehensive R Archive Network) sites. The main site is http://cran.r-project.org. It has a number of mirror sites worldwide, which may be closer to you and give faster download times.


   Installation details tend to vary over time, so you should read the accompanying documents and
any other information offered on CRAN.


D.1 Binary distributions

The version for recent variants of Microsoft Windows comes as a single SetupR.exe file, on which you simply double-click and then follow the on-screen instructions. When the process is completed, you will have an entry under Programs on the Start menu for invoking R, as well as a desktop icon.


   For Linux distributions that use the RPM package format (RedHat, Mandrake, LinuxRPC and SuSE) and also for Alpha Unix (OSF/Tru64), .rpm files of R and the recommended add-on packages can be installed using the rpm command. Packages for the Debian APT package manager are also available.


   For the Macintosh platforms there are two different binary distributions: the "Carbon" R and the "Darwin" R. The first version is intended to run natively on MacOS systems from 8.6 to OS X, and the second one as a usual Unix command under OS X. The Darwin R also requires an X window manager such as XDarwin to use the X11 graphics device.






    Carbon R comes in a single .sit archive file that you simply decompress by dragging the file onto Stuffit Expander, and move the resulting folder rmxyz into your favourite applications folder. The Darwin version is a .tgz archive, which can be installed, after decompression, with some (fairly trivial) manual adjustments.


    Darwin R can also be installed using "fink". Fink installs all dynamic libraries that might be needed, and it can update R to newer versions when available.


D.2 Installation from source

Installation from source code is possible on all supported platforms, although nontrivial on Macintosh
and Windows, mainly because the build environment is not part of the system. On Unix-like systems
(Macintosh OS X included), the process can be as simple as unpacking the sources and writing


    ./configure


    make


    make install


    and then you would unpack the recommended package bundle, change to its directory and enter


    R CMD INSTALL *.tar.gz


    The above works on widely used platforms, provided that the relevant compilers and support li-
braries are installed. If your system is more esoteric or you want to use special compilers or libraries,
then you may need to dig deeper.


    For Windows and Carbon Macintosh, the directories src/gnuwin32 and src/macintosh have an INSTALL file with detailed information about the procedure to follow.






D.3 Package installation

To install R packages such as bayesm under Unix/Linux or Windows, you can connect to the Internet,
start R, and enter


    install.packages("bayesm", .libPaths()[1])


    The Windows version provides a convenient menu interface for the operation.


    If your R machine is not connected to the Internet, you can also download the package as a file and install that. For Windows and the Carbon version of Macintosh, you need to get the binary package (.zip or .sit extension). For Windows, installation from a local .zip file is possible via a menu entry. For Macintosh users, the procedure is described in the Macintosh FAQ. For Unix and Linux, you can issue the following at the shell prompt (the -l option allows you to give a private library):


    R CMD INSTALL bayesm


    On Unix and Linux systems you will need superuser permissions to install. Otherwise you can set up a private library directory and install into that. Use the R_LIBS environment variable to have your private library searched subsequently. A similar issue arises if R is installed on a read-only file system in a Windows environment. Further details can be found in the help page for library.
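
    The same can also be done from within R without touching environment variables; a minimal sketch, where the directory name ~/Rlib is illustrative:

    # Installing a package into, and loading it from, a private library
    dir.create("~/Rlib")
    install.packages("bayesm", lib = "~/Rlib")
    library(bayesm, lib.loc = "~/Rlib")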


    Information and further Internet resources for R can be obtained from CRAN and the R homepage at http://www.r-project.org. Notice in particular the mailing lists, the user-contributed documents and the FAQs.




Appendix E

Obtaining and installing Java Runtime
Environment

The way to obtain the Java Runtime Environment (JRE) is to download it from Sun Microsystems' official site. The main site is http://java.sun.com, from where you can select the version to be downloaded. The link to download the current version, which is J2SE v1.4.2_14 JRE, is http://java.sun.com/j2se/1.4.2/download.html.


E.1 Microsoft Windows

You must have administrative permissions in order to install the Java 2 Runtime Environment on Mi-
crosoft Windows 2000 and XP. The download page provides the following two choices of installation.
Continue based on your choice.



   1. Windows Installation - After clicking the "Download" link for the JRE, a dialog box pops up.
      Choose the open option to start a small program which then prompts you for more information
      about what you want to install.

   2. Windows Offline Installation - After clicking the JRE "Download" link for the "Windows Of-
      fline Installation", a dialog box pops up. Choose the save option to save the downloaded file
      without installing it. Run this file by double-clicking on the installer's icon. Then follow the
      instructions the installer provides. When done with the installation, you can delete the
      downloaded file to recover disk space.




E.2 Linux

Java 2 Runtime Environment 1.4.2 is available in two installation formats:

   1. Self-extracting Binary File - This file can be used to install the Java 2 Runtime Environment in
      a location chosen by the user. It can be installed by anyone (not only root users), and
      it can easily be installed in any location. As long as you are not the root user, it cannot displace
      the system version of the Java platform supplied by Linux. To use this file, see Installation of
      Self-Extracting Binary below.

   2. RPM Packages - An rpm.bin file which contains RPM packages, installed with the rpm utility.
      It requires root access to install, and installs by default in a location that replaces the system
      version of the Java platform supplied by Linux. To use this bundle, see Installation of RPM
      File below.

    Choose the install format that is most suitable to your needs.


E.2.1 Installation of Self-Extracting Binary

Use these instructions if you want to use the self-extracting binary file to install the Java 2 Runtime
Environment. If you want to install RPM packages instead, see Installation of RPM File.

   1. Download and check the download file size to ensure that you have downloaded the full, uncor-
      rupted software bundle. You can download to any directory you choose; it does not have to be
      the directory where you want to install the Java 2 Runtime Environment. Before you download
      the file, notice its byte size provided on the download page on the web site. Once the download
      has completed, compare that file size to the size of the downloaded file to make sure they are
      equal.

   2. Make sure that execute permissions are set on the self-extracting binary. Run this command:
      chmod +x j2re-1_4_2_14-linux-i586.bin.

   3. Change directory to the location where you would like the files to be installed. The next step
      installs the Java 2 Runtime Environment into the current directory.

   4. Run the self-extracting binary. Execute the downloaded file, prepended by the path to it. For
      example, if the file is in the current directory, prepend it with "./" (necessary if "." is not in the





      PATH environment variable):


      ./j2re-1_4_2_14-linux-i586.bin


      The binary code license is displayed, and you are prompted to agree to its terms. The Java
      2 Runtime Environment files are installed in a directory called j2re1.4.2_14 in the current
      directory.


E.2.2 Installation of RPM File

Use these instructions if you want to install Java 2 Runtime Environment in the form of RPM pack-
ages. If you want to use the self-extracting binary file instead, see Installation of Self-Extracting
Binary.



   1. Download and check the file size. You can download to any directory you choose. Before you
      download the file, notice its byte size provided on the download page on the web site. Once the
      download has completed, compare that file size to the size of the downloaded file to make sure
      they are equal.

   2. Extract the contents of the downloaded file. Change directory to where the downloaded file is
      located and run these commands to first set the executable permissions and then run the binary
      to extract the RPM file:


      chmod a+x j2re-1_4_2_14-linux-i586-rpm.bin


      ./j2re-1_4_2_14-linux-i586-rpm.bin


      Note that the initial "./" is required if you do not have "." in your PATH environment variable.

    The script displays a binary license agreement, which you are asked to agree to before installation
can proceed. Once you have agreed to the license, the install script creates the file
j2re-1_4_2_14-linux-i586.rpm in the current directory.

   1. Become root by running the su command and entering the super-user password.



   2. Run the rpm command to install the packages that comprise the Java 2 Runtime Environment:


      rpm -iv j2re-1_4_2_14-linux-i586.rpm



   3. Delete the .bin and .rpm files if you want to save disk space.

   4. Exit the root shell.


E.3 UNIX

   1. Check the download file size. You can download to any directory you choose; it does not have
      to be the directory where you want to install the J2RE. Before you download the file, notice its
      byte size provided on the download page on the web site. Once the download has completed,
      compare that file size to the size of the downloaded file to make sure they are equal.

   2. Make sure that execute permissions are set on the self-extracting binary:


      On SPARC processors: chmod +x j2re-1_4_2_14-solaris-sparc.sh


      On x86 processors: chmod +x j2re-1_4_2_14-solaris-i586.sh

   3. Change directory to the location where you would like the files to be installed. The next step
      installs the J2RE into the current directory.

   4. Run the self-extracting binary. Execute the downloaded file, prepending the path to it. For
      example, if the downloaded file is in the current directory, prepend it with "./":


      On SPARC processors: ./j2re-1_4_2_14-solaris-sparc.sh


      On x86 processors: ./j2re-1_4_2_14-solaris-i586.sh


      The binary code license is displayed, and you are prompted to agree to its terms. The J2RE
      files are installed in a directory called j2re1.4.2_14 in the current directory.





    More information about the installation process on different operating systems can be found at the Sun Microsystems official site mentioned above.




                                                       156
Bibliography

[Aitk97] Aitkin, M., The calibration of P-values, posterior Bayes factors and the AIC from the pos-
     terior distribution of the likelihood, Statistics and Computing 7 (4), 253-261. 1997

[Arro06] Arroyo, J. and Mat´ , C., Introducing interval time series: accuracy measures, COMPSTAT,
                           e
     Rome 2006.

[Berg05] Berg, B.A., Introduction to Markov Chain Monte Carlo Simulations and their Statistical
     Analysis, NATIONAL UNIVERSITY OF SINGAPORE 7. 2005

[Berg98] Berger, J. and Pericchi, L., Accurate and stable Bayesian model selection: the median
     intrinsic Bayes Factor, The Indian Journal Of Statistics 60 (1), 1-18. 1998

[Bill00] Billard, L. and Diday, E., Regression Analysis for Interval-Valued Data, Data Analysis,
     Classification and Related Methods: Proceedings of the Seventh Conference of the International
     Federation of Classification Societies, Namur, Belgium 2000.

[Bill02] Billard, L. and Diday, E., From the Statistics of Data to the Statistics of Knowledge: Sym-
     bolic Data Analysis, Journal of the American Statistical Association 98 (462), 470-487. 2002.

[Bill06a] Billard, L. and Diday, E., Symbolic Data Analysis: Conceptual Statistics and Data Mining,
     Wiley ,England 2006.

[Bill06b] Billard, L. and Diday, E., Symbolic Data Analysis: what is it?, COMPSTAT, Rome 2006.

[Cham83] Chambers, J.M., Cleveland, W.S., Kleiner, B. and Tukey, P.A., Graphics Methods for Data
     Analysis, Wadsworth, 1983.

[Cham92] Chambers, J.M. and Hastie, T.J., Statistical Models in S, Hall/CRC, 1992.

[Chen00] Chen, M., Shao, Q. and Ibrahim, J.G., Monte Carlo Methods in Bayesian Computation,
     Springer, New York 2000.

                                                157
[Chen03] Cheng, R. and Sahu, S., A fast distance based approach for determining the number of
     components in mixtures, Canadian Journal of Statistics 31, 3-22, 2003.

[Cong06] Congdon, P., Bayesian Statistical Modelling, Wiley, England 2006.

[Dalg02] Dalgaard, P., Introductory Statistics with R, Springer, New York 2002.

[DeCa04] De Carvalho, F.A.T., Freire, E.S. and Lima Neto, E.A. A New Method to Fit a Linear
     Regression Model for Interval-Valued Data, KI 2004: Advances in Artificial Intelligence: 27th
     Annual German Conference in AI, 295-306, Springer, Ulm, Germany, 2004.

[DeCa05] De Carvalho, F.A.T., Freire, E.S. and Lima Neto, E.A. Applying Constrained Linear Re-
     gression Models to Predict Interval-Valued Data , KI 2005: Advances in Artificial Intelligence
     3698, 92-106, Springer, Koblenz, Germany 2005.

[DeCa07] De Carvalho, F.A.T. and Lima Neto, E.A., Centre and Range method for fitting a linear
     regression model to symbolic interval data, Computational Statistics and Data Analysis, 2007.

[Dida95] Diday, E., Probabilist, Possibilist and Belief Objects for Knowledge Analysis, Annals of
     Operations Research, 55, 227-276, 1995.

[Gelf90] Gelfand, A.E. and Smith, A.F.M., Sampling-based approaches to calculating marginal den-
     sities, Journal of the American Statistical Association 85, 398-409, 1990.

[Gelm04] Gelman, A., Carlin, J.B., Stern, H.S. and Rubin, D.B., Bayesian Data Analysis, Hall/CRC,
     Boca Raton, Florida 2004.

[Gilk95] Gilks, W.R., Best, N. and Tan, K.K.C., Adaptive rejection Metropolis sampling within Gibbs
     sampling, Applied Statistics 44, 455-472, 1995.

[Gosh03] Gosh, J.K. and Ramamoorthi, R.V., Bayesian Nonparametrics, Spriger, New York 2003.

[Hast70] Hastings, W.K., Monte Carlo sampling methods using Markov chains and their applica-
     tions, Biometrika 57, 97-109, 1970.

[Huiw06] Huiwen, W., Mok, H.M.K. and Dapeng, L., Factor interval data analysis and its applica-
     tion, COMPSTAT, Rome 2006.

[Irpi05] Irpino, A., ”Spaghetti” PCA analysis: An extension of principal component analysis to time
     dependent interval data, Pattern Recognition Letters, 2005.


                                                158
[Jeff61] Jeffreys, H., Theory of Probability, Oxford University Press, 1961.

[Kend05] Kendall, W. S., Liang, F. and Wang, J-S., Markov chain Monte Carlo: Innovations and
     Applications, National University of Singapore 7, 2005.

[Koop03] Koop, G., Bayesian Econometrics, Wiley, England 2003.

[Laws74] Lawson, C.l. and Hanson, R.J., Solving Least Squares Problem, Prentice-Hall, New York
     1974.

[Lee 06] Lee, C-H.L., Liu, A. and Chen, W-S., Pattern Discovery of Fuzzy Time Series for Financial
     Prediction, IEEE 18, (5), 2006.

[Mart01] Martinez, W.L. and Martinez, A.R., Computational Statistics Handbook with MATLAB ,
     Hall/CRC, Boca Raton, Florida 2001.

    e        e                   ´
[Mat´ 93] Mat´ , C. and Sarabia, A., Problemas de Probabilidad y Estad´stica, CLAGSA, Madrid
                                                                      ı
     1993.

[Mat´ 95] Mat´ , C., Curso General sobre StatGraphics II, Universidad Pontifica Comillas, Madrid
    e        e
     1995.

[Mat´ 06] Mat´ , C., An´ lisis Bayesiano de Datos, Asociaci´ n Espaola para la Calidad, Madrid 2006.
    e        e         a                                   o

[Metr53] Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H. and Teller, E., Equation
     of state calculations by fast computing machines, Journal of Chemical Physics 21, 1087-1092,
     1953.

[Mont02] Montgomery, D.C. and Runger, G.C., Probabilidad y Estad´stica Aplicadas a la Inge-
                                                                ı
     nier´a, Wiley, 2002.
         ı

[Mull04] Muller, P. and Quintana, F.A., Nonparametric Bayesian Data Analysis, Statistical Science
     19, 95-110, 2004.

[Poir95] Poirier, D., Intermediate Statistics and Econometrics: A Comparative Approach., The MIT
     Press, Cambridge 1995.

[Rossi06] Rossi, P.E., Allenby, G. and McCulloch, R., Bayesian Statistics and Marketing, Wiley,
     New York 2006.



                                                159
[Rupp04] Rupp, A.A., Dey, D.K. and Zumbo, B.D., To Bayes or Not to Bayes, From Whether to
     When: Applications of Bayesian Methodology to Modeling, Structural Equation Modeling: A
     Multidisciplinary Journal 11 (3), 424-451. 2004.

[Spie03] Spiegelhalter, D., Thomas, A., Best, N., Gilks, W. and Lunn, D., BUGS: Bayesian inference
     using Gibbs sampling, 2003.

[Urba92] Urbach, P., Regression Analysis: Classical and Bayesian , The British Journal for the Phi-
     losophy of Science 43 (3), 311-342, 1992.

[Vena02] Venables, W.N. and Ripley, B.D., Modern Applied Statistics with S, Springer, New York
     2002.

[West04] West, R.W., Wu,T. and Heydt, D., An introduction to StatCrunch 3.0, Journal of Statistical
     Software 9 (6), 2004.

[Zamo01] Zamora, MM. and Estavillo, J., Modelo de regresi´ n normal cl´ sico, 2001.
                                                         o            a




                                                 160

More Related Content

PDF
Rent To Own
PDF
Daily news later of equity market by marketmagnify
PDF
Equity Market News Letter for Trading
PDF
Marketing Campaign focused on Loyalty
PDF
Today's Equity News Letter By Marketmagnify
PDF
Beachbody Coach
PDF
National AIDS Program under UHC
PDF
64 ginga brasil
Rent To Own
Daily news later of equity market by marketmagnify
Equity Market News Letter for Trading
Marketing Campaign focused on Loyalty
Today's Equity News Letter By Marketmagnify
Beachbody Coach
National AIDS Program under UHC
64 ginga brasil

Similar to Bayesian Regression System for Interval-valued data (20)

PDF
As Analytics Subsumes O.R., will INFORMS Subsume Analytics?
DOCX
Data mining BY Zubair Yaseen
PDF
850 keynote siegel
PPTX
What is A/B-testing? An Introduction
PDF
DMBAR - Data Mining and Business Analytics with R - Johannes Ledolter.pdf
PDF
What Is Statistics
PDF
Data mining in support of fraud management
PDF
Data Mining In Support Of Fraud Management
PPT
1. research intro.
PDF
Applied Statistical Inference with MINITAB Sally Lesik
PDF
Applied Statistical Inference with MINITAB Sally Lesik
PPT
BRM Consolidated.ppt BRM Consolidated.ppt
PPT
Paradigm shifts in wildlife and biodiversity management through machine learning
PDF
Brm unit i - cheet sheet
PDF
Anderson%20and%20 gerbing%201988
PDF
Time Series Analysis
PDF
BRM.pdf
PDF
Data Anayltics: How to predict anything
PDF
Analytics: The widening divide
DOCX
internal Assign no 206 ( JAIPUR NATIONAL UNI)
As Analytics Subsumes O.R., will INFORMS Subsume Analytics?
Data mining BY Zubair Yaseen
850 keynote siegel
What is A/B-testing? An Introduction
DMBAR - Data Mining and Business Analytics with R - Johannes Ledolter.pdf
What Is Statistics
Data mining in support of fraud management
Data Mining In Support Of Fraud Management
1. research intro.
Applied Statistical Inference with MINITAB Sally Lesik
Applied Statistical Inference with MINITAB Sally Lesik
BRM Consolidated.ppt BRM Consolidated.ppt
Paradigm shifts in wildlife and biodiversity management through machine learning
Brm unit i - cheet sheet
Anderson%20and%20 gerbing%201988
Time Series Analysis
BRM.pdf
Data Anayltics: How to predict anything
Analytics: The widening divide
internal Assign no 206 ( JAIPUR NATIONAL UNI)
Ad

Bayesian Regression System for Interval-valued data

  • 1. Autorizada la entrega del proyecto del alumno: Rub´ n Salgado Fern´ ndez e a EL DIRECTOR DEL PROYECTO Carlos Mat´ Jim´ nez e e Fdo.: Fecha: 12/06/2007 Vo Bo DEL COORDINADOR DE PROYECTOS Claudia Meseguer Velasco Fdo.: Fecha: 12/06/2007
  • 2. UNIVERSIDAD PONTIFICIA DE COMILLAS ESCUELA TECNICA SUPERIOR DE INGENIER´ (ICAI) ´ IA ´ INGENIERO EN ORGANIZACION INDUSTRIAL PROYECTO FIN DE CARRERA Bayesian Regression System for Interval-Valued Data. Application to the Spanish Continuous Stock Market AUTOR : Salgado Fern´ ndez, Rub´ n a e M ADRID , Junio 2007
  • 3. Acknowlegdements Firstly, I would like to thank my director, Carlos Mat´ Jim´ nez, PhD, for giving me the chance of e e making this project. With him, I have learnt, not only about Statistics and investigation, but also about how to enjoy with them. Special thanks to my parents. Their love and all they have taught me in this life are the things what have made possible being the person I am now. Thanks to my brothers, my sister and the rest of my family for their support and for the stolen time. Thanks to Charo for standing my bad mood in the bad moments, for supporting me and for giving me the inspiration to go ahead. Madrid, June 2007 i
  • 4. Resumen ´ En los ultimos a˜ os los m´ todos Bayesianos se han extendido y se han venido utilizando de forma n e exitosa en muchos y variados campos tales como marketing, medicina, ingenier´a, econometr´a o mer- ı ı cados financieros. La principal caracter´stica que hace destacar al an´ lisis Bayesiano de datos (AN- ı a BAD) frente a otras alternativas es que, no s´ lo tiene en cuenta la informaci´ n objetiva procedente de o o los datos del suceso en estudio, sino tambi´ n el conocimiento anterior al mismo. Los beneficios que e se obtienen de este enfoque son m´ ltiples ya que, cuanto mayor sea el conocimiento de la situaci´ n, u o a ´ con mayor fiabilidad se podr´ n tomar las decisiones y estas ser´ n m´ s acertadas. Pero no siempre todo a a han sido ventajas. El ANBAD, hasta hace unos a˜ os, presentaba una serie de dificultades que limita- n ban el desarrollo del mismo a los investigadores. Si bien la metodolog´a Bayesiana existe como tal ı desde hace bastante tiempo, no se ha empezado emplear de manera generalizada hasta los 90’s. Esta expansi´ n ha sido propiciada en gran parte por el avance en el desarrollo computacional y la mejora y o perfeccionamiento de distintos m´ todos de c´ lculo como los m´ todos de cadenas de Markov-Monte e a e Carlo. ı ´ En especial, esta metodolog´a se ha mostrado extraordinariamente util en la aplicaci´ n a los mod- o elos de regresi´ n, ampliamente adoptados. En m´ ltiples ocasiones en la pr´ ctica, se dan situaciones o u a en las que se requiere analizar la relaci´ n entre dos variables cuantitativas. Los dos objetivos fun- o damentales de este an´ lisis ser´ n, por un lado, determinar si dichas variables est´ n asociadas y en a a a qu´ sentido se da dicha asociaci´ n (es decir, si los valores de una de las variables tienden a aumentar e o -o disminuir- al aumentar los valores de la otra); y por otro, estudiar si los valores de una variable pueden ser utilizados para predecir el valor de la otra. Un modelo de regresi´ n trata de proporcionar o informaci´ n sobre uno o varios sucesos a trav´ s de su relaci´ n con el comportamiento de otros. Con o e o la metodolog´a Bayesiana se permite incorporar el conocimiento del investigador al an´ lisis, haciendo ı a los resultados m´ s precisos, ya que no se a´slan los resultados a los datos de una determinada muestra. a ı ii
  • 5. iii a ´ Por otro lado, se est´ empezando a aceptar que el siglo XXI en el ambito de la estad´stica va a ı ser el siglo de la ”estad´stica del conocimiento” a diferencia del anterior que fue el de la ”estad´stica ı ı de los datos”. El concepto b´ sico para construir dicha estad´stica es el de dato simb´ lico y se han a ı o desarrollado m´ todos estad´sticos para algunos tipos de datos simb´ licos. e ı o En la actualidad, la exigencia del mercado, la demanda y, en general, del mundo crece. Esto implica que cada vez sea mayor el deseo de predecir la ocurrencia de un evento o poder controlar el comportamiento de ciertas cantidades con el menor error posible con el fin de ofrecer mejores pro- ductos, obtener mayores beneficios o adelantos cient´ficos y mejores resultados. ı Sobre esta realidad, este proyecto trata de responder a dichas necesidades proporcionando una amplia documentaci´ n sobre varias de las t´ cnicas m´ s utilizadas y m´ s punteras a d´a de hoy, como o e a a ı son el an´ lisis Bayesiano de datos, los modelos de regresi´ n y los datos simb´ licos, y proponiendo a o o diferentes t´ cnicas de regresi´ n. De igual forma se desarrollar´ una herramienta que permita poner e o a en pr´ ctica todos los conocimientos adquiridos. Dicha aplicaci´ n estar´ dirigida al mercado burs´ til a o a a espa˜ ol y permitir´ al usuario utilizarla de manera sencilla y amigable. En cuanto al desarrollo de esta n a herramienta se emplear´ uno de los lenguajes m´ s novedosos y con m´ s proyecci´ n del momento: R. a a a o Se trata, por tanto, de un proyecto que combina las t´ cnicas m´ s novedosas y con mayor proyecci´ n e a o tanto en materia te´ rica, como es la regresi´ n Bayesiana aplicada a datos de tipo intervalo, como en o o materia pr´ ctica, como es el empleo del lenguaje R. a
  • 6. Abstract In the recent years, Bayesian methods have been spread and successfully used in many and several fields such as Marketing, Medicine, Engineering, Econometrics or Financial Markets. The main char- acteristic that makes Bayesian Data Analysis (BADAN) remarkable compared with other alternatives is that not only does it take into account the objective information coming from the analyzed event, but also the pre-event knowledge. The benefits obtained from this approach are innumerable due to the fact that the more knowledge of the situation one has, the more reliable and accurate decisions could be taken. However, although Bayesian methodology was set long time ago, it has not been applied in a general way until the 90’s because of the computational difficulties. Such expansion has been mainly favoured by the advances in that field and the improvement on different calculus meth- ods, such as Markov-chain Monte Carlo methods. Particularly, this Bayesian methodology has been resulted in an extraordinary useful application for the regression models, which have been adopted by large. There are many times in real life in which it is necessary to analyse the situation between two quantitive variables. The two main objec- tives of this analysis would be, on the one hand, to determine whether such variables are associated and in what sense that association comes about (that is, whether the value of one of the variables tends to rise- or to decrease- when augmented the value of the other); and on the other hand, to study whether the values of one variable can be used to predict the value of the other. A regression model offers information about one or more events through their relationship with the behaviour of the oth- ers. With the Bayesian methodology it is possible to add the researcher’s knowledge to the analysis, making thus the results be more accurate due to the fact that the results are not isolated from the data of one determined sample. On the other hand, in the Statistics field, it has been more and more accepted the fact that the XXI century will be the century of the ”Statistics of knowledge” contrary to the last one, which was the iv
  • 7. v one of the ”Statistics of data”. The most basic concept to constitute such Statistics is the symbolic data; furthermore, there have been developed more statistics methods for some types of symbolic data. Nowadays, the requirements of the market, and the demands of the world in general, are growing up. This implies the continuous increase of the desire for predicting the occurrence of an event or for the ability of controlling the behaviour of certain quantities with the minimum error with the aim of offering better products, obtaining more benefits or scientific improvements and better outcomes. Under this frame, this project tries to responds such needs by offering a large documentation about several of the most applied and leading nowadays techniques, such as Bayesian data analysis, regression models, and symbolic data, and suggesting different regression techniques. Similarly, it has been developed a tool that allow the reader to put all the acquired knowledge into practice. Such application will be aimed to the Spanish Continuous Stock Market and it will let the user apply it eas- ily. As far as the development of this tool is concerned, it has been used one of the more innovative and with more projection languages of the moment: R. So, the project is about a combination of the techniques that are most innovative and with the most projection both in theoretical questions such as Bayesian regression applied to interval- valued data and in practical questions such us the employment of the R language.
  • 8. List of Figures 1.1 Project Work Packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2.1 Univariate Normal Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 6.1 Interval time series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 7.1 Classical Regression with single values in training test . . . . . . . . . . . . . . . . 73 7.2 Classical Regression with single values in testing test . . . . . . . . . . . . . . . . . 74 7.3 Classical Regression with interval- valued data . . . . . . . . . . . . . . . . . . . . 75 7.4 Centre Method (2000) in training set . . . . . . . . . . . . . . . . . . . . . . . . . . 75 7.5 Centre Method (2000) in testing set . . . . . . . . . . . . . . . . . . . . . . . . . . 76 7.6 Centre and Radius Method in training set . . . . . . . . . . . . . . . . . . . . . . . 77 7.7 Centre and Radius Method in testing set . . . . . . . . . . . . . . . . . . . . . . . . 78 7.8 Bayesian Centre and Radius Method in testing test . . . . . . . . . . . . . . . . . . 80 7.9 Classical Regression with single values in training test . . . . . . . . . . . . . . . . 81 7.10 Classical Regression with single values in testing test . . . . . . . . . . . . . . . . . 81 7.11 Centre Method (2000) in training set . . . . . . . . . . . . . . . . . . . . . . . . . . 82 7.12 Centre Method (2000) in testing set . . . . . . . . . . . . . . . . . . . . . . . . . . 83 7.13 Centre and Radius Method in training set . . . . . . . . . . . . . . . . . . . . . . . 85 7.14 Centre and Radius Method in testing set . . . . . . . . . . . . . . . . . . . . . . . . 85 7.15 Bayesian Centre and Radius Method in testing set . . . . . . . . . . . . . . . . . . . 87 9.1 BARESIMDA MDI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 10.1 Interface between BARESIMDA and R . . . . . . . . . . . . . . . . . . . . . . . . 104 10.2 Interface between BARESIMDA and Excel . . . . . . . . . . . . . . . . . . . . . . 105 10.3 Logical Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105 vi
  • 9. LIST OF FIGURES vii C.1 Load Data Menu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123 C.2 Select File Dialog . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124 C.3 Display Loaded Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124 C.4 Define New Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125 C.5 Enter New Variable Name . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125 C.6 Display New Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125 C.7 Edit Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126 C.8 Select Variable to Be Editted . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126 C.9 Enter New Name . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127 C.10 Confirmation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127 C.11 New Row data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128 C.12 Type Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128 C.13 Look And Feel Menu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129 C.14 Look And Feel Styles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129 C.15 New Look And Feel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130 C.16 Type Of User Menu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130 C.17 Select Type Of User . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130 C.18 Non-Symbolic Classical Regression Menu . . . . . . . . . . . . . . . . . . . . . . . 131 C.19 Select Non-Symbolic Variables in Simple Regression . . . . . . . . . . . . . . . . . 131 C.20 Brief Report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132 C.21 Analysis Options in Non-Symbolic Classical Simple Regression . . . . . . . . . . . 132 C.22 New Prediction in Non-Symbolic Classical Simple Regression . . . . . . . . . . . . 133 C.23 Graphics options in Non-Symbolic Classical Simple Regression . . . . . . . . . . . 133 C.24 Save options in Non-Symbolic Classical Simple Regression . . . . . . . . . . . . . . 134 C.25 Non-Symbolic Classical Multiple Regression Menu . . . . . . . . . . . . . . . . . . 134 C.26 Select Variables in Non-Symbolic Classical Multiple Regression . . . . . . . . . . . 134 C.27 Analysis options in Non-Symbolic Classical Multiple Regression . . . . . . . . . . . 135 C.28 Graphics options in Non-Symbolic Classical Multiple Regression . . . . . . . . . . 135 C.29 Save options in Non-Symbolic Classical Multiple Regression . . . . . . . . . . . . . 136 C.30 Intercept in Non-Symbolic Classical Multiple Regression . . . . . . . . . . . . . . . 136 C.31 Non-Symbolic Bayesian Simple Regression Menu . . . . . . . . . . . . . . . . . . . 136 C.32 Select Variables in Non-Symbolic Bayesian Simple Regression . . . . . . . . . . . . 137 C.33 Analysis Options in Non-Symbolic Bayesian Simple Regression . . . . . . . . . . . 137 C.34 Graphics Options in Non-Symbolic Bayesian Simple Regression . . . . . . . . . . . 138
C.35 Save Options in Non-Symbolic Bayesian Simple Regression
C.36 Prior Experienced Specification Options in Non-Symbolic Bayesian Simple Regression
C.37 Prior Inexperienced Specification in Non-Symbolic Bayesian Simple Regression
C.38 Prior Experienced Specification in Non-Symbolic Bayesian Simple Regression
C.39 Non-Symbolic Bayesian Multiple Regression Menu
C.40 Analysis Options in Non-Symbolic Bayesian Multiple Regression
C.41 Graphics Options in Non-Symbolic Bayesian Multiple Regression
C.42 Save Options in Non-Symbolic Bayesian Multiple Regression
C.43 Model Options in Non-Symbolic Bayesian Multiple Regression
C.44 Symbolic Classical Simple Regression Menu
C.45 Select Variables in Symbolic Classical Simple Regression
C.46 Analysis Options in Symbolic Classical Simple Regression
C.47 Graphics Options in Symbolic Classical Simple Regression
C.48 Symbolic Classical Multiple Regression Menu
C.49 Select Variables in Symbolic Classical Multiple Regression
C.50 Analysis Options in Symbolic Classical Multiple Regression
C.51 Graphics Options in Symbolic Classical Multiple Regression
C.52 Symbolic Bayesian Simple Regression
C.53 Select Variables in Symbolic Bayesian Simple Regression
C.54 Analysis Options in Symbolic Bayesian Simple Regression
C.55 Graphics Options in Symbolic Bayesian Simple Regression
C.56 Model Options in Symbolic Bayesian Simple Regression
C.57 Symbolic Bayesian Multiple Regression Menu
C.58 Select Variables in Symbolic Bayesian Multiple Regression
C.59 Graphics Options in Symbolic Bayesian Multiple Regression
List of Tables

2.1 Distributions in Bayesian Data Analysis
2.2 Comparison between Univariate and Multivariate Normal
2.3 Conjugate distributions for other likelihood distributions
4.1 Bayes Factor Interpretation
4.2 Sensitivity Summary I
4.3 Sensitivity Summary II
5.1 Multiple and Simple Regression Comparison
5.2 Sensitivity analysis of parameter β
5.3 Sensitivity analysis of parameter σ²
5.4 Classical and Bayesian regression comparison
5.5 Main Prior Distributions Summary
5.6 Main Posterior Distributions Summary
5.7 Prior and Posterior Parameters Summary
5.8 Main Posterior Predictive Distributions Summary
6.1 Multivalued Data Example
6.2 Modal-multivalued Example
7.1 Error Measures for Classical Regression with single values
7.2 Error Measure for Centre Method (2000)
7.3 Error Measure for Centre Method (2002)
7.4 Error Measures for Centre and Radius Method
7.5 Error Measures in Bayesian Centre and Radius Method
7.6 Error Measures for Classical Regression with single values
7.7  Error Measure for Centre Method (2000)
7.8  Error Measure for Centre Method (2002)
7.9  Error Measures for Centre and Radius Method
7.10 Error Measures in Bayesian Centre and Radius Method
11.1 Estimated material costs
11.2 Amortization Costs
11.3 Summarized Budget
Contents

Acknowledgements
Resumen
Abstract
List of Figures
List of Tables
Contents

1 Introduction
  1.1 Project Motivation
  1.2 Objectives
  1.3 Methodology

2 Bayesian Data Analysis
  2.1 What is Bayesian Data Analysis?
  2.2 Bayesian Analysis for Normal and other distributions
    2.2.1 Univariate Normal distribution
    2.2.2 Multivariate Normal distribution
    2.2.3 Other distributions
  2.3 Hierarchical Models
  2.4 Nonparametric Bayesian

3 Posterior Simulation
  3.1 Introduction
  3.2 Markov chains
  3.3 Monte Carlo Integration
  3.4 Gibbs sampler
  3.5 Metropolis-Hastings sampler and its special cases
    3.5.1 Metropolis-Hastings sampler
    3.5.2 Metropolis sampler
    3.5.3 Random-walk sampler
    3.5.4 Independence sampler
  3.6 Importance sampling

4 Sensitivity Analysis
  4.1 Introduction
  4.2 Bayes Factor
  4.3 Alternative Stats to Bayes Factor
  4.4 Highest Posterior Density Intervals
  4.5 Model Comparison Summary

5 Regression Analysis
  5.1 Introduction
  5.2 Classical Regression Model
  5.3 The Bayesian Approach
  5.4 Normal Linear Regression Model subject to inequality constraints
  5.5 Normal Linear Regression Model with Independent Parameters
  5.6 Normal Linear Regression Model with Heteroscedasticity and Correlation
    5.6.1 Heteroscedasticity
    5.6.2 Correlation
  5.7 Models Summary

6 Symbolic Data
  6.1 What is symbolic data analysis?
  6.2 Interval-valued variables
  6.3 Classical regression analysis with Interval-valued data
  6.4 Bayesian regression analysis with Interval-valued data

7 Results
  7.1 Spanish Continuous Stock Market data sets
  7.2 Direct Relation between Variables
  7.3 Uncorrelated Variables

8 A Guide to Statistical Software Today
  8.1 Introduction
  8.2 Commercial Packages
    8.2.1 The SAS System for Statistical Analysis
    8.2.2 Minitab
    8.2.3 BMDP
    8.2.4 SPSS
    8.2.5 S-PLUS
    8.2.6 Others
  8.3 Public License Packages
    8.3.1 R
    8.3.2 BUGS
  8.4 Analysis Packages with Statistical Libraries
    8.4.1 Matlab
    8.4.2 Mathematica
    8.4.3 Others
  8.5 Some General Languages with Statistical Libraries
    8.5.1 Java
    8.5.2 C++
  8.6 Developed Software Tool: BARESIMDA

9 Software Requirements Specification
  9.1 Purpose
  9.2 Intended Audience
  9.3 Functionality
    9.3.1 Classical Regression with crisp data
    9.3.2 Classical Regression with interval-valued data
    9.3.3 Bayesian Regression with crisp data
    9.3.4 Bayesian Regression with interval-valued data
    9.3.5 Data Manipulation
    9.3.6 Portability
    9.3.7 Maintainability
  9.4 External Interfaces
    9.4.1 User Interfaces
    9.4.2 Software Interfaces

10 Software Architecture Study
  10.1 Hardware/Software Architecture
  10.2 Logical Architecture

11 Project Budget
  11.1 Engineering Costs
  11.2 Investment and Elements Costs
    11.2.1 Summarized Budget

12 Conclusions
  12.1 Bayesian Regression applied to Symbolic Data
  12.2 BARESIMDA Software Tool
  12.3 Future Developments
  12.4 Summary

A Probability Distributions
  A.1 Discrete Distributions
    A.1.1 Binomial
    A.1.2 Geometric
    A.1.3 Poisson
  A.2 Continuous Distributions
    A.2.1 Uniform
    A.2.2 Univariate Normal
    A.2.3 Exponential
    A.2.4 Gamma
    A.2.5 Inverse-Gamma
    A.2.6 Chi-square
    A.2.7 Inverse-Chi-square and Inverse-Scaled Chi-square
    A.2.8 Univariate Student-t
    A.2.9 Beta
    A.2.10 Multivariate Normal
    A.2.11 Multivariate Student-t
    A.2.12 Wishart
    A.2.13 Inverse-Wishart

B Installation Guide
  B.1 From source folder
  B.2 From installer

C User's Guide
  C.1 Data Entry
    C.1.1 Loading an Excel file
    C.1.2 Defining a new variable
    C.1.3 Editing an existing variable
    C.1.4 Deleting an existing variable
    C.1.5 Typing in a new data row
    C.1.6 Deleting an existing data row
    C.1.7 Modifying existing data
  C.2 Configuration
    C.2.1 Setting the look & feel
    C.2.2 Selecting the type of user
  C.3 Non-Symbolic Regression
    C.3.1 Simple Classical Regression
    C.3.2 Multiple Classical Regression
    C.3.3 Simple Bayesian Regression
    C.3.4 Multiple Bayesian Regression
  C.4 Symbolic Regression
    C.4.1 Simple Classical Regression
    C.4.2 Multiple Classical Regression
    C.4.3 Simple Bayesian Regression
    C.4.4 Multiple Bayesian Regression

D Obtaining and Installing R
  D.1 Binary distributions
  D.2 Installation from source
  D.3 Package installation

E Obtaining and installing Java Runtime Environment
  E.1 Microsoft Windows
  E.2 Linux
    E.2.1 Installation of Self-Extracting Binary
    E.2.2 Installation of RPM File
  E.3 UNIX

Bibliography
Chapter 1

Introduction

1.1 Project Motivation

Statistics is primarily concerned with the analysis of data, either to assist in arriving at an improved understanding of some underlying mechanism, or as a means for making informed rational decisions. Both these aspects generally involve some degree of uncertainty. The statistician's task is then to explain such uncertainty, and to reduce it to the extent possible. Problems of this type occur throughout the physical, social and other sciences. One way of looking at statistics stems from the perception that, ultimately, probability is the only appropriate way to describe and systematically deal with uncertainty, as if it were the language for the logic of uncertainty. Thus, inference statements are precisely framed as probability statements on the possible values of the unknown quantities of interest (parameters or future observations), conditional on the observed, available data. The scientific discipline based on this understanding is called Bayesian Statistics. Moreover, the increasingly needed and sophisticated models used to describe available data, often hierarchical models, are typically too complex for conventional statistics to handle, but can be tackled within Bayesian Statistics.

In principle, Bayesian Statistics is designed to handle all situations where uncertainty is found. Since some uncertainty is present in most aspects of life, it may be argued that Bayesian Statistics should be appreciated and used by everyone. It is the logic of contemporary society and science. According to [Rupp04], whether to apply Bayesian methodology is no longer under discussion; the question is when it has to be done.

Bayesian methods have matured and improved in several ways during the last fifteen years. They are becoming increasingly attractive to researchers, and successful applications of Bayesian
data analysis have appeared in many different fields, including Actuarial Science, Biometrics, Finance, Market Research, Marketing, Medicine, Engineering and Social Science. It is not only that the Bayesian approach produces appropriate answers to many current important problems; there is also an evident need for it, given the inapplicability of conventional statistics to many of them. Thus, the main feature offered by Bayesian data analysis is the possibility of incorporating the researcher's knowledge about the problem at hand: the more precise the prior knowledge, the better and more reliable the results. But Bayesian Statistics was held back until the mid-1990s by its computational complexity. Since then, it has expanded greatly, favoured by the development and improvement of computational methods in this field, such as Markov chain Monte Carlo.

This methodology has proven extremely useful in its application to regression models, which are widely accepted. Let us remember that the general purpose of regression analysis is to learn more about the relationship between several independent or predictor variables and a dependent or criterion variable. Bayesian methodology lets the researcher incorporate her or his knowledge into the analysis, improving the results since they no longer depend solely on the sampled data.

On the other hand, datasets are increasingly so large that they must be summarized in some fashion, so that the resulting summary dataset is of a more manageable size while still retaining as much of the knowledge inherent to the entire dataset as possible. One consequence of this situation is that data may no longer be formatted as single values, as is the case for classical data, but rather may be represented by lists, intervals, distributions, and the like. These summarized data are examples of symbolic data. This kind of data also lets us better represent the knowledge and beliefs we hold in mind, which are limited and hard to extract with classical Statistics. According to [Bill02], this responds to the current need to move from a Statistics of data in the past century to a Statistics of knowledge in the twenty-first century.

Market and demand requirements are increasing continuously over time. This implies a need for better and more accurate methods to forecast new situations and to control different quantities with the minimum error, in order to supply better products and to obtain higher incomes, scientific advances and better results.

Dealing with this outlook, this project is intended to respond to those requirements by providing a
wide and exhaustive documentation about some of the most used and advanced current techniques, including Bayesian data analysis, regression models and symbolic data. Different examples related to the Spanish Continuous Stock Market are explained throughout this text, making clear the advantages of employing the described methods. Likewise, a software tool with a user-friendly graphical interface has been developed to practise and check all the acquired knowledge.

Therefore, this is a project combining the most recent techniques with major future implications in theoretical issues, as Bayesian regression applied to interval-valued data is, with a technological part dealing with the problem of interconnecting two software programs: one used to show the graphical user interface and the other one employed to make the computations.

Regarding a more personal motivation, when accepting this project, several factors were taken into consideration by the author:

• A great challenge: it is an ambitious project with a high technical complexity related to both its theoretical basis and its technological basis. This represents a very good letter of introduction to the labour market.

• A good time plan: this project was designed to be finished before June 2007, which means being able to finish the degree in June and enter the labour market in September.

• Some very interesting issues: on one hand, it deals with the ever-present need of forecasting and modelling observations and situations in order to get the best possible results. On the other hand, it focuses on the Stock Market, which matches my personal hobbies.

• A new programming language: the possibility of learning in depth a new and relatively recent programming language, such as R, was an extra motivating factor.

• The project director: Carlos Maté is considered a demanding and very competent director by the students of the university.

• An investigation scholarship: the possibility of being in the Industrial Organization department of the University, learning from people such as the director mentioned above and other highly recognized professors, was a great factor.
1.2 Objectives

This project aims to achieve the following objectives:

• To provide a wide and rigorous documentation about the following issues: Bayesian data analysis, regression models and symbolic data. From this point, documentation about Bayesian regression will be developed, as well as the software tool designed.

• To build a software tool in order to fit Bayesian regression models to interval-valued data, finding out the most efficient way to design the graphical user interface. This must be as user-friendly as possible.

• To find out the most efficient way to offer that system to future clients, based on the tests carried out with the application.

• To design a survey to measure the quality of the tool and users' satisfaction.

• If possible, to write an article for a scientific journal.

1.3 Methodology

As the title of the project indicates, the final purpose is the development of an application aimed towards stock markets, based on a Bayesian regression system; therefore, some previous knowledge is required.

The first stage is familiarization with Bayesian data analysis, regression models applied to Bayesian methodology, and symbolic data. Within this phase, Bayesian data analysis will be studied first, trying to synthesize and extract its most important elements. Special dedication will be given to posterior simulation and computational algorithms. Then, regression models will be treated, quickly reviewing the classical approach before delving into the different Bayesian regression models, applying a great part of what was explained in Bayesian methodology. Finally, this first stage will be completed with the application to symbolic data, paying special attention to interval-valued data.

The second stage refers to the development of the software application, employing an incremental methodology for programming and testing iterative prototypes. This methodology has been
considered the most suitable for this project since it lets us introduce successive models into the application.

The following figure shows the structure of the work packages the project is divided into:

Figure 1.1: Project Work Packages
Chapter 2

Bayesian Data Analysis

2.1 What is Bayesian Data Analysis?

Statistics can be defined as the discipline that provides us with a methodology to collect, organize, summarize and analyze a set of data. Data analysis can be divided into two kinds: exploratory data analysis and confirmatory data analysis. The former is used to represent, describe and analyze a set of data through simple methods in the first stages of statistical analysis. The latter is applied to make inferences from data, based on probability models.

In the same way, confirmatory data analysis is divided into two branches depending on the adopted approach. The first one, known as frequentist, makes inferences from the data resulting from a sampling through classical methods. The second branch, known as Bayesian, goes further in the analysis and adds to those data the prior knowledge the researcher has about the problem being treated. Since it is not worthwhile to explain the frequentist approach in full here, a more extended review of the different classical methods related to it can be found in [Mont02].

Data Analysis
  • Exploratory
  • Confirmatory
      - Frequentist
      - Bayesian
As far as Bayesian analysis is concerned, and according to [Gelm04], the process can be divided into the following three steps:

• To set up a full probability model, through a joint probability distribution for all observable and unobservable quantities in the problem.

• To condition on observed data, obtaining the posterior distribution.

• Finally, to evaluate the fit of the model and the implications of the resulting posterior distribution.

The joint probability distribution f(θ, y) is obtained by means of

$$ f(\theta, y) = f(y \mid \theta)\, f(\theta) \tag{2.1} $$

where y is the set of sampled data. (Throughout this chapter the formulas are written for a single parameter θ; when there are several parameters, θ denotes the vector of parameters and the same expressions apply.) So this distribution is the product of two densities that are referred to as the sampling distribution f(y|θ) and the prior distribution f(θ).

The sampling distribution, as its name suggests, is the probability model that the researcher assigns to the statistic (or set of statistics) to be studied after the data have been observed. Here, an important problem arises in relation to the parametric approach: the probability model that the researcher chooses may not be adequate. The nonparametric approach overcomes this inconvenience, as will be seen later.

When y is considered fixed, so that the sampling distribution is a function of θ, it is called the likelihood function and obeys the likelihood principle, which states that, for a given sample of data, any two probability models f(y|θ) with the same likelihood function yield the same inference for θ.

The prior distribution does not depend upon the data. Accordingly, it contains the information and the knowledge that the researcher has about the situation or problem to be solved. When there is no previous significant population from which the engineer can take his knowledge, that is, when the researcher has no prior information about the problem, a non-informative prior distribution must be used in the analysis in order to let the data speak for themselves. Hence, it is assumed that the prior knowledge will have very little importance in the results. But most non-informative priors
are "improper", in that they do not integrate to 1, and this fact can cause problems. In these cases it is necessary to be sure that the posterior distribution is proper. Another possibility is to use an informative prior distribution but with an insignificant weight (around zero) associated to it.

Though the prior distribution can take any form, it is common to choose particular classes of priors that make computation and interpretation easier. These are the conjugate priors. A conjugate prior distribution is one which, when combined with the likelihood function, gives a distribution that falls in the same class of distributions as the prior. Furthermore, according to [Koop03], a natural conjugate prior has the additional property that it has the same form as the likelihood does. But it is not always possible to find this kind of distribution, and the researcher then has to manage a lot of distributions to be able to give expression to his prior knowledge about the problem. This is another handicap that the nonparametric approach reduces.

In relation to the prior, what distribution should be chosen? There are three different points of view corresponding to different styles of Bayesians:

• Classical Bayesians consider that the prior is a necessary evil, and priors that interject the least information possible should be chosen.

• Modern parametric Bayesians consider that the prior is a useful convenience, and priors with desirable properties such as conjugacy should be chosen. They remark that, given a distributional choice, prior hyperparameters that interject the least information possible should be chosen.

• Subjective Bayesians give essential importance to the prior, in the sense that they consider it a summary of old beliefs. So prior distributions based on previous knowledge (either the results of earlier studies or non-scientific opinion) should be chosen.

Returning to the Bayesian data analysis process, simply conditioning on the observed data y and applying Bayes' Theorem, the posterior distribution f(θ|y) yields:

$$ f(\theta \mid y) = \frac{f(\theta, y)}{f(y)} = \frac{f(\theta)\, f(y \mid \theta)}{f(y)} \tag{2.2} $$

where

$$ f(y) = \int f(\theta)\, f(y \mid \theta)\, d\theta \tag{2.3} $$
is known as the prior predictive distribution, since it is not conditional upon a previous observation of the process and is applied to an observable quantity.

An equivalent form of the posterior distribution displayed above omits the prior predictive distribution, since it does not involve θ and the interest lies in learning about θ. So, with fixed y, it can be said that the posterior distribution is proportional to the joint probability distribution f(θ, y).

Once the posterior distribution is calculated, some kind of summary measure will be required to estimate the uncertainty about the parameter θ. This is due to the fact that the posterior distribution is a high-dimensional object whose direct use is not practical for a problem. The measure that summarizes the posterior distribution can be the posterior mean, mode, median or variance, among others; its choice will depend on the requirements of the problem. So the posterior distribution has a great importance, since it lets the researcher manage the uncertainty about θ and provides him with information about it, taking into account both his prior knowledge and the data collected by sampling on that parameter.

According to [Maté06], it is not difficult to deduce that posterior inference will coincide with the non-Bayesian one as long as the estimate which the researcher gives to the parameter θ is the same as the one resulting from the sampling.

Once the data y have been observed, a new unknown observable quantity ỹ can be predicted for the same process through the posterior predictive distribution f(ỹ|y):

$$ f(\tilde{y} \mid y) = \int f(\tilde{y}, \theta \mid y)\, d\theta = \int f(\tilde{y} \mid \theta, y)\, f(\theta \mid y)\, d\theta = \int f(\tilde{y} \mid \theta)\, f(\theta \mid y)\, d\theta \tag{2.4} $$

To sum up, the basic idea is to update the prior distribution f(θ) through Bayes' theorem by observing the data y, in order to get a posterior distribution f(θ|y). Then a summary measure or a prediction for new data can be obtained from f(θ|y). Table 2.1 reflects what has been said.
Distribution   Expression               Information Required                  Result
Likelihood     f(y|θ)                   Data                                  Distribution f(y|θ)
Prior          f(θ)                     Researcher's knowledge                Parameter distribution f(θ)
Joint          f(y|θ)f(θ)               Likelihood and prior distributions    f(θ, y)
Posterior      f(θ)f(y|θ)/f(y)          Prior and joint distributions         f(θ|y)
Predictive     ∫ f(ỹ|θ)f(θ|y)dθ         New data and posterior distribution   f(ỹ|y)

Table 2.1: Distributions in Bayesian Data Analysis

2.2 Bayesian Analysis for Normal and other distributions

2.2.1 Univariate Normal distribution

The basic model to be discussed concerns an observable variable y, normally distributed with mean µ and unknown variance σ²:

$$ y \mid \mu, \sigma^2 \sim N(\mu, \sigma^2) \tag{2.5} $$

As can be seen in Appendix A, the likelihood function for a single observation is

$$ f(y \mid \mu, \sigma^2) \propto (\sigma^2)^{-1/2} \exp\!\left(-\frac{1}{2\sigma^2}(y - \mu)^2\right) \tag{2.6} $$

This means that the likelihood function is proportional to a Normal distribution, omitting those terms that are constant.

Now let us consider we have n independent observations y1, y2, . . . , yn. According to the previous section, the parameters to be estimated are µ and σ²:
$$ \theta = (\theta_1, \theta_2) = (\mu, \sigma^2) \tag{2.7} $$

A full probability model must be set up through a joint probability distribution:

$$ f(\theta, (y_1, y_2, \ldots, y_n)) = f(\theta, y) = f(y \mid \theta)\, f(\theta) \tag{2.8} $$

The likelihood function for a sample of n iid observations is in this case

$$ f(y \mid \theta) = f(y \mid \mu, \sigma^2) \propto (\sigma^2)^{-n/2} \exp\!\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \mu)^2\right) \tag{2.9} $$

As recommended previously, a conjugate prior will be chosen; in fact, it will be a natural conjugate prior. According to [Gelm04], this likelihood function suggests a conjugate prior distribution of the form

$$ f(\theta) = f(\mu, \sigma^2) = f(\mu \mid \sigma^2)\, f(\sigma^2) \tag{2.10} $$

where the marginal distribution of σ² is the Scaled Inverse-χ² and the conditional distribution of µ given σ² is Normal (details about these distributions in Appendix A):

$$ \mu \mid \sigma^2 \sim N(\mu_0, \sigma^2 V_0) \tag{2.11} $$

$$ \sigma^2 \sim \text{Inv-}\chi^2(\nu_0, s_0^2) \tag{2.12} $$

So the joint prior distribution is:

$$ f(\theta) = f(\mu, \sigma^2) = f(\mu \mid \sigma^2)\, f(\sigma^2) \propto N\text{-Inv-}\chi^2(\mu_0, s_0^2 V_0, \nu_0, s_0^2) \tag{2.13} $$

Its four parameters can be identified as the location and scale of µ and the degrees of freedom and scale of σ², respectively.

As a natural conjugate prior was employed, the posterior joint distribution will have the same form as the prior. So, conditioning on the data, and according to Bayes' Theorem, we have:

$$ f(\theta \mid y) = f(\mu, \sigma^2 \mid y) \propto f(y \mid \mu, \sigma^2)\, f(\mu, \sigma^2) \propto N\text{-Inv-}\chi^2(\mu_1, s_1^2 V_1, \nu_1, s_1^2) \tag{2.14} $$

where it can be shown that
$$ \mu_1 = (V_0^{-1} + n)^{-1} (V_0^{-1} \mu_0 + n\bar{y}) \tag{2.15} $$

$$ V_1 = (V_0^{-1} + n)^{-1} \tag{2.16} $$

$$ \nu_1 = \nu_0 + n \tag{2.17} $$

$$ \nu_1 s_1^2 = \nu_0 s_0^2 + (n-1)s^2 + \frac{V_0^{-1} n}{V_0^{-1} + n} (\bar{y} - \mu_0)^2 \tag{2.18} $$

All these formulae show that Bayesian inference combines prior and sample information. The first formula means that the posterior mean µ1 is a weighted mean of the prior mean µ0 and the empirical mean, divided by the sum of their respective weights, which are V0⁻¹ and the sample size n.

The second formula represents the scale of the posterior mean, and it can be seen as a compromise between the sample size and the significance given to the prior mean.

The third formula indicates that the degrees of freedom of the posterior variance are the sum of the prior degrees of freedom and the sample size. That is, the prior degrees of freedom can be understood as a fictitious sample size on which the expert's prior information is based.

The last formula explains the posterior sum of squared errors as a combination of the prior and empirical sums of squared errors, plus a term that measures the conflict between prior and sample information. A more detailed explanation of this last step can be found in [Gelm04], [Koop03] or [Cong06].

The marginal posterior distributions are:

$$ \mu \mid \sigma^2, y \sim N(\mu_1, \sigma^2 V_1) \tag{2.19} $$

$$ \sigma^2 \mid y \sim \text{Inv-}\chi^2(\nu_1, s_1^2) \tag{2.20} $$

If we integrate out σ², the marginal for µ will be a t-distribution (see Appendix A for details):

$$ \mu \mid y \sim t_{\nu_1}(\mu_1, s_1^2 V_1) \tag{2.21} $$
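As a quick illustration, the following R sketch implements the update formulas (2.15)-(2.18). The function name and the simulated data are assumptions of this example only, not part of the developed tool.

```r
# Conjugate update for the Normal model with unknown mean and variance,
# following equations (2.15)-(2.18).
normal_conjugate_update <- function(y, mu0, V0, nu0, s0_sq) {
  n     <- length(y)
  ybar  <- mean(y)
  V0inv <- 1 / V0
  mu1   <- (V0inv * mu0 + n * ybar) / (V0inv + n)          # (2.15)
  V1    <- 1 / (V0inv + n)                                 # (2.16)
  nu1   <- nu0 + n                                         # (2.17)
  s1_sq <- (nu0 * s0_sq + (n - 1) * var(y) +               # (2.18)
            (V0inv * n / (V0inv + n)) * (ybar - mu0)^2) / nu1
  list(mu1 = mu1, V1 = V1, nu1 = nu1, s1_sq = s1_sq)
}

# Illustrative data: 10 draws from N(5, 2^2), with a weak prior centred at 4.
set.seed(1)
y <- rnorm(10, mean = 5, sd = 2)
normal_conjugate_update(y, mu0 = 4, V0 = 1, nu0 = 2, s0_sq = 4)
```

Note how, with only ten observations, the returned µ1 sits between the prior mean and the sample mean, exactly as (2.15) prescribes.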
Let us see an application to the Spanish Stock Market. Let us suppose that the monthly close values associated with the Ibex 35 are normally distributed. If we take the values at which the Spanish index closed during the first two weeks of January 2006, it can be shown that the mean was 10893.29 and the standard deviation was 61.66. So the non-Bayesian approach would infer a Normal distribution with the previous mean and standard deviation. Suppose now that we had asked an analyst about the Ibex 35 evolution in January: he would have strongly affirmed that it would decrease slightly, that the mean close value at the end of the month would be around 10870 and that, hence, the standard deviation would be higher, around 100. Then, according to the previous formulas, the posterior parameters would be

µ1 = (100 + 10)⁻¹ (100 × 10870 + 10 × 10893.29) = 10872.12

V1 = (100 + 10)⁻¹ = 0.0091

ν1 = 100 + 10 = 110

s1 = √[(100 × 100² + 9 × 61.66 + (1000/110) × (10893.29 − 10870)²) / 110] = 95.60

This means that there is a difference of almost 20 points between the Bayesian estimate and the non-Bayesian one for the mean close value of January. Once the month of January had passed, we could compare both results and note that the Bayesian estimates were closer to the finally realized mean close value and standard deviation: 10871.2 and 112.44. In Figure 2.1 it can be seen how the blue line representing the Bayesian estimation is closer to the cyan line representing the final real mean close value than the red line representing the frequentist estimation:
[Figure 2.1: Univariate Normal Example. Densities of the frequentist and Bayesian estimates, together with the real mean close value in January.]

2.2.2 Multivariate Normal distribution

Now let us consider that we have an observable vector y of d components with the multivariate Normal distribution:

$$ \mathbf{y} \sim N(\boldsymbol{\mu}, \Sigma) \tag{2.22} $$

where the first parameter is the mean column vector and the second one is the variance-covariance matrix. Extending what was said above to the multivariate case, we have:

$$ f(\mathbf{y} \mid \boldsymbol{\mu}, \Sigma) \propto |\Sigma|^{-1/2} \exp\!\left(-\frac{1}{2} (\mathbf{y}-\boldsymbol{\mu})' \Sigma^{-1} (\mathbf{y}-\boldsymbol{\mu})\right) \tag{2.23} $$

And for n iid observations:

$$ f(\mathbf{y}_1, \ldots, \mathbf{y}_n \mid \boldsymbol{\mu}, \Sigma) \propto |\Sigma|^{-n/2} \exp\!\left(-\frac{1}{2} \sum_{i=1}^{n} (\mathbf{y}_i-\boldsymbol{\mu})' \Sigma^{-1} (\mathbf{y}_i-\boldsymbol{\mu})\right) \tag{2.24} $$

A multivariate generalization of the Scaled Inverse-χ² is the Inverse-Wishart distribution (see details in Appendix A), so the joint prior distribution is

$$ f(\boldsymbol{\mu}, \Sigma) \propto N\text{-Inv-Wishart}\!\left(\boldsymbol{\mu}_0, \frac{\Lambda_0}{k_0}, \nu_0, \Lambda_0\right) \tag{2.25} $$

due to the fact that

$$ \boldsymbol{\mu} \mid \Sigma \sim N\!\left(\boldsymbol{\mu}_0, \frac{\Sigma}{k_0}\right) \tag{2.26} $$

$$ \Sigma \sim \text{Inv-Wishart}(\nu_0, \Lambda_0^{-1}) \tag{2.27} $$
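For readers who want to experiment, the following R sketch draws (µ, Σ) from the prior (2.26)-(2.27), under one common reading of the Inverse-Wishart convention, namely that Σ ~ Inv-Wishart(ν0, Λ0⁻¹) means Σ⁻¹ ~ Wishart(ν0, Λ0⁻¹). All numeric settings are illustrative assumptions.

```r
# One draw of (mu, Sigma) from the Normal-Inverse-Wishart prior, d = 2.
set.seed(4)
d       <- 2
mu0     <- c(0, 0); k0 <- 1; nu0 <- 5
Lambda0 <- diag(d)                        # prior scale matrix

Sigma_inv <- rWishart(1, df = nu0, Sigma = solve(Lambda0))[, , 1]
Sigma     <- solve(Sigma_inv)             # Sigma ~ Inv-Wishart(nu0, Lambda0^{-1})

z  <- rnorm(d)
mu <- mu0 + t(chol(Sigma / k0)) %*% z     # mu | Sigma ~ N(mu0, Sigma / k0)
list(mu = as.vector(mu), Sigma = Sigma)
```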
                     Univariate Normal                               Multivariate Normal
Expression           y ~ N(µ, σ²)                                    y ~ N(µ, Σ)
Parameters           µ, σ²                                           µ, Σ
Prior                µ|σ² ~ N(µ0, σ0²/k0)                            µ|Σ ~ N(µ0, Σ/k0)
                     σ² ~ Inv-χ²(ν0, σ0²)                            Σ ~ Inv-Wishart(ν0, Λ0⁻¹)
                     µ,σ² ~ N-Inv-χ²(µ0, σ0²/k0, ν0, σ0²)            µ,Σ ~ N-Inv-Wishart(µ0, Λ0/k0, ν0, Λ0)
Posterior            µ|σ²,y ~ N(µ1, σ1²/k1)                          µ|Σ,y ~ N(µ1, Σ/k1)
                     σ²|y ~ Inv-χ²(ν1, σ1²)                          Σ|y ~ Inv-Wishart(ν1, Λ1⁻¹)
                     µ,σ²|y ~ N-Inv-χ²(µ1, σ1²/k1, ν1, σ1²)          µ,Σ|y ~ N-Inv-Wishart(µ1, Λ1/k1, ν1, Λ1)

Table 2.2: Comparison between Univariate and Multivariate Normal

The posterior results are the same as those given for the univariate case, but applying these distributions. Interested readers can find more information in [Gelm04] or [Cong06]. A summary is shown in Table 2.2 in order to capture the most important ideas.

2.2.3 Other distributions

Just as has been done with the Normal distribution, a Bayesian analysis could be carried out for other distributions. For instance, the exponential distribution is commonly used in reliability analysis. Because this project deals with the Normal distribution for the likelihood, the analysis with other distributions will not be explained in detail. Table 2.3 shows the conjugate prior and posterior distributions
for other likelihood distributions. More details can be found in [Cong06], [Gelm04] or [Rossi06].

Likelihood     Parameter   Conjugate Prior   Prior Hyperparameters   Posterior Hyperparameters
Bin(y|n, θ)    θ           Beta              α, β                    α + y, β + n − y
P(y|θ)         θ           Gamma             α, β                    α + nȳ, β + n
Exp(y|θ)       θ           Gamma             α, β                    α + 1, β + y
Geo(y|θ)       θ           Beta              α, β                    α + 1, β + y

Table 2.3: Conjugate distributions for other likelihood distributions

2.3 Hierarchical Models

Hierarchical data arise when observations are structured or related among themselves. When this occurs, standard techniques either assume that these groups belong to entirely different populations or ignore the aggregate information entirely. Hierarchical models provide a way of pooling the information for the disparate groups without assuming that they belong to precisely the same population.

Suppose we have collected data about some random variable Y from m different populations, with n observations for each population. Let y_ij represent observation j from population i. Now suppose y_ij ~ f(θ_i), where θ_i is a vector of parameters for population i. Furthermore, θ_i ~ f(Θ), where Θ may also be a vector. Until this point, we have only rewritten what was said previously.
Now let us extend the model and assume that the parameters Θ that govern the distribution of the θ's are themselves random variables, and assign a prior distribution to these variables as well:

$$ \Theta \sim f(\psi) \tag{2.28} $$

where f(ψ) is called the hyperprior. The vector parameter ψ of the hyperprior may be "known" and represent our prior beliefs about Θ or, in theory, we can also assign a probability distribution to these quantities as well, and proceed to another layer of hierarchy.

According to [Gelm04], the idea of exchangeability will be used to create a joint probability distribution model for all the parameters θ. A formal definition of exchangeability is: "The parameters θ1, θ2, . . . , θn are exchangeable in their joint distribution if f(θ1, θ2, . . . , θn) is invariant to permutations of the indexes 1, 2, . . . , n." This means that if no information other than the data is available to distinguish any of the θi from any of the others, and no ordering of the parameters can be made, one must assume symmetry among the parameters in the prior distribution. So we can treat the parameters of each sub-population as exchangeable units. This can be formulated by:

$$ f(\theta_1, \theta_2, \ldots, \theta_n \mid \Theta) = \prod_{i=1}^{n} f(\theta_i \mid \Theta) \tag{2.29} $$

The joint prior distribution is now:

$$ f(\theta_1, \ldots, \theta_n, \Theta) = f(\theta_1, \ldots, \theta_n \mid \Theta)\, f(\Theta) \tag{2.30} $$

And conditioning on the data, it yields:

$$ f(\theta_1, \ldots, \theta_n, \Theta \mid y) \propto f(\theta_1, \ldots, \theta_n, \Theta)\, f(y \mid \theta_1, \ldots, \theta_n, \Theta) \tag{2.31} $$

Perhaps the most important point in practice is that non-hierarchical models are usually inappropriate for hierarchical data, while non-hierarchical data can be modelled following the hierarchical structure by assigning concrete values to the hyperprior parameters. This kind of model will be used in Bayesian regression models with autocorrelated errors, as will be seen in the following chapters.
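The structure in (2.28)-(2.30) can be made concrete with a short forward simulation. The following R sketch, with entirely illustrative settings (the between-group standard deviation is held fixed for simplicity), draws group parameters exchangeably from a common distribution and then data within each group, and shows the simplest form of the partial pooling that hierarchical models formalize.

```r
# Forward simulation of a two-level hierarchical Normal model:
# hyperparameters -> exchangeable group parameters -> data.
set.seed(42)
m <- 8; n <- 20                          # m populations, n observations each

mu0 <- rnorm(1, mean = 0, sd = 5)        # hyperprior draw for the overall mean
tau <- 2                                 # between-group sd (fixed here)

theta <- rnorm(m, mean = mu0, sd = tau)  # exchangeable group means, as in (2.29)
y <- sapply(theta, function(th) rnorm(n, mean = th, sd = 1))   # y_ij | theta_i

# Simplest partial pooling: shrink each group average toward the grand mean,
# with weights implied by the within- and between-group variance components.
w <- (n / 1^2) / (n / 1^2 + 1 / tau^2)
shrunk <- w * colMeans(y) + (1 - w) * mean(y)
round(cbind(raw = colMeans(y), shrunk = shrunk), 2)
```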
For more details about Bayesian hierarchical models, the reader is referred to [Cong06], [Gelm04] and [Rossi06].

2.4 Nonparametric Bayesian

To overcome the limitations that have been mentioned throughout this chapter, it is the nonparametric approach which manages to reduce the restrictions of the parametric approach. This kind of analysis can be performed through the so-called Dirichlet Process, which allows us to express in a simple way the prior distribution, or the distribution family, of F, where F is the distribution function of the studied variable. This process has a parameter, called α, which is transformed into a probability distribution. According to [Maté06], a Dirichlet Process for F(t) requires knowing:

• A prior proposal for F(t), F0(t), corresponding to the distribution function that represents the prior knowledge the engineer has, given by

$$ F_0(t) = \frac{\alpha(t)}{M} \tag{2.32} $$

• A measure of the confidence in the prior proposal, denoted by M, whose values can vary between 0 and ∞, depending on whether there is total confidence in the data or in the prior proposal, respectively.

It can be shown that the posterior estimate of F(t), F̂n(t), after sampling n data points, is given by

$$ \hat{F}_n(t) = p_n F_0(t) + (1 - p_n) F_n(t) \tag{2.33} $$

where Fn(t) is the empirical distribution function and pn = M/(M + n). More detailed information about the nonparametric approach and how Dirichlet processes are used can be found in [Mull04] or [Gosh03].
With this approach, not only is the limitation of the parametric approach related to the probability model of the variable under study avoided, since no such hypothesis is required, but it also allows us to confer a quantified importance on the prior knowledge which the engineer gives, depending on the confidence in the certainty of this knowledge.
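A minimal R sketch of the posterior estimate (2.33) follows; the prior proposal F0, the confidence M and the sample below are illustrative assumptions.

```r
# Posterior estimate of F under a Dirichlet process prior, equation (2.33):
# a mixture of the prior proposal F0 and the empirical cdf, weighted by
# p_n = M / (M + n).
set.seed(7)
y  <- rexp(30, rate = 1/2)                     # observed sample, n = 30
M  <- 10                                       # confidence in the prior proposal
F0 <- function(t) pnorm(t, mean = 2, sd = 1)   # prior proposal for F
Fn <- ecdf(y)                                  # empirical distribution function

p_n  <- M / (M + length(y))
Fhat <- function(t) p_n * F0(t) + (1 - p_n) * Fn(t)

t <- seq(0, 8, by = 0.1)
plot(t, Fhat(t), type = "l", ylab = "posterior estimate of F(t)")
lines(t, F0(t), lty = 2)                       # prior proposal
lines(t, Fn(t), lty = 3)                       # empirical cdf
```

Increasing M pulls the estimate toward F0; M near 0 leaves essentially the empirical cdf, which is the quantified trade-off described above.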
Chapter 3

Posterior Simulation

3.1 Introduction

A practical problem with Bayesian inference is the difficulty of summarizing realistically complex posterior distributions. In most practical problems, posterior densities will not take the form of any well-known and understood density, so summary statistics, such as the posterior mean and variance of parameters of interest, will not be analytically available. It is at this point that the importance of Bayesian computation arises, and computational tools are required to gain meaningful inference from the posterior distribution. Its importance is such that the computing revolution of the last 20 years has led to a blossoming of Bayesian methods in many fields such as Econometrics, Ecology or Health.

In this respect, the most transcendent simulation methods are the Markov chain Monte Carlo (MCMC) methods. MCMC methods date from the original work of [Metr53], who were interested in methods for the efficient simulation of the energy levels of atoms in a crystalline structure. The original idea was subsequently generalized by [Hast70], but its true potential was not fully realized within the statistical literature until [Gelf90] demonstrated its application to the estimation of integrals commonly occurring in the context of Bayesian statistical inference.

As [Berg05] points out, the underlying principle is simple: if one wishes to sample randomly from a specific probability distribution, then design a Markov chain whose long-time equilibrium is that distribution, write a computer program to simulate the Markov chain, run it for a time long enough to be confident that approximate equilibrium has been attained, then record the state of the Markov
chain as an approximate draw from equilibrium.

The technique has been developed strongly in different fields and with rather different emphases: in the computer science community concerned with the study of random algorithms (where the emphasis is on whether the resulting algorithm scales well with increasing size of the problem), in the spatial statistics community (where one is interested in understanding what kinds of patterns arise from complex stochastic models), and also in the applied statistics community (where it is applied largely in Bayesian contexts, enabling researchers to formulate statistical models which would otherwise be resistant to effective statistical analyses).

The development of the theoretical work also benefits the development of statistical applications. MCMC simulation techniques have been applied to develop practical statistical inferences for almost all problems in (bio)statistics, for example, problems in longitudinal data analysis, image analysis, genetics, contagious disease epidemics, random spatial patterns, and financial statistical models such as GARCH and stochastic volatility.

The simplicity of the underlying principle of MCMC is a major reason for its success. However, a substantial complication arises as the underlying target problem becomes more complex; namely, how long should one run the Markov chain so as to ensure that it is close to equilibrium? According to [Gelm04], n = 100 independent samples should be enough for reasonable posterior summaries, but in some cases more samples are needed to ensure more accuracy.

3.2 Markov chains

The essential theory required in developing Monte Carlo methods based on Markov chains is presented here. The most fundamental result is that certain Markov chains converge to a unique invariant distribution, and can be used to estimate expectations with respect to this distribution. But in order to reach this conclusion, some concepts need to be defined first.

A Markov chain is a series of random variables, X0, . . . , Xn, also called a stochastic process, in which only the value of Xn−1 influences the distribution of Xn. Formally:

$$ P(X_n = x_n \mid X_0 = x_0, \ldots, X_{n-1} = x_{n-1}) = P(X_n = x_n \mid X_{n-1} = x_{n-1}) \tag{3.1} $$
where the Xn have a common range called the state space of the Markov chain.

The common language used to refer to the different situations in which a Markov chain can be found is the following. If Xn = i, it is said that the chain is in state i at step n, or that it has the value i at step n. This language confers on the chain a certain dynamic view, which is corroborated by the main tool used to study it: the transition probabilities P(Xn+1 = j | Xn = i), which are represented by the transition matrix P = (Pij) with Pij = P(Xn+1 = j | Xn = i). This is used to show the probability of moving from state i to state j.

Due to the fact that in the most interesting applications Markov chains are homogeneous, the transition matrix can be defined from the initial probability P0 = P(X1 = j | X0 = i). In this respect, a Markov chain Xt is homogeneous if P(Xn+1 = j | Xn = i) = P(X1 = j | X0 = i) for all n, i, j. Furthermore, using the Chapman-Kolmogorov equation, it can be shown that, given the transition matrices P and, for step n, Pn of a homogeneous Markov chain, then Pn = Pⁿ.

On the other hand, we will see the concepts of invariant or stationary distribution, ergodicity and irreducibility, which are indispensable to reach the main result. It will be assumed that Xt is a homogeneous Markov chain. Then a vector π is an invariant distribution of the chain Xt if it satisfies:

a) πj ≥ 0 for all j, with Σj πj = 1.

b) π = πP.

That is, a stationary distribution over the states of a Markov chain is one that persists forever once it is reached.

The concept of ergodic state requires making other definitions clear, such as recurrence and aperiodicity:

• The state i is recurrent if P(Xn = i for some n ≥ 1 | X0 = i) = 1. Otherwise, it is transient. Moreover, i will be positive recurrent if the expected (average) return time is finite, and null recurrent if it is not.
• The period of a state i, denoted di, is defined as di = gcd{n ≥ 1 : [Pⁿ]ii > 0}. The state i is aperiodic if di = 1, and periodic if di is greater.

Then a state is ergodic if it is positive recurrent and aperiodic.

The last concept to define is irreducibility. A set of states C ⊆ S, where S is the set of all possible states, is irreducible if for all i, j ∈ C:

• i and j have the same period.

• i is transient if and only if j is transient.

• i is null recurrent if and only if j is null recurrent.

Now, having all these concepts in mind, we can know whether a Markov chain has a stationary distribution with the next lemma:

Lemma 3.2.1. Let Xt be a homogeneous and irreducible Markov chain. The chain will have exactly one stationary distribution if, and only if, all the states are positive recurrent. In that case, it will have entries given by πi = µi⁻¹, where µi denotes the expected return time of state i.

The relation with the long-time behaviour is given by this other lemma:

Lemma 3.2.2. Let Xt be a homogeneous, irreducible and aperiodic Markov chain. Then

$$ [P^n]_{ij} \longrightarrow \frac{1}{\mu_j} \quad \text{for all } i, j \in S \text{ as } n \to \infty \tag{3.2} $$

3.3 Monte Carlo Integration

Monte Carlo integration estimates the integral E[g(θ)] by obtaining samples θ⁽ᵗ⁾, t = 1, . . . , n from the posterior distribution p(θ|y) and averaging

$$ \widehat{E[g(\theta)]} = \frac{1}{n} \sum_{t=1}^{n} g(\theta^{(t)}) \tag{3.3} $$

where g(θ) represents the function of interest to estimate. Note that if the samples θ⁽ᵗ⁾, t = 1, . . . , n, have p(θ|y) as their stationary distribution, the θ⁽ᵗ⁾ form a Markov chain.
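As a sanity check of (3.3), the following R sketch averages g over draws from a known "posterior", so that the Monte Carlo estimate can be compared with the exact expectation. The Gamma target and the choice of g are illustrative assumptions.

```r
# Monte Carlo integration, equation (3.3): estimate E[g(theta) | y] by
# averaging g over posterior draws. Here the posterior is Gamma(3, 2)
# and g(theta) = theta^2, so the exact answer is known in closed form.
set.seed(123)
n     <- 10000
theta <- rgamma(n, shape = 3, rate = 2)   # draws from p(theta | y)
g     <- function(th) th^2

mc_estimate <- mean(g(theta))
exact       <- 3/4 + (3/2)^2              # E[theta^2] = Var + mean^2 = 3
c(mc_estimate = mc_estimate, exact = exact)
```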
3.4 Gibbs sampler

In many models, it is not easy to draw directly from the posterior distribution p(θ|y). However, if the parameter θ is partitioned into several blocks as θ = (θ1, . . . , θp), then the full conditional posterior distributions, p(θ1 | y, θ2, . . . , θp), . . . , p(θp | y, θ1, . . . , θp−1), may be simple to draw from. For instance, in the Normal linear regression model it is convenient to set p = 2, with θ1 = β and θ2 = σ², and the full conditional distributions are p(β | y, σ²) and p(σ² | y, β), which are very useful in the Normal independent model that will be explained later.

The Gibbs sampler is defined by iterative sampling from each of those p conditional distributions:

1. Set a starting value, θ⁽⁰⁾ = (θ2⁽⁰⁾, . . . , θp⁽⁰⁾).

2. Take random draws:
   - θ1⁽¹⁾ from p(θ1 | y, θ2⁽⁰⁾, . . . , θp⁽⁰⁾)
   - θ2⁽¹⁾ from p(θ2 | y, θ1⁽¹⁾, θ3⁽⁰⁾, . . . , θp⁽⁰⁾)
   - . . .
   - θp⁽¹⁾ from p(θp | y, θ1⁽¹⁾, . . . , θp−1⁽¹⁾)

3. Repeat step 2 as necessary.

4. Discard the draws still affected by the starting value θ⁽⁰⁾, and average the rest of the draws applying Monte Carlo integration.

For instance, in the Normal regression model we would have:

1. Set a starting value, θ2⁽⁰⁾ = (σ²)⁽⁰⁾.

2. Take random draws:
   - β⁽¹⁾ from p(β | y, (σ²)⁽⁰⁾)
   - (σ²)⁽¹⁾ from p(σ² | y, β⁽¹⁾)

3. Repeat step 2 as necessary.

4. Discard the early draws affected by the starting value and average the rest, applying Monte Carlo integration (a minimal sketch in R follows).
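The sketch below implements this two-block scheme for the simpler Normal model with unknown mean and variance, assuming independent Normal and Inverse-Gamma priors so that the two full conditionals take the standard forms shown in the comments; the prior hyperparameters and data are illustrative assumptions.

```r
# Gibbs sampler for the Normal model, cycling through two full conditionals.
set.seed(1)
y <- rnorm(50, mean = 3, sd = 2)
n <- length(y); ybar <- mean(y)
m0 <- 0; v0 <- 100; a0 <- 2; b0 <- 2           # prior hyperparameters

S <- 5000; burn_in <- 500
mu <- numeric(S); sig2 <- numeric(S)
sig2[1] <- var(y)                              # starting value (step 1)

for (s in 2:S) {
  # mu | sigma^2, y ~ Normal with precision-weighted mean
  prec  <- 1 / v0 + n / sig2[s - 1]
  mu[s] <- rnorm(1, (m0 / v0 + n * ybar / sig2[s - 1]) / prec, sqrt(1 / prec))
  # sigma^2 | mu, y ~ Inverse-Gamma(a0 + n/2, b0 + sum((y - mu)^2)/2)
  sig2[s] <- 1 / rgamma(1, a0 + n / 2, b0 + sum((y - mu[s])^2) / 2)
}

keep <- (burn_in + 1):S                        # discard the burn-in (step 4)
c(mu_hat = mean(mu[keep]), sig2_hat = mean(sig2[keep]))
```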
The values dropped because they are affected by the starting point are called the burn-in. Generally, any set of values discarded in an MCMC simulation is called the burn-in, and the size of the burn-in period is the subject of current research in MCMC methods. As the state of each draw depends on the state of the previous one, the sequence is a Markov chain. More detailed information can be found in [Chen00], [Mart01] or [Rossi06].

3.5 Metropolis-Hastings sampler and its special cases

3.5.1 Metropolis-Hastings sampler

The Metropolis-Hastings method is adequate for simulating models that are not conditionally conjugate. Furthermore, it can be combined with the Gibbs sampler to simulate posterior distributions where some of the conditional posterior distributions are easy to sample from and others are not. Like the algorithms explained above, it is based on formulating a Markov chain, but it uses a proposal distribution, $q(\cdot|\theta^t)$, which depends on the current state $\theta^t$, to generate a new proposed sample $\theta^*$. This proposal is accepted as the next state with probability given by

$\alpha(\theta^t, \theta^*) = \min\left(1,\ \dfrac{p(\theta^*|y)\,q(\theta^t|\theta^*)}{p(\theta^t|y)\,q(\theta^*|\theta^t)}\right)$   (3.4)

If the point $\theta^*$ is not accepted, the chain does not move and $\theta^{t+1} = \theta^t$. According to [Mart01], the steps to follow are:

1. Initialize the chain to $\theta^0$ and set $t = 0$.
2. Generate a candidate point $\theta^*$ from $q(\cdot|\theta^t)$.
3. Generate $U$ from a uniform $(0,1)$ distribution.
4. If $U \leq \alpha(\theta^t, \theta^*)$ then set $\theta^{t+1} = \theta^*$; else set $\theta^{t+1} = \theta^t$.
5. Set $t = t+1$ and repeat steps 2 through 5.
6. Take the average of the draws $g(\theta^1), \ldots, g(\theta^n)$.

Note that it is not only recommendable but essential that the proposal distribution $q(\cdot|\theta^t)$ be easy to sample from.
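A minimal sketch of the algorithm follows, specialized for simplicity to a symmetric random-walk proposal, so that the $q$ terms in (3.4) cancel (this anticipates Section 3.5.3); `log_post` is any user-supplied, possibly unnormalized, log posterior. The toy example at the end is an illustrative assumption.

import numpy as np

def metropolis_hastings(log_post, theta0, n_iter=10_000, step=0.5, seed=0):
    """Random-walk Metropolis-Hastings. With a symmetric Normal proposal,
    the acceptance probability (3.4) reduces to a ratio of posteriors,
    evaluated here on the log scale for numerical stability."""
    rng = np.random.default_rng(seed)
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    draws = np.empty((n_iter, theta.size))
    lp = log_post(theta)
    for t in range(n_iter):
        proposal = theta + step * rng.standard_normal(theta.size)  # step 2
        lp_prop = log_post(proposal)
        u = rng.uniform()                                          # step 3
        if np.log(u) <= lp_prop - lp:                              # step 4
            theta, lp = proposal, lp_prop
        draws[t] = theta                                           # step 5
    return draws

# Toy usage: sample a N(0, 1) "posterior" and average after a burn-in (step 6)
draws = metropolis_hastings(lambda th: -0.5 * th @ th, theta0=[3.0])
print(draws[2000:].mean())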
There are some special cases of this method; the most important ones are briefly explained below. In addition, according to [Gelm04], it can be shown that the Gibbs sampler is another special case of the Metropolis-Hastings algorithm, one in which the proposal point is always accepted.

3.5.2 Metropolis sampler

This method is a particular case of the Metropolis-Hastings sampler in which the proposal distribution has to be symmetric. That is,

$q(\theta^*|\theta^t) = q(\theta^t|\theta^*)$   (3.5)

for all $\theta^*$ and $\theta^t$. Then the probability of accepting the new point is

$\alpha(\theta^t, \theta^*) = \min\left(1,\ \dfrac{p(\theta^*|y)}{p(\theta^t|y)}\right)$   (3.6)

The same procedure seen in the Metropolis-Hastings sampler has to be followed.

3.5.3 Random-walk sampler

This special case refers to a proposal distribution of the form

$q(\theta^*|\theta^t) = q(|\theta^t - \theta^*|)$   (3.7)

and the candidate point is $\theta^* = \theta^t + z$, where $z$, called the increment random variable, is drawn from $q$. Then the probability of accepting the new point is

$\alpha(\theta^t, \theta^*) = \min\left(1,\ \dfrac{p(\theta^*|y)}{p(\theta^t|y)}\right)$   (3.8)

The same procedure seen in the Metropolis-Hastings sampler has to be followed.

3.5.4 Independence sampler

The last variation has a proposal distribution such that

$q(\theta^*|\theta^t) = q(\theta^*)$   (3.9)

so it does not depend on $\theta^t$. Then the probability of accepting the new point is
$\alpha(\theta^t, \theta^*) = \min\left(1,\ \dfrac{p(\theta^*|y)\,q(\theta^t)}{p(\theta^t|y)\,q(\theta^*)}\right) = \min\left(1,\ \dfrac{w(\theta^*)}{w(\theta^t)}\right)$   (3.10)

where

$w(\theta) = \dfrac{p(\theta|y)}{q(\theta)}$   (3.11)

It is important to remark that, for this method to work well, the proposal distribution $q$ should be very similar to the posterior distribution $p(\theta|y)$. The same procedure seen in the Metropolis-Hastings sampler has to be followed.

3.6 Importance sampling

Importance sampling is a variance reduction technique that can be used within the Monte Carlo method. The idea behind it is that certain values of the input random variables in a simulation have more impact on the parameter being estimated than others, so instead of taking a simple average, importance sampling takes a weighted average.

Let $q(\theta)$ be a density from which it is easy to obtain random draws $\theta^{(s)}$, $s = 1, \ldots, S$. Then $q(\theta)$ is called the importance function, and the importance sampling estimator can be defined as

$\hat{g}_S = \dfrac{\sum_{s=1}^{S} w(\theta^{(s)})\,g(\theta^{(s)})}{\sum_{s=1}^{S} w(\theta^{(s)})}$, where $w(\theta^{(s)}) = \dfrac{p(\theta = \theta^{(s)}|y)}{q(\theta = \theta^{(s)})}$,

which converges to $E[g(\theta)|y]$ as $S \to \infty$. In fact, $w(\theta^{(s)})$ can be computed as $w(\theta^{(s)}) = p^*(\theta|y)/q^*(\theta|y)$, where the starred densities are merely proportional to the original ones.

For more information and details about Markov chain Monte Carlo methods and their application, the reader is referred to [Chen00], [Gilk95], [Berg05] and [Kend05]. A short numerical sketch of importance sampling closes this chapter.
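In the sketch below, the Normal "posterior" and the heavier-tailed Student-t importance function are illustrative assumptions, chosen so that the weights $w = p/q$ stay well behaved; only the weighted average $\hat{g}_S$ itself is the point being demonstrated.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Illustrative assumption: the "posterior" p(theta|y) is N(1, 1); the
# importance function q is a Student-t centred nearby with heavier tails.
S = 50_000
theta = stats.t.rvs(df=5, loc=0.0, scale=2.0, size=S, random_state=rng)
w = (stats.norm.pdf(theta, loc=1.0, scale=1.0)
     / stats.t.pdf(theta, df=5, loc=0.0, scale=2.0))

g = theta**2                              # function of interest g(theta)
g_hat = np.sum(w * g) / np.sum(w)         # the weighted average g_S
print(f"estimate of E[theta^2|y]: {g_hat:.3f} (exact: 2.000)")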
Chapter 4

Sensitivity Analysis

4.1 Introduction

There will be many occasions on which the researcher, having selected a model, wants to consider the possibility of choosing another model, or simply to compare the selected model with an alternative. A tool is then needed to compare both models and to select one of them. This will also be useful for variable selection in regression models. In this chapter, Bayesian model comparison is briefly discussed, highlighting those methods which will be most useful.

In the Bayesian field, common methods for model comparison are based on the following: separate estimation, comparative estimation and simultaneous estimation.

Comparative estimation is based on distance measures such as the entropy distance, and the underlying idea is that the more parsimonious model may be preferred between two models whose posterior or posterior predictive distributions are sufficiently close.

Simultaneous model estimation lets us compare many models at the same time; the main methods are reversible jump MCMC (RJMCMC) and birth and death MCMC (BDMCMC).

Separate estimation compares two models that are not necessarily nested, and the most used tools are the posterior predictive distributions and the posterior probability of the model. Since the methods of this type are the most widely accepted, we will explain some of them, highlighting the most important ones.
4.2 Bayes Factor

This is probably the dominant method of Bayesian model testing. It is the analogue of likelihood ratio tests within the frequentist framework, and the basic intuition is that prior and posterior information are combined in a ratio that provides evidence in favour of one model specification versus another. Let us suppose we have two models to compare, $M_1$ and $M_2$. Let $p(M_1)$ and $p(M_2)$ be the prior probabilities of models $M_1$ and $M_2$, respectively, and $p(M_1|y)$ and $p(M_2|y)$ their posterior probabilities. Then the Bayes Factor is:

$B(y) = \dfrac{p(y|M_1)}{p(y|M_2)} = \dfrac{p(M_1|y)/p(M_1)}{p(M_2|y)/p(M_2)}$   (4.1)

This means that the Bayes Factor favours the model for which the marginal likelihood of the data, namely $p(y|M_i)$, is larger. Therefore, the value of the factor gives evidence of the preference between two models. According to [Jeff61], the following interpretation is suggested:

Bayes Factor | Interpretation
$B(y) < 1/10$ | Strong evidence for $M_2$
$1/10 < B(y) < 1/3$ | Moderate evidence for $M_2$
$1/3 < B(y) < 1$ | Weak evidence for $M_2$
$1 < B(y) < 3$ | Weak evidence for $M_1$
$3 < B(y) < 10$ | Moderate evidence for $M_1$
$B(y) > 10$ | Strong evidence for $M_1$

Table 4.1: Bayes Factor Interpretation
The marginal likelihood usually involves an integral which can be evaluated analytically only in some special cases. So, while Bayes Factors are rather intuitive, they are often quite difficult or even impossible to calculate in practice. Because of this, there are alternatives to this method.

4.3 Alternative Statistics to the Bayes Factor

Let $\hat{\theta}$ be the mean of the posterior distribution, and let us assume that the Bayes estimate for the parameters $\theta$ is approximately equal to the maximum likelihood estimate. Then the following statistics, some of which are also used in frequentist statistics, can be useful diagnostics:

• The Likelihood Ratio, which will always favour the unrestricted model, where the ratio is:

$\text{Ratio} = -2\left[\log(p(\hat{\theta}_{Restricted}|y)) - \log(p(\hat{\theta}_{Full}|y))\right]$   (4.2)

The ratio is distributed as a $\chi^2_p$, where $p$ is the number of parameters, including the intercept.

• The Akaike Information Criterion (AIC), where a ratio between $AIC_1$ (AIC for $M_1$) and $AIC_2$ (AIC for $M_2$) less than 1 indicates that $M_1$ is better. This method does not require the models to be nested, and it tends to favour more complicated models. The statistic is:

$AIC = -2\log(p(\hat{\theta}|y)) + 2p$   (4.3)

where $p$ is the number of parameters, including the intercept. It tends to perform better than the previous one.

• The Bayesian Information Criterion (BIC), also known as the Schwarz Criterion (SC), Schwarz Information Criterion (SIC) or Schwarz Bayesian Criterion (SBC). As with the AIC, this method can be used for non-nested models. The BIC is:

$BIC = -2\log(p(\hat{\theta}|y)) + p\log(n)$   (4.4)

where $p$ is the number of parameters, including the intercept, and $n$ is the sample size. Given any two estimated models, the model with the lower value of BIC is the one to be preferred. Since this method promotes model parsimony by penalizing increased model complexity (larger $p$) more heavily as the sample size $n$ grows, it may be preferred to the AIC.
• The Deviance Information Criterion (DIC), a newer statistic introduced by the developers of the WinBUGS software, who explain it in detail in [Spie03]. The main and most important difference from the previous methods is that it is not an approximation of the Bayes Factor. It is a hierarchical-modelling generalization of the AIC and the BIC, and it is particularly useful when the posterior distributions have been obtained by simulation. The DIC is:

$DIC = -\dfrac{4}{L}\sum_{l=1}^{L}\log(p(y|\theta^{(l)})) + 2\log(p(y|\hat{\theta}))$   (4.5)

where $\theta^{(l)}$ is the draw obtained by simulating the posterior distribution in iteration $l$. This method also penalizes higher-dimensional models, and it may be preferred to the previous ones, mainly in the linear models context.

4.4 Highest Posterior Density Intervals

All the techniques mentioned above typically require the elicitation of informative priors. However, there may be Bayesians who are interested in model comparison with a non-informative prior. In such a case, other techniques can be used. Since the most common one in regression analysis is the Highest Posterior Density Interval (HPDI), we will explain only this method and refer the interested reader to the citations below.

Before defining the idea of the HPDI, the concept of a credible set must be made clear. Let us assume that $\omega$ is the region over which the coefficients $\beta$ are defined. Then $C \subseteq \omega$ is a $100(1-\alpha)\%$ credible set with respect to $\beta$ if:

$p(\beta \in C|y) = 1 - \alpha$   (4.6)

Since there are commonly numerous credible intervals, it is usual to choose the one with the smallest area, namely the Highest Posterior Density Interval. Formally, a $100(1-\alpha)\%$ highest posterior density interval for $\beta$ is a $100(1-\alpha)\%$ credible interval for $\beta$ with the property that it has a smaller area than any other $100(1-\alpha)\%$ credible interval for $\beta$.
This is the Bayesian analogue of the confidence interval within the frequentist framework, but its meaning is more in line with common sense. More information about all these methods and other variants of the Bayes Factor can be found, in greater detail, in [Aitk97], [Berg98], [Chen00], [Cong06] or [Koop03].

4.5 Model Comparison Summary

A model comparison summary can be found in Tables 4.2 and 4.3, where the mark symbols mean:

• * Good
• ** Better
• *** Still better
• **** Probably the best
Method | Formulae | Interpretation | Mark
Bayes Factor | $B(y) = \dfrac{p(y|M_1)}{p(y|M_2)}$ | $B(y) < 1/10$: Strong evidence for $M_2$; $1/10 < B(y) < 1/3$: Moderate evidence for $M_2$; $1/3 < B(y) < 1$: Weak evidence for $M_2$; $1 < B(y) < 3$: Weak evidence for $M_1$; $3 < B(y) < 10$: Moderate evidence for $M_1$; $B(y) > 10$: Strong evidence for $M_1$ | *
Likelihood Ratio | $\text{Ratio} = -2\left[\log p(\hat{\beta}_{Restricted}|y) - \log p(\hat{\beta}_{Full}|y)\right]$ | $\text{Ratio} > \chi^2_p$: Reject the restricted model; $\text{Ratio} < \chi^2_p$: Do not reject the restricted model | *
AIC | $AIC = -2\log p(\hat{\beta}|y) + 2p$ | $AIC_1/AIC_2 < 1$: $M_1$ is better than $M_2$; $AIC_1/AIC_2 > 1$: $M_2$ is better than $M_1$ | **

Table 4.2: Sensitivity Summary I
Method | Formulae | Interpretation | Mark
BIC | $BIC = -2\log p(\hat{\beta}|y) + p\log(n)$ | $BIC_1/BIC_2 < 1$: $M_1$ is better than $M_2$; $BIC_1/BIC_2 > 1$: $M_2$ is better than $M_1$ | ***
DIC | $DIC = -\dfrac{4}{L}\sum_{l=1}^{L}\log p(y|\beta^{(l)}) + 2\log p(y|\hat{\beta})$ | $DIC_1/DIC_2 < 1$: $M_1$ is better than $M_2$; $DIC_1/DIC_2 > 1$: $M_2$ is better than $M_1$ | ****
HPDI | $p(\beta \in C|y) = 1 - \alpha$, with $C$ the region of smallest area | There is a probability of $100(1-\alpha)\%$ of $\beta$ being in the region $C$ | ****

Table 4.3: Sensitivity Summary II
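As a complement to the tables, the following sketch shows how the criteria of Section 4.3 and the HPDI of Section 4.4 might be computed in practice. It assumes, as in the text, that the Bayes estimate is plugged in like a maximum likelihood estimate, and it takes the HPDI as the shortest interval over sorted posterior draws; it is illustrative, not the project's implementation.

import numpy as np

def gaussian_loglik(y, X, beta, sigma2):
    """Log likelihood of the Normal linear model at the plugged-in estimates."""
    resid = y - X @ beta
    n = y.size
    return -0.5 * n * np.log(2 * np.pi * sigma2) - 0.5 * resid @ resid / sigma2

def aic_bic(loglik, p, n):
    """AIC (4.3) and BIC (4.4) from a maximized (or plug-in) log likelihood."""
    return -2 * loglik + 2 * p, -2 * loglik + p * np.log(n)

def hpdi(draws, alpha=0.05):
    """Shortest 100(1-alpha)% interval from a 1-D array of posterior draws."""
    d = np.sort(draws)
    m = int(np.floor((1 - alpha) * d.size))
    widths = d[m:] - d[:-m]          # widths of all candidate intervals
    lo = np.argmin(widths)           # index of the narrowest one
    return d[lo], d[lo + m]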
Chapter 5

Regression Analysis

5.1 Introduction

Regression analysis is a statistical tool for investigating relationships between variables: it models the relationship between one or more random variables $y$, called the response variables, and one or more independent variables $x$, called the predictors. That is, it allows us to examine the conditional distribution of $y$ given $x$, denoted by $p(y|\beta, x)$, when the $n$ observations $(x_i, y_i)$ are exchangeable.

Applications of regression analysis exist in almost every field. In economics, the dependent variable might be the Ibex 35 index and the independent variables the Dow Jones and FTSE 100 indexes. In political science, the dependent variable might be a state's level of welfare spending, and the independent variables measures of public opinion and institutional variables that would cause the state to have higher or lower levels of welfare spending. In sociology, the dependent variable might be a measure of the social status of various occupations, and the independent variables characteristics of the occupations (pay, qualifications, etc.). In psychology, the dependent variable might be an individual's racial tolerance as measured on a standard scale, with indicators of social background as independent variables. In education, the dependent variable might be a student's score on an achievement test, and the independent variables characteristics of the student's family, teachers, or school.

Before explaining Bayesian regression, the classical regression model will be reviewed, focusing on those parts useful for the former.
5.2 Classical Regression Model

The simplest version of this model is the Normal linear model, where the variable $y$ given $X$ has a Normal distribution whose mean is a linear function of $X$:

$E(y_i|\beta, X) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}$ for all $i = 1, \ldots, n$.   (5.1)

Even though the mean of $y$ is a linear function of $X$, the observed data do not fit it exactly, owing to a random error, namely $\epsilon$; so the appropriate way to reach a probabilistic linear model is through

$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \epsilon_i$ for all $i = 1, \ldots, n$,   (5.2)

where $\epsilon_i$ is the random error term, which has a Normal distribution with mean 0 and variance $\sigma^2$. Since the random variable $y_i$ is the sum of a constant (the mean) and a Normally distributed random variable, $y_i$ follows a Normal distribution:

$y_i \sim N(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip},\ \sigma^2)$ for all $i = 1, \ldots, n$.   (5.3)

When the variance of $y$ given $X, \beta$ is assumed to be constant over all observations, the model is called the ordinary linear regression model. In matrix notation, the Normal linear model can be written as

$Y = X\beta + \epsilon$   (5.4)

and

$Y \sim N(X\beta, \sigma^2 I)$   (5.5)

where $Y = (y_1, \ldots, y_n)'$, $\beta = (\beta_0, \beta_1, \ldots, \beta_p)'$, $\epsilon = (\epsilon_1, \ldots, \epsilon_n)'$, $X$ is the $n \times (p+1)$ matrix whose $i$-th row is $(1, x_{i1}, \ldots, x_{ip})$, and $I$ is the identity matrix.

It can be shown that the ordinary least squares estimate of $\beta$, namely $\hat{\beta}$, is
$\hat{\beta} = (X'X)^{-1}X'Y = (\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_p)'$   (5.6)

where

$X'X = \begin{pmatrix} n & \sum_i x_{i1} & \cdots & \sum_i x_{ip} \\ \sum_i x_{i1} & \sum_i x_{i1}^2 & \cdots & \sum_i x_{i1}x_{ip} \\ \vdots & \vdots & \ddots & \vdots \\ \sum_i x_{ip} & \sum_i x_{ip}x_{i1} & \cdots & \sum_i x_{ip}^2 \end{pmatrix}, \qquad X'Y = \begin{pmatrix} \sum_i y_i \\ \sum_i x_{i1}y_i \\ \vdots \\ \sum_i x_{ip}y_i \end{pmatrix}.$

As well, it can be shown that

$E(\hat{\beta}) = \beta$   (5.7)

Furthermore, the variances of $\hat{\beta}$ are proportional to the elements of the matrix $(X'X)^{-1}$, denoted by $C$, which multiplied by the constant $\sigma^2$ gives the covariance matrix. The elements of the diagonal of that matrix are the variances

$Var(\hat{\beta}_j) = \sigma^2 C_{jj}$ for all $j = 0, 1, \ldots, p$,   (5.8)

where $C = (X'X)^{-1}$.

Likewise, the classical estimate of $\sigma^2$ is given in terms of the sum of squared errors, $SSE = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$, and equals the mean squared error:

$\hat{\sigma}^2 = MSE = \dfrac{SSE}{n-p} = \dfrac{(Y - X\hat{\beta})'(Y - X\hat{\beta})}{n-p} = \dfrac{Y'Y - \hat{\beta}'X'Y}{n-p}$   (5.9)

where $n$ is the number of observations and $p$ corresponds to the number of parameters $\beta$.

Regarding the individual regression coefficients $\beta_j$, it will sometimes be interesting to test hypotheses about them in order to evaluate the potential value of each regressor variable in the model. The statistic to use in these cases is

$T_0 = \dfrac{\hat{\beta}_j}{\sqrt{\hat{\sigma}^2 C_{jj}}}$   (5.10)
where $C_{jj}$ is the diagonal element of the matrix $(X'X)^{-1}$ corresponding to $\hat{\beta}_j$. The null hypothesis will be rejected if $|T_0| > t_{n-p,\alpha/2}$.

Finally, once the model has been estimated and validated, one of its more important applications consists of making new predictions about the response variable $Y$ when a new explanatory vector $X^*$ is observed. In this case, a point estimate would be

$\hat{Y}^* = X^*\hat{\beta}$   (5.11)

and a confidence interval for this future observation will be

$\hat{Y}^* \pm t_{n-p,\alpha/2}\,\sqrt{\hat{\sigma}^2\left(1 + X^*(X'X)^{-1}X^{*\prime}\right)}$   (5.12)

where

$X^* = [x_1^*\ x_2^*\ \ldots\ x_k^*]$   (5.13)

These results can be found in more detail in [Mont02], [Zamo01] or [Maté95].

To understand better all that has been said above, let us see a practical application to the stock markets. Suppose we are interested in investigating the relationship between the Ibex 35 index and the previous day's Dow Jones, FTSE 100 and DAX indexes. For this purpose, we have the points (taken as the mean of the daily maximum and minimum points) from January to October 2006, that is, the first ten months of 2006. The model to fit is:

$IBEX35_t = \beta_1\,DowJones_{t-1} + \beta_2\,FTSE100_{t-1} + \beta_3\,DAX_{t-1} + \epsilon_t$

where $\epsilon_t \sim N(0, \sigma^2)$.

The estimates $\hat{\beta}$, calculated as described above, are:

$\hat{\beta}_1 = 1.0147$, $\quad \hat{\beta}_2 = -2.0085$, $\quad \hat{\beta}_3 = 2.1082$.
The estimate of the variance $\sigma^2$ is $\hat{\sigma}^2 = 332.18^2$. So the fitted model is:

$IBEX35_t = 1.0147 \times DowJones_{t-1} - 2.0085 \times FTSE100_{t-1} + 2.1082 \times DAX_{t-1} + \epsilon_t$

where $\epsilon_t \sim N(0, 332.18^2)$.

This indicates that when the Dow Jones or the DAX goes up, the Ibex 35 will increase the next day too; however, when the FTSE 100 rises, the Ibex 35 will decrease the next day. If we use this model to predict the value of the Ibex 35 on November 1st, when the previous day's Dow Jones, FTSE 100 and DAX values are known, we have:

$IBEX35_t = 1.0147 \times 12067 - 2.0085 \times 6155 + 2.1082 \times 6287 = 13137$
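The computations of this example reduce to a few lines of matrix algebra. The sketch below implements (5.6)-(5.11); the index data used in the project are not reproduced here, so any numbers fed to it are purely illustrative.

import numpy as np

def ols(y, X):
    """Ordinary least squares: estimates (5.6), error variance (5.9)
    and t statistics (5.10). X is assumed to already contain any
    intercept column the model requires."""
    n, p = X.shape
    C = np.linalg.inv(X.T @ X)              # (X'X)^{-1}
    beta_hat = C @ X.T @ y                  # (5.6)
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - p)    # MSE, (5.9)
    se = np.sqrt(sigma2_hat * np.diag(C))
    t_stats = beta_hat / se                 # (5.10)
    return beta_hat, sigma2_hat, t_stats

# Point prediction for a new row x_star, as in (5.11):
#   y_star = x_star @ beta_hat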
Finally, a comparison between the multiple and the simple Normal linear regression models is shown in Table 5.1, indicating the different parameters to use in each case. The goal of this comparison is to make clear that simple Normal regression is a particular case of multiple Normal regression in which there is only one regressor variable or predictor.

Quantity | Multiple Normal Linear Regression | Simple Normal Linear Regression
Function | $y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \epsilon_i$ | $y = \beta_0 + \beta_1 x + \epsilon$
Mean | $\mu_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}$ | $\mu = \beta_0 + \beta_1 x$
Variance | $\sigma^2$ | $\sigma^2$
Model | $Y \sim N(\mu, \sigma^2 I)$ | $Y \sim N(\mu, \sigma^2)$
$\hat{\beta}$ | $\hat{\beta} = (X'X)^{-1}X'Y$ | $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$, $\hat{\beta}_1 = \dfrac{\sum_i (x_i-\bar{x})(y_i-\bar{y})}{\sum_i (x_i-\bar{x})^2}$
$E[\hat{\beta}]$ | $\beta$ | $\beta$
$Var(\hat{\beta})$ | $Var(\hat{\beta}_j) = \sigma^2 C_{jj}$ | $Var(\hat{\beta}_0) = \sigma^2\left(\dfrac{1}{n} + \dfrac{\bar{x}^2}{\sum_i (x_i-\bar{x})^2}\right)$, $Var(\hat{\beta}_1) = \dfrac{\sigma^2}{\sum_i (x_i-\bar{x})^2}$
$\hat{\sigma}^2$ | $\dfrac{Y'Y - \hat{\beta}'X'Y}{n-p}$ | $\dfrac{\sum_i (y_i - \hat{y}_i)^2}{n-2}$
Prediction | $\hat{Y}_f \pm t_{n-p,\alpha/2}\sqrt{\hat{\sigma}^2\left(1 + X_f(X'X)^{-1}X_f'\right)}$ | $\hat{Y}_f \pm t_{n-2,\alpha/2}\sqrt{\hat{\sigma}^2\left(1 + \dfrac{1}{n} + \dfrac{(x_f-\bar{x})^2}{\sum_i (x_i-\bar{x})^2}\right)}$
Limitation | Only applies to data in the same range as the sampled data | Only applies to data in the same range as the sampled data

Table 5.1: Multiple and Simple Regression Comparison

5.3 The Bayesian Approach

The main difference between the classical and the Bayesian approach to regression analysis is that the latter treats the parameters as random variables which have a distribution. The aim of the Bayesian approach is to make inferences through the posterior distribution, based on a prior distribution for the parameters $\beta$ and $\sigma^2$ of the Normal linear model, and to provide a predictive distribution for the model's predictions.

As was said in the preceding section, and according to [Rossi06], the Normal linear regression model is given by:

$Y = X\beta + \epsilon$   (5.14)

where
$\epsilon \sim N(0, \sigma^2 I)$   (5.15)

so

$Y|X, \beta, \sigma^2 \sim N(X\beta, \sigma^2 I)$   (5.16)

For simplicity of notation, we will not explicitly include $X$ in the conditioning set of the regression model. Using the definition of the multivariate Normal density, the likelihood function is obtained:

$p(Y|\beta, \sigma^2) = \dfrac{(\sigma^2)^{-n/2}}{(2\pi)^{n/2}}\exp\left(\dfrac{-1}{2\sigma^2}(Y - X\beta)'(Y - X\beta)\right)$   (5.17)

It will be convenient to write

$(Y - X\beta)'(Y - X\beta)$   (5.18)

in terms of the ordinary least squares quantities

$v = n - p$   (5.19)

$\hat{\beta} = (X'X)^{-1}X'Y$   (5.20)

$s^2 = \dfrac{(Y - X\hat{\beta})'(Y - X\hat{\beta})}{n - p}$   (5.21)

so that

$(Y - X\beta)'(Y - X\beta) = vs^2 + (\beta - \hat{\beta})'X'X(\beta - \hat{\beta})$   (5.22)

Then

$p(Y|\beta, \sigma^2) = \dfrac{1}{(2\pi)^{n/2}}\left[\sigma^{-p}\exp\left(\dfrac{-1}{2\sigma^2}(\beta - \hat{\beta})'(X'X)(\beta - \hat{\beta})\right)\right]\left[(\sigma^2)^{-v/2}\exp\left(\dfrac{-vs^2}{2\sigma^2}\right)\right]$   (5.23)

As was said before, $n$ corresponds to the number of observations and $p$ refers to the number of parameters $\beta$. This form of expressing the likelihood function is more useful for finding a natural conjugate prior distribution, which has the same form as the likelihood itself.
The prior distribution for $\beta$ and $\sigma^2$, denoted by $p(\beta, \sigma^2)$, can be written in a more convenient way by applying the definition of the joint distribution:

$p(\beta, \sigma^2) = p(\beta|\sigma^2)p(\sigma^2)$   (5.24)

Note that $\beta$ and $\sigma^2$ are here supposed to be dependent a priori, an assumption that will rarely hold in practice. Some authors prefer to work with the error precision, $1/\sigma^2$, instead of the variance $\sigma^2$. All this is very similar to what was explained in the Bayesian analysis of the Normal distribution.

The first bracketed term in the likelihood function suggests the form of a Normal distribution for the parameter $\beta$ given $\sigma^2$. So

$p(\beta|\sigma^2) \propto (\sigma^2)^{-p/2}\exp\left(\dfrac{-1}{2\sigma^2}(\beta - \beta_0)'V_0^{-1}(\beta - \beta_0)\right)$   (5.25)

and, hence,

$\beta|\sigma^2 \sim N(\beta_0, \sigma^2 V_0)$   (5.26)

According to [Rossi06], the second bracketed term in the likelihood function suggests the form of an inverse gamma distribution for the parameter $\sigma^2$ (see Appendix A). So

$p(\sigma^2) \propto (\sigma^2)^{-(v_0/2 + 1)}\exp\left(\dfrac{-v_0 s_0^2}{2\sigma^2}\right)$   (5.27)

and, hence,

$\sigma^2 \sim Inv\text{-}G\left(\dfrac{v_0}{2}, \dfrac{v_0 s_0^2}{2}\right)$   (5.28)

Note that there is an extra term $(\sigma^2)^{-1}$ here which is not suggested by the form of the likelihood explained above. This term can be rationalized by viewing the conjugate prior as arising from the posterior of a sample of size $v_0$ with sufficient statistics $s_0^2, \beta_0$, formed with the noninformative prior $p(\beta, \sigma^2) \propto \sigma^{-2}$, which will be briefly explained later.

So the natural conjugate prior distribution of the parameters $\beta$ and $\sigma^2$ is:

$p(\beta, \sigma^2) \propto (\sigma^2)^{-\left(\frac{p+v_0}{2} + 1\right)}\exp\left(\dfrac{-1}{2\sigma^2}\left[v_0 s_0^2 + (\beta - \beta_0)'V_0^{-1}(\beta - \beta_0)\right]\right)$   (5.29)

and, hence,
$\beta, \sigma^2 \sim N\text{-}Inv\text{-}\chi^2(\beta_0, V_0 s_0^2;\ v_0, s_0^2)$   (5.30)

where the prior hyper-parameters $\beta_0, V_0, v_0$ and $s_0^2$ express the knowledge that the researcher has about the problem and her or his confidence in it. In particular, $\beta_0$ measures the marginal effect of the explanatory variables on the dependent variable; $V_0$ indicates the uncertainty about the prior information and plays the same role as $(X'X)^{-1}$ does in the classical approach; $v_0$ represents a fictitious sample size, so it plays a role similar to $n$; and $s_0^2$ is an imaginary $s^2$ for those fictitious data. In terms of the distribution, $\beta_0$ and $V_0\sigma^2$ represent the location and scale of $\beta$, respectively, and $v_0$ and $s_0^2$ the degrees of freedom and scale of $\sigma^2$, respectively.

Since a conjugate prior distribution has been used, the posterior distribution will have the same form. That is, the posterior distribution will be a Normal-Scaled Inverse $\chi^2$ with posterior hyper-parameters $\beta_1, V_1, v_1$ and $s_1^2$. According to [Rossi06] and [Koop03], it can be shown that

$\beta, \sigma^2|y \sim N\text{-}Inv\text{-}\chi^2(\beta_1, V_1 s_1^2;\ v_1, s_1^2)$   (5.31)

The relation between the prior and the posterior hyper-parameters, according to [Koop03], is:

$V_1 = (V_0^{-1} + X'X)^{-1}$   (5.32)

$\beta_1 = V_1(V_0^{-1}\beta_0 + X'X\hat{\beta})$   (5.33)

$v_1 = v_0 + n$   (5.34)

$v_1 s_1^2 = v_0 s_0^2 + vs^2 + (\hat{\beta} - \beta_0)'\left[V_0 + (X'X)^{-1}\right]^{-1}(\hat{\beta} - \beta_0)$   (5.35)

As was mentioned in the Bayesian Data Analysis chapter, a measure is needed to summarize the posterior distribution, and this is usually the posterior mean, namely $E(\beta|y)$. According to what was said in previous chapters, the marginal for $\beta$ is a multivariate t-distribution (see Appendix A):

$\beta|y \sim t_{v_1}(\beta_1, s_1^2 V_1)$   (5.36)

where

$E(\beta|y) = \beta_1 = V_1(V_0^{-1}\beta_0 + X'X\hat{\beta})$   (5.37)

and
$Var(\beta|y) = \dfrac{v_1 s_1^2}{v_1 - 2}V_1$   (5.38)

So the posterior mean is a weighted average of the ordinary least squares estimate, $\hat{\beta}$, and the prior mean, $\beta_0$, where the weights are proportional to the observed data information, $X'X$, and the importance given to the prior, $V_0^{-1}$, respectively. This should make clear that as the prior variance for $\beta$ is decreased, greater posterior weight is placed on prior beliefs relative to the data, so the posterior mean moves closer to the prior mean.

The elements of the diagonal of the matrix $\frac{v_1 s_1^2}{v_1 - 2}V_1$ are the variances of $\beta_0, \beta_1, \ldots, \beta_p$:

$Var(\beta_j|y) = \dfrac{v_1 s_1^2}{v_1 - 2}V_{1,jj}$ for all $j = 0, 1, \ldots, p$   (5.39)

Likewise, the marginal posterior for $\sigma^2$ is:

$\sigma^2|y \sim Inv\text{-}\chi^2(v_1, s_1^2)$   (5.40)

and, hence,

$E(\sigma^2|y) = \dfrac{v_1 s_1^2}{v_1 - 2}$   (5.41)

$Var(\sigma^2|y) = \dfrac{2v_1^2 s_1^4}{(v_1 - 2)^2(v_1 - 4)}$   (5.42)

So, as we increase the number of fictitious data $v_0$, $v_1$ tends towards $v_0$ and, hence, $\sigma^2$ gets closer to $s_0^2$.

Tables 5.2 and 5.3 show how the different posterior parameters of interest vary depending on the prior parameters $V_0$ (considering $V_0$ as $cI_k$) and $v_0$ and the sample size $n$. Table 5.2 means that if the size of the sample increases towards infinity, the prior information given by the researcher has very little or almost no importance, as also occurs if the precision of the prior distribution for $\beta$ decreases towards 0 (that is, $V_0$ increases). The difference between the two cases is that in the former the variance of $\beta$ is lower than in the latter. The number of fictitious data does not seem to affect the posterior mean, but it does affect the posterior variance, increasing it (resp. decreasing it) as the fictitious data increase (resp. decrease).
Action | $E[\beta|y]$ | $Var[\beta|y]$
$n$ increases | Closer to OLS estimates | Closer to 0
$n$ decreases | Closer to $\beta_0$ | Further from 0
$V_0$ increases | Closer to OLS estimates | Further from 0
$V_0$ decreases | Closer to $\beta_0$ | Closer to 0
$\nu_0$ increases | Not affected | Increases
$\nu_0$ decreases | Not affected | Decreases

Table 5.2: Sensitivity analysis of parameter $\beta$

Table 5.3 refers to the parameter $\sigma^2$, and it means that if the fictitious data increase, the information given by the researcher will have much more weight in the posterior mean of $\sigma^2$ than the real data have, and the variance will be lower too. The opposite occurs when the number of real data increases: then the data information will have the most important weight and the prior information will hardly have any value. Another interesting result is that, as the precision of the prior distribution for $\beta$ decreases (that is, $V_0$ increases), the posterior mean of $\sigma^2$ approaches $vs^2$, the sum of squared ordinary least squares residuals.
Action | $E[\sigma^2|y]$ | $Var[\sigma^2|y]$
$n$ increases | Closer to OLS estimates | Closer to 0
$n$ decreases | Closer to $s_0^2$ | Further from 0
$V_0$ increases | Closer to $vs^2$ | Closer to 0
$V_0$ decreases | Closer to OLS estimates | Further from 0
$\nu_0$ increases | Closer to $s_0^2$ | Closer to 0
$\nu_0$ decreases | Closer to OLS estimates | Further from 0

Table 5.3: Sensitivity analysis of parameter $\sigma^2$

Turning to a different issue, the fact that the natural conjugate prior implies that prior information enters in the same manner as data information helps with prior elicitation. When several priors can be applied to the same problem, two strategies can be adopted to forestall possible criticisms. First, a prior sensitivity analysis can be carried out to demonstrate that the results are the same under the different priors chosen. But if the results are sensitive to the choice of prior, the Bayesian approach allows for the scientifically honest reporting of such a state of affairs. There has been work on extreme bounds analysis for quantities such as the posterior mean of a parameter; [Poir95] provides a detailed discussion of this issue. A second strategy is to use a non-informative prior to let the data speak loudly and predominate over the prior information. For example, set $v_0 = 0$ and $V_0^{-1} = 0$. Then

$\beta, \sigma^2|y \sim N\text{-}Inv\text{-}\chi^2(\beta_1, V_1 s_1^2;\ v_1, s_1^2)$   (5.43)

where

$V_1 = (X'X)^{-1}$   (5.44)

$\beta_1 = \hat{\beta}$   (5.45)

$v_1 = n$   (5.46)

$v_1 s_1^2 = vs^2$   (5.47)

With this non-informative prior, all of these formulas involve only data information and equal the ordinary least squares results. Bayesians often write this prior as:
$p(\beta, \sigma^2) \propto \sigma^{-2}$   (5.48)

Finally, one of the goals of the Bayesian approach is to provide a predictive model for an unobserved data point generated from the same model as the data set with $n$ observations (errors $N(0, \sigma^2)$ with the same $\beta$). This is denoted by:

$Y^* = X^*\beta + \epsilon^*$   (5.49)

where $Y^*$ is not observed and $\epsilon^*$ is independent of $\epsilon$. Bayesian prediction is based on calculating

$p(y^*|y) = \int p(y^*|y, \beta, \sigma^2)\,p(\beta, \sigma^2|y)\,d\beta\,d\sigma^2$   (5.50)

The key to obtaining the prediction is to find the form of $p(y^*|y, \beta, \sigma^2)$, since the posterior $p(\beta, \sigma^2|y)$ has already been calculated, and to check whether $p(y^*|y)$ is easy to integrate or whether, on the contrary, a posterior simulator has to be employed. Since $\epsilon^*$ is independent of $\epsilon$, $Y^*$ is independent of $Y$, and $p(y^*|y, \beta, \sigma^2)$ can be written as $p(y^*|\beta, \sigma^2)$, which is multivariate Normal, as seen before:

$p(y^*|\beta, \sigma^2) = \dfrac{(\sigma^2)^{-T/2}}{(2\pi)^{T/2}}\exp\left(-\dfrac{1}{2\sigma^2}(y^* - X^*\beta)'(y^* - X^*\beta)\right)$   (5.51)

Multiplying this by the posterior obtained previously and integrating yields a multivariate t:

$y^*|y \sim t_{v_1}\left(X^*\beta_1,\ s_1^2(I_T + X^*V_1X^{*\prime})\right)$   (5.52)

where $T$ is the number of observed $X^*$. It is easy to see that:

$E(y^*|y) = X^*\beta_1, \qquad Var(y^*|y) = s_1^2(I_T + X^*V_1X^{*\prime})$   (5.53)
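The conjugate updating formulas (5.32)-(5.35) and the predictive moments above are straightforward to code. The following is a sketch, not the project's implementation; the variable names mirror the hyper-parameters of the text.

import numpy as np

def conjugate_update(y, X, beta0, V0, v0, s0_sq):
    """Natural conjugate update for the Normal linear model with a
    N-Inv-chi^2 prior, following (5.32)-(5.35)."""
    n, p = X.shape
    XtX = X.T @ X
    V0_inv = np.linalg.inv(V0)
    beta_hat = np.linalg.solve(XtX, X.T @ y)
    v = n - p
    s_sq = (y - X @ beta_hat) @ (y - X @ beta_hat) / v

    V1 = np.linalg.inv(V0_inv + XtX)                        # (5.32)
    beta1 = V1 @ (V0_inv @ beta0 + XtX @ beta_hat)          # (5.33)
    v1 = v0 + n                                             # (5.34)
    diff = beta_hat - beta0
    M = np.linalg.inv(V0 + np.linalg.inv(XtX))
    s1_sq = (v0 * s0_sq + v * s_sq + diff @ M @ diff) / v1  # (5.35)
    return beta1, V1, v1, s1_sq

# Posterior summaries then follow (5.37)-(5.38):
#   E[beta|y] = beta1,  Var[beta|y] = v1 * s1_sq / (v1 - 2) * V1
# and the predictive mean for a new X* is X_star @ beta1, as in (5.53).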
A brief summary comparing the classical and the Bayesian approaches is displayed in Table 5.4 to note the coincidences and differences between them.

Classical Regression | Bayesian Regression
$\hat{\beta} = (X'X)^{-1}X'Y$ | $\beta_1 = V_1(V_0^{-1}\beta_0 + X'X\hat{\beta})$
$\hat{\sigma}^2 = \dfrac{Y'Y - \hat{\beta}'X'Y}{n-p}$ | $s_1^2 = \dfrac{\nu_0 s_0^2 + \nu s^2 + (\hat{\beta}-\beta_0)'[V_0 + (X'X)^{-1}]^{-1}(\hat{\beta}-\beta_0)}{\nu_1}$
$E[\hat{\beta}] = \beta$ | $E[\beta|y] = \beta_1$
$Var(\hat{\beta}_j) = \sigma^2 C_{jj}$ | $Var(\beta_j|y) = \dfrac{\nu_1 s_1^2}{\nu_1 - 2}V_{1,jj}$
$Y^*|y \sim t_{n-p}(X^*\hat{\beta},\ \hat{\sigma}^2 I_T)$ | $Y^*|y \sim t_{\nu_1}(X^*\beta_1,\ s_1^2(I_T + X^*V_1X^{*\prime}))$

Table 5.4: Classical and Bayesian regression comparison

A very interesting and more exhaustive comparison between these two approaches can be read in the article by [Urba92], where the advantages and disadvantages of each are explained.

5.4 Normal Linear Regression Model subject to inequality constraints

In this section, suppose we want to impose inequality constraints on the coefficients of the Normal linear regression model, such as $\beta \in A$, where $A$ is the region of all valid values of the coefficients. This is quite simple in Bayesian regression, since the constraints are imposed through the prior distribution:

$p(\beta, \sigma^2) \sim N\text{-}Inv\text{-}\chi^2(\beta_0, V_0 s_0^2;\ v_0, s_0^2)\,1(\beta \in A)$   (5.54)

where $\beta_0, V_0, v_0$ and $s_0^2$ are prior hyper-parameters to be chosen and $1(\beta \in A)$ is the indicator function, which equals 1 if $\beta \in A$ and 0 otherwise. Likewise, the posterior distribution for $\beta$ is now:
$p(\beta|y) \propto t_{v_1}(\beta_1, s_1^2 V_1)\,1(\beta \in A)$   (5.55)

where $\beta_1, V_1, v_1$ and $s_1^2$ were defined previously.

So the only difference introduced by the inequality constraints is that we must now add the indicator function. This may look very easy, but for a general choice of $A$ neither analytical posterior results nor Gibbs sampling work. The most suitable method is importance sampling, which has already been explained. In this case, according to [Koop03], the importance function is:

$q(\beta) = t_{v_1}(\beta_1, s_1^2 V_1)$   (5.56)

The strategy consists of obtaining draws $y^{*(s)}$ from $p(y^*|\beta^{(s)}, \sigma^{2(s)})$ using the draws $\beta^{(s)}$ and $\sigma^{2(s)}$ obtained from the posterior distribution; then, using these draws $y^{*(s)}$ in the importance sampling, the mean and the variance can be calculated.

Another, simpler way consists of ignoring the constraints until the end of the simulation and then discarding those draws which violate the restrictions, as sketched below. According to [Gelm04], this works reasonably well as long as the constraints do not eliminate a large portion of the draws.
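A sketch of this discard strategy follows: it draws from the unconstrained multivariate-t posterior (5.55) and keeps only the draws falling in $A$. The constraint test is supplied by the caller; the nonnegativity example in the final comment anticipates the interval-valued application of Chapter 6.

import numpy as np

def constrained_beta_draws(beta1, V1, v1, s1_sq, constraint,
                           n_draws=20_000, seed=0):
    """Sample the unconstrained multivariate-t posterior of beta and keep
    only the draws satisfying the constraint ('ignore, then discard').
    A t_v(mu, Sigma) draw is built as mu + z / sqrt(chi2_v / v)."""
    rng = np.random.default_rng(seed)
    beta1 = np.asarray(beta1, dtype=float)
    p = beta1.size
    z = rng.multivariate_normal(np.zeros(p), s1_sq * np.asarray(V1),
                                size=n_draws)
    chi = rng.chisquare(v1, size=n_draws)
    draws = beta1 + z / np.sqrt(chi / v1)[:, None]
    kept = draws[np.apply_along_axis(constraint, 1, draws)]
    return kept          # average these draws for posterior summaries

# e.g. kept = constrained_beta_draws(b1, V1, v1, s2, lambda b: np.all(b >= 0))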
5.5 Normal Linear Regression Model with Independent Parameters

Now suppose that the parameters $\beta$ and $\sigma^2$ are independent, so

$p(\beta, \sigma^2) = p(\beta)p(\sigma^2)$   (5.57)

With the same likelihood function as in the previous section, this assumption implies that $\beta$ follows a multivariate Normal distribution with mean $\beta_0$, as in the dependent case, but with variance $V_0$, and that $\sigma^2$ has exactly the same Scaled-Inv-$\chi^2$ distribution used previously. That is:

$\beta \sim N(\beta_0, V_0), \qquad \sigma^2 \sim Inv\text{-}\chi^2(v_0, s_0^2)$   (5.58)

The prior joint distribution is

$p(\beta, \sigma^2) \propto \exp\left(-\dfrac{1}{2}(\beta - \beta_0)'V_0^{-1}(\beta - \beta_0)\right)(\sigma^2)^{-(v_0/2+1)}\exp\left(-\dfrac{v_0 s_0^2}{2\sigma^2}\right)$   (5.59)

$\beta, \sigma^2 \sim N\text{-}Inv\text{-}\chi^2(\beta_0, V_0;\ v_0, s_0^2)$   (5.60)

As the posterior joint distribution is proportional to the prior times the likelihood:

$p(\beta, \sigma^2|Y) \propto \exp\left(-\dfrac{1}{2}\left[\dfrac{(Y - X\beta)'(Y - X\beta)}{\sigma^2} + (\beta - \beta_0)'V_0^{-1}(\beta - \beta_0)\right]\right)(\sigma^2)^{-\left(\frac{n+v_0}{2}+1\right)}\exp\left(-\dfrac{v_0 s_0^2}{2\sigma^2}\right)$   (5.61)

Since this function does not take the form of any well-known density, it is useful to find the conditional distributions for $\beta$, $p(\beta|Y, \sigma^2)$, and for $\sigma^2$, $p(\sigma^2|Y, \beta)$, because with them any information from $p(\beta, \sigma^2|Y)$ can be obtained through posterior simulation with the Gibbs sampler already explained in previous chapters. According to [Koop03], those conditional distributions are:

$p(\beta|Y, \sigma^2) \propto \exp\left(-\dfrac{1}{2}(\beta - \beta_1)'V_1^{-1}(\beta - \beta_1)\right)$   (5.62)

$p(\sigma^2|Y, \beta) \propto (\sigma^2)^{-\left(\frac{n+v_0}{2}+1\right)}\exp\left(-\dfrac{1}{2\sigma^2}\left[(Y - X\beta)'(Y - X\beta) + v_0 s_0^2\right]\right)$   (5.63)

And this all yields:

$\beta|y, \sigma^2 \sim N(\beta_1, V_1)$   (5.64)

$\sigma^2|y, \beta \sim Inv\text{-}\chi^2(v_1, s_1^2)$   (5.65)

where

$V_1 = \left(V_0^{-1} + \dfrac{1}{\sigma^2}X'X\right)^{-1}$   (5.66)

$\beta_1 = V_1\left(V_0^{-1}\beta_0 + \dfrac{1}{\sigma^2}X'Y\right)$   (5.67)

$v_1 = n + v_0$   (5.68)

$s_1^2 = \dfrac{(Y - X\beta)'(Y - X\beta) + v_0 s_0^2}{v_1}$   (5.69)
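These two conditionals are exactly what the Gibbs sampler of Chapter 3 needs. The following sketch alternates (5.64) and (5.65); it is an illustrative implementation of the scheme, with the scaled inverse-$\chi^2$ draw obtained as $v_1 s_1^2 / \chi^2_{v_1}$.

import numpy as np

def gibbs_independent_prior(y, X, beta0, V0, v0, s0_sq,
                            n_iter=10_000, burn_in=2_000, seed=0):
    """Gibbs sampler for the Normal linear model with the independent
    N(beta0, V0) x Inv-chi^2(v0, s0^2) prior of Section 5.5."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    V0_inv = np.linalg.inv(V0)
    XtX, Xty = X.T @ X, X.T @ y
    v1 = n + v0                                          # (5.68)
    sigma2 = s0_sq                                       # starting value
    betas, sigma2s = [], []
    for _ in range(n_iter):
        # beta | y, sigma^2 ~ N(beta1, V1), via (5.66)-(5.67)
        V1 = np.linalg.inv(V0_inv + XtX / sigma2)
        beta1 = V1 @ (V0_inv @ beta0 + Xty / sigma2)
        beta = rng.multivariate_normal(beta1, V1)
        # sigma^2 | y, beta ~ Inv-chi^2(v1, s1^2), via (5.69)
        resid = y - X @ beta
        s1_sq = (resid @ resid + v0 * s0_sq) / v1
        sigma2 = v1 * s1_sq / rng.chisquare(v1)
        betas.append(beta); sigma2s.append(sigma2)
    betas = np.array(betas)[burn_in:]
    sigma2s = np.array(sigma2s)[burn_in:]
    return betas, sigma2s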
The fact that the posterior distribution has an unknown form also affects the prediction for $y^*$, $p(y^*|y)$. As was already said about the posterior predictive in the Bayesian approach chapter, the interest lies in $p(y^*|y, \beta, \sigma^2)$. Since $y$ and $y^*$ are independent of one another,

$p(y^*|y, \beta, \sigma^2) = p(y^*|\beta, \sigma^2)$   (5.70)

and hence

$p(y^*|\beta, \sigma^2) = \dfrac{(\sigma^2)^{-T/2}}{(2\pi)^{T/2}}\exp\left(-\dfrac{1}{2\sigma^2}(y^* - X^*\beta)'(y^* - X^*\beta)\right)$   (5.71)

As the analytical solution of the required integral is not trivial, the importance of the Gibbs sampler arises again and, combining it with Monte Carlo integration, any posterior and predictive inference can be done. The strategy consists of obtaining draws $y^{*(s)}$ from $p(y^*|\beta^{(s)}, \sigma^{2(s)})$ using the draws $\beta^{(s)}, \sigma^{2(s)}$ obtained from the posterior distribution. Then, using these draws $y^{*(s)}$ in the Monte Carlo integration, the mean and the variance can be calculated.

5.6 Normal Linear Regression Model with Heteroscedasticity and Correlation

Until now the error variances have been supposed equal and uncorrelated, but this is not very realistic. In this section we are going to relax that assumption and consider the model:

$Y = X\beta + \epsilon$   (5.72)

where

$\epsilon \sim N(0, \sigma^2\Sigma)$   (5.73)

That is, we are considering heteroscedasticity and correlation. According to [Koop03], since $\Sigma$ is a positive definite matrix, a matrix $P$ can be found that satisfies $P\Sigma P' = I$, and it can be shown that

$Y^* = X^*\beta + \epsilon^*$   (5.74)

where

$\epsilon^* \sim N(0, \sigma^2 I)$   (5.75)
and

$Y^* = PY$   (5.76)

$X^* = PX$   (5.77)

$\epsilon^* = P\epsilon$   (5.78)

Then the likelihood function to consider now is:

$p(Y|\beta, \sigma^2, \Sigma) = \dfrac{1}{(2\pi)^{n/2}}\left[\sigma^{-p}\exp\left(-\dfrac{1}{2\sigma^2}(\beta - \hat{\beta}(\Sigma))'X'\Sigma^{-1}X(\beta - \hat{\beta}(\Sigma))\right)\right]\left[(\sigma^2)^{-v/2}\exp\left(\dfrac{-v s^2(\Sigma)}{2\sigma^2}\right)\right]$   (5.79)

where:

$v = n - p$   (5.80)

$\hat{\beta}(\Sigma) = (X^{*\prime}X^*)^{-1}X^{*\prime}Y^*$   (5.81)

$s^2(\Sigma) = \dfrac{(Y^* - X^*\hat{\beta}(\Sigma))'(Y^* - X^*\hat{\beta}(\Sigma))}{v}$   (5.82)

which is very similar to the likelihood used with equal variances. Using the prior distributions described in the previous section, we have:

$p(\beta, \sigma^2, \Sigma) = p(\beta)p(\sigma^2)p(\Sigma)$   (5.83)

where $\beta$ is Normally distributed with prior parameters $\beta_0, V_0$, and $\sigma^2$ is a scaled inverse Chi-square with parameters $v_0$ and $s_0^2$. Hence, knowing that the posterior distribution is proportional to the prior times the likelihood:

$p(\beta, \sigma^2, \Sigma|Y) \propto p(\Sigma)\exp\left(-\dfrac{1}{2}\left[\dfrac{(Y^* - X^*\beta)'(Y^* - X^*\beta)}{\sigma^2} + (\beta - \beta_0)'V_0^{-1}(\beta - \beta_0)\right]\right)(\sigma^2)^{-\left(\frac{n+v_0}{2}+1\right)}\exp\left(-\dfrac{v_0 s_0^2}{2\sigma^2}\right)$   (5.84)
This suggests a Normal distribution for the posterior conditional of $\beta$ and a scaled inverse Chi-square for the posterior conditional of $\sigma^2$, as occurred before. Therefore:

$\beta|Y, \sigma^2, \Sigma \sim N(\beta_1, V_1)$   (5.85)

$\sigma^2|Y, \beta, \Sigma \sim Inv\text{-}\chi^2(v_1, s_1^2)$   (5.86)

where

$V_1 = \left(V_0^{-1} + \dfrac{X'\Sigma^{-1}X}{\sigma^2}\right)^{-1}$   (5.87)

$\beta_1 = V_1\left(V_0^{-1}\beta_0 + \dfrac{X'\Sigma^{-1}X\,\hat{\beta}(\Sigma)}{\sigma^2}\right)$   (5.88)

$v_1 = n + v_0$   (5.89)

$s_1^2 = \dfrac{(Y - X\beta)'\Sigma^{-1}(Y - X\beta) + v_0 s_0^2}{v_1}$   (5.90)

According to [Koop03], the posterior conditional for $\Sigma$ is:

$p(\Sigma|Y, \beta, \sigma^2) \propto p(\Sigma)\,|\Sigma|^{-1/2}\exp\left(-\dfrac{1}{2\sigma^2}(Y - X\beta)'\Sigma^{-1}(Y - X\beta)\right)$   (5.91)

So we have come to the point where the form that $\Sigma$ takes is crucial.

5.6.1 Heteroscedasticity

Let us suppose we suspect that there is no correlation among the errors but that their variances are different. Hence, we will have $n$ variances $\omega_i$ for the $n$ errors $\epsilon_i$. It could be that the researcher has an idea of the form of $\Sigma$ and assumes that

$\omega_i = h(x_i, \alpha) = (1 + \alpha_1 x_{i1} + \cdots + \alpha_p x_{ip})^2$   (5.92)

That is, the variances are related to some or all of the independent variables. The researcher should then choose a prior for $\alpha$, and Bayesian inference can be carried out through a Metropolis-Hastings algorithm such as the random walk.

If the researcher knows that the error variances are different but has no idea of their form, then a prior for $\Sigma$ has to be chosen. According to [Koop03]:
$p(\Sigma) = \prod_{i=1}^{n} p(\omega_i)$   (5.93)

where

$\omega_i \sim Inv\text{-}\chi^2(v_\omega, 1)$   (5.94)

But now a hyper-prior distribution should be fixed for $v_\omega$, such that

$p(\Sigma) = p(\Sigma|v_\omega)p(v_\omega)$   (5.95)

That is, we are using a hierarchical prior to treat the heteroscedasticity. According to [Gelm04], a Metropolis-Hastings algorithm can be used to draw posterior simulations.

5.6.2 Correlation

Now, let us assume that there is some correlation among the errors through a time relationship, such that the error in one period depends on those in previous periods. This is a type of regression called autoregressive, and it can be considered a time series. For example, if we are considering the relation between the Ibex 35 values on one day and the previous ones, we could say that the correlation between errors that exists in the relation between Fridays and the previous days also exists in the relation between the values on Thursdays, Wednesdays or Tuesdays and their previous days. That is:

$\epsilon_t = \rho_1\epsilon_{t-1} + \rho_2\epsilon_{t-2} + \cdots + \rho_p\epsilon_{t-p} + u_t$   (5.96)

where

$u_t \sim N(0, \sigma^2)$   (5.97)

We will assume stationarity. This means, in a general way, that the probability distribution does not vary through time. Some time series do not seem to be stationary, but their differences do. The main difference to take into account is the first one: the first difference of $\epsilon_t$ is $\Delta\epsilon_t$, which indicates the variation of $\epsilon$ between periods $t$ and $t-1$. According to [Koop03], the irregular component $u_t$ can be formulated in the following way:

$\rho(L)\,\epsilon_t = u_t$   (5.98)
where $L$ is the lag operator, which has the property $L\epsilon_t = \epsilon_{t-1}$, and $\rho(L) = (1 - \rho_1 L - \cdots - \rho_p L^p)$. So, if we have the regression model:

$Y_t = X_t\beta + \epsilon_t$   (5.99)

then it is possible to find a model such that

$Y_t^* = X_t^*\beta + u_t, \qquad u_t \sim N(0, \sigma^2)$   (5.100)

where

$Y_t^* = \rho(L)Y_t$   (5.101)

$X_t^* = \rho(L)X_t$   (5.102)

Therefore, using an independent Normal scaled inverse chi-square prior for $\beta$ and $\sigma^2$, this yields:

$\beta|Y, \sigma^2, \rho \sim N(\beta_1, V_1)$   (5.103)

$\sigma^2|Y, \beta, \rho \sim Inv\text{-}\chi^2(v_1, s_1^2)$   (5.104)

where

$V_1 = \left(V_0^{-1} + \dfrac{X^{*\prime}X^*}{\sigma^2}\right)^{-1}$   (5.105)

$\beta_1 = V_1\left(V_0^{-1}\beta_0 + \dfrac{X^{*\prime}Y^*}{\sigma^2}\right)$   (5.106)

$v_1 = v_0 + T - p$   (5.107)

$s_1^2 = \dfrac{(Y^* - X^*\beta)'(Y^* - X^*\beta) + v_0 s_0^2}{v_1}$   (5.108)

And now, as occurred with heteroscedasticity, a prior should be selected for $\rho$. Let us choose a multivariate Normal subject to the constraint $\rho \in \phi$, where $\phi$ is the stationary region. Then,

$p(\rho) \sim N(\rho_0, V_{\rho_0})\,1(\rho \in \phi)$   (5.109)

$p(\rho|Y, \beta, \sigma^2) \sim N(\rho_1, V_{\rho_1})\,1(\rho \in \phi)$   (5.110)
where $\rho_0$ and $V_{\rho_0}$ are the prior parameters that the researcher should establish, and $\rho_1$ and $V_{\rho_1}$ are the posterior parameters, related as follows:

$V_{\rho_1} = \left(V_{\rho_0}^{-1} + \dfrac{E'E}{\sigma^2}\right)^{-1}$   (5.111)

$\rho_1 = V_{\rho_1}\left(V_{\rho_0}^{-1}\rho_0 + \dfrac{E'\epsilon}{\sigma^2}\right)$   (5.112)

where $\epsilon$ is the vector of errors and $E$ is the matrix of its lags, from $t-1$ to $t-p$. According to [Koop03], a Gibbs sampler can be used to draw posterior simulations.

5.7 Models Summary

Since the main models to be used in the subsequent application are the homoscedastic and non-autocorrelated ones, their main ideas are shown in Tables 5.5, 5.6, 5.7 and 5.8.
Case | Prior for $\beta$ | Prior for $\sigma^2$ | Joint Prior Distribution
$p(\beta, \sigma^2) = p(\beta|\sigma^2)p(\sigma^2)$ | $\beta|\sigma^2 \sim N(\beta_0, \sigma^2 V_0)$ | $\sigma^2 \sim Inv\text{-}\chi^2(v_0, s_0^2)$ | $\beta, \sigma^2 \sim N\text{-}Inv\text{-}\chi^2(\beta_0, V_0 s_0^2;\ v_0, s_0^2)$
$p(\beta, \sigma^2) = p(\beta|\sigma^2)p(\sigma^2)$ | $\beta|\sigma^2 \sim N(\beta_0, \sigma^2 V_0)\,1(\beta \in A)$ | $\sigma^2 \sim Inv\text{-}\chi^2(v_0, s_0^2)$ | $\beta, \sigma^2 \sim N\text{-}Inv\text{-}\chi^2(\beta_0, V_0 s_0^2;\ v_0, s_0^2)\,1(\beta \in A)$
$p(\beta, \sigma^2) = p(\beta)p(\sigma^2)$ | $\beta \sim N(\beta_0, V_0)$ | $\sigma^2 \sim Inv\text{-}\chi^2(v_0, s_0^2)$ | $\beta, \sigma^2 \sim N\text{-}Inv\text{-}\chi^2(\beta_0, V_0;\ v_0, s_0^2)$
$p(\beta, \sigma^2) = p(\beta)p(\sigma^2)$ | $\beta \sim N(\beta_0, V_0)\,1(\beta \in A)$ | $\sigma^2 \sim Inv\text{-}\chi^2(v_0, s_0^2)$ | $\beta, \sigma^2 \sim N\text{-}Inv\text{-}\chi^2(\beta_0, V_0;\ v_0, s_0^2)\,1(\beta \in A)$

Table 5.5: Main Prior Distributions Summary
Case | Joint Posterior Distribution | Key
$p(\beta, \sigma^2|y) \propto p(y|\beta, \sigma^2)p(\beta|\sigma^2)p(\sigma^2)$ | $\beta, \sigma^2|y \sim N\text{-}Inv\text{-}\chi^2(\beta_1, V_1 s_1^2;\ v_1, s_1^2)$ | Obtain the marginal distributions, draw directly from them and summarize
$p(\beta, \sigma^2|y) \propto p(y|\beta, \sigma^2)p(\beta|\sigma^2)p(\sigma^2)$ | $\beta, \sigma^2|y \sim N\text{-}Inv\text{-}\chi^2(\beta_1, V_1 s_1^2;\ v_1, s_1^2)\,1(\beta \in A)$ | Obtain the marginal distributions, draw directly from them, discard invalid draws and summarize
$p(\beta, \sigma^2|y) \propto p(y|\beta, \sigma^2)p(\beta)p(\sigma^2)$ | $\propto \exp\left(-\frac{1}{2}\left[\frac{(Y-X\beta)'(Y-X\beta)}{\sigma^2} + (\beta-\beta_0)'V_0^{-1}(\beta-\beta_0)\right]\right)(\sigma^2)^{-(\frac{n+v_0}{2}+1)}\exp\left(-\frac{v_0 s_0^2}{2\sigma^2}\right)$ | Obtain the conditional distributions, draw with the Gibbs sampler and summarize
$p(\beta, \sigma^2|y) \propto p(y|\beta, \sigma^2)p(\beta)p(\sigma^2)\,1(\beta \in A)$ | the previous expression $\times\ 1(\beta \in A)$ | Obtain the conditional distributions, draw with the Gibbs sampler, discard invalid draws and summarize

Table 5.6: Main Posterior Distributions Summary
Case | Prior Parameter | Posterior Parameter | Relation
$p(\beta, \sigma^2|y) \propto p(y|\beta, \sigma^2)p(\beta|\sigma^2)p(\sigma^2)$ | $\beta_0$ | $\beta_1$ | $\beta_1 = V_1(V_0^{-1}\beta_0 + X'X\hat{\beta})$
 | $V_0$ | $V_1$ | $V_1 = (V_0^{-1} + X'X)^{-1}$
 | $v_0$ | $v_1$ | $v_1 = v_0 + n$
 | $s_0^2$ | $s_1^2$ | $s_1^2 = \dfrac{v_0 s_0^2 + vs^2 + (\hat{\beta}-\beta_0)'[V_0 + (X'X)^{-1}]^{-1}(\hat{\beta}-\beta_0)}{v_1}$
$p(\beta, \sigma^2|y) \propto p(y|\beta, \sigma^2)p(\beta)p(\sigma^2)$ | $\beta_0$ | $\beta_1$ | $\beta_1 = V_1(V_0^{-1}\beta_0 + X'Y/\sigma^2)$
 | $V_0$ | $V_1$ | $V_1 = (V_0^{-1} + X'X/\sigma^2)^{-1}$
 | $v_0$ | $v_1$ | $v_1 = v_0 + n$
 | $s_0^2$ | $s_1^2$ | $s_1^2 = \dfrac{v_0 s_0^2 + (Y-X\beta)'(Y-X\beta)}{v_1}$

Table 5.7: Prior and Posterior Parameters Summary
Case | $p(y^*|y, \beta, \sigma^2)$ | Key | Constraint
$p(y^*|y) = \int p(y^*|y, \beta, \sigma^2)p(\beta|\sigma^2, y)p(\sigma^2|y)\,d\beta\,d\sigma^2$ | $N(X^*\beta, \sigma^2 I)$ | Obtain draws $y^*$ from $p(y^*|y, \beta, \sigma^2)$ using the previous draws from the posterior simulation; use Monte Carlo integration to get predictive inferences | No
$p(y^*|y) = \int p(y^*|y, \beta, \sigma^2)p(\beta|\sigma^2, y)p(\sigma^2|y)\,d\beta\,d\sigma^2$ | $N(X^*\beta, \sigma^2 I)$ | As above, discarding the posterior draws that violate the constraint before simulating $y^*$; use Monte Carlo integration | Yes
$p(y^*|y) = \int p(y^*|y, \beta, \sigma^2)p(\beta|y)p(\sigma^2|y)\,d\beta\,d\sigma^2$ | $N(X^*\beta, \sigma^2 I)$ | Obtain draws $y^*$ from $p(y^*|y, \beta, \sigma^2)$ using the previous draws from the posterior simulation; use Monte Carlo integration | No
$p(y^*|y) = \int p(y^*|y, \beta, \sigma^2)p(\beta|y)p(\sigma^2|y)\,d\beta\,d\sigma^2$ | $N(X^*\beta, \sigma^2 I)$ | As above, discarding the posterior draws that violate the constraint before simulating $y^*$; use Monte Carlo integration | Yes

Table 5.8: Main Posterior Predictive Distributions Summary
Chapter 6

Symbolic Data

6.1 What is symbolic data analysis?

Nowadays, more and more data are susceptible to being analyzed and studied. Technological advances let us gather huge quantities of information about a specific variable, but part of that information is lost because standard statistical methods do not have the flexibility to manage such quantities. For example, suppose we are studying the evolution of the stock prices of a company. At the end of each month we would have the different values that the stock has taken daily. It seems reasonable to think that the researcher would take only the daily close prices, or the daily mean prices, and so would not use all the gathered information.

Symbolic data analysis (SDA) deals with this problem and lets us analyse vast amounts of information efficiently, in order to extract the required knowledge and represent it better. Continuing with the same example, symbolic data will let the engineer manage the daily maximum and minimum prices of a month, or manage a histogram of monthly prices and work with it. In this way, SDA complements other widely used statistical tools, such as candlesticks. More information about candlesticks and other interesting tools can be found in [Lee 06] and [Irpi05]. For instance, Figure 6.1 illustrates an interval time series for the daily maximum and minimum Ibex 35 values in January 2006.

So the possibilities with symbolic data are evident. For instance, let us think of an application to warrants. A warrant is a right, without obligation, to buy (a call warrant) or to sell (a put warrant) something at an agreed price (the strike). With a predicted stock price range one could decide on the best put warrant or the most suitable call warrant, and obtain larger profits.
Figure 6.1: Interval time series

Behind the aggregation method used by SDA lies the notion of a symbolic object. This is a mathematical model of a concept (see [Dida95]) which, basically, lets us select some individuals from a group. Going further with SDA, and according to [Bill06a], three main kinds of symbolic data can be considered: multi-valued, interval-valued and modal-valued.

As far as the first is concerned, a multi-valued symbolic random variable $Y$ is one whose possible value takes one or more values from the list of values in its domain $\mathcal{Y}$. The complete list of possible values in $\mathcal{Y}$ is finite, and the values may be well-defined categorical or quantitative values.

For example, consider all the companies which have formed the Ibex 35 index since its beginning. We could define a variable $Y$ = blue chips in the Ibex 35, with 15 observations $w_u$ = year. Thus we have, for instance, that during the first year, 1992 ($w_u = w_1$), Telefónica, Repsol, Endesa, SCH and BBVA were considered the blue chips, while in 2007 ($w_u = w_{15}$) Santander, Telefónica, BBVA, Endesa and Repsol YPF are considered the blue chips.

Likewise, an interval-valued symbolic random variable $Y$ is one that takes values in an interval.
$w_u$ | Year | $Z$ = Blue chips in Ibex 35
$w_1$ | 1992 | {Telefónica, Repsol, Endesa, SCH, BBVA}
... | ... | ...
$w_{15}$ | 2007 | {Telefónica, Repsol YPF, Endesa, BBVA, Santander}

Table 6.1: Multi-valued Data Example

That interval can be closed or open at either end. This type is very important in SDA; furthermore, it can capture both the central tendency and the dispersion of a dataset. Let us recall the example of the daily stock prices of a company over a month: this information can be recorded as the daily maximum and minimum values during the month. As this is one of the most interesting types of symbolic data for our purpose, we will take it up again below.

Finally, let a random variable $Y$ take possible values $\{\eta_k : k = 1, 2, \ldots\}$ over a domain $\mathcal{Y}$. Then a modal-valued outcome is one formed by the values $\eta_k$ together with associated measures $\pi_k$. The measure is usually a weight, probability, relative frequency, or the like, but it can also be a capacity, necessity, possibility, credibility or some related entity.

A modal multi-valued variable can now be defined: it is a variable whose observed outcome takes values that are a subset of the domain, each with its respective measure. For example, we could define a variable $Z$ = Importance of the companies in the Ibex 35 index. Thus, for instance, the most important company in 1992 was Telefónica, while Santander is currently the company with the highest weight in the index in 2007.

Another example: suppose we define a variable $Y$ = Maximum daily stock price for enterprises in the Spanish Continuous Stock Market. For the company Endesa we could have:
$w_u$ | Year | $Z$ = Importance of an Ibex 35 company
$w_1$ | 1992 | {Telefónica, 13.7; Repsol, 9.7; Endesa, 9.2; SCH, 8.0; BBVA, 7.2; Iberdrola, 6.9; Santander, 5.9; Banco Popular, 3.8; Banesto, 3.6; Banco Exterior, 3.0; Cepsa, 2.5; Tabacalera, 2.4; Acesa, 2.1; Unión FENOSA, 2.0; Gas Natural, 1.9; Sevillana de Electric., 1.8; Fuerzas E. Cataluña, 1.7; Bankinter, 1.6; Dragados, 1.4; Aguas de Barcelona, 1.3; Mapfre, 1.3; Asland, 1.2; FCC, 1.1; Portland Valderribas, 1.0; Hidrocantábrico, 0.8; Vallehermoso, 0.8; Metrovacesa, 0.8; Acerinox, 0.7; Viscofán, 0.6; Cubiertas y MZOV, 0.5; Sarrió, 0.4; Uralita, 0.4; Huarte, 0.3; Urbis, 0.3; Agromán, 0.2}
... | ... | ...
$w_{15}$ | 2007 | {Telefónica, 16.0; Repsol YPF, 5.9; Endesa, 7.5; BBVA, 13.0; Iberdrola, 5.6; Santander, 17.2; Banco Popular, 3.4; Banesto, 0.5; Unión FENOSA, 1.8; Gas Natural, 1.5; Bankinter, 0.9; Cor. Mapfre, 0.7; FCC, 1.2; Sacyr Vallehermoso, 1.0; Metrovacesa, 0.5; Acerinox, 1.0; Inditex, 3.0; ACS Const., 2.9; B. Sabadell, 2.1; Altadis, 2.0; Abertis A, 2.0; G. Ferrovial, 1.6; Acciona, 1.4; FCC, 1.2; Gamesa, 1.0; Enagás, 0.8; REE, 0.8; Cintra, 0.7; Agbar, 0.7; Telecinco, 0.6; Iberia, 0.5; Indra A, 0.5; Fadesa, 0.5; Sogecable, 0.4; Antena 3 TV, 0.4; NH Hoteles, 0.4}

Table 6.2: Modal Multi-valued Example

$Y(\text{Endesa}) = \{38.7, 0.125;\ 38.75, 0.125;\ 38.8, 0.250;\ 38.85, 0.250;\ 38.9, 0.125;\ 39, 0.125\}$

This means that we assign a probability of 0.125 to the possibility that Endesa's maximum daily price is 38.7, a probability of 0.125 to the possibility that it is 38.75, a probability of
0.25 to the possibility that it is 38.8, and so on.

Another very interesting variant of this type is the modal interval-valued variable. That is, instead of a value with a probability, the variable can take any value in an interval with a probability. Continuing with the previous example:

$Y(\text{Endesa}) = \{[38.7, 38.75), 0.125;\ [38.75, 38.85), 0.125;\ [38.8, 38.9), 0.25\}$

For more information and other types of data, the reader is referred to [Bill06a], [Huiw06] and [Arro06].

6.2 Interval-valued variables

As has already been mentioned, summarizing a dataset is one of the three possible sources or reasons from which interval data may result. According to [Huiw06], the other two sources are the imprecision of measurement and expert knowledge including uncertainty.

Now, suppose $E$ is a set of $m$ symbolic objects $u$ with observations $Y(u)$, $u = 1, \ldots, m$. Suppose we are interested in a particular random variable $Y_j \equiv Z$, and that the realization of $Z$ for the observation $w_u$ is the interval $Z(w_u) = [a_u, b_u] = \xi_u$. Then, according to [Bill06a], the empirical density function of $Z$ is

$f(\xi) = \dfrac{1}{m}\sum_{u \in E}\dfrac{I_u(\xi)}{\|Z(u)\|}, \qquad \xi \in \mathbb{R}$   (6.1)

where $I_u(\xi)$ indicates whether or not $\xi$ is in the interval $Z(u)$, and $\|Z(u)\|$ is the length of that interval. Likewise, it can be shown that the symbolic empirical mean is given by

$\bar{Z} = \dfrac{1}{2m}\sum_{u \in E}(b_u + a_u)$   (6.2)

and the symbolic empirical variance by

$S^2 = \dfrac{1}{3m}\sum_{u \in E}\left(b_u^2 + b_u a_u + a_u^2\right) - \dfrac{1}{4m^2}\left[\sum_{u \in E}(b_u + a_u)\right]^2.$   (6.3)
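A direct implementation of (6.2) and (6.3) may help to clarify the formulas; the midpoint-only variance (6.4), discussed next, is included for comparison.

import numpy as np

def symbolic_mean_var(a, b):
    """Empirical symbolic mean (6.2) and variance (6.3) of interval data
    [a_u, b_u], u = 1..m, under uniformity within each interval."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    m = a.size
    mean = np.sum(b + a) / (2 * m)
    var = (np.sum(b**2 + b * a + a**2) / (3 * m)
           - np.sum(b + a) ** 2 / (4 * m**2))
    return mean, var

def midpoint_var(a, b):
    """Variance of the interval midpoints only, formula (6.4); it ignores
    the internal variation of the intervals and is therefore smaller."""
    mid = (np.asarray(a, float) + np.asarray(b, float)) / 2.0
    return np.mean(mid**2) - np.mean(mid) ** 2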
Formulas (6.2) and (6.3) are coherent with the hypothesis of uniformity within the intervals. While the symbolic mean can be understood intuitively as a centre of gravity, the symbolic variance is not so easy to grasp. In fact, it might seem more reasonable to formulate the variance as

$S^2 = \dfrac{1}{4m}\sum_{u \in E}(b_u + a_u)^2 - \dfrac{1}{4m^2}\left[\sum_{u \in E}(b_u + a_u)\right]^2,$   (6.4)

that is, the variance of the midpoints. But this last formulation does not take into account the internal variation of the intervals, while the former does, and hence the former is larger.

For example, let us consider the maximum and minimum points of the Ibex 35 during December 2006. According to what was said above, the mean point in that month was

$\bar{Z} = \dfrac{1}{2m}\sum_{u \in E}(\text{high}_u + \text{low}_u) = 14116.38$

and the empirical symbolic variance is

$S^2 = \dfrac{1}{3m}\sum_{u \in E}\left(b_u^2 + b_u a_u + a_u^2\right) - \dfrac{1}{4m^2}\left[\sum_{u \in E}(b_u + a_u)\right]^2 = 28006.$

If we had calculated the variance taking only the midpoints, the result would have been

$S^2 = \dfrac{1}{4m}\sum_{u \in E}(b_u + a_u)^2 - \dfrac{1}{4m^2}\left[\sum_{u \in E}(b_u + a_u)\right]^2 = 26023,$

which is lower than the value obtained previously because it does not take into consideration the internal variation of the intervals.

Although it may seem that everything related to interval-valued data is an advantage, according to [Huiw06] there are two major limitations when applying multivariate analysis to an interval dataset. The first is that the computing work is hard, and the second is that the hyperrectangle may enlarge the range of the original dataset and reduce analysis accuracy.

The usual methodology for applying interval data to multivariate analysis involves transforming the symbolic data matrix into a numerical matrix, that is, reducing $p$-dimensional observations to $s$-dimensional
components (where usually $s \ll p$). This is called Principal Component Analysis. There are two main methods that carry this out: the Vertices Method and the Centres Method. The former consists of building, from each hyperrectangle in $p$-dimensional space, a matrix with $2^p$ rows and $p$ columns, where each row contains the coordinates of one vertex of the hyperrectangle in $\mathbb{R}^p$. The latter works with the average value of every variable for each category of data. A more extended review of these two methods can be found in [Bill06a]. [Huiw06] point out some limitations of these methods and propose a new type of symbolic data: factor interval data. Since symbolic data is a wide field, the reader is referred to all the above citations.

6.3 Classical regression analysis with interval-valued data

Regarding classical multiple regression, there are three current approaches to be considered, though one of them is just a regression fit. Let us begin with the most intuitive and finish with the most conceptual.

Since we now have intervals instead of single values, it is natural to take midpoints and proceed as with multiple classical regression; that is, to use the result to make new predictions from a new interval, applying it to each extreme of the interval. Moreover, [DeCa05] remark on the need to establish the constraint $\beta_i \geq 0$ to ensure that the lower extreme of the predicted interval is lower than the upper extreme, and suggest the algorithm presented by [Laws74] to handle this constraint. We suggest the alternative of getting enough draws from the posterior distribution of $\beta$, discarding those which are negative, and averaging.

Let us recall the same example shown for classical multiple regression, but taking now the maximum and minimum values that the Ibex 35, Dow Jones, FTSE 100 and DAX took in the first ten months of 2006. Taking the midpoints of those intervals and proceeding exactly as in classical multiple regression, we obtain:

$IBEX35_t = 1.0102\,DowJones_{t-1} - 2.0144\,FTSE100_{t-1} + 2.1229\,DAX_{t-1} + \epsilon_t$

where $\epsilon_t \sim N(0, 332.71^2)$.
We could use this model to predict a new observation for November 1st, applying it to each extreme of the intervals:

$\max(IBEX35_t) = 1.0102 \times 12161 - 2.0144 \times 6149.9 + 2.1229 \times 6289.7 = 13242.41$

$\min(IBEX35_t) = 1.0102 \times 11986.84 - 2.0144 \times 6110.9 + 2.1229 \times 6237.55 = 13040.7$

So the prediction would be the interval $[13040.7, 13242.41]$.

A disadvantage of this approach is that it does not take into account the interval length. To solve that problem, [DeCa05] and [DeCa04] suggest fitting another regression for the interval range. They refer to this approach as the constrained centre and range method (CCRM); in that case the constraint is applied to the interval range regression instead of to the centres regression. We will employ the radii instead of the ranges. So, continuing with the previous example, we would use the radii of the different indexes to build the next model:

$Radius\,IBEX35_t = 0.35\,Radius\,DowJones_{t-1} + 0.484\,Radius\,FTSE100_{t-1} + 0.272\,Radius\,DAX_{t-1} + \epsilon_t$

where $\epsilon_t \sim N(0, 26.31^2)$.

With this new approach, the prediction can be calculated from the midpoint and the radius of the interval:

$Midpoint\,IBEX35_t = 1.0102 \times 12073.65 - 2.0144 \times 6130.4 + 2.1229 \times 6262.125 = 13141.3$

$Radius\,IBEX35_t = 0.35 \times 86.81 + 0.484 \times 19.5 + 0.272 \times 24.575 = 46.53$

Now the prediction would be $[13094.75, 13187.81]$. A sketch of this centre-and-radius scheme is given below.
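The following sketch assembles the centre-and-radius steps of this example. It uses plain least squares on midpoints and radii and simply clips negative radius coefficients, a crude stand-in for the constrained algorithm of [Laws74]; it illustrates the mechanics rather than reproducing [DeCa05] exactly.

import numpy as np

def fit_centre_radius(y_lo, y_hi, X_lo, X_hi):
    """Centre-and-radius style fit: one regression for interval midpoints
    and one for radii, with a crude nonnegativity constraint on the
    radius coefficients so predicted intervals stay well ordered."""
    yc, yr = (y_hi + y_lo) / 2.0, (y_hi - y_lo) / 2.0
    Xc, Xr = (X_hi + X_lo) / 2.0, (X_hi - X_lo) / 2.0
    beta_c = np.linalg.lstsq(Xc, yc, rcond=None)[0]
    beta_r = np.linalg.lstsq(Xr, yr, rcond=None)[0]
    beta_r = np.clip(beta_r, 0.0, None)   # crude stand-in for [Laws74]
    return beta_c, beta_r

def predict_interval(x_lo, x_hi, beta_c, beta_r):
    """Predicted interval [c - r, c + r] for a new interval-valued row."""
    c = (x_hi + x_lo) / 2.0 @ beta_c      # predicted midpoint
    r = (x_hi - x_lo) / 2.0 @ beta_r      # predicted radius
    return c - r, c + r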
For this new approach another way of estimating is needed. Recall the classical univariate multiple regression model:

Y = \beta_0 + X_1\beta_1 + \cdots + X_p\beta_p + \epsilon   (6.5)

where \epsilon \sim N(0, \sigma^2)   (6.6)

Taking mean values, we have:

\bar{Y} = \beta_0 + \bar{X}_1\beta_1 + \cdots + \bar{X}_p\beta_p + \bar{\epsilon}   (6.7)

from which, using the least squares condition on \beta_0, it can easily be deduced that:

\frac{\partial}{\partial\beta_0}\sum_i \epsilon_i^2 = 0 \;\Rightarrow\; \bar{\epsilon} = 0   (6.8)

This means that the mean error is zero whenever there is a constant term in the model, a very important point for what follows. We can then obtain an equivalent model:

Y - \bar{Y} = (X_1 - \bar{X}_1)\beta_1 + \cdots + (X_p - \bar{X}_p)\beta_p + \epsilon   (6.9)

where Y - \bar{Y} is the new dependent variable and X - \bar{X} is the new matrix of independent variables. \beta can then be estimated as:

\hat{\beta} = S_{XX}^{-1} S_{XY}   (6.10)

where

S_{XX} = \begin{bmatrix} var(X_1) & cov(X_1, X_2) & \cdots & cov(X_1, X_p) \\ cov(X_1, X_2) & var(X_2) & \cdots & cov(X_2, X_p) \\ \vdots & \vdots & \ddots & \vdots \\ cov(X_1, X_p) & cov(X_2, X_p) & \cdots & var(X_p) \end{bmatrix}

and

S_{XY} = \begin{bmatrix} cov(X_1, Y) \\ cov(X_2, Y) \\ \vdots \\ cov(X_p, Y) \end{bmatrix}

where the independent term is not taken into account (so there is no column of ones in the matrix X). The independent term \beta_0 is estimated as:

\hat{\beta}_0 = \bar{Y} - \sum_{j=1}^{p} \hat{\beta}_j \bar{X}_j   (6.11)

In this way, the symbolic variance, the symbolic covariance and the symbolic mean for interval-valued variables can be used to estimate \beta. This approach has the limitation that, in order to employ the symbolic statistics and this way of estimating, it is necessary to include the independent term in the regression model. In fact, the most important point is that this last approach, suggested by [Bill06a], is just a regression fit, since no residual term is defined for symbolic data.
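With ordinary numeric data the estimator (6.10)–(6.11) is simply least squares rewritten in terms of variances and covariances, as the following R sketch verifies; for interval-valued data the symbolic variances and covariances of [Bill06a] would be plugged into the same two formulas:

```r
## Covariance-based estimator (6.10)-(6.11): solve S_XX beta = S_XY on the
## centred data, then recover the intercept from the means.
cov_regression <- function(X, y) {
  Sxx <- cov(X)                                # var/cov matrix of regressors
  Sxy <- cov(X, y)                             # vector of cov(X_j, Y)
  beta <- drop(solve(Sxx, Sxy))                # (6.10)
  beta0 <- mean(y) - sum(beta * colMeans(X))   # (6.11)
  c(intercept = beta0, beta)
}

## Check against lm() on arbitrary data: the two fits coincide exactly.
set.seed(1)
X <- matrix(rnorm(300), ncol = 3)
y <- drop(2 + X %*% c(1, -0.5, 0.3) + rnorm(100))
cov_regression(X, y)
coef(lm(y ~ X))
```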
6.4 Bayesian regression analysis with Interval-valued data

Once we know how interval-valued data can be employed in classical regression, let us see how they can be included in the Bayesian approach. For this purpose we will employ the CCRM proposed by [DeCa05]. According to what has been said above and in the chapter on Bayesian regression, there is nothing new to be done: the problem is reduced to two Bayesian regressions, one for the centres and another for the radii, with the constraint applied to the coefficients of the latter. As we saw in the Bayesian regression chapter, the constraint is much easier to incorporate into the Bayesian approach than into the classical one. So, by introducing the Bayesian approach into regression with symbolic data, the engineer will be able to incorporate more information into the problem than he could with Bayesian regression and traditional data. This is because two regressions are now being made, and the expert can state whether the centres will increase or decrease, and likewise for the radii.
In this sense, an opinion like "I think that the Dow Jones will have less influence on the Ibex 35, that the DAX will have more relevance than it has had until now, and that there will be more volatility" would mean, for instance, that the prior mean for the Dow Jones midpoint coefficient is set below the value indicated by the data. Conversely, the prior mean for the DAX midpoint coefficient would be set above it, and similarly for the prior means of the radii.
Chapter 7

Results

To show the usefulness of the Bayesian Centre and Radius approach proposed in this project, experiments fitting a linear regression model to real interval-valued data sets from the Spanish Continuous Stock Market are considered in this section.

7.1 Spanish Continuous Stock Market data sets

We have considered two situations in the Spanish Continuous Stock Market. On the one hand, we have used the monthly minimum and maximum prices of BBVA and BSCH from January 2000 to June 2007 in order to show how the classical regression approach applied to interval-valued data can be improved through the Bayesian Centre and Radius approach when the variables are directly related. This will also let us see other advantages of the proposed approach over classical regression with single values. On the other hand, we have taken the daily minimum and maximum prices of two other Spanish Continuous Stock Market companies, Dogi and Zardoya, from January 2006 to December 2006, in order to show that the Bayesian Centre and Radius approach outperforms the other approaches even when the variables are not related, that is, when they are uncorrelated.

7.2 Direct Relation between Variables

In this case 66 of the total 89 months are assigned to the training set. The other 23 months form the testing set.
Let us begin with the classical regression approach applied to the midpoints of the monthly minimum and maximum prices of BBVA and BSCH in the considered training period. These data yield the following model:

BSCH_Midpoint = 1.3008 + 0.6229 × BBVA_Midpoint + \epsilon, where \epsilon \sim N(0, 0.5237^2)

Figures 7.1 and 7.2 show that this model fits well enough for both the training and testing sets.

[Figure 7.1: Classical regression with single values in the training set (midpoints of BBVA prices).]

If we calculate the different error measures (Mean Error, Mean Absolute Error, Mean Squared Error and Root Mean Squared Error) for each set, we obtain Table 7.1. This means that it is a good model, but we are only using the midpoints to fit new data when we have much more data available; therefore we are wasting information we have gathered. This can be seen graphically in Figure 7.3, which suggests that the model is not as good as previously believed, since there is too much available information for such a simple result, and one could expect more from those data. Thus, another approach, known as the Centre Method, could be considered: the fitted model is applied to each maximum and minimum price to obtain predicted maximum and minimum prices. This provides the results displayed in Figures 7.4 and 7.5. According to [Bill00], the total deviation is given by:

\epsilon_{CentreMethod2000} = \epsilon_{lower} + \epsilon_{upper}   (7.1)
[Figure 7.2: Classical regression with single values in the testing set (midpoints of BBVA prices).]

Set      | ME     | MAE    | MSE    | RMSE
Training | 0      | 0.4208 | 0.2660 | 0.5157
Testing  | 0.2321 | 0.3831 | 0.2446 | 0.4946
Table 7.1: Error measures for classical regression with single values

The resulting error measures can be seen in Table 7.2. Now we have a fitted interval for each observed interval, so this approach seems to take advantage of the extracted data. Let us now see the resulting error measures according to the Centre Method proposed by [Bill02], where the sum of squared errors is given by:

SSE_{CentreMethod2002} = \sum_{i=1}^{n} \left( \epsilon_{lower}^2 + \epsilon_{upper}^2 \right)   (7.2)

and, thus, the mean absolute error is given by:

MAE_{CentreMethod2002} = \frac{\sum_{i=1}^{n} \left( |\epsilon_{lower}| + |\epsilon_{upper}| \right)}{n}   (7.3)
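As a minimal sketch, the error measures (7.2)–(7.3) can be computed as follows, assuming hypothetical vectors of observed and fitted interval bounds; note that (7.3) divides by the number of intervals n, not by the 2n pooled residuals:

```r
## Centre Method errors per (7.2)-(7.3), from interval bounds.
centre_method_errors <- function(obs.low, obs.up, fit.low, fit.up) {
  e.low <- obs.low - fit.low
  e.up  <- obs.up  - fit.up
  n <- length(e.low)
  c(SSE = sum(e.low^2 + e.up^2),            # (7.2)
    MAE = sum(abs(e.low) + abs(e.up)) / n)  # (7.3)
}

## Hypothetical intervals:
centre_method_errors(obs.low = c(10.1, 10.8), obs.up = c(11.2, 11.9),
                     fit.low = c(10.3, 10.6), fit.up = c(11.0, 12.1))
```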
[Figure 7.3: Classical regression with interval-valued data (minimum and maximum BBVA prices).]

[Figure 7.4: Centre Method (2000) in the training set (minimum and maximum BBVA prices).]

Table 7.3 shows that this new definition of the error does not improve much on the previous one. However, let us compare these last approaches with the Centre and Radius Method. In this case we have the following model:

BSCH_Midpoint = 1.3008 + 0.6229 × BBVA_Midpoint + \epsilon_{Midpoint}
where \epsilon_{Midpoint} \sim N(0, 0.5237^2), and

BSCH_Radius = 0.106 + 0.6188 × BBVA_Radius + \epsilon_{Radius}

where \epsilon_{Radius} \sim N(0, 0.1458^2).

[Figure 7.5: Centre Method (2000) in the testing set (minimum and maximum BBVA prices).]

Set      | ME     | MAE    | MSE    | RMSE
Training | 0      | 0.8416 | 1.0638 | 1.0314
Testing  | 0.4643 | 0.7663 | 0.9784 | 0.9891
Table 7.2: Error measures for Centre Method (2000)
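Operationally, the Centre and Radius Method amounts to two ordinary least squares fits, one on the interval centres and one on the radii; a minimal R sketch with hypothetical monthly price bounds:

```r
## Two-fit view of the Centre and Radius Method (hypothetical BBVA/BSCH data).
centres <- function(lo, hi) (lo + hi) / 2
radii   <- function(lo, hi) (hi - lo) / 2

set.seed(2)                       # simulated monthly min/max prices
bbva.lo <- rnorm(66, 11, 2); bbva.hi <- bbva.lo + runif(66, 0.5, 2)
bsch.lo <- 1.3 + 0.62 * bbva.lo + rnorm(66, 0, 0.3)
bsch.hi <- bsch.lo + 0.6 * (bbva.hi - bbva.lo) + runif(66, 0.1, 0.4)

fit.mid <- lm(centres(bsch.lo, bsch.hi) ~ centres(bbva.lo, bbva.hi))
fit.rad <- lm(radii(bsch.lo, bsch.hi)   ~ radii(bbva.lo, bbva.hi))
coef(fit.mid); coef(fit.rad)      # centre and radius models, respectively
```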
Set      | ME     | MAE    | MSE    | RMSE
Training | 0      | 0.8917 | 0.5922 | 0.7695
Testing  | 0.4643 | 0.7717 | 0.5125 | 0.7159
Table 7.3: Error measures for Centre Method (2002)

According to [DeCa07], the sum of squares of deviations is given by:

SSE_{CentreRadiusMethod} = \sum_{i=1}^{n} \left( \epsilon_{Midpoint}^2 + \epsilon_{Radius}^2 \right)   (7.4)

Therefore, the mean absolute error is given by:

MAE = \frac{\sum_{i=1}^{n} \left( |\epsilon_{Midpoint}| + |\epsilon_{Radius}| \right)}{n}   (7.5)

[Figure 7.6: Centre and Radius Method in the training set (minimum and maximum BBVA prices).]

The results shown in Figures 7.6 and 7.7 and in Table 7.4 clearly show that the error measures are lower with the Centre and Radius Method than with the Centre Method; thus, the former is better than the latter.
[Figure 7.7: Centre and Radius Method in the testing set (minimum and maximum BBVA prices).]

Set      | ME     | MAE    | MSE    | RMSE
Training | 0      | 0.5233 | 0.2866 | 0.5353
Testing  | 0.1837 | 0.4712 | 0.2558 | 0.5058
Table 7.4: Error measures for Centre and Radius Method

Now, let us take into consideration an expert's knowledge about the Spanish Continuous Stock Market and see the results of the Bayesian Centre and Radius Method. Obviously, the Bayesian methodology is mainly useful in the testing set, since that is where the unobserved data are. Bearing in mind the previous Centre and Radius model, an expert could think that BSCH will improve slightly with respect to BBVA and assign the prior distribution given in (5.36) with the following prior parameters for the Midpoints:
\beta_0 = (1.3008, 0.64)^T, \quad V_0 = 10^{-9} I, \quad s_0^2 = 0.5237^2, \quad v_0 = 10^7

Then the final Midpoints model would be:

BSCH_Midpoint = 1.3008 + 0.64 × BBVA_Midpoint + \epsilon_{Midpoint}

Let us assume that the expert considers that the volatility will not vary, and assigns vague prior parameters to the Radius distribution:

\beta_0 = (0.106, 0.6188)^T, \quad V_0 = 10^{6} I, \quad s_0^2 = 0.1458^2, \quad v_0 = 4

Then the final Radius model would be:

BSCH_Radius = 0.106 + 0.6188 × BBVA_Radius + \epsilon_{Radius}
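A sketch of how these two Bayesian fits could be run with bayesm's runireg follows; its conjugate prior takes the precision A = V_0^{-1} rather than V_0 itself, and the data here are simulated stand-ins for the BBVA/BSCH series:

```r
## Bayesian Centre and Radius fits: near-dogmatic prior on the Midpoint
## coefficients, vague prior on the Radius coefficients.
library(bayesm)

set.seed(1)                                   # hypothetical training data
X.mid <- rnorm(66, 12, 2);        y.mid <- 1.3 + 0.62 * X.mid + rnorm(66, 0, 0.52)
X.rad <- abs(rnorm(66, 0.5, 0.2)); y.rad <- 0.1 + 0.62 * X.rad + rnorm(66, 0, 0.15)

fit_bayes <- function(y, x, betabar, V0, ssq, nu, R = 10000) {
  runireg(Data  = list(y = y, X = cbind(1, x)),
          Prior = list(betabar = betabar,
                       A  = diag(1 / V0, length(betabar)),  # prior precision
                       nu = nu, ssq = ssq),
          Mcmc  = list(R = R))
}

mid <- fit_bayes(y.mid, X.mid, betabar = c(1.3008, 0.64),
                 V0 = 1e-9, ssq = 0.5237^2, nu = 1e7)
rad <- fit_bayes(y.rad, X.rad, betabar = c(0.106, 0.6188),
                 V0 = 1e6,  ssq = 0.1458^2, nu = 4)

colMeans(mid$betadraw)   # ~ (1.3008, 0.64): the data barely move this prior
colMeans(rad$betadraw)   # close to the least squares fit, as the prior is vague
```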
The results for the testing set are shown in Figure 7.8 and in Table 7.5.

[Figure 7.8: Bayesian Centre and Radius Method in the testing set (minimum and maximum BBVA prices).]

Set     | ME     | MAE    | MSE    | RMSE
Testing | 0.0126 | 0.4409 | 0.1997 | 0.4469
Table 7.5: Error measures for Bayesian Centre and Radius Method

This shows that the proposed Bayesian Centre and Radius Method improves on all the previous approaches, since it lets us handle more information than classical regression and obtains better results than the Centre and the Centre and Radius methods.

7.3 Uncorrelated Variables

In this other case, 170 of the total 255 days are assigned to the training set. The remaining days form the testing set. The classical regression with the midpoints of the price ranges yields the following model:

Dogi_Midpoint = 5.6570 − 0.0806 × Zardoya_Midpoint + \epsilon, where \epsilon \sim N(0, 0.2882^2)

Figures 7.9 and 7.10 show that this model does not fit well for either the training or the testing set. If we calculate the different error measures (Mean Error, Mean Absolute Error, Mean Squared Error and Root Mean Squared Error) for each set, we obtain Table 7.6.
[Figure 7.9: Classical regression with single values in the training set (midpoints of Zardoya prices).]

[Figure 7.10: Classical regression with single values in the testing set (midpoints of Zardoya prices).]

The Centre Method could be applied to obtain predicted maximum and minimum prices. This method yields the following model:

Dogi_Midpoint = 5.6570 + 0.0792 × Zardoya_Midpoint + \epsilon, where \epsilon \sim N(0, 7.2137^2)
Set      | ME      | MAE    | MSE    | RMSE
Training | 0       | 0.4231 | 0.2268 | 0.4763
Testing  | −0.3518 | 0.3651 | 0.1642 | 0.4052
Table 7.6: Error measures for classical regression with single values

Note that the slope has changed sign since, according to [DeCa04], it cannot be negative, in order to ensure that the fitted maximum is greater than the fitted minimum. This provides the results shown in Figures 7.11 and 7.12.

[Figure 7.11: Centre Method (2000) in the training set (minimum and maximum Zardoya prices vs minimum and maximum Dogi prices).]

Table 7.7 shows the resulting error measures.
[Figure 7.12: Centre Method (2000) in the testing set (minimum and maximum Zardoya prices).]

Set      | ME      | MAE    | MSE     | RMSE
Training | −7.1288 | 7.1288 | 51.8315 | 7.1994
Testing  | −8.0653 | 8.0653 | 65.1544 | 8.0718
Table 7.7: Error measures for Centre Method (2000)

It is very clear that this model is not accurate. This example evidences the main weak point of this approach: the positive constraint imposed on the coefficients makes an inverse relationship between the variables impossible, which is reflected in the very high error measures; the sketch below reproduces this behaviour.
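The constrained fit underlying this approach is a nonnegative least squares problem in the sense of [Laws74]; the following R sketch, using the nnls package (which implements the Lawson-Hanson algorithm) on simulated inversely related data, shows the constrained slope collapsing towards zero while the unconstrained fit recovers the inverse relationship:

```r
## Nonnegative least squares [Laws74] vs ordinary least squares on
## hypothetical, inversely related Zardoya/Dogi midpoints.
library(nnls)

set.seed(1)
zar  <- rnorm(100, 23, 1)                         # Zardoya midpoints
dogi <- 5.66 - 0.08 * zar + rnorm(100, 0, 0.29)   # inverse relationship

A <- cbind(1, zar)        # design matrix with intercept
nnls(A, dogi)$x           # constrained coefficients: slope forced >= 0
coef(lm(dogi ~ zar))      # unconstrained fit: negative slope recovered
```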
Now let us see the resulting error measures according to the Centre Method proposed by [Bill02], shown in Table 7.8:

Set      | ME      | MAE    | MSE     | RMSE
Training | −7.1288 | 7.1288 | 25.9183 | 5.0910
Testing  | −8.0653 | 8.0653 | 32.5825 | 5.7081
Table 7.8: Error measures for Centre Method (2002)

This new definition of the error improves on the previous one. However, let us compare these last approaches with the Centre and Radius Method. In this case we have the following model:

Dogi_Midpoint = 5.6570 − 0.0806 × Zardoya_Midpoint + \epsilon_{Midpoint}, where \epsilon_{Midpoint} \sim N(0, 0.2882^2)

and

Dogi_Radius = 0.0283 + 0.08 × Zardoya_Radius + \epsilon_{Radius}, where \epsilon_{Radius} \sim N(0, 0.0259^2)

The results can be seen in Figures 7.13 and 7.14 and in Table 7.9.

Set      | ME      | MAE    | MSE    | RMSE
Training | 0       | 0.4385 | 0.2273 | 0.4768
Testing  | −0.3426 | 0.3882 | 0.1655 | 0.4068
Table 7.9: Error measures for Centre and Radius Method

As occurred with a direct relationship between the variables, the error measures are again lower with the Centre and Radius Method than with the Centre Method; thus, the former is better than the latter even when there is no clear relationship.
[Figure 7.13: Centre and Radius Method in the training set (minimum and maximum Zardoya prices).]

[Figure 7.14: Centre and Radius Method in the testing set (minimum and maximum Zardoya prices).]

Now, let us see what happens when the Bayesian methodology is introduced. Bearing in mind the previous Centre and Radius model, an expert could think that the situation will change drastically and assign the following prior parameters to the prior distribution given in (5.36) for the Midpoints:
\beta_0 = (3.1, 0.02)^T, \quad V_0 = 10^{-8} I, \quad s_0^2 = 0.2882^2, \quad v_0 = 10^6

So the final Midpoint model would be:

Dogi_Midpoint = 3.1 + 0.02 × Zardoya_Midpoint + \epsilon_{Midpoint}

And the following prior parameters to the prior distribution for the Radii:

\beta_0 = (0.0283, 0.08)^T, \quad V_0 = 10^{6} I, \quad s_0^2 = 0.0259^2, \quad v_0 = 4

So the final Radius model would be:

Dogi_Radius = 0.0283 + 0.08 × Zardoya_Radius + \epsilon_{Radius}

The results for the testing set are shown in Figure 7.15 and in Table 7.10.

Set     | ME     | MAE    | MSE    | RMSE
Testing | 0.1031 | 0.2008 | 0.0443 | 0.2104
Table 7.10: Error measures for Bayesian Centre and Radius Method
[Figure 7.15: Bayesian Centre and Radius Method in the testing set (minimum and maximum Zardoya prices).]

The Bayesian Centre and Radius Method again performs better than the rest of the approaches, even under unfavourable conditions. Therefore, we can conclude that the Bayesian Centre and Radius Method has the same advantages as the Centre and Radius Method, as described by [DeCa07], plus the great advantages of the Bayesian methodology. All this leads to smaller errors in new predictions. An important future development could be to build a Bayesian symbolic regression model with uniformly distributed errors.
Chapter 8

A Guide to Statistical Software Today

8.1 Introduction

Statistical software is beginning to blend, in one direction, with relational database software such as Oracle or Sybase (software we do not discuss here) and, in the other direction, with mathematical software such as MATLAB. Mathematical software exhibits not only statistical capabilities flowing from code for matrix manipulation, but also optimization and symbolic manipulation useful for statistical purposes. This chapter is an assessment of the state of the art of the statistical software arena as of 2007. It touches upon a few commercial packages, a few general public license packages, a few analysis packages with statistical add-ons, and a few general purpose languages with statistical libraries. We begin with the most important commercial packages, such as SAS, Minitab, BMDP, SPSS and S-PLUS, followed by some of the public license statistical and Bayesian software, such as R or BUGS, and then by some general purpose mathematical software and some general programming languages with statistical libraries. Finally, the role of the developed application in the current statistical scene is presented, remarking its main advantages and disadvantages.
8.2 Commercial Packages

8.2.1 The SAS System for Statistical Analysis

SAS began as a statistical analysis system in the late 1960s, growing out of a project in the Department of Experimental Statistics at North Carolina State University. The SAS Institute was founded in 1976. Since that time, the SAS System has expanded to become an evolving system for complete data management and analysis; SAS is really much more than a simple software system. As an example of its great potential, it is worth mentioning that it is used by 90 percent of the companies on the Fortune 500 list. This expansion is probably due to the fact that SAS management has aligned itself with the recent "statistical-like" advances within the computer science community, such as data mining. This clever integration of mathematical/statistical methodologies, database technology, and business applications has helped propel SAS to the top of the commercial statistical software arena. The architecture of the SAS approach is called the SAS Intelligence Platform, a closely integrated set of hardware/software components that allow users to fully utilize the business intelligence (BI) that can be extracted from their client base. Among the products making up the SAS System are products for: management of large databases; statistical analysis of time series; statistical analysis of most classical statistical problems, including multivariate analysis, linear models (as well as generalized linear models) and clustering; and data visualization and plotting. More precisely, the SAS Intelligence Platform consists of the following components:

• The SAS Enterprise ETL Servers
• The SAS Intelligence Storage
• The SAS Enterprise BI Server
• The SAS Analytic Technologies

One of the strengths of SAS is the fact that the package containing the capabilities one normally associates with a data analysis package is constantly upgraded with each release, in order to reflect the latest algorithmic developments in the statistical field. The SAS System is available on PC and UNIX based platforms, as well as on mainframe computers, so it covers almost all the main options, except Macintosh.
As one could guess from what has been said above, this system is aimed mainly at industrial, scientific and statistician users with very high needs and knowledge, who do not mind spending time on the learning process required by this complex system. Some useful URLs are:

• http://www.sas.com/, which is the main URL for SAS
• http://is.rice.edu/~radam/prog.html, which contains some user-developed tips on using SAS

Other statistical systems of the same general vintage as SAS are MINITAB, BMDP and SPSS. All of these systems began as mainframe systems but have evolved to smaller scale systems as computing has evolved.

8.2.2 Minitab

Minitab Inc. was formed more than 20 years ago around its flagship product, MINITAB statistical software. MINITAB provides tools to analyze data across a variety of disciplines and is targeted at users at every level: scientists, business and industrial users, faculty, and students. Regarding the operating system, MINITAB is available on the most widely used computer platforms, including Windows, DOS, Macintosh, OpenVMS, and Unix. In contrast to SAS, MINITAB is quite easy to learn and use: there is no lengthy learning process and little need for unwieldy manuals. This may be the main reason why MINITAB is used so extensively in the educational community. For more details about this software visit http://www.minitab.com/.

8.2.3 BMDP

BMDP has its roots as a bio-medical analysis package from the late 1960s. In many ways it has remained true to its origins, as evidenced by its long list of clients, which includes such biomedical giants as Bristol-Myers Squibb, Merck and Glaxo Wellcome. There are three main distributions:
BMDP New System Personal Edition, BMDP Classic for PCs – Release 7, and BMDP New System Professional Edition. While BMDP New System has an easy-to-use interface that makes data analysis possible with simple point-and-click and fill-in-the-blank interactions, the Professional Edition combines the full suite of BMDP Classic for PCs Release 7 statistics with the powerful data management and front-end data exploration features of the BMDP New System Personal Edition. A reference URL for BMDP is http://www.ppgsoft.com/bmdp00.html.

8.2.4 SPSS

SPSS is a multinational software company, founded in the late 1960s, that provides statistical product and service solutions for survey research, marketing and sales analysis, quality improvement, scientific research, government reporting and education. The starting point is the SPSS Base, which includes the most popular statistics, complete graphics, and broad data management and reporting capabilities. The SPSS products form a modular system that includes SPSS Professional Statistics, SPSS Advanced Statistics, SPSS Tables, SPSS Trends, SPSS Categories, SPSS CHAID, SPSS LISREL 7, SPSS Developer's Kit, SPSS Exact Tests, Teleform, and MapInfo. Although this software was originally designed for mainframe use, SPSS has adapted to market demand and has releases for Windows, Mac and UNIX. A reference URL for SPSS is http://www.spss.com/.

8.2.5 S-PLUS

While there are many different packages for performing statistical analysis, one that offers some of the greatest flexibility with regard to the implementation of user-defined functions and the customization of one's environment is S-PLUS, which is one of the two implementations of the S language (R is the other, reviewed later). S is an exceptionally well-developed tool for statistical research and analysis, and it is especially strong for statistical graphics, the output of data analysis through which both raw data and results are displayed for both analysts and clients. S was originally developed at AT&T Bell Labs (recently split into AT&T Laboratories and Lucent Bell Labs) by a team of researchers including Richard A. Becker,
John M. Chambers, Allan Wilks, William S. Cleveland and Trevor Hastie. The original description of the S language, written by Becker, Chambers and Wilks in 1988, received the 1998 Software System Award from the Association for Computing Machinery (ACM). The aim of the language, as expressed by John Chambers, is "to turn ideas into software, quickly and faithfully". A good introduction to the application of S to statistical analysis problems is contained in [Cham92] and [Cham83]. More recent work focusing on the statistical capabilities of the S-PLUS system can be found in [Vena02]. S-PLUS is manufactured and supported by the Statistical Sciences Corporation, now a division of MathSoft. It runs on both PC and UNIX based platforms. In addition, the company offers easy links for the user to call S-PLUS from within C/FORTRAN, or to call compiled C/FORTRAN functions within the S-PLUS environment. Statistical Sciences has made great efforts to keep the software current with regard to the needs of the statistical community, releasing dedicated modules targeted at specific application areas. The S-PLUS home page can be reached at http://www.mathsoft.com/. This site contains an interesting comparison between SAS and S-PLUS.

8.2.6 Others

Other statistically oriented packages enjoying good reputations are SYSTAT, DataDesk, JMP and StatGraphics. SYSTAT originated as a PC-based package developed by Leland Wilkinson and is now owned by SPSS; the current version is 6.0, a Microsoft Windows oriented product. DataDesk, on the other hand, is a Macintosh-based product authored by Paul Velleman of Cornell University. The currently released version is 5.0.1; it is a GUI-based product containing many innovative graphical data analysis and statistical analysis features. More information about DataDesk can be found at http://www.lightlink.com/datadesk/. JMP is another SAS product that is highly visualization oriented; it is a stand-alone product for PC and Macintosh platforms. Information on JMP can be found at http://www.sas.com/. StatGraphics is an education-oriented statistical software package, used mainly in universities, which offers a user-friendly interface. A good reference showing how to use StatGraphics can be found in [Maté95].
8.3 Public License Packages

8.3.1 R

R is an Open Source implementation of the well-known S language, which originated at the University of Auckland, New Zealand, in the early 1990s. It works on multiple computing platforms, such as Unix systems or Windows, but its most important characteristic is that a software system existing under the Open Source paradigm benefits from having "many pairs of eyes" examine the software, helping to ensure its quality. An example of the rapid development of this software is that in 1997, only two years after the public release in June 1995, the project leaders had to select a core group of around ten members, which became responsible for changes to the source code. R is, for the most part, a command-line based language organized into various packages. Basic packages are installed by default, and the user can download and install a great variety of additional packages. There are also several major projects that are "R spin-offs", such as "Bioconductor", an R package for gene expression analysis, or "Omega", another project focused on providing a seamless interface between R and a number of other languages (PERL, PYTHON, MATLAB). Two packages have to be mentioned because of their importance and implications for this project: JRI and bayesm. The first deals with the problem of communicating between Java and R; this lets us create a graphical user interface using Swing in Java and perform all the statistical calculations with R. The second, developed by [Rossi06], contains the main functions used in Bayesian analysis. Bayesian data analysis is precisely where R can outperform the other statistical software packages. More information about R can be found at http://www.r-project.org/.

8.3.2 BUGS

The BUGS (Bayesian inference Using Gibbs Sampling) project is concerned with flexible software for the Bayesian analysis of complex statistical models using Markov chain Monte Carlo methods. The project began in 1989 in the MRC Biostatistics Unit and led initially to the "Classic" BUGS program, and then to the WinBUGS software, developed jointly with the Imperial College School of Medicine at St. Mary's, London. Development now also includes the OpenBUGS project at the University of Helsinki, Finland.
The main advantage of this software is, as with R, the flexibility it offers the researcher to model whatever he needs, but it is slightly more complex to learn than R. For this reason, Phil Woodward developed BugsXLA, an Excel add-in that not only allows the user to specify a model as one would in a package such as SAS or S-PLUS, but also aids the specification of priors and the control of the MCMC run itself. More information can be found at http://www.mrc-bsu.cam.ac.uk/bugs/.

8.4 Analysis Packages with Statistical Libraries

8.4.1 Matlab

MATLAB is an interactive computing environment that can be used for scientific and statistical data analysis and visualization. The basic data object in MATLAB is the matrix. The user can perform numerical analysis, signal processing, image processing and statistics on matrices, thus being freed from programming considerations inherent in other programming languages such as C and FORTRAN. There are versions of MATLAB for Unix platforms, PCs running Microsoft Windows, and Macintosh. Because the functions are platform independent, MATLAB provides the user with maximum reusability of their work. MATLAB comes with many functions for basic data analysis and graphics. Most of these are written as M-file functions, which are basically text files that the user can read and adapt for other uses. The user also has the ability to create their own M-file functions and script files, thus making MATLAB a programming language. The addition of the MATLAB C-Compiler and C-Math Library allows the user to produce executable code from their MATLAB library of functions, yielding faster execution times and stand-alone applications. For researchers who need more specific functionality, MATLAB offers several modules or toolboxes. These typically focus on areas that might not be of interest to the general scientific community. Basically, the toolboxes are collections of M-file functions that implement algorithms and functions common to an area of interest.
One of the most useful capabilities of MATLAB is its set of tools for visualizing data: it supports standard two- and three-dimensional scatter plots along with surface plots, and it provides the user with a graphics property editor. As with R, there is a considerable amount of contributed MATLAB code available on the internet. One notably useful source of code is available via the MATLAB home page at http://www.mathworks.com/, where more information about this software can be found.

8.4.2 Mathematica

Mathematica is a computer algebra system originally developed by Stephen Wolfram and sold by his company, Wolfram Research. It has numerical and graphical features and powerful symbolic processing capabilities, but it is comparatively complex to learn. Information on Mathematica is available at http://www.wolfram.com/.

8.4.3 Others

Other mathematical software worth noting includes MAPLE, with powerful symbolic processing capabilities, and MATHCAD, a package which combines numerical, symbolic and graphical features. More information about these packages can be found at their official web sites:

• http://www.maplesoft.com/
• http://www.mathsoft.com/

8.5 Some General Languages with Statistical Libraries

8.5.1 Java

It is difficult to assess the state of the art with regard to Java statistical libraries, in that there may be many custom user-developed packages that we are unaware of. Given this caveat, there are three main packages to mention. The first is StatCrunch, which provides the user with the capability to perform interactive exploratory data analysis, logistic regression, nonparametric procedures, regression and regression diagnostics,
and others. The reader is referred to the review that appeared in [West04]. Another source of Java-based statistics functions is the Apache Software Foundation's Jakarta math project, which seeks to provide common mathematical functionality to the Java user community. The final source for Java-based statistical analysis is the Visual Numerics JMSL package, which provides the user with an integrated set of statistical, visualization, data mining, neural network and numerical packages. The reader is referred to http://www.vni.com/products/imsl/jmsl/jmsl.html for additional discussion of JMSL.

8.5.2 C++

C++ is another object-oriented programming language, like Java, with various statistical libraries. Two libraries are worth mentioning: Goose, and Probability and Statistics. The first is dedicated to statistical computation and provides support for t-tests, F-tests, Kruskal-Wallis tests, Spearman tests and others, together with an implementation of simple linear regression models. More information can be found at http://www.gnu.org/software/goose/goose.html. The second is aimed at Microsoft Windows developers and consists of five packages: statistics, discrete probability, standard probability distributions, hypothesis testing, and correlation and regression. A strength of these modules is their support for various interfaces, including C# and C++ .NET. The reader is referred to http://www.webcabcomponents.com/dotNET/dotnet/pss/.

8.6 Developed Software Tool: BARESIMDA

The software tool developed throughout this project is, as said above, based on Java and R, both of them public license software. It has not been developed with the intention of creating a complete statistical package that could be an alternative to any of the software above. Evidently, it is very difficult to incorporate all the facilities those programs have, much more so in a one-year period with only one developer. In fact, BARESIMDA focuses only on regression analysis
procedures with different approaches and data. In that sense, the developed tool gathers classical and Bayesian regression and lets the user analyze Normal regression models in a very simple way, with a very intuitive graphical user interface. This is a very important feature for the Bayesian approach, where there is a complex theoretical basis with which many users may not be familiar. Another advantage, maybe the most important one, over the rest of the statistical packages is that BARESIMDA incorporates regression analysis with interval data in both the classical and the Bayesian approaches. Not only does it display the analytical results, but it also lets us assess graphically the goodness of fit and the tendencies of the centres and radii. With this first version of BARESIMDA, we have wanted to start down the road towards public license software that can take advantage of the Java graphical user interface with Swing and of the statistical libraries in R.
Chapter 9

Software Requirements Specification

This chapter provides the complete description of the functions to be performed by the BARESIMDA software; it will assist potential users in determining whether the software meets their needs, or how it must be modified to do so. It also reduces the development effort, since the preparation of the Software Requirements Specification (SRS) forces the developer to consider rigorously all of the requirements before design begins, reducing later redesign, recoding and retesting. Careful review of the requirements in the SRS can reveal omissions, misunderstandings and inconsistencies early in the development cycle, when these problems are easy to correct. Likewise, it provides a basis for estimating costs and schedules, and a baseline for verification and validation.

9.1 Purpose

The aim of this system is to provide a tool to build different types of regression analysis and to check the advantages and disadvantages of each approach that has been developed.

9.2 Intended Audience

The software is intended to be handled by different types of users, such as:

• Inexperienced people who have minimal knowledge about what regression is and what it consists of.
• Students and people with a medium degree of knowledge about regression and minimal information about the Bayesian paradigm.

• Graduates and experienced people who have a deep knowledge of regression and Bayesian analysis and want to learn about symbolic regression.

9.3 Functionality

The software is expected to provide the features described in the following points.

9.3.1 Classical Regression with crisp data

This refers to the analytic and graphical analysis of multiple and simple classical Normal regression models with crisp data. To be precise, the software has to provide the following facilities (a sketch of how these diagnostics map onto the underlying statistical engine is given after the next subsection):

• Regression analysis summary with estimated parameters.
• ANOVA table.
• Normality test.
• Heteroscedasticity test.
• Autocorrelated errors test.
• Prediction of new data.
• Complementary graphics to inspect the fitted model.

9.3.2 Classical Regression with interval-valued data

Just as with crisp data, a regression analysis must be possible with symbolic data, specifically with interval-valued data. All the functions described previously must be implemented for the centres and radii regressions. In addition, the software will display graphically the adequacy of the fitted model to the original interval-valued data.
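As an illustrative sketch (not the tool's actual implementation), the diagnostics listed above map directly onto functions of the R engine that BARESIMDA delegates to; the lmtest package is an assumption here, since the thesis does not specify which R functions the tool wraps:

```r
## Classical regression diagnostics of 9.3.1 on simulated crisp data.
library(lmtest)   # assumed add-on package for bptest() and dwtest()

set.seed(1)
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50)
fit <- lm(y ~ x)

summary(fit)                  # regression summary with estimated parameters
anova(fit)                    # ANOVA table
shapiro.test(residuals(fit))  # normality test on the residuals
bptest(fit)                   # Breusch-Pagan test for heteroscedasticity
dwtest(fit)                   # Durbin-Watson test for autocorrelated errors
predict(fit, newdata = data.frame(x = 0.5),  # predicting new data
        interval = "prediction")
```

For interval-valued data (9.3.2), the same calls would simply be issued twice, once on the centres and once on the radii.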
9.3.3 Bayesian Regression with crisp data

The user must be capable of creating two different Bayesian models: Normal and Independent Normal. Since the main characteristic of the Bayesian paradigm is the possibility of introducing subjective information, the application will provide a very intuitive dialog to retrieve the user's beliefs about the different parameters. The software will display the estimated parameters, provide a normality test for the residuals, and offer input fields to make new predictions.

9.3.4 Bayesian Regression with interval-valued data

As with classical regression, it must be possible to carry out Bayesian regression with interval-valued data, so that the user can incorporate prior information about the centres and the radii. The analysis options are the same as those for crisp data, with additional graphics to assess the adequacy of the fitted interval-valued data to the observed data.

9.3.5 Data Manipulation

The user will be able to type in new data by hand or to load an existing Excel file into the application. Likewise, he will be able to save to an Excel file both the source data and the following resulting data:

• Residuals
• Normalized residuals
• Studentized residuals
• Fitted values
• Predicted values

9.3.6 Portability

The application must be executable on the main platforms, such as Windows, Linux and Unix.

9.3.7 Maintainability

Likewise, the tool must be well structured so as to be easily maintainable, since changes and extensions in the future are quite probable.
9.4 External Interfaces

9.4.1 User Interfaces

The application will have a Multiple Document Interface (MDI) with a high degree of usability. The former means that its windows will reside under a single parent window, as Figure 9.1 shows.

[Figure 9.1: BARESIMDA MDI]

This will avoid filling up the operating system's task management interface, as the windows are hierarchically organized, and it will let the user hide, show, minimize or maximize them as a whole. The second characteristic means that the user will not have to think too much about what the application does or how it does it. There will be an option to configure the application's look so that it can be adapted to the user's preferences. The user will be able to set the window look and feel to:

• Unix
• Windows
• Windows Classic
• Java

Likewise, the user will be able to indicate whether she or he is an experienced or an inexperienced user, which will help her or him specify prior information in Bayesian regression.

9.4.2 Software Interfaces

BARESIMDA will connect to a statistics package which will be responsible for performing all computations and returning the results to BARESIMDA. All these operations must be transparent to the end user, through an interface that lets the two programs interact; this makes the application more usable. Regarding input and output data, an interface will be necessary to read from and write to Excel files.
Chapter 10

Software Architecture Study

10.1 Hardware/Software Architecture

The application will be programmed in Java and built, executed and tested with SDK version 1.4.2 or later. Specifically, the graphical user interface will be developed using Swing, one of the most powerful tools for developing user-friendly mechanisms for interacting with an application, giving it a distinctive "look" and "feel". Its libraries are part of the Java Foundation Classes (JFC), Java's libraries for cross-platform GUI development. For more information on JFC visit http://java.sun.com/products/jfc/. This will let us develop the main interface on a particular system and then execute it on any platform, allowing users of different operating systems to keep the look and feel of their own platform. The software chosen to carry out the statistical processing is R, since it is distributed under a public license, like Java; it gives the developer a high degree of flexibility to program the models he wants to build; and it is expanding greatly among statisticians and scientists. BARESIMDA communicates with R through the Java-to-R interface, JRI. This is a .jar library which can be obtained from http://rosuda.org/JRI/ and allows running R inside Java applications as a single thread. Basically, it loads the R dynamic library into Java and provides a Java API to R functionality. JRI uses native code, but it supports all platforms where Sun's Java (or a compatible implementation) is available, including Windows, Mac OS X, Sun and Linux. More information about this interface can be found at the reference cited above.
[Figure 10.1: Interface between BARESIMDA and R]

As indicated in the previous chapter, BARESIMDA is required to read and write Excel files. For this purpose, the POI project consists of various parts that fit together to deliver the data in a Microsoft file format to the Java application. Specifically, and according to our requirements, HSSF is the POI project's pure Java implementation of the Excel file format. It provides a way to create, modify, read and write XLS spreadsheets. More precisely, it offers:

• Low level structures for those with special needs.
• An event-model API for efficient read-only access.
• A full user-model API for creating, reading and modifying XLS files.

Visit http://jakarta.apache.org/poi/hssf/index.html for more information.

10.2 Logical Architecture

The application will be structured in three levels or layers, each of them with a well defined responsibility:
[Figure 10.2: Interface between BARESIMDA and Excel]

• gui: responsible for showing the graphical user interface, getting the input parameters and requests, and passing them to the classes that process them.

• action: contains the main procedures that treat the information and elaborate the regression models and analyses. The results are given back to the caller process. It is also responsible for calling the dao classes.

• dao: responsible for accessing permanent data, that is, loading and saving information.

Figure 10.3 shows the relations among these packages.

[Figure 10.3: Logical Architecture]
Chapter 11

Project Budget

Project costs for this system have been divided into two types, discussed in the following sections:

• Engineering costs.
• Investment and materials costs.

There is also a section summarizing the entire expected budget for the project. There are no commercial costs, since the tool is intended to be public license software for free distribution.

11.1 Engineering Costs

A computer engineer working in the environment on which this project is focused is expected to earn around 2500 €/month. There is an additional extra cost of 30% for Social Security payments. The programmer works 8 hours/day, a mean of 22 days/month, which makes a mean of 176 hours/month. Thus, the cost per hour is 18.46 €/h. The estimated time required for the development of the project is divided into the work packages explained at the beginning of this project:

• Bayesian Data Analysis: 168 hours.
• Regression Models: 160 hours.
• Symbolic Data: 64 hours.
• Requirements Specification: 40 hours.
• Architecture Study: 56 hours.
• Design: 80 hours.
• Programming: 416 hours.
• Testing: 40 hours.

The estimated time required for the project is therefore 1024 hours (5 months and 18 days), and the estimated engineering cost is 18903.04 €.
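For clarity, the hourly rate and the engineering cost above follow from:

\frac{2500\ \text{€/month} \times 1.30}{176\ \text{h/month}} = 18.46\ \text{€/h}, \qquad 1024\ \text{h} \times 18.46\ \text{€/h} = 18903.04\ \text{€}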
11.2 Investment and Elements Costs

The elements used for the development of this project have been computer and software equipment. These costs can be seen in Table 11.1.

Element                                                | Price
Pentium D925 at 3 GHz                                  | 630 €
Other expenses (Internet connection, office materials) | 60 €
Total                                                  | 690 €
Table 11.1: Estimated material costs

The amortization period for this type of equipment is considered complete after 10000 working hours. Moreover, the usage rate is considered to be about 85% of the engineering work hours, giving the results shown in Table 11.2.
Concept                           | Total
Hours of use of the material      | 870.4 hours
Resources cost/hour               | 0.19 €/h
Total amortization materials cost | 165.38 €
Table 11.2: Amortization Costs

Thus, the sum of the engineering and material costs is 19068.42 €. It can be assumed that the investment made is about 5% of the engineering cost, so the investment cost amounts to 945.15 €. Therefore, the total cost of the project, which is the sum of the engineering, materials and investment costs, is estimated to be 20013.57 €.

11.2.1 Summarized Budget

The overall expected budget can be observed in Table 11.3.
Cost        | Total
Engineering | 18903.04 €
Material    | 165.38 €
Investment  | 945.15 €
Total       | 20013.57 €
Table 11.3: Summarized Budget

Chapter 12

Conclusions

12.1 Bayesian Regression applied to Symbolic Data

Dealing with a current research topic such as symbolic data requires a high level of English, since it is the universal language of research. Moreover, a project like this, which depends on ongoing research, is harder to advance, since it does not deal with an established subject. A good research task requires rigorous documentation and a complete bibliography: there must be enough well-cited references to let the reader find more information about the points of interest to her or him. Bayesian methodology is called to become a fundamental element in business processes oriented towards predicting and forecasting new situations and quantities. Although I have really enjoyed this project, I suspect that, with a more complete previous training in Bayesian data analysis, I could have saved some of the initial time spent learning concepts that later turn out to be obvious. This would have let me extend the project to other fields, such as regression with hierarchical models or nonparametric Bayesian regression, where the authentic Bayesian potential resides. However, the more one knows about a subject, the more one likes it and the more one wants to learn about it, so the problem would never end. In this respect, the project has met and exceeded the initial personal expectations, arousing a great interest in the research field and teaching me to value this hard but exciting arena.
If I could change anything about the project planning, I would have tried to condense the study stage in order to spend more time applying the software tool to more real problems and situations. Nevertheless, this would have been difficult to carry out, since the project is developed within an academic year in which other activities also take place.

12.2 BARESIMDA Software Tool

Fortunately, public license software is growing enormously, which gives everybody more options to choose from. In that sense, R is a great tool for programming new models, but it requires, on the one hand, very high statistical knowledge, since the needs of people with a low-to-medium level of Statistics are already satisfied by the current statistical packages. On the other hand, it requires a medium programming level to be able to carry out one's ideas. Moreover, the way in which R handles data proves tedious for someone used to working with matrix representations. Interconnecting different interfaces or applications is usually a difficult task, especially when there is very little documentation about establishing the connection on both sides. This problem is very important and is not usually taken into consideration when integrating different environments. Concerning Java, the possibilities and facilities this programming language offers are really incredible, and they make the programming task much easier.

12.3 Future Developments

As can be deduced from what has been said above and in previous chapters, the project could have many different extensions. The most important ones are:

• Bayesian regression with hierarchical models for interval-valued data.
• Bayesian time series for interval-valued data.
• Bayesian linear regression for histogram-valued data.
• Nonparametric Bayesian regression for interval-valued data.
• Bayesian Vector Autoregression for interval-valued data.
• Bayesian regression for functional data.
• Bayesian symbolic regression with uniformly distributed errors.

Likewise, the software tool can be improved by adding some conventional statistical functions, in order to obtain public license statistical software with a user-friendly graphical interface.

12.4 Summary

On the one hand, we have built a new Bayesian regression model for interval-valued data that fits better than the other existing approaches, provided that the prior information is accurate. As has been shown, this works well both for directly related variables and for uncorrelated variables. This is an important advance in the symbolic data field since, to the best of our knowledge, there is no other Bayesian approach for this kind of data. On the other hand, a new software tool letting the user perform Bayesian symbolic regression has been developed. Again, to the best of our knowledge, there is no other package with the same user-friendly interface and the same facilities. Furthermore, it offers the possibility of performing standard and Bayesian regression with classical and symbolic data individually. As a result of this project, the author and the director are working together on a paper about the past, present and future of regression, which is intended to be submitted to ANALES. Likewise, another possible article about Bayesian symbolic regression is borne in mind for a more prominent journal, such as Computational Statistics and Data Analysis (CSDA).
Appendix A

Probability Distributions

A number of probability distributions, together with their density or probability mass functions, means and variances, have been used or mentioned previously. For ease of reference, their definitions are regrouped in this appendix, together with a short discussion of their key properties. More information about these distributions in a Bayesian context can be found in [Gelm04] or [Maté93].

A.1 Discrete Distributions

A.1.1 Binomial

The Binomial distribution is perhaps the most commonly encountered discrete distribution in Statistics, and it is used in quality control by attributes and in sampling techniques with replacement. Consider a sequence of n independent trials, each of which can result in one of just two possible outcomes, namely success and failure, and assume that the probability of success, p, is the same for each trial. Let Y denote the number of successes observed in the n trials; then Y has a Binomial distribution with parameters n and p. Properly, a discrete random variable Y has a Binomial distribution with parameters n and p, denoted Y \sim Bin(n, p), if its probability mass function is given by:

f(y|n, p) = \binom{n}{y} p^y (1 - p)^{n-y}   (A.1)

where n > 0, y = 0, 1, \ldots, n and 0 \leq p \leq 1. Likewise, the mean and variance are:
E(Y) = np   (A.2)
Var(Y) = np(1 - p)   (A.3)

A.1.2 Geometric

The Geometric distribution is related in a certain way to the previous one. Consider the same situation as before: a sequence of independent trials with a constant success probability p in each trial. In this case the number of trials varies until the first success is obtained; that is, the distribution models the number of trials until the first success, and it is common in reliability analysis. Formally, a discrete random variable Y has a Geometric distribution with parameter p, denoted Y \sim Geo(p), if its probability mass function is given by:

f(y|p) = (1 - p)^{y-1} p   (A.4)

where p \geq 0 and y = 1, 2, \ldots In the same way, the mean and variance are:

E(Y) = \frac{1}{p}   (A.5)
Var(Y) = \frac{1-p}{p^2}   (A.6)

A.1.3 Poisson

The Poisson distribution is commonly used to represent count data, such as the number of shares sold in a fixed time period. It is also usual to see it in reliability analysis. Strictly, a discrete random variable Y has a Poisson distribution with parameter \lambda, denoted Y \sim P(\lambda), if its probability mass function is given by:

f(y|\lambda) = \frac{\exp(-\lambda)\lambda^y}{y!}   (A.7)

where \lambda \geq 0 and y = 0, 1, 2, \ldots
In the same way, the mean and variance are:

E(Y) = \lambda   (A.8)
Var(Y) = \lambda   (A.9)

A.2 Continuous Distributions

A.2.1 Uniform

The Uniform distribution is used to represent a variable that is known to lie in an interval and is equally likely to be found anywhere in the interval. Its main characteristic is that if a variable X has probability distribution F(x), then the variable Y = F(X) is uniform on the interval [0, 1]. Properly, a continuous random variable Y has a Uniform distribution over the interval [a, b], denoted Y \sim U(a, b), if its probability density function is given by:

f(y|a, b) = \begin{cases} \frac{1}{b-a} & a \leq y \leq b \\ 0 & \text{otherwise} \end{cases}   (A.10)

where -\infty < a < b < \infty. The mean and variance are likewise specified by:

E(Y) = \frac{a+b}{2}   (A.11)
Var(Y) = \frac{(b-a)^2}{12}   (A.12)

A.2.2 Univariate Normal

The Normal distribution, also called the Gaussian distribution, is ubiquitous in statistical work. It is a family of distributions of the same general form, differing in their location and scale parameters: the mean and standard deviation, respectively. The standard Normal distribution is the Normal distribution with a mean of zero and a variance of one. Formally, a continuous random variable Y has a Normal distribution with mean \mu and variance \sigma^2, denoted Y \sim N(\mu, \sigma^2), if its probability density function is given by:

f(y|\mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y-\mu)^2}{2\sigma^2}\right)   (A.13)
where \sigma^2 \geq 0, -\infty < \mu < \infty and y \in \mathbb{R}. Likewise, the mean and variance are:

E(Y) = \mu   (A.14)
Var(Y) = \sigma^2   (A.15)

A.2.3 Exponential

This distribution is used to model the time t between independent events that happen at a constant rate \lambda. Therefore, this is the distribution of waiting times for the next event in a Poisson process, and it is a special case of the Gamma distribution with \alpha = 1. Formally, a continuous random variable Y has an Exponential distribution with parameter \lambda, denoted Y \sim Exp(\lambda), if its probability density function is given by:

f(y|\lambda) = \lambda \exp(-\lambda y)   (A.16)

where \lambda \geq 0 and y \geq 0. Similarly, the mean and variance are:

E(Y) = \frac{1}{\lambda}   (A.17)
Var(Y) = \frac{1}{\lambda^2}   (A.18)

A.2.4 Gamma

The Gamma distribution is a general type of statistical distribution that is related to the Beta distribution and arises naturally in processes for which the waiting times between Poisson distributed events are relevant. In a Bayesian context, the Gamma distribution is the conjugate prior distribution for the inverse of the Normal variance and for the mean parameter of the Poisson distribution.
A.2.4 Gamma

The Gamma distribution is a general type of statistical distribution that is related to the Beta distribution and arises naturally in processes for which the waiting times between Poisson-distributed events are relevant. In a Bayesian context, the Gamma distribution is the conjugate prior distribution for the inverse of the normal variance and for the mean parameter of the Poisson distribution.

In a formal way, a continuous random variable Y has a Gamma distribution with shape and scale parameters α and β, respectively, denoted Y ∼ Gamma(α, β), if its probability density function is given by:

f(y \mid \alpha, \beta) = \frac{y^{\alpha-1} \exp(-y/\beta)}{\beta^{\alpha}\,\Gamma(\alpha)}    (A.19)

where α > 0, β > 0 and y > 0. Similarly, the mean and variance are identified by:

E(Y) = \alpha\beta    (A.20)

Var(Y) = \alpha\beta^{2}    (A.21)

A.2.5 Inverse-Gamma

If Y⁻¹ has a Gamma distribution with parameters α and β, then Y has the Inverse-Gamma distribution. In a Bayesian context, this distribution is the conjugate prior distribution for the normal variance. Formally, a continuous random variable Y has an Inverse-Gamma distribution with shape and scale parameters α and β, respectively, denoted Y ∼ Inv-Gamma(α, β), if its probability density function is given by:

f(y \mid \alpha, \beta) = \frac{\beta^{\alpha}\, y^{-\alpha-1} \exp(-\beta/y)}{\Gamma(\alpha)}    (A.22)

where α > 0, β > 0 and y > 0. Similarly, the mean and variance are identified by:

E(Y) = \frac{\beta}{\alpha-1}, \quad \alpha > 1    (A.23)

Var(Y) = \frac{\beta^{2}}{(\alpha-1)^{2}(\alpha-2)}, \quad \alpha > 2    (A.24)
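As a quick numerical check of this scale parameterization (an illustrative sketch only), R's Gamma functions accept either a rate or a scale argument:

alpha <- 3; beta <- 2
y <- rgamma(100000, shape = alpha, scale = beta)
c(mean(y), var(y))                        # close to alpha*beta and alpha*beta^2, (A.20)-(A.21)
dgamma(1.7, shape = alpha, scale = beta)  # density (A.19) at y = 1.7
# Draws from the Inverse-Gamma of A.2.5 can be obtained as 1 / rgamma(...).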
A.2.6 Chi-square

It is an essential distribution in statistical inference and in goodness-of-fit tests. The χ²_v distribution is a special case of the Gamma distribution, with shape parameter α = v/2 and scale parameter β = 2. Since it is a special case, we need not define the density function, mean and variance again, as they can be deduced easily from the Gamma distribution.

A.2.7 Inverse-Chi-square and Scaled Inverse-Chi-square

As the χ²_v distribution is a special case of the Gamma distribution, the inverse-χ²_v distribution is a special case of the Inverse-Gamma distribution, with shape parameter α = v/2 and scale parameter β = 1/2. So, for the density function, mean and variance, see the Inverse-Gamma distribution.

We also define the scaled inverse-χ² distribution, which is useful for variance parameters in normal models. A continuous random variable Y has a scaled inverse-χ² distribution with v degrees of freedom and scale s, denoted Y ∼ Scaled-Inv-χ²(v, s²), if its probability density function is given by:

f(y \mid v, s) = \frac{(v/2)^{v/2}}{\Gamma(v/2)}\, s^{v}\, y^{-(v/2+1)} \exp\left(-\frac{v s^{2}}{2y}\right)    (A.25)

The mean and variance are defined by:

E(Y) = \frac{v}{v-2}\, s^{2}, \quad v > 2    (A.26)

Var(Y) = \frac{2v^{2}}{(v-2)^{2}(v-4)}\, s^{4}, \quad v > 4    (A.27)

Note that this is the same as Inv-Gamma(α = v/2, β = (v/2) s²).
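A scaled inverse-χ² variate is easy to simulate from a χ²_v draw, since v s²/X has this distribution when X ∼ χ²_v; the following R sketch (illustrative only) checks the mean in (A.26):

v <- 10; s2 <- 4                       # degrees of freedom and scale s^2
y <- v * s2 / rchisq(100000, df = v)   # standard construction of Scaled-Inv-chi^2 draws
c(mean(y), v * s2 / (v - 2))           # simulated mean vs. the exact value of (A.26)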
A.2.8 Univariate Student-t

The Student's t-distribution is a probability distribution that arises in the problem of estimating the mean of a normally distributed population when the sample size is small. In regression analysis, it is used to represent the posterior predictive distribution in Normal regression. As an anecdote, it is worth mentioning that this distribution was published by William Gosset in 1908, but he was not allowed to bring it out under his own name, so the paper was written under the pseudonym Student. Strictly, a continuous random variable Y has a Student's t-distribution with v degrees of freedom, denoted Y ∼ t(v), if its probability density function is given by:

f(y \mid v) = \frac{\Gamma\left(\frac{v+1}{2}\right)}{\sqrt{v\pi}\,\Gamma\left(\frac{v}{2}\right)} \left(1 + \frac{y^{2}}{v}\right)^{-\frac{v+1}{2}}    (A.28)

where v > 0 and y ∈ ℝ. In the same way, the mean and variance are identified by:

E(Y) = 0, \quad v > 1    (A.29)

Var(Y) = \frac{v}{v-2}, \quad v > 2    (A.30)

A.2.9 Beta

In probability theory and statistics, the Beta distribution is a family of continuous distributions defined on the interval [0, 1], differing in the values of their two positive shape parameters, α and β. In a Bayesian context, the Beta is the conjugate prior distribution for the binomial probability. A continuous random variable Y has a Beta distribution with parameters α and β, denoted Y ∼ Beta(α, β), if its probability density function is given by:

f(y \mid \alpha, \beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, y^{\alpha-1} (1-y)^{\beta-1}    (A.31)

where α > 0 and β > 0. The mean and variance are identified by:

E(Y) = \frac{\alpha}{\alpha+\beta}    (A.32)

Var(Y) = \frac{\alpha\beta}{(\alpha+\beta)^{2}(\alpha+\beta+1)}    (A.33)
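Because of this conjugacy, the posterior for a binomial probability has closed form: with a Beta(α, β) prior and y successes in n trials, the posterior is Beta(α + y, β + n − y). A minimal illustrative R sketch, with hypothetical numbers:

alpha <- 2; beta <- 2                      # hypothetical prior
n <- 20; y <- 14                           # hypothetical data: 14 successes in 20 trials
post_a <- alpha + y; post_b <- beta + n - y
post_a / (post_a + post_b)                 # posterior mean of the binomial probability
qbeta(c(0.025, 0.975), post_a, post_b)     # 95% equal-tail posterior interval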
A.2.10 Multivariate Normal

The multivariate Normal distribution extends the univariate Normal distribution to vector observations. A p-dimensional vector of continuous random variables, Y = (Y₁, Y₂, ..., Y_p), is said to have a multivariate Normal distribution with mean vector µ and variance-covariance matrix Σ if its probability density function is given by:

f(y \mid \mu, \Sigma) = (2\pi)^{-p/2}\, |\Sigma|^{-1/2} \exp\left[-\frac{1}{2}(y-\mu)'\,\Sigma^{-1}(y-\mu)\right]    (A.34)

Likewise, the mean and variance are formulated by:

E(Y) = \mu    (A.35)

Var(Y) = \Sigma    (A.36)

A.2.11 Multivariate Student-t

It is the multivariate generalization of the Student's t-distribution. Rigorously, a continuous random vector Y has a multivariate Student's t-distribution with v degrees of freedom, location µ = (µ₁, ..., µ_d) and symmetric, positive definite d × d scale matrix Σ, denoted Y ∼ t(v, µ, Σ), if its probability density function is given by:

f(y \mid v, \mu, \Sigma) = \frac{\Gamma\left(\frac{v+d}{2}\right)}{\Gamma\left(\frac{v}{2}\right) v^{d/2} \pi^{d/2}}\, |\Sigma|^{-1/2} \left(1 + \frac{1}{v}(y-\mu)'\,\Sigma^{-1}(y-\mu)\right)^{-\frac{v+d}{2}}    (A.37)

In the same way, the mean and variance are defined by:

E(Y) = \mu, \quad v > 1    (A.38)

Var(Y) = \frac{v}{v-2}\,\Sigma, \quad v > 2    (A.39)

A.2.12 Wishart

The Wishart is the conjugate prior distribution for the inverse covariance matrix in a multivariate Normal distribution. It is a multivariate generalization of the Gamma distribution. The integral is finite if the degrees-of-freedom parameter, v, is greater than or equal to the dimension, k. Formally, a continuous random matrix W has a Wishart distribution with v degrees of freedom and symmetric, positive definite k × k scale matrix S, denoted W ∼ Wishart_v(S), if its probability density function is given by (for W positive definite):

f(W \mid v, S) = \left(2^{vk/2}\, \pi^{k(k-1)/4} \prod_{i=1}^{k} \Gamma\left(\frac{v+1-i}{2}\right)\right)^{-1} |S|^{-v/2}\, |W|^{(v-k-1)/2} \exp\left[-\frac{1}{2}\,\mathrm{tr}(S^{-1}W)\right]    (A.40)

Similarly, the mean is defined by:

E(W) = vS    (A.41)
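The mean in (A.41) can be checked by simulation; note that rWishart() ships with the stats package of modern R releases (newer than the R 2.4.1 assumed elsewhere in this document), so this is an illustrative sketch only:

S <- diag(2); v <- 7
W <- rWishart(10000, df = v, Sigma = S)   # a 2 x 2 x 10000 array of Wishart draws
apply(W, c(1, 2), mean)                   # elementwise mean, close to v * S, cf. (A.41)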
A.2.13 Inverse-Wishart

If W⁻¹ ∼ Wishart_v(S), then W has the inverse-Wishart distribution. This is the conjugate prior distribution for the multivariate Normal covariance matrix. Formally, a continuous random matrix W has an inverse-Wishart distribution with v degrees of freedom and symmetric, positive definite k × k scale matrix S, denoted W ∼ Inv-Wishart_v(S⁻¹), if its probability density function is given by (for W positive definite):

f(W \mid v, S) = \left(2^{vk/2}\, \pi^{k(k-1)/4} \prod_{i=1}^{k} \Gamma\left(\frac{v+1-i}{2}\right)\right)^{-1} |S|^{v/2}\, |W|^{-(v+k+1)/2} \exp\left[-\frac{1}{2}\,\mathrm{tr}(S W^{-1})\right]    (A.42)

Similarly, the mean is defined by:

E(W) = (v-k-1)^{-1} S, \quad v > k+1    (A.43)
Appendix B

Installation Guide

B.1 From the source folder

The source folder contains the following files and folders:

• BARESIMDA.jar: the executable application file. Java Runtime Environment 1.4.2 or later, R 2.4.1 or later and the libraries provided in the folder must be installed.

• R Libraries: the libraries to be moved into the R software library folder, %R_HOME%\library.

• Java Library: it contains the file to be moved into %JAVA_HOME%\lib\ext.

%R_HOME% and %JAVA_HOME% refer to the paths in which R and Java are installed, respectively. For instance, in Windows, if you have installed them into the root directory C:\, you should have C:\R\R-2.4.1\library and C:\Java\lib\ext. (A scripted alternative for this manual copy is sketched at the end of this appendix.)

B.2 From the installer

An installer will be provided to make the installation process much easier. No previously installed programs are necessary, since the installer will install the Java Runtime Environment and R. As a result of executing this installer, a new folder and a shortcut icon will be created.
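As an illustrative sketch of the manual copy described in B.1, the step can be scripted from R itself. The folder names "R Libraries" and "Java Library" and the target paths are assumptions based on the default locations mentioned above, and file.copy(recursive = TRUE) needs a reasonably recent R version:

# Run from the source folder; paths assume the default Windows locations above.
file.copy(list.files("R Libraries", full.names = TRUE),
          "C:/R/R-2.4.1/library", recursive = TRUE)
file.copy(list.files("Java Library", full.names = TRUE),
          "C:/Java/lib/ext")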
Appendix C

User's Guide

C.1 Data Entry

C.1.1 Loading an Excel file

1. Select the File menu item in the menu bar.

Figure C.1: Load Data Menu

2. Put the mouse over the Load element and click on it.

3. The dialog box shown in Figure C.2 will be displayed. Click on the Search button to select the Excel file to load and indicate the sheet number in the field with that label. If the first row in the data sheet is a header with the variable names, click OK to load the data. Otherwise, deselect the variable names option and then click OK.

4. The data will then be displayed in the Data window as in an Excel sheet (see Figure C.3).

C.1.2 Defining a new variable

1. Ensure that the Data window is the active window.
Figure C.2: Select File Dialog

Figure C.3: Display Loaded Data

2. Define the new variable by clicking on the New Variable button (see Figure C.4).

3. You will be required to type in the name of the new variable. Type it in and click OK (see Figure C.5).

4. A new column will be added to the spreadsheet with the new variable as header (see Figure C.6).

5. If you want to define several new variables, repeat from step 2 as necessary.
Figure C.4: Define New Variable

Figure C.5: Enter New Variable Name

Figure C.6: Display New Variable
C.1.3 Editing an existing variable

1. Ensure that the Data window is the active window.

2. Click on the Edit Variable button (see Figure C.7).

Figure C.7: Edit Variable

3. A dialog will be displayed. Select the variable to edit and continue (see Figure C.8).

Figure C.8: Select Variable to Be Edited

4. A new dialog will be shown and you will be required to type in the new name of the variable. Type it in, and the variable will be stored with the new name (see Figure C.9).
Figure C.9: Enter New Name

C.1.4 Deleting an existing variable

1. Ensure that the Data window is the active window.

2. Click on the Delete Variable button and a dialog will be displayed.

3. Select the variable to delete and continue. A confirmation dialog will be shown. Confirm that it is the variable to be deleted, and the variable and its data will be removed from the application (see Figure C.10).

Figure C.10: Confirmation

C.1.5 Typing in a new data row

1. Ensure that the Data window is the active window.

2. Click on the New Row button. If any variables have been defined previously, a row will be added to the spreadsheet with as many columns as there are defined variables (see Figure C.11).

3. Double-click on the cell to edit and enter the new value. When you finish, press Enter (see Figure C.12).

4. Repeat steps 2 and 3 as necessary.
Figure C.11: New Row Data

Figure C.12: Type Data

C.1.6 Deleting an existing data row

1. Ensure that the Data window is the active window.

2. Select the data row or rows to be deleted. Then click on the Delete Row button. A confirmation dialog will be displayed.

3. Confirm, and all data in those rows will be removed.
C.1.7 Modifying existing data

1. Ensure that the Data window is the active window.

2. Select the data cell to be modified and double-click on it. You will be able to edit the cell value. When you finish, press Enter.

C.2 Configuration

C.2.1 Setting the Look & Feel

1. Select the Look&Feel item in the Configuration element of the menu bar (see Figure C.13).

Figure C.13: Look And Feel Menu

2. Select the Look&Feel style you want. The available options are: Metal (Java style), CDE/Motif (Unix/Linux style), Windows and Windows Classic (see Figure C.14).

Figure C.14: Look And Feel Styles

3. When you have selected your option (for instance, CDE/Motif), the application appearance will be modified (see Figure C.15).

C.2.2 Selecting the type of user

1. Select the Type Of User item in the Configuration element of the menu bar (see Figure C.16).

2. A dialog will be displayed. Select the type of user you are and accept (see Figure C.17). This will be useful for defining prior information in Bayesian regression.
Figure C.15: New Look And Feel

Figure C.16: Type Of User Menu

Figure C.17: Select Type Of User
C.3 Non-Symbolic Regression

C.3.1 Simple Classical Regression

1. Select the Classical menu in the Non-Symbolic Regression element of the menu bar. Then select Simple Regression (see Figure C.18).

Figure C.18: Non-Symbolic Classical Regression Menu

2. You will be required to select the independent and dependent variables from the defined variables. Select them and continue (see Figure C.19).

Figure C.19: Select Non-Symbolic Variables in Simple Regression

3. A brief report will be displayed in the Classical Simple Regression window, indicating that for more details you should see the Analysis Options in the ToolBar (see Figure C.20).

4. From this point, you can:

(a) Change the dependent and independent variables in the Variables Options, by selecting them again as before.
Figure C.20: Brief Report

(b) Select tests and analyses in the Analysis Options, by clicking on the desired analysis options. The available analysis options are shown in Figure C.21.

Figure C.21: Analysis Options in Non-Symbolic Classical Simple Regression

To make new predictions, select the predict option, introduce the newly observed value and press OK (see Figure C.22).

(c) Select graphics in the Graphics Options, by clicking on the desired graphics options. The available graphics options are shown in Figure C.23.

(d) Save some results in the Save Options, by clicking on the desired save options and selecting the file where they are going to be saved. The available save options are shown in Figure C.24.
Figure C.22: New Prediction in Non-Symbolic Classical Simple Regression

Figure C.23: Graphics Options in Non-Symbolic Classical Simple Regression

C.3.2 Multiple Classical Regression

1. Select the Classical menu in the Non-Symbolic Regression element of the menu bar. Then select Multiple Regression (see Figure C.25).

2. You will be required to select the dependent and independent variables from the defined variables. Select them and continue (see Figure C.26).
Figure C.24: Save Options in Non-Symbolic Classical Simple Regression

Figure C.25: Non-Symbolic Classical Multiple Regression Menu

Figure C.26: Select Variables in Non-Symbolic Classical Multiple Regression

3. From this point a new Multiple Classical Regression window is created, and the procedure is similar to that described for Simple Classical Regression. The reader is therefore referred to that section to see how to select variable, analysis, graphics and save options.

(a) The available Analysis Options can be seen in Figure C.27. There are two new analysis options: backward and forward selection.
Figure C.27: Analysis Options in Non-Symbolic Classical Multiple Regression

These will let you know which independent variables really influence the dependent variable.

(b) The available Graphics Options are shown in Figure C.28.

Figure C.28: Graphics Options in Non-Symbolic Classical Multiple Regression

(c) The available Save Options can be seen in Figure C.29.

4. You will be able to select whether there is an intercept in the model or not by clicking on the Model option (see Figure C.30).
Figure C.29: Save Options in Non-Symbolic Classical Multiple Regression

Figure C.30: Intercept in Non-Symbolic Classical Multiple Regression

C.3.3 Simple Bayesian Regression

1. Select the Bayesian menu in the Non-Symbolic Regression element of the menu bar. Then select Simple Regression (see Figure C.31).

Figure C.31: Non-Symbolic Bayesian Simple Regression Menu

2. You will be required to select the dependent and independent variables from the defined variables, as was done in Simple Classical Regression. Select them and continue (see Figure C.32).

3. A new Bayesian Simple Regression window will be created. The estimated mean and standard deviation of the parameters will be displayed, as well as the 95% highest posterior density interval and the standard numerical error.

4. You will be able to select variable, analysis, graphics and save options as was done in Simple Classical Regression, although for Bayesian regression these options are more limited. However, the procedure is the same.
Figure C.32: Select Variables in Non-Symbolic Bayesian Simple Regression

(a) The available Analysis Options are shown in Figure C.33.

Figure C.33: Analysis Options in Non-Symbolic Bayesian Simple Regression

(b) The available Graphics Options can be seen in Figure C.34.

(c) The available Save Options are shown in Figure C.35.

5. In Bayesian regression, new options are available in the ToolBar:

(a) Specifying Prior Information, by clicking on the Prior Information item in the ToolBar. A new input dialog will be displayed, where you will be able to specify prior information. If you have selected Experienced User in the Type Of User option of the Configuration menu, you will see a dialog like that shown in Figure C.38.
Figure C.34: Graphics Options in Non-Symbolic Bayesian Simple Regression

Figure C.35: Save Options in Non-Symbolic Bayesian Simple Regression

Figure C.36: Prior Experienced Specification Options in Non-Symbolic Bayesian Simple Regression
Otherwise, you will see the dialog shown in Figure C.37.

Figure C.37: Prior Inexperienced Specification in Non-Symbolic Bayesian Simple Regression

(b) Selecting the Bayesian regression model, by clicking on the Model option in the ToolBar.

Figure C.38: Prior Experienced Specification in Non-Symbolic Bayesian Simple Regression

C.3.4 Multiple Bayesian Regression

1. Select the Bayesian menu in the Non-Symbolic Regression element of the menu bar. Then select Multiple Regression (see Figure C.39).

Figure C.39: Non-Symbolic Bayesian Multiple Regression Menu

2. You will be required to select the dependent and independent variables from the defined variables, as was done in Multiple Classical Regression. Select them and continue.

3. A new Bayesian Multiple Regression window will be created. From this point the procedure is the same as in Bayesian Simple Regression.

(a) The Analysis Options are shown in Figure C.40.
Figure C.40: Analysis Options in Non-Symbolic Bayesian Multiple Regression

(b) The Graphics Options can be seen in Figure C.41.

Figure C.41: Graphics Options in Non-Symbolic Bayesian Multiple Regression

(c) The Save Options are shown in Figure C.42.

Figure C.42: Save Options in Non-Symbolic Bayesian Multiple Regression

(d) The Model Options are those shown in Figure C.43.

C.4 Symbolic Regression

C.4.1 Simple Classical Regression

1. Select the Classical menu in the Symbolic Regression element of the menu bar. Then select Simple Regression (see Figure C.44).
Figure C.43: Model Options in Non-Symbolic Bayesian Multiple Regression

Figure C.44: Symbolic Classical Simple Regression Menu

2. You will be required to select the minimum and maximum dependent and minimum and maximum independent variables from the defined variables. Select them and continue (see Figure C.45).

Figure C.45: Select Variables in Symbolic Classical Simple Regression

3. A brief report will be displayed for the midpoints and radii analyses. This is very similar to the Non-Symbolic Regression case, but now you will have one analysis for the midpoints and another one for the radii. In this case, there are more graphics options.

(a) The Analysis Options are shown in Figure C.46.

(b) The Graphics Options can be seen in Figure C.47.
Figure C.46: Analysis Options in Symbolic Classical Simple Regression

Figure C.47: Graphics Options in Symbolic Classical Simple Regression

(c) The Save Options are the same as in Non-Symbolic Regression.
C.4.2 Multiple Classical Regression

1. Select the Classical menu in the Symbolic Regression element of the menu bar. Then select Multiple Regression (see Figure C.48).

Figure C.48: Symbolic Classical Multiple Regression Menu

2. You will be required to select the minimum and maximum dependent and minimum and maximum independent variables from the defined variables (see Figure C.49). Ensure that the first maximum independent variable selected is the one that corresponds to the first minimum independent variable chosen.

Figure C.49: Select Variables in Symbolic Classical Multiple Regression

3. A brief report will be displayed for the midpoints and radii analyses. This is very similar to the Non-Symbolic Regression case, but now you will have one analysis for the midpoints and another one for the radii. In this case, there are more graphics options.

(a) The Analysis Options are shown in Figure C.50.

(b) The Graphics Options can be seen in Figure C.51.

(c) The Save Options are the same as in Non-Symbolic Regression.
Figure C.50: Analysis Options in Symbolic Classical Multiple Regression

Figure C.51: Graphics Options in Symbolic Classical Multiple Regression

C.4.3 Simple Bayesian Regression

1. Select the Bayesian menu in the Symbolic Regression element of the menu bar. Then select Simple Regression (see Figure C.52).

2. You will be required to select the minimum and maximum dependent and minimum and maximum independent variables from the defined variables (see Figure C.53).
Figure C.52: Symbolic Bayesian Simple Regression

Figure C.53: Select Variables in Symbolic Bayesian Simple Regression

3. A new Bayesian Simple Regression window will be created. The estimated mean and standard deviation of the midpoints and radii parameters will be displayed, as well as the 95% highest posterior density interval and the standard numerical error.

4. You will be able to select variable, analysis, graphics and save options as was done in Non-Symbolic Regression.

(a) The available Analysis Options are shown in Figure C.54.

Figure C.54: Analysis Options in Symbolic Bayesian Simple Regression

(b) The available Graphics Options can be seen in Figure C.55.
Figure C.55: Graphics Options in Symbolic Bayesian Simple Regression

(c) The Save Options are the same as in Non-Symbolic Regression.

5. As occurred in Non-Symbolic Regression, in Bayesian analysis new options are available in the ToolBar:

(a) Specifying Midpoints and Radii Prior Information, by clicking on the Midpoints or Radii Prior Information item in the ToolBar. A new input dialog will be displayed, where you will be able to specify prior information.

(b) Selecting the Bayesian regression model, by clicking on the Model option in the ToolBar (see Figure C.56).

C.4.4 Multiple Bayesian Regression

1. Select the Bayesian menu in the Symbolic Regression element of the menu bar. Then select Multiple Regression (see Figure C.57).

2. You will be required to select the minimum and maximum dependent and minimum and maximum independent variables from the defined variables (see Figure C.58).
Figure C.56: Model Options in Symbolic Bayesian Simple Regression

Figure C.57: Symbolic Bayesian Multiple Regression Menu

Ensure that the first maximum independent variable selected is the one that corresponds to the first minimum independent variable chosen.

Figure C.58: Select Variables in Symbolic Bayesian Multiple Regression

3. A new Bayesian Multiple Regression window will be created. The estimated mean and standard deviation of the midpoints and radii parameters will be displayed, as well as the 95% highest posterior density interval and the standard numerical error.

4. You will be able to select variable, analysis, graphics and save options as was done in Non-Symbolic Regression.

(a) The Analysis Options are the same as in Non-Symbolic Regression.
(b) The Graphics Options are shown in Figure C.59.

Figure C.59: Graphics Options in Symbolic Bayesian Multiple Regression

(c) The Save Options are the same as in Non-Symbolic Regression.

5. As occurred in Non-Symbolic Regression, in Bayesian analysis new options are available in the ToolBar:

(a) Specifying Midpoints and Radii Prior Information, by clicking on the Midpoints or Radii Prior Information item in the ToolBar. A new input dialog will be displayed, where you will be able to specify prior information.

(b) Selecting the Bayesian regression model, by clicking on the Model option in the ToolBar.
Appendix D

Obtaining and Installing R

The way to obtain R is to download it from one of the CRAN (Comprehensive R Archive Network) sites. The main site is http://cran.r-project.org. It has a number of mirror sites worldwide, which may be closer to you and give faster download times. Installation details tend to vary over time, so you should read the accompanying documents and any other information offered on CRAN.

D.1 Binary distributions

The version for recent variants of Microsoft Windows comes as a single SetupR.exe file, on which you simply double-click with the mouse and then follow the on-screen instructions. When the process is completed, you will have an entry under Programs on the Start menu for invoking R, as well as a desktop icon.

For Linux distributions that use the RPM package format (RedHat, Mandrake, LinuxPPC and SuSE) and also for Alpha Unix (OSF/Tru64), .rpm files of R and the recommended add-on packages can be installed using the rpm command. Packages for the Debian APT package manager are also available.

For the Macintosh platforms there are two different binary distributions: the "Carbon" R and the "Darwin" R. The first version is intended to run natively on MacOS systems from 8.6 to OS X, and the second one as a usual Unix command under OS X. The Darwin R also requires an X window manager like XDarwin to use the X11 graphics device.
Carbon R comes in a single .sit archive file that you simply decompress by dragging the file onto StuffIt Expander, and you then move the resulting folder rmxyz into your favourite applications folder. The Darwin version is a .tgz archive, which can be installed, after decompression, with some (fairly trivial) manual adjustments. Darwin R can also be installed using "fink". Fink installs all dynamic libraries that might be needed, and it can update R to newer versions when available.

D.2 Installation from source

Installation from source code is possible on all supported platforms, although it is nontrivial on Macintosh and Windows, mainly because the build environment is not part of the system. On Unix-like systems (Macintosh OS X included), the process can be as simple as unpacking the sources and writing

./configure
make
make install

and then you would unpack the recommended package bundle, change to its directory and enter

R CMD INSTALL *.tar.gz

The above works on widely used platforms, provided that the relevant compilers and support libraries are installed. If your system is more esoteric or you want to use special compilers or libraries, then you may need to dig deeper.

For Windows and Carbon Macintosh, the directories src/gnuwin32 and src/macintosh have an INSTALL file with detailed information about the procedure to follow.
D.3 Package installation

To install R packages such as bayesm under Unix/Linux or Windows, you can connect to the Internet, start R, and enter

install.packages("bayesm", .libPaths()[1])

The Windows version provides a convenient menu interface for the operation.

If your R machine is not connected to the Internet, you can also download the package as a file and install that. For Windows and the Carbon version of Macintosh, you need to get the binary package (.zip or .sit extension). For Windows, installation from a local .zip file is possible via a menu entry. For Macintosh users, the procedure is described in the Macintosh FAQ. For Unix and Linux, you can issue the following at the shell prompt (the -l option allows you to give a private library):

R CMD INSTALL bayesm

On Unix and Linux systems you will need superuser permissions to install. Otherwise, you can set up a private library directory and install into that; a short sketch of this variant is given at the end of this appendix. Use the R_LIBS environment variable to use your private library subsequently. A similar issue arises if R is installed on a read-only file system in a Windows environment. Further details can be found in the help page for library.

Information and further Internet resources for R can be obtained from CRAN and the R homepage at http://www.r-project.org. Notice in particular the mailing lists, the user-contributed documents, and the FAQs.
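As an illustrative sketch of the private-library route just described (the directory name ~/Rlibs is an assumption, and the exact bayesm file name depends on the version you download), the whole procedure can also be done from within R:

# Assuming the directory ~/Rlibs already exists (its name is hypothetical):
install.packages("bayesm", lib = "~/Rlibs")   # install into the private library
.libPaths(c("~/Rlibs", .libPaths()))          # make it visible in this session
library(bayesm)                               # load the package from there
# For future sessions, set the R_LIBS environment variable to ~/Rlibs instead.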
Appendix E

Obtaining and Installing the Java Runtime Environment

The way to obtain the Java Runtime Environment (JRE) is to download it from the Sun Microsystems official site. The main site is http://java.sun.com, from where you can select the version to be downloaded. The link to download the current version, which is the J2SE v1.4.2_14 JRE, is http://java.sun.com/j2se/1.4.2/download.html.

E.1 Microsoft Windows

You must have administrative permissions in order to install the Java 2 Runtime Environment on Microsoft Windows 2000 and XP. The download page provides the following two choices of installation. Continue based on your choice.

1. Windows Installation - After clicking the "Download" link for the JRE, a dialog box pops up. Choose the open option to start a small program which then prompts you for more information about what you want to install.

2. Windows Offline Installation - After clicking the JRE "Download" link for the "Windows Offline Installation", a dialog box pops up. Choose the save option to save the downloaded file without installing it. Run this file by double-clicking on the installer's icon. Then follow the instructions the installer provides. When done with the installation, you can delete the downloaded file to recover disk space.
E.2 Linux

The Java 2 Runtime Environment 1.4.2 is available in two installation formats:

1. Self-extracting Binary File - This file can be used to install the Java 2 Runtime Environment in a location chosen by the user. It can be installed by anyone (not only root users), and it can easily be installed in any location. As long as you are not the root user, it cannot displace the system version of the Java platform supplied by Linux. To use this file, see Installation of Self-Extracting Binary below.

2. RPM Packages - An rpm.bin file which contains RPM packages, installed with the rpm utility. It requires root access to install, and it installs by default in a location that replaces the system version of the Java platform supplied by Linux. To use this bundle, see Installation of RPM File below.

Choose the installation format that is most suitable to your needs.

E.2.1 Installation of Self-Extracting Binary

Use these instructions if you want to use the self-extracting binary file to install the Java 2 Runtime Environment. If you want to install RPM packages instead, see Installation of RPM File.

1. Download and check the download file size to ensure that you have downloaded the full, uncorrupted software bundle. You can download to any directory you choose; it does not have to be the directory where you want to install the Java 2 Runtime Environment. Before you download the file, notice its byte size provided on the download page on the web site. Once the download has completed, compare that file size to the size of the downloaded file to make sure they are equal.

2. Make sure that execute permissions are set on the self-extracting binary. Run this command: chmod +x j2re-1_4_2_14-linux-i586.bin

3. Change directory to the location where you would like the files to be installed. The next step installs the Java 2 Runtime Environment into the current directory.

4. Run the self-extracting binary. Execute the downloaded file, prepended by the path to it. For example, if the file is in the current directory, prepend it with "./" (necessary if "." is not in the PATH environment variable):
./j2re-1_4_2_14-linux-i586.bin

The binary code license is displayed, and you are prompted to agree to its terms. The Java 2 Runtime Environment files are installed in a directory called j2re1.4.2_14 in the current directory.

E.2.2 Installation of RPM File

Use these instructions if you want to install the Java 2 Runtime Environment in the form of RPM packages. If you want to use the self-extracting binary file instead, see Installation of Self-Extracting Binary.

1. Download and check the file size. You can download to any directory you choose. Before you download the file, notice its byte size provided on the download page on the web site. Once the download has completed, compare that file size to the size of the downloaded file to make sure they are equal.

2. Extract the contents of the downloaded file. Change directory to where the downloaded file is located and run these commands to first set the executable permissions and then run the binary to extract the RPM file:

chmod a+x j2re-1_4_2_14-linux-i586-rpm.bin
./j2re-1_4_2_14-linux-i586-rpm.bin

Note that the initial "./" is required if you do not have "." in your PATH environment variable. The script displays a binary license agreement, which you are asked to agree to before installation can proceed. Once you have agreed to the license, the install script creates the file j2re-1_4_2_14-linux-i586.rpm in the current directory. Then install the packages as follows:

1. Become root by running the su command and entering the super-user password.
2. Run the rpm command to install the packages that comprise the Java 2 Runtime Environment:

rpm -iv j2re-1_4_2_14-linux-i586.rpm

3. Delete the .bin and .rpm files if you want to save disk space.

4. Exit the root shell.

E.3 UNIX

1. Check the download file size. You can download to any directory you choose; it does not have to be the directory where you want to install the J2RE. Before you download the file, notice its byte size provided on the download page on the web site. Once the download has completed, compare that file size to the size of the downloaded file to make sure they are equal.

2. Make sure that execute permissions are set on the self-extracting binary:

On SPARC processors: chmod +x j2re-1_4_2_14-solaris-sparc.sh
On x86 processors: chmod +x j2re-1_4_2_14-solaris-i586.sh

3. Change directory to the location where you would like the files to be installed. The next step installs the J2RE into the current directory.

4. Run the self-extracting binary. Execute the downloaded file, prepending the path to it. For example, if the downloaded file is in the current directory, prepend it with "./":

On SPARC processors: ./j2re-1_4_2_14-solaris-sparc.sh
On x86 processors: ./j2re-1_4_2_14-solaris-i586.sh

The binary code license is displayed, and you are prompted to agree to its terms. The J2RE files are installed in a directory called j2re1.4.2_14 in the current directory.
More information about the installation process on different kinds of operating systems can be found on the Sun Microsystems official site mentioned above.