Stat405
Statistical computing & graphics


        Hadley Wickham
1. Introductions
2. Syllabus
3. Introduction to Linux
4. Introduction to R
5. Basic graphics
Hello, my name is Hadley
had.co.nz/stat405
(if you can’t remember just google stat405)



   hadley@rice.edu
About me

From New Zealand
Divisional advisor for McMurtry
Major advisor for statistics
Syllabus
Introduction to Linux
Essential tools
The terminal to run R. gedit to edit your R
code.
To load the terminal, right-click on the
desktop.
To load R, type R in the terminal. To load
gedit, type gedit & in the terminal (the &
runs it in the background, so the terminal
stays usable). To open a file in gedit, type
gedit filename &
Setup

Work through the instructions at
http://had.co.nz/stat405/linux.html.
I’ll circulate and make sure everyone gets
set up right.
Terminal essentials
Mouse select = Copy
Middle button = Paste
Ctrl + A = home
Ctrl + E = end
Alt + tab = change applications
Press tab to complete file names
Introduction to R
Learning a new
language is hard!
Scatterplot basics
install.packages("ggplot2")
library(ggplot2)

?mpg
head(mpg)
str(mpg)
summary(mpg)

qplot(displ, hwy, data = mpg)
Always explicitly specify the data (the data = mpg argument).
[Figure: scatterplot of hwy (vertical axis) against displ (horizontal axis)]
qplot(displ, hwy, data = mpg)
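An aside that is not in the original slides: recent ggplot2 releases mark qplot() as deprecated, so the same scatterplot may need to be written with ggplot(). The following is a minimal sketch of an equivalent call, assuming ggplot2 is installed. The variable name p is only for illustration.

```r
library(ggplot2)

# Same scatterplot as qplot(displ, hwy, data = mpg), written with ggplot().
# The data set is still named explicitly, as the slides recommend.
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()

print(p)
```

Either form names the data set explicitly, so the plot does not depend on whatever variables happen to be in the workspace.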
Additional variables

Can display additional variables with
aesthetics (like shape, colour, size) or
faceting (small multiples displaying
different subsets)
[Figure: scatterplot of hwy against displ, points coloured by class; legend: 2seater, compact, midsize, minivan, pickup, subcompact, suv]
qplot(displ, hwy, colour = class, data = mpg)
Legend chosen and displayed automatically.
Your turn
Experiment with colour, size, and shape
aesthetics.
What’s the difference between discrete and
continuous variables?
What happens when you combine
multiple aesthetics?
          Discrete                   Continuous
Colour    Rainbow of colours         Gradient from red to blue
Size      Discrete size steps        Linear mapping between radius and value
Shape     Different shape for each   Doesn’t work
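The table above can be explored with a few calls. This is a sketch rather than a model answer. It assumes that class and drv are discrete and cty is continuous in mpg, which str(mpg) should confirm. The plot names (p_colour_discrete and so on) are only for illustration.

```r
library(ggplot2)

# Discrete variable mapped to colour: one distinct colour per class.
p_colour_discrete <- qplot(displ, hwy, colour = class, data = mpg)

# Continuous variable mapped to colour: a colour gradient.
p_colour_continuous <- qplot(displ, hwy, colour = cty, data = mpg)

# Continuous variable mapped to size.
p_size <- qplot(displ, hwy, size = cty, data = mpg)

# Discrete variable mapped to shape: one shape per drive type.
p_shape <- qplot(displ, hwy, shape = drv, data = mpg)

# Combining aesthetics: colour and shape in the same plot.
p_combined <- qplot(displ, hwy, colour = class, shape = drv, data = mpg)

print(p_combined)
```

Printing each plot in turn shows how the legend changes with the type of the mapped variable.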
Faceting

Small multiples displaying different
subsets of the data.
Useful for exploring conditional
relationships. Useful for large data.
Your turn
qplot(displ, hwy, data = mpg) +
  facet_grid(. ~ cyl)
qplot(displ, hwy, data = mpg) +
  facet_grid(drv ~ .)
qplot(displ, hwy, data = mpg) +
  facet_grid(drv ~ cyl)
qplot(displ, hwy, data = mpg) +
  facet_wrap(~ class)
Summary

facet_grid(): 2d grid, rows ~ cols, . for
no split
facet_wrap(): 1d ribbon wrapped into 2d
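To see what "1d ribbon wrapped into 2d" means, it may help to change where the ribbon wraps. The sketch below uses facet_wrap()'s ncol argument, which the slides do not mention. The names base_plot, two_columns and four_columns are only for illustration.

```r
library(ggplot2)

base_plot <- qplot(displ, hwy, data = mpg)

# One panel per class, wrapped after every 2 panels.
two_columns <- base_plot + facet_wrap(~ class, ncol = 2)

# The same ribbon of panels, wrapped after every 4 panels instead.
four_columns <- base_plot + facet_wrap(~ class, ncol = 4)

print(two_columns)
print(four_columns)
```

The panels and their order are the same in both plots; only the point at which the ribbon wraps differs.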
Aside: workflow

Keep a copy of the slides open so that
you can copy and paste the code.
For complicated commands, write them
in gedit and then copy and paste.
What’s the problem with this plot?
[Figure: scatterplot of hwy (vertical axis) against cty (horizontal axis)]
qplot(cty, hwy, data = mpg)
[Figure: jittered scatterplot of hwy against cty]
qplot(cty, hwy, data = mpg, geom = "jitter")
geom controls the “type” of plot.
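One way to check why jittering helps is to count how many rows share the same position. The sketch below assumes the issue with the plain scatterplot is overplotting (cty and hwy are whole numbers, so many cars land on the same point). The names n_points and n_positions are only for illustration.

```r
library(ggplot2)

# Number of cars in the data set.
n_points <- nrow(mpg)

# Number of distinct (cty, hwy) positions those cars occupy.
positions <- unique(as.data.frame(mpg)[, c("cty", "hwy")])
n_positions <- nrow(positions)

# If n_positions is smaller than n_points, some points are drawn
# exactly on top of each other in qplot(cty, hwy, data = mpg).
cat(n_points, "points at", n_positions, "distinct positions\n")
```

Jittering adds a small amount of random noise to each point, so stacked points become visible as small clusters.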
[Figure: hwy (vertical axis) against class (horizontal axis: 2seater, compact, midsize, minivan, pickup, subcompact, suv)]
qplot(class, hwy, data = mpg)
How could we improve this plot?
Brainstorm for 1 minute.
[Figure: hwy (vertical axis) against reorder(class, hwy) (horizontal axis: pickup, suv, minivan, 2seater, midsize, subcompact, compact)]
qplot(reorder(class, hwy), hwy, data = mpg)
Incredibly useful technique!
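To see what reorder() does to the axis, it can be called on its own. By default it sorts the levels of its first argument by the mean of the second; the FUN argument swaps in a different summary. A small sketch, where class_by_mean and class_by_median are names chosen only for illustration:

```r
library(ggplot2)

# Levels of class ordered by mean hwy (the default summary is mean).
class_by_mean <- reorder(mpg$class, mpg$hwy)
print(levels(class_by_mean))

# Mean hwy per class, listed in the reordered level order.
print(tapply(mpg$hwy, class_by_mean, mean))

# The same idea with the median as the ordering summary.
class_by_median <- reorder(mpg$class, mpg$hwy, FUN = median)
print(levels(class_by_median))
```

The first set of levels should match the left-to-right order of the classes on the horizontal axis of the reordered plot.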
qplot(reorder(class, hwy), hwy, data = mpg, geom = "jitter")
[Plot: jittered hwy against reorder(class, hwy); x-axis order: pickup, suv, minivan, 2seater, midsize, subcompact, compact]
qplot(reorder(class, hwy), hwy, data = mpg, geom = "boxplot")
[Plot: boxplots of hwy against reorder(class, hwy); x-axis order: pickup, suv, minivan, 2seater, midsize, subcompact, compact]
qplot(reorder(class, hwy), hwy, data = mpg,
  geom = c("jitter", "boxplot"))
[Plot: jittered points and boxplots of hwy against reorder(class, hwy); x-axis order: pickup, suv, minivan, 2seater, midsize, subcompact, compact]
Your turn

Read the help for reorder. Redraw the
previous plots with class ordered by
median hwy.
How would you put the jittered points on
top of the boxplots?
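One possible approach to both questions, sketched under the assumption that a reasonably recent ggplot2 is installed. It uses the explicit ggplot() form rather than qplot() so that the layer order is visible; the names class_by_median and p are hypothetical. reorder() accepts a FUN argument for the summary, and ggplot2 draws layers in the order they are added, so adding the jitter layer last places the points on top.

```r
library(ggplot2)

# Order the class levels by median hwy instead of the default mean.
class_by_median <- reorder(mpg$class, mpg$hwy, FUN = median)
levels(class_by_median)

# Layers are drawn in the order they are added: boxplots first, points on top.
p <- ggplot(mpg, aes(reorder(class, hwy, FUN = median), hwy)) +
  geom_boxplot() +
  geom_jitter()
p
```

With qplot(), listing the geoms as c("boxplot", "jitter") should have the same effect, since the geoms are added in the order given.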
Aside: coding strategy

At the end of each interactive session, you
want a summary of everything you did. Two
options:
1. Save everything you did with savehistory()
then remove the unimportant bits.
2. Build up the important bits as you go.
(this is how I work)
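A minimal sketch of the two options, assuming an interactive R session with history support for the first. The file names are hypothetical, and the two commands written to the script are placeholders for whatever you decide to keep.

```r
# Option 1: dump the whole session history, then prune it by hand in an editor.
# savehistory() only works in interactive sessions that keep a history.
history_file <- file.path(tempdir(), "stat405-session.Rhistory")  # hypothetical name
if (interactive()) {
  savehistory(history_file)
}

# Option 2: keep only the commands worth saving in a script as you go,
# then replay the script with source() to check that it still runs.
script_file <- file.path(tempdir(), "stat405-session.R")  # hypothetical name
writeLines(c(
  "x <- c(3, 1, 2)",
  "sorted <- sort(x)"
), script_file)
source(script_file)
sorted
```

In practice the script would be typed in gedit rather than generated with writeLines(); the call is used here only to keep the sketch self-contained.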


01 Intro
