plyr
Wrap up

Hadley Wickham
Tuesday, 7 July 2009
1. Fitting multiple models to the same data
2. Reporting progress & dealing with errors
3. Overall structure & correspondence to base R functions
4. Plans
5. Feedback

Multiple models

May need to fit multiple models to the same data, with varying
parameters or many random starts.
Two plyr functions make this easy: rlply & mlply.
Example: fitting a neural network.



[Figure: scatterplot of y against x (both axes from 0.0 to 1.0), points coloured by class (A or B)]

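library(plyr)
library(nnet)
library(ggplot2)

# class must be a factor, both for nnet() and for levels() in accuracy()
w <- read.csv("wiggly.csv", stringsAsFactors = TRUE)
qplot(x, y, data = w, colour = class)

# Proportion of observations that a fitted model classifies correctly
accuracy <- function(mod, true) {
  pred <- factor(predict(mod, type = "class"),
    levels = levels(true))
  tb <- table(pred, true)
  sum(diag(tb)) / sum(tb)
}

# A single fit with 3 hidden units
nnet(class ~ x + y, data = w, size = 3)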

rlply

A little different from the other plyr functions: the first argument
is the number of times to run, the second argument is an expression
(not a function).
Automatically adds the run number (.n) to the labels.
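
A minimal sketch of the point above, using random draws rather than the neural network so that it runs without any data file. The expression is re-evaluated on every run; the rdply() call is included only to show where the .n label appears in a data frame result.

```r
library(plyr)

set.seed(1)

# The second argument is an expression, re-evaluated on every run
draws <- rlply(3, mean(runif(100)))
length(draws)  # one list element per run

# rdply() returns a data frame and labels each run in a .n column
means <- rdply(3, data.frame(mean = mean(runif(100))))
names(means)
```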



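# Fit the same network 50 times, each from a different random start
models <- rlply(50, nnet(class ~ x + y, data = w,
  size = 3, trace = FALSE))

# Accuracy of each fitted model, one row per model
accdf <- ldply(models, "accuracy", true = w$class)
accdf
qplot(accuracy, data = accdf, binwidth = 0.02)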




mlply

What if we want to systematically vary the input parameters?
mlply allows us to vary all of the arguments to the applied
function, not just the first argument.
Input is a data frame of parameter values: one row per call, one
column per argument.
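
As a small illustration of the row-per-call idea, the sketch below applies base R's seq() to a two-row data frame. The parameter values are made up for the example; each row supplies the named arguments for one call.

```r
library(plyr)

# Each row supplies the named arguments for one call to seq()
params <- data.frame(from = c(1, 10), to = c(3, 12))
seqs <- mlply(params, seq)

seqs[[1]]  # result of seq(from = 1, to = 3)
seqs[[2]]  # result of seq(from = 10, to = 12)
```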



# Wrapper that fixes the formula and data, leaving the tuning
# parameters free
wiggly_nnet <- function(...) {
  nnet(class ~ x + y, data = w, trace = FALSE, ...)
}
rlply(5, wiggly_nnet(size = 3))

# Unfortunately need 2+ parameters because of bug
# (nnet's iteration limit is called maxit)
opts <- data.frame(size = 1:10, maxit = 50)
opts

# One model per row of opts
models <- mlply(opts, wiggly_nnet)
ldply(models, "accuracy", true = w$class)

# expand.grid() useful if you want to explore
# all combinations
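
A sketch of the expand.grid() idea, under some assumptions: the grid values are arbitrary, and describe() is a hypothetical stand-in for a model-fitting function such as wiggly_nnet so that the example runs without the data.

```r
library(plyr)

# Every combination of two (illustrative) tuning parameters
grid <- expand.grid(size = c(1, 3, 5), decay = c(0, 0.1))
nrow(grid)

# Hypothetical stand-in for a fitting function; it only reports its arguments
describe <- function(size, decay) paste("size", size, "decay", decay)

# One call per row of the grid
fits <- mlply(grid, describe)
length(fits)
fits[[1]]
```

In real use, describe would be replaced by the fitting function, and the grid columns would need to match its argument names.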

Progress & errors



Progress bars

Things seem to take much less time when you are regularly updated
on their progress.
Plyr provides a convenient method for doing this: all functions
accept a .progress = "text" argument.
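
For instance, the following sketch draws a text progress bar in the console while it works. slow_sqrt is a hypothetical function that sleeps briefly so the bar has time to update.

```r
library(plyr)

# Hypothetical slow function: sleeps briefly, then returns the square root
slow_sqrt <- function(x) {
  Sys.sleep(0.05)
  sqrt(x)
}

# The bar advances once per element of the input
roots <- laply(1:20, slow_sqrt, .progress = "text")
```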




Error handling

Helper function: failwith
Takes a function as input and returns a function as output, but
instead of throwing an error the new function returns the value
you specify.

failwith(NULL, lm)(cty ~ displ, data = mpg)
failwith(NULL, lm)(cty ~ displ, data = NULL)
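
A small sketch of how this plays out inside a loop, using a made-up function that fails for non-positive input. quiet = TRUE is used here only to suppress the printed error message.

```r
library(plyr)

# Hypothetical function that throws an error for non-positive input
strict_log <- function(x) {
  if (x <= 0) stop("x must be positive")
  log(x)
}

# Wrapped version returns NA instead of stopping
safe_log <- failwith(NA, strict_log, quiet = TRUE)

# The failure for -1 no longer aborts the whole loop
logs <- laply(c(1, -1, exp(2)), safe_log)
logs
```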



Overall structure



Input \ Output        array    data frame    list     nothing

array                 aaply    adply         alply    a_ply
data frame            daply    ddply         dlply    d_ply
list                  laply    ldply         llply    l_ply
n replicates          raply    rdply         rlply    r_ply
function arguments    maply    mdply         mlply    m_ply
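
To make the naming scheme concrete, the sketch below takes one input (the built-in mtcars data frame, split by cyl) and produces each of the three output types. The choice of data set and summary is purely illustrative.

```r
library(plyr)

mean_mpg <- function(df) mean(df$mpg)

# data frame -> array
as_array <- daply(mtcars, "cyl", mean_mpg)

# data frame -> data frame
as_df <- ddply(mtcars, "cyl", function(df) data.frame(mpg = mean(df$mpg)))

# data frame -> list
as_list <- dlply(mtcars, "cyl", mean_mpg)
```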

No output

Useful for functions called purely for their side effects:
write.table, save, graphics.
If .print = TRUE, each result is printed (particularly useful for
lattice and ggplot2 graphics).
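
One possible use, sketched with the built-in mtcars data: write one CSV file per group into a temporary directory. The directory and file names are invented for the example, and nothing useful is returned; the files are the whole point.

```r
library(plyr)

# Temporary directory for the example output
out_dir <- tempfile("plyr-demo")
dir.create(out_dir)

# Called only for the side effect of writing one file per value of cyl
d_ply(mtcars, "cyl", function(df) {
  write.csv(df, file.path(out_dir, paste0("cyl-", df$cyl[1], ".csv")))
})

list.files(out_dir)
```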




Your turn

With your partner, using your collective R knowledge, come up with
all of the functions in base R (or contributed packages) that do
the same thing.




Input \ Output        array        data frame    list         nothing

array                 apply        adply         alply        a_ply
data frame            daply        aggregate     by           d_ply
list                  sapply       ldply         lapply       l_ply
n replicates          replicate    rdply         replicate    r_ply
function arguments    mapply       mdply         mapply       m_ply
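
A quick check of two cells from the list row, on a toy list. This is only a sketch: it compares values and ignores any differences in names or attributes between the base and plyr results.

```r
library(plyr)

x <- list(a = 1:3, b = 4:6)

# list -> list: llply vs lapply
plyr_list <- unlist(llply(x, sum), use.names = FALSE)
base_list <- unlist(lapply(x, sum), use.names = FALSE)
all.equal(plyr_list, base_list)

# list -> array: laply vs sapply
plyr_vec <- as.numeric(laply(x, sum))
base_vec <- as.numeric(sapply(x, sum))
all.equal(plyr_vec, base_vec)
```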

Plans

Deal better with large and larger data: trivial parallelisation &
on-disk data (SQL etc.).
Stay tuned for details.




http://hadley.wufoo.com/forms/course-evaluation/


