Stat405             Development


                           Hadley Wickham
Monday, 30 November 2009
                  1. Floating point math
                  2. Optimisation
                  3. Continuing education
                  4. Feedback


Your turn
                  Perform the following calculations in R.
                  Are the answers what you expect?
                   seq(0.1, 0.9, by = 0.1) - 1:9 / 10
                   sqrt(2)^2 - 2
                  What is the property of these numbers
                  that might cause the problem?



# Each number must be stored in a finite amount of space
        # => each number can only have a finite number of digits
        # => floating point math does not work like normal math


        (1e-16 + 1) == 1
        (1e-16 + 1) * 10 == 1e-16 * 10 + 1 * 10


        1e9 + 2 - 0.1 - 1e9
        1e10 + 2 - 0.1 - 1e10
        1e11 + 2 - 0.1 - 1e11
        1e12 + 2 - 0.1 - 1e12
        1e13 + 2 - 0.1 - 1e13
        1e14 + 2 - 0.1 - 1e14
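
The spacing between representable numbers can be inspected directly. A minimal sketch, assuming IEEE 754 double precision (what R uses on essentially all current platforms); `gap_near()` is a hypothetical helper written for this illustration, not a base R function:

```r
# Machine epsilon: the gap between 1 and the next representable double
eps <- .Machine$double.eps
eps                      # about 2.2e-16 on IEEE 754 doubles

(1 + eps) == 1           # FALSE: eps is just large enough to register
(1 + eps / 2) == 1       # TRUE: the addition is rounded away

# The gap between neighbouring doubles grows with magnitude, which is why
# the results above drift further from 1.9 as 1e9 grows to 1e14
gap_near <- function(x) 2^floor(log2(abs(x))) * eps   # hypothetical helper
gap_near(1e9)
gap_near(1e14)
```

The larger the numbers involved, the coarser the grid of values that can be stored, so the same small quantity is represented less accurately.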



# By default R prints only 7 significant digits
        # and drops trailing zeros, so a result can print as exact when it is not
        (1 / 237)
        (1 / 237) * 237
        (1 / 237) * 237 - 1


        seq(0.1, 0.9, by = 0.1)
        seq(0.1, 0.9, by = 0.1) - 1:9 / 10


        # Tricky to get to print exactly:
        formatC((1 / 237) * 237, digits = 20)
        formatC(seq(0.1, 0.9, by = 0.1), digits = 20)
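
Base R's `print(digits = )` and `sprintf()` are two other ways to look at the stored value. A short sketch; it relies on the fact that 17 significant digits are enough to tell any two distinct doubles apart:

```r
x <- 0.1 + 0.2

print(x)                  # default 7 significant digits: looks like 0.3
print(x, digits = 17)     # reveals the stored value
sprintf("%.17g", x)       # same idea, returned as a string
sprintf("%.17g", 0.3)     # the double nearest to 0.3 is a different number

x == 0.3                  # FALSE
```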




# When working with floating point numbers (numeric)
        # (but not integers, this is the one place where the
        # difference is important) never test for equality with ==


        a <- seq(0.1, 0.9, by = 0.1)
        b <- 1:9 / 10


        all(a == b)
        all.equal(a, b)
        all(abs(a - b) < 1e-6)


        # Similarly, need to be careful with < and > etc
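
One detail worth knowing: `all.equal()` returns either TRUE or a character description of the difference, never FALSE, so it should be wrapped in `isTRUE()` inside `if()`. A small sketch; the helpers `is_near()` and `less_than()` and their default tolerance are illustrative choices, not part of base R:

```r
a <- seq(0.1, 0.9, by = 0.1)
b <- 1:9 / 10

# all.equal() gives TRUE or a message, so use isTRUE() in conditions
if (isTRUE(all.equal(a, b))) {
  message("a and b agree to within all.equal()'s default tolerance")
}

# Hypothetical helper: element-wise comparison with an explicit tolerance
is_near <- function(x, y, tol = 1e-8) abs(x - y) < tol

a == b           # some elements are FALSE
is_near(a, b)    # all TRUE

# For < and >, decide explicitly what should happen within the tolerance
less_than <- function(x, y, tol = 1e-8) (y - x) > tol   # hypothetical helper
0.3 < 0.1 + 0.2             # TRUE, although the two are mathematically equal
less_than(0.3, 0.1 + 0.2)   # FALSE: the difference is within the tolerance
```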




#     Places where this matters:
     #
     #     * sums
     #     * calculating the standard deviation
     #     * inverting a matrix (condition)
     #       * linear models!
     #     * maximum likelihood estimation
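
Two of these cases can be made concrete. The sketch below is an illustration under stated assumptions: `naive_var()` is a hypothetical one-pass variance written only to show the failure, and the Hilbert matrix is a standard example of an ill-conditioned matrix, not something from the course material.

```r
# Three values with a large mean and a true variance of exactly 1
x <- 1e9 + c(1, 2, 3)

# One-pass "textbook" formula: subtracts two nearly equal numbers of size 1e18
naive_var <- function(x) {
  n <- length(x)
  (sum(x^2) - n * mean(x)^2) / (n - 1)
}

naive_var(x)   # not 1: the digits that matter were lost in the subtraction
var(x)         # 1: the data are centred before squaring

# Condition number of an 8 x 8 Hilbert matrix.
# Large values mean solve() and lm() can lose many digits of accuracy.
h <- 1 / outer(1:8, 1:8, function(i, j) i + j - 1)
kappa(h, exact = TRUE)
```

As a rough rule of thumb, a condition number near 10^k means about k of the roughly 16 available decimal digits can be lost when solving a linear system with that matrix.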




Optimisation
                  Optimise if, and only if, your code is too slow.
                  First use system.time() to measure exactly
                  how long things take: you need this to
                  check that your changes actually speed
                  things up.
                  Then find what takes the longest with
                  the profr package.
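
A minimal sketch of the timing step, using only base R. The function `slow_cumsum()` is a hypothetical, deliberately inefficient example. For the profiling step, base R's `Rprof()` and `summaryRprof()` are an alternative to profr if the package is not installed.

```r
# Deliberately slow: grows the result vector one element at a time
slow_cumsum <- function(x) {
  out <- numeric(0)
  total <- 0
  for (xi in x) {
    total <- total + xi
    out <- c(out, total)
  }
  out
}

set.seed(1)
x <- runif(2e4)

# Time both versions, keeping the results so they can be compared
t_slow <- system.time(r_slow <- slow_cumsum(x))
t_fast <- system.time(r_fast <- cumsum(x))

t_slow[["elapsed"]]   # seconds of wall-clock time
t_fast[["elapsed"]]

# Check the faster version still gives the same answer (up to rounding)
all.equal(r_slow, r_fast)
```

Timings vary from run to run, so it is worth repeating the measurement a few times before concluding that a change helped.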


General advice
                  • Start with the slowest part of your code
                  • Use built-in R functions, where possible
                  • Use vectorised functions, where
                    possible
                  • Think through your basic algorithm
                           • Knowledge of basic CS algorithms
                             and data structures is very helpful
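
To illustrate the advice about built-in, vectorised functions and algorithm choice, here is a hedged sketch. The task (finding which elements of one vector occur in another) and the function `in_loop()` are made up for this example.

```r
# Task (illustrative): which elements of x also occur in y?

# Nested loops: up to length(x) * length(y) comparisons
in_loop <- function(x, y) {
  out <- logical(length(x))
  for (i in seq_along(x)) {
    for (j in seq_along(y)) {
      if (x[i] == y[j]) {
        out[i] <- TRUE
        break
      }
    }
  }
  out
}

set.seed(1)
x <- sample(2000)
y <- sample(4000, 1000)

# Built-in, vectorised, and based on hashing rather than pairwise comparison
identical(in_loop(x, y), x %in% y)

system.time(in_loop(x, y))
system.time(x %in% y)
```

The built-in version wins twice over: the loop runs in compiled code, and the underlying algorithm does far fewer comparisons.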


Continuing education

                  Learn more about R.
                  Learn more about your other tools.
                  Professional development




Mailing list
                  Sign up to R-help:
                  https://stat.ethz.ch/mailman/listinfo/r-help
                  Make sure to set up filters
                  Skim the subject lines and read the interesting threads
                  Don’t be afraid to post
                  (use a pseudonym if necessary)


Read books
                  Phil Spector. Data Manipulation with R.
                  William N. Venables and Brian D. Ripley.
                  Modern Applied Statistics with S.
                  Frank E. Harrell. Regression Modelling
                  Strategies.
                  Jose C. Pinheiro and Douglas M. Bates.
                  Mixed-Effects Models in S and S-Plus.


Read papers

                  The R Journal: http://journal.r-project.org/
                  The Journal of Statistical Software:
                  http://www.jstatsoft.org/




Learn your tools
                  • Touch typing
                  • Text editor
                  • Command line
                  • Caffeine
                  • Email



Professional development
                  The aspects of being a statistician, apart
                  from knowing statistics.
                  Principally communication: written,
                  spoken, visual and electronic.
                  Take every opportunity you can to
                  practice these skills.



                                       Visual       Electronic
                    Written
                       Papers          Posters      Email
                       Vita/Resume     Graphics     Website
                       Bibliography                 Blog
                       Reviews                      Code

                    Spoken
                       Teaching        Oral exam    Video
                       Short talk                   Slidecast
                       Long talk


Written
                  Particularly important if you want to be an
                  academic, or if you’re a PhD student or
                  want to become one.
                  “Style: Toward Clarity and Grace”
                  Sign up for the thesis writing workshops
                  when they come around.
                  Develop a regular habit.


My habit
                  • Roll out of bed at 7am
                  • Boil water
                  • Make tea
                  • Drink tea
                  • Write for an hour



Spoken

                  Seize every opportunity to practice.
                  Make use of Tracy Volz - tmvolz@rice.edu.
                  She is a fantastic resource - if you had to
                  pay for her, you wouldn’t be able to afford
                  it.




Email



[Figure: email counts (y-axis "value", 0 to 1200) over 2007 to 2010, split into read and unread. Annotation: 265,000 emails, 134,000 unread!]
[Figure: proportion of email read (y-axis "read/all", 0.0 to 1.0) over 2007 to 2010 (x-axis "from").]
[Figure: email counts (y-axis "value", 50 to 350) over 2007 to 2010, split into direct and sent.]

Inbox Zero
                           http://www.43folders.com/izero

                                    Merlin Mann

         There is no way you will ever be able to respond to — let alone read in
       exquisite detail — every email you ever receive for the rest of your life. If
       you take issue with this, just wait six months, because, believe me, we’re
        all getting a lot more email (and other sundry demands on our attention)
        every day. What seems like a doddle today is going to get progressively
       more difficult — even insurmountable — unless you put a realistic system
                                       in place now.




Your time is priceless
                            (and wildly limited)

               You need an agnostic system for
             dealing with mail that isn’t based on
                nonces, exceptions, and guilt.

        [The] ultimate goal is for you to spend
         less time playing with your email and
                 more time doing stuff.

Key concepts
                  Regularly empty your inbox
                  Minimal response
                  Delete, delete, delete
                  Filters
                  Email dashes



Response does not need to be
                               proportional to request

                                   “Do you still need this?”

                                        “I don’t know”

                           “Good idea. I’ll add it to my to do list.”

                       “Here’s a link that might be what you’re
                                     looking for…”
                                           [Delete]


    http://www.43folders.com/2006/03/13/email-cheats

Delete!
                  Most minimal response is none.
                  “Just remember that every email you
                  read, re-read, and re-re-re-re-re-read as it
                  sits in that big dumb pile is actually
                  incurring mental debt on your behalf.”
                  Be brutally honest - if you’re not going to
                  do anything with the email delete it now.


Filters: grey mail
                  “noisy, frequent, and non-urgent items
                  which can be dealt with all at a pass and
                  later.”
                  Facebook, comments, university/department
                  memos, newsletters, mailing lists
                  Good catch-all: filter on messages that
                  contain “unsubscribe”

            http://www.43folders.com/2006/03/13/filters

        1300 / 3500 (5/day!)

        bannerpcard@rice.edu, carlyn@rice.edu,
        cchat@rice.edu, cmtcomment@rice.edu,
        giving@rice.edu, payroll@rice.edu,
        registrar@rice.edu, sandra@rice.edu,
        sallie@rice.edu subject:(weekly message),
        alldepts@rice.edu, list:"k2i-members.rice.edu",
        list:"mailman.rice.edu"
        allfaculty@stat.rice.edu, faculty@stat.rice.edu,
        statdept@stat.rice.edu, colloquium@stat.rice.edu,
        undergrad@stat.rice.edu
        from:(statements@wageworks.com)
        from:(TIAA-CREF_eDelivery@tiaa-cref.org)

      Patricia Wallace, a techno-psychologist,
      believes part of the allure of e-mail,
      for adults as well as teens, is similar to
      that of a slot machine. “You have
      intermittent, variable reinforcement,”
      she explains. “You are not sure you are
      going to get a reward every time or
      how often you will, so you keep pulling
      that handle.”

Email dashes
                  Don’t have your email open all day.
                  Schedule times when you respond to
                  emails.
                  You can tackle emails a lot faster when
                  you batch them up.
                  Lack self control (like me)? Try an internet
                  blocker: http://macfreedom.com/

      http://www.43folders.com/2006/03/15/email-dash

Feedback
                  http://hadley.wufoo.com/forms/stat405-final-feedback/


