https://xkcd.com/1478/
Introducing the R software: Free
statistics at your fingertips
Kamarul Imran Musa
MD, M Community Medicine
Associate Professor (Epidemiology and Statistics)
Dept of Community Medicine, School of Medical Sciences,
Universiti Sains Malaysia, Health Campus
Email: drki.musa@gmail.com
Overview of presentation
• A bit on ‘Data’ and people dealing with ‘Data’
• Statistical software – choices
• Our main course ---- R -----
• Different flavors of R
• Our experiences with R at Health Campus
• Data analysis – now and future
Data as of now … data in the future?
• What is data?
– Facts and statistics collected together for reference and analysis
https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/
(Harvard Business Review)
Scientist and Data
• Are you a scientist?
• Do we work with data?
• Scientist + Data
• Are we data scientists?
• What is new for data science in health and medicine?
What does a data scientist do?
• Data scientists are inquisitive:
http://www-01.ibm.com/software/data/infosphere/data-scientist/
Scientists use tools to work with data
• We need tools in science
• What is the right tool for a data scientist?
• In my case, tools to deal with data
– Scientists use one or more tools to ‘manipulate’ data, producing results that they then have to make sense of
• Which tools are available to help scientists work best with their data?
Choices of statistical software – many, so don’t get spoiled for choice
• The usual questions for scientists dealing with data analysis
– What choices do you have?
– Which one are you familiar with?
– Popularity?
– Cost?
– Capability?
– After-sale support?
– Meets scientific rigor?
IBM SPSS – everyone knows it
• Popular, easy and user-friendly
• How about the cost? When does the license expire? For USM, usually every July.
• What happens when it expires? NOTHING works
• http://www.ibm.com/marketplace/cloud/spss-statistics/us/en-us?step=Plan
STATA – fewer people know it, but it is amazing
• The license does not expire, but it can be upgraded
• Much cheaper than SPSS
• A balance between
– Code-based use
– Point-and-click use
• Powerful
• http://www.stata.com/order/new/edu/single-user-licenses/
Who is using which software?
• Number of scholarly articles found in the most recent complete year (2014) for each software package.
• In order of # of articles:
1. SPSS
2. SAS
3. R
4. STATA
• http://r4stats.com/articles/popularity/
The number of scholarly articles found in each year by Google Scholar. Only the top six “classic” statistics packages are shown.
Not-so-new kid on the block – R
What does (almost) everybody know about R?
• R is:
– ‘GNU S’, a freely available language and environment for
statistical computing and graphics which provides a wide
variety of statistical and graphical techniques: linear and
nonlinear modelling, statistical tests, time series
analysis, classification, clustering, etc.
• Questions:
1. What can R do?
2. What is special about R?
3. Does R have a future?
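As a small, hedged illustration of the techniques named in the definition above, the sketch below fits a linear model, runs a statistical test and draws a plot. It uses `mtcars`, a dataset that ships with R. The dataset and the variables chosen are assumptions made for illustration and do not come from the presentation.

```r
# Illustrative sketch only: mtcars and the variables used are assumptions,
# chosen because the dataset ships with base R.
data(mtcars)

# Linear modelling: fuel economy (mpg) as a function of car weight (wt)
fit <- lm(mpg ~ wt, data = mtcars)
print(coef(fit))

# Statistical test: Welch two-sample t-test of mpg by transmission type (am)
tt <- t.test(mpg ~ am, data = mtcars)
print(tt$p.value)

# Graphics: scatter plot with the fitted regression line
plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
abline(fit)
```

Each step is a single function call on a data frame, which is the typical shape of an R analysis.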
R and R-gui
• https://cran.r-project.org/
Revolution-R
• Microsoft owns Revolution-R
• http://www.revolutionanalytics.com/revolution-r-enterprise
Revolution R Enterprise and Revolution R Open
RStudio IDE
• https://www.rstudio.com/
• Highly recommended: start with the RStudio IDE
• It is an interface for R
• Requires users to first download and install R from CRAN
RStudio IDE – Features
• Clean interface
• Organized
• Integrated with many brilliant built-in tools
DEMO
How does R fit into data analysis now and in the future?
Recognition
Reproducibility (DEMO)
Reproducibility
• Reproducibility in research
• The Associate Editor for reproducibility (AER) will handle
submissions of reproducible articles.
– Data: The analytic data from which the principal results were derived are
made available on the journal's Web site.
– Code: Any computer code, software, or other computer instructions that
were used to compute published results are provided.
– Reproducible: An article is designated as reproducible if the AER succeeds
in executing the code on the data provided and produces results matching
those that the authors claim are reproducible.
– http://biostatistics.oxfordjournals.org/content/10/3/405.full
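The reproducibility idea can be sketched in a few lines of R: the same code run on the same data should give matching results. The example below is an assumption-laden toy, not the journal's procedure. The simulated data, the seed value and the helper name `run_analysis` are all hypothetical.

```r
# Illustrative sketch: the simulated data, the seed and the function name
# run_analysis are assumptions, not part of the journal's policy.
run_analysis <- function(seed) {
  set.seed(seed)               # fix the random number stream
  x <- rnorm(100)              # simulated predictor
  y <- 2 * x + rnorm(100)      # simulated outcome
  coef(lm(y ~ x))              # the result that would be reported
}

first  <- run_analysis(seed = 123)
second <- run_analysis(seed = 123)

# Same code and same data (here, the same seed) give matching results
print(identical(first, second))

# Recording the R version and loaded packages helps others rerun the code
print(sessionInfo())
```

Sharing the script and the data together is what lets a third party repeat this check independently.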
On-the-fly reports using R Markdown (DEMO)
• Produce reports on the fly
• In HTML or PDF formats
• Benefits
– Save time
– Reduce errors – no more copy-paste
– Pretty
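A minimal sketch of such a report follows, assuming the `rmarkdown` package and pandoc are installed. The file name and report content are hypothetical. The script writes a small R Markdown file in which the numbers are computed by R at render time, so nothing is copied and pasted by hand.

```r
# Illustrative sketch: the file name and the report content are assumptions.
fence <- strrep("`", 3)  # three backticks, built here to keep the listing readable

rmd_lines <- c(
  "---",
  'title: "Example report"',
  "output: html_document",
  "---",
  "",
  "Mean mpg in mtcars: `r round(mean(mtcars$mpg), 1)`",
  "",
  paste0(fence, "{r}"),
  "summary(mtcars$mpg)",
  fence
)

rmd_file <- file.path(tempdir(), "example-report.Rmd")
writeLines(rmd_lines, rmd_file)

# Render only if rmarkdown and pandoc are available on this machine
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  out <- rmarkdown::render(rmd_file, quiet = TRUE)
  print(out)  # path to the generated HTML file
}
```

Changing `output: html_document` to `output: pdf_document` would request a PDF instead, which additionally needs a LaTeX installation.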
Integration with other software (DEMO)
• LaTeX
• Stata
• WinBUGS
• SPSS
• SAS
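One way to exchange data with some of the packages listed above is the `foreign` package, a "recommended" package bundled with most R installations. The sketch below is only an illustration, and the dataset and file name are assumptions. It writes a data frame to Stata's .dta format and reads it back.

```r
# Illustrative sketch: the file name and dataset are assumptions.
library(foreign)  # recommended package, bundled with most R installations

dta_file <- file.path(tempdir(), "mtcars.dta")

# Export a data frame to Stata's .dta format ...
write.dta(mtcars, dta_file)

# ... and read it back into R
back <- read.dta(dta_file)
print(dim(back))
print(head(back, 3))
```

The same package provides `read.spss()` for SPSS files and `read.xport()` for SAS transport files.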
Our experience with R
• No experience yet with undergraduates
• Started teaching R for DrPH candidates this academic session
• Personally introduced R, 2 years ago
• Common resistance
– Totally command-driven
– Steep learning curve
– Limited resources, especially books on R – that was 2 years ago; not a problem now
– You need to know your statistics
– Not for data entry
– Very difficult to view and manipulate variables
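For a sense of what command-driven viewing and manipulation of variables looks like, here is a brief base-R sketch. The dataset and the derived variables are assumptions chosen for illustration.

```r
# Illustrative sketch: the dataset and derived variables are assumptions.
df <- mtcars

str(df)                 # variable names, types and first values
print(head(df, 5))      # first five rows
print(summary(df$mpg))  # quartiles, minimum, maximum and mean of one variable

# Create a new variable: fuel consumption in litres per 100 km
# (mtcars records mpg in miles per US gallon)
df$l_per_100km <- 235.215 / df$mpg

# Recode a 0/1 variable into a labelled factor
df$transmission <- factor(df$am, levels = c(0, 1),
                          labels = c("automatic", "manual"))

print(table(df$transmission))
```

In RStudio, `View(df)` opens a spreadsheet-style viewer, which may ease the transition for users coming from menu-driven software.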
How’s the feedback from users?
• No formal study or assessment of their experience
• Users seem to like R because it opens up creativity
• R pushes users to explore more and challenge themselves
• R is not boring like point-and-click (menu driven) software
• They seem to like R Markdown
– On-the-fly reports
The BIG question – stick to R? … And R only?
• Yes, you may
• Hmm, maybe not
– Specialized software for data entry
– Software for data cleaning
– Software for data mining
• But yes, one software package is enough for 95% of us
Embrace R and abandon others?
• I love R
– Lots of data analysis
– Creating publications: HTML, PDF
– Spatial data analysis
– Bayesian
• WinBUGS
• INLA
• But I do love Stata too
– Data cleaning
– Variable manipulation
• And I use EpiData for data entry
• But yes, I have left SPSS
• East-coast data science user group
• My blog:
– https://designdataanalysis.wordpress.com
