R Intro
Week 1
Scott Chamberlain [modified from Haldre Rogers]
September 9, 2011

Don’t just listen to me! Other intros to R:
http://www.stat.duke.edu/programs/gcc/ResourcesDocuments/RTutorial.pdf
http://www.cyclismo.org/tutorial/R/
http://www.r-tutor.com/r-introduction
Quick R: http://www.statmethods.net/
http://www.bioconductor.org/help/course-materials/2011/CSAMA/Monday/Morning%20Talks/R_intro.pdf

R user frameworks
R from the command line (OS X and PC): just type “R” at the command line and have fun!
R itself: http://www.r-project.org/
RStudio (a good choice): http://www.rstudio.org/
RevolutionR [free academic version]: sort of the SAS-ised version of R
http://www.revolutionanalytics.com/downloads/free-academic.php
Uses a proprietary .xdf file format that speeds up computation
Many other ways to use R, including GUIs, other IDEs, and a huge variety of text editors:
https://github.com/RatRiceEEB/RIntroCode/wiki/R-Resources
If you are afraid of the code interface, use Rattle, R Commander, Deducer, or Red R.
These interfaces can also show you the code behind each button press, so you can learn the code as you go.

R user frameworks, cont.
R from Python: RPy: http://rpy.sourceforge.net/
C++ from R: Rcpp package:
http://cran.r-project.org/web/packages/Rcpp/index.html
http://dirk.eddelbuettel.com/code/rcpp.html
Writing slow parts of your code in C++ can hugely speed up computation: the R function then calls compiled C++ code instead of running interpreted R.
E.g., http://helmingstay.blogspot.com/2011/06/efficient-loops-in-r-complexity-versus.html
and http://dirk.eddelbuettel.com/code/rcpp.examples.html
Excel from R: XLConnect package: http://cran.r-project.org/web/packages/XLConnect/index.html
And more… see for yourself

R tips
R can crash! Do not use R’s built-in text editor, and do not write code only in the R console. Instead, keep your code in any text editor that integrates with R, so it survives a crash. Links: https://github.com/RatRiceEEB/RIntroCode/wiki/R-Resources
When asking for help on listservs or help websites, give BRIEF and REPRODUCIBLE examples. Without them, people are much less willing to help you!
R silently overwrites files with the same file name (e.g., write.csv() replaces an existing file without warning)!!!!
Make sure you want to overwrite a file before doing so.

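Two of the tips above can be sketched in a few lines of R. This is a minimal illustration, not the author's code: `small_df` and `safe_write_csv()` are hypothetical names made up here. `set.seed()` plus a tiny made-up data set makes an example reproducible, `dput()` turns an object into pasteable code, and the helper refuses to replace an existing file unless asked to.

```r
# Reproducible example: fix the random seed and build a tiny data set in
# code, so anyone who pastes this sees exactly the same numbers.
set.seed(42)
small_df <- data.frame(
  site  = c("a", "a", "b", "b"),
  count = rpois(4, lambda = 3)
)

# dput() prints R code that recreates an existing object; paste that
# output into your question instead of attaching a data file.
dput(small_df)

# Hypothetical helper (not part of base R): write.csv() replaces an
# existing file without asking, so refuse unless overwrite = TRUE.
safe_write_csv <- function(x, file, overwrite = FALSE, ...) {
  if (file.exists(file) && !overwrite) {
    stop("File already exists: ", file, " (use overwrite = TRUE)")
  }
  write.csv(x, file, ...)
  invisible(file)
}

# Write to a temporary file so nothing real is touched.
out_file <- tempfile(fileext = ".csv")
safe_write_csv(small_df, out_file, row.names = FALSE)
```

Calling `safe_write_csv(small_df, out_file)` a second time stops with an error instead of silently replacing the file; pass `overwrite = TRUE` when you really mean it.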
Style

Not this kind of style…

This kind of style!!!

Style
Style is important so that YOU and OTHERS can read your code and actually use it.
Google style guide: http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html#generallayout
Henrik Bengtsson style guide: http://www1.maths.lth.se/help/R/RCC/
Hadley Wickham's style guide: https://github.com/hadley/devtools/wiki/Style

Preparing your data for R
What makes clean data?
Correct spelling
Identical capitalization (e.g., Premna vs. premna): R is case-sensitive, so if myvector <- c(3, 4, 5), calling Myvector does not work!
No spaces between words: when read.csv() reads column names, spaces are turned into “.” (e.g., “body length” becomes body.length). Generally try to avoid spaces; use underscores instead.
Use NA or a blank cell (if using CSV) for missing values
Use find and replace to get rid of trailing spaces after words
I generally keep both an .xls and a .csv file, so I can always recreate my work in R from the .csv file and still modify the .xls file.

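Here is a small sketch of why these cleaning rules matter, using only base R. The species names and the inline CSV text are made up for illustration, and lowercasing everything is just one possible convention (an assumption, not a rule from the slides).

```r
# R names are case-sensitive: myvector and Myvector are different objects.
myvector <- c(3, 4, 5)
exists("Myvector")   # FALSE: calling Myvector gives "object not found"

# read.csv() passes column names through make.names() by default
# (check.names = TRUE), which turns spaces into dots.
make.names(c("body length", "body_length"))   # "body.length" "body_length"

# Stray capitals and trailing spaces create extra distinct values.
species <- c("Premna", "premna", "Premna ", "Ficus")
length(unique(species))   # 4 values, though there are only 2 species

# One way to tidy them (assumption: lowercase names suit your data).
cleaned <- tolower(trimws(species))
unique(cleaned)           # "premna" "ficus"

# Missing values: "NA" and blank numeric cells are both read as NA.
csv_text <- "plant,height\nPremna,12\nFicus,\nFicus,NA"
plants <- read.csv(text = csv_text)
sum(is.na(plants$height)) # 2
```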
Bringing data into R
Create a CSV file:
One worksheet only
No special formatting, filters, comments, etc.
Copy only the columns and rows that hold your data into the CSV, as R will sometimes read in empty columns
Name your variables well: self-explanatory, unique, lowercase, short-ish, one-word names
In R, set the working directory: setwd("/Users/ScottMac/Dropbox/R Group/Week1_R-Intro")
What is the working directory? getwd()
What is in the working directory? dir()
Read in data:
CSV files: iris.df <- read.csv("iris_df.csv", header = TRUE)
Clipboard: read.csv("clipboard") reads in data you have copied, just like cutting and pasting it. Data copied from a spreadsheet is tab-separated, so read.delim("clipboard") often works better; on a Mac, use read.delim(pipe("pbpaste")) instead.
From the web: read.csv("http://explore.data.gov/download/pwaj-zn2n/CSV")
From Excel files (using the XLConnect package): iris.df <- readWorksheetFromFile("/Users/ScottMac/Dropbox/R Group/Week1_R-Intro/iris_df.xlsx", sheet = "Sheet1")
Write data:
write.csv(dataframe, "dataframename.csv"), OR
save(iris, file = "iris.RData") [and load("iris.RData") to bring it back into R]

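A round trip through the steps above might look like the sketch below. It works in a temporary folder instead of the slide's Dropbox path, and it uses the built-in `iris` data rather than the original iris_df.csv file, so the file names here are stand-ins. `row.names = FALSE` is an added choice that keeps write.csv() from adding an extra column of row numbers.

```r
# Work in a throwaway folder; setwd() returns the previous directory.
old_wd <- setwd(tempdir())
getwd()                        # where R now reads and writes files

# Write the built-in iris data to CSV without a row-number column.
write.csv(iris, "iris_df.csv", row.names = FALSE)
dir(pattern = "\\.csv$")       # the new file is in the working directory

# header = TRUE (the read.csv() default) means row 1 holds column names.
iris.df <- read.csv("iris_df.csv", header = TRUE)
str(iris.df)

# save() needs the file name given as file = ...; load() restores the
# object under its original name (here, iris.df).
save(iris.df, file = "iris.RData")
rm(iris.df)
load("iris.RData")

setwd(old_wd)                  # go back to where we started
```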
R data structures
Scalar: object with a single value, either numeric or character (in R, simply a vector of length one)
Vector: sequence of values of a single type (numeric, character, or logical), which can include NA
List: arbitrary collection of objects of any type; a very useful R object
Character: text, e.g., “this is some text”
Factor: like a character vector, but only with values from predefined “levels”
Matrix: two-dimensional; all values must be of one type (usually numeric)
Dataframe: each column can be of a different class
Immutable dataframe: a special dataframe used in the plyr package for faster dataframe manipulation; it references the original dataframe instead of copying it, which speeds up calculations
Function
Environment

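Each structure in the list above can be created in one line of base R. The object names below are arbitrary, chosen for this sketch; the immutable plyr dataframe is left out because it needs the plyr package.

```r
x <- 3.5                         # "scalar": a numeric vector of length 1
v <- c(1, NA, 3)                 # vector: one type throughout, NA allowed
mixed <- c(1, "a")               # mixing types coerces all to character
lst <- list(n = 1, name = "Premna", values = 1:3)  # list: anything goes
f <- factor(c("dead", "alive", "dead"), levels = c("alive", "dead"))
m <- matrix(1:6, nrow = 2)       # matrix: 2 rows x 3 columns, one type
df <- data.frame(id = 1:3, species = c("a", "b", "c"))  # mixed columns
sq <- function(z) z^2            # functions are objects too
e <- new.env()                   # an environment stores named objects
assign("counter", 0, envir = e)

length(x)                  # 1
typeof(mixed)              # "character"
levels(f)                  # "alive" "dead"
dim(m)                     # 2 3
sapply(df, class)          # species is "character" in R >= 4.0, "factor" before
get("counter", envir = e)  # 0
```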
Exploring dataframes
str(dataframe) gives column formats and dimensions
head(dataframe) and tail(dataframe) give the first and last 6 rows
names(dataframe) gives column names
row.names(dataframe) gives row names
attributes(dataframe) gives column and row names and the object class
summary(dataframe) gives a lot of good information
Make sure variables are in the appropriate form: character/string, numeric, factor, integer, logical
Make sure mins, maxs, means, etc. seem right
Make sure you don’t have typing errors that make Premna and premna two separate factor levels
Use unique(iris$Species) to see all the unique values of a column
Or use levels(spider$species) to see the different levels of a factor

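The checks above, run on R's built-in `iris` data, might look like this sketch. The `status` vector and its 0 = dead, 1 = alive coding are hypothetical, added to show why numbers that stand for categories should become factors.

```r
str(iris)             # column types and dimensions (150 obs. of 5 variables)
head(iris)            # first 6 rows; tail(iris) gives the last 6
names(iris)           # note the capital S in "Species"
summary(iris)         # min, max, mean, quartiles, and counts per level
unique(iris$Species)  # the distinct values in a column
levels(iris$Species)  # "setosa" "versicolor" "virginica"
is.null(iris$species) # TRUE: a lowercase typo finds no column at all

# Numbers that are really categories should become factors.
# Hypothetical coding for illustration: 0 = dead, 1 = alive.
status <- c(0, 1, 1, 0, 1)
mean(status)          # 0.6: computable, but rarely a sensible summary
status_f <- factor(status, levels = c(0, 1), labels = c("dead", "alive"))
table(status_f)       # dead 2, alive 3
```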
To attach or not to attach… that is the question
Some like to use attach() to make dataframe variables accessible by name within the R session.
Generally, attach() is frowned upon by R junkies: it puts a copy of the data on the search path, so later changes to the dataframe are not seen, and its column names can mask other objects with the same names. Instead, use dataframe$y, or data = dataframe, or dataframe[, "y"], or dataframe[, 2].
To detach the object, use detach().
I recommend: do not use attach, but do what you want.

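The alternatives to attach() listed above each fit in one line. This sketch uses the built-in `iris` data; the linear model is only there to show the `data =` argument and is not an analysis from the slides.

```r
# Three ways to reach a column without attach():
mean(iris$Sepal.Length)          # dataframe$y
mean(iris[, "Sepal.Length"])     # dataframe[, "y"]
mean(iris[, 1])                  # dataframe[, column_number]

# Modelling functions take data = so column names resolve inside it.
fit <- lm(Sepal.Length ~ Petal.Length, data = iris)
coef(fit)

# with() evaluates one expression using the dataframe's columns and
# leaves nothing attached on the search path afterwards.
ratio <- with(iris, mean(Sepal.Length / Sepal.Width))
```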
R packages
3,262 packages (as of September 2011)!!!!
Packages are extensions written by anyone for any purpose. Install a package once with install.packages("packagename"), then load it in each session with require(packagename) or library(packagename).
Use ?functionname for help on any function in base R or in a loaded R package
In RStudio, just press Tab inside the parentheses after a function name to see the function’s options!!!
Explore packages at the CRAN site: http://cran.r-project.org/web/packages/
Inside-R package reference: http://www.inside-r.org/packages

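The practical difference between the two loading functions can be shown without installing anything. In this sketch, `notARealPackage123` is a deliberately fake package name, and the install lines are left as comments because they need a network connection.

```r
# install.packages() downloads a package once per R installation, e.g.:
# install.packages("plyr")

# library() stops with an error if the package is missing;
# require() only warns and returns FALSE, which is handy inside if().
pkg <- "notARealPackage123"   # hypothetical name, deliberately not installed
loaded <- suppressWarnings(require(pkg, character.only = TRUE))
loaded                        # FALSE

lib_result <- tryCatch(
  library(pkg, character.only = TRUE),
  error = function(e) "library() failed"
)

# A common pattern: install only when the package is not yet available.
# if (!requireNamespace("plyr", quietly = TRUE)) install.packages("plyr")

# Help for any function in base R or a loaded package:
# ?read.csv   or   help("read.csv")
```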
Data manipulation
Packages: plyr, data.table, doBy, sqldf, reshape2, and more
Comparison of packages (modified from code on the Recipes, scripts and Genomics blog): https://gist.github.com/878919
In that comparison, data.table is by far the fastest!!! BUT plyr may win on ease of use and flexibility? See for yourself…
Also, see the examples in the tutorial code for the reshape2 package for neat data manipulation tricks

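For a feel of what these packages do, here is one group-wise summary (mean sepal length per species in `iris`) computed with base R's aggregate(). The plyr, data.table, and reshape2 lines are commented sketches of roughly equivalent calls, not run here and not taken from the comparison gist; each needs its package installed.

```r
# Mean sepal length per species with base R's aggregate().
by_species <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)
by_species

# The same split-apply-combine with tapply(), returning a named vector.
tapply(iris$Sepal.Length, iris$Species, mean)

# Sketches of similar calls in the packages above (not run here):
# library(plyr);       ddply(iris, "Species", summarise, mean_sl = mean(Sepal.Length))
# library(data.table); as.data.table(iris)[, list(mean_sl = mean(Sepal.Length)), by = Species]
# library(reshape2);   melt(iris, id.vars = "Species")   # wide to long format
```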
Visualizations
A few different approaches:
Base graphics
Lattice graphics
Grid graphics
ggplot2 graphics
Further reading: http://www.slideshare.net/dataspora/a-survey-of-r-graphics
An example: [figure]

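As a small illustration of the first and last approaches, the sketch below draws a scatterplot of the built-in `iris` data with base graphics and shows an equivalent ggplot2 call in comments. It writes to a temporary PDF so it runs without a screen; the plot is an illustration, not the example figure from the slide.

```r
# Draw to a temporary PDF so the example runs without a screen.
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)

# Base graphics: one call draws the scatterplot, colored by species.
plot(Petal.Length ~ Sepal.Length, data = iris,
     col = as.integer(iris$Species), pch = 19,
     xlab = "Sepal length (cm)", ylab = "Petal length (cm)")
legend("topleft", legend = levels(iris$Species), col = 1:3, pch = 19)

invisible(dev.off())   # close the PDF device and write the file

# The same plot in ggplot2 (not run here; needs the package installed):
# library(ggplot2)
# ggplot(iris, aes(Sepal.Length, Petal.Length, colour = Species)) + geom_point()
```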
More on ggplot2 graphics
There are classes taught by Hadley Wickham here at Rice if you want to learn more!
Data visualization (Stat645): http://had.co.nz/stat645/
Statistical computing (Stat405): http://had.co.nz/stat405/
Hadley’s website is really helpful: http://had.co.nz/ggplot2/
The ggplot2 Google Groups site: https://groups.google.com/forum/#!forum/ggplot2

QUICK RSTUDIO RUN-THROUGH
Keyboard shortcuts!!
http://www.rstudio.org/docs/using/keyboard_shortcuts

USE CASE HERE
[see intro_usecase.R file]

Editor's Notes

  • #12 (Bringing data into R): header = TRUE means the first row contains variable names
  • #14 (Exploring dataframes): some numbers are actually factors; think of 0/1 for dead/alive, or zip codes (what would an average zip code mean?)