David Chiu
R Language Tutorial
4/23/2013 Confidential | Copyright 2013 Trend Micro Inc.
Background of R
What is R?
• GNU project implementing the S language, which John Chambers and colleagues developed at Bell Labs; R itself was created by Ross Ihaka and Robert Gentleman at the University of Auckland
• Free software environment for statistical computing and graphics
• Functional programming language; the R system is written primarily in C, Fortran, and R
R Language
• R is a functional programming language
• R is an interpreted language
• R is an object-oriented language
Why Use R
• Statistical analysis on the fly
• Mathematical functions and graphics modules built in
• FREE! & Open Source!
– http://cran.r-project.org/src/base/
Kaggle
http://www.kaggle.com/
R is the language most widely used by Kaggle participants
Data Scientists at These Companies Use R
What is your programming language of
choice, R, Python or something else?
“I use R, and occasionally matlab, for data analysis. There is
a large, active and extremely knowledgeable R community at
Google.”
http://simplystatistics.org/2013/02/15/interview-with-nick-chamandy-statistician-at-google/
“Expert knowledge of SAS (With Enterprise
Guide/Miner) required and candidates with
strong knowledge of R will be preferred”
http://www.kdnuggets.com/jobs/13/03-29-apple-sr-data-scientist.html?utm_source=twitterfeed&utm_medium=facebook&utm_campaign=tfb&utm_content=FaceBook&utm_term=analytics#.UVXibgXOpfc.facebook
Commercial support for R
• In 2007, Revolution Analytics began to provide commercial support for Revolution R
– http://www.revolutionanalytics.com/products/revolution-r.php
– http://www.revolutionanalytics.com/why-revolution-r/which-r-is-right-for-me.php
• Oracle Big Data Appliance integrates R, Apache Hadoop, Oracle Enterprise Linux, and a NoSQL database with the Exadata hardware
– http://www.oracle.com/us/products/database/big-data-appliance/overview/index.html
Revolution R
• The Community version is free
– http://www.revolutionanalytics.com/downloads/
– http://www.revolutionanalytics.com/why-revolution-r/benchmarks.php
Benchmark            Base R 2.14.2 (64-bit)   Revolution R (1-core)   Revolution R (4-core)   Speedup (4-core)
Matrix Calculation   17.4 sec                 2.9 sec                 2.0 sec                 7.9x
Matrix Functions     10.3 sec                 2.0 sec                 1.2 sec                 7.8x
Program Control      2.7 sec                  2.7 sec                 2.7 sec                 Not appreciable
IDE
RStudio
• http://www.rstudio.com/
RGUI
• http://www.r-project.org/
Web App Development
Shiny makes it super simple for R users like you to turn
analyses into interactive web applications that anyone
can use
http://www.rstudio.com/shiny/
Package Management
• CRAN (Comprehensive R Archive Network)
Repository     URL
CRAN           http://cran.r-project.org/web/packages/
Bioconductor   http://www.bioconductor.org/packages/release/Software.html
R-Forge        http://r-forge.r-project.org/
R Basic
Basic Command
• help()
– help(demo)
• demo()
– demo(is.things)
• q()
• ls()
• rm()
– rm(x)
Basic Object
• Vector
• List
• Factor
• Array
• Matrix
• Data Frame
Objects & Arithmetic
• Scalar
– x=3; y<-5; x+y
• Vectors
– x = c(1,2,3,7); y = c(2,3,5,1); x+y; x*y; x-y; x/y
– x = seq(1,10); y = 2:11; x+y
– x = seq(1,10,by=2); y = seq(1,10,length.out=2)
– rep(c(5,8), 3)
– x= c(1,2,3); length(x)
Summaries and Subscripting
• Summary
– x = c(1,2,3,4,5,6,7,8,9,10)
– mean(x); min(x); median(x); max(x); var(x)
– summary(x)
• Subscripting
– x = c(1,2,3,4,5,6,7,8,9,10)
– x[1:3]; x[c(1,3,5)];
– x[c(1,3,5)] * 2 + x[c(2,2,2)]
– x[-(1:6)]
Lists
• Contain a heterogeneous selection of objects
– e <- list(thing="hat", size="8.25"); e
– l <- list(a=1,b=2,c=3,d=4,e=5,f=6,g=7,h=8,i=9,j=10)
– l$j
– man = list(name="Qoo", height=183); man$name
Factor
• Ordered collection of items used to represent categorical values
• The different values that a factor can take are called levels
• Factors
– phone = factor(c('iphone', 'htc', 'iphone', 'samsung', 'iphone', 'samsung'))
– levels(phone)
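As a minimal sketch of how levels behave, the snippet below rebuilds the `phone` factor from the slide and inspects it with base R functions. The variable names `phone.levels`, `phone.codes`, and `phone.counts` are illustrative and not part of the original deck.

```r
# Rebuild the factor from the slide
phone <- factor(c("iphone", "htc", "iphone", "samsung", "iphone", "samsung"))

# Levels are the distinct values, sorted alphabetically by default
phone.levels <- levels(phone)

# Internally each element is stored as an integer index into the levels
phone.codes <- as.integer(phone)

# table() counts how often each level occurs
phone.counts <- table(phone)
print(phone.counts)
```

Storing integer codes plus a small level table is what lets modelling functions treat the column as categorical rather than as free text.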
Matrices & Array
• Array
– An extension of a vector to multiple dimensions, set through the dim argument
– a <- array(c(1,2,3,4,5,6,7,8,9,10,11,12),dim=c(3,4))
• Matrices
– A vector extended to two dimensions (a 2-D array)
– x = c(1,2,3); y = c(4,5,6); rbind(x,y); cbind(x,y)
– x = rbind(c(1,2,3),c(4,5,6)); dim(x)
– x <- matrix(c(1,2,3,4,5,6),nrow=3)
– x <- matrix(c(1,2,3,4,5,6),nrow=3,byrow=TRUE)
– x <- matrix(c(1,2,3,4),nrow=2); y <- matrix(c(5,6),nrow=2); x %*% y
– t(matrix(c(1,2,3,4),nrow=2))
– solve(matrix(c(1,2,3,4),nrow=2))
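To make the last two calls concrete, here is a small sketch that checks what `solve()` returns for the same 2x2 matrix. The names `m`, `m.inv`, `identity.check`, and `sol` are illustrative only.

```r
# matrix() fills column by column, so m has rows (1, 3) and (2, 4)
m <- matrix(c(1, 2, 3, 4), nrow = 2)

# With one argument, solve() returns the inverse of the matrix
m.inv <- solve(m)

# Multiplying the inverse by the original should give the identity matrix
identity.check <- m.inv %*% m

# With two arguments, solve(m, b) solves the linear system m %*% sol = b
sol <- solve(m, c(5, 6))
print(identity.check)
print(sol)
```

For a linear system, calling `solve(m, b)` directly is usually preferable to computing the inverse first and multiplying.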
Data Frame
• Useful way to represent tabular data
• Essentially a matrix with named columns; columns may also hold non-numerical variables
• Example
– df = data.frame(a=c(1,2,3,4,5),b=c(2,3,4,5,6));df
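The slide's example uses only numeric columns. The sketch below, built on made-up data, shows a data frame that mixes a character column with a numeric one and is then filtered by a condition. The column names and values are hypothetical.

```r
# Hypothetical data: one character column and one numeric column
students <- data.frame(
  name  = c("david", "andy", "qoo"),
  score = c(80, 90, 75),
  stringsAsFactors = FALSE   # keep names as character, not factor
)

# str() summarizes the type of every column
str(students)

# Rows are selected with a logical condition, columns with $
high <- students[students$score > 78, ]
mean.score <- mean(students$score)
print(high)
```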
Function
• Function
– `%myop%` <- function(a, b) {2*a + 2*b}; 1 %myop% 1
– f <- function(x) {return(x^2 + 3)}
create.vector.of.ones <- function(n) {
  return.vector <- numeric(n)   # preallocate instead of growing from NA
  for (i in seq_len(n)) {       # seq_len(n) is empty when n is 0, unlike 1:n
    return.vector[i] <- 1
  }
  return.vector                 # the last expression is the return value
}
– create.vector.of.ones(3)
• Control Structures
– if ... else ...
– repeat, for, while
• Catch errors with tryCatch()
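The control structures listed above have no example on the slide, so here is a minimal sketch of each in base R. The function names `sign.label` and `safe.log` are hypothetical helpers introduced only for illustration.

```r
# if ... else: the value of the chosen branch is the value of the function
sign.label <- function(x) {
  if (x > 0) {
    "positive"
  } else if (x < 0) {
    "negative"
  } else {
    "zero"
  }
}

# for: sum the integers 1 to 5
total <- 0
for (i in 1:5) total <- total + i

# while: find the smallest n such that 2^n is at least 100
n <- 0
while (2^n < 100) n <- n + 1

# repeat: loops until break is called
count <- 0
repeat {
  count <- count + 1
  if (count >= 3) break
}

# tryCatch: turn warnings and errors into a missing value
safe.log <- function(x) {
  tryCatch(
    log(x),
    warning = function(w) NA_real_,  # e.g. log(-1) warns "NaNs produced"
    error   = function(e) NA_real_   # e.g. log("a") is an error
  )
}
```

Note that the function name is case-sensitive: it is `tryCatch`, not `trycatch`.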
Anonymous Function
• A characteristic of functional languages: functions are values that can be passed to other functions
– apply.to.three <- function(f) {f(3)}
– apply.to.three(function(x) {x * 7})
Objects and Classes
• All R code manipulates objects.
• Every object in R has a type
• In assignment statements, R copies the object, not just the reference to the object (internally the copy is deferred until one of the objects is modified)
• Attributes
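A short sketch of what copy semantics mean in practice: changing a copy, or an argument inside a function, leaves the original untouched. The names `a`, `b`, and `set.first` are illustrative only.

```r
a <- c(1, 2, 3)
b <- a          # b behaves as an independent copy of a
b[1] <- 100     # modifying b does not touch a

# The same holds for function arguments: v is a local copy
set.first <- function(v) {
  v[1] <- 0
  v
}
changed <- set.first(a)

print(a)        # still the original values
print(b)
print(changed)
```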
S3 & S4 Object
• Many R functions are implemented using S3 methods
• In S version 4 (hence S4), formal classes and methods were introduced that allow
– Method dispatch on multiple arguments
– Abstract types
– Inheritance
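For contrast with the S4 material, the following is a minimal sketch of S3 dispatch, where a generic picks a method from the object's class attribute. The class name `student` and the generic `describe` are hypothetical and chosen only for this illustration.

```r
# An S3 generic simply calls UseMethod()
describe <- function(obj, ...) UseMethod("describe")

# Fallback used when no class-specific method exists
describe.default <- function(obj, ...) "generic object"

# Method for objects whose class attribute is "student"
describe.student <- function(obj, ...) {
  paste("student", obj$name, "scored", obj$score)
}

# An S3 object is an ordinary list with a class attribute
s <- structure(list(name = "david", score = 80), class = "student")

print(describe(s))
print(describe(42))
```

S3 has no formal class definition, so nothing checks that `s` really has `name` and `score` fields; S4 adds that checking.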
OOP of S4
• S4 OOP Example
– setClass("Student", representation(name = "character", score = "numeric"))
– studenta = new("Student", name="david", score=80)
– studentb = new("Student", name="andy", score=90)
setMethod("show", signature("Student"),
  function(object) {
    cat(object@score + 100, "\n")
  })
– setGeneric("getscore", function(object) standardGeneric("getscore"))
– studenta
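The slide declares the `getscore` generic but does not show a method for it. One plausible way to complete it is sketched below; the method body (returning the `score` slot) is an assumption, not something the original deck states.

```r
library(methods)  # normally attached already; loaded explicitly for scripts

setClass("Student", representation(name = "character", score = "numeric"))
setGeneric("getscore", function(object) standardGeneric("getscore"))

# Assumed method: simply return the score slot
setMethod("getscore", signature("Student"), function(object) object@score)

studenta <- new("Student", name = "david", score = 80)
print(getscore(studenta))

# Slots are type-checked: a numeric name is rejected when the object is created
bad <- try(new("Student", name = 1, score = 80), silent = TRUE)
print(inherits(bad, "try-error"))
```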
Packages
• A package is a related set of functions, help files, and
data files that have been bundled together.
• Basic Command
– library(rpart)
– Install from CRAN: install.packages("rpart")
– List attached packages: (.packages())
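A common pattern is to install a package only when it is missing and then attach it. The helper below, `load.or.install`, is a hypothetical name introduced for this sketch. It is demonstrated with `splines` only because that package ships with every R installation; `rpart` or any CRAN package name could be substituted.

```r
# Hypothetical helper: install a package if needed, then attach it
load.or.install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # needs network access and a configured CRAN mirror
  }
  library(pkg, character.only = TRUE)
}

load.or.install("splines")

# (.packages()) returns the names of the currently attached packages
attached <- (.packages())
print("splines" %in% attached)
```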
Packages Used in Machine Learning for Hackers
Apply
• Apply
– Returns a vector or array or list of values obtained by applying a
function to margins of an array or matrix.
– data <- cbind(c(1,2),c(3,4))
– data.rowsum <- apply(data,1,sum)
– data.colsum <- apply(data,2,sum)
– data
Apply
• lapply
– returns a list of the same length as X, each element of which is
the result of applying FUN to the corresponding element of X.
• sapply
– is a user-friendly version and wrapper of lapply, by default returning a vector, matrix or, if simplify = "array", an array if appropriate.
• vapply
– is similar to sapply, but has a pre-specified type of return
value, so it can be safer (and sometimes faster) to use.
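The difference between the three functions is easiest to see on the same input. This is a minimal sketch with made-up data; the names `words`, `len.list`, `len.vec`, and `len.safe` are illustrative.

```r
words <- list(a = "hat", b = "iphone", c = "qoo")

# lapply always returns a list
len.list <- lapply(words, nchar)

# sapply simplifies the result to a named vector when it can
len.vec <- sapply(words, nchar)

# vapply requires the expected type and length of each result up front
len.safe <- vapply(words, nchar, integer(1))

# On empty input sapply falls back to a list, while vapply keeps its type
empty.sapply <- sapply(list(), nchar)
empty.vapply <- vapply(list(), nchar, integer(1))

print(len.vec)
```

The stable return type on empty or unexpected input is the main reason `vapply` is described as the safer choice.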
File IO
• Save and Load
– x = USPersonalExpenditure
– save(x, file="~/test.RData")
– rm(x)
– load("~/test.RData")
– x
Charts and Graphics
Plotting Example
– xrange = range(as.numeric(colnames(USPersonalExpenditure)))
– yrange = range(USPersonalExpenditure)
– plot(xrange, yrange, type="n", xlab="Year", ylab="Category")
– for (i in 1:5) {
  lines(as.numeric(colnames(USPersonalExpenditure)), USPersonalExpenditure[i,], type="b", lwd=1.5)
}
IRIS Dataset
• data()
IRIS Dataset
• The Iris flower data set, or Fisher's Iris data set, is a multivariate data set introduced by Sir Ronald Fisher (1936) as an example of discriminant analysis. It is sometimes called Anderson's Iris data set
– http://en.wikipedia.org/wiki/Iris_flower_data_set
Species: Iris setosa, Iris versicolor, Iris virginica
Classification of IRIS
• Classification Example
– install.packages("e1071"); library(e1071)
– pairs(iris[1:4], main="Iris Data (red=setosa,green=versicolor,blue=virginica)", pch=21, bg=c("red","green3","blue")[unclass(iris$Species)])
– classifier <- naiveBayes(iris[,1:4], iris[,5])
– table(predict(classifier, iris[,-5]), iris[,5])
– classifier <- svm(iris[,1:4], iris[,5])
– table(predict(classifier, iris[,-5]), iris[,5])
– prediction = predict(classifier, iris[,1:4])
• http://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Classification/Na%C3%AFve_Bayes
Performance Tips
• Use Built-in Math Functions
• Use Environments for Lookup Tables
• Use a Database to Query Large Data Sets
• Preallocate Memory
• Monitor How Much Memory You Are Using
• Cleaning Up Objects
• Functions for Big Data Sets
• Parallel Computation with R
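Two of the tips above, preallocating memory and using environments as lookup tables, can be sketched in a few lines of base R. All function and variable names here (`grow`, `prealloc`, `vectorized`, `lookup`) are hypothetical, and no timings are claimed; `system.time()` can be used to compare them on a given machine.

```r
# Growing a vector reallocates it on every iteration
grow <- function(n) {
  v <- c()
  for (i in seq_len(n)) v <- c(v, i^2)
  v
}

# Preallocating reserves the memory once and fills it in place
prealloc <- function(n) {
  v <- numeric(n)
  for (i in seq_len(n)) v[i] <- i^2
  v
}

# A built-in vectorized operation avoids the loop entirely
vectorized <- function(n) seq_len(n)^2

# An environment works as a hashed key-value lookup table
lookup <- new.env(hash = TRUE)
assign("iphone", 1, envir = lookup)
lookup[["htc"]] <- 2
iphone.value <- get("iphone", envir = lookup)
has.samsung <- exists("samsung", envir = lookup, inherits = FALSE)

# object.size() reports how much memory an object uses
print(object.size(prealloc(1000)))
```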
R for Machine Learning
Getting Help on a Topic
• ?read.delim
– # Access a function's help file
• ??base::delim
– # Search for 'delim' in all help files for functions in 'base'
• help.search("delimited")
– # Search for 'delimited' in all help files
• RSiteSearch("parsing text")
– # Search for the term 'parsing text' on the R site.
Sample Code of Chapter 1
• https://github.com/johnmyleswhite/ML_for_Hackers.git
Reference & Resource
Study Material
• R in a Nutshell
Online Reference
Community Resources for R help
Resource
• Websites
– Stack Overflow
– Cross Validated
– R-help
– R-devel
– R-sig-*
– Package-specific mailing list
• Blog
– R-bloggers
• Twitter
– https://twitter.com/#rstats
• Quora
– http://www.quora.com/R-software
Resource (Cont'd)
• Conference
– useR!
– R in Finance
– R in Insurance
– Others
– Joint Statistical Meetings
– Royal Statistical Society Conference
• Local User Group
– http://blog.revolutionanalytics.com/local-r-groups.html
• Taiwan R User Group
– http://www.facebook.com/Tw.R.User
– http://www.meetup.com/Taiwan-R/
Thank You!