Apply Machine Learning to Your Data
Starting with R
@ COSCUP 2013
David Chiu
About Me
Trend Micro
Taiwan R User Group
ywchiu-tw.appspot.com
Big Data Era
Quick analysis, finding meaning beneath data.
Data Analysis
1. Preparing the data (Munging)
2. Running the model (Analysis)
3. Interpreting the result
Machine Learning
Black-box, algorithmic approach to producing predictions or
classifications from data
A computer program is said to learn from
experience E with respect to some task T and some
performance measure P, if its performance on T, as
measured by P, improves with experience E
Tom Mitchell (1998)
Using R to do Machine Learning
Using R
Why Use R?
1. Statistical analysis on the fly
2. Built-in mathematical functions and graphics modules
3. FREE! & Open Source!
Applications of Machine Learning
1. Recommender systems
2. Pattern Recognition
3. Stock market analysis
4. Natural language processing
5. Information Retrieval
Facial Recognition
Topics of Machine Learning
Supervised Learning
Regression
Classification
Unsupervised Learning
Dimension Reduction
Clustering
Regression
Predict one set of numbers given another set of numbers
Given the number of friends x, predict how many likes ("goods" in the
dataset) I will receive on each Facebook post
Scatter Plot
dataset <- read.csv('fbgood.txt', header=TRUE, sep='\t', row.names=1)
x = dataset$friends
y = dataset$getgoods
plot(x,y)
Linear Fit
fit <- lm(y ~ x);
abline(fit, col = 'red', lwd=3)
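Once the line is fitted, it can also be used to predict new values. The following is a minimal sketch, assuming synthetic data in place of fbgood.txt (which is not included with the slides). The names x_demo, y_demo, fit_demo and the simulated coefficients are made up for illustration.

```r
# Synthetic stand-in for the friends/likes data (assumed, not the author's data)
set.seed(42)
x_demo <- round(runif(50, min = 50, max = 500))   # number of friends
y_demo <- 2 + 0.05 * x_demo + rnorm(50, sd = 2)   # likes per post, plus noise

fit_demo <- lm(y_demo ~ x_demo)
coef(fit_demo)                 # estimated intercept and slope
summary(fit_demo)$r.squared    # share of variance explained by the line

# Predict likes for new friend counts; newdata must reuse the predictor's name
new_friends <- data.frame(x_demo = c(100, 300))
pred_demo <- predict(fit_demo, newdata = new_friends)
pred_demo
```

Note that predict() looks up the predictor by name, so the column in newdata has to match the variable used in the formula.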
2nd order polynomial fit
plot(x,y)
polyfit2 <- lm(y ~ poly(x, 2))
# fitted() returns the fitted values; reorder them by x so the curve is drawn left to right
lines(sort(x), fitted(polyfit2)[order(x)], col = 2, lwd = 3)
3rd order polynomial fit
plot(x,y)
polyfit3 <- lm(y ~ poly(x, 3))
# fitted() returns the fitted values; reorder them by x so the curve is drawn left to right
lines(sort(x), fitted(polyfit3)[order(x)], col = 2, lwd = 3)
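One way to decide between the linear, 2nd order and 3rd order fits is to compare them numerically rather than by eye. The sketch below assumes synthetic data generated from a quadratic curve (the slides' data file is not available); fit1, fit2, fit3 and aic_demo are illustrative names.

```r
# Synthetic data whose true shape is quadratic (an assumption for this demo)
set.seed(1)
x_demo <- seq(0, 10, length.out = 60)
y_demo <- 1 + 0.5 * x_demo - 0.3 * x_demo^2 + rnorm(60, sd = 1)

fit1 <- lm(y_demo ~ poly(x_demo, 1))
fit2 <- lm(y_demo ~ poly(x_demo, 2))
fit3 <- lm(y_demo ~ poly(x_demo, 3))

# F-tests between nested models: does each extra degree
# reduce the residual sum of squares significantly?
anova(fit1, fit2, fit3)

# AIC penalises extra parameters; lower values are preferred
aic_demo <- AIC(fit1, fit2, fit3)
aic_demo
```

A higher degree never increases the residual error on the data used for fitting, so a penalised criterion such as AIC, or a held-out test set, is a more honest basis for comparison.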
Other Regression Packages
MASS (rlm) - Robust Regression
glm - Generalized Linear Models
gam - Generalized Additive Models
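As an illustration of one of these alternatives, here is a minimal sketch of a generalized linear model using glm() from the stats package. It assumes the response is a count (likes per post), so a Poisson family is used; the data are simulated and the names friends, likes and glm_demo are hypothetical.

```r
# Simulated count data (assumed): likes grow with the number of friends
set.seed(7)
friends <- round(runif(100, 50, 500))
likes <- rpois(100, lambda = exp(0.5 + 0.004 * friends))

# Poisson regression with a log link models the expected count
glm_demo <- glm(likes ~ friends, family = poisson(link = "log"))
coef(glm_demo)

# type = "response" returns the prediction on the count scale, not the log scale
predict(glm_demo, newdata = data.frame(friends = 200), type = "response")
```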
Classification
Identifying to which of a set of categories a new observation belongs,
on the basis of a training set of data
Given the features of a bank customer, predict whether
the client will subscribe to a term deposit
Data Description
Features:
age,job,marital,education,default,balance,housing,loan,contact
Labels:
Whether the customer subscribes to a term deposit (Yes/No)
Classify Data With LibSVM
library(e1071)
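# stringsAsFactors=TRUE keeps the label y a factor (needed for classification in R >= 4.0)
dataset <- read.csv('bank.csv', header=TRUE, sep=';', stringsAsFactors=TRUE)
# split.data() is not part of base R or e1071; it is assumed to be a user-defined
# helper that returns list(train=..., test=...) with 70% of the rows in train
dati <- split.data(dataset, p = 0.7)
train <- dati$train
test <- dati$test
# y is the last column; fit on all other columns and keep class probabilities
model <- svm(y ~ ., data = train, probability = TRUE)
pred <- predict(model, test[, -ncol(test)], probability = TRUE)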
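The slides do not show where split.data comes from. A minimal sketch of what such a helper might look like is given below; this is an assumption, not the author's implementation, and the seed argument is added only to make the demo reproducible. The built-in iris data stands in for bank.csv.

```r
# Hypothetical helper: random train/test split with proportion p in the training set
split.data <- function(data, p = 0.7, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  n <- nrow(data)
  train_idx <- sample(n, size = round(p * n))
  list(train = data[train_idx, , drop = FALSE],
       test  = data[-train_idx, , drop = FALSE])
}

# Demo on the built-in iris data (150 rows)
dati_demo <- split.data(iris, p = 0.7, seed = 123)
nrow(dati_demo$train)
nrow(dati_demo$test)
```

Sampling row indices without replacement guarantees that no observation ends up in both sets.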
Verify the predictions
table(pred,test[,dim(test)[2]])
pred    no  yes
  no  1183   99
  yes   27   47
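The table above can be summarized with a few standard metrics. The sketch below re-enters the four counts from the slide by hand (rows are predictions, columns are actual labels); the variable names are illustrative.

```r
# Confusion matrix from the slide: rows = predicted, columns = actual
cm <- matrix(c(1183, 27, 99, 47), nrow = 2,
             dimnames = list(pred = c("no", "yes"), actual = c("no", "yes")))

accuracy  <- sum(diag(cm)) / sum(cm)               # correct predictions / all cases
precision <- cm["yes", "yes"] / sum(cm["yes", ])   # predicted "yes" that are truly "yes"
recall    <- cm["yes", "yes"] / sum(cm[, "yes"])   # actual "yes" that were found
baseline  <- sum(cm[, "no"]) / sum(cm)             # accuracy of always answering "no"

round(c(accuracy = accuracy, precision = precision,
        recall = recall, baseline = baseline), 3)
```

Accuracy is about 0.91, but always predicting "no" would already reach about 0.89, and recall on the "yes" class is only about 0.32. With imbalanced classes like these, accuracy alone is a weak measure, which is why the ROC analysis is useful.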
Using ROC for assessment
library(ROCR)
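pred.prob <- attr(pred, "probabilities")
# Select the positive class by name; the column order of the probabilities is not guaranteed
pred.to.roc <- pred.prob[, "yes"]
pred.rocr <- prediction(pred.to.roc, as.factor(test[, ncol(test)]))
# Area under the ROC curve
perf.rocr <- performance(pred.rocr, measure = "auc", x.measure = "cutoff")
# True positive rate against false positive rate
perf.tpr.rocr <- performance(pred.rocr, "tpr", "fpr")
plot(perf.tpr.rocr, colorize = TRUE, main = paste("AUC:", round(unlist(perf.rocr@y.values), 3)))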
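To make the AUC number less of a black box: it equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes this in base R from ranks, without ROCR; auc_rank and the toy scores are hypothetical and only meant to illustrate the idea.

```r
# Rank-based AUC: chance that a random positive outscores a random negative
auc_rank <- function(scores, labels, positive = "yes") {
  is_pos <- labels == positive
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  r <- rank(scores)   # average ranks, so ties count as half
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy example: 3 positives, 3 negatives
scores_demo <- c(0.9, 0.8, 0.35, 0.6, 0.2, 0.1)
labels_demo <- c("yes", "no", "yes", "yes", "no", "no")
auc_demo <- auc_rank(scores_demo, labels_demo)
auc_demo   # 7 of the 9 positive/negative pairs are ordered correctly
```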
Then, get your thesis
Support Vector Machines and
Kernel Methods
e1071 - LIBSVM
kernlab - SVM, RVM and other kernel learning algorithms
klaR - SVMlight
rdetools - Model selection and prediction
Dimension Reduction
Seeks linear combinations of the columns of X with maximal variance
Calculate a new index that measures the economy
of each city/county in Taiwan
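Economic Index of Taiwan
County/City
Sales revenue of profit-seeking enterprises
Economic development expenditure as a share of annual expenditure
Average disposable income per income earner
Source: CommonWealth Magazine (天下雜誌), 2012 Happy City Survey, Issue 505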
Component Bar Plot
dataset <- read.csv('eco_index.csv', header=TRUE, sep=',', row.names=1)
pc.cr <- princomp(dataset, cor = TRUE)
plot(pc.cr)
Component Line Plot
screeplot(pc.cr, type="lines")
abline(h=1, lty=3)
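The dotted line at h=1 corresponds to a common rule of thumb: with cor = TRUE each original variable has variance 1, so a component with an eigenvalue below 1 explains less than a single variable does. A minimal sketch of the numbers behind the scree plot, assuming the built-in USArrests data as a stand-in for eco_index.csv:

```r
# PCA on the correlation matrix of a built-in dataset (stand-in for eco_index.csv)
pc_demo <- princomp(USArrests, cor = TRUE)

eig <- pc_demo$sdev^2          # eigenvalues = variances of the components
prop_var <- eig / sum(eig)     # proportion of variance per component
round(cumsum(prop_var), 3)     # cumulative proportion

# Number of components kept by the "eigenvalue > 1" rule
sum(eig > 1)
```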
PCA biplot
biplot(pc.cr)
PCA barplot
barplot(sort(-pc.cr$scores[,1], TRUE))
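The bar plot uses the first component's scores as a single index. The sign of a principal component is arbitrary, which is presumably why the scores are negated above; checking the loadings shows which direction means "higher". A small sketch, again assuming USArrests in place of the economic data:

```r
pc_demo <- princomp(USArrests, cor = TRUE)

# Weight of each original variable in the first component;
# if all weights are negative, flip the sign of the scores before ranking
loadings(pc_demo)[, 1]

# Rank the observations by their first-component score
index_demo <- sort(pc_demo$scores[, 1], decreasing = TRUE)
head(index_demo, 5)
```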
Other Dimension Reduction Packages
kernlab (kpca) - Kernel PCA
cmdscale - Multidimensional Scaling
svd - Singular Value Decomposition
fastICA - Independent Component Analysis
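As a quick illustration of one of these, classical multidimensional scaling takes a distance matrix rather than raw features. A minimal sketch using cmdscale() from the stats package on the built-in eurodist data (road distances between European cities); the choice of dataset is an assumption for the demo.

```r
# Classical MDS: place 21 cities in 2 dimensions so that
# Euclidean distances approximate the road distances
mds_demo <- cmdscale(eurodist, k = 2)
dim(mds_demo)
head(mds_demo, 3)

# The resulting coordinates can be plotted like any 2-D data
plot(mds_demo, type = "n", xlab = "Dimension 1", ylab = "Dimension 2")
text(mds_demo, labels = rownames(mds_demo), cex = 0.7)
```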
Clustering
Birds of a feather flock together
Segment customers based on existing features
Customer Segmentation
Clustering by 4 features
Visit Time
Average Expense
Loyalty Days
Age
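Determining Clusters
mydata <- read.csv('costumer_segment.txt', header=TRUE, sep='\t')
# Standardize the features so that no single unit dominates the distances
mydata <- scale(mydata)
d <- dist(mydata, method = "euclidean")
# "ward" was renamed "ward.D" in R 3.1.0; "ward.D2" implements Ward's criterion on unsquared distances
fit <- hclust(d, method="ward.D")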
plot(fit)
Cutting trees
k1 = 4
groups <- cutree(fit, k=k1)
rect.hclust(fit, k=k1, border="red")
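The same steps can be tried end to end on a built-in dataset. The sketch below assumes iris as a stand-in for the customer file and uses k = 3 only because iris has three species to compare against; hc_demo and groups_demo are illustrative names.

```r
# Hierarchical clustering on the four numeric iris columns
iris_scaled <- scale(iris[, 1:4])
hc_demo <- hclust(dist(iris_scaled, method = "euclidean"), method = "ward.D")

# Cut the tree into 3 groups and compare with the known species
groups_demo <- cutree(hc_demo, k = 3)
table(groups_demo, iris$Species)
```

When true labels exist, as here, a cross-table is a quick sanity check; for real segmentation there is no such reference and the clusters have to be judged by how interpretable they are.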
Kmeans Clustering
fit <- kmeans(mydata, k1)
plot(mydata, col = fit$cluster)
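K-means needs the number of clusters up front and starts from random centers. A common heuristic is to plot the total within-cluster sum of squares against k and look for an "elbow". A minimal sketch, assuming the scaled iris data as a stand-in; nstart = 20 and the range 1:8 are arbitrary choices for the demo.

```r
set.seed(2013)   # k-means depends on random starting centers
iris_scaled <- scale(iris[, 1:4])

# Total within-cluster sum of squares for k = 1..8,
# keeping the best of 20 random starts for each k
wss <- sapply(1:8, function(k) {
  kmeans(iris_scaled, centers = k, nstart = 20)$tot.withinss
})

plot(1:8, wss, type = "b",
     xlab = "Number of clusters k", ylab = "Total within-cluster SS")
```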
Principal Component Plot
library(cluster)
clusplot(mydata, fit$cluster, color=TRUE, shade=TRUE, lines=0)
Other Clustering Packages
kernlab (specc) - Spectral Clustering
fpc - DBSCAN
Machine Learning Diagnostics
1. Get more training examples
2. Try smaller sets of features
3. Try getting additional features
4. Try adding polynomial features
5. Try increasing/decreasing model parameters
Overfitting
Training error is low but test error is high, e.g. $J_{\text{training}}(\theta) \ll J_{\text{test}}(\theta)$
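Overfitting can be observed directly by holding out part of the data and comparing errors as the model grows more flexible. The sketch below assumes simulated data (a noisy sine curve) and polynomial fits of increasing degree; the degrees, split size and helper name mse are arbitrary choices for illustration.

```r
set.seed(10)
n <- 40
x_demo <- runif(n, 0, 10)
y_demo <- sin(x_demo) + rnorm(n, sd = 0.3)

# Hold out 12 of the 40 points as a test set
train_idx <- sample(n, 28)
train_df <- data.frame(x = x_demo[train_idx], y = y_demo[train_idx])
test_df  <- data.frame(x = x_demo[-train_idx], y = y_demo[-train_idx])

# Mean squared error of a fitted model on a given data frame
mse <- function(fit, df) mean((df$y - predict(fit, newdata = df))^2)

# Training and test error for polynomial fits of increasing degree
errors <- t(sapply(c(1, 3, 5, 12), function(d) {
  fit <- lm(y ~ poly(x, d), data = train_df)
  c(degree = d, train = mse(fit, train_df), test = mse(fit, test_df))
}))
errors
```

Training error can only go down as the degree increases. Test error typically stops improving at some point and then grows, and that gap is the symptom described above.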
Use R for Data Analysis
THANK YOU
Please Come and Visit the Taiwan R User Group
