Practical data analysis with wine 
  
December 2014 
Toshifumi Kuga, CEO of TOSHI STATS SDN. BHD. 
beta version 
1
Today’s menu 
1. Formula for predicting wine prices 
2. Data handling (vectors & matrices) 
3. Linear regression model with R 
2
1. Formula for predicting wine prices 
The formula for predicting wine prices is public 
• Dr. Orley Ashenfelter 
• He is a professor of economics at Princeton University and was President of the American Economic Association in 2011 
• The formula was made public in 1990 
Data is available here: http://www.liquidasset.com/winedata.html 
3
Dr. Orley Ashenfelter’s formula 
wine price = -12.145 + 0.00117 × amount of winter rain + 0.6164 × average temperature - 0.00386 × amount of harvest rain + 0.02385 × years from the vintage to 1983 (1983 - vintage year) 
• parameters: θ = [-12.145, 0.00117, 0.6164, -0.00386, 0.02385] 
• input variables: X = [1, winter rain, average temperature, harvest rain, years to 1983] 
• wine price: Y = θ0 + θ1×X1 + θ2×X2 + θ3×X3 + θ4×X4 (the leading 1 in X multiplies θ0) 
• so the wine price can be written compactly as Y = θX (the inner product of θ and X) 
※ 'wine price': the log of the ratio of the vintage's average price to the average price of the 1961 vintage 
(The variables are simplified for the purpose of explanation.) 
4 
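The following minimal R sketch evaluates Y = θX for a single vintage. It makes two assumptions: the inputs come from the 1952 row of the data table (slide 6), and the DEGREES coefficient is 0.6164, the value lm reports on slide 31. The names theta and x_1952 are illustrative.

```r
# Parameters θ: intercept, winter rain, average temperature,
# harvest rain, years to 1983 (DEGREES coefficient taken from lm's output)
theta = c(-12.145, 0.00117, 0.6164, -0.00386, 0.02385)

# Input vector X for the 1952 vintage; the leading 1 multiplies the intercept
x_1952 = c(1, 600, 17.1167, 160, 31)

# Y = θX is an inner product: multiply element by element, then add up
pred_1952 = sum(theta * x_1952)
round(pred_1952, 3)
```

The predicted value is about -0.77. The observed LPRICE2 for 1952 is -0.99868, and the difference between the two is the prediction error.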
Steps for predicting the wine price 
• wine price: Y = θX 
• Y: the value to be predicted (here, the future wine price; unknown) 
• X: known values (for example, past temperatures are already known) 
• The parameters θ are unknown 
  → Once θ is obtained, the future wine price Y can be obtained too! 
• Past values of Y are also known (past wine prices are known) 
  → Past X and Y are available as matched pairs → θ can be estimated from them 
5 
Data used in the analysis (Y: value to be predicted; X1-X4: input variables) 
OBS  VINT  Y:LPRICE2  X1:WRAIN  X2:DEGREES  X3:HRAIN  X4:TIME_SV 
1    1952  -0.99868   600       17.1167     160       31 
2    1953  -0.4544    690       16.7333     80        30 
3    1954  NA         430       15.3833     180       29 
4    1955  -0.80796   502       17.15       130       28 
5    1956  NA         440       15.65       140       27 
…    …     …          …         …           …         … 
35   1986  NA         563       16.2833     171       -3 
36   1987  NA         452       16.9833     115       -4 
37   1988  NA         808       17.1        59        -5 
38   1989  NA         443       NA          82        -6 
NA: blank in the original table (no value) 
6 
How to obtain θ: the least squares method 
• Compare the predictions with the observed (past) values; θ is chosen so that the sum of the squared differences is minimized 
• There are programs (algorithms) that carry out this calculation automatically on a computer 
• In practice, we rarely calculate the parameters by hand (with real data, it is not practical to do so) 
7 
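This minimal sketch illustrates the least squares idea on hypothetical toy data, not the wine data. lm() returns the θ with the smallest sum of squared differences, so nudging either parameter away from that θ makes the sum larger. The helper name sse is illustrative.

```r
# Hypothetical toy data: one input x and an observed output y
x = c(1, 2, 3, 4, 5)
y = c(2.1, 3.9, 6.2, 7.8, 10.1)

# Sum of squared differences between predictions and observed values
sse = function(theta) {
  pred = theta[1] + theta[2] * x
  sum((pred - y)^2)
}

# lm() finds the θ (intercept, slope) that minimizes the sum of squares
fit = lm(y ~ x)
theta_hat = unname(coef(fit))

# Perturbing θ in any direction gives a larger sum of squares
sse(theta_hat)
sse(theta_hat + c(0.1, 0))
sse(theta_hat + c(0, 0.1))
```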
Parameter calculations by computer 
[Diagram: past values of X and Y → parameter calculation → θ → price prediction model Y = θX] 
8
θ and X are not “just a number” 
• They are collections of numbers 
• In math, they can be represented as vectors and matrices 
• Massive amounts of data can be represented easily with vectors and matrices! 
• Computers can handle data as vectors and matrices 
• Major programming languages such as R, MATLAB and Python can create vectors and matrices and manipulate them efficiently 
9 
Get familiar with vectors and matrices! 
• You can handle data as you like 
• You can write the programs yourself 
• It is the first step toward practical data analysis 
10 
2. Data handling (vectors & matrices) 
High-school math is important! 
• Mainly arithmetic is explained 
• Nothing more than +, -, ×, / 
• Practice by hand until you are familiar with vectors and matrices 
• Let us verify the results using R 
11
Vector: a single line of numbers 
• either horizontal (row) or vertical (column) 
  [1 3 7]   [5 13]   [1] 
                     [5] 
R code to verify the results: 
a=c(1,3,7) 
b=c(5,13) 
d=c(1,5)   # R vectors have no orientation, so the vertical vector is entered the same way 
12 
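R's c() makes vectors without a horizontal or vertical orientation. When the orientation matters, one option is to store the vector as a one-row or one-column matrix, as in the minimal sketch below. The names row_vec and col_vec are illustrative.

```r
# c() creates a plain vector: it has a length but no dimensions
a = c(1, 3, 7)
length(a)

# Store vectors as matrices to make the orientation explicit
row_vec = matrix(c(1, 3, 7), nrow = 1)   # horizontal: 1 x 3
col_vec = matrix(c(1, 5), ncol = 1)      # vertical:   2 x 1
dim(row_vec)
dim(col_vec)
```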
Vector: addition 
[1 3 7] + [2 4 1] = [3 7 8] 
[1]   [6]   [7] 
[5] + [2] = [7] 
a=c(1,3,7) 
b=c(2,4,1) 
a+b   # 3 7 8 
a=c(1,5) 
b=c(6,2) 
a+b   # 7 7 
13
Vector: subtraction 
[1 4 7] - [2 3 1] = [-1 1 6] 
[1]   [6]   [-5] 
[5] - [2] = [ 3] 
a=c(1,4,7) 
b=c(2,3,1) 
a-b   # -1 1 6 
a=c(1,5) 
b=c(6,2) 
a-b   # -5 3 
14
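Vector addition and subtraction work element by element, so both vectors should have the same length. In R, mismatched lengths do not stop the calculation. R "recycles" the shorter vector and gives only a warning. The minimal sketch below shows this pitfall and a guard against it. The helper add_vectors is hypothetical.

```r
a = c(1, 4, 7)
b = c(2, 3, 1)

# Element by element: (1 - 2, 4 - 3, 7 - 1)
a - b

# Subtraction is the same as adding (-1) times the other vector
all(a - b == a + (-1) * b)

# Lengths 3 and 2 do not match: R recycles the shorter vector to
# c(1, 5, 1) and only gives a warning, not an error
mismatch = suppressWarnings(c(1, 3, 7) + c(1, 5))
mismatch

# Hypothetical helper that refuses vectors of different lengths
add_vectors = function(u, v) {
  stopifnot(length(u) == length(v))
  u + v
}
add_vectors(c(1, 3, 7), c(2, 4, 1))
```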
Vector: scalar multiplication 
3 × [2 4 1] = [6 12 3] 
    [6]   [12] 
2 × [2] = [ 4] 
a=c(2,4,1) 
3*a   # 6 12 3 
b=c(6,2) 
2*b   # 12 4 
15
Vector: multiplication (inner product) 
          [3] 
[2 4 1] × [6] = 2×3 + 4×6 + 1×2 = 32 
          [2] 
a=c(2,4,1) 
b=c(3,6,2) 
a%*%b   # 32 (returned as a 1×1 matrix) 
16
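The inner product can also be computed straight from its definition: multiply element by element, then sum. This sketch checks that both routes agree. It also shows that %*% returns a 1×1 matrix, which drop() turns into an ordinary number.

```r
a = c(2, 4, 1)
b = c(3, 6, 2)

# Inner product by the definition: 2*3 + 4*6 + 1*2
ip_manual = sum(a * b)

# %*% on two plain vectors of equal length returns a 1 x 1 matrix
ip_matrix = a %*% b
dim(ip_matrix)

# drop() turns the 1 x 1 matrix into an ordinary number
ip_manual == drop(ip_matrix)
```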
Matrix: a rectangular array of numbers 
• dimension: number of rows × number of columns (m×n); the examples on the slide are 2×2, 2×2 and 3×2 
a=matrix(c(1,3,2,4),2,2)   # 2×2 matrix with rows [1 2] and [3 4] 
17
Matrix: elements 
• elements (entries) of the matrix a above: 
  first row, first column: 1 
  second row, first column: 3 
  first row, second column: 2 
  second row, second column: 4 
18
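Why does matrix(c(1,3,2,4),2,2) produce the rows [1 2] and [3 4]? By default, matrix() fills the values column by column. The sketch below demonstrates this, along with the byrow option and reading single elements with [row, column].

```r
# matrix() fills the values column by column (column-major order)
a = matrix(c(1, 3, 2, 4), nrow = 2, ncol = 2)
a

# byrow = TRUE builds the same matrix from values listed row by row
a2 = matrix(c(1, 2, 3, 4), nrow = 2, byrow = TRUE)
identical(a, a2)

dim(a)      # number of rows, number of columns
a[2, 1]     # second row, first column
a[1, 2]     # first row, second column
```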
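Matrix: addition 
[Figure: A + B for two example matrices of the same dimension; each element of the result is the sum of the corresponding elements of A and B] 
19 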
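Matrix: subtraction 
[Figure: A - B for two example matrices of the same dimension; each element of the result is the difference of the corresponding elements of A and B] 
20 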
Matrix: scalar multiplication/division 
[Figure: 2 × A multiplies every element of A by 2; A / 2 is the same as 1/2 × A] 
21
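The sketch below reuses the two matrices from the multiplication example on slide 23. This is an assumption, since the example matrices on slides 19-21 may differ. It shows element-by-element addition and subtraction and scalar multiplication and division in R.

```r
a = matrix(c(1, 3, 2, 4), 2, 2)   # rows [1 2] and [3 4]
b = matrix(c(2, 5, 9, 4), 2, 2)   # rows [2 9] and [5 4]

a + b     # element by element: rows [3 11] and [8 8]
a - b     # element by element: rows [-1 -7] and [-2 0]
2 * a     # every element times 2: rows [2 4] and [6 8]
a / 2     # same as (1/2) * a
```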
Matrix: multiplication 
[Figure: two example matrix products] 
A little complicated? 
22
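Let us look at it in more detail! 
[1 2]   [2 9]   [1×2 + 2×5   1×9 + 2×4]   [12 17] 
[3 4] × [5 4] = [3×2 + 4×5   3×9 + 4×4] = [26 43] 
• Entry (i, j) of the product is the inner product of row i of the left matrix and column j of the right matrix 
a=matrix(c(1,3,2,4),2,2) 
b=matrix(c(2,5,9,4),2,2) 
a%*%b   # rows [12 17] and [26 43] 
23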
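To make the row-times-column rule concrete, this sketch writes the matrix product out with explicit loops and checks the result against R's built-in %*%. The function mat_mult is a hypothetical helper for illustration only. In practice %*% is the tool to use.

```r
# Hypothetical helper: matrix product written out with loops
mat_mult = function(A, B) {
  stopifnot(ncol(A) == nrow(B))          # inner dimensions must agree
  C = matrix(0, nrow(A), ncol(B))
  for (i in seq_len(nrow(A))) {
    for (j in seq_len(ncol(B))) {
      # entry (i, j) = inner product of row i of A and column j of B
      C[i, j] = sum(A[i, ] * B[, j])
    }
  }
  C
}

a = matrix(c(1, 3, 2, 4), 2, 2)
b = matrix(c(2, 5, 9, 4), 2, 2)
mat_mult(a, b)
all(mat_mult(a, b) == a %*% b)
```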
Matrix multiplication: not commutative 
[Figure: an example where A × B ≠ B × A] 
• In general, A × B ≠ B × A, so the order of multiplication matters 
24 
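Non-commutativity is easy to check in R. Using the matrices from slide 23, the sketch below computes both orders and shows that they differ.

```r
a = matrix(c(1, 3, 2, 4), 2, 2)
b = matrix(c(2, 5, 9, 4), 2, 2)

ab = a %*% b   # rows [12 17] and [26 43]
ba = b %*% a   # rows [29 40] and [17 26]
identical(ab, ba)
```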
Vector: multiplication 2 (column vector × row vector) 
[3]           [ 6 12] 
[6] × [2 4] = [12 24] 
a=matrix(c(3,6),2,1)   # column vector (2×1 matrix) 
b=c(2,4)               # %*% treats b as a 1×2 row here 
a%*%b                  # rows [6 12] and [12 24] 
25 
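A column vector times a row vector is called the outer product. R also provides outer(), which computes it directly from two plain vectors. This sketch checks that outer() matches the %*% result above.

```r
a = matrix(c(3, 6), 2, 1)   # column vector, 2 x 1
b = c(2, 4)                 # promoted by %*% to a 1 x 2 row here

p1 = a %*% b                # 2 x 2 result: rows [6 12] and [12 24]
p2 = outer(c(3, 6), c(2, 4))
all(p1 == p2)
```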
Identity matrix 
• Diagonal elements are 1 
• All other elements are 0 
• Multiplying by the identity matrix changes nothing: A × I = I × A = A 
diag(2)   # 2×2 identity matrix 
26
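The claim that multiplying by the identity matrix changes nothing can be verified directly. The sketch uses the matrix a from the earlier slides.

```r
I2 = diag(2)                          # 2 x 2 identity matrix
a = matrix(c(1, 3, 2, 4), 2, 2)

all(a %*% I2 == a)   # multiplying on the right changes nothing
all(I2 %*% a == a)   # multiplying on the left changes nothing
```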
Inverse matrix 
• If A is an m×m matrix and A has an inverse matrix A⁻¹, then A × A⁻¹ = A⁻¹ × A = I (I: identity matrix) 
> a=matrix(c(1,3,2,4),2,2) 
> a 
     [,1] [,2] 
[1,]    1    2 
[2,]    3    4 
> inv=solve(a) 
> inv 
     [,1] [,2] 
[1,] -2.0  1.0 
[2,]  1.5 -0.5 
27
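This sketch checks the definition A × A⁻¹ = A⁻¹ × A = I for the solve() result. It uses all.equal() because floating-point rounding can leave tiny errors. It also shows what happens when a square matrix has no inverse. The singular matrix here is a made-up example.

```r
a = matrix(c(1, 3, 2, 4), 2, 2)
inv = solve(a)

# Both products should be the identity matrix (up to rounding)
isTRUE(all.equal(a %*% inv, diag(2)))
isTRUE(all.equal(inv %*% a, diag(2)))

# Not every square matrix has an inverse: here the second column is
# twice the first, so solve() stops with an error
singular = matrix(c(1, 2, 2, 4), 2, 2)
res = tryCatch(solve(singular), error = function(e) "no inverse")
res
```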
Transpose matrix 
• Exchange rows and columns: row i of A becomes column i of Aᵀ 
a=matrix(c(1,3,2,4),2,2) 
t(a)   # Aᵀ: rows [1 3] and [2 4] 
28
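Transposes appear in the least squares formula on the next slide, so two standard properties are useful to check. Transposing twice returns the original matrix. Transposing a product reverses the order: (AB)ᵀ = BᵀAᵀ. The matrices are the ones from slide 23.

```r
a = matrix(c(1, 3, 2, 4), 2, 2)   # rows [1 2] and [3 4]
b = matrix(c(2, 5, 9, 4), 2, 2)   # rows [2 9] and [5 4]

t(a)                              # rows become columns: [1 3] and [2 4]
all(t(t(a)) == a)                 # transposing twice gives a back

# Transposing a product reverses the order: (AB)^T = B^T A^T
all(t(a %*% b) == t(b) %*% t(a))
```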
Least squares estimation 
• Vectors and matrices are used when programming least squares estimation 
• J(θ) = 1/(2m) × (Xθ - Y)ᵀ(Xθ - Y): cost function (squared error function) 
• m: number of samples 
• X is a matrix (one row per sample; each row is an input vector [1, X1, X2, X3, X4]), Y is a vector, and θ is the parameter vector 
• ᵀ denotes the transpose 
• θ is chosen so that J is minimized (the differences between the predictions and the real values are minimized) 
  → Least squares estimation 
29 
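The sketch below writes the cost function J with matrices and finds its minimum. Setting the gradient of J to zero gives the normal equations XᵀXθ = XᵀY, which can be solved with solve(). lm() solves the same problem with a QR decomposition, which is numerically more stable. The data are hypothetical.

```r
# Hypothetical data: 6 samples, 2 input variables
x1 = c(1, 2, 3, 4, 5, 6)
x2 = c(2, 1, 4, 3, 6, 5)
Y  = c(3.1, 3.9, 7.2, 7.8, 11.1, 11.9)

m = length(Y)
X = cbind(1, x1, x2)          # first column of 1s for the intercept θ0

# Cost function J(θ) = 1/(2m) * (Xθ - Y)^T (Xθ - Y)
cost = function(theta) {
  r = X %*% theta - Y
  drop(t(r) %*% r) / (2 * m)
}

# Minimizing J leads to the normal equations (X^T X) θ = X^T Y
theta_hat = drop(solve(t(X) %*% X, t(X) %*% Y))
theta_hat

# lm() gives the same parameters, up to floating-point rounding
theta_lm = unname(coef(lm(Y ~ x1 + x2)))
all.equal(unname(theta_hat), theta_lm)
cost(theta_hat)
```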
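3. Linear regression with R 
Analysis with the linear regression function “lm” 
> wineprice=lm(LPRICE2~WRAIN+DEGREES+HRAIN+TIME_SV, data=wine) 
> wineprice 
• LPRICE2 is the variable to be predicted; WRAIN, DEGREES, HRAIN and TIME_SV are the input variables 
• Inside lm( ), write the variable to be predicted, then "~" and the input variables joined by "+", then data = the name of the data frame 
> ▼▼▼=lm(◯◯◯~△△△+■■■, data=◎◎◎) 
> ▼▼▼ 
  (▼▼▼: name of the result, ◯◯◯: variable to be predicted, △△△ and ■■■: input variables, ◎◎◎: data frame) 
Data is available here: http://www.liquidasset.com/winedata.html 
30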
Parameters can be obtained! 
Let us compare them with the formula for predicting wine prices. 
Call: 
lm(formula = LPRICE2 ~ WRAIN + DEGREES + HRAIN + TIME_SV, data = wine) 
Coefficients: 
(Intercept)        WRAIN      DEGREES        HRAIN      TIME_SV 
 -12.145007     0.001167     0.616365    -0.003861     0.023850 
31
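These coefficients form the θ of the price formula, so predictions for vintages with no observed price can be computed as Xθ. This minimal sketch uses two inputs: the 1986-1988 rows from the data table (slide 6) and the coefficients printed above. The object names are illustrative, and the outputs are model predictions, not observed prices.

```r
# Coefficients printed by lm() above
theta = c(-12.145007, 0.001167, 0.616365, -0.003861, 0.023850)

# Rows for vintages with no observed price (data table, slide 6):
# columns are 1 (intercept), WRAIN, DEGREES, HRAIN, TIME_SV
X_new = rbind(
  "1986" = c(1, 563, 16.2833, 171, -3),
  "1987" = c(1, 452, 16.9833, 115, -4),
  "1988" = c(1, 808, 17.1,     59, -5)
)

# Predicted log relative price for each vintage: Y = Xθ
pred = drop(X_new %*% theta)
round(pred, 3)
```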
[Screenshot: RStudio; see slide 38 for download links] 
32
[Plot: predicted prices (□) and real prices (◯)] 
predict(wineprice, newdata=data.frame(wine)) 
33
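predict() has two common uses. Called with only the fitted model, it returns the fitted values for the training rows. Given a newdata data frame whose column names match the formula, it predicts new rows. This minimal sketch uses a hypothetical toy data frame. The column names temp and price are illustrative and are not the wine data.

```r
# Hypothetical toy data frame; column names are illustrative only
train = data.frame(
  temp  = c(15.5, 16.0, 16.5, 17.0, 17.5),
  price = c(-1.6, -1.3, -1.0, -0.6, -0.4)
)
fit = lm(price ~ temp, data = train)

# Fitted values for the training rows (like the □ points on the slide)
predict(fit)

# Predictions for new rows: the data frame needs the same column names
new_rows = data.frame(temp = c(16.2, 17.2))
predict(fit, newdata = new_rows)
```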
Analyze data automatically with functions 
• With the function ‘lm’, the parameters can be obtained with a one-line command 
• R has many functions, so we can analyze data with them without writing our own functions 
• However, we should broadly understand how the calculations inside these functions are done. A black-box approach is not recommended 
• The better we understand the functions, the better we can choose the right function for a particular problem 
• Let us get familiar with ‘lm’. Then other functions will be easier to understand 
34 
Recommender systems 
• amazon.com and Netflix are famous for their recommendations 
• There are various kinds of recommendations 
• Recommend the most popular product 
  → the same recommendation for everyone 
• Recommend the best products for each individual customer 
  → a personalization method is needed! 
35
Personalization 
• An example of a method for personalized recommendations 
• θ: customer preferences (Did the customer click the product? Did they give a rating?) 
• X: item features (for movies: horror? romance? SF? Who are the director and actors? When and where was it made?) 
• Obtain probabilities from θX with a logistic regression model 
• If the probability is high, the item is recommended to the customer 
36
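This minimal sketch shows how a logistic regression model turns the score θX into a probability, using the logistic function 1 / (1 + exp(-θX)). All numbers are hypothetical: the preference weights are not estimated from real data, the feature coding is an assumption, and so is the 0.5 recommendation threshold.

```r
# Hypothetical example: theta = one customer's preference weights,
# x_movie = one movie's features (1 for the intercept, then
# horror, romance, SF as 0/1 flags)
theta = c(-1.0, -2.0, 0.5, 2.5)
x_movie = c(1, 0, 0, 1)   # an SF movie

# Logistic function maps the score θX to a probability in (0, 1)
score = sum(theta * x_movie)
prob = 1 / (1 + exp(-score))   # same as plogis(score)
prob

# Recommend if the probability exceeds a chosen threshold (assumed 0.5)
recommend = prob > 0.5
recommend
```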
Quandl: data source 
• Over 10M data sets are available for free 
• Data can be downloaded directly into R, MATLAB and Python 
https://www.quandl.com 
37
Websites of R and RStudio 
• R is a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. URL http://www.R-project.org 
• I have prepared a short video on how to use R: http://www.toshistats.net/introduction-to-r-language/ 
• RStudio is one of the best IDEs for R: http://www.rstudio.com/products/rstudio/download/ 
38
Thank you for your attention 
• TOSHI STATS SDN. BHD., digital-learning center for statistical computing in Asia 
• CEO: Toshifumi Kuga, Certified Financial Services Auditor 
• Company website: www.toshistats.net 
• Company Facebook page: www.facebook.com/toshistatsco 
• Company blog: http://toshistats.wordpress.com/aboutme/ 
• The company blog is updated every Thursday at 10:00 AM with the latest information about data analysis! Please visit the blog or the company website. 
39
Disclaimer 
• TOSHI STATS SDN. BHD. and I do not accept any responsibility or 
liability for loss or damage occasioned to any person or property 
through using materials, instructions, methods, algorithm or ideas 
contained herein, or acting or refraining from acting as a result of 
such use. TOSHI STATS SDN. BHD. and I expressly disclaim all 
implied warranties, including merchantability or fitness for any 
particular purpose. There will be no duty on TOSHI STATS SDN. 
BHD. and me to correct any errors or defects in the code and the 
software. 
© 2014 TOSHI STATS SDN. BHD. All rights reserved 
40
