Statistics Lab
Rodolfo Metulini
IMT Institute for Advanced Studies, Lucca, Italy
Introduction to R - 09.01.2014
Getting help with functions
To get more information on any specific named function, for
example solve, the command is
> help(solve)
An alternative is:
> ?solve
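Running:
> help.start()
launches a Web browser that opens the HTML help home page.
The ?? command searches the help system in a different way: ??solve
(equivalent to help.search("solve")) looks for the term in the
documentation of all installed packages. This is useful when you do not
know the exact name of a function or which package provides it.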
objects and saving data
The entities that R creates and manipulates are known as
objects.
During an R session, objects are created and stored by name.
> objects() (or its synonym ls()) displays the names of the objects
currently stored within R.
> rm() removes objects.
At the end of each R session, you are given the opportunity to save
all the currently available objects. The objects (the workspace) are
saved in .RData format in the current working directory, and the
command lines of the session are saved in .Rhistory format.
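A minimal sketch of these commands. As an assumption for the example, the objects are saved to a temporary file (created with tempfile()) instead of the default .RData in the working directory, so that running it does not overwrite an existing workspace. The object names a and b are purely illustrative.

```r
# Create two objects and list what is stored in the current environment
a <- 1:3
b <- "hello"
print(objects())          # the names include "a" and "b"

# Save the two objects to a temporary .RData file
f <- tempfile(fileext = ".RData")
save(a, b, file = f)

# Remove them and check that "a" is gone
rm(a, b)
print(exists("a"))        # FALSE

# Restore both objects from the file
load(f)
print(a)                  # 1 2 3
```

save.image() saves the whole workspace at once, which is what R does when you answer "yes" at the end of a session.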
scalars and vectors: manipulation
To set up a vector named x, consisting of the numbers 1, 2, 3, 4
and 5, use the R command:
> x = c(1,2,3,4,5)
or, equivalently, the assign function:
> assign("x", c(1,2,3,4,5))
x is a vector of length 5. To check it we can use the following
function:
> length(x)
> 1/x gives the reciprocal of each element of x.
> y = c(x,0,x) creates a vector with 11 entries consisting of
two copies of x with a 0 in the middle.
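The commands above can be run together as follows. The second vector is stored under the hypothetical name x2 only so that it can be compared with x; calling assign("x", ...) as on the slide would simply overwrite x.

```r
x <- c(1, 2, 3, 4, 5)
assign("x2", c(1, 2, 3, 4, 5))   # same values, created with assign()
identical(x, x2)                 # TRUE
length(x)                        # 5
1/x                              # 1.0000000 0.5000000 0.3333333 0.2500000 0.2000000
y <- c(x, 0, x)                  # two copies of x with a 0 in the middle
length(y)                        # 11
```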
scalars and vectors: manipulation
Vectors can be used in arithmetic expressions, in which case the
operations are performed element by element.
Vectors in the same expression need not all be of the same length.
If they are not, the result has the length of the longest vector in
the expression, and the shorter vectors are recycled (repeated)
until they match it.
For example:
> v = 2 * x + y + 1
generates a new vector of length 11 constructed by adding together,
element by element, 2*x repeated 2.2 times, y repeated just once,
and 1 repeated 11 times.
So, WARNING: R computes this kind of expression even when it is
probably wrongly defined; when the longer length is not a multiple
of the shorter one, it only issues a warning.
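A short sketch of the recycling rule with the x and y defined above. The warning handler is an illustrative addition: it prints R's warning message instead of letting it pass silently, so you can see that the result is still computed.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(x, 0, x)                 # length 11

# 11 is not a multiple of 5, so "2 * x + y" triggers a warning,
# but R still recycles 2 * x and returns a length-11 result
v <- withCallingHandlers(
  2 * x + y + 1,
  warning = function(w) {
    message("Warning caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(v)        # 4 7 10 13 16 3 6 9 12 15 8
length(v)       # 11
```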
scalars and vectors: manipulation - 2
In addition, the usual functions log, exp, sin, cos, tan, sqrt and,
of course, the classical arithmetic operators are available.
min(x) and max(x) select the smallest and the largest element of
the vector.
sum(x) and prod(x) give the sum and the product, respectively,
of the numbers within the vector.
mean(x) calculates the sample (arithmetic) mean, which is the same
as sum(x)/length(x), and var(x) gives the sample variance:
sum((x - mean(x))^2)/(length(x) - 1)
sort(x) returns a vector of the same size as x, with the elements in
increasing order.
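A small check of these summary functions on an illustrative vector, including the sample variance formula written out by hand:

```r
x <- c(4, 1, 5, 2, 3)
min(x); max(x)                   # 1 and 5
sum(x); prod(x)                  # 15 and 120
mean(x) == sum(x) / length(x)    # TRUE

# Sample variance by hand: squared deviations divided by n - 1
manual_var <- sum((x - mean(x))^2) / (length(x) - 1)
all.equal(var(x), manual_var)    # TRUE (both equal 2.5)

sort(x)                          # 1 2 3 4 5
```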
seq and rep
There are facilities to generate commonly used sequences of
numbers.
> 1:30 is the vector c(1,2, ..., 29,30)
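> 2*1:15 is the vector c(2,4, ..., 28,30) of length 15 (the colon
operator has higher priority than *).
In addition, seq() can be used: seq(2, 10) is the same as the vector
2:10 (whereas seq(2:10), with a single vector argument, returns 1:9,
the indices of that vector).
The arguments from=, to= and by= are useful:
> seq(from = 30, to = 1)
> seq(-10, 10, by = 0.5)
rep() can be used for replicating an object:
> rep(x, times = 5) repeats the whole vector x five times
> rep(x, each = 5) repeats each element of x five times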
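A sketch of the sequence and replication functions above, using a short illustrative x so the results are easy to read:

```r
x <- c(1, 2, 3)
s1 <- 2 * 1:15                 # ":" is evaluated before "*": 2, 4, ..., 30
s2 <- seq(2, 10)               # same values as 2:10
s3 <- seq(2:10)                # pitfall: 1:9, the indices of 2:10
s4 <- seq(from = 30, to = 1)   # decreasing sequence 30, 29, ..., 1
s5 <- seq(-10, 10, by = 0.5)   # -10.0, -9.5, ..., 10.0 (41 values)
r1 <- rep(x, times = 2)        # 1 2 3 1 2 3
r2 <- rep(x, each = 2)         # 1 1 2 2 3 3
```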
logical vectors
As well as numerical vectors, R allows manipulation of logical
quantities.
The elements of a logical vector can have the values TRUE, FALSE
and NA ("not available").
Logical vectors are generated by conditions. Example:
> temp = x > 3
The comparison operators are <, <=, >, >=, == for exact equality
and != for inequality. In addition, if c1 and c2 are logical
expressions, then c1 & c2 is their intersection ("and"), c1 | c2 is
their union ("or"), and !c1 is the negation of c1.
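A minimal illustration of conditions and logical operators on the vector x = 1, ..., 5; c1 and c2 are the illustrative conditions named on the slide:

```r
x <- c(1, 2, 3, 4, 5)
temp <- x > 3            # FALSE FALSE FALSE TRUE TRUE
c1 <- x > 1
c2 <- x < 5
both <- c1 & c2          # TRUE only for the values 2, 3, 4
either <- !c1 | !c2      # the negation of "both" (De Morgan's law)
x[both]                  # 2 3 4
```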
missing values
In some cases the components of a vector may not be completely
known: in this case we assign the value NA.
The function is.na(x) gives a logical vector of the same size as x
with value TRUE if and only if the corresponding element in x is NA.
> z = c(1:3, NA); ind = is.na(z)
There is a second kind of "missing" value, produced by numerical
computation: the so-called Not a Number, NaN, values.
Example:
> 0/0
> Inf/Inf
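A short sketch contrasting NA and NaN. The last two lines (na.rm = TRUE in mean()) go slightly beyond the slide and are included to show how missing values propagate through computations.

```r
z <- c(1:3, NA)
ind <- is.na(z)          # FALSE FALSE FALSE TRUE
n1 <- 0/0                # NaN
n2 <- Inf/Inf            # NaN
is.na(c(NA, NaN))        # TRUE TRUE: is.na() catches both kinds
is.nan(c(NA, NaN))       # FALSE TRUE: is.nan() catches only NaN
mean(z)                  # NA: missing values propagate
mean(z, na.rm = TRUE)    # 2: the NA is dropped before averaging
```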
index vectors: subsets of a vector
Subsets of the elements of a vector may be selected by appending to
the name of the vector an index vector in square brackets. The index
vector can be:
1. A logical vector: values corresponding to TRUE in the index
vector are selected: > y = x[!is.na(x)]
2. A vector of positive (negative) integer quantities: in this
case the values in the index vector must lie in the set
{1, 2, ..., length(x)}. With positive indices the corresponding
values are selected; with negative indices they are excluded.
> x[2:3]; x[-(2:3)]
3. A vector of character strings: this is possible only after
assigning names to the elements of the object.
> cars = c(1,2,3)
> names(cars) = c("ferrari","lamborghini","bugatti")
> pref = cars[c("ferrari","bugatti")]
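The three kinds of index vectors, applied to an illustrative x that contains one missing value:

```r
x <- c(10, NA, 30, 40, 50)
y <- x[!is.na(x)]          # logical index: 10 30 40 50
x[2:3]                     # positive integers: NA 30
x[-(2:3)]                  # negative integers: 10 40 50

cars <- c(1, 2, 3)
names(cars) <- c("ferrari", "lamborghini", "bugatti")
pref <- cars[c("ferrari", "bugatti")]   # character index: named 1 and 3
```

Note that the name cars hides the built-in datasets::cars data frame for the rest of the session; rm(cars) makes it visible again.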
objects and attributes
All the components of a vector share one (and only one) mode: this
is the reason why such vectors are called "atomic". The mode,
together with the length, is an intrinsic attribute of every object.
The possible modes are: numeric, logical, complex, character and
raw.
Useful commands: mode(), as.numeric(), is.numeric()
For example, create a numeric vector:
> z = 0:9
change it into a character vector: > digits = as.character(z)
and coerce it back to numeric: > d = as.integer(digits)
d and z are the same!
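The coercion example from the slide, with mode() checks along the way. The final line is an illustrative extra: strings that cannot be read as numbers become NA (R normally warns about this; the warning is suppressed here).

```r
z <- 0:9
mode(z)                    # "numeric"
digits <- as.character(z)  # "0" "1" ... "9"
mode(digits)               # "character"
is.numeric(digits)         # FALSE
d <- as.integer(digits)
identical(d, z)            # TRUE: 0:9 is already an integer vector

num <- suppressWarnings(as.numeric(c("1.5", "abc")))  # 1.5 NA
```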
arrays, matrices and data.frame
Vectors are the most important type of object in R, but there are
several others. Among them:
matrix: a two-dimensional generalization of a vector (arrays extend
the idea to any number of dimensions)
data.frame: a matrix-like structure whose columns can be of
different types. It is used when we deal with both numerical and
categorical data.
How to transform a vector into a matrix?
> v = 1:50
> dim(v) = c(10,5)
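A quick check of what setting dim() does: the 50 values are arranged column by column into 10 rows and 5 columns, giving the same object that matrix() would build.

```r
v <- 1:50
dim(v) <- c(10, 5)        # 10 rows, 5 columns
is.matrix(v)              # TRUE
v[1, 2]                   # 11: the matrix is filled column by column
m2 <- matrix(1:50, nrow = 10, ncol = 5)
identical(v, m2)          # TRUE
```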
arrays, matrices and data.frame (2)
How to create a matrix from scratch?
> m = array(1:20, dim = c(4,5))
Subsetting a matrix or replacing a subset of a matrix with zeros?
Let's have a look at the examples in the lab code.
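The lab code is not included here, so the following is only an illustrative sketch of subsetting the 4 x 5 matrix m and replacing parts of a copy of it with zeros; the block and the threshold 18 are arbitrary choices.

```r
m <- array(1:20, dim = c(4, 5))
m[2, 3]            # row 2, column 3: 10
m[, 1]             # first column: 1 2 3 4
m[1, ]             # first row: 1 5 9 13 17
m[2:3, 4:5]        # 2 x 2 sub-matrix

m0 <- m            # work on a copy
m0[2:3, 4:5] <- 0  # replace that block with zeros
m0[m0 > 18] <- 0   # replace the remaining elements above 18 with zeros
```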
matrix manipulation
The operator %*% is used for matrix multiplication.
An n x 1 or 1 x n matrix is also a valid matrix.
If, for example, A and B are square matrices of the same size,
then:
> A * B
is the matrix of element-by-element products (it does not work for
matrices with different dimensions), and
> A %*% t(B)
is the matrix product of A and the transpose of B.
diag(A) returns the elements on the main diagonal of A and t(A)
returns the transposed matrix. solve(A) returns the inverse, while
ginv(A) returns the Moore-Penrose generalized inverse (which
coincides with the inverse when A is invertible).
ginv() requires the MASS package: > library(MASS)
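A sketch of these operations on two illustrative 2 x 2 matrices. The ginv() comparison only runs if the MASS package is available (it ships with standard R installations as a recommended package).

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)   # invertible (determinant 5)
B <- matrix(1:4, nrow = 2)

A * B                 # element-by-element products
A %*% t(B)            # matrix product of A and the transpose of B
diag(A)               # 2 3
t(A)                  # transpose

A_inv <- solve(A)                    # ordinary inverse
all.equal(A %*% A_inv, diag(2))      # TRUE, up to rounding

if (requireNamespace("MASS", quietly = TRUE)) {
  # For an invertible matrix the generalized inverse equals solve(A)
  print(all.equal(MASS::ginv(A), A_inv))
}
```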
lists and data frames
An R list is an object consisting of an ordered collection of objects
known as its components.
Here is a simple example of how to make a list:
> Lst = list(name="Rodolfo", surname="Metulini", age="30")
It is possible to concatenate two or more lists:
> list.ABC = c(list.A, list.B, list.C)
A data.frame is a list with the specific class "data.frame".
We can convert a matrix object into a data.frame object with the
command as.data.frame(matrix).
The easiest way to create a data.frame object is by means of the
read.table() function.
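A short sketch of lists and of the matrix-to-data.frame conversion. The contents of list.A, list.B and list.C are hypothetical, since the slide does not define them.

```r
Lst <- list(name = "Rodolfo", surname = "Metulini", age = "30")
Lst$name                 # access a component by name: "Rodolfo"
Lst[["surname"]]         # same, with double brackets: "Metulini"

# Hypothetical lists to concatenate
list.A <- list(a = 1)
list.B <- list(b = "two")
list.C <- list(c = TRUE)
list.ABC <- c(list.A, list.B, list.C)   # one list with 3 components

mat <- matrix(1:6, nrow = 3)
df <- as.data.frame(mat)                # columns are named V1 and V2
class(df)                               # "data.frame"
is.list(df)                             # TRUE
```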
reading data
Large data objects will usually be read as values from external files
rather than entered during an R session at the keyboard.
There are basically two similar commands to upload data, both of
which are convenience versions of the general read.table() function:
1. read.csv(): for comma-separated .csv files.
2. read.delim(): for tab-delimited .txt files.
Useful arguments:
sep = ";": to specify the character that separates the values in the
dataset (for example ";", "," or "\t" for tab-delimited data).
header = TRUE: to specify that the first row of the dataset contains
the variable names.
Moreover, read.dta() (from the foreign package) is used to upload
data from STATA :)
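A self-contained sketch of reading files: the two small files are written to temporary locations first, and their content (names and scores) is made up for the example. The same data is read once with read.csv() and once with read.table() using explicit header and sep arguments.

```r
# Hypothetical comma-separated file
f_csv <- tempfile(fileext = ".csv")
writeLines(c("name,score", "anna,7", "bruno,9"), f_csv)
d1 <- read.csv(f_csv)     # header = TRUE and sep = "," are the defaults
str(d1)

# Same data, semicolon-separated, read with the general function
f_semi <- tempfile(fileext = ".txt")
writeLines(c("name;score", "anna;7", "bruno;9"), f_semi)
d2 <- read.table(f_semi, header = TRUE, sep = ";")

all.equal(d1, d2)         # TRUE
```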
distributions and co.
One convenient use of R is to provide a comprehensive set of
statistical tables. Functions are provided to evaluate the
cumulative distribution function P(X ≤ x), the probability density
function and the quantile function (given q, the smallest x such
that P(X ≤ x) > q), and to simulate from the distribution.
The name of each function is a prefix followed by the name of the
distribution: "d" for the density, "p" for the CDF (pnorm, punif,
pexp, etc.), "q" for the quantile function and "r" for
simulation.
Let's empirically examine the distribution of a variable
(see the lab code).
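A minimal sketch of the d/p/q/r functions for the normal distribution, followed by an empirical check: the empirical CDF (ecdf) of simulated data is compared with the theoretical CDF. The mean 10, standard deviation 2 and seed are arbitrary choices for the example.

```r
dnorm(0)            # density of N(0, 1) at 0, about 0.3989
pnorm(1.96)         # P(X <= 1.96), about 0.975
qnorm(0.975)        # quantile, about 1.96
qnorm(pnorm(1.5))   # p and q are inverse functions: 1.5

set.seed(1)
sim <- rnorm(1000, mean = 10, sd = 2)   # simulated sample
Fn <- ecdf(sim)                         # empirical CDF of the sample
Fn(12)                                  # close to the theoretical value below
pnorm(12, mean = 10, sd = 2)            # about 0.841
```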
covar and concentration indices
The covariance and the correlation measure the degree to which
two variables change together.
The correlation is a dimensionless index in [-1, 1], while the
covariance is not normalized: its value depends on the scale (units
of measurement) of the variables.
> Cov = cov(A,B)
> Cor = cor(A,B)
We can also calculate the correlation between A and B as
follows:
> CorAB = Cov / sqrt(var(A)*var(B))
Gini index: the most popular concentration index; to compute it we
need to install the ineq package (Gini function).
Mode: the most frequent value within the distribution; we need to
install the modeest package (mfv command).
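A sketch that checks the correlation formula on two illustrative vectors. So that it runs without extra packages, it also defines two hypothetical base-R helpers, gini_index() and simple_mode(): they implement the mean-absolute-difference definition of the Gini index and a table-based mode. They are not the ineq or modeest implementations, which may differ in details (for example, small-sample corrections).

```r
A <- c(2, 4, 6, 8, 10)
B <- c(1, 3, 2, 5, 4)
Cov <- cov(A, B)                       # 4
Cor <- cor(A, B)                       # 0.8
CorAB <- Cov / sqrt(var(A) * var(B))   # same value as Cor
all.equal(Cor, CorAB)                  # TRUE

# Hypothetical helper: Gini index as the mean absolute difference
# over all pairs, divided by twice the mean
gini_index <- function(x) {
  n <- length(x)
  sum(abs(outer(x, x, "-"))) / (2 * n^2 * mean(x))
}

# Hypothetical helper: all the most frequent values of x
simple_mode <- function(x) {
  tab <- table(x)
  as.numeric(names(tab)[tab == max(tab)])
}

gini_index(c(1, 1, 1, 1))          # 0: perfect equality
gini_index(c(0, 0, 0, 1))          # 0.75: one unit holds everything
simple_mode(c(1, 2, 2, 3, 3, 3))   # 3
```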
homework
For those of us who are familiar with STATA, let's try to upload a
.dta file with the read.dta() function.
Study the agreement of the eruption data with other distributions
(exponential? uniform? it is up to you).
