Getting Started - R Console.

Data types and Structures.

Exploring and Visualizing Data.

Programming Structures and Data Relationships.

Introduction to Data Analysis using R
Eslam Montaser Roushdi
Facultad de Informática
Universidad Complutense de Madrid
Grupo G-Tec UCM
www.tecnologiaUCM.es

February, 2014

Our aim

Study and describe in depth the analysis of Big Data using R,
and learn how to explore datasets to extract insight.

Outline:

1) Getting Started - R Console.
2) Data types and Structures.
3) Exploring and Visualizing Data.
4) Programming Structures and Data Relationships.

1) Getting Started - R Console.

R is a free software environment for data analysis and graphics.
R is both:
i) a programming language, and ii) a data analysis tool.
R is used across many industries, such as healthcare, retail, and financial
services.
R can be used to analyze both structured and unstructured datasets.
R can help you explore a new dataset and perform descriptive analysis.

2) Data types and Structures.
i) Data types.
The basic data types include numeric, logical, and character.
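
The three data types named above can be inspected with class(). The following is a minimal sketch; the variable names (x, flag, name, mixed) are made up for illustration and do not come from the slides.

```r
# One value of each basic type (names are illustrative assumptions)
x <- 3.14        # numeric
flag <- TRUE     # logical
name <- "R"      # character

types <- c(class(x), class(flag), class(name))
print(types)

# Mixing types inside c() coerces every element to the most general type
mixed <- c(1, TRUE, "a")
print(class(mixed))
```

The last two lines show why a vector can hold only one type: when values of different types are combined, R silently converts them to a common one.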

ii) Data structures.
Vector.
List.
Multi-dimensional (matrix/array, data frame).
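
As a rough illustration of the structures listed above, the sketch below builds one object of each kind. All object names and values are invented for the example.

```r
# Vector: an ordered set of elements of a single type
v <- c(10, 20, 30)

# List: elements may have different types
l <- list(id = 1, tag = "a", ok = TRUE)

# Matrix: two-dimensional, single type (filled column by column)
m <- matrix(1:6, nrow = 2, ncol = 3)

# Array: like a matrix, but with any number of dimensions
a <- array(1:24, dim = c(2, 3, 4))

# Data frame: a table whose columns may have different types
df <- data.frame(name = c("Ann", "Bob"), age = c(23, 45))

str(df)
```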

Note that
Adding columns of data:
df1 <- cbind(df1, new_column)
Adding rows of data:
df1 <- rbind(df1, new_row)
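
A small worked example of both operations, assuming a made-up data frame df1 with hypothetical columns name, age, and city:

```r
# Starting data frame (contents are illustrative assumptions)
df1 <- data.frame(name = c("Ann", "Bob"), age = c(23, 45))

# Add a column: the new vector needs one value per existing row
df1 <- cbind(df1, city = c("Madrid", "Cairo"))

# Add a row: a one-row data frame with the same column names
new_row <- data.frame(name = "Eve", age = 31, city = "Lisbon")
df1 <- rbind(df1, new_row)

print(df1)
```

Note that rbind() on data frames matches columns by name, so the new row must carry the same column names as df1.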
Missing Data
Large datasets often have missing data.
Most R functions can handle missing values, typically through the na.rm argument.
> ages <- c(23, 45, NA)
> mean(ages)
[1] NA
> mean(ages, na.rm=TRUE)
[1] 34
Here, NA is a logical constant of length 1 that serves as a missing-value
indicator.
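
Beyond na.rm, base R offers a few helpers for locating and dropping missing values. The sketch below reuses the ages vector from the slide; the people data frame is an assumption added for illustration.

```r
ages <- c(23, 45, NA)

# is.na() flags missing entries; summing the flags counts them
missing_flags <- is.na(ages)
n_missing <- sum(missing_flags)

# Hypothetical data frame containing the same ages
people <- data.frame(name = c("Ann", "Bob", "Eve"), age = ages)

# Keep only rows with no missing values (two equivalent approaches)
complete <- people[complete.cases(people), ]
cleaned <- na.omit(people)

print(n_missing)
print(complete)
```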

3) Exploring and Visualizing Data.

Importing and exporting data.
Filtering/subsetting.
Sorting.
Visualizing and analyzing data.
How to import external data from files into R?
Reading data from text files:
Multiple functions are available to read in data from text files.
Types of data formats:
- Delimited.
- Positional (fixed-width).

Reading external data into R
Delimited files
R includes a family of functions for importing delimited text files into R, based
on the read.table function:
read.table(file, header = , sep = , quote = , dec = , row.names, col.names,
as.is = , na.strings = , colClasses = , nrows = , skip = , check.names = ,
fill = , strip.white = , blank.lines.skip = , comment.char = ,
allowEscapes = , flush = , stringsAsFactors = , encoding = )
For example
name.last,name.first,team,position,salary
"Manning","Peyton","Colts","QB",18700000
"Brady","Tom","Patriots","QB",14626720
"Pepper","Julius","Panthers","DE",14137500
"Palmer","Carson","Bengals","QB",13980000
"Manning","Eli","Giants","QB",12916666

Note that
The first row contains the column names.
Each text field is enclosed in quotes.
Each field is separated by commas.
How to load this file into R
Specify that the first row contains column names (header=TRUE), that the delimiter
is a comma (sep=","), and that double quotes enclose text fields
(quote="\"").
The R statement that loads this file:
> top.5.salaries <- read.table("top.5.salaries.csv",
+ header=TRUE,
+ sep=",",
+ quote="\"")
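
To try this without having the file on disk, the sketch below first writes two of the sample rows to a temporary file (an assumption made so the example is self-contained) and then loads it. It also shows read.csv, a read.table wrapper whose defaults already suit comma-separated files with a header.

```r
# Write sample rows to a temporary file so the example runs anywhere
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  'name.last,name.first,team,position,salary',
  '"Manning","Peyton","Colts","QB",18700000',
  '"Brady","Tom","Patriots","QB",14626720'
), csv_path)

# Same arguments as on the slide
salaries <- read.table(csv_path, header = TRUE, sep = ",", quote = "\"")

# read.csv defaults to header = TRUE and sep = ","
salaries2 <- read.csv(csv_path)

str(salaries)
```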

Fixed-width files
To read a fixed-width format text file into a data frame, you can use the
read.fwf function:
read.fwf(file, widths, header = , sep = , skip = , row.names, col.names,
n = , buffersize = , ...)

Note that
read.fwf can also take many arguments used by read.table, including as.is,
na.strings, colClasses, and strip.white.
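
A minimal sketch of read.fwf in use. The file contents, the column widths (5, 2, and 6 characters), and the column names are all assumptions invented for this example.

```r
# Write a small fixed-width file: 5 chars name, 2 chars age, 6 chars city
fwf_path <- tempfile(fileext = ".txt")
writeLines(c(
  "Ann  23Madrid",
  "Bob  45Cairo ",
  "Eve  31Lisbon"
), fwf_path)

# widths gives the number of characters in each field, in order;
# strip.white is passed through to read.table to trim the padding
people <- read.fwf(fwf_path,
                   widths = c(5, 2, 6),
                   col.names = c("name", "age", "city"),
                   strip.white = TRUE)

print(people)
```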

Let’s explore a public dataset using R.
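
The dataset used on the original slides is not preserved in this transcript, so the sketch below substitutes mtcars, a dataset that ships with R, to illustrate inspecting, filtering, and sorting.

```r
# Assumption: mtcars stands in for the public dataset shown in the talk
data(mtcars)

# First look: leading rows and a summary of one column
head(mtcars, 3)
summary(mtcars$mpg)

# Filtering: logical indexing, or subset() with optional column selection
efficient <- mtcars[mtcars$mpg > 30, ]
four_cyl <- subset(mtcars, cyl == 4, select = c(mpg, hp))

# Sorting: order() returns row positions, used here to reorder the rows
by_mpg <- mtcars[order(mtcars$mpg, decreasing = TRUE), ]
head(by_mpg, 3)
```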

Now let’s visualize trends in our data using graphics.
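
The plots from the original slides are not preserved here. As a stand-in, this sketch draws three common base-graphics plots from the built-in mtcars dataset (an assumption) and writes them to a temporary PDF so it also runs in a non-interactive session.

```r
data(mtcars)

# Draw into a file instead of a screen device
plot_path <- tempfile(fileext = ".pdf")
pdf(plot_path)

# Distribution of one variable
hist(mtcars$mpg, main = "Fuel efficiency", xlab = "Miles per gallon")

# Relationship between two variables
plot(mtcars$wt, mtcars$mpg, xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# One variable compared across groups
boxplot(mpg ~ cyl, data = mtcars, xlab = "Cylinders", ylab = "Miles per gallon")

# Close the device so the file is written
invisible(dev.off())
```

In an interactive session the pdf() and dev.off() calls can be dropped, and the plots appear in the graphics window.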

4) Programming Structures and Data Relationships.
Let’s examine decision making in R
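
The slide's own examples are not preserved in this transcript. A minimal sketch of decision making in R, with made-up variable names and thresholds, might look like this:

```r
# if / else if / else works on a single condition at a time
age <- 45
if (age < 18) {
  group <- "minor"
} else if (age < 65) {
  group <- "adult"
} else {
  group <- "senior"
}
print(group)

# ifelse() is the vectorized form: it tests every element of a vector
ages <- c(12, 45, 70)
labels <- ifelse(ages >= 18, "adult", "minor")
print(labels)
```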

Functions - Example
> f1 <- function(a,b) { return(a+b) }
> f2 <- function(a,b) { return(a-b) }
> f <- f1
> f(3,8)
[1] 11
> f <- f2
> f(5,4)
[1] 1
The apply family of functions
apply() applies a function over the rows or columns of a matrix or an array.
lapply() applies a function to each element of a list (for a data frame, each
column) and returns a list.
sapply() is similar, but the output is simplified. It may be a vector or a
matrix, depending on the function.
tapply() applies a function to the subsets of a vector defined by the levels
of a factor.
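
The four functions can be compared side by side on small inputs. The objects below (m, df, scores, group) are invented for illustration.

```r
# apply(): second argument is the margin (1 = rows, 2 = columns)
m <- matrix(1:6, nrow = 2)           # rows are (1, 3, 5) and (2, 4, 6)
row_sums <- apply(m, 1, sum)
col_sums <- apply(m, 2, sum)

# lapply() returns a list; sapply() simplifies it to a named vector here
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
means_list <- lapply(df, mean)
means_vec <- sapply(df, mean)

# tapply(): one result per level of the grouping factor
scores <- c(10, 20, 30, 40)
group <- factor(c("x", "y", "x", "y"))
group_means <- tapply(scores, group, mean)

print(row_sums)
print(means_vec)
print(group_means)
```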

Common useful built-in functions
all()      # returns TRUE if all values are TRUE.
any()      # returns TRUE if any values are TRUE.
args()     # information on the arguments to a function.
cat()      # prints multiple objects, one after the other.
cumprod()  # cumulative product.
cumsum()   # cumulative sum.
mean()     # mean of the elements of a vector.
median()   # median of the elements of a vector.
order()    # returns the index permutation that sorts a vector.
print()    # prints a single R object.
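
A quick tour of these functions on one small vector; the vector x is an arbitrary choice for illustration.

```r
x <- c(3, 1, 2)

all_positive <- all(x > 0)     # every element satisfies the condition
any_above_two <- any(x > 2)    # at least one element satisfies it

args(mean)                     # shows the arguments that mean() accepts
cat("values:", x, "\n")        # prints several objects in sequence

running_sum <- cumsum(x)       # 3, 3+1, 3+1+2
running_prod <- cumprod(x)     # 3, 3*1, 3*1*2

avg <- mean(x)
mid <- median(x)

idx <- order(x)                # positions that would sort x
sorted <- x[idx]
print(sorted)
```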

Thanks!!

References

Grant Hutchison, Introduction to Data Analysis using R, October 2013.
John Maindonald, W. John Braun, Data Analysis and Graphics Using R:
An Example-Based Approach (Cambridge Series in Statistical and
Probabilistic Mathematics), Third Edition, Cambridge University Press,
2010.
Nicholas J. Horton, Ken Kleinman, Using R for Data Management,
Statistical Analysis, and Graphics, CRC Press, 2010.
