Elegant Graphics for Data
  Analysis with ggplot2
      Yann Abraham
     BaselR 28.04.2010
Who is Yann Abraham
• Biochemist by training
• Bioinformatician by trade
• Pharma/Biotech
  – Cellzome AG
  – Novartis Pharma AG


• http://ch.linkedin.com/in/yannabraham
How to Represent Data
“A Picture is Worth a Thousand Words”
• Visualization is a critical component of data
  analysis
• Graphics are the most efficient way to digest
  large volumes of data & identify trends
• Graphical design is a mixture of mathematical
  and perceptual science
A Straightforward Way to Create
              Visualizations
• Grammar of Graphics provides a framework to
  streamline the description and creation of graphics
• For a given dataset to be displayed:
   – Map variables to aesthetics
   – Define Layers
      • A representation (a ‘geom’), e.g. line, boxplot, histogram,…
      • An associated statistical transformation, e.g. counts, model,…
   – Define Scales
      • Color, Shape, axes,…
   – Define Coordinates
   – Define Facets
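As a minimal sketch of how these components translate into ggplot2 calls, the example below builds one plot with a line of code per component. It assumes the built-in mtcars dataset purely for illustration; it is not the data shown in this talk, and the choice of variables is arbitrary.

```r
library(ggplot2)

# Built-in example data, used here only for illustration
cars <- mtcars
cars$cyl <- factor(cars$cyl)

p <- ggplot(cars, aes(x = wt, y = mpg, colour = cyl)) +   # map variables to aesthetics
  geom_point() +                                           # layer: a geom (points)
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) + # layer: geom plus a statistical transformation
  scale_colour_brewer(palette = "Set1") +                  # scale: how cyl values map to colours
  coord_cartesian() +                                      # coordinates
  facet_wrap(~ am)                                         # facets: one panel per value of am

print(p)
```

Removing or swapping any single line changes only that component, which is the main practical benefit of describing a plot this way.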
Why Use ggplot2?
• Simple yet powerful syntax
• Provides a framework for creating any type of
  graphics
• Implements basic graphical design rules by
  default
An Example
• 4 cell lines were treated with a compound
  active against a class of enzymes
• Proteins were extracted and quantified using
  mass spectrometry
• Is there anything interesting?!?
Excel…
R…
R…
R…
R… (this could go on for hours)
…ggplot2!
ggplot2!
ggplot2!
ggplot2!
ggplot2!



qplot(Experiment_1, Experiment_2, data = comp, color = CELLLINE,
      facets = ISTARGET ~ CELLLINE) +
  coord_equal() +
  scale_x_log10() +
  scale_y_log10()
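In recent ggplot2 releases qplot() is deprecated, so the same figure would usually be written with ggplot(). The sketch below assumes that comp has the four columns used in the call above; the values are simulated placeholders, not the mass spectrometry results from the talk.

```r
library(ggplot2)

# Simulated stand-in for the 'comp' data frame (hypothetical values)
set.seed(1)
n <- 200
comp <- data.frame(
  CELLLINE = sample(c("A", "B", "C", "D"), n, replace = TRUE),
  ISTARGET = sample(c(TRUE, FALSE), n, replace = TRUE, prob = c(0.1, 0.9))
)
comp$Experiment_1 <- rlnorm(n, meanlog = 10, sdlog = 1)
comp$Experiment_2 <- comp$Experiment_1 * rlnorm(n, meanlog = 0, sdlog = 0.3)

# Same plot as the qplot() call, spelled out component by component
p <- ggplot(comp, aes(x = Experiment_1, y = Experiment_2, colour = CELLLINE)) +
  geom_point() +
  facet_grid(ISTARGET ~ CELLLINE) +  # rows: target or not, columns: cell line
  coord_equal() +
  scale_x_log10() +
  scale_y_log10()

print(p)
```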
When Visualization Alone is Not
              Enough
• Some datasets are large multidimensional
  data structures
• Representing data from such structures
  requires data transformation
• R is good at handling large sets
• R functions for handling multidimensional sets
  are complex to use
Easy Data Transformation With plyr
• plyr provides wrappers around typical R
  operations
  – Split
  – Apply
  – Combine
• plyr functions are similar to the by() function
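To make the three steps concrete, here is a small sketch of split, apply and combine written out by hand in base R. The data frame and its column names are hypothetical toy values; plyr functions bundle these three steps into a single call.

```r
# Hypothetical toy data: two groups with three measurements each
df <- data.frame(group = rep(c("a", "b"), each = 3),
                 value = c(1, 2, 3, 10, 20, 30))

# Split: one data frame per group
pieces <- split(df, df$group)

# Apply: compute a one-row summary for each piece
results <- lapply(pieces, function(piece) {
  data.frame(group = piece$group[1], mean_value = mean(piece$value))
})

# Combine: stack the summaries back into a single data frame
summary_df <- do.call(rbind, results)
rownames(summary_df) <- NULL

print(summary_df)
```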
Why use plyr?
• Simple syntax
• Predictable output
• Tightly integrated into ggplot2

• This comes at a price: plyr is somewhat slower
  than apply
An Example
• Given a set of raw data from a High
  Throughput Screen, compute the plate-
  normalized effect
The standard R way
# Mean raw signal per plate
plate.mean <- aggregate(hts.data$RAW,
                        list(hts.data$PLATE_ID), mean)
names(plate.mean) <- c('PLATE_ID', 'PLATE_MEAN')

# Attach the plate mean to every well, then normalize
hts.data <- merge(hts.data, plate.mean)
hts.data$NORM <- hts.data$RAW / hts.data$PLATE_MEAN
The plyr way
library(plyr)

# Split by plate, normalize within each plate, recombine
hts.data <- ddply(hts.data, .(PLATE_ID), function(df) {
  df$NORM <- df$RAW / mean(df$RAW)
  return(df)
})
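A quick way to check a normalization like this is to run it on a tiny table where the answer is obvious. The sketch below assumes a hypothetical two-plate screen with made-up values, computes the plate-normalized effect with base R's ave(), and compares it with a plyr call when that package is installed. After normalization, the mean of NORM within each plate should be 1.

```r
# Hypothetical screen: two plates with three wells each
hts.data <- data.frame(PLATE_ID = rep(c("P1", "P2"), each = 3),
                       RAW      = c(10, 20, 30, 100, 200, 300))

# Base R: ave() returns the per-plate mean repeated for every row
hts.data$NORM <- hts.data$RAW / ave(hts.data$RAW, hts.data$PLATE_ID)

# Optional cross-check with plyr, run only if the package is available
if (requireNamespace("plyr", quietly = TRUE)) {
  by.plate <- plyr::ddply(hts.data, "PLATE_ID", transform,
                          NORM_PLYR = RAW / mean(RAW))
  stopifnot(isTRUE(all.equal(by.plate$NORM, by.plate$NORM_PLYR)))
}

print(hts.data)
```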
Benefits of Using plyr & ggplot2
• Compact, straightforward syntax
  – Good basic output, complex options only required
    for polishing
• Shifts focus from plotting to exploring
  – Presentation graphics can be created from there
    at minimal cost
  – Data transformation is intuitive
• Powerful statistics available
  – It’s R!
Some links…
• The Grammar of Graphics book by Leland
  Wilkinson
• The ggplot2 book by Hadley Wickham
  – And the corresponding website
• A presentation about plyr by JD Long
  – And his initial blog post
THANK YOU FOR YOUR
    ATTENTION!
