Chapter 3 – Data Visualization
© Galit Shmueli and Peter Bruce 2010
Shmueli, Patel & Bruce
Graphs for Data Exploration
Basic Plots
  • Line Graphs
  • Bar Charts
  • Scatterplots
Distribution Plots
  • Boxplots
  • Histograms
Line Graph for Time Series
Bar Chart for Categorical Variable
95% of tracts do not border the Charles River
Excel's axis label can be confusing: the y-axis is actually “% of records that have a value for CATMEDV” (i.e., “% of all records”)
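The percentage on the y-axis can be reproduced by dividing each category's count by the total number of records. The sketch below is only an illustration: the function name `percent_by_category` and the 20-record sample are hypothetical, chosen so that the split matches the 95% quoted on the slide.

```python
from collections import Counter


def percent_by_category(values):
    """Return {category: percent of ALL records} for a categorical column."""
    total = len(values)
    if total == 0:
        return {}
    counts = Counter(values)
    return {cat: 100.0 * n / total for cat, n in counts.items()}


# Hypothetical sample: 1 = tract borders the river, 0 = it does not.
chas = [0] * 19 + [1] * 1
shares = percent_by_category(chas)
for cat in sorted(shares):
    print(f"CHAS={cat}: {shares[cat]:.1f}% of all records")
```

The denominator is the full record count, which is what the slide means by “% of all records”.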
Scatterplot
Displays the relationship between two numerical variables
Distribution Plots
• Display “how many” of each value occur in a data set
• Or, for continuous data or data with many possible values, “how many” values fall in each of a series of ranges or “bins”
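Counting values per bin can be sketched in a few lines. This is a minimal illustration, assuming equal-width bins; `bin_counts` and the sample values are hypothetical and are not taken from the Boston Housing data.

```python
def bin_counts(values, low, width, n_bins):
    """Count how many values fall in each of n_bins equal-width bins.

    Bin i covers [low + i*width, low + (i+1)*width); the last bin also
    includes its right edge. Values outside the overall range are ignored.
    """
    counts = [0] * n_bins
    high = low + width * n_bins
    for v in values:
        if v < low or v > high:
            continue
        i = min(int((v - low) // width), n_bins - 1)
        counts[i] += 1
    return counts


# Hypothetical median house values (in $1000s).
medv = [12.0, 18.5, 21.0, 22.4, 23.9, 24.0, 31.5, 36.2, 50.0]
counts = bin_counts(medv, low=10, width=10, n_bins=4)
for i, c in enumerate(counts):
    print(f"{10 + 10 * i}-{20 + 10 * i}: {'#' * c}")
```

Printing one `#` per record gives a rough text histogram; a plotting tool would draw the same counts as bars.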
Histograms
Boston Housing example: the histogram shows the distribution of the outcome variable (median house value)
Boxplots
Boston Housing example: display the distribution of the outcome variable (MEDV) for neighborhoods on the Charles River (1) and not on the Charles River (0)
Side-by-side boxplots are useful for comparing subgroups
Box Plot
• Top outliers are defined as those above Q3 + 1.5(Q3 - Q1)
• “max” = maximum of the non-outliers
• Analogous definitions for bottom outliers and for “min”
• Details may differ across software
[Figure: annotated box plot showing outliers, “max”, Quartile 3, median, mean, Quartile 1, and “min”]
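The definitions above can be sketched as follows. The quartile convention is an assumption: this example uses the "inclusive" method of Python's `statistics.quantiles`, and other software may compute quartiles slightly differently, as the slide notes. The function name `boxplot_stats` and the sample data are hypothetical.

```python
import statistics


def boxplot_stats(values):
    """Compute the quantities a box plot displays, using 1.5 * IQR fences."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    upper_fence = q3 + 1.5 * iqr
    lower_fence = q1 - 1.5 * iqr
    # "min" and "max" are taken over the non-outliers only.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = sorted(v for v in values if v < lower_fence or v > upper_fence)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "mean": statistics.fmean(values),
        "min": min(inside),
        "max": max(inside),
        "outliers": outliers,
    }


# Hypothetical sample with one unusually large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
stats = boxplot_stats(data)
for name, value in stats.items():
    print(f"{name}: {value}")
```

In this sample the upper fence is 7.75 + 1.5 × 4.5 = 14.5, so 30 is flagged as an outlier and the “max” whisker stops at 9.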
Heat Maps
Color conveys information
In data mining, used to visualize:
• Correlations
• Missing data
Heatmap to highlight correlations
(Boston Housing)
            CRIM      ZN   INDUS    CHAS     NOX      RM     AGE     DIS     RAD     TAX PTRATIO       B   LSTAT    MEDV
CRIM        1.00
ZN         -0.20    1.00
INDUS       0.41   -0.53    1.00
CHAS       -0.06   -0.04    0.06    1.00
NOX         0.42   -0.52    0.76    0.09    1.00
RM         -0.22    0.31   -0.39    0.09   -0.30    1.00
AGE         0.35   -0.57    0.64    0.09    0.73   -0.24    1.00
DIS        -0.38    0.66   -0.71   -0.10   -0.77    0.21   -0.75    1.00
RAD         0.63   -0.31    0.60   -0.01    0.61   -0.21    0.46   -0.49    1.00
TAX         0.58   -0.31    0.72   -0.04    0.67   -0.29    0.51   -0.53    0.91    1.00
PTRATIO     0.29   -0.39    0.38   -0.12    0.19   -0.36    0.26   -0.23    0.46    0.46    1.00
B          -0.39    0.18   -0.36    0.05   -0.38    0.13   -0.27    0.29   -0.44   -0.44   -0.18    1.00
LSTAT       0.46   -0.41    0.60   -0.05    0.59   -0.61    0.60   -0.50    0.49    0.54    0.37   -0.37    1.00
MEDV       -0.39    0.36   -0.48    0.18   -0.43    0.70   -0.38    0.25   -0.38   -0.47   -0.51    0.33   -0.74    1.00
[Figure: two versions of the heatmap, one in Excel (using conditional formatting) and one in Spotfire]
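A correlation heatmap starts from pairwise Pearson correlations and then maps each value to a color. The sketch below is a rough, text-only illustration: the three columns are hypothetical (not Boston Housing data), and the `shade` thresholds of 0.4 and 0.7 are arbitrary assumptions standing in for a real color scale.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def shade(r):
    """Map a correlation to a coarse text 'color': stronger means darker."""
    strength = abs(r)
    if strength >= 0.7:
        return "##"
    if strength >= 0.4:
        return "++"
    return ".."


# Hypothetical columns, for illustration only.
columns = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [2.0, 4.0, 6.0, 8.0, 10.0],  # exactly 2 * A
    "C": [5.0, 3.0, 4.0, 1.0, 2.0],
}
names = list(columns)

# Print the lower triangle, as in the table above.
for i, row in enumerate(names):
    cells = []
    for col in names[: i + 1]:
        r = pearson(columns[row], columns[col])
        cells.append(f"{r:+.2f} {shade(r)}")
    print(f"{row}  " + "  ".join(cells))
```

Using the absolute value in `shade` treats strong negative and strong positive correlations as equally "dark"; a real heatmap would typically use two hues to keep the sign visible.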
Multidimensional Visualization
Scatterplot with color added
Boston Housing
NOX vs. LSTAT
Red = low median value
Blue = high median value
Matrix Plot
[Figure: scatterplot matrix of CRIM, ZN, and INDUS]
Shows scatterplots for variable pairs
Example: scatterplots for 3 Boston Housing variables
Rescaling to log scale (on right) “uncrowds” the data
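As a small illustration of why a log scale uncrowds skewed data, the sketch below applies a base-10 logarithm to a hypothetical set of positive values. The function name `log10_rescale` and the sample values are assumptions for this example; the slide does not say which variable or base was used.

```python
import math


def log10_rescale(values):
    """Return log10 of each value; log scales need strictly positive inputs."""
    if any(v <= 0 for v in values):
        raise ValueError("log scale needs strictly positive values")
    return [math.log10(v) for v in values]


# Hypothetical skewed values: most are tiny, a few are very large.
raw = [0.01, 0.05, 0.1, 0.5, 1.0, 10.0, 100.0]
logged = log10_rescale(raw)
for r, l in zip(raw, logged):
    print(f"{r:>7} -> {l:+.2f}")
```

On the raw scale five of the seven points sit within 1% of the axis range; after the transform they are spread between -2 and 2.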
Aggregation
Amtrak Ridership – Monthly Data
Aggregation – Monthly Average
Aggregation – Yearly Average
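The two aggregations above can be sketched with a single grouping helper. This example assumes that "monthly average" means averaging each calendar month across years and that "yearly average" means averaging all months within a year; the slides do not spell this out. The helper `average_by` and the ridership numbers are hypothetical, not Amtrak data.

```python
from collections import defaultdict


def average_by(records, key):
    """Average the values of (year, month, value) records grouped by key(year, month)."""
    groups = defaultdict(list)
    for year, month, value in records:
        groups[key(year, month)].append(value)
    return {k: sum(v) / len(v) for k, v in sorted(groups.items())}


# Hypothetical ridership (in thousands): two years, four months each.
records = [
    (1991, 1, 1700.0), (1991, 2, 1600.0), (1991, 3, 1900.0), (1991, 4, 1800.0),
    (1992, 1, 1500.0), (1992, 2, 1400.0), (1992, 3, 1700.0), (1992, 4, 1800.0),
]

yearly = average_by(records, key=lambda year, month: year)
monthly = average_by(records, key=lambda year, month: month)
print("Yearly average:", yearly)
print("Monthly average:", monthly)
```

Grouping by year smooths out the within-year pattern and exposes the trend, while grouping by month does the opposite and exposes the seasonal pattern.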
Scatter Plot with Labels (Utilities)
Scaling: Smaller markers, jittering, color contrast (Universal Bank; red = accept loan)
Jittering
• Moving markers by a small random amount
• Uncrowds the data by allowing more markers to be seen
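Jittering can be sketched as adding a small uniform random offset to each coordinate before plotting. The function name `jitter`, the offset size, and the sample values below are assumptions for illustration; plotting tools choose their own noise distribution and magnitude.

```python
import random


def jitter(values, amount, seed=None):
    """Shift each value by a random offset drawn uniformly from [-amount, amount]."""
    rng = random.Random(seed)
    return [v + rng.uniform(-amount, amount) for v in values]


# Hypothetical integer-valued variable with many overlapping markers.
family_size = [1, 1, 2, 2, 2, 3, 4, 4]
jittered = jitter(family_size, amount=0.2, seed=0)
for original, moved in zip(family_size, jittered):
    print(f"{original} -> {moved:.3f}")
```

The offsets only affect where markers are drawn; the underlying data should stay unchanged for any analysis.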
Without jittering (for comparison)
Parallel Coordinate Plot (Boston Housing)
[Figure legend: CATMEDV = 1, CATMEDV = 0]
Linked plots
(same record is highlighted in each plot)
Network Graph – eBay Auctions
(sellers on left, buyers on right)
Circle size = # of transactions for the node
Line width = # of auctions for the buyer-seller pair
Arrows point from buyer to seller
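The two encodings above come down to simple counts over the list of auctions. The sketch below uses a hypothetical list of (buyer, seller) pairs, not eBay data, and treats each auction as one transaction for both of its endpoints, which is an assumption about how the slide's node counts were derived.

```python
from collections import Counter

# Hypothetical auction records as (buyer, seller) pairs; direction is buyer -> seller.
auctions = [
    ("buyer_a", "seller_x"),
    ("buyer_a", "seller_x"),
    ("buyer_a", "seller_y"),
    ("buyer_b", "seller_x"),
]

# Line width: number of auctions for each buyer-seller pair.
edge_width = Counter(auctions)

# Circle size: number of transactions each node takes part in.
node_size = Counter()
for buyer, seller in auctions:
    node_size[buyer] += 1
    node_size[seller] += 1

for (buyer, seller), width in edge_width.items():
    print(f"{buyer} -> {seller}: width {width}")
for node, size in sorted(node_size.items()):
    print(f"{node}: size {size}")
```

A graph-drawing tool would take `node_size` and `edge_width` as the inputs for marker area and line thickness.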
Treemap – eBay Auctions
(Hierarchical eBay data: Category > Sub-category > Brand)
Rectangle size = average closing price (= item value)
Color = % of sellers with negative feedback (darker = more)
Map Chart
(Comparing countries’ well-being with GDP)
Darker = higher value
Summary of Major Visualizations &
Operations, According to Data Mining Goal