Chapter 3 – Data Visualization
© Galit Shmueli and Peter Bruce 2010
Shmueli, Patel & Bruce
Graphs for Data Exploration
Basic plots:
- Line graphs
- Bar charts
- Scatterplots
Distribution plots:
- Boxplots
- Histograms
Line Graph for Time Series
Bar Chart for Categorical Variable
95% of tracts do not border the Charles River.
Excel's labeling can be confusing: the y-axis is actually “% of records that have a value for CATMEDV” (i.e., “% of all records”).
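To make the y-axis quantity explicit, the following minimal sketch computes each category's share of all records, which is the value a bar chart of a categorical variable plots. The helper `category_percentages` and the 20-record indicator list are hypothetical illustrations and do not come from the Boston Housing data.

```python
from collections import Counter

def category_percentages(values):
    """Percent of ALL records falling in each category, i.e. the quantity a bar
    chart of a categorical variable shows on its y-axis."""
    counts = Counter(values)
    total = len(values)
    return {cat: 100.0 * n / total for cat, n in sorted(counts.items())}

# Hypothetical Charles River indicator: 1 = tract borders the river, 0 = it does not
chas = [0] * 19 + [1]
print(category_percentages(chas))  # {0: 95.0, 1: 5.0}
```

Because the denominator is the full record count, the bars always sum to 100%. That makes "% of all records" the accurate axis label.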
Scatterplot
Displays the relationship between two numerical variables
Distribution Plots
- Display how many times each value occurs in a data set
- Or, for continuous data or data with many possible values, how many values fall in each of a series of ranges or “bins”
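The two counting rules above can be sketched as follows. For discrete data, count each distinct value. For continuous data, count values per equal-width bin, which is what a histogram draws. The equal-width, half-open bin convention is an assumption, because plotting tools choose bin edges differently. The function names and sample numbers are hypothetical.

```python
from collections import Counter

def value_counts(values):
    """How many times each distinct value occurs (for discrete data)."""
    return dict(sorted(Counter(values).items()))

def bin_counts(values, lo, hi, n_bins):
    """How many values fall in each of n_bins equal-width bins on [lo, hi].
    Bins are half-open [edge, next_edge), except the last, which also includes hi."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if v < lo or v > hi:
            continue  # ignore values outside the plotted range
        idx = min(int((v - lo) / width), n_bins - 1)  # v == hi goes in the last bin
        counts[idx] += 1
    return counts

print(value_counts([3, 1, 3, 2, 3]))                        # {1: 1, 2: 1, 3: 3}
print(bin_counts([5.0, 12.5, 21.0, 22.0, 50.0], 0, 50, 5))  # [1, 1, 2, 0, 1]
```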
Histograms
Boston Housing example: the histogram shows the distribution of the outcome variable (median house value, MEDV).
Boxplots
Boston Housing example: displays the distribution of the outcome variable (MEDV) for neighborhoods on the Charles River (1) and not on the Charles River (0).
Side-by-side boxplots are useful for comparing subgroups.
Box Plot
- Top outliers are defined as values above Q3 + 1.5(Q3 - Q1).
- “max” = maximum of the non-outliers.
- Analogous definitions apply to bottom outliers (values below Q1 - 1.5(Q3 - Q1)) and to “min”.
- Details may differ across software.
Figure labels: median, quartile 1, quartile 3, “max”, “min”, outliers, mean
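The box plot rules above can be expressed as a short computation. This minimal sketch assumes linear-interpolation quartiles (the `"inclusive"` method of Python's `statistics.quantiles`, Python 3.8+). As the slide notes, other software may define quartiles differently and return slightly different values. The helper `box_plot_stats` and the sample data are hypothetical.

```python
import statistics

def box_plot_stats(data, k=1.5):
    """Box plot summary using the slide's rule: values above Q3 + k*(Q3 - Q1)
    or below Q1 - k*(Q3 - Q1) are outliers; "max"/"min" are the extreme non-outliers.
    Quartiles use linear interpolation (assumed; software varies)."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "mean": statistics.mean(data),
        "min": min(inside),
        "max": max(inside),
        "outliers": sorted(x for x in data if x < lower_fence or x > upper_fence),
    }

stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(stats)
# q1 = 3.25, median = 5.5, q3 = 7.75, so the upper fence is 7.75 + 1.5 * 4.5 = 14.5:
# 30 is an outlier, "max" = 9, and the mean (7.5) is pulled above the median.
```

The mean is included because the slide's figure marks it separately. A single outlier can move the mean well away from the median.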
Heat Maps
- Color conveys information
- In data mining, used to visualize:
  - Correlations
  - Missing data
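A missing-data heat map colors each cell of a records-by-variables grid according to whether the value is present. The sketch below builds that 0/1 grid and renders it as text, with `#` marking a missing cell. It assumes records are dicts and that missing values are `None` or absent keys. The helper name and the records are hypothetical.

```python
def missing_indicator(records, columns):
    """0/1 grid, one row per record and one column per variable; 1 marks a
    missing value (None or an absent key). Coloring the 1s gives a missing-data heat map."""
    return [[1 if rec.get(col) is None else 0 for col in columns] for rec in records]

records = [  # hypothetical records; None = missing
    {"CRIM": 0.5, "RM": 6.5, "MEDV": 24.0},
    {"CRIM": None, "RM": 6.4, "MEDV": None},
    {"CRIM": 0.3, "RM": None, "MEDV": 34.7},
]
cols = ["CRIM", "RM", "MEDV"]
grid = missing_indicator(records, cols)
for row in grid:
    print("".join("#" if cell else "." for cell in row))  # '#' = missing
print([sum(col) for col in zip(*grid)])  # number of missing values per variable
```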
Heatmap to highlight correlations
(Boston Housing)
            CRIM      ZN   INDUS    CHAS     NOX      RM     AGE     DIS     RAD     TAX PTRATIO       B   LSTAT    MEDV
CRIM        1.00
ZN         -0.20    1.00
INDUS       0.41   -0.53    1.00
CHAS       -0.06   -0.04    0.06    1.00
NOX         0.42   -0.52    0.76    0.09    1.00
RM         -0.22    0.31   -0.39    0.09   -0.30    1.00
AGE         0.35   -0.57    0.64    0.09    0.73   -0.24    1.00
DIS        -0.38    0.66   -0.71   -0.10   -0.77    0.21   -0.75    1.00
RAD         0.63   -0.31    0.60   -0.01    0.61   -0.21    0.46   -0.49    1.00
TAX         0.58   -0.31    0.72   -0.04    0.67   -0.29    0.51   -0.53    0.91    1.00
PTRATIO     0.29   -0.39    0.38   -0.12    0.19   -0.36    0.26   -0.23    0.46    0.46    1.00
B          -0.39    0.18   -0.36    0.05   -0.38    0.13   -0.27    0.29   -0.44   -0.44   -0.18    1.00
LSTAT       0.46   -0.41    0.60   -0.05    0.59   -0.61    0.60   -0.50    0.49    0.54    0.37   -0.37    1.00
MEDV       -0.39    0.36   -0.48    0.18   -0.43    0.70   -0.38    0.25   -0.38   -0.47   -0.51    0.33   -0.74    1.00
Shown in Excel (using conditional formatting) and in Spotfire
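A correlation heat map like the one above is a lower-triangular table of Pearson correlations, colored by strength. The minimal sketch below computes such a table and substitutes a text shading for color, with denser characters for larger |r|. The toy variables A to D are hypothetical and are not the Boston Housing values. The character-based shading is an illustrative stand-in for conditional formatting.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_table(data):
    """Lower-triangular table: data maps variable name -> values.
    Returns [(name, [r with each earlier variable..., 1.0]), ...], rounded to 2 dp."""
    names = list(data)
    return [(row, [round(pearson(data[row], data[col]), 2) for col in names[: i + 1]])
            for i, row in enumerate(names)]

def shade(r, levels=" .:-=+*#"):
    """Map |r| in [0, 1] to a character; denser character = stronger correlation."""
    return levels[min(int(abs(r) * len(levels)), len(levels) - 1)]

# Hypothetical toy data (NOT the Boston Housing values)
toy = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 3, 2, 1], "D": [1, 3, 2, 4]}
for name, row in correlation_table(toy):
    print(f"{name:<3}" + "".join(f"{r:>7.2f}" for r in row) + "   " + "".join(shade(r) for r in row))
```

Shading uses |r|, so strong negative correlations (such as LSTAT and MEDV at -0.74) stand out as much as strong positive ones. A diverging color scale would be needed to show the sign as well.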
Multidimensional Visualization
Scatterplot with color added
Boston Housing: NOX vs. LSTAT
Red = low median value; blue = high median value
Matrix Plot
Shows a scatterplot for each pair of variables
Example: scatterplots for 3 Boston Housing variables
Rescaling to a log scale (on right) “uncrowds” the data
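Log rescaling works because equal ratios become equal distances. Right-skewed values crowded near zero on a linear axis therefore spread out evenly. The sketch assumes strictly positive data (a log axis cannot show zero or negative values). The skewed sample values are hypothetical.

```python
import math

def log10_rescale(values):
    """Rescale strictly positive values to log10 so that equal ratios map to equal distances."""
    if any(v <= 0 for v in values):
        raise ValueError("log scale requires strictly positive values")
    return [math.log10(v) for v in values]

crime_rates = [0.01, 0.1, 1.0, 10.0, 100.0]  # hypothetical, heavily right-skewed
# On a linear 0-100 axis, four of these five points sit in the bottom 10% of the axis;
# after rescaling they are evenly spaced, one unit apart.
print(log10_rescale(crime_rates))
```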
Aggregation
Amtrak Ridership – Monthly Data
Aggregation – Monthly Average
Aggregation – Yearly Average
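The two aggregations of the monthly series can be sketched as one group-and-average step with different grouping keys. Grouping by calendar month, with each month averaged across years, brings out seasonality. Grouping by year brings out the trend. This is a minimal sketch under assumptions: the series is a list of hypothetical (year, month, value) tuples rather than the Amtrak data, and `aggregate` is an illustrative helper.

```python
from collections import defaultdict
from statistics import mean

def aggregate(series, key):
    """Average a monthly series [(year, month, value), ...] within groups given by key(year, month)."""
    groups = defaultdict(list)
    for year, month, value in series:
        groups[key(year, month)].append(value)
    return {k: mean(v) for k, v in sorted(groups.items())}

# Hypothetical monthly ridership figures (not the Amtrak data)
series = [(1991, 1, 1700), (1991, 2, 1600), (1992, 1, 1800), (1992, 2, 1700)]
by_month = aggregate(series, key=lambda y, m: m)  # seasonality: each calendar month across years
by_year = aggregate(series, key=lambda y, m: y)   # trend: one average per year
print(by_month)
print(by_year)
```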
Scatter Plot with Labels (Utilities)
Scaling: Smaller markers, jittering, color contrast
(Universal Bank; red = accept loan)
Jittering
- Moving markers by a small random amount
- Uncrowds the data by allowing more markers to be seen
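Jittering can be sketched as adding small uniform noise to the plotted coordinates only, so that records sharing the same value no longer sit on top of one another. The noise half-width `scale`, the fixed seed, and the integer-valued sample are assumptions for illustration. A common choice is to keep `scale` well below the spacing between distinct values.

```python
import random

def jitter(values, scale, seed=None):
    """Return copies of values shifted by uniform noise in [-scale, +scale].
    Use the result only for plotting; analyses should use the original values."""
    rng = random.Random(seed)
    return [v + rng.uniform(-scale, scale) for v in values]

# Hypothetical integer-valued variable: many records share the same value
family_size = [1, 1, 1, 2, 2, 3, 3, 3, 3, 4]
print(jitter(family_size, scale=0.15, seed=42))
```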
Without jittering (for comparison)
Parallel Coordinate Plot (Boston Housing)
(Panels for CATMEDV = 1 and CATMEDV = 0, filtered on CAT. MEDV)
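In a parallel coordinate plot, each record becomes a polyline across one vertical axis per variable. The sketch below computes those polylines under one assumption: each axis is min-max scaled to [0, 1] so that all axes share the same height. This is a common convention, but tools differ. Splitting the lines by CATMEDV, as in the panels above, would simply mean filtering the records before plotting. The records and the helper name are hypothetical.

```python
def parallel_coordinates(records, variables):
    """One polyline per record: its value on each variable, min-max scaled to [0, 1]
    so every vertical axis spans the same height (scaling convention assumed)."""
    lo = {v: min(r[v] for r in records) for v in variables}
    hi = {v: max(r[v] for r in records) for v in variables}

    def scale(r, v):
        span = hi[v] - lo[v]
        return 0.5 if span == 0 else (r[v] - lo[v]) / span  # constant variable -> middle of axis

    return [[scale(r, v) for v in variables] for r in records]

rows = [  # hypothetical records
    {"CRIM": 0.1, "RM": 7.0, "LSTAT": 5.0},
    {"CRIM": 9.0, "RM": 5.5, "LSTAT": 25.0},
    {"CRIM": 1.0, "RM": 6.0, "LSTAT": 10.0},
]
for line in parallel_coordinates(rows, ["CRIM", "RM", "LSTAT"]):
    print([round(x, 2) for x in line])
```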
Linked plots
(same record is highlighted in each plot)
Network Graph – eBay Auctions
(sellers on left, buyers on right)
- Circle size = number of transactions for the node
- Line width = number of auctions for the buyer-seller pair
- Arrows point from buyer to seller
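The encodings listed above can be derived from a plain list of auctions. Node size is a per-participant count, and edge width is a per-pair count with the edge directed from buyer to seller. This minimal sketch assumes one (buyer, seller) tuple per auction and IDs that do not collide between buyers and sellers. The helper and the IDs are hypothetical.

```python
from collections import Counter

def network_summary(auctions):
    """auctions: one (buyer, seller) pair per auction.
    Returns (node_size, edge_width): transactions per node, and auctions per
    directed buyer -> seller pair."""
    edge_width = Counter(auctions)
    node_size = Counter()
    for buyer, seller in auctions:
        node_size[buyer] += 1
        node_size[seller] += 1
    return dict(node_size), dict(edge_width)

# Hypothetical auction records (buyer, seller)
auctions = [("b1", "s1"), ("b1", "s1"), ("b2", "s1"), ("b2", "s2")]
sizes, widths = network_summary(auctions)
print(sizes)   # {'b1': 2, 's1': 3, 'b2': 2, 's2': 1}
print(widths)  # {('b1', 's1'): 2, ('b2', 's1'): 1, ('b2', 's2'): 1}
```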
Treemap – eBay Auctions
(Hierarchical eBay data: category > sub-category > brand)
- Rectangle size = average closing price (= item value)
- Color = % of sellers with negative feedback (darker = more)
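Each leaf rectangle in this treemap summarizes one category > sub-category > brand group: its average closing price sets the size, and its share of sellers with negative feedback sets the color. The sketch below computes those two numbers per leaf. It assumes a hypothetical flat record layout, and it counts each seller once per group so that sellers with many auctions are not over-weighted. Both choices are assumptions, not taken from the slide.

```python
from collections import defaultdict
from statistics import mean

def treemap_leaves(auctions):
    """auctions: (category, subcategory, brand, seller, closing_price, seller_has_negative).
    Returns {(category, subcategory, brand): (avg_price, pct_negative_sellers)}."""
    groups = defaultdict(list)
    for cat, sub, brand, seller, price, negative in auctions:
        groups[(cat, sub, brand)].append((seller, price, negative))
    leaves = {}
    for key, rows in sorted(groups.items()):
        avg_price = mean([price for _, price, _ in rows])  # rectangle size
        # count each distinct seller once when computing the color
        sellers = {seller: negative for seller, _, negative in rows}
        pct_negative = 100.0 * sum(sellers.values()) / len(sellers)
        leaves[key] = (avg_price, pct_negative)
    return leaves

# Hypothetical auctions (not the eBay data used in the chart)
auctions = [
    ("Electronics", "Cameras", "BrandA", "s1", 200.0, False),
    ("Electronics", "Cameras", "BrandA", "s2", 300.0, True),
    ("Electronics", "Cameras", "BrandA", "s2", 250.0, True),
    ("Electronics", "Phones", "BrandB", "s3", 100.0, False),
]
for key, (size, color) in treemap_leaves(auctions).items():
    print(" > ".join(key), f"size={size:.1f}", f"color={color:.0f}%")
```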
Map Chart
(Comparing countries’ well-being with GDP)
Darker = higher value
Summary of Major Visualizations & Operations,
According to Data Mining Goal