Applying Machine Learning to Data Visualization: What, Why, Where, and How

Qianwen Wang

2022.02.03
Applying Machine Learning Advances to Data Visualization
WHAT
WHY
HOW
WHERE
1

ID paper venue
Columns marked with X: Data Processing4VIS, Data-VIS Mapping, Insight Communication, VIS Reading, VIS Interaction, Clustering, Dimension Reduction, Generative, Classification, Regression, Semi-supervised, Reinforcement, User Profiling
1 Sips et al. [1] EuroVis 2009 X X
2 Gotz and Wen [2] IUI 2009 X X
3 Savva et al. [3] UIST 2011 X X
4 Key et al. [4] SIGMOD 2012 X X
5 Steichen et al. [5] IUI 2013 X X
6 Brown et al. [6] TVCG 2014 X X
7 Lalle et al. [7] IUI 2014 X X
8 Toker et al. [8] IUI 2014 X X
9 Sedlmair and Aupetit [9] CGF 2015 X X
10 Mutlu et al. [10] TiiS 2016 X X
11 Aupetit and Sedlmair [11] PVis 2016 X X
12 Siegel et al. [12] ECCV 2016 X X
13 Kembhavi et al. [13] ECCV 2016 X X
14 Al-Zaidy et al. [14] AAAI 2016 X X
15 Pezzotti et al. [15] TVCG 2016 X X
16 Poco et al. [16] VIS 2017 X X
17 Kwon et al. [17] VIS 2017 X X
18 Bylinskii et al. [18] UIST 2017 X X
19 Saha et al. [19] IJCAI 2017 X X
20 Kruiger et al. [20] EuroVis 2017 X X
21 Poco and Heer [21] EuroVis 2017 X X
22 Jung et al. [22] CHI 2017 X X
23 Bylinskii et al. [23] arxiv 2017 X X X
24 Al-Zaidy and Giles [24] AAAI 2017 X X
25 Siddiqui et al. [25] VLDB 2018 X X X
26 Gramazio et al. [26] VIS 2018 X X
27 Moritz et al. [27] VIS 2018 X X X
28 Berger et al. [28] VIS 2018 X X
29 Wang et al. [29] VIS 2018 X X
30 Haehn et al. [30] VIS 2018 X X
31 Luo et al. [31] SIGMOD 2018 X X X
32 Milo and Somech [32] KDD 2018 X X
33 Zhou et al. [33] IJCAI 2018 X X
34 Kahou et al. [34] ICLR 2018 X X
35 Luo et al. [35] ICDE 2018 X X
36 Fan and Hauser [36] EuroVis 2018 X X
37 Chegini et al. [37] EuroVis 2018 X X
38 Kafle et al. [38] CVPR 2018 X X X
39 Kim et al. [39] CVPR 2018 X X
40 Battle et al. [40] CHI 2018 X X
41 Dibia and Demiralp [41] CGA 2018 X X
42 Haleem et al. [42] CGA 2018 X X
43 Madan et al. [43] arxiv 2018 X X X
44 Yu and Silva [44] VIS 2019 X X
45 He et al. [45] VIS 2019 X X
46 Chen et al. [46] VIS 2019 X X
47 Han and Wang [47] VIS 2019 X X
49 Kwon and Ma [49] VIS 2019 X X
51 Han et al. [51] VIS 2019 X X X
52 Wall et al. [52] VIS 2019 X X
53 Fujiwara et al. [53] VIS 2019 X X
54 Fu et al. [54] VIS 2019 X X X
55 Porter et al. [55] VIS 2019 X X
56 Jo and Seo [56] VIS 2019 X X X
57 Ma et al. [57] VIS 2019 X X
59 Cui et al. [59] VIS 2019 X X
62 Smart et al. [62] VIS 2019 X X
63 Huang et al. [63] VIS 2019 X X
64 Hong et al. [64] PVis 2019 X X
65 Fan and Hauser [65] EuroVis 2019 X X
66 Ottley et al. [66] EuroVis 2019 X X
67 Abbas et al. [67] EuroVis 2019 X X X
68 Kassel and Rohs [68] EuroVis 2019 X X X
69 Hu et al. [69] CHI 2019 X X
70 Fan and Hauser [70] CGA 2019 X X
71 Kafle et al. [71] arxiv 2019 X X
72 Mohammed [72] VLDB 2020 X X
73 Zhang et al. [73] VIS 2020 X X X
74 Wu et al. [74] VIS 2020 X X
75 Tang et al. [75] VIS 2020 X X
76 Qian et al. [76] VIS 2020 X X
78 Oppermann et al. [78] VIS 2020 X X
79 Fosco et al. [79] UIST 2020 X X
80 Giovannangeli et al. [80] PacificVis 2020 X X
81 Liu et al. [81] PacificVis 2020 X X X
82 Luo et al. [82] ICDE 2020 X X X
83 Lekschas et al. [83] EuroVis 2020 X X X X
84 Zhao et al. [84] CHI 2020 X X
85 Lai et al. [85] CHI 2020 X X X
86 Kim et al. [86] CHI 2020 X X X
87 Lu et al. [87] CHI 2020 X X X
88 Zhou et al. [88] arxiv 2020 X X
Style Imitation
Current Practices, Trends, Challenges, Opportunities
2

https://ml4vis.github.io
https://github.com/ML4VIS/ML4VIS.github.io/
3

Outline
WHAT: What is ML4VIS
WHY: Why ML4VIS
WHERE: Where do the needs for ML exist in visualization
HOW: How can ML be used for visualization problems
Summary
Deep Learning-based Auto-Extraction of Extensible Timeline
Chen et al. IEEE InfoVIS 2019
4

Data Visualization (VIS), Machine Learning (ML), VIS4ML, ML4VIS
WHAT
5

Data
Real World
Humans
VIS
ML
8

VIS: Strengths of human visual perception systems to efficiently make sense of data ("a picture is worth a thousand words")
ML: Unprecedented power of automatic algorithms to reveal hidden patterns from large amounts of data without human intervention
ML4VIS, VIS4ML
9

VIS4ML
Known relationships
between medical entities
ML
Qianwen Wang et al. 2021

ICML Workshop on

Interpretable Machine Learning in Healthcare
10

VIS4ML
11

Visual Genealogy of Deep Neural Networks

IEEE TVCG

VIS4ML

Visual Genealogy of Deep Neural Networks

IEEE TVCG
12

VIS4ML
13
Data Collection, Model Development, Model Evaluation, Model Application

VIS4ML
Assess
Create
Design
14

Why ML4VIS

Why this ML4VIS survey
WHY
15

Why ML4VIS
It can be challenging to create effective visualizations
http://leoyuholo.com/bad-vis-browser/
https://www.reddit.com/r/shittydataisbeautiful/
Data Analytics, Graphic Design, Full Stack Development, User Experience, Cognitive Science, Human-Computer Interaction
16

Why an ML4VIS survey
Capabilities of ML
Needs in Visualization
17

Why an ML4VIS survey
Capabilities of ML
Applying ML to unsuitable visualization problems may only impose the drawbacks of ML (e.g., uncertainty, lack of explainability) without bringing any benefit.
18

Why an ML4VIS survey
Capabilities of ML
Given a suitable visualization problem, selecting a proper ML technique and employing necessary adaptation are crucial yet challenging.
19

Why an ML4VIS survey
Capabilities of ML
WHERE
HOW
20


Where do the needs exist in visualization?
WHERE
22

Data: Clean, process, transform data
VIS: Create visualizations
Users: Interpret, interact with, extract information from visualizations
Data Processing4VIS
Data-VIS Mapping
Insight Communication
Style Imitation
VIS Interaction
User Profiling
VIS Reading
23

Data Processing4VIS
Diagram labels: Data, Data, VIS
Example: Luo, Yuyu, et al. "Interactive cleaning for progressive visualization through composite questions." 2020 IEEE 36th International Conference on Data Engineering (ICDE). IEEE, 2020.
Input: Data with errors/missing values
Output: Data with no errors that will influence the visualization

25
Raw data is transformed into a format that better suits the following visualization processes.
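
A minimal sketch of this process, assuming the simplest possible cleaning rule: records whose plotted field is missing or not numeric are dropped before charting. This is not the interactive method of Luo et al.; the helper name `clean_for_bar_chart` and the sample records are hypothetical.

```python
from typing import Any


def clean_for_bar_chart(records: list[dict[str, Any]], value_field: str) -> list[dict[str, Any]]:
    """Keep only records whose value_field can be parsed as a number."""
    cleaned = []
    for record in records:
        raw = record.get(value_field)
        try:
            value = float(raw)
        except (TypeError, ValueError):
            continue  # missing or malformed value: it would distort the chart
        cleaned.append({**record, value_field: value})
    return cleaned


# Hypothetical raw data with one missing and one erroneous value.
raw_records = [
    {"category": "car", "sale": "100"},
    {"category": "bike", "sale": None},
    {"category": "boat", "sale": "n/a"},
    {"category": "truck", "sale": "1605"},
]
cleaned_records = clean_for_bar_chart(raw_records, "sale")
```

Dropping records is only one option; imputing values or asking the user, as in interactive cleaning, keeps more of the data.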

Data-VIS Mapping
Diagram labels: Data, VIS
Input:
[{“sale”: “100”, “category”: “car”,“year”: “1993”}

…

{“sale”: “1605”, “category”: “car”,“year”: “1993”}]
Output: VIS
Example: Haotian Li et al. 2019

KG4Vis: A Knowledge Graph-Based Approach for Visualization Recommendation
26
Data fields are mapped into visual channels.
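
To make the idea of mapping fields to channels concrete, here is a rough rule-based sketch. It is a hand-written baseline for illustration only, not the KG4Vis approach; the type-guessing rules, the channel names, and both function names are assumptions.

```python
def infer_type(values: list[str]) -> str:
    """Guess a field type from string values: temporal, quantitative, or nominal."""
    if all(v.isdigit() and len(v) == 4 for v in values):
        return "temporal"  # crude assumption: four digits look like a year
    try:
        for v in values:
            float(v)
        return "quantitative"
    except ValueError:
        return "nominal"


def map_fields_to_channels(records: list[dict[str, str]]) -> dict[str, str]:
    """Assign each data field to a visual channel using fixed, hand-written rules."""
    channel_for_type = {"temporal": "x", "quantitative": "y", "nominal": "color"}
    mapping = {}
    for field in records[0]:
        field_type = infer_type([r[field] for r in records])
        mapping[field] = channel_for_type[field_type]
    return mapping


# The two records visible on the slide (the elided ones are not reproduced).
records = [
    {"sale": "100", "category": "car", "year": "1993"},
    {"sale": "1605", "category": "car", "year": "1993"},
]
encoding = map_fields_to_channels(records)
```

Fixed rules like these break easily (a sales column containing only four-digit numbers would be read as years), which is one motivation for learning the mapping from examples instead.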

Insight Communication
Diagram labels: Insight, Data, VIS
Input: “Among all students, 49% like football, 32% like basketball, and 21% like baseball.”
Output: VIS
Example: Weiwei Cui et al. 2019

Text-to-Viz
27
Insights are embedded in visualizations to be effectively communicated.
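
As a toy illustration of turning a proportion statement into chartable data, the sketch below pulls percentages out of the example sentence with a regular expression. This is an assumption-laden stand-in, not how Cui et al. parse text; the pattern only handles the "N% like X" phrasing used here.

```python
import re


def extract_proportions(sentence: str) -> list[dict[str, object]]:
    """Extract (label, percent) pairs from phrases of the form 'N% like X'."""
    pattern = re.compile(r"(\d+(?:\.\d+)?)%\s+like\s+(\w+)")
    return [
        {"label": label, "percent": float(number)}
        for number, label in pattern.findall(sentence)
    ]


sentence = (
    "Among all students, 49% like football, 32% like basketball, "
    "and 21% like baseball."
)
proportions = extract_proportions(sentence)
```

The resulting list could feed any proportion chart (pie, stacked bar, pictograph); choosing and styling that chart is the part the slide assigns to ML.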

Style Imitation
Diagram labels: Style, Data, VIS
Input: A layout style that emphasizes the node communities; network data
Example: Yong Wang et al. 2019

DeepDrawing: A Deep Learning Approach to Graph Drawing
28
Styles are extracted from the given examples and applied to the created visualization.
Output: A graph with similar style

VIS Interaction
Diagram labels: VIS, User Action, VIS
Input: 3D point cloud, 2D lasso selection
Output: VIS
Example: Chen et al.

LassoNet: Deep Lasso-Selection of 3D Point Clouds

IEEE InfoVIS 2019 & TVCG

29
Users interact with a visualization and transform it into a new stage through user actions.

User Profiling
Input: User Action, VIS
Output: User Action or User Characteristic
Example: Sébastien Lallé et al. 2020

Prediction of Users’ Learning Curves for Adaptation while Using an Information Visualization
Example labels: eye-tracker records; a specific visualization; perceptual speed; verbal working memory; visual working memory; locus of control (personality trait); learning curve for a certain visual analysis task
30
User actions with visualizations are logged and analyzed to better understand users.

VIS Reading
Input: VIS
Output: Data, Style, or Insight
Example: Can Liu et al. 2020

AutoCaption: An Approach to Generate Natural Language Description from Visualization Automatically

PacificVis
31
Users read visualizations and obtain useful information.
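
The seven processes can be summarized as a small lookup table of inputs and outputs. The table below is one reading of the slide diagrams, so the exact input and output sets are assumptions and may differ from the survey; `processes_producing` is a hypothetical helper.

```python
# name -> (inputs, outputs), as read from the slide diagrams (an assumption).
PROCESSES = {
    "Data Processing4VIS": (("Data",), ("Data",)),
    "Data-VIS Mapping": (("Data",), ("VIS",)),
    "Insight Communication": (("Insight",), ("VIS",)),
    "Style Imitation": (("Style", "Data"), ("VIS",)),
    "VIS Interaction": (("VIS", "User Action"), ("VIS",)),
    "User Profiling": (("User Action", "VIS"), ("User Action", "User Characteristic")),
    "VIS Reading": (("VIS",), ("Data", "Style", "Insight")),
}


def processes_producing(output: str) -> list[str]:
    """List the processes whose outputs include the given item."""
    return [name for name, (_, outputs) in PROCESSES.items() if output in outputs]


vis_producers = processes_producing("VIS")
```

Looking a problem up by what goes in and what must come out is a quick way to decide which of the seven processes it belongs to.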

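[Figure: overview of the seven visualization processes connecting DATA, VIS, and USER. Processes: Data Processing4VIS, Data-VIS Mapping, Insight Communication, Style Imitation, VIS Reading, VIS Interaction, User Profiling. Other labels: Data, Insight, Style, Visualization, User Action, User Characteristics.]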
32


It would be great if I could create fancy timeline infographics (Style Imitation)
34
Chen et al.

Towards Automated Infographic
Design: Deep Learning-based Auto-


Manually?
35
Chen et al.



2014
2015
2016
The first year of my Ph.D.
Everything is wonderful!
My first submission to VIS
has been accepted…
My second submission to VIS
has been accepted Again!
Chen et al.



36
2002
2006
2010
Brazil 2-0 Germany. A beautiful
match.
Italy 1 – 1 France. OMG Zidane
head-butted Materazzi!
Spain 1-0 Netherlands. What a pity
for Netherlands.
2014
2018
Germany 1-0 Argentina. Wonderful
game.
France 4-2 Croatia. Very exciting for
so many goals.
New Data
?
Can we ask the question differently?

Can we extract the template from a bitmap timeline infographic automatically?
[Figure: the two example timeline infographics and the template extracted from them. Labels: Font, Icon; element tags am, at, et, em; timeline type: Linear, Sequential, Unified, Horizontal; steps: VIS Reading (ML-based), non-ML-based.]
Chen et al.
37


How can ML be used to satisfy these needs?
HOW
40

ML models are quickly evolving
41

Supervised Learning
Semi-Supervised Learning
Unsupervised Learning
Reinforcement Learning
42

Supervised Learning
How a visualization problem can be formed as a supervised learning task

• Training Dataset (labeled input-output pairs)

• The output can be described using either a numerical value (regression) or a finite number of types (classification)
Classification, Regression
A model learns the mapping from input X to output Y from the labeled training examples.
Datasets: FigureSeer Dataset (60k), AI2D dataset (5k), Visually29K dataset (29k), DVQA dataset (300k), FigureQA dataset (100k), ColorMapping dataset (1.6k)
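
A minimal sketch of the classification case, assuming a nearest-centroid classifier as the simplest stand-in for "a model that learns the mapping from X to Y". The two chart features and the tiny training set are made up for illustration; real systems learn from datasets like those listed above.

```python
from collections import defaultdict
from math import dist


def fit_centroids(X, y):
    """Learn one mean feature vector (centroid) per label from labeled pairs."""
    grouped = defaultdict(list)
    for features, label in zip(X, y):
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Classification: return the label whose centroid is closest."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical features per chart image: (rectangle marks, line segments).
X_train = [(12, 1), (9, 0), (0, 14), (1, 20)]
y_train = ["bar", "bar", "line", "line"]
centroids = fit_centroids(X_train, y_train)
label = predict(centroids, (10, 2))
```

The output is one of a finite number of types, which is what makes this classification rather than regression.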
43

Supervised Learning (Classification, Regression)
VIS Reading: A saliency score for each pixel
VIS Reading: Bounding box and data values
User Profiling: Learning curve
A score for a visualization?

A score for a data processing?
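
For the regression case, the output is a number rather than a class. The sketch below fits a straight line by ordinary least squares to made-up completion times, as a very rough stand-in for predicting a learning curve; the data and the linear form are assumptions, and real learning curves are rarely linear.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: task completion time (seconds) over five consecutive trials.
trials = [1, 2, 3, 4, 5]
seconds = [30.0, 26.0, 22.0, 18.0, 14.0]
slope, intercept = fit_line(trials, seconds)
predicted_trial_6 = slope * 6 + intercept
```

A negative slope here means the hypothetical user is getting faster with practice.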
44

Supervised Learning (Classification, Regression)
Data-VIS Mapping: Is there always a finite number of classes?
VIS Interaction: type of action
VIS Reading: type of a chart
45

Unsupervised Learning
Generative, Clustering, Dimension Reduction
How a visualization problem can be formed as an unsupervised learning task

• Labeled dataset is unavailable

• Find similar new samples by learning the distribution of existing samples (Generative)

A model learns the underlying structure of the unlabelled data X.
Chen Chen et al. 2019

GenerativeMap: Visualization and Exploration of Dynamic Density Maps via Generative Learning Model
Generate interpolation visualizations (figure labels: t, ?, ?, t+n)

Alvitta Ottley et al. 2019

Follow The Clicks: Learning and Anticipating Mouse Interactions During Exploratory Data Analysis
Generate next step user actions
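
The generative bullet can be illustrated with the simplest possible model: estimate a one-dimensional Gaussian from unlabeled samples and draw new samples from it. This is only a sketch of the idea; the works cited above use far richer generative models, and the sample values below are made up.

```python
import random
import statistics


def fit_gaussian(samples):
    """Learn the distribution of existing samples: its mean and standard deviation."""
    return statistics.mean(samples), statistics.stdev(samples)


def generate(mean, std, n, seed=0):
    """Draw n new samples that resemble the existing ones."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [rng.gauss(mean, std) for _ in range(n)]


existing = [4.8, 5.1, 5.0, 4.9, 5.2]  # unlabeled data X
mean, std = fit_gaussian(existing)
new_samples = generate(mean, std, n=3)
```

No labels are involved at any point: the model only captures the structure of the data it is shown.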
46

Semi-supervised Learning
How a visualization problem can be formed as a semi-supervised learning task

• The output can be described using either a numerical value (regression) or a finite number of types (classification)

• Only a small amount of data is labeled

• Interactively query new labels from users
Similar to supervised learning, but the model is trained using a small amount of labeled data together with a large amount of unlabeled data.
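
One common way to use unlabeled data is self-training: the model labels the unlabeled points it is most confident about and then retrains on them. The sketch below assumes a one-dimensional nearest-centroid classifier and made-up values; in an interactive setting, the pseudo-labeling step could be replaced by a query to the user.

```python
def self_train(labeled, unlabeled, rounds=3):
    """Self-training with a 1-D nearest-centroid classifier.

    labeled: list of (value, label) pairs; unlabeled: list of values.
    Each round, the unlabeled value closest to any class centroid receives
    that class as a pseudo-label and joins the labeled set.
    """
    labeled = list(labeled)
    unlabeled = list(unlabeled)
    for _ in range(rounds):
        if not unlabeled:
            break
        centroids = {}
        for name in {lab for _, lab in labeled}:
            values = [v for v, lab in labeled if lab == name]
            centroids[name] = sum(values) / len(values)
        # Most confident prediction: smallest distance to a centroid.
        value, name = min(
            ((v, lab) for v in unlabeled for lab in centroids),
            key=lambda pair: abs(pair[0] - centroids[pair[1]]),
        )
        labeled.append((value, name))
        unlabeled.remove(value)
    return labeled


seed_labels = [(1.0, "low"), (9.0, "high")]  # small labeled set
pool = [2.0, 8.5, 5.5]                       # larger unlabeled pool
result = self_train(seed_labels, pool, rounds=2)
```

With two rounds, the two clear-cut values are pseudo-labeled and the ambiguous middle value is left for later rounds or for the user.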
47

Reinforcement Learning
An agent learns to take actions in an environment to maximize the cumulative rewards.
How a visualization problem can be formed as a reinforcement learning task

• The solution can be formed as a set of actions

• The quality of the solution can be represented by cumulative rewards
Tan Tang et al. 2020

PlotThread: Creating Expressive Storyline Visualizations using Reinforcement Learning
Decomposing the creation of a timeline visualization into a set of actions
Reward: Δsimilarity between the ground truth layout and the k-th step layout
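
The reward on this slide can be sketched as the change in similarity to the ground truth after each action. The similarity function below (one over one plus mean absolute position error) and the three toy layouts are assumptions for illustration, not the measure used in PlotThread.

```python
def similarity(layout, target):
    """Hypothetical similarity in (0, 1]: 1 / (1 + mean absolute position error)."""
    error = sum(abs(a - b) for a, b in zip(layout, target)) / len(target)
    return 1.0 / (1.0 + error)


def step_rewards(layouts, target):
    """Reward at step k = similarity(layout_k) - similarity(layout_{k-1})."""
    sims = [similarity(layout, target) for layout in layouts]
    return [after - before for before, after in zip(sims, sims[1:])]


target = [0.0, 1.0, 2.0]   # ground truth layout (positions of three items)
layouts = [
    [3.0, 4.0, 5.0],       # initial layout
    [1.0, 2.0, 3.0],       # layout after action 1
    [0.0, 1.0, 2.0],       # layout after action 2
]
rewards = step_rewards(layouts, target)
cumulative_reward = sum(rewards)
```

Because each reward is a difference, the cumulative reward equals the total improvement in similarity from the first layout to the last, which is why maximizing it drives the layout toward the ground truth.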
48

[Figure: the timeline example revisited. Labels: Bitmap Image, Content Understanding, VIS Reading (ML-based), non-ML-based; Font, Icon; element tags am, at, et, em; timeline type: Linear, Sequential, Unified, Horizontal.]
Task Forming
49
Supervised Learning

M. Brehmer, B. Lee, B. Bach, N. H. Riche, and T. Munzner. Timelines Revisited: A Design Space and Considerations for Expressive Storytelling. IEEE TVCG
About the whole timeline (classification of an image):

1. Representation

2. Scale

3. Layout

4. Orientation
About the elements:

1. Category (classification of an object)

2. Location (regression)

3. Mask (classification of a pixel)
Task Forming
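
The labels the model has to predict can be written down as a small data structure: four global labels for the whole timeline and three local labels per element. The class names, the bounding-box convention, and the example values are assumptions made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class ElementLabel:
    category: str                            # classification of an object
    bbox: tuple[float, float, float, float]  # regression: (x, y, width, height), an assumed convention
    mask: list[list[int]]                    # classification of each pixel (1 = element, 0 = background)


@dataclass
class TimelineLabel:
    # Global labels: classification of the whole image.
    representation: str
    scale: str
    layout: str
    orientation: str
    # Local labels: one entry per detected element.
    elements: list[ElementLabel] = field(default_factory=list)


example = TimelineLabel(
    representation="Linear",
    scale="Sequential",
    layout="Unified",
    orientation="Horizontal",
    elements=[ElementLabel("Icon", (10.0, 20.0, 32.0, 32.0), [[1, 1], [1, 1]])],
)
```

Seen this way, one bitmap yields several supervised targets at once, which is why a multi-head detection and segmentation model is a natural fit.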
50

[Figure: model architecture. ResNeXt-FPN produces feature maps; the RPN yields feature maps with RoIs; the RoIAlign layer yields a fixed-size feature map of a RoI. Global outputs: Timeline Type, Timeline Orientation. Local outputs: Element Bbox and Element Category (Box Head), Element Mask (Mask Head).]
51
Training Data
Mask R-CNN. Kaiming He, Georgia Gkioxari, Piotr Dollar, Ross Girshick

The model is pre-trained with the Microsoft COCO dataset.
The model is then fine-tuned with a synthetic dataset (4296) + a real-world dataset (393).
TimelineStoryteller: https://timelinestoryteller.com/
52

ML4VIS:

Align Needs with Capabilities
55

Supervised ML is the most widely used ML technique
56

Calling for more diverse ML techniques and closer AI-human collaboration
57

What is still missing:

Multi-View Visualizations

Visualization Interactions

Visualization Animation
59


The adaptation of ML techniques for visualization data
60



Deep learning for natural images, a blessing and a curse
62

Take-Home Message
• 7 visualization processes that can benefit from ML
• How to form different visualization problems into 4 main types of ML tasks
63

Thanks
Questions & Comments are welcome!
64
