Krist Wongsuphasawat /@kristw
Data visualization
A quick tour for data science enthusiasts
Data visualization
What is it about?
What is it good for?
How is it related to data science?
Example projects
…
1. What is it about?
“A picture is worth more than a thousand words.”
(as someone once said)
Data Visualization: A Quick Tour for Data Science Enthusiasts
Data => Picture
Data => Visual display
Helps the audience consume a lot of information rapidly
2. What is it good for?
Example / History
Charles Minard's map of Napoleon's 1812 Russian campaign encodes:
• location (lat, lon => x, y), number of troops (width), direction (color)
• time (x), temperature (y)
Example / Cholera epidemic
List of deceased patients:
• Mr. Smith, who lived at 11 Sunny St.
• Miss White, who lived at 23 Cloudy Rd.
• Mr. Jones, who lived at 30 Rainy St.
• Mrs. Robinson, who lived at 34 Windy Rd.
• …
The same kind of data, mapped by John Snow
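A list like this hides the spatial pattern; the first step toward a map is turning each sentence into a structured record. The following is a minimal Python sketch, assuming every record follows the "<name>, who lived at <number> <street>" shape of the example records above (real records would vary more). The names `parse_record` and `deaths_per_street` are illustrative, and geocoding each address onto a map is not shown.

```python
import re
from collections import Counter

# The example records shown on the slide.
RECORDS = [
    "Mr. Smith, who lived at 11 Sunny St.",
    "Miss White, who lived at 23 Cloudy Rd.",
    "Mr. Jones, who lived at 30 Rainy St.",
    "Mrs. Robinson, who lived at 34 Windy Rd.",
]

# ASSUMPTION: every record looks like "<name>, who lived at <house number> <street>".
RECORD_PATTERN = re.compile(
    r"^(?P<name>.+?), who lived at (?P<number>\d+) (?P<street>.+)$"
)


def parse_record(line):
    """Turn one free-text record into a dict with name, number, and street."""
    match = RECORD_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"Unrecognized record: {line!r}")
    return {
        "name": match["name"],
        "number": int(match["number"]),
        "street": match["street"],
    }


def deaths_per_street(lines):
    """Count deaths per street: a first, crude look at where cases cluster."""
    return Counter(parse_record(line)["street"] for line in lines)


if __name__ == "__main__":
    for record in map(parse_record, RECORDS):
        print(record)
    print(deaths_per_street(RECORDS))
```

Once each record has a structured address, geocoding it to coordinates would let the cases be drawn as points on a map, which is where the pattern becomes visible.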
What is it good for?
Storytelling: communicate known information
Exploratory data analysis: explore data to reveal insights
More powerful: Visualization = Visual display + Interaction
3. How is it related to data science?
Turn data into valuable insights, data products, and interesting stories
The data science process:
raw data => data wrangling => exploratory data analysis => in-depth analysis => report results (communication, storytelling) => output (insights, products, stories)
4. Example projects
4.1 Ballon d’Or
FIFA released the voting data.
Rules:
• 3 voters per country:
  • National team captain
  • National team coach
  • Journalist (media)
• Each voter selects 3 players for 1st, 2nd, and 3rd place
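Turning ballots into a ranking requires a point value per place. The slide does not state one, so the sketch below assumes a 5/3/1 scheme for 1st/2nd/3rd place (the scheme commonly reported for the FIFA Ballon d'Or; verify it against the official rules). `Ballot`, `tally`, and the sample ballots are hypothetical.

```python
from collections import Counter
from typing import NamedTuple, Tuple

# ASSUMPTION: points awarded per rank (check against the official rules).
POINTS = {1: 5, 2: 3, 3: 1}


class Ballot(NamedTuple):
    country: str
    role: str                    # "captain", "coach", or "journalist"
    picks: Tuple[str, str, str]  # players for 1st, 2nd, and 3rd place


def tally(ballots):
    """Sum points per player across all ballots."""
    totals = Counter()
    for ballot in ballots:
        if len(ballot.picks) != 3:
            raise ValueError("each voter selects exactly 3 players")
        for rank, player in enumerate(ballot.picks, start=1):
            totals[player] += POINTS[rank]
    return totals


if __name__ == "__main__":
    # Made-up ballots from one hypothetical country.
    ballots = [
        Ballot("Country X", "captain", ("Player 1", "Player 2", "Player 3")),
        Ballot("Country X", "coach", ("Player 2", "Player 1", "Player 3")),
        Ballot("Country X", "journalist", ("Player 1", "Player 3", "Player 2")),
    ]
    print(tally(ballots).most_common())
```

Keeping `country` and `role` on each ballot, rather than only the totals, is what makes it possible to ask who voted for whom.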
Data wrangling:
• The given data are tables in a PDF.
• Extract the tables to CSV.
• Reformat the data into the desired format.
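After the PDF tables have been extracted to CSV (that extraction step is not shown), a convenient "desired format" for analysis is one row per vote rather than one row per voter. A minimal standard-library sketch, assuming a hypothetical column layout `country,role,first,second,third`; the sample rows are made up.

```python
import csv
import io

# Hypothetical layout of one extracted table: one row per voter.
RAW_CSV = """country,role,first,second,third
Country X,captain,Player 1,Player 2,Player 3
Country X,coach,Player 2,Player 1,Player 3
"""

RANK_COLUMNS = ("first", "second", "third")


def to_long_format(csv_text):
    """Reshape one row per voter into one row per (voter, player, rank) vote."""
    votes = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        for rank, column in enumerate(RANK_COLUMNS, start=1):
            votes.append({
                "country": record["country"].strip(),
                "role": record["role"].strip(),
                "player": record[column].strip(),
                "rank": rank,
            })
    return votes


if __name__ == "__main__":
    for vote in to_long_format(RAW_CSV):
        print(vote)
```

The long format makes later grouping by country, role, or player straightforward in most tools.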
Demo / Ballon d’Or
https://medium.com/@kristw/who-voted-for-who-diving-into-ballon-dor-voting-data-e09138ba9712
4.2 Public-facing vis & New Year 2013
interactive.twitter.com
Geo heatmap (San Francisco): color encodes Tweet density, from low density to high density
Photo: flickr.com/photos/twitteroffice/8798020541
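A geo heatmap like this comes down to counting points per map cell and mapping the counts to a color scale. A minimal sketch, assuming Tweets arrive as (lat, lon) pairs and using a hypothetical fixed 0.01-degree grid; the coordinates are illustrative points near San Francisco, not real Tweets.

```python
import math
from collections import Counter

CELL_DEG = 0.01  # ASSUMPTION: grid cell size in degrees


def grid_cell(lat, lon, cell_deg=CELL_DEG):
    """Return the integer (row, col) index of the grid cell containing a point."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def density_grid(points, cell_deg=CELL_DEG):
    """Count points per grid cell."""
    return Counter(grid_cell(lat, lon, cell_deg) for lat, lon in points)


def normalize(counts):
    """Scale counts to [0, 1] so they can drive a low-to-high density color scale."""
    peak = max(counts.values())
    return {cell: count / peak for cell, count in counts.items()}


if __name__ == "__main__":
    # Illustrative coordinates near San Francisco, not real Tweets.
    points = [(37.7749, -122.4194), (37.7751, -122.4189), (37.805, -122.275)]
    print(normalize(density_grid(points)))
```

A degree grid is simple, but its cells narrow east-west away from the equator; a grid in a projected coordinate system (for example, meters) avoids that distortion.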
Rebuild the world based on Tweet volumes: twitter.github.io/interactive/andes/
How are these phrases used in Tweets?
Is there any pattern?
Big data wrangling
Having all Tweets: how people think I feel vs. how I really feel.
Challenges:
• Too much data; we want only the relevant Tweets, which
  • contain “สวัสดีปีใหม่” (“Happy New Year”)
  • include variations: หวัดดีปีใหม่, หวัดดีปีหม่ายยย
  • include typos: หวัดตีปีใหม่
• Need to aggregate & reduce size
• Long processing time (hours)
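The filtering challenge can be sketched as one regular expression covering the greeting and its listed variants, followed by per-minute counting to shrink the data. This Python sketch shows only the matching logic, not the Pig/Hive/Scalding job the slides describe. ASSUMPTION: the pattern is built from just the four spellings above; real variants would need a broader pattern or fuzzy matching.

```python
import re
from collections import Counter
from datetime import datetime

# Greeting forms from the slide: สวัสดี (standard), หวัดดี (casual), หวัดตี (typo),
# then ปี ("year") and ใหม่ ("new") or the elongated spelling หม่ายยย.
GREETING = re.compile(r"(?:สวัสดี|หวัดดี|หวัดตี)ปี(?:ใหม่|หม่าย+)")


def is_new_year_greeting(text):
    """True if the text contains one of the known greeting spellings."""
    return GREETING.search(text) is not None


def count_per_minute(tweets):
    """Keep matching Tweets and count them per minute.

    tweets: iterable of (datetime, text) pairs.
    """
    counts = Counter()
    for timestamp, text in tweets:
        if is_new_year_greeting(text):
            counts[timestamp.replace(second=0, microsecond=0)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        (datetime(2013, 1, 1, 0, 0, 5), "สวัสดีปีใหม่ 2013!"),
        (datetime(2013, 1, 1, 0, 0, 40), "หวัดดีปีหม่ายยย"),
        (datetime(2013, 1, 1, 0, 1, 2), "Happy New Year"),
    ]
    print(count_per_minute(sample))
```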
Workflow:
Hadoop cluster (data storage) => Pig / Hive / Scalding (slow) => smaller dataset
Your laptop: smaller dataset => node.js / python / etc. (fast) => final dataset
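On the laptop side, a short script can turn the smaller dataset into the final dataset. A sketch assuming the Hadoop job exported per-minute counts as CSV with hypothetical columns `minute,count`; it fills minutes with no matching Tweets with zero so the visualization gets an evenly spaced series, and emits JSON (an assumed output format).

```python
import csv
import io
import json
from datetime import datetime, timedelta

# Hypothetical "smaller dataset" exported from the Hadoop job: counts per minute.
SMALLER_CSV = """minute,count
2013-01-01T00:00,120
2013-01-01T00:02,80
"""

MINUTE_FORMAT = "%Y-%m-%dT%H:%M"


def to_final_dataset(csv_text, start, n_minutes):
    """Return a JSON array with one entry per minute, filling gaps with zero."""
    counts = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        counts[row["minute"]] = counts.get(row["minute"], 0) + int(row["count"])
    series = []
    for offset in range(n_minutes):
        minute = (start + timedelta(minutes=offset)).strftime(MINUTE_FORMAT)
        series.append({"minute": minute, "count": counts.get(minute, 0)})
    return json.dumps(series)


if __name__ == "__main__":
    print(to_final_dataset(SMALLER_CSV, datetime(2013, 1, 1, 0, 0), 3))
```

Because this step runs on a small file, it can be rerun in seconds while iterating on the design, unlike the hours-long cluster job.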
Exploratory data analysis
Improve the design for releasing to the public
Demo / New Year 2013
twitter.github.io/interactive/newyear2014/
Another fun fact: it was developed using 2012 data, then the data were updated on Jan 2, 2013.
4.3 Data Analysis Tool
Logging user activities
Users use Twitter.
Product managers are curious about how users use Twitter.
Engineers write Twitter and instrument it, so user activities are logged as log data in Hadoop.
What is being logged? User activities, such as:
• tweet, e.g.
  • tweet from the home timeline on twitter.com
  • tweet from the search page on iPhone
• sign up
• log in
• retweet
• etc.
How do we organize them?
Log event, a.k.a. “client event” [Lee et al. 2012]
Event name: client : page : section : component : element : action
Example: web : home : timeline : tweet_box : button : tweet
Each client event contains:
1) User ID
2) Timestamp
3) Event name
4) Event detail
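The client event structure maps naturally onto a small record type. A sketch, assuming event names are colon-separated strings (the spaces on the slides are for readability, so they are stripped) and that timestamps are Unix seconds; `ClientEvent` and `name_parts` are illustrative, not Twitter's actual schema.

```python
from dataclasses import dataclass

# The six levels of a client event name, in the order shown on the slide.
NAME_LEVELS = ("client", "page", "section", "component", "element", "action")


@dataclass(frozen=True)
class ClientEvent:
    user_id: int
    timestamp: int          # ASSUMPTION: Unix epoch seconds
    event_name: str         # e.g. "web:home:timeline:tweet_box:button:tweet"
    event_detail: str = ""  # ASSUMPTION: free-form text; its real format is not shown

    def name_parts(self):
        """Split the event name into its six named levels."""
        parts = [part.strip() for part in self.event_name.split(":")]
        if len(parts) != len(NAME_LEVELS):
            raise ValueError(
                f"expected {len(NAME_LEVELS)} levels, got {len(parts)}: {self.event_name!r}"
            )
        return dict(zip(NAME_LEVELS, parts))


if __name__ == "__main__":
    event = ClientEvent(42, 1356998400, "web : home : timeline : tweet_box : button : tweet")
    print(event.name_parts())
```

A fixed hierarchy like this lets analysts ask questions at any level, for example all events on the `home` page regardless of section or element.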
Twitter for Banana
Count page visits, e.g., visits to the home page:
banana : home : - : - : - : impression
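Counting home page visits then amounts to counting impressions with that event name. A minimal sketch; it treats `-` as a literal placeholder for an unused level (an assumption about its meaning) and normalizes the slide's spacing before comparing. The function names are hypothetical.

```python
from collections import Counter

HOME_IMPRESSION = "banana:home:-:-:-:impression"


def normalize_name(name):
    """Strip the spaces used on the slides, e.g. 'banana : home : ...'."""
    return ":".join(part.strip() for part in name.split(":"))


def count_page_visits(event_names, target=HOME_IMPRESSION):
    """Count events whose normalized name equals the target impression event."""
    counts = Counter(normalize_name(name) for name in event_names)
    return counts[normalize_name(target)]


if __name__ == "__main__":
    names = [
        "banana : home : - : - : - : impression",
        "banana:home:-:-:-:impression",
        "banana:profile:-:-:-:impression",
    ]
    print(count_page_visits(names))
```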
User sessions (each letter is a client event):
Session#1: start => A => B => end
Session#2: start => A => B => end
Session#3: start => A => C => end
Session#4: start => A => end
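Sessions have to be derived from the raw client events. One common approach, assumed here because the slides do not say how sessions are delimited, is to group events per user in time order and start a new session after a period of inactivity; the 30-minute gap is a hypothetical choice.

```python
from itertools import groupby
from operator import itemgetter

SESSION_GAP_SECONDS = 30 * 60  # ASSUMPTION: inactivity that ends a session


def sessionize(events, gap=SESSION_GAP_SECONDS):
    """Split client events into sessions.

    events: iterable of (user_id, timestamp_seconds, event_name).
    Returns a list of sessions, each a list of event names in time order.
    """
    sessions = []
    ordered = sorted(events, key=itemgetter(0, 1))  # by user, then by time
    for _, user_events in groupby(ordered, key=itemgetter(0)):
        current, last_ts = [], None
        for _, ts, name in user_events:
            if last_ts is not None and ts - last_ts > gap:
                sessions.append(current)  # pause too long: close the session
                current = []
            current.append(name)
            last_ts = ts
        sessions.append(current)  # groupby never yields an empty group
    return sessions


if __name__ == "__main__":
    events = [(1, 0, "A"), (1, 60, "B"), (1, 10000, "A"), (2, 5, "A"), (2, 100, "C")]
    print(sessionize(events))
```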
Funnel analysis
Funnel: home page => profile page
banana : home : - : - : - : impression
banana : profile : - : - : - : impression
1 job, 1 hour
Funnels: home page => profile page, home page => search page
banana : home : - : - : - : impression
banana : profile : - : - : - : impression
banana : search : - : - : - : impression
2 jobs, 2 hours
All funnels must be specified manually!
n jobs, n hours
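A single funnel can be computed in one pass over the sessions by checking how far each session gets through an ordered list of steps. A sketch, assuming sessions are lists of event names and allowing other events between steps; each additional funnel needs another pass, which mirrors the "n funnels, n jobs" problem. `funnel` is a hypothetical helper.

```python
def funnel(sessions, steps):
    """Count how many sessions reach each step, in order.

    Other events may occur between steps. Returns one count per step.
    """
    reached = [0] * len(steps)
    for session in sessions:
        position = 0  # index of the next step this session still has to hit
        for event in session:
            if position < len(steps) and event == steps[position]:
                position += 1
        for i in range(position):
            reached[i] += 1
    return reached


if __name__ == "__main__":
    sessions = [["home", "profile"], ["home", "profile"], ["home", "search"], ["home"]]
    print(funnel(sessions, ["home", "profile"]))
    print(funnel(sessions, ["home", "search"]))
```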
Goal: 1 job => all funnels, visualized, starting from the home page (banana : home : - : - : - : impression)
Aggregate the 4 user sessions by merging common prefixes:
start => A (4 sessions)
  A => B (2 sessions) => end
  A => C (1 session) => end
  A => end (1 session)
Then aggregate 4,000,000 sessions the same way.
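Merging sessions by common prefix produces a tree in which every path from `start` is a funnel, so one pass answers all of them. A minimal sketch of that idea, assuming sessions are lists of event names; it follows the aggregation animated above but is not the implementation from the cited VAST 2014 paper, and millions of sessions would call for a distributed job. `aggregate` and `count_along` are hypothetical names.

```python
END = "end"


def new_node():
    return {"count": 0, "children": {}}


def aggregate(sessions):
    """Merge sessions into a prefix tree; each node counts the sessions passing through it."""
    root = new_node()  # plays the role of "start"
    for session in sessions:
        node = root
        node["count"] += 1
        for event in list(session) + [END]:
            node = node["children"].setdefault(event, new_node())
            node["count"] += 1
    return root


def count_along(tree, path):
    """Read one funnel off the tree: session counts along a path from start."""
    counts, node = [], tree
    for event in path:
        node = node["children"].get(event)
        if node is None:
            counts.extend([0] * (len(path) - len(counts)))
            break
        counts.append(node["count"])
    return counts


if __name__ == "__main__":
    tree = aggregate([["A", "B"], ["A", "B"], ["A", "C"], ["A"]])
    print(count_along(tree, ["A", "B"]), count_along(tree, ["A", "C"]))
```

Unlike the per-funnel pass, paths in this tree are contiguous prefixes from `start`, so events between steps are not skipped; that is what lets a single tree hold every funnel at once.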
Demo / Flying Sessions
Using Visualizations to Monitor Changes and Harvest Insights from a Global-Scale Logging Infrastructure at Twitter, by Krist Wongsuphasawat and Jimmy Lin. In Proc. IEEE Conference on Visual Analytics Science and Technology (VAST), Paris, France, 13 November 2014.
Data visualization
What is it about?
Data => Visual display + Interaction
What is it good for?
Exploratory data analysis & storytelling
How is it related to data science?
It is one of the skills often used throughout the data science process.
Example projects
interactive.twitter.com
@kristw / kristw.yellowpigz.com
Thank you
Questions?
