Data Visualization for
Big Data
Rosa Romero Gómez, Ph.D
rosaromerogomez.com
@87rromero
Experiences from the Front Line
[Georgia Tech campus, Klaus Advanced Computing building, May 27th 2016]
Data Visualization
Why? What? How?
Why?
The greatest value of a picture is
when it forces us to notice what we
never expected to see
[John W. Tukey. (1981) Exploratory Data Analysis]
Let me give you a
simple example…
[Sample data sets recreated from Francis J. Anscombe (1973). Graphs in statistical analysis.
Source: Andy Kirk. (2012) Data visualization: A successful design process]
[Source: http://commons.wikimedia.org/wiki/File:Anscombe%27s_quartet_3.svg]
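The point of the quartet can also be checked numerically. The following is a minimal Python sketch. It assumes Anscombe's published values from the 1973 paper, which the slide only shows as plots. It computes the usual summary statistics for each of the four datasets.

```python
# Anscombe's quartet (Anscombe, 1973): four (x, y) datasets whose summary
# statistics are nearly identical but whose scatter plots look very different.
X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return mean(x), mean(y), least-squares slope and intercept, and Pearson r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5
    return mx, my, slope, intercept, r


for name, (xs, ys) in QUARTET.items():
    mx, my, slope, intercept, r = summarize(xs, ys)
    print(f"{name:>3}: mean x={mx:.2f} mean y={my:.2f} "
          f"y={intercept:.2f}+{slope:.3f}x r={r:.3f}")
```

All four datasets give roughly the same results: mean x = 9, mean y ≈ 7.50, y ≈ 3.00 + 0.500x, and r ≈ 0.816. The numbers alone cannot tell the datasets apart. Only the plots show the differences.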
Data visualization addresses…
…Information Scalability
…Visual Scalability
…Human Scalability
Data visualization addresses…
…Human Scalability
• It enhances the recognition of patterns
• It makes exploring large datasets more efficient
• It supports decisions
• It expands our working memory to solve problems
What?
Data visualization is the use of interactive
visual representations of data to amplify
cognition
[Stuart Card. (2008) Information visualization]
Data visualization
is not…
[The Starry Night. (1889) Vincent Van Gogh. Source: https://en.wikipedia.org/wiki/The_Starry_Night#/media/File:Van_Gogh_-_Starry_Night_-_Google_Art_Project.jpg]
[Source: http://elpais.com/elpais/2016/10/28/media/1477669343_348572.html]
[Source: http://viz.wtf/]
How?
Why are we doing this
visualization project?
Even more important…
Case Study:
Visualization of the IPv4 address space
for network threat investigation
Get to know the context…
[Diagram: the network threat analyst (user) at a computer network data collection point, working with CMD tools, websites, and logs. Annotated with the user, the physical & task context, and the technical context.]
Let me tell you a story…
Step 1: Identify relevant visualization tasks
• Find suspicious IP blocks
• Find domain names associated with specific IPs
• Examine the presence of domain names on blacklists
• Examine the relation of domain names to malware
• Identify the geographical location of IPs
• Identify the ownership of domain names
• Find suspicious Autonomous Systems
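The sketch below shows one way to approach the first task, using only the Python standard library. It groups resolved IPs into /24 blocks and ranks the blocks by how many distinct blacklisted domain names resolve into them. The (domain, IP) pairs, the blacklist set and the ranking rule are all hypothetical. They illustrate the idea and are not the project's actual scoring.

```python
import ipaddress
from collections import defaultdict


def rank_suspicious_blocks(resolutions, blacklist, prefix_len=24):
    """Rank IPv4 blocks by the number of distinct blacklisted domains resolving into them.

    resolutions: iterable of (domain_name, ipv4_string) pairs (hypothetical input).
    blacklist:   set of blacklisted domain names (hypothetical input).
    """
    bad_domains = defaultdict(set)
    for domain, ip in resolutions:
        if domain in blacklist:
            # strict=False maps a host address to its enclosing network.
            block = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
            bad_domains[block].add(domain)
    # Most blacklisted domains first; ties broken by network address.
    return sorted(((str(block), len(domains)) for block, domains in bad_domains.items()),
                  key=lambda item: (-item[1], ipaddress.ip_network(item[0])))


pairs = [("a.example.", "203.0.113.10"), ("b.example.", "203.0.113.77"),
         ("c.example.", "198.51.100.5")]
print(rank_suspicious_blocks(pairs, {"a.example.", "b.example.", "c.example."}))
```

Changing `prefix_len` to 16 or 8 applies the same ranking to coarser blocks.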
Data Visualization for Big Data: Experience from the Front Line
The more accessible your visualization,
the greater your audience and your impact
[Scott Murray. (2013) Interactive Data Visualization for the Web]
Step 2: Choose a library
• Functionality: does it support the visualizations I need?
• License: open source or commercial?
• Active support and development
• Browser compatibility
• Dependencies (e.g. React.js)
Step 2: Choose a library
Building a visualization with charting libraries such as Chart.js, Tableau…
Step 2: Choose a library
Building a visualization with D3.js
• D3 is not really a “visualization library”; it does not provide ready-made charts
• D3 = “Data-Driven Documents”; it binds data to DOM elements and manages the results
• D3.js provides tools such as layouts, scales, and shapes that you can use to build visualizations
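D3's data join pairs each datum with a DOM element, either by index or by a key function. It then splits the result into enter, update, and exit selections. The Python sketch below only imitates that bookkeeping on plain dictionaries, to show the idea. It is not D3's API. The function name `data_join` and the choice to key nodes by their IP-prefix name are hypothetical.

```python
def data_join(bound, data, key):
    """Imitate a keyed data join on plain Python objects.

    bound: dict mapping key -> datum currently attached to an on-screen element.
    data:  list of new data items.
    key:   function returning the join key of a datum.
    Returns (enter, update, exit_keys):
      enter     - new data with no matching element (elements to create),
      update    - new data matching an existing element (elements to update),
      exit_keys - keys of elements with no matching datum (elements to remove).
    Simplification: duplicate keys in `data` collapse to the last datum.
    """
    new_by_key = {key(d): d for d in data}
    enter = [d for k, d in new_by_key.items() if k not in bound]
    update = [d for k, d in new_by_key.items() if k in bound]
    exit_keys = [k for k in bound if k not in new_by_key]
    return enter, update, exit_keys
```

Keying by a stable name, such as a network prefix, means an element is reused when the same block is still present after filtering. With index-based binding, the same element could end up redrawn with another block's data.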
Step 3: Data transformations
{"date":"20160408","qname":"*.3rdandmonster.com.","qtype":1,"rdata":{"string":"66.96.161.142"},"ttl":null,"authority_ips":"216.239.36.109","count":1,"hours":1048576,"source":"gt","sensor":"active-dns"}
{"date":"20160408","qname":"*.aavxxnbm.org.","qtype":1,"rdata":{"string":"213.184.126.162"},"ttl":{"int":604800},"authority_ips":"213.184.126.162","count":10,"hours":5543209,"source":"gt","sensor":"active-dns"}
{"date":"20160408","qname":"*.aenhfat.info.","qtype":1,"rdata":{"string":"213.184.126.162"},"ttl":{"int":604800},"authority_ips":"213.184.126.162","count":4,"hours":8397064,"source":"gt","sensor":"active-dns"}
{"date":"20160408","qname":"*.agzksjhrmf.info.","qtype":1,"rdata":{"string":"213.184.126.162"},"ttl":{"int":604800},"authority_ips":"213.184.126.162","count":5,"hours":4329736,"source":"gt","sensor":"active-dns"}
[Fragment of Active DNS resolution queries in deserialized Avro format (JSON), https://www.activednsproject.org]
Pre-processed data: Domain Name, IP address
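The following is a minimal pre-processing sketch. It assumes the records arrive one JSON object per line, as in the fragment above. Avro's JSON encoding writes a non-null union value as a single-key object (for example `"ttl":{"int":604800}`) and null as `null`, so those values have to be unwrapped. Two further assumptions about what the pre-processed "Domain Name, IP address" data contains are made here. Only type-A answers (`qtype` 1) are kept, and the `*.` wildcard label and the trailing root dot are stripped from `qname`.

```python
import json


def unwrap(value):
    """Unwrap an Avro JSON-encoded union such as {"string": "1.2.3.4"}; null stays None."""
    if isinstance(value, dict) and len(value) == 1:
        return next(iter(value.values()))
    return value


def preprocess(lines):
    """Yield (domain_name, ip_address) pairs from Active DNS JSON lines.

    Assumptions: only A records (qtype 1) are kept, and the wildcard label "*."
    and the trailing root dot are stripped from qname.
    """
    for line in lines:
        line = line.strip()
        if not line:  # skip blank separator lines
            continue
        record = json.loads(line)
        if record.get("qtype") != 1:
            continue
        domain = record["qname"].rstrip(".")
        if domain.startswith("*."):
            domain = domain[2:]
        yield domain, unwrap(record["rdata"])
```

Because `preprocess` is a generator, a large daily dump can be streamed line by line from an open file rather than loaded into memory at once.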
Step 3: Data transformations
Guided by the Visual Information-Seeking Mantra:
“Overview first,
Zoom and Filter, and then
Details-on-Demand”
[Shneiderman. (1996) The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations]
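One way to apply "overview first" to large daily files is to send the browser only the top levels of the hierarchy. Deeper levels can then be loaded when the analyst zooms in. The sketch below is minimal. It assumes nodes in the name/size/color/children layout used in the templates of this step. The depth limit and the `has_children` flag are hypothetical additions.

```python
def overview(node, max_depth):
    """Return a copy of a nested node truncated to max_depth levels below it.

    Nodes are dicts with "name", "size", "color" and optional "children".
    A truncated node keeps its aggregate size/color and gets a hypothetical
    "has_children" flag so the UI knows it can zoom in and fetch more detail.
    The input tree is not modified.
    """
    copy = {k: v for k, v in node.items() if k != "children"}
    children = node.get("children", [])
    if max_depth > 0:
        if children:
            copy["children"] = [overview(child, max_depth - 1) for child in children]
    elif children:
        copy["has_children"] = True
    return copy
```

The aggregate `size` and `color` values already stored on inner nodes let a truncated overview show meaningful colors and sizes without the descendants being present.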
Step 3: Data transformations
{
  "date": "dateValue",
  "children": [{
    "name": "/8Name",
    "size": "numberOfIPs/8",
    "color": "numberOfBlacklistedDomainNames/8",
    "children": [
      {
        "name": "/16Name",
        "size": "numberOfIPs/16",
        "color": "numberOfBlacklistedDomainNames/16",
        "children": [
          …
        ]
      },
      …
    ]
  }]
}
Nested JSON format template (one JSON file per day)
IPs nested by prefix: /8 >> /16 >> /24 >> /32
Visual variables: size and color
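The next sketch shows how (domain, IP) pairs could be aggregated into this template. It makes several assumptions. Each level is the first one, two, three or four octets (/8, /16, /24, /32), and node names are written in CIDR form. `size` counts distinct IPs under a node, and `color` counts distinct blacklisted domain names resolving under it. The blacklist input and these counting rules are illustrative, not the project's documented definitions.

```python
from collections import defaultdict


def build_ip_hierarchy(date, resolutions, blacklist):
    """Nest (domain, IPv4) pairs as /8 >> /16 >> /24 >> /32 nodes.

    Assumed semantics: "size" = distinct IPs under a node,
    "color" = distinct blacklisted domain names resolving under it.
    """

    def new_node():
        return {"ips": set(), "bad": set(), "kids": defaultdict(new_node)}

    root = new_node()
    for domain, ip in resolutions:
        octets = ip.split(".")
        node = root
        for depth in range(1, 5):  # 1..4 leading octets -> /8, /16, /24, /32
            prefix = ".".join(octets[:depth] + ["0"] * (4 - depth))
            node = node["kids"][f"{prefix}/{8 * depth}"]
            node["ips"].add(ip)
            if domain in blacklist:
                node["bad"].add(domain)

    def numeric(cidr):
        # Sort children by address value rather than as strings.
        return tuple(int(part) for part in cidr.split("/")[0].split("."))

    def export(name, node):
        out = {"name": name, "size": len(node["ips"]), "color": len(node["bad"])}
        if node["kids"]:
            out["children"] = [export(k, node["kids"][k])
                               for k in sorted(node["kids"], key=numeric)]
        return out

    return {"date": date,
            "children": [export(k, root["kids"][k])
                         for k in sorted(root["kids"], key=numeric)]}
```

Collecting sets rather than counters means an IP resolved by several domains is counted once in `size`.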
Step 3: Data transformations
{
  "date": "dateValue",
  "children": [{
    "name": "Continent",
    "size": "numberOfIPsPerContinent",
    "color": "numberOfBlacklistedDomainNamesPerContinent",
    "children": [
      {
        "name": "Country",
        "size": "numberOfIPsPerCountry",
        "color": "numberOfBlacklistedDomainNamesPerCountry",
        "children": [
          …
        ]
      },
      …
    ]
  }]
}
Nested JSON format template (one JSON file per day)
Alternative nesting: Continent >> Country >> State >> City
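Both templates share the same node layout, so the nesting logic can be written once and driven by a list of levels. The sketch below assumes each record has already been enriched with hypothetical `continent`, `country`, `state` and `city` fields, for example by a GeoIP lookup that is not shown. It also assumes `size` counts distinct IPs and `color` counts distinct blacklisted domain names per node. That is one reading of the template, not a confirmed definition.

```python
from collections import defaultdict


def nest(records, levels, blacklist):
    """Group records into name/size/color/children nodes along an attribute path.

    records:   dicts with "domain", "ip" and one key per entry in `levels`.
    levels:    e.g. ["continent", "country", "state", "city"].
    blacklist: set of blacklisted domain names.
    """

    def build(group, depth):
        buckets = defaultdict(list)
        for record in group:
            buckets[record[levels[depth]]].append(record)
        nodes = []
        for name in sorted(buckets):
            members = buckets[name]
            node = {"name": name,
                    "size": len({r["ip"] for r in members}),
                    "color": len({r["domain"] for r in members if r["domain"] in blacklist})}
            if depth + 1 < len(levels):  # recurse until the last level
                node["children"] = build(members, depth + 1)
            nodes.append(node)
        return nodes

    return build(list(records), 0)
```

To switch to a different hierarchy, pass a different `levels` list. The grouping code itself does not change.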
Step 3: Data transformations
JSON files > 70 MB
Nested JSON format template (one JSON file per day)
Triple hierarchy!!!
Step 3: Data transformations
Split into:
• IPhierarchy.json
• GeographicalHierarchy.json
• AS.json
Nested JSON format template (one JSON file per day)
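With one file per hierarchy, the browser only needs to download the hierarchy the analyst is currently viewing. The sketch below writes each hierarchy to its own compact JSON file and reports the file sizes. Only the three file names come from the slide. The one-folder-per-day layout and the function name are hypothetical.

```python
import json
from pathlib import Path


def write_hierarchies(out_dir, date, hierarchies):
    """Write each hierarchy to its own compact JSON file and return sizes in bytes.

    hierarchies: dict mapping a file name (e.g. "IPhierarchy.json") to a nested tree.
    Files go into out_dir/<date>/ (hypothetical layout: one folder per day).
    """
    day_dir = Path(out_dir) / date
    day_dir.mkdir(parents=True, exist_ok=True)
    sizes = {}
    for file_name, tree in hierarchies.items():
        path = day_dir / file_name
        # Compact separators drop indentation whitespace, which adds up in large trees.
        path.write_text(json.dumps(tree, separators=(",", ":")), encoding="utf-8")
        sizes[file_name] = path.stat().st_size
    return sizes
```

The returned sizes make it easy to check whether each split file stays within what the browser can comfortably load.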
Step 4: Data binding
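The slide images for this step are not reproduced here. The sketch below illustrates what binding a nested daily file to a space-filling radial (sunburst-style) layout could involve. It assumes such a layout for illustration only. Each node receives an angular extent proportional to its value, similar in spirit to D3's partition layout. It is a Python re-implementation, not D3 code. In D3, `node.sum(value)` sets each node's value to its own value plus the sum of its children's values. Returning `size` for inner nodes as well would therefore count IPs twice, so this sketch derives every value from leaf sizes only.

```python
import math


def leaf_total(node):
    """Sum "size" over leaves only; sizes stored on inner nodes are ignored."""
    children = node.get("children", [])
    if not children:
        return node.get("size", 0)
    return sum(leaf_total(child) for child in children)


def partition(node, start=0.0, end=2 * math.pi, depth=0, out=None):
    """Assign each node an angular span proportional to its leaf total.

    Returns a flat list of (name, depth, start_angle, end_angle) tuples in
    pre-order, ready to be drawn as rings of arcs (one ring per depth).
    """
    if out is None:
        out = []
    out.append((node.get("name", "root"), depth, start, end))
    children = node.get("children", [])
    totals = [leaf_total(child) for child in children]
    grand_total = sum(totals)
    angle = start
    for child, total in zip(children, totals):
        span = (end - start) * total / grand_total if grand_total else 0.0
        partition(child, angle, angle + span, depth + 1, out)
        angle += span
    return out
```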
Step 5: User Experience
• Breadcrumbs
• User-adjustable visual settings
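Breadcrumbs show the path from the root of the hierarchy to the node currently in focus. The analyst can then always see which /8, /16 or /24 block they have zoomed into. The sketch below is minimal. It assumes nodes in the name/children format of the Step 3 templates, and the function name `breadcrumb` is hypothetical.

```python
def breadcrumb(root, target_name):
    """Return node names from the root to the node named target_name, or [] if absent.

    The daily root object has a "date" but no "name", so it is labelled "root".
    """
    stack = [(root, [root.get("name", "root")])]
    while stack:
        node, path = stack.pop()
        if node.get("name") == target_name:
            return path
        for child in node.get("children", []):
            stack.append((child, path + [child["name"]]))
    return []
```

Joining the result with `" > "` gives the label for the breadcrumb trail, for example `root > 213.0.0.0/8 > 213.184.0.0/16`.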
[Astrolavos Team during S&P 2017 deadline, November 11th 2016. Source: https://twitter.com/mAntonakakis?lang=es]
