StatMine – prototype
StatMine
an exploration of dissemination data

Edwin de Jonge
Statistics Netherlands
25 September 2012, Seoul
an exploration of dissemination data: StatMine

StatMine, from numbers to analysis

Why StatMine?
• The mission of Statistics Netherlands (SN) is to produce relevant information for:
  • Policy makers
  • Journalists
  • Citizens
  • Enterprises
  • Economists
  • Social scientists
  • Etc.

Numbers ≠ Information
StatLine is SN’s online database (over 1 billion figures).
We know from a user study that:
1. Many interesting patterns in StatLine are not spotted by users
2. Many important topics in StatLine are scattered across multiple tables

Example of problem 2
• Policymaker interested in patients with diabetes:
  • Visits to medical doctor
  • Hospital admissions
  • Mortality
  • Medication consumption (insulin)
  • Obesity

These are all different statistical products (from different sources)!
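
A minimal sketch of what combining such scattered tables could look like, assuming each table fragment shares the dimensions year and region. The function name `combine_fragments`, the column names and all figures are hypothetical and only serve as illustration; this is not the method used in StatMine.

```python
from collections import defaultdict


def combine_fragments(key_fields, **fragments):
    """Outer-join table fragments on their shared dimension columns.

    Each fragment is a list of rows with the key fields plus a "value".
    A topic that is missing for a key is filled with None.
    """
    combined = defaultdict(dict)
    for topic, rows in fragments.items():
        for row in rows:
            key = tuple(row[field] for field in key_fields)
            combined[key][topic] = row["value"]
    topics = list(fragments)
    return [
        {**dict(zip(key_fields, key)), **{t: values.get(t) for t in topics}}
        for key, values in sorted(combined.items())
    ]


# Invented example figures, not StatLine data.
hospital_admissions = [
    {"year": 2010, "region": "North", "value": 120},
    {"year": 2011, "region": "North", "value": 130},
]
mortality = [
    {"year": 2010, "region": "North", "value": 15},
]

table = combine_fragments(
    ("year", "region"),
    hospital_admissions=hospital_admissions,
    mortality=mortality,
)
```

The sketch keeps every key that occurs in any fragment, so gaps between sources stay visible instead of being dropped silently.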

Data analysis = Data insight
The goal of the StatMine research project is to provide data insight by:
• (I) Using data visualisation
• (II) Combining data table fragments
• (III) Deriving variables

All hypotheses are (or will be) tested with a prototype, with internal and external users.
(I) has been tested and is successful.
(II) and (III) are work in progress.

Chart types
• Bar chart: comparison
• Line chart: development
• Mosaic chart: structure
• Bubble/scatter chart: correlation

Chart type – bar chart

Chart type – line chart

Chart type – mosaic chart

Chart type – bubble chart

Small multiples




• Split a chart into different subpopulations
• Goal: compare subpopulations
• Very few tools offer this functionality!
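
The splitting step can be sketched as follows, assuming the data is a list of rows and one categorical variable is chosen to split on. The names `small_multiples` and `shared_range` and the figures are hypothetical; the prototype itself is built in PHP and JavaScript, so this only illustrates the idea.

```python
from collections import defaultdict


def small_multiples(rows, split_by):
    """Group rows into one panel per category of the `split_by` dimension."""
    panels = defaultdict(list)
    for row in rows:
        # The split dimension becomes the panel title, so drop it from the row.
        panel_row = {k: v for k, v in row.items() if k != split_by}
        panels[row[split_by]].append(panel_row)
    return dict(panels)


def shared_range(rows, measure):
    """Common axis range, so that all panels are drawn on the same scale."""
    values = [row[measure] for row in rows]
    return min(values), max(values)


# Invented example figures.
rows = [
    {"year": 2010, "gender": "men", "count": 10},
    {"year": 2011, "gender": "men", "count": 12},
    {"year": 2010, "gender": "women", "count": 11},
]
panels = small_multiples(rows, "gender")
y_range = shared_range(rows, "count")
```

Using one axis range for all panels is a design choice in this sketch: it keeps the subpopulations directly comparable, which is the stated goal of small multiples.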

Small multiples

Composing a chart
Example:
• Categorical variables / dimensions:
  • Year x Region x Gender x Age
• Numeric variables / topics:
  • Count
  • Mean income
  • Employment
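
One possible way to express such a composition in code, assuming a bar or line chart that puts a dimension on the x-axis and a numeric variable on the y-axis. The class `ChartSpec` and its field names are hypothetical and not taken from the prototype.

```python
from dataclasses import dataclass
from typing import Optional

# Variables of the example table, written as identifiers.
DIMENSIONS = {"year", "region", "gender", "age"}
MEASURES = {"count", "mean_income", "employment"}


@dataclass(frozen=True)
class ChartSpec:
    """A chart composed from one dimension, one measure and an optional split."""

    chart_type: str                 # e.g. "bar" or "line"
    x: str                          # categorical variable on the x-axis
    y: str                          # numeric variable on the y-axis
    split_by: Optional[str] = None  # dimension used for small multiples

    def __post_init__(self):
        if self.x not in DIMENSIONS:
            raise ValueError(f"x must be a dimension, got {self.x!r}")
        if self.y not in MEASURES:
            raise ValueError(f"y must be a numeric variable, got {self.y!r}")
        if self.split_by is not None:
            if self.split_by not in DIMENSIONS or self.split_by == self.x:
                raise ValueError(f"invalid split dimension {self.split_by!r}")


# Mean income per year, one panel per gender.
spec = ChartSpec("line", x="year", y="mean_income", split_by="gender")
```

A bubble/scatter chart would need two numeric variables instead, so the validation above would have to be relaxed for that chart type.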

Prototype
• Built in PHP and JavaScript (D3)
• Imported 10 StatLine example tables
• Complex tables, e.g.
  • Labor participation x gender x cohorts
  • Labor market flow per quarter (employed/unemployed)
  • Enterprise birth, death and growth x economic activity x quarter
• Tested on:
  • Internal users
  • Owners of data

Demo

Evaluation
• Part I: very successful
  • Owners of data want the prototype to check their own data
  • Provides insights
  • Easy detection of anomalies

Work in progress
• II: Combining different table fragments
  • Testing with policymakers (end of this year)
  • Or “How to glue statistical tables?”

• III: Deriving variables + analysis
  • Absolute vs relative (per population unit)
  • Turnover / # employees
  • Etc.
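
The two derived variables mentioned above are both ratios of existing columns. A minimal sketch, assuming rows are dictionaries; the helper `derive_ratio`, the column names and the figures are hypothetical.

```python
def derive_ratio(rows, numerator, denominator, name, scale=1.0):
    """Add a derived column `name` = scale * numerator / denominator.

    The derived value is None when the denominator is missing or zero.
    """
    derived = []
    for row in rows:
        denom = row.get(denominator)
        value = None if not denom else scale * row[numerator] / denom
        derived.append({**row, name: value})
    return derived


# Invented example figures.
enterprises = [
    {"activity": "A", "turnover": 500.0, "employees": 10},
    {"activity": "B", "turnover": 300.0, "employees": 0},
]
# Turnover / # employees
per_employee = derive_ratio(
    enterprises, "turnover", "employees", "turnover_per_employee"
)

regions = [{"region": "North", "admissions": 120, "population": 60000}]
# Absolute vs relative: admissions per 1,000 inhabitants
relative = derive_ratio(
    regions, "admissions", "population", "admissions_per_1000", scale=1000.0
)
```

Returning None for a zero or missing denominator keeps the row in the table, so a chart can show the gap rather than a misleading value.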

Questions?


