HIGH-VALUE DATASETS
FROM PUBLICATION TO IMPACT
Elena Simperl
@esimperl
National Open Data Conference
December 3, 2020
How do people search, make sense of, and use
open data?
HUMAN DATA INTERACTION
FRAMEWORKS, METHODS, TOOLS
• Frameworks and models
• Methods and guidance
• Tools
• Analysis
HIGH-VALUE DATASETS
UNDERSTANDING USE THROUGH BEHAVIOUR ANALYSIS
TO PROVIDE GUIDANCE TO PUBLISHERS
Open government data portals
• Search logs
• Data requests
Data science platforms
• Activity logs
2018 STUDY
ANALYSIS OF LOGS AND REQUESTS
Four national open government data portals, 2.2 million queries from 2013 to 2016, 1500 data requests.
Data search is a work-related activity.
Shorter queries, include time and location, with varying levels of granularity.
Explorative search, using keywords and filters.
Native and external queries topically different.
Data requests describe the data through boundaries and restrictions on location, time, data type, granularity.
Kacprzak, E., Koesten, L., Ibáñez, L.D., Blount, T., Tennison, J. and Simperl, E., 2018. Characterising dataset search—An analysis of search logs and data requests. Journal of Web Semantics.
2020 STUDY
ANALYSIS OF LOGS
844,343 user sessions
(April 18 to June 20)
Characterising Dataset Search on the European Data Portal: An Analysis of Search Logs. LD Ibáñez, E Kacprzak, L Koesten, E Simperl. European Data Portal, Analytical Report 18, 2020
TOOLS TO FIND DATA
EDP is used to find datasets, but it is not the only tool people use: 60% of sessions arrive at the EDP from the web.
Changes in portal design impact traffic (and user experience).
Covid-19 datasets were in high demand in 2020.
A large majority of users visit only one section of the portal at a time.
Content on the site needs to be better interlinked (both data and other pages). When links exist, people use them.
SEARCH APPROACHES AND AFFORDANCES
Filters are important: 60% of native sessions use filters only; 15-20% use keywords and filters.
Common search strategies: a single filter; keywords first, then one or more filters.
Popular filters are country and category. Less popular: format, license.
Keyword queries are short, with less use of time, format and data attributes, and more use of location.
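The strategy shares above come from grouping sessions by what they used to search. A minimal sketch of that grouping, assuming each session has already been reduced to its keyword queries and applied filters; the `Session` fields and the strategy labels are hypothetical and not the EDP log schema:

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Session:
    """One portal session (hypothetical fields, not the EDP log format)."""
    keywords: list = field(default_factory=list)  # keyword queries issued
    filters: list = field(default_factory=list)   # filters applied, e.g. "country"


def strategy(session: Session) -> str:
    """Label a session by the search affordances it used."""
    if session.keywords and session.filters:
        return "keywords+filters"
    if session.filters:
        return "filters-only"
    if session.keywords:
        return "keywords-only"
    return "browse-only"


def strategy_shares(sessions) -> dict:
    """Return the fraction of sessions that fall under each strategy label."""
    counts = Counter(strategy(s) for s in sessions)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()} if total else {}


# Toy input for illustration only; these are not figures from the study.
example = [
    Session(filters=["country"]),
    Session(keywords=["covid"], filters=["category"]),
    Session(keywords=["air quality"]),
    Session(filters=["country", "category"]),
]
print(strategy_shares(example))
```

The labels ignore the order of actions within a session, so "keywords first, then filters" and the reverse land in the same bucket; separating them would need timestamps per action.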
SUCCESS IN DATASET SEARCH
20-40% of native queries and 8-25% of external queries are successful.
Keywords + filters seem to work better. This might also be a proxy for seasoned users.
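A success rate per query origin can be computed once each query carries a success flag. The sketch below assumes the flag is already given; how "successful" is defined is not stated on the slide, so the `(origin, succeeded)` pairs and the function name are hypothetical:

```python
from collections import defaultdict


def success_rates(queries) -> dict:
    """Share of successful queries per origin.

    `queries` is an iterable of (origin, succeeded) pairs, where origin is
    e.g. "native" or "external" and succeeded is a bool. The success
    criterion itself is an assumption left to the caller.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for origin, succeeded in queries:
        totals[origin] += 1
        if succeeded:
            hits[origin] += 1
    return {origin: hits[origin] / totals[origin] for origin in totals}


# Toy input for illustration only; these are not figures from the study.
log = [
    ("native", True),
    ("native", False),
    ("native", False),
    ("native", False),
    ("external", True),
    ("external", False),
    ("external", False),
]
print(success_rates(log))
```

The same function can be run separately on keywords-only, filters-only and keywords + filters queries to compare strategies.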
IMPLICATIONS FOR PUBLISHERS
• SEO strategy
• Filter affordances
• Granular location data
• Dataset retrieval
• Dataset preview pages
• Links between content and datasets
WHAT’S NEXT
Studies on users and their information needs.
Granular activity data captured and shared by portals for new studies.
Portals publish lots of data. They now need to do more to become data communities.
PUBLICATIONS
• Talking Datasets - understanding data sensemaking behaviours. L Koesten, K Gregory, P Groth, E Simperl. Currently under review at the International Journal of Human-Computer Studies, 2020.
• Everything You Always Wanted to Know about a Dataset: Studies in Data Summarisation. L Koesten, E Simperl, E Kacprzak, T Blount, J Tennison. International Journal of Human-Computer Studies, 2019.
• Collaborative Practices with Structured Data: Do Tools Support what Users Need? L Koesten, E Kacprzak, E Simperl, J Tennison. ACM CHI Conference on Human Factors in Computing Systems, CHI 2019.
• Dataset search: a survey. A Chapman, E Simperl, L Koesten, G Konstantinidis, LD Ibáñez, E Kacprzak, P Groth. The International Journal on Very Large Data Bases, 2019.
• Characterising dataset search - An analysis of search logs and data requests. E Kacprzak, L Koesten, LD Ibáñez, T Blount, J Tennison, E Simperl. Journal of Web Semantics, 2018.
• Characterising Dataset Search on the European Data Portal: An Analysis of Search Logs. LD Ibáñez, E Kacprzak, L Koesten, E Simperl. European Data Portal, Analytical Report 18, 2020.
• The Trials and Tribulations of Working with Structured Data - a Study on Information Seeking Behaviour. L Koesten, E Kacprzak, J Tennison, E Simperl. Proceedings of ACM CHI Conference on Human Factors in Computing Systems, CHI 2017.
• Dataset Reuse: Toward Translating Principles to Practice. L Koesten, P Vougiouklis, E Simperl, P Groth. Patterns, 2020.
• Pie Chart or Pizza: Identifying Chart Types and Their Virality on Twitter. P Vougiouklis, L Carr, E Simperl. Proceedings of the International AAAI Conference on Web and Social Media, 2020.
