Open government data portals: from publishing to use and impact
ELENA SIMPERL
KING’S COLLEGE LONDON
@ESIMPERL
NFDI InfraTalk, February 2022
From publishing…
To use and impact
Open government data
The first portal
13 years later
The official portal for European data
data.europa.eu
• Former European Data Portal
• Technology, resources and support to increase the value of European open government data
Highlights of our work
Supporting the entire data value chain from publishing to reuse
Current state, evolution, metadata quality
900+ open data use cases, 400+ studies, 26 webinars, courses and workshops
Making portals more user-centric
[Walker & Simperl, 2017]
The ten guidelines
Organise for use of the datasets - rather than simply for publication
Promote use through data storytelling and community building, borrowing from open-source communities and other peer-production systems
Invest in discoverability best practices, borrowing from e-commerce and web search
Publish good quality metadata - to enhance reuse
Adopt standards to ensure interoperability
Co-locate tools so that a wider range of users can be engaged
Link datasets to enhance value
Be accessible by offering options ranging from APIs to CSV downloads
Co-locate documentation - users should not need to be domain experts to understand the data
Be measurable - as a way to assess how well portals are meeting users’ needs
Operationalising the guidance
Literature review to develop 5* schemes that operationalise the indicators.
Application of the schemes to 10 open data portals at different maturity levels.
Example: Organise for use
Each dataset is accompanied by a comprehensive descriptive record (going beyond a collection of structured metadata)
An extract of the data can be previewed (for sensemaking)
The portal provides recommendations for related datasets
The portal enables users to review/rate the datasets
Keywords from datasets are linked to other published datasets
Example: Promote use
The portal is connected with social media to create a social distribution channel for open data.
The portal provides users with online support for feedback, to request/suggest the publication of new datasets, and when problems arise during use (e.g. contact form, discussion forum, FAQs, helpdesk, search tips, tutorials, demos).
The portal provides a way for users to keep informed of updates to the data (e.g. news feed).
Datasets are accompanied by links or resources that provide user guidance and support.
Examples of reuse (fictitious or real) are provided (e.g. information contributed by other users, last reuse, best reuse, data stories).
Example: Co-locate documentation
Supporting documentation does not exist.
Supporting documentation exists, but as a document found separately from the data.
Supporting documentation is found at the same time as the data (e.g. the link to the document is next to the link to the data in the search).
Supporting documentation can be immediately accessed from within the dataset but it is not context sensitive (e.g. a link to the documentation or text contained within the dataset).
Supporting documentation can be immediately accessed from within the dataset and it is context sensitive so that users can immediately access information about a specific item of concern (e.g. a link to a specific point in the documentation or the text contained within the dataset).
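One way to make this scheme checkable is to encode the five descriptions as an ordered scale. The sketch below is an illustration only: the field names are hypothetical, and numbering the levels 1 to 5 in the order listed is an assumption rather than part of the published scheme.

```python
from dataclasses import dataclass


@dataclass
class DocumentationEvidence:
    """Observations about one dataset page (hypothetical field names)."""
    exists: bool = False                # any supporting documentation at all
    found_with_data: bool = False       # link shown next to the data in search
    reachable_in_dataset: bool = False  # accessible from within the dataset
    context_sensitive: bool = False     # points at the specific item of concern


def documentation_level(e: DocumentationEvidence) -> int:
    """Return a level from 1 to 5, following the order of the five descriptions.

    Assumption: "does not exist" is level 1 and "context sensitive" is level 5.
    """
    if not e.exists:
        return 1
    if e.reachable_in_dataset and e.context_sensitive:
        return 5
    if e.reachable_in_dataset:
        return 4
    if e.found_with_data:
        return 3
    return 2  # documentation exists, but is found separately from the data


# A portal page that links a PDF next to the download button in search results
print(documentation_level(DocumentationEvidence(exists=True, found_with_data=True)))
```

Checking the higher levels first keeps the function simple: a dataset that meets several descriptions is assigned the highest one it satisfies.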
Varying open data maturity levels
"Be discoverable", "co-locate documentation" and "be measurable" are universally challenging
Be discoverable
Can people find the data they need?
Analysis of logs and data requests (2018)
• Four national open government data portals, 2.2 million queries (2013 – 2016), 1500 data requests.
• Shorter queries that include temporal and location information.
• Explorative search.
• Native and external queries topically different.
• Data requests offer more context to user intent.
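As a rough illustration of how query length and temporal information could be measured in a search log, the following sketch summarises a list of query strings. It is not the method used in the study: the year pattern, the month list and the sample queries are all assumptions, and detecting location information would additionally need a gazetteer, which is left out here.

```python
import re
from statistics import mean

# Assumption: a four-digit year from 1900-2099 or an English month name
# counts as a temporal hint.
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")
MONTHS = {
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
}


def has_temporal_hint(query: str) -> bool:
    """True if the query mentions a year or a month name."""
    q = query.lower()
    words = re.findall(r"[a-z]+", q)
    return bool(YEAR.search(q)) or any(w in MONTHS for w in words)


def summarise(queries: list[str]) -> dict[str, float]:
    """Mean number of terms and share of queries with a temporal hint."""
    lengths = [len(q.split()) for q in queries]
    temporal = sum(has_temporal_hint(q) for q in queries)
    return {
        "mean_terms": mean(lengths),
        "temporal_share": temporal / len(queries),
    }


# Invented sample queries, for illustration only
sample = ["population 2015", "bus timetable", "air quality london march"]
summary = summarise(sample)
print(summary)
```

A keyword list like this will misclassify some queries (for example "may" or "march" used as ordinary words), so the share it reports is only an approximate signal.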
Analysis of logs (2020-21)
844k sessions from 04/2018 to 06/2020, covering web search as well as native search sessions from the European Data Portal
Location, provenance, format, licence, time frame and date, publishing date, location of publication and data schema
Mostly web search; web search and native search users have different information needs and different success rates
Dataset preview page is important in web search
Linking to stories and other content helps with traffic
Analysis of logs (2020-21)
Subset of the previous corpus (n=236k), focus on comparing web and native search sessions
Different behavioural patterns, though shorter than in other search verticals
Session type is weakly correlated to success (proxy=downloads)
Session type is moderately correlated with search affordances (facets, keywords)
User profiles of internal and external SERP are similar
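The slide does not say which correlation measure was used. As one possible way to quantify the association between session type and success, the sketch below computes the phi coefficient for a 2x2 table of session type (web, native) against outcome (download, no download). The choice of phi and the counts are assumptions made for illustration; the numbers are not taken from the study.

```python
from math import sqrt


def phi_coefficient(a: int, b: int, c: int, d: int) -> float:
    """Phi coefficient for the 2x2 contingency table [[a, b], [c, d]].

    Values near 0 indicate little association; values near -1 or 1
    indicate a strong one.
    """
    denom = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        return 0.0  # phi is undefined for an empty row or column; report 0.0
    return (a * d - b * c) / denom


# Toy counts, invented for illustration (NOT from the log analysis):
# rows = session type (web, native), columns = (download, no download)
phi = phi_coefficient(30, 70, 40, 60)
print(round(phi, 3))
```

With these toy counts the absolute value of phi is close to 0.1, which is the kind of magnitude usually described as a weak association.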
Recommendations for publishers
Two types of users
Spatial and temporal queries
Result presentation
Quality reviews
Data stories
More logs needed!
Low uptake of linked data, limited vocabulary reuse, proprietary, non-dereferenceable vocabularies, reasonable metadata quality
Be measurable
A lot of guidance available already
Is there any evidence that it works?
GitHub as a data platform
~1.4 million datasets (e.g. CSV, Excel) from ~65K repos
Map literature features to both dataset and repository features
Use engagement metrics as proxies for data reuse
Train a predictive model to see what publishing guidance leads to higher engagement values
Size Attributes
Age
Quality
Documentation
Reviews
Recommendations for publishers
Co-locate documentation:
◦ Informative, short text about the dataset
◦ Comprehensive README file in a structured form, links to further information
Co-locate tools:
◦ Standard processable file sizes for dataset distributions
◦ Openable with a standard configuration of a common library (such as Pandas)
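The two "co-locate tools" recommendations can be turned into a simple pre-publication check. The sketch below uses Python's standard csv module with its default dialect as a stand-in for "a standard configuration of a common library"; the same idea would apply to calling pandas.read_csv with no extra arguments. The 50 MB size limit and the function name are assumptions, not values from the study.

```python
import csv
import io

# Hypothetical limit for a "standard processable" file size (not from the source)
MAX_BYTES = 50 * 1024 * 1024


def opens_with_defaults(raw: bytes, max_bytes: int = MAX_BYTES) -> bool:
    """Heuristic: does this distribution load as a table with default settings?

    Checks size, UTF-8 decoding, and that the default comma dialect yields
    a header plus at least one row, all with the same number of columns.
    """
    if len(raw) > max_bytes:
        return False
    try:
        text = raw.decode("utf-8")
        rows = list(csv.reader(io.StringIO(text, newline="")))
    except (UnicodeDecodeError, csv.Error):
        return False
    if len(rows) < 2:
        return False
    width = len(rows[0])
    # A single column usually means the delimiter is not a comma
    return width > 1 and all(len(row) == width for row in rows)


print(opens_with_defaults(b"station,pm10\nA1,21.5\n"))   # comma-separated, UTF-8
print(opens_with_defaults(b"station;pm10\nA1;21.5\n"))   # semicolon-separated
```

A file that fails this check is not necessarily unusable, but it signals that reusers would need to guess an encoding or delimiter before they can start working with the data.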
Alternative futures pilot
Users join spaces organised around datasets and share tools, develop services and apps, and derive further datasets
Software openly available: https://gitlab.com/european-data-portal/collaborative-space
Co-locate documentation
Data documentation and sensemaking practices
Metadata vocabularies used where there is a clear business case
More documentation needed to make data useful for others
Open approaches and standards work best when solving actual problems. These problems are rarely about a set of technologies.
Conclusions
We are at a crucial moment in data availability and use, online and elsewhere
There is an increasing body of evidence about people’s data needs and about how data is published on the web
Dataset search needs better algorithms and user studies that go beyond log analysis
There are many methodologies to assess use and impact. They would be even more useful if more portals were using data to understand user behaviour
Some data is missing, with serious consequences
(Source: Gregory et al., 2020)
Data work is teamwork
[Source: Ada Lovelace Institute]
Portals should support participatory publishing
Thank you
Talking Datasets: understanding data sensemaking behaviours. L Koesten, K Gregory, P Groth, E Simperl. International Journal of Human-Computer Studies, 146:102562, 2021
Everything You Always Wanted to Know about a Dataset: Studies in Data Summarisation. L Koesten, E Simperl, E Kacprzak, T Blount, J Tennison. International Journal of Human-Computer Studies, 2019
Collaborative Practices with Structured Data: Do Tools Support what Users Need? L Koesten, E Kacprzak, E Simperl, J Tennison. ACM CHI Conference on Human Factors in Computing Systems, CHI 2019
Dataset search: a survey. A Chapman, E Simperl, L Koesten, G Konstantinidis, LD Ibáñez, E Kacprzak, P Groth. The International Journal on Very Large Data Bases, 2019
Characterising dataset search: An analysis of search logs and data requests. E Kacprzak, L Koesten, LD Ibáñez, T Blount, J Tennison, E Simperl. Journal of Web Semantics, 2018
Making sense of numerical data - semantic labelling of web tables. E Kacprzak, JM Giménez-García, A Piscopo, L Koesten, LD Ibáñez, J Tennison, E Simperl. European Knowledge Acquisition Workshop, pp. 163-178, Springer, 2018
The Trials and Tribulations of Working with Structured Data - a Study on Information Seeking Behaviour. L Koesten, E Kacprzak, J Tennison, E Simperl. ACM CHI Conference on Human Factors in Computing Systems, CHI 2017
Dataset Reuse: Toward Translating Principles to Practice. L Koesten, P Vougiouklis, E Simperl, P Groth. Patterns, 2020
Characterising Dataset Search on the European Data Portal. L Ibáñez, L Koesten, E Kacprzak, E Simperl. European Data Portal Analytical Report 18, 2020
Understanding Supply and Demand on the European Data Portal. L Ibáñez, E Simperl. European Data Portal Analytical Report 19, 2020
The Future of Open Data Portals. J Walker, E Simperl. European Data Portal Analytical Report 8, 2017
Smart Rural: The Open Data Gap. J Walker, G Thuermer, E Simperl, L Carr. Data for Policy, 2020
A comparison of dataset search behaviour of internal versus search engine referred sessions. L Ibáñez, E Simperl. Proceedings of ACM CHIIR 2022, to appear
