THE DATA WE WANT:
FRAMEWORKS AND
TOOLS TO ENGAGE
WITH DATA
ELENA SIMPERL
UNIVERSITY OF SOUTHAMPTON
WEB AND INTERNET SCIENCE
@SOTON
Large interdisciplinary lab, 120+ members, 25
academic staff
Founding member of the Web Science Trust
Founding partner of the Open Data Institute
Developer of ePrints
W3C office for UK and Ireland
“Data is infrastructure. It underpins transparency,
accountability, public services, business innovation
and civil society.”
The data we want
How do we help people tell their data stories?
What data stories do people share and why?
How do we make data more engaging?
A story may start by looking for the
relevant data
DATA SEARCH
WHAT DOES THE GOOGLE FOR DATA LOOK LIKE?
• Who searches for data and why?
• What sort of queries do people write?
• Do they need query writing support?
• How should results be displayed?
• How do people pick the best results?
• Do they need one or more search sessions to
find what they are looking for?
• Is search exploratory?
FRAMEWORK FOR INTERACTING WITH DATA
HELPS SYSTEM DESIGNERS IDENTIFY USER TASKS
AND DEFINE RELEVANT FEATURES
[Framework diagram; labels grouped as they appear]
Goal or process oriented
Web, data portals, people, FoI
Relevance, usability, quality
Visual scan, obvious errors, basic stats, headers, metadata
Koesten, L.M., Kacprzak, E., Tennison, J.F. and Simperl, E., 2017, May. The Trials and Tribulations of Working with Structured Data: a Study on
Information Seeking Behaviour. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (pp. 1277-1289). ACM.
SEARCH LOG ANALYSIS
INFORMS THE DESIGN OF DATA SEARCH ENGINES
● Four national open governmental data portals, 2.2 million queries from 2013
to 2016 (Kacprzak et al., 2017)
● Shorter queries, include temporal and location information
● Explorative search
● Native and external queries topically different
● Data requests offer more context to user intent
Kacprzak, E., Koesten, L.M., Ibáñez, L.D., Simperl, E. and Tennison, J., A Query Log Analysis of Dataset Search. In International
Conference on Web Engineering (pp. 429-436). Springer, 2017.
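The kind of features reported above (query length, temporal and location information) can be sketched in a few lines. This is only an illustration, not the method used in the cited study: the sample queries, the tiny place list and the year pattern are all assumptions made for the example.

```python
import re

# Hypothetical sample queries; the real portal logs are not reproduced in the slides.
QUERIES = [
    "population 2015",
    "crime statistics london",
    "school performance",
    "house prices manchester 2013",
]

# Assumed, deliberately tiny gazetteer and year pattern, for illustration only.
PLACES = {"london", "manchester", "uk", "ireland"}
YEAR = re.compile(r"\b(19|20)\d{2}\b")


def describe(query: str) -> dict:
    """Return simple features for one query string."""
    tokens = query.lower().split()
    return {
        "length": len(tokens),                          # number of words
        "temporal": bool(YEAR.search(query)),           # mentions a four-digit year
        "location": any(t in PLACES for t in tokens),   # mentions a known place
    }


features = [describe(q) for q in QUERIES]
n = len(features)
print("mean query length:", sum(f["length"] for f in features) / n)
print("share with a year:", sum(f["temporal"] for f in features) / n)
print("share with a place:", sum(f["location"] for f in features) / n)
```

A real analysis would need a proper gazetteer and date parser; the point here is only that each query is reduced to a few comparable features before aggregating over the log.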
DATA NEEDS CONTEXT
• Metadata
• Summary
• Provenance
• Details on collection and use
• Quality and reviews
• Tools
DATA SUMMARIES
HELP PEOPLE MAKE SENSE OF DATA EFFECTIVELY
• Dataset in one sentence
• Format, no. of rows and columns, machine readability
• Headers, groupings, key columns
• Value types and ranges for key headers
• Provenance
• Location information
• Temporal information
• Quality, known issues
• Usage, trends
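Some of the structural elements of such a summary (format, number of rows and columns, headers, value types and ranges) can be derived automatically. The sketch below is a minimal illustration using Python's standard `csv` module; the sample data and the `summarise` helper are hypothetical, and elements such as provenance, quality or usage would still have to come from the publisher.

```python
import csv
import io

# Hypothetical CSV content standing in for a real dataset (values are made up).
SAMPLE = """area,year,population
Southampton,2015,249500
Portsmouth,2015,211800
Winchester,2016,121000
"""


def summarise(text: str) -> dict:
    """Build a small structural summary of CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    headers = list(reader.fieldnames or [])
    value_summary = {}
    for header in headers:
        values = [row[header] for row in rows]
        try:
            # Treat the column as numeric if every value parses as a number.
            numbers = [float(v) for v in values]
            value_summary[header] = {
                "type": "number",
                "min": min(numbers),
                "max": max(numbers),
            }
        except ValueError:
            value_summary[header] = {"type": "text", "distinct": len(set(values))}
    return {
        "format": "CSV",
        "rows": len(rows),
        "columns": len(headers),
        "headers": headers,
        "value_summary": value_summary,
    }


summary = summarise(SAMPLE)
print(summary["rows"], "rows,", summary["columns"], "columns")
for header, info in summary["value_summary"].items():
    print(header, info)
```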
Stories are meant to be shared
VIRAL DATA
HELPS (ALTERNATIVE) FACTS SPREAD FASTER
•How does data travel?
–E.g. on social media
•What makes data go viral?
–Visualisations?
–Subject matter/topic?
–“Transmission vectors”: journalists, celebrities, grassroots,
botnets?
DATA SHARING
HELPS (ALTERNATIVE) FACTS SPREAD FASTER
• What evidence can we see of data sharing activities?
– What form is data being shared in?
– How are the various stages of the data science pipeline represented?
• How common is data sharing?
– Who is it done by?
– How do they do it?
• What kind of data is (not) being shared?
– Does anyone share raw data?
– Do narratives explicitly reference the data they are built on?
• What makes data go viral?
• Who makes use of the data for what purposes?
OFFICIAL DATA
• Six-week Twitter study of ons.gov.uk
• 1186 original tweets made by 898 people, with
4906 subsequent retweets
• Of the 15 most active tweeters, half work for the ONS
or are official accounts of the ONS
• Most retweeted tweet (503 times) is by a BBC
journalist mentioning an ONS data visualisation
• It is one of 64 separate tweets about this ONS
data release
OPEN DATA
• Six-week Twitter study of data.gov.uk
• 113 original tweets made by 87 different
accounts, with 258 subsequent retweets
• No bias towards organisational affiliation is
present in the set of active retweeters
• The single most retweeted tweet (121 times) is
by a Joint Nature Conservation Committee earth
observation specialist. Mentions a crop map
visualisation from environment.data.gov.uk
SPREADSHEETS
• No XLSX files, but Google Sheets
• 1475 original tweets from 1067 unique accounts with 6923 retweets
• No bias towards organisational affiliation is present in the set of active retweeters
• Most retweeted spreadsheet (1188 times) is a schedule for the timings of
INKIGAYO broadcasts (famous Korean livestreamed pop music program with
live voting)
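The figures from the three Twitter studies above are easier to compare as ratios. The short calculation below uses only the numbers quoted on the slides; it assumes, which the slides do not state, that the retweets of the most retweeted item are included in each study's retweet total.

```python
# Figures copied from the three Twitter studies above.
STUDIES = {
    "ons.gov.uk": {"tweets": 1186, "accounts": 898, "retweets": 4906, "top": 503},
    "data.gov.uk": {"tweets": 113, "accounts": 87, "retweets": 258, "top": 121},
    "Google Sheets": {"tweets": 1475, "accounts": 1067, "retweets": 6923, "top": 1188},
}


def ratios(study: dict) -> dict:
    """Derive simple comparison ratios for one study."""
    return {
        "retweets_per_tweet": study["retweets"] / study["tweets"],
        "tweets_per_account": study["tweets"] / study["accounts"],
        # Assumption: the top item's retweets are part of the retweet total.
        "top_share": study["top"] / study["retweets"],
    }


for name, study in STUDIES.items():
    r = ratios(study)
    print(
        f"{name}: {r['retweets_per_tweet']:.2f} retweets per tweet, "
        f"{r['tweets_per_account']:.2f} tweets per account, "
        f"top item {r['top_share']:.0%} of retweets"
    )
```

On these figures, and under the stated assumption, the single crop map tweet accounts for close to half of all retweets in the data.gov.uk study, against roughly a tenth for the most retweeted ONS tweet.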
SPREADSHEET CATEGORIES AND USE
• Visual inspection of 100 highly retweeted sheets
• sports statistics (including gambling analysis)
• computer games statistics
• catalogues of resources/assets (including
artist’s videos or a series of TV episodes)
• selling goods/artwork/services for a trader or
fan group
• coordinating donations/volunteers, political info
• coordinating political activity
• music voting
• buying on behalf of an artist
• monitoring cryptocurrency offerings
Simple list 10%
Rich data 40%
Data analysis 10%
Promoting action 15%
Coordinating crowd action 20%
Other 5%
USE OF CHARTS
• 5% (29) of sheets contained charts
• 4 charts intended to promote subsequent
use and discussion
• Survey of fanfic community from NYC festival
attendees
• A maths teacher who takes part in Maths Teaching
discussion groups tweeted a Google form to record
preferences for banana ripeness
• A study on the citation of Registered Reports in
Cognitive Neuroscience
• Historic weather data collected by a local citizen
offered to a “sports weather” journalist
Games (trading, playing, curation) 7
Politics (monitoring, organising, arguing) 6
Surveys (attitudes, phenomena) 4
Financial investment analysis 3
Personal list of assets/achievements 2
TV/radio (voting/ratings) 2
Trading (orders) 1
Miscellaneous data collection 4
- Historic weather data
- Boeing 787 production data (hobbyist)
- Google Analytics audit of Udemy
- Academic citation analysis
USE OF CHARTS (2)
• 2 charts support an argument or
discussion
• UN data on firearms. Discussion thread between pro- and
anti-NRA positions. Sent by the author, a senior technologist
at Microsoft.
• Use of the Physics GRE in North American university physics
admission processes. Sent by a delegate at the
Conference for Undergraduate Underrepresented
Minorities in Physics, not the spreadsheet author.
DATA
SHARING
EVOLVES
DECENTRALISATION
EMPOWERS
• Distributed ledgers give data owners new ways to
manage and share their data
• Terms of use can be defined and stored reliably
without the need of a central authority
• Data owners define rules about what they share and
under which conditions, and link these rules to their data
• Applications (e.g. surveys) use smart contracts to
define the data needed, the aggregations performed,
and the number of participants
• Data transactions are recorded on the blockchain
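The interplay between owner-defined rules, an application's request and a transaction record can be illustrated without any blockchain platform. The sketch below is plain in-memory Python, not a smart contract: the class names, the fields and the hash-chained `Ledger` are all assumptions chosen for the example and stand in for whatever a real distributed ledger would provide.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class SharingRule:
    """What one data owner is willing to share (hypothetical model)."""
    owner: str
    fields: frozenset        # data items the owner agrees to share
    aggregations: frozenset  # aggregations the owner accepts


@dataclass
class SurveyRequest:
    """What an application asks for (hypothetical model)."""
    fields: frozenset
    aggregation: str
    min_participants: int


@dataclass
class Ledger:
    """Append-only log in which each entry hashes the previous one."""
    entries: list = field(default_factory=list)

    def record(self, payload: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else ""
        body = json.dumps(payload, sort_keys=True)
        digest = hashlib.sha256((prev + body).encode()).hexdigest()
        self.entries.append({"payload": payload, "prev": prev, "hash": digest})
        return digest


def permits(rule: SharingRule, request: SurveyRequest) -> bool:
    """A rule permits a request if it covers all fields and the aggregation."""
    return request.fields <= rule.fields and request.aggregation in rule.aggregations


def run(request: SurveyRequest, rules: list, ledger: Ledger) -> bool:
    """Accept the request only if enough owners' rules permit it; log the outcome."""
    consenting = [r.owner for r in rules if permits(r, request)]
    accepted = len(consenting) >= request.min_participants
    ledger.record({
        "fields": sorted(request.fields),
        "aggregation": request.aggregation,
        "participants": consenting if accepted else [],
        "accepted": accepted,
    })
    return accepted


rules = [
    SharingRule("alice", frozenset({"age", "city"}), frozenset({"mean", "count"})),
    SharingRule("bob", frozenset({"age"}), frozenset({"mean"})),
    SharingRule("carol", frozenset({"city"}), frozenset({"count"})),
]
ledger = Ledger()
request = SurveyRequest(frozenset({"age"}), "mean", min_participants=2)
print("accepted:", run(request, rules, ledger))
```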
How to make data interactions more
engaging?
NEW WAYS TO
ENGAGE WITH DATA
• Interfaces
• Tasks
• Tools
• Interactions
COLLECTING DATA WITH
VIRTUAL ASSISTANTS
• Same interaction possibilities for all the participants;
everyone can be interviewed by the same “interviewer”
(Johnston et al. 2013)
• Marginal additional cost per survey after the
implementation of the system (Stent et al. 2007; Johnston
et al. 2013)
• A virtual assistant can conduct a single interview in
several sessions
• Virtual assistants have prior knowledge of their users'
schedules; they know when users are available to complete
the survey
DESIGNING A SURVEY
• All established guidelines are hand-coded in
the app
• The order of the questions is pre-defined
• Some questions might lead to specific
follow-ups based on the reply
• Different types of questions are linked to
different expected answers
– Binary, categorical etc.
Example prompts:
"Please answer with only a Yes or a No"
"Select your age range: 16-18, 19-40, > 40"
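The design points above (pre-defined question order, follow-ups that depend on the reply, and answer types tied to the question) can be sketched as a small data structure plus a loop. This is a hypothetical illustration, not the app described in the talk: the question keys, the open data question and the portal follow-up are invented for the example, while the two prompt wordings are taken from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    key: str
    prompt: str
    options: tuple                                   # expected answers (binary or categorical)
    follow_ups: dict = field(default_factory=dict)   # answer -> key of follow-up question


# Hypothetical survey content; only the prompt wordings echo the slide.
QUESTIONS = {
    "uses_open_data": Question(
        "uses_open_data",
        "Do you use open data? Please answer with only a Yes or a No",
        ("Yes", "No"),
        follow_ups={"Yes": "which_portal"},
    ),
    "which_portal": Question(
        "which_portal",
        "Which portal do you use most: data.gov.uk, ons.gov.uk, Other",
        ("data.gov.uk", "ons.gov.uk", "Other"),
    ),
    "age_range": Question(
        "age_range",
        "Select your age range: 16-18, 19-40, > 40",
        ("16-18", "19-40", "> 40"),
    ),
}
ORDER = ["uses_open_data", "age_range"]  # pre-defined question order


def normalise(question: Question, reply: str):
    """Map a free-text reply onto one of the expected answers, or None."""
    cleaned = reply.strip().lower()
    for option in question.options:
        if cleaned == option.lower():
            return option
    return None


def run_survey(replies) -> dict:
    """Ask questions in order, re-asking until a reply matches an expected answer."""
    answers = {}
    pending = list(ORDER)
    while pending:
        question = QUESTIONS[pending.pop(0)]
        answer = None
        while answer is None:
            answer = normalise(question, next(replies))
        answers[question.key] = answer
        follow_up = question.follow_ups.get(answer)
        if follow_up:
            pending.insert(0, follow_up)  # ask the follow-up before moving on
    return answers


print(run_survey(iter(["maybe", "yes", "ons.gov.uk", "19-40"])))
```

Exact matching against the expected answers is the simplest possible choice; a reply such as "maybe" is rejected and the question is asked again.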
CHALLENGES – ENGAGEMENT
CHALLENGES – EXPECTED ANSWERS
CHALLENGES – UNDERSTANDING ANSWERS
e.simperl@soton.ac.uk
@esimperl
