How the Web
can change
social science research
(including yours)
Frank van Harmelen
Computer Science Department
VU University Amsterdam
Creative Commons License:
allowed to share & remix,
but must attribute & non-commercial
Using the web (of data)
for e-science
in Social Sciences
Health Warning:
Computer
Scientist!
This talk is about
using the web
as an observational instrument
using the web of data
as an even better observational instrument
using the web of data
as a data-sharing platform
This talk is not about
it's NOT social science about e-science
(e.g. the Oxford research center)
it's NOT about high-performance computing
(that's just boring infrastructure;
let the computer scientists deal with that)
I don’t discuss online social experiments
(crowdsourcing, social games, Mechanical Turk, etc.)
Who are you?
• who is using large computerised data sets?
• who is using data extracted from the web?
• who is using semantic web data?
This talk is about
using the web & the web of data
as an observational instrument &
as a sharing platform
Through:
A whole bunch of realistic examples
A sketch of the technology
Message = yes, you can do this too!
Philosophical confession
I take a strongly positivistic stance
Revolution ahead?
Effects of
observation instruments
Example:
Political science
Question: Is the content of party-political
programmes and election speeches predictive
of government coalition attempts?
Data
• All party manifestos
• half a year of all Dutch newspapers
Example:
Communication science
Question: Can we predict the social network
at time T(n) from the content at T(n-1)?
Data
• Discussions from online forum nl.politiek
• 21,000 participants discussing 19 Dutch
political parties over 259 weeks
Example:
Science dynamics
Question: Is thematic co-occurrence in year Y(n)
predictive of co-authoring in year Y(n+1)?
Data:
• 5-year conference series
• 1,000 papers/year, 3,000 authors/year
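The science-dynamics question can be made concrete as a simple count. The sketch below is a hypothetical operationalisation, not the method used in the study: two authors "co-occur thematically" in year n if both published on a shared topic that year without co-authoring, and we measure what share of those pairs co-author in year n+1. The records and topic labels are invented toy data.

```python
from itertools import combinations

# Hypothetical toy records: (year, authors, topics). Not data from the study.
papers = [
    (2008, {"ann", "bob"}, {"ontologies"}),
    (2008, {"cas"}, {"ontologies"}),
    (2008, {"dan"}, {"sparql"}),
    (2009, {"ann", "cas"}, {"ontologies"}),
    (2009, {"bob", "dan"}, {"rdf"}),
]

def pairs(people):
    """All unordered pairs of distinct people."""
    return {frozenset(p) for p in combinations(sorted(people), 2)}

def coauthor_pairs(papers, year):
    """Pairs of authors who wrote a paper together in `year`."""
    result = set()
    for y, authors, _ in papers:
        if y == year:
            result |= pairs(authors)
    return result

def thematic_pairs(papers, year):
    """Pairs who wrote on a shared topic in `year` without co-authoring that year."""
    by_topic = {}
    for y, authors, topics in papers:
        if y == year:
            for topic in topics:
                by_topic.setdefault(topic, set()).update(authors)
    result = set()
    for people in by_topic.values():
        result |= pairs(people)
    return result - coauthor_pairs(papers, year)

def hit_rate(papers, year):
    """Share of thematic pairs in year n that co-author in year n+1."""
    candidates = thematic_pairs(papers, year)
    if not candidates:
        return 0.0
    return len(candidates & coauthor_pairs(papers, year + 1)) / len(candidates)

print(hit_rate(papers, 2008))  # → 0.5
```

A real analysis would compare such a rate against a baseline (for example, random author pairs) before calling anything predictive.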
AmCAT3: Keyword search
This works… sort of…
Methods:
web scraping
natural language analysis
(parsing, stemming, synonyms, homonyms)
identity resolution
Required
Physical Interoperability
Syntactic Interoperability
Semantic Interoperability
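Identity resolution is one reason the scraping route only "works… sort of". The sketch below shows a deliberately naive, hypothetical rule (strip accents, ignore case, reduce a name to first initial plus surname) for grouping name variants; it is an illustration, not a recommended method, and the example names are invented.

```python
import unicodedata
from collections import defaultdict

def name_key(name: str) -> str:
    """Hypothetical matching key: first initial + surname, ignoring case and accents.
    Accepts both 'Given Surname' and 'Surname, Given' orders."""
    # Decompose accented letters (e.g. 'ü' -> 'u' + diaeresis) and drop the marks.
    plain = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    plain = plain.lower().replace(".", " ")
    if "," in plain:
        surname, given = (part.strip() for part in plain.split(",", 1))
    else:
        parts = plain.split()
        given, surname = " ".join(parts[:-1]), parts[-1]
    return f"{given[:1]} {surname}"

def group_variants(names):
    """Cluster name strings that share the same key."""
    groups = defaultdict(set)
    for name in names:
        groups[name_key(name)].add(name)
    return dict(groups)

names = ["Jan Jansen", "Jansen, J.", "J. Jansen", "Jörg Müller", "Muller, J."]
groups = group_variants(names)
```

The rule is brittle in both directions: it keys "Frank van Harmelen" as "f harmelen" but "van Harmelen, F." as "f van harmelen", so Dutch surname prefixes already defeat it, while any two different J. Jansens are merged.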
Web of Data
to the rescue
General idea of Web of Data
(a.k.a. “Semantic Web”)
1. Make data available on the Web
in machine-understandable form
(formalised)
2. Structure the data
and meta-data
in ontologies
Warning:
technical content
coming up
Bluffer’s Guide to RDF
• Express relations between things:
• Results in a labelled network (“graph”)
• All labels are actually web-addresses (URIs)
• You can “ping” any label and find out more
• Bits of the graph can live at physically different
locations & have different owners
[Figure: example graph with nodes Frank, x, y and MIT linked by AuthorOf and publishedBy edges, annotated with the roles Subject, Predicate and Object]
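To make the bluffer's guide concrete, here is a sketch in plain Python (no RDF library) that stores statements as (subject, predicate, object) triples and "pings" a label by collecting everything stated about it. The http://example.org/ URIs are hypothetical stand-ins, and the exact edges are an assumed reading of the figure; in real RDF every label would be a dereferenceable web address.

```python
# Hypothetical namespace; real data would use the owners' own URIs.
EX = "http://example.org/"

# Each statement is one (subject, predicate, object) triple; a set of them is a graph.
graph = {
    (EX + "Frank", EX + "AuthorOf", EX + "x"),
    (EX + "x", EX + "publishedBy", EX + "MIT"),
}

def describe(graph, node):
    """'Ping' a label: every triple in which it appears as subject or object."""
    return {t for t in graph if t[0] == node or t[2] == node}

for s, p, o in sorted(describe(graph, EX + "x")):
    print(s, p, o)
```

Because the graph is just a set of triples, statements from different locations can later be combined by set union.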
Bluffer’s Guide to RDF Schema
• types for subjects & objects & predicates
• Types organised in a hierarchy
• Inheritance of properties
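[Figure: the same example graph (Frank, x, y, MIT; AuthorOf, publishedBy) annotated with the types author, book and publisher, placed in a hierarchy with person, artifact and man]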
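Inheritance of types can be sketched in a few lines: if x has type book and book is a subclass of artifact, then x is also an artifact. The code applies two RDFS entailment rules (rdfs9 and rdfs11) naively until nothing new is derived; the particular hierarchy is a hypothetical reading of the slide's figure.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

# Hypothetical instance data and class hierarchy.
triples = {
    ("Frank", RDF_TYPE, "author"),
    ("x", RDF_TYPE, "book"),
    ("MIT", RDF_TYPE, "publisher"),
    ("author", SUBCLASS, "person"),
    ("book", SUBCLASS, "artifact"),
}

def rdfs_closure(triples):
    """Apply two RDFS rules until a fixpoint is reached:
    rdfs11: (A subClassOf B) and (B subClassOf C) => (A subClassOf C)
    rdfs9:  (a type B) and (B subClassOf C)       => (a type C)"""
    closed = set(triples)
    while True:
        new = set()
        for (a, p1, b) in closed:
            for (c, p2, d) in closed:
                if p2 != SUBCLASS or b != c:
                    continue
                if p1 == SUBCLASS:
                    new.add((a, SUBCLASS, d))
                elif p1 == RDF_TYPE:
                    new.add((a, RDF_TYPE, d))
        if new <= closed:      # nothing new: done
            return closed
        closed |= new

inferred = rdfs_closure(triples)
```

The nested loop is quadratic per pass and fine for a toy graph; real triple stores index by predicate instead.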
Ontologies (= hierarchical
conceptual vocabularies)
Identify the key concepts in a domain
Identify a vocabulary for these concepts
Identify relations between these concepts
Make these precise enough
so that they can be shared between
• humans and humans
• humans and machines
• machines and machines
Biomedical ontologies (a few…)
• MeSH: Medical Subject Headings (National Library of Medicine), 22,000 descriptions
• EMTREE: commercial (Elsevier), drugs and diseases; 45,000 terms, 190,000 synonyms
• UMLS: integrates 100 different vocabularies
• SNOMED: 200,000 concepts (College of American Pathologists)
• Gene Ontology: 15,000 terms in molecular biology
• NCBI Cancer Ontology: 17,000 classes (about 1M definitions)
On the Web of Data, anyone
can link anything to anything
[Figure: a link [<x> IsOfType <T>] between resources x and T held by different owners & locations, e.g. an <institute>]
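Because all labels are URIs, combining graphs published by different owners is, at its simplest, a set union: statements about the same URI join up automatically. A sketch with two hypothetical sources (a bibliography and an institutional directory); the URIs and the ex: predicate names are invented for illustration.

```python
# Source A: hypothetical bibliographic data.
bibliography = {
    ("http://example.org/person/frank", "ex:authorOf", "http://example.org/paper/42"),
}
# Source B: hypothetical institutional data, different owner and location.
directory = {
    ("http://example.org/person/frank", "ex:worksAt", "http://example.org/institute/vu"),
    ("http://example.org/institute/vu", "ex:locatedIn", "Amsterdam"),
}

merged = bibliography | directory   # linking = sharing URIs

def objects(graph, subject, predicate):
    """All objects of triples with the given subject and predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}

# Follow links across both sources: where do the authors of paper 42 work?
authors = {s for s, p, o in merged
           if p == "ex:authorOf" and o == "http://example.org/paper/42"}
institutes = {i for a in authors for i in objects(merged, a, "ex:worksAt")}
```

In practice two sources rarely mint the same URI for the same person; explicit links such as owl:sameAs are the usual way to connect them.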
SPARQL: Bluffer’s Guide
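PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX type: <http://dbpedia.org/class/yago/>   # assumed DBpedia prefix
PREFIX prop: <http://dbpedia.org/property/>     # assumed DBpedia prefix
SELECT ?country_name ?population
WHERE {
  ?country a type:LandlockedCountries ;
           rdfs:label ?country_name ;
           prop:populationEstimate ?population .
  FILTER (?population > 15000000)
}

PREFIX mo:   <http://purl.org/ontology/mo/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
# ?img and ?loc stay unbound: the slide shows no patterns for them
SELECT ?name ?img ?hp ?loc
WHERE {
  ?a a mo:MusicArtist ;
     foaf:name ?name .
  OPTIONAL { ?a foaf:homepage ?hp }
}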
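What a SPARQL engine does with the first query can be sketched as pattern matching: each triple pattern binds variables, bindings must agree across patterns, and FILTER discards rows. The triples and population figures below are invented placeholders, not DBpedia data, and the matcher covers only basic graph patterns plus a filter, not OPTIONAL, UNION or the rest of SPARQL.

```python
def match(pattern, triple, binding):
    """Try to match one triple pattern; return the extended binding or None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if isinstance(term, str) and term.startswith("?"):  # a variable
            if term in extended and extended[term] != value:
                return None                                  # clashes with earlier binding
            extended[term] = value
        elif term != value:                                  # a constant that differs
            return None
    return extended

def select(graph, patterns, keep=lambda row: True):
    """Join the triple patterns left to right, then apply the FILTER `keep`."""
    rows = [{}]
    for pattern in patterns:
        next_rows = []
        for row in rows:
            for triple in graph:
                extended = match(pattern, triple, row)
                if extended is not None:
                    next_rows.append(extended)
        rows = next_rows
    return [row for row in rows if keep(row)]

# Invented placeholder data, not DBpedia figures.
graph = {
    ("ex:A", "a", "type:LandlockedCountries"),
    ("ex:A", "rdfs:label", "Country A"),
    ("ex:A", "prop:populationEstimate", 20_000_000),
    ("ex:B", "a", "type:LandlockedCountries"),
    ("ex:B", "rdfs:label", "Country B"),
    ("ex:B", "prop:populationEstimate", 3_000_000),
    ("ex:C", "rdfs:label", "Country C"),                 # not landlocked
    ("ex:C", "prop:populationEstimate", 50_000_000),
}

rows = select(
    graph,
    [("?country", "a", "type:LandlockedCountries"),
     ("?country", "rdfs:label", "?country_name"),
     ("?country", "prop:populationEstimate", "?population")],
    keep=lambda row: row["?population"] > 15_000_000,
)
for row in rows:
    print(row["?country_name"], row["?population"])
```

Country C is dropped by the first pattern and Country B by the filter, mirroring how the WHERE clause and FILTER act in the query.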
Example:
science dynamics
Faculteit der Exacte Wetenschappen
MEET JULIE
PhD Student
“institutional influences on
collaboration patterns in
interdisciplinary research”
Julie needs data
DBLP: RDF & RDF Schema
SELECT ?author ?affiliation ?uriAffiliation WHERE {
  GRAPH <$graph> {
    # <$graph> and <$article> appear to be template placeholders filled in per query
    { <$article> swrc:author ?author .
      OPTIONAL { ?author swrc:affiliation ?uriAffiliation . }
      OPTIONAL { ?author swc:affiliation ?affiliation . } }
    UNION
    { <$article> foaf:maker ?author .
      OPTIONAL { ?author swrc:affiliation ?uriAffiliation . }
      OPTIONAL { ?author swc:affiliation ?affiliation . } }
    UNION
    { <$article> dc:creator ?author .
      OPTIONAL { ?author swrc:affiliation ?uriAffiliation . }
      OPTIONAL { ?author swc:affiliation ?affiliation . } }
  }
}
DBLP query: 2 weeks → 15 mins.
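The UNION in the DBLP query is needed because different sources name the authorship relation differently (swrc:author, foaf:maker, dc:creator). A sketch of the same idea in plain Python: try each alternative predicate and keep whatever matches, with the affiliation looked up optionally so that authors without one still appear. It is simplified to a single affiliation property, and the article, person and organisation identifiers are hypothetical.

```python
# The alternative authorship predicates joined by UNION in the query.
AUTHOR_PREDICATES = ("swrc:author", "foaf:maker", "dc:creator")

# Hypothetical triples from differently modelled sources.
triples = {
    ("art:1", "swrc:author", "person:a"),
    ("art:1", "dc:creator", "person:b"),
    ("person:a", "swrc:affiliation", "org:1"),
    ("art:2", "foaf:maker", "person:c"),
}

def authors_with_affiliation(triples, article):
    """UNION over authorship predicates plus an OPTIONAL affiliation:
    authors without an affiliation still appear, paired with None."""
    rows = set()
    for s, p, author in triples:
        if s != article or p not in AUTHOR_PREDICATES:
            continue
        affiliations = {o for s2, p2, o in triples
                        if s2 == author and p2 == "swrc:affiliation"}
        if affiliations:
            rows |= {(author, aff) for aff in affiliations}
        else:
            rows.add((author, None))   # like an unbound OPTIONAL variable
    return rows
```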
Example:
Dutch census data
(1795 – 1971)
 40.745.554.078 triples
 Semantically rich
Who’s doing it?
The World Bank is also doing it!
http://data.worldbank.org/
7,000 indicators from World Bank data sets.
The US gov is also doing it!
http://data.gov/: 390,000 data sets
Compare foreign aid budgets
Does tax influence smokers?
Compare campaign money
Already many billions of facts & rules
Everybody’s doing it!
May 2009 estimate: > 4.2 billion triples
+ 140 million interlinks
It gets bigger every month
And many more
• Reuters
• New York Times
• EU (EUROSTAT, others)
• BBC
• Facebook
• …
So how good is this
observational instrument?
Studies on validity (e.g. in science dynamics)
methods for provenance & trust
methods for attribution & citation
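One common building block for provenance and trust is to keep the source of every statement alongside it, as the GRAPH clause in the DBLP query does with named graphs. A minimal sketch, assuming hypothetical source names and a simple allow-list as the trust policy; real provenance models are much richer.

```python
from collections import namedtuple

# A triple plus the source that asserted it (a "quad").
Quad = namedtuple("Quad", "subject predicate object source")

quads = [
    Quad("person:a", "ex:worksAt", "org:1", "source:bibliography"),
    Quad("person:a", "ex:worksAt", "org:2", "source:scraped-page"),
    Quad("person:b", "ex:worksAt", "org:1", "source:bibliography"),
]

def trusted_view(quads, trusted_sources):
    """Keep only statements from trusted sources, as plain triples."""
    return {(q.subject, q.predicate, q.object)
            for q in quads if q.source in trusted_sources}

def sources_of(quads, triple):
    """Provenance lookup: which sources assert this statement?"""
    return {q.source for q in quads
            if (q.subject, q.predicate, q.object) == triple}
```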
For real?
“Use the power of information to
explore social and economic life on
Earth”
€1bn over 10 years
Phew…
Take home message
use the web & the web-of-data
to obtain your data
use the web-of-data to share your data
yes, you can do this too!
Collaborate with computer scientists
reflect on deeper consequences
for the social sciences
(methodological, theoretical, etc)
Acknowledgements
I’ve freely used material from the work of
Shenghui Wang
Paul Groth
Julie Birkholz
Wouter van Atteveldt
Laurens van Rietveld
Rinke Hoekstra
and many in the Semantic Web community


Editor's Notes

  • #34: Talk about citation data, which is difficult to get: 2 weeks to gather a couple of hundred citation scores
  • #35: Open data to the rescue… (
  • #37: Faster; easier to experiment; access to more data