Up and running with Wikidata 
Emw 
New York City Wikidata workshop 
2014-12-14
Wikidata is a free linked database that can be 
read and edited by 
humans and machines.
Wikidata's goals 
● Centralize interwiki links 
● Centralize infoboxes 
● Provide an interface for rich queries 
● Structure the sum of all human knowledge
What you'll learn from this talk 
● How to edit Wikidata 
● How to classify 
● Ideas for small projects 
● Wikidata vocabulary 
● Where to find things 
● Awesome tools
Elements of a Wikidata statement
Example: New York City (Q60)
Items and properties 
● Each item and property has its own page 
● Items 
– Represent subjects: Douglas Adams, Challenger disaster 
– Have identifiers like Q42, Q921090 
– 12,906,291 items 
● Properties 
– Represent attribute names: occupation, has cause 
– Have identifiers like P106, P828 
– 1,329 properties
Statements and claims 
● Claims 
– Claims are “triplets” 
● Formally: subject, predicate, object 
● In Wikidata: item, property, value 
● Example: Douglas Adams, occupation, author 
● Statements 
– A claim is only part of a statement 
– Statements also include: 
● References 
● Ranks
Qualifiers, ranks, references 
● Qualifiers 
– Qualifiers are properties used on claims rather than items 
– “Yonkers population 12,733 at time (P585) 1860” 
● Ranks 
– Preferred, normal, deprecated 
– Useful to mark outdated claims 
● References 
– Source of claim; provenance 
– “... stated in (P248) 1860 United States Census”
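As a rough illustration of how these pieces fit together, the sketch below models a statement as a claim plus qualifiers, references, and a rank. The class and field names (`Claim`, `Statement`, `Rank`) are hypothetical and do not mirror the actual Wikidata data model or API. The values are the Yonkers example from this slide, with population (P1082) and the Yonkers item ID (Q128114) taken from later in the talk.

```python
from dataclasses import dataclass, field
from enum import Enum


class Rank(Enum):
    PREFERRED = "preferred"
    NORMAL = "normal"
    DEPRECATED = "deprecated"


@dataclass(frozen=True)
class Claim:
    """A triple: item, property, value."""
    item: str
    prop: str
    value: object


@dataclass
class Statement:
    """A claim plus qualifiers, references, and a rank (illustrative only)."""
    claim: Claim
    qualifiers: dict = field(default_factory=dict)  # property ID -> value
    references: list = field(default_factory=list)  # each entry: {property ID: value}
    rank: Rank = Rank.NORMAL


# "Yonkers population 12,733 at time (P585) 1860, stated in (P248) 1860 United States Census"
yonkers_1860 = Statement(
    claim=Claim(item="Q128114", prop="P1082", value=12733),
    qualifiers={"P585": "1860"},
    references=[{"P248": "1860 United States Census"}],
)
```

A newer census figure could be added as a second statement with rank `PREFERRED`, leaving the 1860 value at `NORMAL` rather than deleting it.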
More on Wikidata vocabulary 
https://www.wikidata.org/wiki/Wikidata:Glossary
Finding Wikidata items 
Wikipedia articles have a Wikidata item link in the 
left navigation panel.
Finding Wikidata items 
Wikidata search is quick and effective. 
Instant search suggests items that have labels or 
aliases matching your keyword.
Search by label
Search by alias: “flu” -> influenza
Finding properties 
● Is there a property for “number of windows”? 
● What was the ID of that property, again? 
● Search 
– In main site search box, prefix search term with “P:” 
– “P:number of”, “P:occupation” 
– Instant search doesn't work for properties, only items 
● Browse 
– https://www.wikidata.org/wiki/Wikidata:List_of_properties 
^ bookmark this!
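Property lookup can also be scripted. The sketch below only builds a request URL for the MediaWiki API action `wbsearchentities` with `type=property`; it does not send the request. The function name `property_search_url` and the default limit are assumptions made for illustration, and this API route is not something the talk itself covers.

```python
from urllib.parse import urlencode

API_URL = "https://www.wikidata.org/w/api.php"


def property_search_url(term, language="en", limit=10):
    """Build a wbsearchentities URL that searches properties by label or alias."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "type": "property",  # restrict results to properties (P...) rather than items
        "limit": limit,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


print(property_search_url("number of"))
```

Opening the printed URL in a browser should return JSON whose entries carry property IDs and labels.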
Let's edit Wikidata.
Walking through edits for: 
Yonkers, New York 
https://www.wikidata.org/wiki/Q128114
Yonkers TODO 
https://www.wikidata.org/wiki/Q128114 
● population (P1082) claims for historical table in 
https://en.wikipedia.org/wiki/Yonkers,_New_York#Demographics 
● Include references! “1860 United States Census”, etc. 
● To add to item from infobox: 
– head of government (P6), office held by head of government (P1313) 
– date of foundation or creation (P571) 
– ZIP code (P281)
Area? Population density? 
● Properties with units (km^2, people/km^2, $) 
are not yet possible 
● “Units” datatype in development 
● https://phabricator.wikimedia.org/T65722
Tools 
– Querying: Autolist, by Magnus Manske 
● http://tools.wmflabs.org/autolist/autolist1.html 
– Batch editing: Widar, by Magnus Manske 
● https://tools.wmflabs.org/autolist/ 
– Software framework: Wikidata Toolkit, by Markus Kroetzsch et al. 
● https://www.mediawiki.org/wiki/Wikidata_Toolkit 
● https://github.com/Wikidata/Wikidata-Toolkit
Querying in Wikidata 
List of politicians who died of a heart attack 
Pseudo-query: 
occupation: politician AND cause of death: heart attack 
occupation: P106 
politician: Q82955 
cause of death: P509 
heart attack: Q12152 
Wikidata query in Autolist: 
claim[106:82955] AND claim[509:12152]
http://tools.wmflabs.org/autolist/autolist1.html?q=claim[106:82955]%20AND%20claim[509:12152]
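The translation from the pseudo-query to Autolist syntax is mechanical: drop the "P" and "Q" prefixes and join the terms with AND. A minimal sketch follows, covering only the `claim[...] AND claim[...]` form shown on this slide; the helper names `autolist_claim` and `autolist_query` are hypothetical.

```python
import re


def autolist_claim(prop_id, item_id):
    """Render one term, e.g. ('P106', 'Q82955') -> 'claim[106:82955]'."""
    if not re.fullmatch(r"P\d+", prop_id) or not re.fullmatch(r"Q\d+", item_id):
        raise ValueError(f"expected IDs like P106 and Q82955, got {prop_id!r}, {item_id!r}")
    return f"claim[{prop_id[1:]}:{item_id[1:]}]"


def autolist_query(pairs):
    """Join (property, item) pairs into a single AND query."""
    return " AND ".join(autolist_claim(p, q) for p, q in pairs)


# occupation: politician AND cause of death: heart attack
query = autolist_query([("P106", "Q82955"), ("P509", "Q12152")])
print(query)  # claim[106:82955] AND claim[509:12152]
```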
Classification on Wikidata 
● Taxonomy of knowledge 
● Enables powerful inference, novel applications 
● Interesting philosophical, design, and engineering issues
Tree of Porphyry 
User:VoiceOfTheCommons, CC-BY-SA 3.0
Classes and instances 
● Plato is a human is an animal 
● Plato instance of human subclass of animal 
● Instance: concrete object, individual 
● Class: abstract object
Classification on Wikidata 
● instance of (P31) 
– rdf:type in RDF and OWL 
– 11,930,243 usages 
– Most popular Wikidata property 
● subclass of (P279) 
– “all instances of A are also instances of B” 
– rdfs:subClassOf in RDF and OWL 
– 170,571 usages
Examples 
● USS Nimitz instance of Nimitz-class aircraft carrier 
Nimitz-class aircraft carrier subclass of aircraft carrier 
● 2012 Cannes Film Festival instance of Cannes Film Festival 
Cannes Film Festival subclass of film festival 
● an individual charm quark instance of charm quark 
charm quark subclass of quark 
^ Many “leaf nodes” in Wikidata's taxonomic hierarchy are not instances. 
(There are no items about individual quarks on Wikidata!) 
https://www.wikidata.org/wiki/Help:Basic_membership_properties
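One way to see why the instance/subclass split is useful: an item's broader classes can be inferred by following one instance of claim and then any number of subclass of claims. The sketch below does this over a toy in-memory graph built from the examples on this slide. The dictionaries and function names are hypothetical, and no real Wikidata data is loaded.

```python
# Toy graph: labels stand in for item IDs.
SUBCLASS_OF = {
    "Nimitz-class aircraft carrier": {"aircraft carrier"},
    "Cannes Film Festival": {"film festival"},
    "charm quark": {"quark"},
}
INSTANCE_OF = {
    "USS Nimitz": {"Nimitz-class aircraft carrier"},
    "2012 Cannes Film Festival": {"Cannes Film Festival"},
}


def superclasses(cls):
    """Return cls plus every class reachable by following subclass of."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SUBCLASS_OF.get(current, ()))
    return seen


def is_instance_of(item, cls):
    """True if item has an instance of claim whose class is cls or a subclass of cls."""
    return any(cls in superclasses(direct) for direct in INSTANCE_OF.get(item, ()))


print(is_instance_of("USS Nimitz", "aircraft carrier"))  # True
print(is_instance_of("charm quark", "quark"))            # False: it is a class, not an instance
```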
Bad smells 
Item has many instance of or subclass of claims 
Items typically satisfy a huge number of instance of claims: 
● Fido instance of dog 
● Fido instance of English Pointer 
● Fido instance of faithful animal 
● … 
Solution: use one class for instance of, put other class 
knowledge into normal properties 
● Fido instance of dog 
● Fido breed: English Pointer 
● Fido known for: faithfulness 
● ...
Bad smells 
subclass of claim that is nonsensical when interpreted as “All 
instances of A are also instances of B” 
Example: 
dog subclass of pet 
But not all dogs are pets! 
feral dog subclass of dog: true 
feral dog subclass of pet: false 
:. dog subclass of pet: false 
Solution: put “pet” knowledge about dogs into claim that does not 
apply to all instances of dog. E.g. “dog has role pet”. (Has role 
would not be transitive. Also needed: some/all quantifier.)
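The dog/pet problem can be reproduced mechanically: chaining subclass of claims makes every subclass of dog a pet as well. The sketch below is a toy check over hand-written claims, not a Wikidata tool, and `inherited_classes` is a hypothetical helper.

```python
def inherited_classes(cls, subclass_of):
    """All classes implied for cls by chaining subclass of claims."""
    found, stack = set(), list(subclass_of.get(cls, ()))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(subclass_of.get(current, ()))
    return found


# With the bad claim "dog subclass of pet", feral dogs become pets too.
claims = {"feral dog": {"dog"}, "dog": {"pet"}}
print(sorted(inherited_classes("feral dog", claims)))  # ['dog', 'pet']

# Without it, the unwanted conclusion disappears.
fixed = {"feral dog": {"dog"}}
print(sorted(inherited_classes("feral dog", fixed)))   # ['dog']
```

In the fixed version, the "pet" knowledge would live in a separate, non-transitive claim such as the "has role" suggestion above.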
Classification on Wikidata 
● Last but not least: part of (P361) 
– Third basic membership property 
– Top-level “part-whole” relation 
● Subclass of and part of are transitive; instance of is not 
● Transitive relation: 
A subclass of B 
B subclass of C 
:. A subclass of C 
https://www.wikidata.org/wiki/Help:Basic_membership_properties
Ideas for small projects 
● Add data about towns and cities 
– population (P1082) 
– head of government (P6) 
● Add medical knowledge about historical figures 
– medical condition (P1050) 
– cause of death (P509) 
– manner of death (P1196) 
● Add cultural knowledge about works of art 
– instance of (P31) 
– creator (P170) 
– material used (P186) 
– collection (P195)

