Converting WHO’s Global Health Observatory Data to RDF

Amrapali Zaveri, PhD student

August 27, 2012


Outline
•   Background
•   What is the RDF Data Cube Vocabulary?
•   Semi-automated approach
•   OntoWiki's CSVImport plug-in
•   RDFized GHO data
•   Limitations and Future Work




Background
• Biomedical statistical data
     • Published as Excel sheets
• Advantage
     • Readable by humans
• Disadvantages
     • Cannot be queried efficiently
     • Difficult to integrate with other data (in different formats)
• Our approach
     • Convert the data into a single data model: RDF
     • Use the RDF Data Cube Vocabulary*
        • designed specifically to represent multidimensional
          statistical data in RDF

 *http://www.w3.org/TR/vocab-data-cube/


What is the RDF Data Cube Vocabulary?





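 • Dimensions: identify what an observation refers to (e.g., country, disease)
 • Attributes: qualify or help interpret the observed value (e.g., unit of measure)
 • Measures: the observed value itself
 • Observations: individual data points, each combining dimension values with a measured value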
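A minimal sketch of how the four components fit together, written as Python that prints Turtle. In the W3C vocabulary, a `qb:DataStructureDefinition` lists its components through `qb:dimension`, `qb:measure` and `qb:attribute`, while each `qb:Observation` carries the dimension and measure properties directly. The `ex:` namespace and the properties `ex:country`, `ex:disease`, `ex:value` and `ex:source` are hypothetical placeholders, not names used in the GHO conversion; only the Afghanistan/Tuberculosis/127 values are taken from these slides.

```python
# Sketch: print a Data Cube fragment as Turtle using plain string formatting.
# Hypothetical names: the ex: namespace and the ex:country, ex:disease,
# ex:value (measure) and ex:source (attribute) properties.

QB = "http://purl.org/linked-data/cube#"
EX = "http://example.org/gho/"  # hypothetical namespace
XSD = "http://www.w3.org/2001/XMLSchema#"


def data_cube_turtle(obs_id, country, disease, value, source):
    """Return Turtle for a DSD (2 dimensions, 1 measure, 1 attribute) plus one observation."""
    return "\n".join([
        f"@prefix qb: <{QB}> .",
        f"@prefix ex: <{EX}> .",
        f"@prefix xsd: <{XSD}> .",
        "",
        "ex:country a qb:DimensionProperty .",
        "ex:disease a qb:DimensionProperty .",
        "ex:value a qb:MeasureProperty .",
        "ex:source a qb:AttributeProperty .",
        "",
        # The structure definition names each component via qb:dimension/measure/attribute.
        "ex:dsd a qb:DataStructureDefinition ;",
        "    qb:component [ qb:dimension ex:country ] , [ qb:dimension ex:disease ] ,",
        "                 [ qb:measure ex:value ] , [ qb:attribute ex:source ] .",
        "",
        "ex:dataset a qb:DataSet ; qb:structure ex:dsd .",
        "",
        # The observation uses the dimension and measure properties themselves.
        f"ex:{obs_id} a qb:Observation ;",
        "    qb:dataSet ex:dataset ;",
        f"    ex:country ex:{country} ;",
        f"    ex:disease ex:{disease} ;",
        f'    ex:value "{int(value)}"^^xsd:integer ;',
        f'    ex:source "{source}" .',
    ])


print(data_cube_turtle("c1-r6", "Afghanistan", "Tuberculosis", 127, "WHO GHO"))
```

String formatting keeps the sketch dependency-free, but it does not escape literals; a real converter would more likely build the graph with an RDF library.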




Semi-automated approach
• Transforming CSV to RDF in a fully automated way is not
  feasible.
        • Dimensions are often encoded in the heading or
           label of a sheet rather than in its cells
• Our semi-automated approach:
        • Implemented as a plug-in for OntoWiki#
            • a semantic collaboration platform developed by
              the AKSW research group.
        • A CSV file is converted into RDF using the RDF
           Data Cube Vocabulary: http://aksw.org/Projects/Stats2RDF



 # Sören Auer, Sebastian Tramp (né Dietzold), Jens Lehmann, and Thomas Riechert:
 OntoWiki: A Tool for Social Semantic Collaboration. In: Proceedings of the Workshop on
 Social and Collaborative Construction of Structured Knowledge (CKC 2007) at the 16th
 International World Wide Web Conference (WWW2007), Banff, Canada, May 8, 2007.
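Because a dimension can live in a sheet's title rather than in its cells, some human input is needed. The sketch below shows one way that step could look: the user supplies a rule (here a regular expression) that pulls a dimension value out of the title. This is an illustration only. The function name, the "Year" dimension and the title text are hypothetical, and the slides do not say how the plug-in captures such rules.

```python
import re


def dimension_from_title(title, pattern, dimension):
    """Apply a user-supplied regex to a sheet title.

    Returns (dimension, captured value), or None when the pattern does not
    match, which is the case a person has to resolve by hand.
    """
    match = re.search(pattern, title)
    return (dimension, match.group(1)) if match else None


# Hypothetical title and rule: a person decides that a four-digit number is the Year.
print(dimension_from_title("Tuberculosis, 2010 (hypothetical title)", r"\b(\d{4})\b", "Year"))
print(dimension_from_title("Untitled sheet", r"\b(\d{4})\b", "Year"))
```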
1. Create Knowledge Base




2. Import a CSV file
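Importing a CSV file comes down to parsing it into a grid of cells that the later steps can point into. The sketch below uses Python's standard `csv` module, which handles quoted cells that contain commas. The sheet contents are hypothetical: only the Afghanistan/Tuberculosis value (127) comes from these slides, and every other cell is a placeholder.

```python
import csv
import io

# Hypothetical GHO-style export; all cells except Afghanistan/Tuberculosis/127 are placeholders.
SAMPLE_CSV = '''"Sheet title, placeholder",,
Country,Tuberculosis,Disease B
Afghanistan,127,5
Country B,8,9
'''


def read_grid(text):
    """Parse CSV text into a rectangular grid of strings, padding short rows with ''."""
    rows = list(csv.reader(io.StringIO(text)))
    width = max(len(row) for row in rows)
    return [row + [""] * (width - len(row)) for row in rows]


grid = read_grid(SAMPLE_CSV)
print(grid[0])  # ['Sheet title, placeholder', '', '']
print(grid[2])  # ['Afghanistan', '127', '5']
```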




3. Define dimensions
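Defining dimensions means telling the converter which row or column holds each dimension's values. The sketch below assumes a simple layout in which column headings carry one dimension (diseases) and row labels carry another (countries). The layout, the function and its parameter names are hypothetical and are not taken from the CSVImport plug-in.

```python
# Hypothetical grid: only "Afghanistan", "Tuberculosis" and "127" come from the
# slides; the other cells are placeholders.
GRID = [
    ["Country", "Tuberculosis", "Disease B"],
    ["Afghanistan", "127", "5"],
    ["Country B", "8", "9"],
]


def define_dimensions(grid, header_row, label_col, col_dim, row_dim):
    """Map column headings to one dimension and row labels to another.

    Returns {dimension_name: {grid_index: dimension_value}}.
    """
    columns = {c: grid[header_row][c] for c in range(label_col + 1, len(grid[header_row]))}
    rows = {r: grid[r][label_col] for r in range(header_row + 1, len(grid))}
    return {col_dim: columns, row_dim: rows}


dims = define_dimensions(GRID, header_row=0, label_col=0, col_dim="Disease", row_dim="Country")
print(dims)
# {'Disease': {1: 'Tuberculosis', 2: 'Disease B'}, 'Country': {1: 'Afghanistan', 2: 'Country B'}}
```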




4. Define data range
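Spreadsheets often have title rows above the numbers and footnotes below them, so the user marks the rectangle that holds the actual values. The sketch below treats the data range as inclusive row and column bounds. The bounds, the sheet layout and all cells other than Afghanistan/Tuberculosis/127 are hypothetical.

```python
# Hypothetical sheet with a title row above and a footnote row below the data.
GRID = [
    ["Sheet title (placeholder)", "", ""],
    ["Country", "Tuberculosis", "Disease B"],
    ["Afghanistan", "127", "5"],
    ["Country B", "8", "9"],
    ["Footnote (placeholder)", "", ""],
]


def data_range(grid, top, left, bottom, right):
    """Yield (row, col, value) for every cell in the inclusive rectangle."""
    for r in range(top, bottom + 1):
        for c in range(left, right + 1):
            yield r, c, grid[r][c]


print(list(data_range(GRID, top=2, left=1, bottom=3, right=2)))
# [(2, 1, '127'), (2, 2, '5'), (3, 1, '8'), (3, 2, '9')]
```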




5. Save template, extract triples
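Saving a template stores the choices from the previous steps so they can drive triple extraction. The sketch below keeps the template as JSON and emits four N-Triples-style triples per cell. Several parts are assumptions. The slides do not show OntoWiki's actual template format. The `http://example.org/gho/` namespace and the `country`, `disease` and `value` properties are hypothetical. The `c{col}-r{row}` observation names only imitate the `gho:c1-r6` pattern from the slides, and the exact indexing is a guess.

```python
import json

NS = "http://example.org/gho/"  # hypothetical namespace
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
QB_OBS = "<http://purl.org/linked-data/cube#Observation>"
XSD_INT = "<http://www.w3.org/2001/XMLSchema#integer>"

# Hypothetical template format.
TEMPLATE = {
    "header_row": 1,  # row holding the column-dimension values
    "label_col": 0,   # column holding the row-dimension values
    "range": {"top": 2, "left": 1, "bottom": 3, "right": 2},
    "row_dimension": "country",
    "col_dimension": "disease",
}


def save_template(template):
    """Serialize the template to JSON so it can be stored and re-applied later."""
    return json.dumps(template, sort_keys=True)


def iri(local):
    """Build an IRI in the assumed namespace, replacing spaces with underscores."""
    return f"<{NS}{local.replace(' ', '_')}>"


def extract_triples(grid, template):
    """Emit (subject, predicate, object) triples: one observation per cell in the range."""
    rng, triples = template["range"], []
    for r in range(rng["top"], rng["bottom"] + 1):
        for c in range(rng["left"], rng["right"] + 1):
            obs = iri(f"c{c}-r{r}")
            triples += [
                (obs, RDF_TYPE, QB_OBS),
                (obs, iri(template["row_dimension"]), iri(grid[r][template["label_col"]])),
                (obs, iri(template["col_dimension"]), iri(grid[template["header_row"]][c])),
                (obs, iri("value"), f'"{int(grid[r][c])}"^^{XSD_INT}'),
            ]
    return triples


# Placeholder grid; only Afghanistan/Tuberculosis/127 come from the slides.
GRID = [
    ["Sheet title (placeholder)", "", ""],
    ["Country", "Tuberculosis", "Disease B"],
    ["Afghanistan", "127", "5"],
    ["Country B", "8", "9"],
]
saved = save_template(TEMPLATE)
for s, p, o in extract_triples(GRID, json.loads(saved)):
    print(s, p, o, ".")
```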




6. Re-use template for similar files
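Re-using a template pays off when many GHO sheets share one layout: the same saved settings are applied to each new file without redoing the manual steps. In the sketch below, the JSON template format, the sheet contents and the `apply_template` function are all hypothetical. Every value is a made-up placeholder.

```python
import csv
import io
import json

# A previously saved template (hypothetical format), kept as a string so the sketch is self-contained.
SAVED = '{"header_row": 1, "label_col": 0, "range": {"top": 2, "left": 1, "bottom": 3, "right": 2}}'

# Two sheets with the same layout; all values are placeholders.
SHEET_1 = "Title A,,\nCountry,Disease A,Disease B\nCountry A,1,2\nCountry B,3,4\n"
SHEET_2 = "Title B,,\nCountry,Disease A,Disease B\nCountry A,5,6\nCountry B,7,8\n"


def apply_template(csv_text, template_json):
    """Parse a sheet and return (row label, column heading, value) for each cell in the range."""
    t = json.loads(template_json)
    grid = list(csv.reader(io.StringIO(csv_text)))
    rng = t["range"]
    return [
        (grid[r][t["label_col"]], grid[t["header_row"]][c], int(grid[r][c]))
        for r in range(rng["top"], rng["bottom"] + 1)
        for c in range(rng["left"], rng["right"] + 1)
    ]


for sheet in (SHEET_1, SHEET_2):
    print(apply_template(sheet, SAVED))
```

The approach only holds while the layout really is identical. If a sheet gains an extra header row, the saved bounds silently point at the wrong cells, which is one reason to review the extracted resources afterwards.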




7. View resources




RDFized GHO data
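@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix qb:   <http://purl.org/linked-data/cube#> .
@prefix gho:  <http://gho.aksw.org/> .  # assumed namespace URI

gho:Country rdfs:subClassOf qb:DimensionProperty ;
    rdf:type rdfs:Class ;
    rdfs:label "Country" .

gho:Disease rdfs:subClassOf qb:DimensionProperty ;
    rdf:type rdfs:Class ;
    rdfs:label "Disease" .

gho:Afghanistan rdf:type gho:Country ;
    rdfs:label "Afghanistan" .

gho:Tuberculosis rdf:type gho:Disease ;
    rdfs:label "Tuberculosis" .

gho:c1-r6 rdf:type qb:Observation ;
    rdf:value "127"^^xsd:integer ;
    qb:dimension gho:Afghanistan ;
    qb:dimension gho:Tuberculosis .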
RDFized GHO data

• Available at http://gho.aksw.org
• 50 datasets
• ~8 million triples
• Paper published in the Semantic Web Journal (SWJ) call for dataset descriptions:
http://www.semantic-web-journal.net/content/publishing-and-interlinking-global-health-observatory-dataset




Limitations and Future Work
 • Conversion

 • Coherence

 • Temporal Comparability

 • Exploring GHO




Thank You!

    Questions?

http://aksw.org/AmrapaliZaveri

zaveri@informatik.uni-leipzig.de




