Methodological Guidelines for Publishing Linked Data

Boris Villazón-Terrazas, Oscar Corcho
Facultad de Informática, Universidad Politécnica de Madrid
Campus de Montegancedo s/n, 28660 Boadilla del Monte, Madrid
http://www.oeg-upm.net
{bvillazon,ocorcho}@fi.upm.es
Phone: 34 91 3366605  Fax: 34 91 3524819
Slides available at: http://www.slideshare.net/boricles/


Acknowledgements: Asunción Gómez-Pérez, Luis M. Vilches, Victor Saquicela, Alexander de León, and many others whom we may have omitted.
Work distributed under the license Creative Commons Attribution-Noncommercial-Share Alike 3.0
Main References

• Boris Villazón-Terrazas, Luis M. Vilches, Oscar Corcho, Asunción Gómez-Pérez. Methodological Guidelines for Publishing Government Linked Data. In: David Wood (Ed.), Linking Government Data, 2011.

• Michael Hausenblas, Bernadette Hyland, Boris Villazón-Terrazas. Best Practices for Publishing Linked Data. W3C Editor’s Draft, Government Linked Data Working Group.
  https://dvcs.w3.org/hg/gld/raw-file/bcb72f87b5cc/bp/index.html

• Bernadette Hyland, Boris Villazón-Terrazas, Sarven Capadisli. Cookbook for Open Government Linked Data. W3C Editor’s Draft, Government Linked Data Working Group.
  http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Guidelines for Publishing Linked Data

• The process of publishing Linked Data follows an iterative, incremental life cycle model, with the activities specification, modelling, generation, publication and exploitation.

• The guidelines are based on our experience producing Linked Data in several government contexts, and they have been applied in real-world scenarios.
Specification
• Identification and analysis of the data
  sources

• URI design

• Definition of the license




Specification
            Identification and analysis of the data sources

We have to distinguish between:

• Opening up and publishing data that government agencies have not yet opened up and published
   • This task may require contacting specific government data owners to get access to their legacy data

• Reusing and leveraging data already opened up and published by government agencies
   • This task involves looking for these data in public government catalogs, such as:
      • Open Government Data
      • datacatalogs.org
      • Open Government Catalog
Specification
           Identification and analysis of the data sources

After we have identified and selected the government data sources:

• Search for and compile all the available data and documentation about those resources

• Identify the schema of those resources, including the conceptual components and their relationships

• Identify the items in the domain, i.e., things whose properties and relations are described in the data sources
Specification
GeoLinkedData – Identification of the data sources

• IGN (National Geographic Institute of Spain): Oracle and MySQL databases, made available through an agreement with the IGN
• INE (National Statistics Institute of Spain): data sources available in a public data catalog
Specification
GeoLinkedData – Analysis of the data sources

[Figure: example INE data source, annotated with its Year, Province and Industry Production Index components]
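As a rough aid for identifying the schema of a tabular source, the sketch below profiles a CSV file and guesses a type for each column. It assumes the data arrive as CSV with a header row; the column names mirror the INE example above, but the values are invented placeholders (not real INE figures), and `infer_type` and `profile` are hypothetical helpers.

```python
import csv
import io


def infer_type(values):
    """Guess a column type ("integer", "decimal" or "string") from its values."""
    def all_parse(cast):
        try:
            for value in values:
                cast(value)
        except ValueError:
            return False
        return True

    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "decimal"
    return "string"


def profile(csv_text):
    """Map each column name of a CSV document to a guessed type."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {
        column: infer_type([row[column] for row in rows if row[column]])
        for column in rows[0]
    }


# Invented placeholder values, for illustration only.
SAMPLE = """Province,Year,Industry Production Index
Madrid,2009,95.1
Granada,2009,101.4
"""

print(profile(SAMPLE))
# {'Province': 'string', 'Year': 'integer', 'Industry Production Index': 'decimal'}
```

A profile like this only suggests candidate datatypes; the conceptual components and their relationships still have to be read from the documentation compiled for each source.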
Specification
URI Design

• Use meaningful URIs instead of opaque URIs, when possible

• Separate TBox (ontology model) URIs from ABox (instance) URIs
   • Base URI
     http://data.gov.bo/
     http://health.data.gov.bo/
   • TBox URIs
     http://data.gov.bo/ontology/{class|property}
   • ABox URIs
     http://data.gov.bo/resource/
     http://data.gov.bo/resource/province/Tiraque
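The two URI patterns can be captured in a couple of helper functions, which keeps TBox and ABox URIs from drifting apart as more data is added. This is a minimal sketch: it assumes every term and instance name is already meaningful and unique within its type, and both the helper names and the class name `Province` are hypothetical.

```python
from urllib.parse import quote

# Base URI taken from the example above.
BASE = "http://data.gov.bo/"


def tbox_uri(term):
    """URI for an ontology term (class or property): {base}ontology/{term}."""
    return f"{BASE}ontology/{quote(term, safe='')}"


def abox_uri(resource_type, name):
    """Meaningful instance URI: {base}resource/{type}/{name}."""
    return f"{BASE}resource/{quote(resource_type, safe='')}/{quote(name, safe='')}"


print(tbox_uri("Province"))             # http://data.gov.bo/ontology/Province
print(abox_uri("province", "Tiraque"))  # http://data.gov.bo/resource/province/Tiraque
```

Passing `safe=''` to `quote` means a stray `/` inside a name is percent-encoded instead of silently creating an extra path segment.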
Specification
GeoLinkedData - URI design

• Base URI
  http://linkeddata.es/
  http://geo.linkeddata.es/

• TBox URIs
  http://geo.linkeddata.es/ontology/{concept|property}
  http://geo.linkeddata.es/ontology/Provincia

• ABox URIs
  http://geo.linkeddata.es/resource/{resource type}/{resource name}
  http://geo.linkeddata.es/resource/Provincia/Madrid
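Resource names taken from the source data often contain spaces or accented characters, so they have to be turned into valid URI path segments. The sketch below assumes one possible policy, spaces to underscores and UTF-8 percent-encoding for everything else; the slides do not say which policy GeoLinkedData actually uses, and the `Municipio` type is a hypothetical example.

```python
from urllib.parse import quote

GEO_BASE = "http://geo.linkeddata.es/"


def geo_resource_uri(resource_type, resource_name):
    """Build {base}resource/{resource type}/{resource name} from raw labels."""
    # Assumed policy: trim, turn spaces into underscores, percent-encode the rest.
    slug = quote(resource_name.strip().replace(" ", "_"), safe="")
    return f"{GEO_BASE}resource/{quote(resource_type, safe='')}/{slug}"


print(geo_resource_uri("Provincia", "Madrid"))
# http://geo.linkeddata.es/resource/Provincia/Madrid
print(geo_resource_uri("Municipio", "La Granada"))
# http://geo.linkeddata.es/resource/Municipio/La_Granada
```

Because URIs are case-sensitive and should not change once published, whichever policy is chosen must be applied consistently across all datasets.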
Specification
Definition of the license

• There are several possibilities:
   • The UK Open Government Licence
   • Open Database License
   • Public Domain Dedication and License
   • Open Data Commons Attribution License
   • The Creative Commons licenses

It is also possible to reuse and apply the existing license of the government data sources.
Specification
GeoLinkedData - Definition of the license

• We reuse the original license of the government data sources. The IGN and INE data sources have their own license, similar to the Attribution-Share Alike 2.5 Generic License.

  http://creativecommons.org/licenses/by-sa/2.5/
Modelling
Ontology

•   An ontology is an engineering artifact that provides:
     •   A set of terms
     •   A set of explicit assumptions regarding the intended meaning of the terms, which almost always include:
           • concepts and their classification
           • properties between concepts

•   Ontologies provide a shared understanding of a domain of interest

•   Ontologies are expressed in OWL or RDF(S), both of which are based on RDF
Modelling
Reuse available vocabularies

• Search for suitable vocabularies, e.g., in Linked Open Vocabularies
• Are there suitable vocabularies?
   • Yes: build the vocabulary by reusing the available vocabularies
   • No: …
Modelling
Reuse available non-ontological resources

• Search for suitable non-ontological resources, e.g., in highly reliable Web sites, domain-related sites and government catalogs
• Are there suitable resources?
   • Yes: build the vocabulary by transforming the available resources
   • No: build the vocabulary from scratch
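The two decision flows can be read as a single fallback chain. The sketch below assumes that the elided "No" branch of the vocabulary search continues with the search for non-ontological resources; the two search functions are hypothetical placeholders for a manual or tool-assisted search (e.g., in Linked Open Vocabularies or government catalogs).

```python
from typing import Callable, List, Optional, Tuple

Search = Callable[[], List[str]]


def choose_modelling_strategy(
    search_vocabularies: Search,
    search_non_ontological_resources: Search,
) -> Tuple[str, Optional[List[str]]]:
    """Return the strategy to follow and the resources it starts from."""
    vocabularies = search_vocabularies()
    if vocabularies:
        return "reuse available vocabularies", vocabularies
    # Only search for non-ontological resources when no vocabulary fits.
    resources = search_non_ontological_resources()
    if resources:
        return "transform available non-ontological resources", resources
    return "build the vocabulary from scratch", None


# Example: no suitable vocabulary, but a (hypothetical) code list is available.
print(choose_modelling_strategy(lambda: [], lambda: ["province code list"]))
# ('transform available non-ontological resources', ['province code list'])
```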
Modelling
GeoLinkedData

[Figure: the GeoLinkedData ontology network, with components labelled:
   • WGS84 Geo Positioning: an RDF vocabulary
   • scv:Dimension, scv:Item, scv:Dataset
   • hydrographical phenomena (rivers, lakes, etc.)
   • vocabulary for instants, intervals, durations, etc.
   • names and international code systems for territories and groups
   • ontology for OGC Geography Markup Language]

Classes               33    33
Object Properties     44    44
Data Properties      318   318

http://neon-toolkit.org/
Modelling
     GeoLinkedData




Generation
• Transformation

• Data cleansing

• Linking




Generation
Transformation

• Take the data sources selected in the specification activity and transform them to RDF according to the vocabulary created in the modelling activity

• Some tools
   • CSV and spreadsheets
      • RDF extension of Google Refine, XLWrap, RDF123, NOR2O
   • RDB
      • D2R Server, ODEMapster, W3C RDB2RDF WG – R2RML
   • XML
      • GRDDL, ReDeFer
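To make the transformation step concrete, the sketch below converts rows shaped like the INE example into N-Triples using only the standard library. It is not how NOR2O or the other listed tools work; the `example.org` namespaces, the `Observation` class and the property names are hypothetical stand-ins for the vocabulary built during modelling, and the sample value is invented.

```python
import csv
import io
from decimal import Decimal
from urllib.parse import quote

# Hypothetical namespaces: a real project would use its own base URI and the
# vocabulary produced in the modelling activity.
RES = "http://example.org/resource/"
ONT = "http://example.org/ontology/"
XSD = "http://www.w3.org/2001/XMLSchema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def row_to_ntriples(row):
    """Describe one CSV row as an observation, in N-Triples syntax."""
    province = quote(row["Province"].strip(), safe="")
    year = int(row["Year"])
    index = Decimal(row["Industry Production Index"])
    subject = f"<{RES}observation/{province}_{year}>"
    return [
        f"{subject} <{RDF_TYPE}> <{ONT}Observation> .",
        f"{subject} <{ONT}province> <{RES}Provincia/{province}> .",
        f'{subject} <{ONT}year> "{year}"^^<{XSD}gYear> .',
        f'{subject} <{ONT}industryProductionIndex> "{index:f}"^^<{XSD}decimal> .',
    ]


def csv_to_ntriples(csv_text):
    """Transform every row of a CSV document into N-Triples lines."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lines.extend(row_to_ntriples(row))
    return lines


SAMPLE = """Province,Year,Industry Production Index
Madrid,2009,95.1
"""
print("\n".join(csv_to_ntriples(SAMPLE)))
```

Writing N-Triples by hand keeps the example dependency-free; in practice an RDF library or one of the tools above would also take care of escaping literals and IRIs.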
Generation
GeoLinkedData - Transformation

[Figure: transformation of the GeoLinkedData sources: INE data with NOR2O, IGN data with ODEMapster, and the geospatial column of the IGN data with Geometry2RDF]
Generation
GeoLinkedData - Transformation

[Figure: NOR2O transformation of the INE Industry Production Index data, by Year and Province]
Generation
GeoLinkedData - Transformation
•   R2O is an extensible, fully declarative language to describe mappings between relational database schemas and ontologies.
•   The ODEMapster processor generates RDF instances from relational instances, based on the mapping description expressed in the R2O document.

    www.oeg-upm.net/index.php/en/downloads/9-r2o-odempaster
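The idea behind R2O and ODEMapster, a declarative mapping interpreted by a generic processor, can be illustrated in a few lines. The sketch below does not use R2O syntax or ODEMapster; the dictionary-based mapping, the `provincia` table and the `example.org` URIs are all hypothetical, and SQLite stands in for the Oracle and MySQL databases.

```python
import sqlite3

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical mapping: the table that feeds a class, how to mint subject URIs,
# and which column feeds which property. This is NOT R2O syntax.
MAPPING = {
    "table": "provincia",
    "class": "http://example.org/ontology/Provincia",
    "uri_template": "http://example.org/resource/Provincia/{nombre}",
    "columns": {"nombre": RDFS_LABEL},
}


def run_mapping(conn, mapping):
    """Generic processor: apply a declarative mapping to every row of a table."""
    conn.row_factory = sqlite3.Row
    triples = []
    for row in conn.execute(f'SELECT * FROM "{mapping["table"]}"'):
        subject = mapping["uri_template"].format(**dict(row))
        triples.append((subject, RDF_TYPE, mapping["class"]))
        for column, prop in mapping["columns"].items():
            if row[column] is not None:
                triples.append((subject, prop, row[column]))
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE provincia (id INTEGER, nombre TEXT)")
conn.executemany("INSERT INTO provincia VALUES (?, ?)", [(1, "Madrid"), (2, "Granada")])
for triple in run_mapping(conn, MAPPING):
    print(triple)
```

Changing the schema-to-ontology correspondence only means editing `MAPPING`, which is the main benefit of keeping mappings declarative. The tuples here do not distinguish literals from URIs; a real processor would need to.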
Generation
GeoLinkedData - Transformation
• Creation of the R2O mappings
Generation
GeoLinkedData - Transformation

[Figure: excerpt of the R2O document]
Generation
GeoLinkedData - Transformation

• Geometry2RDF: a tool for generating RDF from geometrical information

• The geometry can be available in GML or WKT

• The generated RDF follows our Geometry Model

  http://www.oeg-upm.net/index.php/en/downloads/151-geometry2rdf
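As a simplified illustration of turning geometry into RDF, the sketch below handles only WKT POINTs and emits W3C WGS84 `geo:lat`/`geo:long` triples. It does not reproduce Geometry2RDF or its Geometry Model, and it assumes the WKT coordinates are WGS84 longitude followed by latitude; data in other reference systems would need reprojection first. The subject URI and coordinates are illustrative.

```python
import re

GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
NUMBER = r"[-+]?\d+(?:\.\d+)?"
POINT_RE = re.compile(rf"^\s*POINT\s*\(\s*({NUMBER})\s+({NUMBER})\s*\)\s*$", re.IGNORECASE)


def wkt_point_to_ntriples(subject_uri, wkt):
    """Convert 'POINT (lon lat)' into geo:lat and geo:long triples."""
    match = POINT_RE.match(wkt)
    if match is None:
        raise ValueError(f"only simple 2D WKT POINTs are supported: {wkt!r}")
    lon, lat = match.groups()  # assumed axis order: longitude, latitude
    return [
        f'<{subject_uri}> <{GEO}lat> "{lat}" .',
        f'<{subject_uri}> <{GEO}long> "{lon}" .',
    ]


for line in wkt_point_to_ntriples("http://example.org/resource/Provincia/Madrid",
                                  "POINT (-3.70 40.42)"):
    print(line)
```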
Generation
GeoLinkedData - Transformation

Oracle SDO_UTIL package

-- Export, as GML 3.1.1, the geometry of the features labelled 'Arroyo' in table BCN200_0301L_RIO
SELECT TO_CHAR(SDO_UTIL.TO_GML311GEOMETRY(geometry))
          AS Gml311Geometry
FROM "BCN200"."BCN200_0301L_RIO" c
WHERE c.Etiqueta='Arroyo'
Generation
GeoLinkedData - Transformation
Generation
Data Cleansing

• Find possible errors, such as those identified by Hogan et al.:
   • HTTP-level issues, such as accessibility and dereferenceability, e.g., HTTP URIs that return 40x/50x errors
   • Reasoning issues, such as a namespace without a vocabulary, e.g., an invented rss:item term
   • Malformed/incompatible datatypes, e.g., “true” as xsd:int

• Fix the identified errors
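Checks for malformed datatypes such as the “true” as xsd:int case are easy to automate. The sketch below validates lexical forms for a handful of XML Schema datatypes (plus the 32-bit range of xsd:int); any other datatype passes unchecked, so it is a partial check rather than a full validator. It is also slightly stricter than XML Schema, which collapses surrounding whitespace before checking.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Lexical forms, following XML Schema datatypes, for a few common types.
LEXICAL_RULES = {
    XSD + "integer": re.compile(r"[+-]?[0-9]+"),
    XSD + "int": re.compile(r"[+-]?[0-9]+"),
    XSD + "decimal": re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)"),
    XSD + "boolean": re.compile(r"true|false|1|0"),
}


def is_valid_literal(lexical, datatype):
    """Return False when the lexical form is not valid for the datatype."""
    rule = LEXICAL_RULES.get(datatype)
    if rule is None:
        return True  # datatype not covered by this sketch
    if rule.fullmatch(lexical) is None:
        return False
    if datatype == XSD + "int":
        return -2**31 <= int(lexical) < 2**31  # xsd:int is a 32-bit integer
    return True


print(is_valid_literal("true", XSD + "int"))      # False
print(is_valid_literal("true", XSD + "boolean"))  # True
```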
Generation
GeoLinkedData – Data Cleansing

• Errors
   • Some resources with the same name were mixed up. For example, the Granada municipality belongs to Granada Province, whereas the La Granada municipality belongs to Barcelona Province.

   • Autonomous communities that have only one province, e.g., the Murcia Region, were missing some municipalities, whereas their corresponding provinces, e.g., Murcia Province, had the correct number of municipalities.

   • Some hydrographical resources were missing parts of their geometrical information.
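The Murcia case suggests a simple automatic check: for an autonomous community with a single province, the community and that province should contain the same municipalities. The sketch below assumes both memberships have already been extracted into Python sets; the function name and the municipality identifiers are placeholders, not real data.

```python
def missing_in_community(community_members, province_members, single_province):
    """Report municipalities that a community's only province has but the community lacks.

    community_members / province_members: dict name -> set of municipality ids.
    single_province: dict community -> its only province.
    """
    report = {}
    for community, province in single_province.items():
        missing = province_members.get(province, set()) - community_members.get(community, set())
        if missing:
            report[community] = sorted(missing)
    return report


# Placeholder identifiers, for illustration only.
communities = {"Murcia Region": {"m1", "m2"}}
provinces = {"Murcia Province": {"m1", "m2", "m3"}}
print(missing_in_community(communities, provinces, {"Murcia Region": "Murcia Province"}))
# {'Murcia Region': ['m3']}
```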
Generation
Linking

• Identify suitable data sets as linking targets, e.g., via http://ckan.net
• Discover relationships between data items, e.g., with LIMES (http://aksw.org/Projects/limes) or the Silk Framework (http://www4.wiwiss.fu-berlin.de/bizer/silk/)
• Validate the relationships discovered, e.g., with the sameAs Validator (http://oegdev.dia.fi.upm.es:8080/sameAs/)
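Link discovery tools such as Silk and LIMES compare resources with configurable similarity metrics. As a much simpler stand-in (not how either tool is implemented), the sketch below compares labels with `difflib.SequenceMatcher` after removing case and accents, and proposes `owl:sameAs` candidates above a threshold. The `example.org` URIs are illustrative and the 0.9 threshold is an arbitrary assumption; candidates still need validation.

```python
import unicodedata
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def normalise(label):
    """Case-fold and strip accents, so that accented and plain spellings compare equal."""
    decomposed = unicodedata.normalize("NFKD", label)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold().strip()


def candidate_links(source, target, threshold=0.9):
    """Yield (source_uri, target_uri, score) for label pairs at or above threshold.

    source, target: dicts mapping URI -> label.
    """
    for s_uri, s_label in source.items():
        for t_uri, t_label in target.items():
            score = SequenceMatcher(None, normalise(s_label), normalise(t_label)).ratio()
            if score >= threshold:
                yield s_uri, t_uri, score


source = {
    "http://example.org/resource/Municipio/Madrid": "Madrid",
    "http://example.org/resource/Municipio/La_Granada": "La Granada",
}
target = {
    "http://dbpedia.org/resource/Madrid": "Madrid",
    "http://dbpedia.org/resource/Granada": "Granada",
}
for s, t, score in candidate_links(source, target):
    print(f"<{s}> <{OWL_SAME_AS}> <{t}> .")
```

With these inputs only the Madrid pair is proposed: "La Granada" and "Granada" score about 0.82, so the threshold avoids the same-name confusion described in the data cleansing step.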
Generation
GeoLinkedData - Linking

[Figure: GeoLinkedData resources linked to DBpedia and GeoNames, e.g., http://geo.linkeddata.es/.../Madrid linked to http://dbpedia.org/resource/Madrid and http://sws.geonames.org/6355233/]
Generation
GeoLinkedData - Linking

[Figure: the sameAs Validator, http://oegdev.dia.fi.upm.es:8080/sameAs/]
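Besides reviewing candidate links in a tool such as the sameAs Validator, one cheap automatic sanity check for geographic links is to compare the coordinates on both sides. The sketch below is not taken from the slides or from the sameAs Validator: it computes the great-circle (haversine) distance on a spherical Earth and rejects pairs farther apart than a chosen tolerance. The coordinates are approximate and the 25 km tolerance is arbitrary.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two WGS84 points, in kilometres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = radians(lat2 - lat1)
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(a)))  # clamp rounding error


def plausible_link(point_a, point_b, tolerance_km=25.0):
    """Accept a sameAs candidate only if both (lat, lon) points are close enough."""
    return haversine_km(*point_a, *point_b) <= tolerance_km


madrid_here = (40.42, -3.70)   # approximate coordinates, illustration only
madrid_there = (40.41, -3.69)
granada = (37.18, -3.60)
print(plausible_link(madrid_here, madrid_there))  # True
print(plausible_link(madrid_here, granada))       # False
```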
Publication
• Dataset publication

• Metadata publication

• Dataset discovery




Publication
Dataset Publication

• Tools for storing RDF
   • Virtuoso Universal Server, Jena, Sesame, 4Store, YARS, OWLIM

• SPARQL endpoint and Linked Data frontend
   • Pubby, Talis Platform, Fuseki
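A Linked Data frontend typically answers requests for a resource URI with a 303 See Other redirect to either an HTML page or an RDF document, depending on the HTTP Accept header (the pattern described in the W3C note "Cool URIs for the Semantic Web"). The sketch below shows only that decision; the `/resource/`, `/page/` and `/data/` paths are an assumed layout, and q-values in the Accept header are ignored for brevity.

```python
RDF_MEDIA_TYPES = {"text/turtle", "application/rdf+xml", "application/n-triples"}


def negotiate(path, accept_header):
    """Return (status, location) for a GET on a resource URI path.

    Assumed layout: /resource/{name} -> /data/{name} (RDF) or /page/{name} (HTML).
    """
    prefix = "/resource/"
    if not path.startswith(prefix):
        return 404, None
    name = path[len(prefix):]
    # Keep only the media types, dropping parameters such as ";q=0.9".
    requested = {part.split(";")[0].strip().lower() for part in accept_header.split(",")}
    if requested & RDF_MEDIA_TYPES:
        return 303, f"/data/{name}"
    return 303, f"/page/{name}"


print(negotiate("/resource/Provincia/Madrid", "text/turtle"))
# (303, '/data/Provincia/Madrid')
print(negotiate("/resource/Provincia/Madrid", "text/html,application/xhtml+xml"))
# (303, '/page/Provincia/Madrid')
```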
Publication
Metadata Publication

• VoID lets you express metadata about RDF datasets

• Open Provenance Model
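A VoID description is itself a small RDF document. The sketch below writes one in Turtle using VoID and Dublin Core terms (`void:Dataset`, `void:sparqlEndpoint`, `void:uriSpace`, `void:exampleResource`, `dcterms:title`, `dcterms:license`). The dataset URI, license URI and SPARQL endpoint are hypothetical placeholders; the URI space and example resource follow the GeoLinkedData URI design.

```python
def turtle_string(value):
    """Quote a Python string as a Turtle string literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def void_description(dataset, title, license_uri, sparql_endpoint, uri_space, example):
    """Return a minimal VoID description of a dataset in Turtle."""
    return "\n".join([
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"<{dataset}> a void:Dataset ;",
        f"    dcterms:title {turtle_string(title)} ;",
        f"    dcterms:license <{license_uri}> ;",
        f"    void:sparqlEndpoint <{sparql_endpoint}> ;",
        f"    void:uriSpace {turtle_string(uri_space)} ;",
        f"    void:exampleResource <{example}> .",
    ])


print(void_description(
    dataset="http://example.org/dataset/geolinkeddata",   # hypothetical
    title="GeoLinkedData",
    license_uri="http://example.org/license",             # hypothetical
    sparql_endpoint="http://example.org/sparql",          # hypothetical
    uri_space="http://geo.linkeddata.es/resource/",
    example="http://geo.linkeddata.es/resource/Provincia/Madrid",
))
```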
Publication
Dataset discovery

• Register the dataset in the CKAN registry

• Generate sitemap files for your dataset by using sitemap4rdf

• Submit the sitemap location to Google and Sindice

  http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation
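sitemap4rdf generates the sitemap files for a dataset. To show what a sitemap contains, the sketch below writes plain sitemaps in the sitemaps.org format for a list of resource URIs, splitting them into files of at most 50,000 URLs (the protocol's per-file limit). It does not reproduce sitemap4rdf's output or any semantic sitemap extension, and the URIs are illustrative.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # limit set by the sitemaps.org protocol


def build_sitemap(uris):
    """Serialise one <urlset> document listing the given URIs."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for uri in uris:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = uri  # ElementTree escapes & and <
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")


def build_sitemaps(uris):
    """Split a long URI list into several sitemap documents."""
    return [build_sitemap(uris[i:i + MAX_URLS_PER_FILE])
            for i in range(0, len(uris), MAX_URLS_PER_FILE)]


print(build_sitemap(["http://geo.linkeddata.es/resource/Provincia/Madrid"]))
```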
Publication
GeoLinkedData – Dataset publication

[Figure: GeoLinkedData offered as HTML, Linked Data and SPARQL, using Pubby 0.3 (http://www4.wiwiss.fu-berlin.de/pubby/), including provenance support, on top of Virtuoso 6.1.0]
Publication
GeoLinkedData – Dataset discovery




Exploitation




Streaming resources
Exploitation
GeoLinkedData

map4rdf: http://oegdev.dia.fi.upm.es/projects/map4rdf/
   • Google Maps viewer of RDF resources
       • Resources with spatial information
   • Extensible with Google plugins
   • Used in other applications, such as Aemet and GoodRelations

[Figure: map4rdf accessing a triplestore through SPARQL]
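A map viewer such as map4rdf needs to fetch the resources that fall inside the visible area. The sketch below shows one way to do that against a SPARQL endpoint: it builds a bounding-box query over WGS84 `geo:lat`/`geo:long` values and the GET URL defined by the SPARQL 1.1 Protocol. It is not map4rdf's actual query; the endpoint URL is hypothetical, and it assumes coordinates are stored as `geo:lat`/`geo:long` literals that can be cast to `xsd:double`.

```python
import math
from urllib.parse import urlencode

QUERY_TEMPLATE = """PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?resource ?lat ?long WHERE {{
  ?resource geo:lat ?lat ;
            geo:long ?long .
  FILTER (xsd:double(?lat) >= {south} && xsd:double(?lat) <= {north} &&
          xsd:double(?long) >= {west} && xsd:double(?long) <= {east})
}}
LIMIT {limit}"""


def bbox_query(south, west, north, east, limit=500):
    """Build a SPARQL query for resources inside a latitude/longitude box."""
    # Converting to float first keeps non-numeric input out of the query text.
    values = [float(v) for v in (south, west, north, east)]
    if not all(math.isfinite(v) for v in values):
        raise ValueError("coordinates must be finite numbers")
    s, w, n, e = values
    return QUERY_TEMPLATE.format(south=s, west=w, north=n, east=e, limit=int(limit))


def query_url(endpoint, query):
    """GET request URL for a SPARQL query, per the SPARQL 1.1 Protocol."""
    return endpoint + "?" + urlencode({"query": query})


query = bbox_query(40.0, -4.0, 41.0, -3.0)
print(query_url("http://example.org/sparql", query))  # hypothetical endpoint
```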
DEMO
http://geo.linkeddata.es/browser

Provinces

Capital of Province

Provinces – Industry Production Index

Beaches
Methodological Guidelines for Publishing Linked Data
Methodological Guidelines for
   Publishing Linked Data
            g
                Boris Villazón-Terrazas, Oscar Corcho
      Facultad de Informática, Universidad Politécnica de Madrid
                              ,
    Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid
                       http://www.oeg-upm.net
                    {bvillazon,ocorcho}@fi.upm.es
             Phone: 34 91 3366605 Fax: 34 91 3524819
                     34.91.3366605,       34.91.3524819
      Slides available at: http://www.slideshare.net/boricles/


Acknowledgements: Asunción Gómez-Pérez, Luis M. Vilches,
Victor Saquicela, Al
Vi t S     i l Alexander d L ó and many others th t we
                     d de León,   d        th   that
may have omitted.
WorkdistributedunderthelicenseCreativeCommonsAttribution-
Noncommercial-Share Alike 3.0

More Related Content

PDF
Linked Data at the BNE. Elena Escolano Rodríguez, Daniel Vila Suero
PPTX
RDA: Resource Description and Access
DOC
Metadata Quality Evaluation: UTEP Library's Casasola Photograph Collection
PPS
Biblioteca Nacional de España and Linked Open Data. A view from the library s...
PDF
Statistical Linked Data
PPTX
RDA Intro - AACR2 / MARC> RDA / FRBR / Semantic Web
PPTX
Linked lists
PDF
Metadata makes the world go round 2
Linked Data at the BNE. Elena Escolano Rodríguez, Daniel Vila Suero
RDA: Resource Description and Access
Metadata Quality Evaluation: UTEP Library's Casasola Photograph Collection
Biblioteca Nacional de España and Linked Open Data. A view from the library s...
Statistical Linked Data
RDA Intro - AACR2 / MARC> RDA / FRBR / Semantic Web
Linked lists
Metadata makes the world go round 2

What's hot (9)

PPTX
Secrets of the catalog remix the remix
ODP
A brief history of MARC
PDF
Datalift lod2-paris-24032011
PPTX
All About Access Points in RDA
PPT
Resource Description & Access (RDA)
PDF
Revealing Entities From Texts With a Hybrid Approach
PPT
The tools of our trade: AACR2/RDA and MARC
PPTX
GDG Meets U event - Big data & Wikidata - no lies codelab
PDF
Marc formats : Facilitating sharing of Catalogue Records
Secrets of the catalog remix the remix
A brief history of MARC
Datalift lod2-paris-24032011
All About Access Points in RDA
Resource Description & Access (RDA)
Revealing Entities From Texts With a Hybrid Approach
The tools of our trade: AACR2/RDA and MARC
GDG Meets U event - Big data & Wikidata - no lies codelab
Marc formats : Facilitating sharing of Catalogue Records
Ad

Viewers also liked (11)

PDF
SEEMP - Semantic Aspects and Interoperability
PDF
Linguistic resources enhanced with geospatial Information
PDF
Methodological Guidelines for Publishing Linked Data
PDF
Geolinkeddata 07042011 1
PDF
Yet another SPARQL 1.1 brief introduction
PDF
Towards a Commons RDF Java library
PDF
A Method for Reusing and Re-engineering Non-ontological Resources for Buildin...
PDF
Sitemap4rdf(v2 boris)
PDF
Ecuadorian Geospatial Linked Data
PDF
iSOCO - Research Lab Brief Introduction
PDF
Data Shapes and Data Transformations
SEEMP - Semantic Aspects and Interoperability
Linguistic resources enhanced with geospatial Information
Methodological Guidelines for Publishing Linked Data
Geolinkeddata 07042011 1
Yet another SPARQL 1.1 brief introduction
Towards a Commons RDF Java library
A Method for Reusing and Re-engineering Non-ontological Resources for Buildin...
Sitemap4rdf(v2 boris)
Ecuadorian Geospatial Linked Data
iSOCO - Research Lab Brief Introduction
Data Shapes and Data Transformations
Ad

Similar to Methodological Guidelines for Publishing Linked Data (20)

PPTX
PhD Proposal Defense - Prateek Jain
PDF
Linked Data
PPTX
Linked Open Data Alignment and Enrichment Using Bootstrapping Based Techniques
PDF
What is New in W3C land?
PPTX
reegle - a new key portal for open energy data
PPTX
Prateek Jain dissertation defense, Kno.e.sis, Wright State University
PPTX
Prateek Jain's Dissertation Defense - Linked Open Data Alignment and Querying
PDF
Getting Started with Knowledge Graphs
PDF
Open Bibliographic Data and E-LIS
PDF
Tsakonas-Robbio·Open Bibliographic Data E-Lis
PDF
Leslie Johnston: Library Big Data Repository Services, Open Repositories 2012
PDF
First they have to find it: Getting Open Government Data Discovered and Used
ZIP
Linked Open Data in Libraries, Archives & Museums
PPTX
IASSIST 2012 - DDI-RDF - Trouble with Triples
PDF
Knowledge Organization System (KOS) for biodiversity information resources, G...
PDF
Linked Data
PDF
20120718 linkedopendataandnextgenerationsciencemcguinnessesip final
PDF
IASSIT Kansa Presentation
PDF
DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin
PDF
A Clean Slate?
PhD Proposal Defense - Prateek Jain
Linked Data
Linked Open Data Alignment and Enrichment Using Bootstrapping Based Techniques
What is New in W3C land?
reegle - a new key portal for open energy data
Prateek Jain dissertation defense, Kno.e.sis, Wright State University
Prateek Jain's Dissertation Defense - Linked Open Data Alignment and Querying
Getting Started with Knowledge Graphs
Open Bibliographic Data and E-LIS
Tsakonas-Robbio·Open Bibliographic Data E-Lis
Leslie Johnston: Library Big Data Repository Services, Open Repositories 2012
First they have to find it: Getting Open Government Data Discovered and Used
Linked Open Data in Libraries, Archives & Museums
IASSIST 2012 - DDI-RDF - Trouble with Triples
Knowledge Organization System (KOS) for biodiversity information resources, G...
Linked Data
20120718 linkedopendataandnextgenerationsciencemcguinnessesip final
IASSIT Kansa Presentation
DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin
A Clean Slate?

More from Boris Villazón-Terrazas (12)

PDF
RDB2RDF, an overview of R2RML and Direct Mapping
PDF
Map4rdf - Faceted Browser for Geospatial Datasets
PDF
Publishing Linked Data from RDB
PDF
Linked Data Projects at OEG - Current Status
PDF
A Provenance-Aware Linked Data Application for Trip Management and Organization
PDF
Methodological Guidelines for Publishing Linked Data
PDF
Linked Data Research Projects at Ontology Engineering Group
PDF
Lightweight Semantic Annotation of Geospatial RESTful Services
PPTX
Geometry2rdf(v2 boris)
PDF
An Approach to Publish Spatial Data on the Web: The GeoLinked Data Use Case
PDF
Geo linked data lstd10(v2-boris)
PPTX
RDB2RDF, an overview of R2RML and Direct Mapping
Map4rdf - Faceted Browser for Geospatial Datasets
Publishing Linked Data from RDB
Linked Data Projects at OEG - Current Status
A Provenance-Aware Linked Data Application for Trip Management and Organization
Methodological Guidelines for Publishing Linked Data
Linked Data Research Projects at Ontology Engineering Group
Lightweight Semantic Annotation of Geospatial RESTful Services
Geometry2rdf(v2 boris)
An Approach to Publish Spatial Data on the Web: The GeoLinked Data Use Case
Geo linked data lstd10(v2-boris)

Recently uploaded (20)

PDF
Dropbox Q2 2025 Financial Results & Investor Presentation
PDF
The Rise and Fall of 3GPP – Time for a Sabbatical?
PDF
Unlocking AI with Model Context Protocol (MCP)
PPTX
Understanding_Digital_Forensics_Presentation.pptx
PDF
Mobile App Security Testing_ A Comprehensive Guide.pdf
PPTX
ACSFv1EN-58255 AWS Academy Cloud Security Foundations.pptx
PDF
Reach Out and Touch Someone: Haptics and Empathic Computing
PDF
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
PPT
Teaching material agriculture food technology
PDF
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
PPTX
Big Data Technologies - Introduction.pptx
PPTX
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
PDF
Review of recent advances in non-invasive hemoglobin estimation
PDF
Encapsulation theory and applications.pdf
PDF
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
PDF
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
PDF
Machine learning based COVID-19 study performance prediction
PDF
Approach and Philosophy of On baking technology
PPTX
sap open course for s4hana steps from ECC to s4
PDF
Per capita expenditure prediction using model stacking based on satellite ima...
Dropbox Q2 2025 Financial Results & Investor Presentation
The Rise and Fall of 3GPP – Time for a Sabbatical?
Unlocking AI with Model Context Protocol (MCP)
Understanding_Digital_Forensics_Presentation.pptx
Mobile App Security Testing_ A Comprehensive Guide.pdf
ACSFv1EN-58255 AWS Academy Cloud Security Foundations.pptx
Reach Out and Touch Someone: Haptics and Empathic Computing
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
Teaching material agriculture food technology
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
Big Data Technologies - Introduction.pptx
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
Review of recent advances in non-invasive hemoglobin estimation
Encapsulation theory and applications.pdf
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
Machine learning based COVID-19 study performance prediction
Approach and Philosophy of On baking technology
sap open course for s4hana steps from ECC to s4
Per capita expenditure prediction using model stacking based on satellite ima...

Methodological Guidelines for Publishing Linked Data

  • 1. Methodological Guidelines for Publishing Linked Data g Boris Villazón-Terrazas, Oscar Corcho Facultad de Informática, Universidad Politécnica de Madrid , Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid http://www.oeg-upm.net {bvillazon,ocorcho}@fi.upm.es Phone: 34 91 3366605 Fax: 34 91 3524819 34.91.3366605, 34.91.3524819 Slides available at: http://www.slideshare.net/boricles/ Acknowledgements: Asunción Gómez-Pérez, Luis M. Vilches, Victor Saquicela, Al Vi t S i l Alexander d L ó and many others th t we d de León, d th that may have omitted. WorkdistributedunderthelicenseCreativeCommonsAttribution- Noncommercial-Share Alike 3.0
  • 2. Main References Wood, David (Ed) Linking Government Data - 2011 Methodological Guidelines for Publishing Government Linked Data Boris Villazón-Terrazas, Luis M. Vilches, Oscar Corcho, Asunción Gómez-Pérez Best Practices for Publishing Linked Data W3C Editor’s Draft – Government Linked Data Working Group Michael Hausenblas, Bernadette Hyland, Boris Villazón-Terrazas https://dvcs.w3.org/hg/gld/raw-file/bcb72f87b5cc/bp/index.html Cookbook for Open Government Linked Data W3C Editor’s Draft – Government Linked Data Working Group Bernadette Hyland, Boris Villazón-Terrazas, Sarven Capadisli http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook http://www w3 org/2011/gld/wiki/Linked Data Cookbook
  • 3. Guidelines for Publishing Linked Data • The process of publishing Linked Data has an iterative incremental life cycle model. • Based on our experience in the production of Linked Data in several Governmental Contexts, have been applied in real case scenarios. 3
  • 4. 4
  • 5. 5
  • 6. Specification • Identification and analysis of the data sources • URI design • Definition of the license 6
  • 7. Specification Identification and analysis of the data sources We have to distinguish • O Open and publish d t th t government agencies h d bli h data that t i have not yet opened up and published • Task that may require contacting to specific government data owners to get access to their legacy data • Reuse and leverage on data already opened up and p published by g y government agencies g • Task to look for these data in public government catalogs • Open Government Data • datacatalogs org datacatalogs.org • Open Government Catalog 7
  • 8. Specification Identification and analysis of the data sources After we have identified and selected the government data sources • Search and compile all the available data and documentation about those resources • Identify the schema of those resources including conceptual components and th i relationships t l t d their l ti hi • Identify the items in the domain i e things whose domain, i.e., properties and relations are described in the data sources 8
  • 9. Specification GeoLinkedData – Identification of the data sources Agreement with the IGN IGN National Geographic Institute of Spain Oracle & MySQL Data D t sources available il bl in a public data catalog INE National Statistic Institute of Spain 9
  • 10. Specification GeoLinkedData – Analysis of the data sources Year Province Industry Production Index 10
  • 11. Specification URI Design • Use meaningful URIs, instead of opaque URIs, when possible • Separate TBox (ontology model) from ABox (instances) URIs URIs. • Base URI http://data.gov.bo/ http://health.data.gov.bo/ • TBox URIs http://data.gov.bo/ontology/{class|property} p g gy { |p p y} • ABox URIs http://data.gov.bo/resource/ http://data.gov.bo/resource/province/Tiraque http://data gov bo/resource/province/Tiraque 11
  • 12. Specification GeoLinkedData - URI design • Base URI http://linkeddata.es/ http://geo.linkeddata.es/ • TBox URIs http://geo.linkeddata.es/ontology/{concept|property} http://geo.linkeddata.es/ontology/Provincia http://geo linkeddata es/ontology/Provincia • ABox URIs http://geo.linkeddata.es/resource/{r. type}/{r. name} http://geo.linkeddata.es/resource/Provincia/Madrid 12
  • 13. Specification: Definition of the license • Several possibilities: • The UK Open Government Licence • Open Database License • Public Domain Dedication and License • Open Data Commons Attribution License • The Creative Commons licenses • It is also possible to reuse and apply an existing license of the government data sources.
  • 14. Specification: GeoLinkedData – Definition of the license • Reusing the original license of the government data sources • The IGN and INE data sources have their own licenses, similar to the Attribution-ShareAlike 2.5 Generic License http://creativecommons.org/licenses/by-sa/2.5/
  • 16. Modelling: Ontology • An ontology is an engineering artifact that provides: • A set of terms • A set of explicit assumptions regarding the intended meaning of the terms, almost always including concepts and their classification, and almost always including properties between concepts • A shared understanding of a domain of interest • Ontologies are expressed in OWL or RDF(S), both based on RDF
  • 17. Modelling: Reuse available vocabularies (flowchart) • Search for suitable vocabularies (e.g., in Linked Open Vocabularies) • Are there suitable vocabularies? Yes: build the vocabulary by reusing available vocabularies. No: …
  • 18. Modelling: Reuse available non-ontological resources (flowchart) • Search for suitable non-ontological resources in highly reliable Web sites, domain-related sites and government catalogs • Are there suitable resources? Yes: build the vocabulary by transforming the available resources. No: build the vocabulary from scratch
  • 19. Modelling: GeoLinkedData • Reused resources (figure): WGS84 Geo Positioning, an RDF vocabulary; scv:Dimension, scv:Item, scv:Dataset; hydrographical phenomena (rivers, lakes, etc.); a vocabulary for instants, intervals, durations, etc.; names and international code systems for territories and groups; an ontology for the OGC Geography Markup Language • Resulting ontology: 33 classes, 44 object properties, 318 data properties • Built with the NeOn Toolkit http://neon-toolkit.org/
  • 20. Modelling: GeoLinkedData
  • 22. Generation • Transformation • Data cleansing • Linking
  • 23. Generation: Transformation • Take the data sources selected in the specification activity and transform them to RDF according to the vocabulary created in the modelling activity • Some tools: • CSV and spreadsheets: RDF extension of Google Refine, XLWrap, RDF123, NOR2O • RDB: D2R Server, ODEMapster, W3C RDB2RDF WG – R2RML • XML: GRDDL, ReDeFer
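To make the transformation step concrete, the sketch below turns CSV rows with the Year, Province and Industry Production Index columns into N-Triples using only the standard library. It is a hand-rolled illustration, not NOR2O or any of the tools listed: the property names (province, year, industryProductionIndex) and the Observation/{n} URI pattern are hypothetical, and the sample row is made up.

```python
import csv
import io
from urllib.parse import quote

# Namespaces follow the GeoLinkedData URI design; property names are hypothetical
ONT = "http://geo.linkeddata.es/ontology/"
RES = "http://geo.linkeddata.es/resource/"
XSD = "http://www.w3.org/2001/XMLSchema#"

def typed_literal(value: str, datatype: str) -> str:
    """Serialise a typed literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"^^<{XSD}{datatype}>'

def csv_to_ntriples(text: str) -> list:
    """Map each CSV row (Year, Province, Industry Production Index) to triples."""
    triples = []
    for i, row in enumerate(csv.DictReader(io.StringIO(text)), start=1):
        obs = f"<{RES}Observation/{i}>"  # hypothetical URI pattern for rows
        name = quote(row["Province"].strip().replace(" ", "_"))
        province = f"<{RES}Provincia/{name}>"
        year = typed_literal(row["Year"].strip(), "gYear")
        index = typed_literal(row["Industry Production Index"].strip(), "decimal")
        triples.append(f"{obs} <{ONT}province> {province} .")
        triples.append(f"{obs} <{ONT}year> {year} .")
        triples.append(f"{obs} <{ONT}industryProductionIndex> {index} .")
    return triples

# Made-up sample row, for illustration only
sample = "Year,Province,Industry Production Index\n2010,Madrid,98.5\n"
print("\n".join(csv_to_ntriples(sample)))
```

Emitting N-Triples keeps the example dependency-free; a real pipeline would more likely rely on one of the tools above or an RDF library.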
  • 24. Generation: GeoLinkedData – Transformation • INE data: NOR2O • IGN data: ODEMapster • IGN geospatial geometry column: Geometry2RDF
  • 25. Generation: GeoLinkedData – Transformation • NOR2O transformation of the Industry Production Index data (Year, Province, Industry Production Index)
  • 26. Generation: GeoLinkedData – Transformation • R2O is an extensible, fully declarative language to describe mappings between relational database schemas and ontologies. • The ODEMapster processor generates RDF instances from relational instances based on the mapping description expressed in the R2O document. www.oeg-upm.net/index.php/en/downloads/9-r2o-odempaster
  • 27. Generation: GeoLinkedData – Transformation • Creation of the R2O mappings
  • 28. Generation: GeoLinkedData – Transformation • Excerpt of the R2O document
  • 29. Generation: GeoLinkedData – Transformation (Geometry2RDF) • Tool for generating RDF from geometrical information • The geometry can be available in GML or WKT • The generated RDF follows our Geometry Model http://www.oeg-upm.net/index.php/en/downloads/151-geometry2rdf
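  • 30. Generation: GeoLinkedData – Transformation • Extracting GML 3.1.1 geometries with the Oracle SDO_UTIL package:

```sql
-- Export, as GML 3.1.1 text, the geometries of the rows in the
-- BCN200_0301L_RIO table whose label (Etiqueta) is 'Arroyo'
SELECT TO_CHAR(SDO_UTIL.TO_GML311GEOMETRY(c.geometry)) AS Gml311Geometry
FROM "BCN200"."BCN200_0301L_RIO" c
WHERE c.Etiqueta = 'Arroyo';
```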
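Once geometries come out of Oracle as GML 3.1.1, a next step is to turn them into a serialisation that can be attached to RDF resources. The sketch below converts a GML LineString into a WKT LINESTRING. It assumes the GML uses a gml:posList with an optional srsDimension attribute (2 if absent), and it ignores the SRS and axis order entirely. It does not reproduce Geometry2RDF or its Geometry Model.

```python
import xml.etree.ElementTree as ET

GML = "{http://www.opengis.net/gml}"  # GML 3.1.1 namespace in ElementTree notation

def gml_linestring_to_wkt(gml: str) -> str:
    """Convert a GML 3.1.1 LineString with a gml:posList into WKT."""
    root = ET.fromstring(gml)
    line = root if root.tag == GML + "LineString" else root.find(f".//{GML}LineString")
    if line is None:
        raise ValueError("no gml:LineString element found")
    pos_list = line.find(f"{GML}posList")
    if pos_list is None or not (pos_list.text or "").strip():
        raise ValueError("LineString has no gml:posList coordinates")
    # srsDimension says how many numbers make up one point (2 if absent)
    dim = int(pos_list.get("srsDimension") or line.get("srsDimension") or 2)
    if dim not in (2, 3):
        raise ValueError("only 2D and 3D coordinates are supported")
    tokens = pos_list.text.split()
    for token in tokens:
        float(token)  # fail early on non-numeric coordinates
    if len(tokens) % dim != 0:
        raise ValueError("coordinate count is not a multiple of srsDimension")
    # Reuse the original tokens so no precision is lost in the conversion
    points = [" ".join(tokens[i:i + dim]) for i in range(0, len(tokens), dim)]
    keyword = "LINESTRING Z" if dim == 3 else "LINESTRING"
    return f"{keyword} (" + ", ".join(points) + ")"

example = ('<gml:LineString xmlns:gml="http://www.opengis.net/gml">'
           '<gml:posList srsDimension="2">-3.7 40.4 -3.6 40.5</gml:posList>'
           '</gml:LineString>')
print(gml_linestring_to_wkt(example))  # LINESTRING (-3.7 40.4, -3.6 40.5)
```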
  • 32. Generation: Data Cleansing • Find possible errors, as identified by Hogan et al.: • HTTP-level issues, such as accessibility and dereferenceability, e.g., HTTP URIs that return 40x/50x errors • Reasoning issues, such as namespaces without a vocabulary, e.g., the invented term rss:item • Malformed/incompatible datatypes, e.g., "true" as xsd:int • Fix the identified errors
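The third kind of error, malformed datatypes, is easy to catch mechanically. Below is a minimal sketch that checks a literal's lexical form against its declared XSD datatype. The regular expressions are simplified approximations of the XML Schema lexical spaces for a handful of types, not a complete validator, and unknown datatypes are simply let through.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Simplified lexical-space patterns for a few common XSD datatypes
LEXICAL = {
    XSD + "int": re.compile(r"[+-]?\d+"),
    XSD + "integer": re.compile(r"[+-]?\d+"),
    XSD + "decimal": re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)"),
    XSD + "boolean": re.compile(r"true|false|1|0"),
    XSD + "gYear": re.compile(r"-?\d{4,}(Z|[+-]\d{2}:\d{2})?"),
}

def check_literal(lexical: str, datatype: str) -> bool:
    """Return False if the literal is malformed for its declared datatype."""
    pattern = LEXICAL.get(datatype)
    if pattern is None:
        return True  # datatype not covered by this sketch: nothing to check
    if pattern.fullmatch(lexical) is None:
        return False
    if datatype == XSD + "int":
        # xsd:int is restricted to the 32-bit signed range
        return -2**31 <= int(lexical) <= 2**31 - 1
    return True

print(check_literal("true", XSD + "int"))  # False: the error cited by Hogan et al.
print(check_literal("42", XSD + "int"))    # True
```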
  • 33. Generation: GeoLinkedData – Data Cleansing • Errors found: • Some resources with similar names were mixed up. For example, the Granada municipality belongs to Granada Province, while the La Granada municipality belongs to Barcelona Province. • Autonomous communities that have only one province, e.g., Murcia Region, were missing some municipalities, while their corresponding provinces, e.g., Murcia Province, had the correct number of municipalities. • Some hydrographical resources were missing parts of their geometrical information.
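The second error can be detected with a consistency check: for an autonomous community that has a single province, the municipalities listed under the community should be exactly those listed under the province. The sketch below reports the gaps. The input shapes (plain dictionaries of sets) are hypothetical, and the toy data is illustrative only, not taken from the GeoLinkedData sources.

```python
def missing_in_community(community_members, province_members, single_province):
    """Report municipalities listed under a province but not under its community.

    community_members: community name -> set of municipality names
    province_members:  province name  -> set of municipality names
    single_province:   community name -> its only province
    """
    missing = {}
    for community, province in single_province.items():
        gap = province_members.get(province, set()) - community_members.get(community, set())
        if gap:
            missing[community] = sorted(gap)
    return missing

# Toy data for illustration only
communities = {"Region de Murcia": {"Murcia", "Lorca"}}
provinces = {"Murcia": {"Murcia", "Lorca", "Cartagena"}}
print(missing_in_community(communities, provinces, {"Region de Murcia": "Murcia"}))
# {'Region de Murcia': ['Cartagena']}
```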
  • 34. Generation: Linking • Identify suitable datasets as linking targets, e.g., via http://ckan.net • Discover relationships between data items: LIMES http://aksw.org/Projects/limes, Silk Framework http://www4.wiwiss.fu-berlin.de/bizer/silk/ • Validate the discovered relationships: sameAs Validator http://oegdev.dia.fi.upm.es:8080/sameAs/
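The discovery step can be pictured as comparing labels across two datasets and keeping the pairs whose similarity passes a threshold. The sketch below does this with difflib's SequenceMatcher and emits owl:sameAs candidates. It is a deliberately naive stand-in, not the LIMES or Silk algorithms, and the 0.9 threshold is an assumption. Its output is exactly the kind of candidate list that should go through the validation step before publication.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def normalise(label: str) -> str:
    """Lower-case and collapse whitespace before comparing labels."""
    return " ".join(label.lower().split())

def candidate_links(source: dict, target: dict, threshold: float = 0.9) -> list:
    """source/target map URI -> label; keep the best target per source above threshold."""
    links = []
    for s_uri, s_label in source.items():
        best = None
        for t_uri, t_label in target.items():
            score = SequenceMatcher(None, normalise(s_label), normalise(t_label)).ratio()
            if score >= threshold and (best is None or score > best[2]):
                best = (s_uri, t_uri, score)
        if best is not None:
            links.append(best)
    return links

def to_ntriples(links: list) -> list:
    """Serialise candidate links as owl:sameAs triples."""
    return [f"<{s}> <{OWL_SAME_AS}> <{t}> ." for s, t, _ in links]

geo = {"http://geo.linkeddata.es/resource/Provincia/Madrid": "Madrid"}
dbpedia = {"http://dbpedia.org/resource/Madrid": "Madrid",
           "http://dbpedia.org/resource/Barcelona": "Barcelona"}
print("\n".join(to_ntriples(candidate_links(geo, dbpedia))))
```

With this threshold, "Granada" and "La Granada" are not linked. However, label similarity alone cannot separate genuinely homonymous places, so a real link specification would also compare attributes such as province or coordinates.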
  • 35. Generation: GeoLinkedData – Linking • GeoLinkedData resources linked to DBpedia and GeoNames, e.g., http://geo.linkeddata.es/.../Madrid with http://dbpedia.org/resource/Madrid and http://sws.geonames.org/6355233/
  • 36. Generation: GeoLinkedData – Linking • sameAs Validator http://oegdev.dia.fi.upm.es:8080/sameAs/
  • 38. Publication • Dataset publication • Metadata publication • Dataset discovery
  • 39. Publication: Dataset Publication • Tools for storing RDF: Virtuoso Universal Server, Jena, Sesame, 4store, YARS, OWLIM • SPARQL endpoint and Linked Data front-end: Pubby, Talis Platform, Fuseki
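Once the data sit in a triple store behind a SPARQL endpoint, clients query them over plain HTTP. The sketch below builds a SPARQL Protocol GET request that asks for JSON results. The endpoint address is hypothetical (8890 is Virtuoso's usual default port). The request is only constructed here, not sent. Passing it to urllib.request.urlopen and then calling json.load on the response would run the query.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://localhost:8890/sparql"  # hypothetical endpoint address

# List a few resources typed with the GeoLinkedData Provincia class
QUERY = """
SELECT ?province WHERE {
  ?province a <http://geo.linkeddata.es/ontology/Provincia> .
} LIMIT 10
"""

def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL Protocol query-via-GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})

req = build_sparql_request(ENDPOINT, QUERY)
print(req.full_url[:60])
```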
  • 40. Publication: Metadata Publication • VoID allows expressing metadata about RDF datasets • Open Provenance Model
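A VoID description is itself a small piece of RDF. The sketch below produces a minimal one in Turtle, using the VoID properties void:sparqlEndpoint, void:uriSpace and void:triples together with dcterms:title and dcterms:license. The dataset URI, the endpoint and the triple count in the example call are placeholders, not GeoLinkedData's real values. Provenance (e.g., with the Open Provenance Model) is left out.

```python
def void_description(dataset, title, endpoint, uri_space, license_uri, triples):
    """Return a minimal VoID description of a dataset, serialised as Turtle."""
    title = title.replace("\\", "\\\\").replace('"', '\\"')  # escape for a Turtle string
    return (
        "@prefix void: <http://rdfs.org/ns/void#> .\n"
        "@prefix dcterms: <http://purl.org/dc/terms/> .\n\n"
        f"<{dataset}> a void:Dataset ;\n"
        f'    dcterms:title "{title}" ;\n'
        f"    void:sparqlEndpoint <{endpoint}> ;\n"
        f'    void:uriSpace "{uri_space}" ;\n'
        f"    dcterms:license <{license_uri}> ;\n"
        f"    void:triples {int(triples)} .\n"
    )

print(void_description(
    "http://geo.linkeddata.es/void#GeoLinkedData",   # hypothetical dataset URI
    "GeoLinkedData",
    "http://localhost:8890/sparql",                  # hypothetical endpoint
    "http://geo.linkeddata.es/resource/",
    "http://creativecommons.org/licenses/by-sa/2.5/",
    1000,                                            # placeholder, not the real size
))
```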
  • 41. Publication: Dataset discovery • Register the dataset in the CKAN registry • Generate sitemap files for your dataset using sitemap4rdf • Submit the sitemap location to Google and Sindice • http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation
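To show what a sitemap for a Linked Data site contains, the sketch below writes plain sitemaps.org XML listing resource URLs. It splits the output into several files at the protocol's 50,000-URL limit. This is not sitemap4rdf's output, and it omits the Semantic Sitemaps extension that describes datasets and their dumps or endpoints.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Serialise one <urlset> sitemap listing the given resource URLs."""
    ET.register_namespace("", SITEMAP_NS)  # emit the namespace as the default one
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for u in urls:
        url_el = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url_el, f"{{{SITEMAP_NS}}}loc").text = u
    return ET.tostring(urlset, encoding="unicode")

def build_sitemaps(urls, max_urls=50_000):
    """Split the URL list into several sitemaps, respecting the per-file limit."""
    return [build_sitemap(urls[i:i + max_urls]) for i in range(0, len(urls), max_urls)]

print(build_sitemap(["http://geo.linkeddata.es/resource/Provincia/Madrid"]))
```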
  • 42. Publication: GeoLinkedData – Dataset publication • Access via HTML, Linked Data and SPARQL • Pubby 0.3 http://www4.wiwiss.fu-berlin.de/pubby/, including provenance support • Virtuoso 6.1.0
  • 46. Exploitation: GeoLinkedData • map4rdf http://oegdev.dia.fi.upm.es/projects/map4rdf/: • Google Maps viewer of RDF resources with spatial information • Extensible with Google plugins • Used in other applications, such as AEMET and GoodRelations • Architecture (figure): map4rdf queries a triplestore through SPARQL
  • 50. Provinces – Industry Production Index