In collaboration with


NANYANG TECHNOLOGICAL UNIVERSITY




                              Wee Kim Wee
                  School of Communication & Information



K6299 – Critical Inquiry in Knowledge Management

Proposal for Designing a Linked Data Migrational Framework for Singapore
Government Data Sets


                         Under the guidance of

                  Dr. Khoo Soo Guan, Christopher (Assoc Prof)
                  Mr. Soy Boom Lim (Manager, iDA Singapore)




                         Submitted by

                  SESAGIRI RAAMKUMAR ARAVIND              (G1101761F)

                  THANGAVELU MUTHU KUMAAR                (G1101765E)

                  KALEESWARAN SUDARSAN                    (G1001065F)


                               Page 1 of 9
Introduction
“The Internet is becoming the town square for the global village of tomorrow.” This quote from Bill Gates,
Chairman of Microsoft, aptly pictures today's business scene, in which the Internet is the dominant
medium for connecting with resources across geographies and enables high volumes of transactions with ease.
The challenge now lies in enabling machines to read and understand data on the Internet, so that chains of
transactions that were previously manual, because the traditional WWW presents content in a form that only
humans can interpret, can be carried out intelligently. This idea was formulated as the Semantic Web, in
which content is defined with semantics (Berners-Lee, Hendler & Lassila, 2001). Based on this concept, the
Linked Data principles were released to guide individuals, enterprises and public bodies in releasing their
data in a common standard, RDF (Resource Description Framework), to form a web of data (Berners-Lee,
2006). Standardised data representation provides more scope for interlinking data sets across domains,
creating avenues for multi-point usage and knowledge discovery through intelligent software applications
built over it.


The most interesting large-scale application of Linked Data explored here is the set of eGovernment
(eGov) initiatives of the US, the UK and many other nations to publish their Open Government Data (OGD)
on governance and public affairs, for transparency and value co-creation and to empower people
with appropriate knowledge. The recent Open Government Partnership1 mandates nations to publish their
OGD in linked data format. Many nations have started to publish their data as linked data, the
latest being Brazil with its data portal data.gov.br2. The start of the Linked Data movement spurred the
release of new data sets, as highlighted by the LOD cloud3 maintained by the CKAN4 registry. The US and UK
governments have realised the benefits by releasing selected data sets in linked data format on the portals
data.gov5 and data.gov.uk6 respectively. Well-defined relationships between these data sets, together with
ready-made applications, guide the public's daily activities related to transport, business and other needs.
Existing applications include Numberhood7, FixMyTransport8, BIS Research Funding Explorer9, SemaPlorer10
and the “Linking Wildland Fire and Government Budget” mashup11.


1 Open Government Partnership http://www.state.gov/g/ogp/
2 Brazil Data Portal data.gov.br
3 LOD cloud diagram shows datasets that have been published in Linked Data format, by contributors to the Linking
  Open Data community project and other individuals and organisations http://richard.cyganiak.de/2007/10/lod/
4 Comprehensive Knowledge Archive Network http://ckan.net/
5 data.gov
6 data.gov.uk
7 http://www.Numberhood.net
8 http://www.fixmytransport.com/
9 http://consulting.talis.com/case-study/bis-research-funding-explorer/

The current OGD scenario in Singapore does not make use of Linked Data standards. This proposal
suggests a migrational framework from the existing system of data publishing. As a starting point, a study
of the current OGD ecosystem in Singapore is being carried out. iDA12 maintains the portal data.gov.sg13,
which handles data collated from different government agencies (Chee Hean, 2011). The portal aims to
meet the Singapore public's data needs and to establish a co-creative environment. The data is provided
in different structured and unstructured formats such as txt, Excel, PDF, XML, web pages and maps, and also
through agency-specific Application Programming Interfaces (APIs) and web services. There are
multiple endpoints for data consumption; prominent examples include data.gov.sg, OneMap API14,
Singapore Statistics15, mytransport.sg16 and Integrated Land Information Services17. There is some
redundancy in the data across the different sources in the current OGD ecosystem, with limited
interlinking and re-use capabilities. The vocabularies used by the agencies are agency-specific, with
limited standardisation of commonly used terms. Building a mash-up application that leverages data
across agencies is therefore complex. This study has indicated scope for applying linked data, which
requires standardised data representation at source level and a common interface at
publication level, with the data sets linked by interconnected vocabularies.




        Fig1: Linked Data implementation over current DGS (DATA.GOV.SG) Ecosystem

10 http://www.uni-koblenz-landau.de/koblenz/fb4/institute/IFI/AGStaab/Research/systeme/semap
11 http://logd.tw.rpi.edu/demo/linking_wildland_fire_and_government_budget
12 Infocomm Development Authority of Singapore (iDA) http://www.ida.gov.sg/home/index.aspx
13 data.gov.sg
14 http://www.onemap.sg
15 http://www.singstat.gov.sg/
16 http://mytransport.sg
17 http://www.inlis.gov.sg/layout/homepage.aspx#


Objectives of the Proposal
The current study aims to build a linked data migrational framework that iDA and Singapore
Government agencies could use to publish their data sets to the public as linked data. A
multi-step methodology will be devised, with clearly defined activities and deliverables at each step,
based on the current ecosystem of data.gov.sg and other OGD publishing portals in Singapore.
Geographical and statistical data have been selected to illustrate each step in the framework.


The framework will be built from the metadata and specifications provided by iDA and the
government agencies. The current study focuses on linking the internal data sets. Additionally, it aims to
provide recommendations on a few use cases that leverage external linked data. The holistic
nature of the framework will be validated with geographical and statistical data provided by the Singapore
Land Authority (SLA) and the Department of Statistics (DOS).


Other objectives of the study are as follows:
    1.) Explore case studies on the implementation of Linked Open Government Data.
    2.) Prepare an inventory by assessing different linked data tools, technical frameworks and processes.
    3.) Provide recommendations for linked data implementation according to the nature of each
        government agency.
    4.) Build an Ontology Network model (Haase, Rudolph, Wang et al., 2006) meant to unify
        vocabularies from different agency domains.
    5.) Build a proof-of-concept (POC) application based on the devised methodology to validate its
        applicability. This objective is subject to the availability of sufficient time and infrastructure.


The migrational framework will be useful to iDA in formulating its Linked Data implementation
strategy in the near future, as the government body intends to make data.gov.sg the cornerstone
portal for OGD publication. The common output interface suggested by the framework will showcase the
potential of unifying the different endpoints provided by the agencies, thereby simplifying access and
facilitating the creation of applications that integrate data from disparate sources. The ontology network
suggested by the framework will help the agencies standardise vocabulary across domains, so that they
better understand their data and its relation to data from other agencies.
The framework can also be used by enterprises and individuals to understand the steps, tools and
processes involved in releasing their data to the WWW as linked data.




Literature Review
The Semantic Web facilitates a web of data18 that works on top of the URI19, RDF20, Ontology21 and
SPARQL22 concepts. Resources and values are identified and described in a common standard, RDF,
based on a modelled ontology that specifies the relationships (Berners-Lee, Hendler & Lassila, 2001). The
LOD223 initiative aims to build a LOD stack of products, frameworks and processes to accelerate
the implementation of linked data across the globe. W3C has set up two committees24 to provide best
practices and recommendations for governments publishing their OGD in a standardised linked data format.
Bizer, Heath, Idehen and Berners-Lee (2008), Villazón-Terrazas, Vilches-Blázquez, Corcho and Gómez-Pérez
(2011) and Hyland and Wood (2011) provide cookbooks and guidelines for converting OGD to Linked Data
format. They are helpful in understanding the general steps and tools required to convert and publish OGD
as Linked Data. Governments that are new to a Linked Data publication strategy, however,
need a tailored migrational framework specific to the local OGD ecosystem. Such a customised framework
could be used by the government steering committee to expedite the migration to Linked Open Government
Data (LOGD) format.


Methodology
The project team held discussions with iDA staff, SLA staff and NIIT staff (the IT vendor supporting the
DGS25 platform) prior to the proposal, to gain a basic understanding of the current architecture and to
identify the DGS components that could accommodate changes as part of this study. Primary data
will be provided by iDA and SLA. The data sets selected for the study are listed in Table 1.1
below. These seemingly disparate data sets can be connected to give context-specific knowledge about
each site, so that prospective tenderers can gain insights into consumer and locality trends based
on demographics.


18 Linked Data and Web of Data http://www.youtube.com/watch?v=GKfJ5onP5SQ
19 Uniform Resource Identifiers (URIs) are short strings that identify resources in the web: documents, images,
   downloadable files, services, electronic mailboxes, and other resources. They make resources available under a
   variety of naming schemes and access methods such as HTTP, FTP, and Internet mail addressable in the same
   simple way http://www.w3.org/Addressing/
20 RDF is a standard model for data interchange on the Web. RDF has features that facilitate data merging even if
   the underlying schemas differ, and it specifically supports the evolution of schemas over time without requiring all
   the data consumers to be changed http://www.w3.org/RDF/
21 Ontologies or vocabularies define the concepts and relationships (also referred to as “terms”) used to describe and
   represent an area of concern. http://www.w3.org/standards/semanticweb/ontology
22 SPARQL is an RDF query language; its name is an acronym that stands for SPARQL Protocol and RDF Query
   Language. http://www.w3.org/TR/rdf-sparql-query/
23 LOD2 Project http://lod2.eu/BlogPost/9-press-release-lod2-project-launch.html
24 http://www.w3.org/2011/gld/charter and http://www.w3.org/egov/
25 DGS – Data.gov.sg data store

Data set | Agency | Category | Data type
Resident Population by DGP Zone/Subzone and Age Group, Type of Dwelling, Ethnic Group | Department of Statistics | Population and Household Characteristics | Textual
Sites Sold by URA - Details | Urban Redevelopment Authority (URA) | Housing and Urban Planning | Textual


                                Table 1.1: Primary datasets used for the study


Only the latest year's data from each data set, rather than the entire data set, will be used for the
study. The secondary data for the research study will be extracted from LOGD statistical and
geospatial data sets on the portal thedatahub.org for building the framework. The migrational
framework will be customised to the current architecture of DGS, because its steps will be devised
from an understanding of the different layers in DGS; it will nevertheless be generic enough
to be applicable to other cases. The project team will conduct interviews with iDA support staff
to collect specification documents and insights relevant to the current architecture of DGS.


The framework will be formulated through a context-specific integration of the different approaches
put forth by LOGD activists, researchers and practitioners. The steps in the framework will be sequential,
each comprising sub-steps that cover its intrinsic activities. For example, object modelling of the
different data objects in the selected data sets is a step that precedes the RDF modelling and
Ontology/Vocabulary building steps. The steps will be substantiated with sample implementations using the
primary data. Suggestions from the W3C LOGD steering groups24 will be taken into account in formulating the
framework. The tools identified as part of the inventory will be used for activities such as RDF
creation, RDF storage and Ontology re-use/modelling in the framework.
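
As an illustration of what the RDF creation activity might involve, the following minimal Python sketch
turns one tabular record into N-Triples statements using only the standard library. The namespace
http://example.org/dgs/, the helper names and the field names are assumptions made for this example; they
are not an agreed data.gov.sg URI scheme or a tool from the inventory.

```python
from urllib.parse import quote

# Hypothetical namespace for illustration only; not an actual data.gov.sg URI scheme.
BASE = "http://example.org/dgs/"


def mint_uri(kind, label):
    """Build a resource URI from an entity kind and a human-readable label."""
    slug = quote(label.strip().lower().replace(" ", "-"))
    return f"{BASE}{kind}/{slug}"


def literal(value):
    """Serialise a Python value as an N-Triples literal (integers are typed)."""
    if isinstance(value, int):
        return f'"{value}"^^<http://www.w3.org/2001/XMLSchema#integer>'
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def row_to_ntriples(kind, key_field, row):
    """Emit one N-Triples line per field of the record."""
    subject = mint_uri(kind, row[key_field])
    lines = []
    for field, value in row.items():
        predicate = f"{BASE}def/{quote(field)}"
        lines.append(f"<{subject}> <{predicate}> {literal(value)} .")
    return lines


if __name__ == "__main__":
    # Invented record; the field names are placeholders, not the real data set schema.
    for line in row_to_ntriples("site", "site", {"site": "Jurong East Central", "year": 2011}):
        print(line)
```

In the actual framework the predicates would come from the re-used or newly modelled ontology rather than
from raw column names; the sketch only shows the mechanical part of the conversion.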


Difficulties and Issues
Agencies do not provide raw data to iDA. Aggregated report data is split into X dimensions representing
columns, Y dimensions representing rows and data points representing cells. These fields are provided in
an XML file and sent to iDA on a periodic basis. There is no separate master data file. The hierarchy in
master data dimensions is not explicitly set or provided. Therefore, a mechanism to identify the master
data and the relationship between different levels in the master data dimensions needs to be devised. This
mechanism may not serve as a generic transformation applicable for all agencies due to the implicit nature
of data representation in the files.
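
To make the issue concrete, the sketch below parses a small cross-tabulated XML report into individual
observations and then guesses a parent-child hierarchy among the row labels. The XML element names, the
sample figures and the " - " label separator are all assumptions made for illustration; the real agency
files may be structured quite differently, which is exactly why a generic transformation is hard to
guarantee.

```python
import xml.etree.ElementTree as ET

# Hypothetical report layout with invented figures; not the actual agency XML schema.
SAMPLE = """<report>
  <xdims><dim id="x1" label="Males"/><dim id="x2" label="Females"/></xdims>
  <ydims>
    <dim id="y1" label="Ang Mo Kio"/>
    <dim id="y2" label="Ang Mo Kio - Cheng San"/>
  </ydims>
  <cells>
    <cell x="x1" y="y1">90</cell>
    <cell x="x2" y="y1">95</cell>
    <cell x="x1" y="y2">14</cell>
    <cell x="x2" y="y2">15</cell>
  </cells>
</report>"""


def parse_report(xml_text):
    """Return the row labels and one observation dict per data point (cell)."""
    root = ET.fromstring(xml_text)
    x_labels = {d.get("id"): d.get("label") for d in root.find("xdims")}
    y_labels = {d.get("id"): d.get("label") for d in root.find("ydims")}
    observations = [
        {"x": x_labels[c.get("x")], "y": y_labels[c.get("y")], "value": int(c.text)}
        for c in root.find("cells")
    ]
    return list(y_labels.values()), observations


def infer_parents(labels, sep=" - "):
    """Heuristic (an assumption): 'A - B' is a child of 'A' when 'A' is also a label."""
    known = set(labels)
    parents = {}
    for label in labels:
        head, found, _ = label.rpartition(sep)
        parents[label] = head if found and head in known else None
    return parents


if __name__ == "__main__":
    labels, observations = parse_report(SAMPLE)
    print(infer_parents(labels))
    print(observations)
```

A label-based heuristic of this kind would have to be checked against each agency's files; where labels
carry no hierarchy cues, the relationships would need to be supplied manually or by the agency.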




The conversion of data to RDF formats will not be done at the agency level; instead, it will be done on top
of the data model in the iDA data store. This leads to data duplication, because the data is stored again in
RDF format for the Linked Data implementation.
There is no master data management system in place right now that standardises the dimension values
across agencies. Standardisation is required to link common data in the data sets used in the study. This
might be a complex task due to the different versions of master data values in a single data set and also
across data sets.
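
One simple way to approach this standardisation is to reduce every dimension value to a canonical key
before linking. The sketch below is a minimal illustration under stated assumptions: the normalisation
rules and the alias table are invented for the example and would need to be curated together with the
agencies.

```python
import re
import unicodedata

# Hypothetical alias table for illustration; real aliases would be curated with the agencies.
ALIASES = {"amk": "ang mo kio"}


def canonical_key(value):
    """Normalise case, punctuation and spacing, then apply known aliases."""
    text = unicodedata.normalize("NFKC", value).casefold()
    text = re.sub(r"[^a-z0-9]+", " ", text).strip()
    return ALIASES.get(text, text)


def group_variants(values):
    """Group the raw spellings found in the data sets under their canonical key."""
    groups = {}
    for value in values:
        groups.setdefault(canonical_key(value), []).append(value)
    return groups


if __name__ == "__main__":
    print(group_variants(["Ang Mo Kio", "ANG MO KIO ", "Ang-Mo-Kio", "AMK", "Bedok"]))
```

Values that share a canonical key become candidates for a single master data entry; anything the rules
cannot reconcile would still need manual review.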
The current OGD ecosystem of Singapore provides multiple endpoints to users, such as APIs, web
services and files. Providing a common endpoint in the form of a Linked Data API would mean building
different wrappers over these endpoints. The diagram below, from Bizer, Heath, Idehen and Berners-Lee (2008),
illustrates the different approaches to implementing linked data over existing systems.




                    Fig2: Different Linked Data Implementation Approaches
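
The wrapper idea can be sketched as a set of adapters that expose heterogeneous sources through one
common interface. The minimal Python example below is illustrative only: the class names, the
http://example.org/dgs/ namespace, the field names and the sample values are assumptions, and the sources
are simulated with in-memory strings rather than real agency endpoints.

```python
import csv
import io
import json
from abc import ABC, abstractmethod

# Hypothetical namespace for illustration only.
BASE = "http://example.org/dgs/"


class Wrapper(ABC):
    """Common interface: every source is exposed as (subject, predicate, object) triples."""

    @abstractmethod
    def triples(self):
        ...


class CsvWrapper(Wrapper):
    """Wraps a file-style source delivered as CSV text."""

    def __init__(self, text, key):
        self.text, self.key = text, key

    def triples(self):
        for row in csv.DictReader(io.StringIO(self.text)):
            subject = BASE + row[self.key]
            for field, value in row.items():
                if field != self.key:
                    yield (subject, BASE + field, value)


class JsonWrapper(Wrapper):
    """Wraps an API-style source that returns a JSON array of records."""

    def __init__(self, payload, key):
        self.payload, self.key = payload, key

    def triples(self):
        for record in json.loads(self.payload):
            subject = BASE + str(record[self.key])
            for field, value in record.items():
                if field != self.key:
                    yield (subject, BASE + field, str(value))


def merged(wrappers):
    """Combine all sources into one sorted, de-duplicated list of triples."""
    return sorted({t for w in wrappers for t in w.triples()})


if __name__ == "__main__":
    # Invented sample values standing in for two different endpoints.
    sources = [
        CsvWrapper("zone,population\nBedok,100\n", "zone"),
        JsonWrapper('[{"zone": "Bedok", "sites_sold": 2}]', "zone"),
    ]
    for triple in merged(sources):
        print(triple)
```

Because both sources describe the same subject URI, their statements end up attached to one resource,
which is the effect a common Linked Data endpoint is meant to achieve.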




Schedule
The schedule for the study is covered in the embedded Gantt chart.


   [Embedded file: Gantt Chart-iDA Linked Data Project.xlsx]

Proposed Report Outline
The proposed final report will be structured in the following format.
     1. Abstract
     2. Introduction
               a. Introduction to Linked Data and its relevance to Open Government Data and eGov
               b. Overview of SG OGD Ecosystem
     3. Literature Review
               a. Government Linked Data Implementation Cookbooks, Guidelines and Recommendations
                       i. URI formulation
                      ii. RDF creation
                     iii. Ontology Formulation
                      iv. Publication and Exploitation
     4. Migrational Framework
               a. Multi-step methodology
                       i. Formulation and Description
                      ii. Examples
     5. Implementation Results and Observations
               a. POC details
               b. Description of issues faced in implementation
     6. Limitations
     7. Conclusion and Recommendations


A few new sections and sub-sections might be added in the final report.


Dissemination of Results
The migrational framework will be published as a report, subject to review by the NTU supervisor,
and then submitted to iDA. The researchers plan to publish the report as a conference
paper later in the year.



References
Berners-Lee, T., Hendler, J., & Lassila, O. (2001). The Semantic Web. Scientific American, 284(5),
  34.
Berners-Lee, T. (2006). Linked Data. Available: http://www.w3.org/DesignIssues/LinkedData.html. Last
  accessed 11th Jan 2012.
Chee Hean, T. (2011). Keynote Address by Mr Teo Chee Hean, Deputy Prime Minister, Coordinating
  Minister for National Security and Minister for Home Affairs at the e-Gov Global Exchange 2011.
  Available: http://www.ida.gov.sg/News%20and%20Events/20110620114104.aspx?getPagetype=21.
  Last accessed 11th Jan 2012.
Bizer, C., Heath, T., Idehen, K., & Berners-Lee, T. (2008). Linked Data: Evolving the Web into a Global
  Data Space. (J. Hendler & F. Van Harmelen, Eds.) Proceeding of the 17th international conference on
  World Wide Web WWW 08 (Vol. 1, p. 1265). ACM Press.
Villazón-Terrazas, B., Vilches-Blázquez, L., Corcho, O., & Gómez-Pérez, A. (2011). Methodological
  guidelines for publishing government linked data. In D. Wood (Ed.), Linking Government Data,
  chapter 2, pages 27-49. Springer New York, New York, NY.
Hyland, B., & Wood, D. (2011). The joy of data - a cookbook for publishing linked government data on
  the web. In D. Wood (Ed.), Linking Government Data, chapter 1, pages 3-26. Springer New York,
  New York, NY.
Haase, P., Rudolph, S., Wang, Y., Brockmans, S., Palma, R., Euzenat, J., & d'Aquin, M. (2006,
  November). Networked Ontology Model. Technical Report, NeOn project deliverable D1.1.1.





More Related Content

DOCX
Proposal for Designing a Linked Data Migrational Framework for Singapore Gove...
DOCX
NIC Linked Data: the OHIO project
PDF
Open Linked Data as Part of a Government Enterprise Architecture
DOC
Notes for talk on 12th June 2013 to Open Innovation meeting, Glasgow
PDF
TREND-BASED NETWORKING DRIVEN BY BIG DATA TELEMETRY FOR SDN AND TRADITIONAL N...
PPTX
Web samia mehlem open data and wb main presentation
PPTX
IN2N: Cross-institutional Authority Collaboration
PDF
Introducción a Linked Open Data (espacios enlazados y enlazables)
Proposal for Designing a Linked Data Migrational Framework for Singapore Gove...
NIC Linked Data: the OHIO project
Open Linked Data as Part of a Government Enterprise Architecture
Notes for talk on 12th June 2013 to Open Innovation meeting, Glasgow
TREND-BASED NETWORKING DRIVEN BY BIG DATA TELEMETRY FOR SDN AND TRADITIONAL N...
Web samia mehlem open data and wb main presentation
IN2N: Cross-institutional Authority Collaboration
Introducción a Linked Open Data (espacios enlazados y enlazables)

What's hot (20)

PDF
Revised presentation
PDF
Open Government Data - updates from around the world
PPTX
The Future of LOD
PDF
towards an expanded and integrated ogd agenda for india // icegov 2013 // seoul
PDF
Invited talk "Open Data as a driver of Society 5.0: how you and your scientif...
PDF
Delivering on Standards for Publishing Government Linked Data
PDF
2015.12.22 teri open research
PDF
exploring internet governance implications of an expanded open data agenda: c...
PDF
A Survey of (Potential) Open Data Ecosystem in India // ICEGOV // October 2014
PPTX
2nd Stakeholder workshop: Bertin, Embrapa's appraoch to open Agricultural Sci...
PPTX
Dealing with Open Data in Istat
PPTX
Rdaeu russia_fg_1_july2014_final
PDF
CHALLENGES FOR PUBLIC SECTOR ORGANISATIONS IN CLOUD ADOPTION: A CASE STUDY OF...
PDF
Open Data is not Enough
PDF
Opening Government Data in India // Slides from ODDC Network Meeting // Berli...
PPTX
Paul Davidson – Opening up public data to improve transparancy and efficiency
PPTX
#opendata Back to the future
PDF
Uptake and Utilization of Open Data
PPTX
Open Data: Barriers, Risks, and Opportunities
PDF
Ist africa paper_ref_115_doc_3988
Revised presentation
Open Government Data - updates from around the world
The Future of LOD
towards an expanded and integrated ogd agenda for india // icegov 2013 // seoul
Invited talk "Open Data as a driver of Society 5.0: how you and your scientif...
Delivering on Standards for Publishing Government Linked Data
2015.12.22 teri open research
exploring internet governance implications of an expanded open data agenda: c...
A Survey of (Potential) Open Data Ecosystem in India // ICEGOV // October 2014
2nd Stakeholder workshop: Bertin, Embrapa's appraoch to open Agricultural Sci...
Dealing with Open Data in Istat
Rdaeu russia_fg_1_july2014_final
CHALLENGES FOR PUBLIC SECTOR ORGANISATIONS IN CLOUD ADOPTION: A CASE STUDY OF...
Open Data is not Enough
Opening Government Data in India // Slides from ODDC Network Meeting // Berli...
Paul Davidson – Opening up public data to improve transparancy and efficiency
#opendata Back to the future
Uptake and Utilization of Open Data
Open Data: Barriers, Risks, and Opportunities
Ist africa paper_ref_115_doc_3988
Ad

Viewers also liked (9)

DOC
Knowledge process productivity indexing schema
DOCX
Habits that Knowledge workers need to cultivate
PDF
Caravan insurance data mining prediction models
DOC
Load balancing implementation in wireless networks
DOCX
Information to Intelligence (BI Context)
PDF
Innovation management in fashion industry
DOCX
Semantic web design for www.data.gov.sg - Technical Report
PDF
Knowledge Management and Risk Management Connection explained with Unilever
DOCX
Bp business and information strategy alignment
Knowledge process productivity indexing schema
Habits that Knowledge workers need to cultivate
Caravan insurance data mining prediction models
Load balancing implementation in wireless networks
Information to Intelligence (BI Context)
Innovation management in fashion industry
Semantic web design for www.data.gov.sg - Technical Report
Knowledge Management and Risk Management Connection explained with Unilever
Bp business and information strategy alignment
Ad

Similar to Linked data migrational framework (20)

PPTX
Semantic web design for www.data.gov.sg - Presentation
PDF
Open Government Data on the Web - A Semantic Approach
PDF
US EPA OSWER Linked Data Workshop 1-Feb-2013
PDF
Governmental Linked Open Data: A Data Management Perspective
PDF
Linked Open Government Data: What’s Next?
PDF
Big Data on the Web – What We Will Do
PDF
Open Government Data, Linked Data, and the Missing Blocks in Korea
PDF
Semantic Search: We're Living in a Golden Age for Information
PPTX
Omitola birmingham cityuniv
PPTX
How Linked Data is transforming eGovernment
PPT
EPA OEI Linked Data Process
PDF
Martin Kaltenböck - OGD Linked Open Government Data
PPTX
Linked Data In Action
PPTX
Governmental Linked Data
PDF
W3C TPAC 2012 Breakout Session on Government Linked Data
PPT
Linked Open Govt Data - Sem Tech East
PPTX
The State of Linked Government Data
PDF
US National Archives & Open Government Data
PDF
Designing a second generation of open data platforms
PPTX
Basic concept of Linked & Linked open Government data
Semantic web design for www.data.gov.sg - Presentation
Open Government Data on the Web - A Semantic Approach
US EPA OSWER Linked Data Workshop 1-Feb-2013
Governmental Linked Open Data: A Data Management Perspective
Linked Open Government Data: What’s Next?
Big Data on the Web – What We Will Do
Open Government Data, Linked Data, and the Missing Blocks in Korea
Semantic Search: We're Living in a Golden Age for Information
Omitola birmingham cityuniv
How Linked Data is transforming eGovernment
EPA OEI Linked Data Process
Martin Kaltenböck - OGD Linked Open Government Data
Linked Data In Action
Governmental Linked Data
W3C TPAC 2012 Breakout Session on Government Linked Data
Linked Open Govt Data - Sem Tech East
The State of Linked Government Data
US National Archives & Open Government Data
Designing a second generation of open data platforms
Basic concept of Linked & Linked open Government data

More from Muthu Kumaar Thangavelu (7)

DOCX
Unilever's Lipton Risk Management with Business Intelligence
PPTX
Ul lipton-presentation v4
PPTX
Human Capital Management
PPTX
Buckmann labs KM case study
PPTX
Boeing rocketdyne radical innovation case study
PDF
Caravan insurance data mining statistical analysis
PDF
Caravan insurance data mining prediction models
Unilever's Lipton Risk Management with Business Intelligence
Ul lipton-presentation v4
Human Capital Management
Buckmann labs KM case study
Boeing rocketdyne radical innovation case study
Caravan insurance data mining statistical analysis
Caravan insurance data mining prediction models

Recently uploaded (20)

PPTX
Introduction to Child Health Nursing – Unit I | Child Health Nursing I | B.Sc...
PDF
01-Introduction-to-Information-Management.pdf
PDF
VCE English Exam - Section C Student Revision Booklet
PPTX
Pharma ospi slides which help in ospi learning
PPTX
BOWEL ELIMINATION FACTORS AFFECTING AND TYPES
PPTX
Cell Types and Its function , kingdom of life
PDF
FourierSeries-QuestionsWithAnswers(Part-A).pdf
PPTX
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
PDF
Microbial disease of the cardiovascular and lymphatic systems
PDF
Complications of Minimal Access Surgery at WLH
PPTX
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
PDF
Module 4: Burden of Disease Tutorial Slides S2 2025
PDF
Pre independence Education in Inndia.pdf
PDF
102 student loan defaulters named and shamed – Is someone you know on the list?
PDF
Mark Klimek Lecture Notes_240423 revision books _173037.pdf
PPTX
Pharmacology of Heart Failure /Pharmacotherapy of CHF
PPTX
PPH.pptx obstetrics and gynecology in nursing
PPTX
Cell Structure & Organelles in detailed.
PDF
Insiders guide to clinical Medicine.pdf
PDF
Basic Mud Logging Guide for educational purpose
Introduction to Child Health Nursing – Unit I | Child Health Nursing I | B.Sc...
01-Introduction-to-Information-Management.pdf
VCE English Exam - Section C Student Revision Booklet
Pharma ospi slides which help in ospi learning
BOWEL ELIMINATION FACTORS AFFECTING AND TYPES
Cell Types and Its function , kingdom of life
FourierSeries-QuestionsWithAnswers(Part-A).pdf
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
Microbial disease of the cardiovascular and lymphatic systems
Complications of Minimal Access Surgery at WLH
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
Module 4: Burden of Disease Tutorial Slides S2 2025
Pre independence Education in Inndia.pdf
102 student loan defaulters named and shamed – Is someone you know on the list?
Mark Klimek Lecture Notes_240423 revision books _173037.pdf
Pharmacology of Heart Failure /Pharmacotherapy of CHF
PPH.pptx obstetrics and gynecology in nursing
Cell Structure & Organelles in detailed.
Insiders guide to clinical Medicine.pdf
Basic Mud Logging Guide for educational purpose

Linked data migrational framework

  • 1. In collaboration with NANYANG TECHNOLOGICAL UNIVERSITY Wee Kim Wee School of Communication & Information K6299 – Critical Inquiry in Knowledge Management Proposal for Designing a Linked Data Migrational Framework for Singapore Government Data Sets Under the guidance of Dr. Khoo Soo Guan, Christopher (Assoc Prof) Mr. Soy Boom Lim (Manager, iDA Singapore) Submitted by SESAGIRI RAAMKUMAR ARAVIND (G1101761F) THANGAVELU MUTHU KUMAAR (G1101765E) KALEESWARAN SUDARSAN (G1001065F) Page 1 of 9
  • 2. Introduction “The Internet is becoming the town square for the global village of tomorrow” – This quote of Bill Gates, Chairman of Microsoft rightly pictures the world’s present business scene using internet as the dominant medium for connecting with its resources across geographies enabling voluminous transactions at ease. The challenge now vests upon enabling machines to read and understand data on the internet for a chain of intelligent transactions that has been manual earlier due to the human understandable format in the traditional form of WWW. This idea was well formulated with the concept of Semantic Web that has content defined with semantics (Berners-Lee, Hendler & Lassila, 2001). Based on the concept, principles describing Linked Data were released to guide individuals, enterprises and public bodies to release their data in a common standard, RDF (Resource Description Framework) to form a web of data (Berners-Lee, 2006). Standardised data representation provides more scope for interlinking data sets across domains, creating avenues for multi-point usage and knowledge discovery with intelligent software applications built over it. The most interesting large scale application of Linked Data taken for exploration is the eGovernment (eGov) initiatives of US, UK and many other nations to publish their Open Governmental Data (OGD) pertaining to governance and public affairs for transparency and value co-creation to empower people with appropriate knowledge. The recent Open Government Partnership1 mandates nations to publish their OGD in linked data format. Many nations have started to publish their data in the form of linked data, the latest being Brazil data portal data.gov.br2. 
The start of the Linked data movement spurred the release of new data sets highlighted by the LOD cloud3 maintained by CKAN4 registry.US and UK governments have realized the benefits by releasing selective data sets in the linked data format in the portals data.gov 5 and data.gov.uk6 respectively. Well-defined relationships between these datasets and ready-made applications guide public’s daily activities related to transport, business and other needs. Some of the existing applications are Numberhood7, FixMyTransport8, BIS Research Funding Explorer9, SemaPlorer10 and “Linking Wildland Fire and Government Budget” mashup11. 1 Open Government Partnership http://www.state.gov/g/ogp/ 2 Brazil Data Portal data.gov.br 3 LOD cloud diagram shows datasets that have been published in Linked Data format, by contributors to the Linking Open Data community project and other individuals and organisations http://richard.cyganiak.de/2007/10/lod/ 4 Comprehensive Knowledge Archive Network http://ckan.net/ 5 data.gov 6 data.gov.uk 7 http://www.Numberhood.net 8 http://www.fixmytransport.com/ 9 http://consulting.talis.com/case-study/bis-research-funding-explorer/ Page 2 of 9
  • 3. The current OGD scenario in Singapore doesn’t make use of Linked Data standards. This proposal aims at suggesting a migrational framework from the existing system of data publishing. A study is being done on the current OGD ecosystem in Singapore as a starting point. iDA12 maintains the portal data.gov.sg13 that handles data collated from different government agencies (Chee Hean, 2011). The data portal aims to meet Singapore public’s data needs and also to establish a co-creative environment. The data is provided in different structured and unstructured formats such as txt, excel, pdf, xml, webpages, maps and also in the form of agency specific Application Programming Interfaces (APIs) and web services. There are multiple endpoints for data consumption. Prominent examples include data.gov.sg, OneMap API14, Singapore Statistics15,mytransport.sg16 and Integrated Land Information Services17. There is some level of redundancy in data spanning across the different sources in the current OGD ecosystem with limited interlinking and re-use capabilities. The vocabularies used by the agencies are specific to their own with limited standardisation of commonly used terms. The process of building a mash-up application leveraging data across agencies is complex. This study has indicated the scope for the application of linked data as it requires standardised data representation at source level and common interface at publication level with the data sets linked by interconnected vocabularies. 
Fig1: Linked Data implementation over current DGS (DATA.GOV.SG) Ecosystem

[10] http://www.uni-koblenz-landau.de/koblenz/fb4/institute/IFI/AGStaab/Research/systeme/semap
[11] http://logd.tw.rpi.edu/demo/linking_wildland_fire_and_government_budget
[12] Infocomm Development Authority of Singapore (iDA) http://www.ida.gov.sg/home/index.aspx
[13] data.gov.sg
[14] http://www.onemap.sg
[15] http://www.singstat.gov.sg/
[16] http://mytransport.sg
[17] http://www.inlis.gov.sg/layout/homepage.aspx#
Objectives of the Proposal

The current study aims to build a linked data migrational framework that iDA and Singapore Government agencies could use to publish their data sets to the public as linked data. A multi-step methodology will be devised, with clearly defined activities and deliverables at each step, based on the current ecosystem of data.gov.sg and other OGD publishing portals in Singapore. Geographical and statistical data have been selected to illustrate each step in the framework. The framework build process is based on the metadata and specifications provided by iDA and the government agencies. The study focuses on linking the internal data sets; additionally, it aims to provide recommendations on a few use cases that leverage external linked data. The holistic nature of the framework will be validated with geographical and statistical data provided by SLA and DOS.

Other objectives of the study are as follows:
1.) Explore case studies pertaining to the implementation of Linked Open Government Data.
2.) Prepare an inventory by assessing different linked data tools, technical frameworks and processes.
3.) Provide recommendations for linked data implementation according to the nature of the government agency.
4.) Build an Ontology Network model (Haase, Rudolph, Wang et al., 2006) meant to unify vocabularies from different agency domains.
5.) Build a POC application based on the devised methodology to validate its applicability. This objective is subject to the availability of sufficient time and infrastructure.

The migrational framework will be useful to iDA in formulating its Linked Data implementation strategy in the near future, as the government body intends to make data.gov.sg the cornerstone portal for OGD publication.
The common output interface suggested by the framework will showcase the potential of unifying the different endpoints provided by the agencies, thereby simplifying access and facilitating the creation of applications that integrate data from disparate sources. The ontology network suggested by the framework will help the agencies standardise vocabulary across domains, so that they better understand their own data and its relation to data from other agencies. The framework can also be used by enterprises and individuals to understand the steps, tools and processes involved in releasing their data to the WWW as linked data.
Literature Review

The Semantic Web facilitates a web of data[18] that works on top of the URI[19], RDF[20], Ontology[21] and SPARQL[22] concepts. Resources and values are identified and described in a common standard, RDF, based on a modelled ontology that specifies the relationships (Berners-Lee, Hendler & Lassila, 2001). The LOD2[23] initiative aims to build a LOD stack of products, frameworks and processes to accelerate the implementation of linked data across the globe. W3C has set up two committees[24] to provide best practices and recommendations for governments publishing their OGD in standardised linked data format. Bizer, Heath, Idehen & Berners-Lee (2008), Villazón-Terrazas, Vilches-Blázquez, Corcho & Gómez-Pérez (2011) and Hyland & Wood (2011) provide cookbooks and guidelines for converting OGD to Linked Data format. They are helpful in understanding the general steps and tools required for converting and publishing OGD as Linked Data. Governments that are new to Linked Data publication, however, need a migrational framework tailored to the local OGD ecosystem. Such a customised framework could be used by the government steering committee to expedite the migration to LOGD format.

Methodology

The project team has held discussions with iDA staff, SLA staff and NIIT staff (the IT vendor supporting the DGS[25] platform) prior to the proposal, to gain a basic understanding of the current architecture and to identify the DGS components that could accommodate changes as part of this study. Primary data will be provided by iDA and SLA. The data sets selected for the study are listed in Table 1.1. These seemingly disparate data sets can be connected to give context-specific knowledge about each site, so that prospective tenderers can gain insights into consumer and locality trends based on the demographics.
[18] Linked Data and Web of Data http://www.youtube.com/watch?v=GKfJ5onP5SQ
[19] Uniform Resource Identifiers (URIs) are short strings that identify resources in the web: documents, images, downloadable files, services, electronic mailboxes, and other resources. They make resources available under a variety of naming schemes and access methods such as HTTP, FTP, and Internet mail addressable in the same simple way http://www.w3.org/Addressing/
[20] RDF is a standard model for data interchange on the Web. RDF has features that facilitate data merging even if the underlying schemas differ, and it specifically supports the evolution of schemas over time without requiring all the data consumers to be changed http://www.w3.org/RDF/
[21] Ontologies or vocabularies define the concepts and relationships (also referred to as “terms”) used to describe and represent an area of concern. http://www.w3.org/standards/semanticweb/ontology
[22] SPARQL is an RDF query language; its name is an acronym that stands for SPARQL Protocol and RDF Query Language. http://www.w3.org/TR/rdf-sparql-query/
[23] LOD2 Project http://lod2.eu/BlogPost/9-press-release-lod2-project-launch.html
[24] http://www.w3.org/2011/gld/charter and http://www.w3.org/egov/
[25] DGS: Data.gov.sg data store
Data set                           Agency                     Category                   Data type
Resident Population by DGP Zone/   Department of Statistics   Population and Household   Textual
Subzone and Age Group, Type of                                Characteristics
Dwelling, Ethnic Group
Sites Sold by URA - Details        Urban Redevelopment        Housing and Urban          Textual
                                   Authority (URA)            Planning

Table 1.1: Primary datasets used for the study

Only the latest year’s data from each data set, rather than the entire data set, will be used for the study. The secondary data for the research will be extracted from LOGD statistical and geospatial data sets on the portal thedatahub.org for building the framework.

The migrational framework will be customised to the current architecture of DGS, because the steps will be devised from an understanding of the different layers in DGS; it will nevertheless be generic enough to be applicable to other cases. The project team will conduct interviews with iDA support staff to collect specification documents and insights relevant to the current DGS architecture. The framework formulation will be based on a context-specific integration of the different approaches put forth by LOGD activists, researchers and practitioners. The steps in the framework will be sequential, each comprising sub-steps that cover its intrinsic activities. For example, object modelling of the different data objects in the selected data sets is a step that precedes the RDF modelling and Ontology/Vocabulary building steps. The steps will be substantiated with sample implementations using the primary data. Suggestions from the W3C LOGD steering groups[24] will be taken into account in formulating the framework. The tools identified as part of the inventory will be used for framework activities such as RDF creation, RDF storage and ontology re-use/modelling.

Difficulties and Issues

Agencies do not provide raw data to iDA.
Aggregated report data is split into X dimensions representing columns, Y dimensions representing rows, and data points representing cells. These fields are provided in an XML file that is sent to iDA on a periodic basis. There is no separate master data file, and the hierarchy in the master data dimensions is not explicitly set or provided. Therefore, a mechanism needs to be devised to identify the master data and the relationships between different levels in the master data dimensions. Because the data representation in the files is implicit, this mechanism may not serve as a generic transformation applicable to all agencies.
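To make the column/row/cell structure concrete, the following is a minimal sketch of how one aggregated report could be flattened into one RDF observation per cell. The agency XML schema is not described in this proposal, so the element names (report, xdim, ydim, cell), the http://example.org/dgs/ namespace, the dimension and measure property names and the sample figures are all invented for illustration. Only the rdf:type and qb:Observation URIs come from published W3C vocabularies. The output is written as N-Triples using the Python standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout and invented figures: the real agency XML is not specified here.
SAMPLE_XML = """
<report id="resident-population">
  <xdim name="ageGroup"><value>0-14</value><value>15-64</value></xdim>
  <ydim name="subzone"><value>Bedok North</value><value>Bedok South</value></ydim>
  <cell x="0" y="0">1200</cell>
  <cell x="1" y="0">5400</cell>
  <cell x="0" y="1">900</cell>
  <cell x="1" y="1">4100</cell>
</report>
"""

BASE = "http://example.org/dgs/"  # placeholder namespace, not a real DGS URI
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
QB_OBSERVATION = "http://purl.org/linked-data/cube#Observation"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def slug(label):
    """Turn a dimension label into a URI-safe token."""
    return "-".join(label.lower().split())


def crosstab_to_ntriples(xml_text):
    """Return N-Triples lines, four per cell: type, X code, Y code, value."""
    root = ET.fromstring(xml_text.strip())
    report_id = root.get("id")
    xdim, ydim = root.find("xdim"), root.find("ydim")
    x_values = [v.text for v in xdim.findall("value")]
    y_values = [v.text for v in ydim.findall("value")]
    x_prop = f"<{BASE}dimension/{xdim.get('name')}>"
    y_prop = f"<{BASE}dimension/{ydim.get('name')}>"
    lines = []
    for cell in root.findall("cell"):
        x, y = int(cell.get("x")), int(cell.get("y"))
        obs = f"<{BASE}{report_id}/obs/{x}-{y}>"
        lines.append(f"{obs} <{RDF_TYPE}> <{QB_OBSERVATION}> .")
        lines.append(f"{obs} {x_prop} <{BASE}code/{slug(x_values[x])}> .")
        lines.append(f"{obs} {y_prop} <{BASE}code/{slug(y_values[y])}> .")
        lines.append(f'{obs} <{BASE}measure/value> "{cell.text.strip()}"^^<{XSD_INTEGER}> .')
    return lines


if __name__ == "__main__":
    print("\n".join(crosstab_to_ntriples(SAMPLE_XML)))
```

The sketch treats every dimension value as a flat code. It does not recover the hierarchy between levels (for example zone and subzone), which is the harder problem noted above and would need agency-specific rules.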
The conversion of data to RDF will not be done at the agency level; instead, it will be done on top of the data model in the iDA data store. This leads to data duplication, since the data is converted to RDF format for the Linked Data implementation.

There is currently no master data management system that standardises the dimension values across agencies. Standardisation is required to link common data in the data sets used in the study. This may be a complex task because different versions of master data values exist within a single data set and also across data sets.

The current OGD ecosystem of Singapore provides multiple endpoints to users, such as APIs, web services and files. A common endpoint in the form of a Linked Data API would mean building different wrappers over these endpoints. The diagram below, from Bizer, Heath, Idehen & Berners-Lee (2008), illustrates the different approaches to linked data implementation over existing systems.

Fig2: Different Linked Data Implementation Approaches
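The standardisation of dimension values discussed above could, as a first approximation, be handled with a lookup table of known label variants. The sketch below is only an illustration under stated assumptions: the variant labels, the http://example.org/dgs/code/dgp-zone/ namespace and the function names are hypothetical, and real agency code lists would replace the table. Labels that cannot be matched are returned separately for manual review, not guessed.

```python
import re

BASE = "http://example.org/dgs/code/dgp-zone/"  # placeholder namespace

# Hypothetical label variants mapped to one canonical code per zone.
CANONICAL = {
    "bedok": "bedok",
    "bedok planning area": "bedok",
    "ang mo kio": "ang-mo-kio",
    "amk": "ang-mo-kio",
}


def normalise(label):
    """Lower-case, replace punctuation with spaces and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", label.lower())
    return " ".join(cleaned.split())


def to_shared_uri(label):
    """Return the shared URI for a label, or None when no mapping is known."""
    code = CANONICAL.get(normalise(label))
    return BASE + code if code else None


def reconcile(labels):
    """Split labels into those mapped to a shared URI and those left unmapped."""
    mapped, unmapped = {}, []
    for label in labels:
        uri = to_shared_uri(label)
        if uri is None:
            unmapped.append(label)
        else:
            mapped[label] = uri
    return mapped, unmapped


if __name__ == "__main__":
    mapped, unmapped = reconcile(["BEDOK", "Bedok Planning Area", "A.M.K", "Atlantis"])
    print(mapped)
    print(unmapped)
```

Two data sets that use different spellings for the same zone would then point at the same URI, which is what allows them to be linked. A static table like this does not address versioning of master data values over time.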
Schedule

The schedule for the study is covered in the embedded Gantt chart (Gantt Chart-iDA Linked Data Project.xlsx).

Proposed Report Outline

The proposed final report will be structured in the following format.

1. Abstract
2. Introduction
   a. Introduction to Linked Data and its relevance to Open Government Data and eGov
   b. Overview of SG OGD Ecosystem
3. Literature Review
   a. Government Linked Data Implementation Cookbooks, Guidelines and Recommendations
      i. URI formulation
      ii. RDF creation
      iii. Ontology Formulation
      iv. Publication and Exploitation
4. Migrational Framework
   a. Multi-step methodology
      i. Formulation and Description
      ii. Examples
5. Implementation Results and Observations
   a. POC details
   b. Description of issues faced in implementation
6. Limitations
7. Conclusion and Recommendations

A few new sections and sub-sections might be added in the final report.

Dissemination of Results

The migrational framework will be published in the form of a report, subject to review by the NTU supervisor, followed by submission to iDA. The researchers plan to publish the report as a conference paper later in the year.
References

Berners-Lee, T., Hendler, J., & Lassila, O. (2001). The Semantic Web. Scientific American, 284(5), 34.

Berners-Lee, T. (2006). Linked Data. Available: http://www.w3.org/DesignIssues/LinkedData.html. Last accessed 11th Jan 2012.

Chee Hean, T. (2011). Keynote Address by Mr Teo Chee Hean, Deputy Prime Minister, Coordinating Minister for National Security and Minister for Home Affairs at the e-Gov Global Exchange 2011. Available: http://www.ida.gov.sg/News%20and%20Events/20110620114104.aspx?getPagetype=21. Last accessed 11th Jan 2012.

Bizer, C., Heath, T., Idehen, K., & Berners-Lee, T. (2008). Linked Data: Evolving the Web into a Global Data Space. (J. Hendler & F. Van Harmelen, Eds.) Proceedings of the 17th International Conference on World Wide Web, WWW 08 (Vol. 1, p. 1265). ACM Press.

Villazón-Terrazas, B., Vilches-Blázquez, L., Corcho, O., & Gómez-Pérez, A. (2011). Methodological guidelines for publishing government linked data. In D. Wood (Ed.), Linking Government Data, chapter 2, pages 27-49. Springer New York, New York, NY.

Hyland, B., & Wood, D. (2011). The joy of data - a cookbook for publishing linked government data on the web. In D. Wood (Ed.), Linking Government Data, chapter 1, pages 3-26. Springer New York, New York, NY.

Haase, P., Rudolph, S., Wang, Y., Brockmans, S., Palma, R., Euzenat, J., & d’Aquin, M. (2006, November). Networked Ontology Model. Technical Report, NeOn project deliverable D1.1.1.