Enabling Data Analytics from Knowledge Graphs
Henrique Santos
Universidade de Fortaleza, Fortaleza, CE, Brazil
The 16th International Semantic Web Conference (ISWC 2017) – Doctoral Consortium
Vienna, Austria – 22 October 2017
22 October 2017 - Henrique Santos - Enabling Data Analytics from Knowledge Graphs
The 16th International Semantic Web Conference (ISWC 2017) – Vienna, Austria – Doctoral Consortium
Problem statement
● Datasets are the most common source of scientific data for data analysis
● Datasets often lack metadata, are not clean, and cannot be directly combined or compared
● Knowledge Graphs (KGs) for scientific data are on the rise
● Many approaches and multiple uses, yet data scientists still work with datasets
● Consequence: data preparation takes around 80% of the time of the whole analytical process (PATIL, 2012)
● How can enough metadata about scientific data be maintained?
● How can that knowledge be exploited to foster data analytics activities?
● How can data from scientific KGs be integrated with regular data tools such as R, Python or BI software?
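The integration question can be made concrete with a small sketch. The Python snippet below, using only the standard library, flattens a SPARQL SELECT result, encoded in the W3C SPARQL 1.1 Query Results JSON Format, into the kind of CSV table that R, Python or BI tools read directly. The variable names, IRIs and values are invented for illustration. Note what the flattening discards: the `datatype` annotations, and any other knowledge attached to the values, do not survive in the CSV.

```python
import csv
import io
import json

# A made-up SPARQL SELECT result in the W3C "SPARQL 1.1 Query Results JSON Format".
# Variable names, IRIs and values are hypothetical placeholders.
RESULT_JSON = """
{
  "head": {"vars": ["obs", "value", "time"]},
  "results": {"bindings": [
    {"obs":   {"type": "uri", "value": "http://example.org/obs/1"},
     "value": {"type": "literal", "value": "21.5",
               "datatype": "http://www.w3.org/2001/XMLSchema#decimal"},
     "time":  {"type": "literal", "value": "2017-10-22T10:00:00Z",
               "datatype": "http://www.w3.org/2001/XMLSchema#dateTime"}}
  ]}
}
"""


def bindings_to_csv(result: dict) -> str:
    """Flatten SPARQL JSON results into CSV text, keeping only lexical values."""
    variables = result["head"]["vars"]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(variables)
    for binding in result["results"]["bindings"]:
        # Unbound variables are absent from a binding; emit an empty cell for them.
        # Only "value" is kept, so "type" and "datatype" are silently dropped.
        writer.writerow([binding.get(v, {}).get("value", "") for v in variables])
    return buffer.getvalue()


csv_text = bindings_to_csv(json.loads(RESULT_JSON))
print(csv_text)
```

The resulting table loads anywhere, but a consumer can no longer tell that `value` is a decimal or `time` a timestamp without re-deriving it.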
Related work
● W3C's CSV on the Web
● Scientific ontologies
  ● SSN – Semantic Sensor Network
  ● VSTO – Virtual Solar-Terrestrial Observatory
  ● HAScO – Human-Aware Science Ontology
● Indicators
  ● GCI Ontology
● Scientific knowledge graphs
  ● Gene Ontology, Bio2RDF, The Graph of Things
Research questions & hypotheses
Q1 Can ontologies be used to successfully bridge the knowledge gap between acquired scientific data and data users? If so, how?
Q2 Will data users and applications benefit from the use of the knowledge behind each scientific data point?
Q3 How can data access to scientific KGs be provided in a way that routine data tools can consume, while making use of the attached data knowledge to facilitate analytics?
H1 The reuse of scientific data ontologies, with proper extensions and alignments to domain ontologies, can mitigate the current loss of knowledge during data acquisition.
H2 Providing data points together with their knowledge (e.g. provenance, contextual knowledge) to data users and applications can facilitate data analytics compared to current dataset usage.
H3 A hybrid RDF serialization format that suits the needs of existing data tools while also conveying knowledge can be used to serialize data from KGs together with its associated metadata.
H4 A query API for scientific KGs can be used to output data together with its associated metadata better than current tools for querying RDF data from data tools.
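One possible shape for the hybrid format of H3 is sketched below, purely as an illustration: a plain CSV file that existing data tools can load unchanged, accompanied by a JSON metadata document in the style of W3C CSV on the Web (listed under related work) that records each column's datatype and the property it stands for. The rows, column names and `example.org` property IRIs are hypothetical, and nothing here claims that the proposed format takes exactly this form.

```python
import csv
import io
import json

# Rows as they might come out of a KG query (values are invented).
ROWS = [
    {"time": "2017-10-22T10:00:00Z", "temperature": "21.5"},
    {"time": "2017-10-22T11:00:00Z", "temperature": "22.1"},
]

# Per-column knowledge that plain CSV cannot carry. The propertyUrl values
# are hypothetical placeholders, not terms from HAScO or VSTO-I.
COLUMNS = [
    {"name": "time", "titles": "time", "datatype": "dateTime",
     "propertyUrl": "http://example.org/vocab/observedAt"},
    {"name": "temperature", "titles": "temperature", "datatype": "decimal",
     "propertyUrl": "http://example.org/vocab/airTemperature"},
]


def serialize(rows, columns, csv_name="data.csv"):
    """Return (csv_text, metadata_json): a CSV table plus a CSVW-style sidecar."""
    names = [c["name"] for c in columns]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=names, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    # Minimal CSVW metadata document describing the table named by csv_name.
    metadata = {
        "@context": "http://www.w3.org/ns/csvw",
        "url": csv_name,
        "tableSchema": {"columns": columns},
    }
    return buffer.getvalue(), json.dumps(metadata, indent=2)


csv_text, metadata_json = serialize(ROWS, COLUMNS)
print(csv_text)
print(metadata_json)
```

Keeping the knowledge in a sidecar means a tool that ignores it still gets a usable table, while a metadata-aware tool can recover datatypes and meaning.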
Approach
[Figure: approach pipeline: Data annotation → KG building → KG browsing → KG serialization → Intelligent applications]
[Figure: excerpt of the ontologies used (HAScO, VSTO-I, HACitO), showing the classes prov:Activity, hasco:Study, hasco:DataAcquisition, vstoi:Deployment, vstoi:Platform, vstoi:Instrument and vstoi:Detector, the datatype xsd:dateTime, and the properties isDataAcquisitionOf, hasDeployment, prov:startedAtTime, prov:endedAtTime, hasPlatform, hasInstrument and hasDetector]
● Automatic data visualization
● Data cleansing
● Inferring the semantic difference between data points
● ...
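To illustrate the kind of contextual knowledge the ontology excerpt captures, here is a minimal, hypothetical sketch in plain Python: a handful of triples describing one data acquisition, and a helper that walks from the acquisition to its study, deployment, time span, platform, instrument and detector. The `ex:` identifiers are invented, the unprefixed property names are taken from the figure labels without namespaces, and the exact wiring (for example, attaching `hasDetector` to the instrument rather than to the deployment) is an assumption, since the figure does not make the property domains recoverable. A real implementation would use an RDF store and SPARQL rather than a Python set.

```python
# Hypothetical instance data. "ex:" identifiers are invented; class names follow
# the slide (hasco:, vstoi:); unprefixed property names come from figure labels.
# IRIs and literals are both plain strings here, purely for brevity.
TRIPLES = {
    ("ex:study1", "rdf:type", "hasco:Study"),
    ("ex:da1", "rdf:type", "hasco:DataAcquisition"),
    ("ex:da1", "isDataAcquisitionOf", "ex:study1"),
    ("ex:da1", "hasDeployment", "ex:dep1"),
    ("ex:dep1", "rdf:type", "vstoi:Deployment"),
    ("ex:dep1", "prov:startedAtTime", "2017-01-01T00:00:00Z"),
    ("ex:dep1", "prov:endedAtTime", "2017-06-30T23:59:59Z"),
    ("ex:dep1", "hasPlatform", "ex:station7"),
    ("ex:dep1", "hasInstrument", "ex:sensorBox3"),
    ("ex:sensorBox3", "hasDetector", "ex:thermometer"),  # assumed wiring
}


def objects(subject, predicate):
    """Return all objects o such that (subject, predicate, o) is in the graph."""
    return sorted(o for s, p, o in TRIPLES if s == subject and p == predicate)


def context_of(acquisition):
    """Collect the acquisition context that a flat dataset would usually drop."""
    deployments = []
    for dep in objects(acquisition, "hasDeployment"):
        instruments = objects(dep, "hasInstrument")
        deployments.append({
            "deployment": dep,
            "started": objects(dep, "prov:startedAtTime"),
            "ended": objects(dep, "prov:endedAtTime"),
            "platforms": objects(dep, "hasPlatform"),
            "instruments": instruments,
            # Follow hasDetector from every instrument of this deployment.
            "detectors": [d for i in instruments for d in objects(i, "hasDetector")],
        })
    return {"study": objects(acquisition, "isDataAcquisitionOf"),
            "deployments": deployments}


print(context_of("ex:da1"))
```

Context of this kind is what would let an application, for instance, tell apart two equal readings taken by different detectors or during different deployments.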
Preliminary results
SANTOS, H. et al. Contextual Data Collection for Smart Cities. In: Proceedings of the Sixth Workshop on Semantics for Smarter Cities. Bethlehem, PA, USA. 2015.
SANTOS, H. et al. From Data to City Indicators: A Knowledge Graph for Supporting Automatic Generation of Dashboards. In: The Semantic Web - Proceedings of the 14th Extended Semantic Web Conference (ESWC 2017). Portorož, Slovenia. 2017.
[Figure: the approach pipeline: Data annotation → KG building → KG browsing → KG serialization → Intelligent applications]
Evaluation plan
KG evaluation (H1): state-of-the-art KG evaluation approaches, as discussed in (PAULHEIM, 2017).
Metadata evaluation (H2): gathering data analytics use cases and assessing how the associated metadata facilitates the use of the data.
KG querying & serialization (H3, H4): tests with data scientists and field specialists acting as users of our proposed KG and processes. Using their data (preferably from different studies and sources), we intend to build a scientific KG enriched with the relevant metadata and then provide them with tools for querying the data and preparing datasets for their routine data analytics. Questionnaires will then be applied to measure how much our approach has eased their tasks compared with their regular processes.
Relevancy
● We expect this research to bring direct benefits to data scientists and field specialists by providing specifications and tools that, we claim, will ease their data preparation tasks
● The KG serialization technique will promote interoperability between scientific data in KGs and existing non-semantic data tools, which we believe will broaden the use of KGs to even more knowledge areas
Reflections
● Promoting data analytics from scientific data in KGs is still in its early stages
  ● Difficult to query the needed data
  ● Lack of methods and tools to easily couple data tools with data from KGs
  ● Knowledge exploitation to foster data analysis is minimal
● Our contributions
  ● KG specification aligned with data analytics requirements
  ● Data file format able to convey both data and metadata
  ● Method for data access and retrieval in scientific KGs based on user queries
● Our resources
  ● Indicator and domain ontologies for the developed use cases
  ● Implementations of the proposed method for data access
hos@edu.unifor.br
@hansidm
http://henriquesantos.org
Enabling Data Analytics from Knowledge Graphs
Henrique Santos
Thank you for your attention
Advisor: Prof. João José Vasco Peixoto Furtado, Docteur
