Integrated Ontology for Sports
(Domains: Cricket, Football and Tennis.)
Database Interoperability Project
Abhishek Agrawal, George Sam, Hari Haran Venugopal, Noopur Joshi
Outline:
• Problem Statement and Motivation
• Scope of the Project
• Our Approach
• Data Sources – Scraper
• Data Cleaning – Google Refine, Karma
• Ontology Creation – Using existing ontologies to create a federated ontology
• Data Modeling – Karma Tool
• Data Publishing – RDF and Triple Store Creation
• Data Extraction – Using OpenRDF for SPARQL Queries
• Future Work and Challenges
• Conclusion
Problem Statement and Motivation
Why do we need Ontologies?
- Need for constant, intelligent access to up-to-date, integrated and detailed information from the Web
- Helps to aggregate data from various sources
Why a Federated Sports Ontology?
- Represents different sports and presents a common view of them
- Is easily extensible
- Intelligent information gathering
  - Scores: Who's winning, and how did the score change?
  - Schedules: Who's playing whom, when, and where?
  - Standings: Who's in first place? Who's closest to qualifying?
- Data Analysis
  - Statistics: How do the players and/or teams measure up against one another in various categories?
  - News: How do we combine editorial coverage of sports with all the data feeds?
Scope of the Project
Tennis
- Players
- Tournaments
Cricket
- Players
- Matches
- Rankings
Football
- Players
- Leagues
Our Approach
1. Data Extraction
2. Data Cleaning
3. Ontology Creation
4. Data Modeling
5. Querying using SPARQL
Data Source: Scraper
Web scraping (also called web harvesting or web data extraction) is a software technique for extracting information from websites.
Scraping tools:
• Beautiful Soup – Python library with simple methods and Unicode support; works with parsers such as lxml and html5lib.
• jsoup – Java HTML parser that implements the WHATWG HTML5 specification and parses HTML to the same DOM as modern browsers do.
• Web Scraper (Chrome extension) – Lets you create a plan (sitemap) describing how a website should be traversed and what should be extracted. Using these sitemaps, Web Scraper navigates the site accordingly and extracts the data.
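As a rough illustration of the scraping step, the sketch below pulls a table of players out of an HTML fragment. It uses Python's standard-library `html.parser` rather than any of the tools listed above, so that it runs without extra installs; Beautiful Soup or jsoup would express the same idea more concisely. The HTML fragment, the player names, and the helper names (`TableScraper`, `scrape_table`) are all hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; a real scraper would download this from a sports site.
SAMPLE_HTML = """
<table id="players">
  <tr><th>Name</th><th>Country</th></tr>
  <tr><td>Player A</td><td>India</td></tr>
  <tr><td>Player B</td><td>Spain</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished rows
        self._row = None       # cells of the row currently being read
        self._in_cell = False  # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append([cell.strip() for cell in self._row])
            self._row = None


def scrape_table(html):
    """Return the table as a list of dicts keyed by the header row."""
    parser = TableScraper()
    parser.feed(html)
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


records = scrape_table(SAMPLE_HTML)
```

Each row becomes a dictionary keyed by the column headers, which is a convenient shape to hand on to the cleaning step.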
Data Cleaning
Data cleansing (also called data cleaning or data scrubbing) is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database.
Data cleaning tools:
• Karma – Offers a programming-by-example interface that lets users define data transformation scripts, which transform data expressed in multiple formats into a common format.
• Google Refine (now OpenRefine) – A power tool for working with messy data: cleaning it up and transforming it from one format into another.
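The kind of cleanup these tools perform can be sketched in a few lines of plain Python. This is only an illustration of the idea (normalize whitespace, drop records that cannot be identified, remove duplicates); it is not how Karma or Google Refine work internally, and the field names and sample rows are made up.

```python
import re


def clean_record(record):
    """Trim and collapse whitespace in every field; turn empty fields into None."""
    cleaned = {}
    for key, value in record.items():
        text = re.sub(r"\s+", " ", value).strip()
        cleaned[key.strip()] = text or None
    return cleaned


def clean_records(records, key_field):
    """Clean each record, drop rows missing key_field, and remove duplicates."""
    seen = set()
    result = []
    for record in records:
        row = clean_record(record)
        key = row.get(key_field)
        if key is None:
            continue  # unusable record: nothing to identify it by
        fingerprint = key.casefold()  # case-insensitive duplicate check
        if fingerprint in seen:
            continue  # duplicate of an earlier row
        seen.add(fingerprint)
        result.append(row)
    return result


# Hypothetical scraped rows with typical defects.
raw = [
    {"Name": "  Player   A ", "Country": "India"},
    {"Name": "player a", "Country": "India "},
    {"Name": "", "Country": "Spain"},
    {"Name": "Player B", "Country": "  Spain"},
]
cleaned = clean_records(raw, key_field="Name")
```

Matching duplicates on a case-folded key is a deliberately simple choice; real data usually needs fuzzier matching, which is where clustering features in tools like Google Refine help.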
Ontology: Class Hierarchy (figure)
Federated Ontology (figure)
Data Modeling
Tool used: Karma (USC ISI)
• Browser-based data integration and data modeling tool
• Advantage – Data integration and publishing are easy
• Steps:
  1. Load ontologies and data sets
  2. Primitive data filtering
  3. Set semantic types for attributes
  4. Build semantics for each sport individually
• Karma automatically suggests semantic mappings for higher-level concepts.
• Create URIs for entities.
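To make "semantic types" and "URIs for entities" concrete, here is a minimal sketch of what the modeling step produces for one row of data. Karma does this interactively through its interface; the code below only mimics the outcome. The namespaces, the class name `TennisPlayer`, the property names, and the helpers `mint_uri` and `row_to_triples` are assumptions for illustration, since the slides do not give the project's actual ontology IRIs.

```python
from urllib.parse import quote

# Hypothetical namespaces; the project's real ontology IRIs are not given in the slides.
ONT = "http://example.org/sports-ontology#"
DATA = "http://example.org/resource/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Semantic types: each column is mapped to a property in the ontology.
SEMANTIC_TYPES = {
    "Name": ONT + "name",
    "Country": ONT + "country",
}


def mint_uri(class_name, label):
    """Build a stable entity URI from a class name and a human-readable label."""
    slug = quote(label.strip().lower().replace(" ", "_"))
    return f"{DATA}{class_name.lower()}/{slug}"


def row_to_triples(row, class_name, key_column):
    """Turn one cleaned row into (subject, predicate, object) triples."""
    subject = mint_uri(class_name, row[key_column])
    triples = [(subject, RDF_TYPE, ONT + class_name)]
    for column, value in row.items():
        if column in SEMANTIC_TYPES and value is not None:
            triples.append((subject, SEMANTIC_TYPES[column], value))
    return triples


triples = row_to_triples(
    {"Name": "Player A", "Country": "India"}, "TennisPlayer", "Name"
)
```

Because every sport is modeled against the same ontology namespace, rows from cricket, football, and tennis sources end up sharing predicates, which is what allows a single query to span all three.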
Screenshot
Data Publishing
• Available frameworks: OpenRDF, Protégé, Apache Jena.
• OpenRDF:
  - Browser-based framework
  - Integrated with Karma
• Publish each data set as:
  1. JSON
  2. R2RML model
  3. RDF
• Create a triple store for the RDF
• Load the RDF into the OpenRDF triple store
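Publishing RDF ultimately means writing the triples in a serialization a triple store can load. The sketch below writes N-Triples, one of the simplest RDF formats, by hand. It is not Karma's or OpenRDF's exporter, and the IRIs and values are hypothetical; it only shows what the published data looks like.

```python
def escape_literal(text):
    """Escape the characters that N-Triples requires to be escaped in a literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples(triples):
    """Serialize triples to N-Triples text, one statement per line.

    Each triple is (subject_iri, predicate_iri, object, object_is_iri).
    """
    lines = []
    for subject, predicate, obj, object_is_iri in triples:
        obj_term = f"<{obj}>" if object_is_iri else f'"{escape_literal(obj)}"'
        lines.append(f"<{subject}> <{predicate}> {obj_term} .")
    return "\n".join(lines) + "\n"


# Hypothetical IRIs and values, used only for illustration.
player = "http://example.org/resource/tennisplayer/player_a"
triples = [
    (player, "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
     "http://example.org/sports-ontology#TennisPlayer", True),
    (player, "http://example.org/sports-ontology#name", 'Player "A"', False),
]
ntriples_text = to_ntriples(triples)
```

The resulting text can be saved to a `.nt` file and loaded into a triple store; the store's own upload interface handles the rest.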
Data Extraction
SPARQL
• Language used to extract information from RDF
• Query based
SELECT *
WHERE {
  ?Subject ?Predicate ?Object
}
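The query above consists of a single triple pattern in which all three positions are variables, so it matches every statement in the store. The sketch below imitates that matching over an in-memory list. It is not a SPARQL engine and does not talk to OpenRDF; it only shows how fixing one position of the pattern narrows the results. The prefixed names and the data are hypothetical.

```python
def match_pattern(triples, subject=None, predicate=None, obj=None):
    """Return bindings for one triple pattern; None plays the role of a variable."""
    results = []
    for s, p, o in triples:
        if subject is not None and s != subject:
            continue
        if predicate is not None and p != predicate:
            continue
        if obj is not None and o != obj:
            continue
        results.append({"Subject": s, "Predicate": p, "Object": o})
    return results


# Hypothetical data with shortened names for readability.
store = [
    ("ex:player_a", "rdf:type", "ont:TennisPlayer"),
    ("ex:player_a", "ont:country", "India"),
    ("ex:player_b", "rdf:type", "ont:Cricketer"),
]

# Equivalent in spirit to: ?Subject ?Predicate ?Object
everything = match_pattern(store)

# Equivalent in spirit to: ?Subject rdf:type ?Object
types_only = match_pattern(store, predicate="rdf:type")
```

In the federated setting, a pattern like the second one returns players from every sport at once, because all data sets were modeled against the same ontology.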
Future Work
1. Inclusion of other sports
2. Creating a web- or mobile-based interface to query the data
3. Creating an application for university-level players and teams
4. Providing more specific information, such as:
   • Details about a particular team for the years 1990–2014
   • Images of the players/teams
   • Details of all the matches played between two players/teams
References
• http://www.isi.edu/integration/karma/
• http://phd.jabenitez.com/wp-content/uploads/2014/03/A-Practical-Guide-To-Building-OWL-Ontologies-Using-Protege-4.pdf
• http://ict.siit.tu.ac.th/~sun/SW/Protege%20Tutorial.pdf
• http://www.crummy.com/software/BeautifulSoup/
• https://chrome.google.com/webstore/detail/web-scraper/jnhgnonknehpejjnehehllkliplmbmhn?hl=en
• https://code.google.com/p/google-refine/
• http://www.datacleansing.net.au/Data_Cleansing_Services


Federated Ontology Based Query System
