Mustafa Jarrar
Lecture Notes, Web Data Management (MCOM7348)
University of Birzeit, Palestine
1st Semester, 2013

Introduction to Data Integration

Dr. Mustafa Jarrar
University of Birzeit
mjarrar@birzeit.edu
www.jarrar.info
Jarrar © 2013

Watch this lecture and download the slides from
http://jarrar-courses.blogspot.com/2013/11/web-data-management.html

Example from the Government Domain
Consider all the interactions with government agencies that are
needed to register a new business in Palestine.
Example: Establishing a new radio station involves:
–  Ministry of Telecom
–  Ministry of Information
–  Ministry of National Economy
–  Ministry of Finance
–  Chamber of Commerce

Example from the Government Domain
Consider what happens when the business evolves or changes.
Example: Changing the address of the radio station.
–  The address must be changed in 5 different databases, one at each
agency: Ministry of Telecom, Ministry of Information, Ministry of
National Economy, Ministry of Finance, and Chamber of Commerce.

Example from the Government Domain
Consider the data registered about the same radio station in
the databases of different ministries and governmental
agencies:

Agency 1
B_ID    | Company Name     | Company Type         | Location
182NS3  | Broadcast AlAmal | Broadcasting Station | Al-Balu’

Agency 2
ID      | Business Name     | Activity Type      | Province
LM1847  | Al-Amal Broadcast | Radio Broadcasting | Ramallah and Bireh

Agency 3
ID      | Name          | Type          | City
R2563I  | Radio Al-Amal | Radio Station | Ramallah

...

Example from the Government Domain
This simple example points to some challenges in data
integration:
–  No agreed-upon naming (Name, Business Name, Company Name).
–  No agreed-upon meaning (does ‘Activity Type’ mean exactly the
same as ‘Company Type’?).
–  Different registered data: Radio Al-Amal, Al-Amal Broadcast, ...
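
The naming challenge can be made concrete with a small sketch. The Python below is illustrative only: the per-agency mappings, the unified field names (`id`, `name`, `type`, `place`), and the `to_unified` helper are assumptions introduced here, not part of any agency's actual system. It renames each agency's attributes to one shared vocabulary and nothing more.

```python
# Hypothetical mappings from each agency's attribute names to a shared
# vocabulary. The unified names (id, name, type, place) are assumptions.
MAPPINGS = {
    "agency1": {"B_ID": "id", "Company Name": "name",
                "Company Type": "type", "Location": "place"},
    "agency2": {"ID": "id", "Business Name": "name",
                "Activity Type": "type", "Province": "place"},
    "agency3": {"ID": "id", "Name": "name",
                "Type": "type", "City": "place"},
}


def to_unified(agency, record):
    """Rename a record's attributes using the mapping for its agency."""
    mapping = MAPPINGS[agency]
    return {mapping[attr]: value for attr, value in record.items()}


# The three records registered for the same radio station.
records = [
    ("agency1", {"B_ID": "182NS3", "Company Name": "Broadcast AlAmal",
                 "Company Type": "Broadcasting Station",
                 "Location": "Al-Balu’"}),
    ("agency2", {"ID": "LM1847", "Business Name": "Al-Amal Broadcast",
                 "Activity Type": "Radio Broadcasting",
                 "Province": "Ramallah and Bireh"}),
    ("agency3", {"ID": "R2563I", "Name": "Radio Al-Amal",
                 "Type": "Radio Station", "City": "Ramallah"}),
]

unified = [to_unified(agency, record) for agency, record in records]
for row in unified:
    print(row)
```

Renaming solves only the first challenge. After the mapping, the three rows still carry three different names, types, and places for one station, and whether ‘Activity Type’ and ‘Company Type’ may share the single field `type` is exactly the open question about meaning.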
The Problem Exists in All Domains
The problem is even more challenging on the Web.
The Data Web envisions the Web as a global, worldwide
database.
This means that one can query multiple distributed databases
on the Web as if querying a local database.

Challenges of Data Integration:
Heterogeneities in Database Schemas
One can distinguish several kinds of heterogeneity
between schemas:
–  Name heterogeneities (differences in the vocabulary used).
–  Meaning heterogeneities (different meanings for the same attribute
in two schemas).
–  Heterogeneities in structure and type.
–  Heterogeneities in rules and constraints.
–  Data model heterogeneities.

Name and Meaning Heterogeneities
Synonyms – Different names for the same concept
–  employee, clerk
–  exam, course
–  code, num
–  Section, Division (both: a specialized division of a large
organization)

Homonyms – Same name for different concepts (different
meanings)
–  City as city of birth in one schema,
City as city of residence in another schema
–  Salary as net salary in one schema,
Salary as gross salary in another schema

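
A minimal sketch of how the two cases differ when attribute names are reconciled. Everything beyond the example terms is an assumption made for illustration: the schema labels `schema_a` and `schema_b`, the canonical names, and the choice of which schema holds the net salary. Synonyms can be resolved by name alone, while homonyms need the schema as extra context.

```python
# Synonyms: different names, same concept. One canonical term per name.
SYNONYMS = {"clerk": "employee", "num": "code", "section": "division"}

# Homonyms: same name, different concepts. The meaning depends on the
# schema, so the lookup key is (schema, attribute). Schema labels and
# the assignment of meanings to schemas are hypothetical.
HOMONYMS = {
    ("schema_a", "salary"): "net_salary",
    ("schema_b", "salary"): "gross_salary",
    ("schema_a", "city"): "city_of_birth",
    ("schema_b", "city"): "city_of_residence",
}


def canonical(schema, attribute):
    """Return the canonical concept name for an attribute of a schema."""
    key = attribute.lower()
    if (schema, key) in HOMONYMS:
        return HOMONYMS[(schema, key)]
    # Names that are neither homonyms nor known synonyms pass through.
    return SYNONYMS.get(key, key)


print(canonical("schema_a", "Salary"))
print(canonical("schema_b", "Salary"))
print(canonical("schema_b", "clerk"))
```

The lookup tables themselves cannot be derived from the names; someone who knows both schemas has to supply them.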
Heterogeneities in Structure and Type
Source: Carlo Batini

The same concepts are represented with
different conceptual structures in two schemas:
–  Attribute in one schema and derived value in another schema.
–  Attribute in one schema and entity in another schema.
–  Entity in one schema and relationship in another schema.
–  Different abstraction levels for the same concept in two schemas,
e.g., two entities with homonymous names related by an IS-A hierarchy
in two schemas.

Heterogeneities in Structure
Source: Carlo Batini

EXAMPLES:
[Figure: diagrams of alternative schema structures. Labels shown:
Person, MAN, WOMAN, GENDER; EMPLOYEE, DEPARTMENT, PROJECT;
BOOK, PUBLISHER.]

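
One structural case, an attribute in one schema that is an entity in another, can be sketched as follows. Reading the BOOK and PUBLISHER labels as an instance of this case is an assumption, and the sample rows, field names, and the `split_publisher` helper are made up for illustration.

```python
# Schema A (assumed): the publisher is a plain attribute of each book.
books_a = [
    {"title": "Book One", "publisher": "Publisher X"},
    {"title": "Book Two", "publisher": "Publisher X"},
    {"title": "Book Three", "publisher": "Publisher Y"},
]


def split_publisher(books):
    """Convert schema A into schema B, where PUBLISHER is its own entity.

    Returns (book_rows, publisher_rows); each book row refers to its
    publisher through a generated numeric id.
    """
    publisher_ids = {}  # publisher name -> generated id
    book_rows = []
    for book in books:
        # Reuse the id if the publisher was seen before, else assign one.
        pub_id = publisher_ids.setdefault(book["publisher"],
                                          len(publisher_ids) + 1)
        book_rows.append({"title": book["title"], "publisher_id": pub_id})
    publisher_rows = [{"id": pub_id, "name": name}
                      for name, pub_id in publisher_ids.items()]
    return book_rows, publisher_rows


book_rows, publisher_rows = split_publisher(books_a)
print(book_rows)
print(publisher_rows)
```

Going in this direction is mechanical. The reverse direction loses any information the PUBLISHER entity carries beyond its name, which is one reason structural conflicts are harder to resolve than naming conflicts.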
Heterogeneities in Type
Examples:
§  Different encodings of a single attribute (e.g., numeric vs.
alphanumeric). E.g., the attribute “gender”:
–  Male/Female
–  M/F
–  0/1
§  Year has a four-digit domain in one schema and a two-digit domain
in another schema.
§  Different currencies (Euros, US Dollars, etc.).
§  Different measurement systems (kilos vs. pounds,
Celsius vs. Fahrenheit).
§  Different granularities (grams, kilos, etc.).
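
The type conflicts listed above are usually handled by normalizing every source to one target representation. The sketch below is a hedged illustration: the target encoding (`"M"`/`"F"`), the pivot used to expand two-digit years, and all function names are assumptions. Note that a numeric gender code cannot be interpreted without knowing the source's convention, so it has to be passed in explicitly.

```python
GENDER_WORDS = {"male": "M", "m": "M", "female": "F", "f": "F"}


def normalize_gender(value, numeric_codes=None):
    """Map Male/Female, M/F, or a numeric code to 'M' or 'F'.

    numeric_codes is a source-specific dict such as {"0": "M", "1": "F"};
    whether 0 means male or female differs between sources.
    """
    text = str(value).strip().lower()
    if text in GENDER_WORDS:
        return GENDER_WORDS[text]
    if numeric_codes and text in numeric_codes:
        return numeric_codes[text]
    raise ValueError(f"unknown gender encoding: {value!r}")


def expand_year(two_digit, pivot=50):
    """Expand a two-digit year; the pivot (an assumption) splits centuries."""
    return (2000 if two_digit < pivot else 1900) + two_digit


def pounds_to_kilos(pounds):
    """Convert pounds to kilograms (1 lb = 0.45359237 kg)."""
    return pounds * 0.45359237


def fahrenheit_to_celsius(degrees_f):
    """Convert a temperature from Fahrenheit to Celsius."""
    return (degrees_f - 32) * 5 / 9


print(normalize_gender("Female"), normalize_gender(0, {"0": "M", "1": "F"}))
print(expand_year(13), expand_year(99))
print(fahrenheit_to_celsius(212))
```

Currency conversion is left out on purpose: unlike the unit conversions, it depends on an exchange rate that changes over time.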
Heterogeneities in Rules and Constraints
Source: Carlo Batini

EXAMPLES:
–  Different cardinalities in the same relationship.
–  Key conflicts.
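
Both kinds of conflict can be detected mechanically once the constraints of each schema are written down. The sketch below assumes a made-up description format (a dict of keys per entity and of maximum cardinalities per relationship); the entity, attribute, and function names are hypothetical.

```python
# Hypothetical constraint descriptions of two schemas.
# max_card gives the maximum number of projects per employee;
# None stands for "many".
schema_a = {
    "key": {"EMPLOYEE": ("emp_id",)},
    "max_card": {("EMPLOYEE", "PROJECT"): 1},
}
schema_b = {
    "key": {"EMPLOYEE": ("national_id",)},
    "max_card": {("EMPLOYEE", "PROJECT"): None},
}


def find_conflicts(a, b):
    """List key and cardinality conflicts on elements both schemas share."""
    conflicts = []
    for entity in sorted(a["key"].keys() & b["key"].keys()):
        if a["key"][entity] != b["key"][entity]:
            conflicts.append(("key", entity))
    for rel in sorted(a["max_card"].keys() & b["max_card"].keys()):
        if a["max_card"][rel] != b["max_card"][rel]:
            conflicts.append(("cardinality", rel))
    return conflicts


for conflict in find_conflicts(schema_a, schema_b):
    print(conflict)
```

Detection is the easy part. Deciding which key or which cardinality the integrated schema should adopt remains a design decision.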

Model Heterogeneities
Model heterogeneities occur when different databases adhere to
different data models:
–  Relational data model, XML, RDF, object-oriented, OWL, ...

Solution: Reduce model heterogeneity by using one data model.
Example: Convert the relational model to the RDF graph model.
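
As a rough illustration of such a conversion, the sketch below turns one relational row into RDF triples written as N-Triples-style lines, using plain Python only. It is a simplified mapping invented for this example, not the W3C Direct Mapping or R2RML; the `http://example.org/` namespace, the table name `station`, and the `row_to_triples` helper are assumptions, and it only works for column names that are valid inside an IRI.

```python
BASE = "http://example.org/"  # hypothetical namespace


def row_to_triples(table, key_column, row):
    """Turn one relational row into N-Triples-style lines.

    The key value identifies the subject; every other column becomes
    a predicate with the cell value as a plain string literal.
    """
    subject = f"<{BASE}{table}/{row[key_column]}>"
    triples = []
    for column, value in row.items():
        if column == key_column:
            continue
        predicate = f"<{BASE}{table}#{column}>"
        # Escape backslashes and double quotes inside the literal.
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        triples.append(f'{subject} {predicate} "{escaped}" .')
    return triples


row = {"ID": "R2563I", "Name": "Radio Al-Amal",
       "Type": "Radio Station", "City": "Ramallah"}
triples = row_to_triples("station", "ID", row)
for line in triples:
    print(line)
```

Each row yields one triple per non-key column, so records from differently shaped tables all end up in the same subject, predicate, object form. The name and meaning heterogeneities remain and still have to be resolved on the predicates.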

References and Acknowledgement
•  Carlo Batini: Course on Data Integration. BZU IT Summer School,
2011.
•  Stefano Spaccapietra: Information Integration. Presentation at the IFIP
Academy, Porto Alegre, 2005.
•  Chris Bizer: The Emerging Web of Linked Data. Presentation at SRI
International, Artificial Intelligence Center, Menlo Park, USA, 2009.

Thanks to Anton Deik for helping me prepare this lecture.
