Jarrar © 2013 1
Dr. Mustafa Jarrar
University of Birzeit
mjarrar@birzeit.edu
www.jarrar.info
Lecture Notes on Data Schema Integration
Birzeit University, Palestine
2013
Data Schema Integration
Watch this lecture and download the slides from
http://jarrar-courses.blogspot.com/2013/11/web-data-management.html
Data Schema Integration: A simple example
In ORM:
[Figure: three source schemas and the integrated schema, drawn in ORM]
Schema 1: Employee, Organization; role: /WorksIn
Schema 2: Worker, City, Region; roles: bornIn/, locatedIn/
Schema 3: Organization, Municipality; role: locatedIn/
Integrated schema: Employee, Organization, City, Region; roles: /WorksIn, bornIn/, locatedIn/, locatedIn/
Data Schema Integration: A simple example
In ER:
[Figure: the same example drawn in ER]
Schemas 1 to 3: entities Employee, Organization, City, Region, Municipality; relationships works, born, in
Integrated schema: entities Employee, Organization, City, Region; relationships works, born in, in (twice)
Source: Carlo Batini
Challenges of Data Schema Integration
Schema Integration has two major challenges:
1. Identification of all portions of the schemas that pertain to the
same concept, so that their different representations can be
unified in the global schema.
2. Identification, analysis and resolution of the different
types of conflicts (heterogeneities) in different schemas.
Source: Carlo Batini
Framework for Schema Integration
[Figure: pipeline of three steps, each driven by its own rules]
Local Schemas → Schemas Transformation (Transformation Rules) → Schemas Matching (Matching Rules) → Schemas Integration (Integration Rules) → Integrated Schema and mappings
Source: Advances in Object-Oriented Data Modeling, M. P. Papazoglou, S. Spaccapietra, Z. Tari (Eds.), The MIT Press, 2000
Framework for Schema Integration
0. Define the integration strategy
If the number of local schemas to be integrated is large, the order of
schema integration becomes important. Several strategies can be
adopted.
Input: n source schemas
Output: n source schemas + integration strategies
Method used: heuristics
[Figure: three integration strategies]
One shot strategy: S1, S2, S3 → IS
Pair at a time strategy: S1, S2 → IS1; IS1, S3 → IS2; IS2, S4 → IS
Balanced strategy: S1, S2, S3, S4 → IS1, IS2 → IS
Notes on the figure:
- Efficient integration process
- Many correspondences between concepts have to be considered together.
- Priority to most relevant and stable schemas.
- The integration process is more efficient
- e.g.: Production, Marketing, Sales.
- To be preferred when the cohesion among schemas is high.
…
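The three strategies differ only in the order in which schemas are combined. The sketch below makes that order visible by letting a stand-in `integrate` function return a nested tuple of its inputs instead of a real integrated schema. The function names are hypothetical, and the pairing used in `balanced` (neighbouring schemas first) is an assumption about how the balanced tree is built.

```python
from functools import reduce

def integrate(*schemas):
    """Stand-in for a real integration step: records which inputs were merged."""
    return tuple(schemas)

def one_shot(schemas):
    # All source schemas are integrated in a single step.
    return integrate(*schemas)

def pair_at_a_time(schemas):
    # Integrate the first two schemas, then fold in one more schema per step.
    return reduce(integrate, schemas)

def balanced(schemas):
    # Integrate neighbouring pairs, then pairs of intermediate results, and so on.
    level = list(schemas)
    while len(level) > 1:
        pairs = [level[i:i + 2] for i in range(0, len(level), 2)]
        # An unpaired schema at the end of a level is carried over unchanged.
        level = [integrate(*p) if len(p) == 2 else p[0] for p in pairs]
    return level[0]

sources = ["S1", "S2", "S3", "S4"]
print(one_shot(sources))
print(pair_at_a_time(sources))
print(balanced(sources))
```

The nesting of the printed tuples mirrors the intermediate schemas (IS1, IS2) in the figure: the deeper a source sits, the earlier it was integrated.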
Framework for Schema Integration
1. Schema transformation (or Pre-integration)
Input: n source schemas
Output: n homogenized source schemas
Methods used: Model and design homogenization
Reduce model heterogeneities as much as possible to make
the sources more suitable for integration.
Goal: use a single, common data model and format.
[Figure: Source DBs → Transformation → Homogenized DBs → Integration → DW]
Source: Stefano Spaccapietra
Schema Transformation
Schema Transformation involves:
• Data model homogenization
– All data sources are described using the same data model.
• Design homogenization
– Enforce standard design rules to reduce the number of structural
conflicts (e.g., Normalization: one fact in one place).
• Reverse Engineering
– Reverse engineer the schema from existing data (such as COBOL
files, spreadsheets, legacy relational databases, legacy
object-oriented databases).
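As a very small illustration of reverse engineering, the sketch below recovers column names and rough types from a spreadsheet exported as CSV. The function name `infer_schema`, the two type labels and the sample rows are all assumptions made for this example; real reverse engineering also has to recover keys, constraints and relationships.

```python
import csv
import io

def infer_schema(csv_text):
    """Guess a type ("int" or "str") for each column of a well-formed CSV export."""
    schema = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, value in row.items():
            kind = "int" if value.lstrip("-").isdigit() else "str"
            # Once a column has shown both kinds of value, treat it as text.
            if schema.get(column, kind) != kind:
                kind = "str"
            schema[column] = kind
    return schema

# Hypothetical spreadsheet content, invented for illustration.
sheet = "student,name,course,grade\n1,Lina,C1,90\n2,Omar,C1,85\n"
print(infer_schema(sheet))
```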
Example of Design homogenization (Normalization)
ONE TABLE:
R1 (#Student, Name, LastName, #Course, CourseName,
Grade, Date)
Dependencies:
– #Student → Name, LastName
– #Course → CourseName
– #Student, #Course → Grade, Date
Normalized Into 3 Tables: One Fact In One Place:
R11 (#Student, Name, LastName)
R12 (#Course, CourseName)
R13 (#Student, #Course, Grade, Date)
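The decomposition can be checked on a few rows. The sketch below projects a sample R1 onto the three normalized tables and joins them back to confirm that no information is lost. The sample rows and the lower-case column names are invented for illustration.

```python
# Hypothetical rows of R1; every value is invented for illustration.
R1 = [
    {"student": 1, "name": "Lina", "last_name": "Saleh", "course": "C1",
     "course_name": "Databases", "grade": 90, "date": "2013-01-10"},
    {"student": 1, "name": "Lina", "last_name": "Saleh", "course": "C2",
     "course_name": "Logic", "grade": 85, "date": "2013-01-12"},
    {"student": 2, "name": "Omar", "last_name": "Nasser", "course": "C1",
     "course_name": "Databases", "grade": 78, "date": "2013-01-10"},
]

def project(rows, columns):
    """Relational projection: keep the given columns and drop duplicate tuples."""
    result = []
    for row in rows:
        reduced = {c: row[c] for c in columns}
        if reduced not in result:
            result.append(reduced)
    return result

def natural_join(left, right):
    """Join two tables on the columns they share."""
    joined = []
    for l in left:
        for r in right:
            if all(l[c] == r[c] for c in set(l) & set(r)):
                joined.append({**l, **r})
    return joined

R11 = project(R1, ["student", "name", "last_name"])
R12 = project(R1, ["course", "course_name"])
R13 = project(R1, ["student", "course", "grade", "date"])

# Joining the three tables gives back the original rows.
rebuilt = natural_join(natural_join(R13, R11), R12)
```

Each student name and each course name is now stored once (in R11 and R12), which is the "one fact in one place" rule.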
Example of Reverse Engineering
Source: Stefano Spaccapietra
Schema Matching
2. Schema matching (Correspondences investigation)
Input: n source schemas
Output: n source schemas + correspondences
Method used: techniques to discover correspondences
Correspondences relate (schema) elements which describe
the same phenomena of the real world.
– This step aims at finding and describing all semantic links between
elements of the input schemas and the corresponding data.
– In doing so, the schemas to be integrated are matched against each other.
– This step identifies the conflicts found in the schemas; they are resolved in the integration step.
Semantics of Correspondences
Correspondences relate (schema) elements which describe
the same phenomena of the real world.
Source: Stefano Spaccapietra
Asserting Correspondences
Correspondences are asserted using a rich language for
expressing correspondences (matchings).
Example:
S1.Person ≡ S2.Person,
With Corresponding Identifiers: Pin,
With Corresponding Property: name
Source: Stefano Spaccapietra
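One way to hold such an assertion in a program is a small record type. The sketch below is only an illustration: the class name `Correspondence`, its fields and the relation label "equivalent" are assumptions, not the syntax of the correspondence language referred to on the slide.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Correspondence:
    left: str                 # element of the first schema, e.g. "S1.Person"
    right: str                # element of the second schema
    relation: str             # assumed label, e.g. "equivalent" or "contains"
    identifiers: tuple = ()   # corresponding identifiers
    properties: tuple = ()    # corresponding properties

    def describe(self):
        parts = [f"{self.left} {self.relation} {self.right}"]
        if self.identifiers:
            parts.append("identifiers: " + ", ".join(self.identifiers))
        if self.properties:
            parts.append("properties: " + ", ".join(self.properties))
        return "; ".join(parts)

person = Correspondence("S1.Person", "S2.Person", "equivalent",
                        identifiers=("Pin",), properties=("name",))
print(person.describe())
```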
Automated Matching
• Fully automated matching is impossible, as a computer process can
hardly make ultimate decisions about the semantics of data.
• But even partial assistance in discovering correspondences (to be
confirmed or guided by humans) is beneficial, due to the complexity of
the task.
• All proposed methods rely on some similarity measures that try to
evaluate the semantic distance between two descriptions.
Some state-of-the-art matching systems:
Cupid (Microsoft Research, USA)
FOAM/QOM (University of Karlsruhe, Germany)
OLA (INRIA Rhône-Alpes, France / University of Montreal, Canada)
S-Match (University of Trento, Italy)
… many others
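A minimal sketch of a similarity-based matcher, assuming plain string similarity on element names as the measure (the systems listed above use far richer techniques). It uses `difflib.SequenceMatcher` from the Python standard library; the function names and the 0.7 threshold are arbitrary choices for this example.

```python
from difflib import SequenceMatcher

def name_similarity(a, b):
    """String similarity between two element names, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def propose_matches(schema_a, schema_b, threshold=0.7):
    """Return candidate correspondences for a human to confirm or reject."""
    candidates = []
    for a in schema_a:
        for b in schema_b:
            score = name_similarity(a, b)
            if score >= threshold:
                candidates.append((a, b, round(score, 2)))
    return sorted(candidates, key=lambda c: -c[2])

schema1 = ["Employee", "Organization"]
schema2 = ["Worker", "City", "Region"]
schema3 = ["Organization", "Municipality"]

print(propose_matches(schema1, schema3))
print(propose_matches(schema1, schema2))
```

On the schemas of the earlier example, the name-based measure proposes Organization/Organization but proposes nothing for Employee and Worker, which is the kind of semantic link that still needs a human.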
Examples of Correspondences
Source: Stefano Spaccapietra
Examples of Correspondences
Previous example:
[Figure: the three ORM source schemas of the earlier example]
Schema 1: Employee, Organization; role: /WorksIn
Schema 2: Worker, City, Region; roles: bornIn/, locatedIn/
Schema 3: Organization, Municipality; role: locatedIn/
Examples of Correspondences
Source: Stefano Spaccapietra
Schema Integration & Mapping Generation
3. Schema integration and mapping generation
Input: n source schemas + correspondences
Output: integrated schema + mapping rules between the integrated
schema and the input source schemas
Method used: New classification of conflicts + Conflict resolution
transformations
GOAL: Create an Integrated Schema (IS) and the mappings to the
local databases.
Source: Carlo Batini
GAV and LAV Integration
Research has identified two methods to set up mappings between the
integrated schema and the input schemas:
(1) GAV (Global As View): proposes to define the integrated schema
as a view over input schemas.
GAV is usually considered simpler and more efficient for processing
queries on the integrated database, but is weaker in supporting
evolution of the global system through addition of new sources.
(2) LAV (Local As View): proposes to define the local schemas as
views over the integrated schema.
LAV generates issues of incomplete information, which adds
complexity in handling global queries, but it better supports dynamic
addition and removal of sources.
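The difference between the two directions of mapping can be sketched with two toy sources. Everything here is hypothetical: the source names, the rows, the global relation Person(name, org, born_in), and the reduction of a LAV view to the set of global attributes a source can supply.

```python
# Hypothetical source relations; names and rows are invented for illustration.
source1_employee = [{"name": "Lina", "org": "Birzeit University"}]
source2_worker = [{"name": "Omar", "born_in": "Ramallah"}]

def person_gav():
    """GAV: the global relation Person(name, org, born_in) is a view over the sources."""
    rows = [{"name": r["name"], "org": r["org"], "born_in": None}
            for r in source1_employee]
    rows += [{"name": r["name"], "org": None, "born_in": r["born_in"]}
             for r in source2_worker]
    return rows

# LAV: each source is described as a view over the global relation,
# here reduced to the set of global attributes the source can supply.
lav_views = {
    "source1_employee": {"name", "org"},
    "source2_worker": {"name", "born_in"},
}

def sources_for(attribute):
    """Return the sources whose LAV description covers a global attribute."""
    return sorted(name for name, attrs in lav_views.items() if attribute in attrs)

print(person_gav())
print(sources_for("born_in"))
```

In this sketch, adding a third source under LAV means adding one entry to `lav_views`, whereas under GAV the body of `person_gav` has to be edited. Answering a global query under GAV is a plain call, while under LAV the query first has to be rewritten in terms of the sources that cover it, and the `None` values show where information is incomplete.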
Integration Process
Having identified the correspondences (in the previous
step), we now resolve the conflicts.
One can distinguish four types of conflicts:
– Structural conflicts
– Classification conflicts
– Descriptive conflicts
– Fragmentation conflicts
Examples of conflicts among related object types
– different classifications (sets of instances)
– different sets of properties
– different structures
– different coding schemes
– …
Integration Rules
Rules defining the strategy to solve conflicts
Example rules:
– If a class corresponds to an attribute, keep the class
– If the population of a class is included in the population of another
class, build an is-a hierarchy
Integration rules depend on how you want the integrated
schema to look
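The second example rule can be sketched as a check on class populations. The populations below are invented, and treating a population as a Python set of instance identifiers is an assumption of this example; the class names are borrowed from the Faculty / PhD-advisor case used later in the lecture.

```python
def isa_edges(populations):
    """Return (subclass, superclass) pairs where one population is strictly
    included in the other."""
    edges = []
    for sub, sub_pop in populations.items():
        for sup, sup_pop in populations.items():
            if sub != sup and sub_pop < sup_pop:  # "<" is strict subset on sets
                edges.append((sub, sup))
    return edges

# Hypothetical populations, invented for illustration.
populations = {
    "S1.Faculty": {"ann", "bob", "carl"},
    "S2.PhD-advisor": {"ann", "bob"},
}
print(isa_edges(populations))
```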
Structural Conflicts
Different schema element types, e.g.: class, attribute, relationship
Library example:
– S1: Book is a class
– S2: books is an attribute of Author
Conflict resolution:
Choose the less constraining structure
– Integrated Schema: Book is a class
[Figure: schemas S1 and S2]
Source: Stefano Spaccapietra
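For the library example, resolving the conflict in favour of the class means turning the S2 attribute into instances of Book plus a relationship to Author. The sketch below shows one way to do that. The sample author, the titles, the relationship name `wrote` and the function name are all invented for illustration.

```python
# Hypothetical S2 data: books is a multi-valued attribute of Author.
s2_authors = [
    {"author": "A. Writer", "books": ["Title One", "Title Two"]},
    {"author": "B. Author", "books": ["Title Two"]},
]

def promote_books_to_class(authors):
    """Turn the 'books' attribute into Book instances and an author-book relationship."""
    books, wrote = [], []
    for author in authors:
        for title in author["books"]:
            if {"title": title} not in books:
                books.append({"title": title})      # one Book instance per title
            wrote.append((author["author"], title))  # link between Author and Book
    return books, wrote

books, wrote = promote_books_to_class(s2_authors)
```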
Classification Conflicts
• Corresponding elements describe different sets of real world objects
– S1.Faculty CONTAINS S2.PhD-advisor
• Conflict Resolution:
– Generalization / Specialization hierarchy
– Merging
[Figure: S1 Faculty and S2 PhD-advisor; resolved either as a hierarchy with PhD-advisor under Faculty, or merged into a single Faculty]
Descriptive Conflicts
Corresponding types have different properties, or corresponding
properties are described in different ways
Object / Entity / Relationship type:
– Naming conflicts :
• synonyms Node , Extremity
• homonyms Highway (EU) , Highway (USA)
– Composition conflicts : different attributes and methods
• Employee ( E# , name , address )
• Employee ( E# , position , salary , department )
26Jarrar © 2013
Integration Methods: Manual
Easy to implement , Flexible
BUT
time consuming for the DBA
a language
schemas integrated
schema
mapping
rules
DBA
First method: manual integration
“ do it yourself ”
Source: Stefano Spaccapietra
Integration Methods: Semi-Automatic
Second method: semi-automatic integration
“tell me about the problem, I will try to fix it”
[Figure: the DBA supplies schemas and correspondences to a TOOL, which produces the integrated schema and mapping rules]
Opens the way to visual CASE tools, integration servers
BUT knowledge acquisition can be painful
References and Acknowledgement
• Carlo Batini: Course on Data Integration. BZU IT Summer School 2011.
• Stefano Spaccapietra: Information Integration. Presentation at the IFIP Academy. Porto Alegre. 2005.
• Chris Bizer: The Emerging Web of Linked Data. Presentation at SRI International, Artificial Intelligence Center. Menlo Park, USA. 2009.
Thanks to Anton Deik for helping me prepare this lecture.
