Moving beyond ETL
Definition
In general, the integration of multiple information systems aims at
combining selected systems so that they form a unified new whole and
give users the illusion of interacting with a single information system.
Reasons for Integration
• First, given a set of existing information systems, an integrated
  view can be created to facilitate information access and reuse
  through a single information access point
• Second, given a certain information need, data from different,
  complementary information systems can be combined to gain a more
  comprehensive basis for satisfying that need
Applications
In the area of Business Intelligence (BI), integrated information is
used for querying and reporting, and for

• Statistical Analysis
• OLAP
• Data Mining

in order to enable

• Forecasting
• Decision Making
• Enterprise-wide Planning
Integration Problem
• Users are to be provided with a homogeneous logical view of data
  that is physically distributed over heterogeneous data sources
• To achieve this, all data has to be represented using the same
  abstraction principles (a unified global data model and unified
  semantics)
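A minimal sketch of what a unified global data model can mean in practice. The `Customer` schema, the two source layouts (a "CRM" and an "ERP"), and all sample values are hypothetical and chosen only for illustration; each source gets one mapping function into the shared model.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Customer:
    """Hypothetical global schema shared by all sources."""
    name: str
    revenue_cents: int


def from_crm(row: dict) -> Customer:
    # Assumed CRM layout: separate name parts, revenue as a decimal string.
    name = f"{row['first']} {row['last']}"
    return Customer(name, int(Decimal(row["revenue"]) * 100))


def from_erp(row: tuple) -> Customer:
    # Assumed ERP layout: (upper-case name, revenue already in cents).
    name, cents = row
    return Customer(name.title(), cents)


crm_rows = [{"first": "Ada", "last": "Lovelace", "revenue": "1200.50"}]
erp_rows = [("ALAN TURING", 99900)]

# The homogeneous logical view: every record follows the same model.
global_view = [from_crm(r) for r in crm_rows] + [from_erp(r) for r in erp_rows]
```

Both structural differences (dict vs. tuple, split vs. single name) and semantic differences (currency units vs. cents) are resolved in the mapping functions, so consumers of `global_view` only ever see `Customer` objects.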
Kinds of Heterogeneity
• Hardware and Operating Systems
• Data Management Software
• Data Models, Schemas, and Semantics
• Middleware
• User Interfaces
• Business Rules and Integrity Constraints
Abstraction Levels
1. Manual Integration
• Users interact directly with all relevant information systems and
  manually integrate selected data
• Users have to deal with different user interfaces and query languages
• Users need detailed knowledge of data location, logical data
  representation, and data semantics
2. Common User Interface
• The user is supplied with a common user interface (e.g. a web
  browser) that provides a uniform look and feel
• Data from the relevant information systems is still presented
  separately
• Homogenization and integration of the data still have to be done by
  the users
• Search engines are an example
3. Integration by Applications
• Integration applications access various data sources and return
  integrated results to the user
• Practical for a small number of component systems
• Applications become increasingly fat as the number of system
  interfaces and data formats to homogenize and integrate grows
4. Integration by Middleware
• Middleware provides functionality that solves individual aspects of
  the integration problem
• Integration effort is still needed in the applications
• Different middleware tools usually have to be combined to build an
  integrated system
5. Uniform Data Access
• Logical integration of data is accomplished at the data access level
• Global applications are provided with a unified global view of
  physically distributed data
• Providing this global view can be time-consuming, because data
  access, homogenization, and integration have to be done at runtime
6. Common Data Storage
• Physical data integration is performed by transferring data to a new
  data storage
• Local sources can either be retired or remain operational
• In general, this provides fast data access
• If local data sources are retired, the applications that use them
  have to be migrated to the new data storage
• If local data sources remain operational, periodic refreshing of the
  common data storage needs to be considered
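One way to picture the periodic refresh is an incremental copy driven by a "last changed" marker. The sketch below is an assumption for illustration, not something the slides prescribe: rows are `(key, value, changed_at)` tuples, the common storage is a plain dict, and `refresh` is a hypothetical helper.

```python
def refresh(source_rows, store, last_refresh):
    """Copy rows changed after last_refresh into the store.

    Returns the newest change marker seen, to be used as the starting
    point of the next refresh run.
    """
    newest = last_refresh
    for key, value, changed_at in source_rows:
        if changed_at > last_refresh:
            store[key] = value
            newest = max(newest, changed_at)
    return newest


# The local source stays operational and keeps changing.
source = [("a", 1, 10), ("b", 2, 20)]
store = {}

watermark = refresh(source, store, last_refresh=0)   # initial load
source.append(("a", 5, 30))                          # later local update
watermark = refresh(source, store, watermark)        # periodic refresh
```

Between two refresh runs the common storage can be stale; how often to refresh is the trade-off the last bullet points at.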
Important Examples
• Mediated Query Systems
• Portals
• Data Warehouses
• Operational Data Stores
• Federated Database Systems (FDBMS)
• Workflow Management Systems (WFMS)
• Integration by Web Services
• Peer-to-Peer (P2P) Integration
Mediated Query Systems
• Represent a uniform data access solution by providing a single point
  of read-only query access to various data sources
• Use a mediator containing a global query processor that sends
  sub-queries to the local data sources; the returned local query
  results are then combined
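The mediator idea can be sketched in a few lines. This is a simplified illustration under stated assumptions: the local sources are in-memory SQLite databases that happen to share one `products` schema, and `Mediator`, `make_source`, and `cheaper_than` are hypothetical names. A real mediator would also translate between differing local schemas.

```python
import sqlite3


def make_source(rows):
    """Create an in-memory SQLite database standing in for one local source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (name TEXT, price REAL)")
    conn.executemany("INSERT INTO products VALUES (?, ?)", rows)
    return conn


class Mediator:
    """Single, read-only access point over several local sources."""

    def __init__(self, sources):
        self.sources = sources  # mapping: source name -> connection

    def cheaper_than(self, limit):
        # Send one sub-query to every local source ...
        sub_query = "SELECT name, price FROM products WHERE price < ?"
        combined = []
        for source_name, conn in self.sources.items():
            for name, price in conn.execute(sub_query, (limit,)):
                combined.append((name, price, source_name))
        # ... then combine the local results into one global answer.
        return sorted(combined, key=lambda row: row[1])


mediator = Mediator({
    "shop_a": make_source([("pen", 1.5), ("laptop", 900.0)]),
    "shop_b": make_source([("notebook", 3.0), ("phone", 500.0)]),
})
result = mediator.cheaper_than(10)
```

The caller never talks to `shop_a` or `shop_b` directly; all the work happens at query time, which is the runtime cost noted for uniform data access.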
Portals
• Another form of uniform data access: personalized doorways to the
  internet or an intranet
• Each user is provided with information tailored to their information
  needs
• Web mining is applied to determine user profiles by click-stream
  analysis
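As a rough illustration of click-stream analysis, the sketch below derives a per-user profile by counting which topics a user clicks most often. The click data, the topic labels, and the `build_profiles` helper are made up for this example; real web mining involves far more than counting.

```python
from collections import Counter

# Hypothetical click stream: (user, topic of the clicked page).
clicks = [
    ("alice", "sports"),
    ("alice", "sports"),
    ("alice", "finance"),
    ("bob", "finance"),
]


def build_profiles(click_stream, top_n=1):
    """Return each user's top_n most clicked topics."""
    per_user = {}
    for user, topic in click_stream:
        per_user.setdefault(user, Counter())[topic] += 1
    return {
        user: [topic for topic, _ in counts.most_common(top_n)]
        for user, counts in per_user.items()
    }


profiles = build_profiles(clicks)
```

A portal could then use `profiles["alice"]` to decide which content to show first.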
Data Warehouses
• Realize a common data storage approach
• Data from several operational sources (OLTP systems) is extracted,
  transformed, and loaded (ETL) into a data warehouse
• Analysis, such as OLAP, can then be performed on cubes of integrated
  and aggregated data
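The three ETL steps can be sketched as follows. Everything here is an assumption for illustration: two made-up operational sources with different date formats and country casing, an in-memory SQLite database standing in for the warehouse, and a simple `GROUP BY` standing in for an OLAP-style aggregate.

```python
import sqlite3

# Extract: rows as they might arrive from two operational systems.
orders_eu = [("2024-01-05", "de", 120.0), ("2024-01-06", "fr", 80.0)]
orders_us = [("01/05/2024", "US", 200.0)]  # different date format


def transform_eu(row):
    # Transform: only the country code needs normalizing.
    day, country, amount = row
    return (day, country.upper(), amount)


def transform_us(row):
    # Transform: convert MM/DD/YYYY into the warehouse's ISO format.
    month, day, year = row[0].split("/")
    return (f"{year}-{month}-{day}", row[1].upper(), row[2])


# Load: write the homogenized rows into the warehouse table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (day TEXT, country TEXT, amount REAL)")
rows = [transform_eu(r) for r in orders_eu] + [transform_us(r) for r in orders_us]
warehouse.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

# Analysis on integrated data: total sales per day across all sources.
daily_totals = warehouse.execute(
    "SELECT day, SUM(amount) FROM sales GROUP BY day ORDER BY day"
).fetchall()
```

Because homogenization happens once at load time, the analytical query itself stays simple and fast.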
Operational Data Stores
• A second example of common data storage
• A “warehouse with fresh data” is built by immediately propagating
  updates in the local data sources to the data store
• Up-to-date integrated data is available for decision support
• Unlike in data warehouses, data is neither cleansed nor aggregated,
  and data histories are not supported
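The contrast with a warehouse can be sketched with a small push-based model. The class names and the invoice data are hypothetical; the point is only that every local update is forwarded immediately and that the store keeps just the current value.

```python
class OperationalDataStore:
    """Keeps only the current value per key: no cleansing, no history."""

    def __init__(self):
        self.current = {}

    def apply(self, source, key, value):
        # Overwrite in place; the previous value is not retained.
        self.current[(source, key)] = value


class LocalSource:
    """A local system that forwards every update to the data store."""

    def __init__(self, name, ods):
        self.name = name
        self.ods = ods
        self.data = {}

    def update(self, key, value):
        self.data[key] = value
        # Propagate immediately instead of waiting for a batch load.
        self.ods.apply(self.name, key, value)


ods = OperationalDataStore()
billing = LocalSource("billing", ods)
billing.update("invoice-1", "open")
billing.update("invoice-1", "paid")
```

After the second update the store only knows that the invoice is paid; that it was ever open is lost, which is what "no data histories" means here.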
Federated Database Systems
• Achieve a uniform data access solution by logically integrating data
  from the underlying local DBMSs
• Implement their own data model and support global queries, global
  transactions, and global access control
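A global query over separate databases can be imitated with SQLite's `ATTACH DATABASE`. This is only an analogy, not a federated DBMS: the attached databases are not autonomous systems, and the schema names `hr` and `sales` and their contents are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two separate databases, reachable through one connection.
conn.execute("ATTACH DATABASE ':memory:' AS hr")
conn.execute("ATTACH DATABASE ':memory:' AS sales")

conn.execute("CREATE TABLE hr.employees (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE sales.deals (employee_id INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO hr.employees VALUES (?, ?)", [(1, "Ada"), (2, "Alan")]
)
conn.executemany(
    "INSERT INTO sales.deals VALUES (?, ?)",
    [(1, 100.0), (1, 50.0), (2, 70.0)],
)

# One global query that joins data held in both databases.
global_query = """
    SELECT e.name, SUM(d.value)
    FROM hr.employees AS e
    JOIN sales.deals AS d ON d.employee_id = e.id
    GROUP BY e.name
    ORDER BY e.name
"""
totals = conn.execute(global_query).fetchall()
```

The data stays where it is; only the query is global, which is the sense of "logical integration" in the first bullet.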
Workflow Management Systems
• Represent an integration-by-application approach
• Allow business processes to be implemented in which each step is
  executed by a different application or user
• Support modeling, execution, and maintenance of processes that
  consist of interactions between applications and human users
Integration by Web Services
• Integration is performed through software components (web services)
  that support machine-to-machine interaction by means of XML-based
  messages conveyed over internet protocols
• Depending on the integration functionality they offer, web services
  represent either
  - a uniform data access approach, or
  - a common data access for later manual or application-based
    integration
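The XML-based message exchange can be sketched without any network code. The message layout (`getCustomer`, `customer`, `id`, `name`) and the lookup data are invented for this example, and `handle_request` is a local stand-in for a remote service; no particular web service standard is implied.

```python
import xml.etree.ElementTree as ET


def build_request(customer_id: str) -> bytes:
    """Client side: encode the request as an XML message."""
    root = ET.Element("getCustomer")
    ET.SubElement(root, "id").text = customer_id
    return ET.tostring(root, encoding="utf-8")


def handle_request(message: bytes) -> bytes:
    """Stand-in for the remote service: parse the request, answer in XML."""
    directory = {"42": "Ada Lovelace"}  # made-up data
    customer_id = ET.fromstring(message).findtext("id")
    root = ET.Element("customer", attrib={"id": customer_id})
    ET.SubElement(root, "name").text = directory.get(customer_id, "unknown")
    return ET.tostring(root, encoding="utf-8")


# In a real deployment the bytes would travel over an internet protocol
# such as HTTP; here the two functions are simply called in sequence.
reply = ET.fromstring(handle_request(build_request("42")))
customer_name = reply.findtext("name")
```

Because both sides agree only on the message format, the client does not need to know how or where the service stores its data.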
Peer-to-Peer (P2P) Integration
• A decentralized approach to integration between distributed peers,
  in which data can be mutually shared and integrated
• Depending on the integration functionality offered, P2P integration
  represents either
  - a uniform data access approach, or
  - a common data access for later manual or application-based
    integration
Semantic Data Integration
Data integration
