Data Warehousing, OLAP and Data Mining
S. Nagabhushana
PREFACE
This book is intended for Information Technology (IT) professionals who have been
hearing about data warehousing technologies or have been tasked with evaluating, learning
or implementing them. It also presents the fundamental techniques of KDD and Data
Mining, as well as issues that arise in the practical use of mining tools.
Far from being just a passing fad, data warehousing technology has grown much in scale
and reputation in the past few years, as evidenced by the increasing number of products,
vendors, organizations, and yes, even books, devoted to the subject. Enterprises that have
successfully implemented data warehouses find them strategic and often wonder how they ever
managed to survive without them in the past. Knowledge Discovery and Data Mining (KDD)
has likewise emerged as a rapidly growing interdisciplinary field that merges databases,
statistics, machine learning and related areas in order to extract valuable information and
knowledge from large volumes of data.
Volume-I is intended for IT professionals who have been tasked with planning, managing,
designing, implementing, supporting, maintaining and analyzing the organization’s data
warehouse.
The first section introduces the Enterprise Architecture and Data Warehouse concepts
that form the basis for this book.
The second section focuses on three of the key People in any data warehousing initiative:
the Project Sponsor, the CIO, and the Project Manager. This section is devoted to
addressing the primary concerns of these individuals.
The third section presents a Process for planning and implementing a data warehouse
and provides guidelines that will prove extremely helpful for both first-time and experienced
warehouse developers.
The fourth section focuses on the Technology aspect of data warehousing. It lends order
to the dizzying array of technology components that you may use to build your data
warehouse.
The fifth section opens a window to the future of data warehousing.
The sixth section deals with On-Line Analytical Processing (OLAP) and describes the
features to consider when selecting tools from different vendors.
Volume-II shows how to achieve success in understanding and exploiting large databases
by uncovering valuable information hidden in data; learning which data have real meaning and
which simply take up space; examining which methods and tools are most effective
for practical needs; and analyzing and evaluating the results obtained.
S. NAGABHUSHANA
ACKNOWLEDGEMENTS
My sincere thanks to Prof. P. Rama Murthy, Principal, Intell Engineering College,
Anantapur, for his able guidance and valuable suggestions; in fact, it was he who drew
my attention to the writing of this book. I am grateful to Smt. G. Hampamma, Lecturer in
English, Intell Engineering College, Anantapur, and her whole family for their constant
support and assistance while I was writing the book. Prof. Jeffrey D. Ullman, Department of
Computer Science, Stanford University, U.S.A., deserves my special thanks for providing all
the necessary resources. I am also thankful to Mr. R. Venkat, Senior Technical Associate at
Virtusa, Hyderabad, for going through the script and encouraging me.
Last but not least, I thank Mr. Saumya Gupta, Managing Director, New Age International
(P) Limited, Publishers, New Delhi, for his interest in the publication of the book.
CONTENTS
Preface
Acknowledgements
VOLUME I: DATA WAREHOUSING IMPLEMENTATION AND OLAP
PART I : INTRODUCTION
Chapter 1. The Enterprise IT Architecture
1.1 The Past: Evolution of Enterprise Architectures
1.2 The Present: The IT Professional’s Responsibility
1.3 Business Perspective
1.4 Technology Perspective
1.5 Architecture Migration Scenarios
1.6 Migration Strategy: How do We Move Forward?
Chapter 2. Data Warehouse Concepts
2.1 Gradual Changes in Computing Focus
2.2 Data Warehouse Characteristics and Definition
2.3 The Dynamic, Ad Hoc Report
2.4 The Purposes of a Data Warehouse
2.5 Data Marts
2.6 Operational Data Stores
2.7 Data Warehouse Cost-Benefit Analysis / Return on Investment
PART II : PEOPLE
Chapter 3. The Project Sponsor
3.1 How does a Data Warehouse Affect Decision-Making Processes?
3.2 How does a Data Warehouse Improve Financial Processes? Marketing? Operations?
3.3 When is a Data Warehouse Project Justified?
3.4 What Expenses are Involved?
3.5 What are the Risks?
3.6 Risk-Mitigating Approaches
3.7 Is Organization Ready for a Data Warehouse?
3.8 How the Results are Measured?
Chapter 4. The CIO
4.1 How is the Data Warehouse Supported?
4.2 How Does Data Warehouse Evolve?
4.3 Who should be Involved in a Data Warehouse Project?
4.4 What is the Team Structure Like?
4.5 What New Skills will People Need?
4.6 How Does Data Warehousing Fit into IT Architecture?
4.7 How Many Vendors are Needed to Talk to?
4.8 What should be Looked for in a Data Warehouse Vendor?
4.9 How Does Data Warehousing Affect Existing Systems?
4.10 Data Warehousing and its Impact on Other Enterprise Initiatives
4.11 When is a Data Warehouse not Appropriate?
4.12 How to Manage or Control a Data Warehouse Initiative?
Chapter 5. The Project Manager
5.1 How to Roll Out a Data Warehouse Initiative?
5.2 How Important is the Hardware Platform?
5.3 What are the Technologies Involved?
5.4 Are the Relational Databases Still Used for Data Warehousing?
5.5 How Long Does a Data Warehousing Project Last?
5.6 How is a Data Warehouse Different from Other IT Projects?
5.7 What are the Critical Success Factors of a Data Warehousing Project?
PART III : PROCESS
Chapter 6. Warehousing Strategy
6.1 Strategy Components
6.2 Determine Organizational Context
6.3 Conduct Preliminary Survey of Requirements
6.4 Conduct Preliminary Source System Audit
6.5 Identify External Data Sources (If Applicable)
6.6 Define Warehouse Rollouts (Phased Implementation)
6.7 Define Preliminary Data Warehouse Architecture
6.8 Evaluate Development and Production Environment and Tools
Chapter 7. Warehouse Management and Support Processes
7.1 Define Issue Tracking and Resolution Process
7.2 Perform Capacity Planning
7.3 Define Warehouse Purging Rules
7.4 Define Security Management
7.5 Define Backup and Recovery Strategy
7.6 Set Up Collection of Warehouse Usage Statistics
Chapter 8. Data Warehouse Planning
8.1 Assemble and Orient Team
8.2 Conduct Decisional Requirements Analysis
8.3 Conduct Decisional Source System Audit
8.4 Design Logical and Physical Warehouse Schema
8.5 Produce Source-to-Target Field Mapping
8.6 Select Development and Production Environment and Tools
8.7 Create Prototype for this Rollout
8.8 Create Implementation Plan of this Rollout
8.9 Warehouse Planning Tips and Caveats
Chapter 9. Data Warehouse Implementation
9.1 Acquire and Set Up Development Environment
9.2 Obtain Copies of Operational Tables
9.3 Finalize Physical Warehouse Schema Design
9.4 Build or Configure Extraction and Transformation Subsystems
9.5 Build or Configure Data Quality Subsystem
9.6 Build Warehouse Load Subsystem
9.7 Set Up Warehouse Metadata
9.8 Set Up Data Access and Retrieval Tools
9.9 Perform the Production Warehouse Load
9.10 Conduct User Training
9.11 Conduct User Testing and Acceptance
PART IV : TECHNOLOGY
Chapter 10. Hardware and Operating Systems
10.1 Parallel Hardware Technology
10.2 The Data Partitioning Issue
10.3 Hardware Selection Criteria
Chapter 11. Warehousing Software
11.1 Middleware and Connectivity Tools
11.2 Extraction Tools
11.3 Transformation Tools
11.4 Data Quality Tools
11.5 Data Loaders
11.6 Database Management Systems
11.7 Metadata Repository
11.8 Data Access and Retrieval Tools
11.9 Data Modeling Tools
11.10 Warehouse Management Tools
11.11 Source Systems
Chapter 12. Warehouse Schema Design
12.1 OLTP Systems Use Normalized Data Structures
12.2 Dimensional Modeling for Decisional Systems
12.3 Star Schema
12.4 Dimensional Hierarchies and Hierarchical Drilling
12.5 The Granularity of the Fact Table
12.6 Aggregates or Summaries
12.7 Dimensional Attributes
12.8 Multiple Star Schemas
12.9 Advantages of Dimensional Modeling
Chapter 13. Warehouse Metadata
13.1 Metadata Defined
13.2 Metadata are a Form of Abstraction
13.3 Importance of Metadata
13.4 Types of Metadata
13.5 Metadata Management
13.6 Metadata as the Basis for Automating Warehousing Tasks
13.7 Metadata Trends
Chapter 14. Warehousing Applications
14.1 The Early Adopters
14.2 Types of Warehousing Applications
14.3 Financial Analysis and Management
14.4 Specialized Applications of Warehousing Technology
PART V: MAINTENANCE, EVOLUTION AND TRENDS
Chapter 15. Warehouse Maintenance and Evolution
15.1 Regular Warehouse Loads
15.2 Warehouse Statistics Collection
15.3 Warehouse User Profiles
15.4 Security and Access Profiles
15.5 Data Quality
15.6 Data Growth
15.7 Updates to Warehouse Subsystems
15.8 Database Optimization and Tuning
15.9 Data Warehouse Staffing
15.10 Warehouse Staff and User Training
15.11 Subsequent Warehouse Rollouts
15.12 Chargeback Schemes
15.13 Disaster Recovery
Chapter 16. Warehousing Trends
16.1 Continued Growth of the Data Warehouse Industry
16.2 Increased Adoption of Warehousing Technology by More Industries
16.3 Increased Maturity of Data Mining Technologies
16.4 Emergence and Use of Metadata Interchange Standards
16.5 Increased Availability of Web-Enabled Solutions
16.6 Popularity of Windows NT for Data Mart Projects
16.7 Availability of Warehousing Modules for Application Packages
16.8 More Mergers and Acquisitions Among Warehouse Players
PART VI: ON-LINE ANALYTICAL PROCESSING
Chapter 17. Introduction
17.1 What is OLAP?
17.2 The Codd Rules and Features
17.3 The Origins of Today’s OLAP Products
17.4 What’s in a Name
17.5 Market Analysis
17.6 OLAP Architectures
17.7 Dimensional Data Structures
Chapter 18. OLAP Applications
18.1 Marketing and Sales Analysis
18.2 Click Stream Analysis
18.3 Database Marketing
18.4 Budgeting
18.5 Financial Reporting and Consolidation
18.6 Management Reporting
18.7 EIS
18.8 Balanced Scorecard
18.9 Profitability Analysis
18.10 Quality Analysis
VOLUME II: DATA MINING
Chapter 1. Introduction
1.1 What is Data Mining
1.2 Definitions
1.3 Data Mining Process
1.4 Data Mining Background
1.5 Data Mining Models
1.6 Data Mining Methods
1.7 Data Mining Problems/Issues
1.8 Potential Applications
1.9 Data Mining Examples
Chapter 2. Data Mining with Decision Trees
2.1 How a Decision Tree Works
2.2 Constructing Decision Trees
2.3 Issues in Data Mining with Decision Trees
2.4 Visualization of Decision Trees in System CABRO
2.5 Strengths and Weaknesses of Decision Tree Methods
Chapter 3. Data Mining with Association Rules
3.1 When is Association Rule Analysis Useful?
3.2 How does Association Rule Analysis Work?
3.3 The Basic Process of Mining Association Rules
3.4 The Problem of Large Datasets
3.5 Strengths and Weaknesses of Association Rules Analysis
Chapter 4. Automatic Clustering Detection
4.1 Searching for Clusters
4.2 The K-means Method
4.3 Agglomerative Methods
4.4 Evaluating Clusters
4.5 Other Approaches to Cluster Detection
4.6 Strengths and Weaknesses of Automatic Cluster Detection
Chapter 5. Data Mining with Neural Network
5.1 Neural Networks for Data Mining
5.2 Neural Network Topologies
5.3 Neural Network Models
5.4 Iterative Development Process
5.5 Strengths and Weaknesses of Artificial Neural Network
PART I : INTRODUCTION
The term Enterprise Architecture refers to a collection of
technology components and their interrelationships, which are
integrated to meet the information requirements of an
enterprise. This section introduces the concept of Enterprise
IT Architectures with the intention of providing a framework
for the various types of technologies used to meet an
enterprise’s computing needs.
Data warehousing technologies belong to just one of the many
components in IT architecture. This chapter aims to define
how data warehousing fits within the overall IT architecture,
in the hope that IT professionals will be better positioned to
use and integrate data warehousing technologies with the
other IT components used by the enterprise.
CHAPTER 1
THE ENTERPRISE IT ARCHITECTURE
This chapter begins with a brief look at changing business requirements and how,
over time, they have influenced the evolution of Enterprise Architectures. The InfoMotion
(“Information in Motion”) Enterprise Architecture is introduced to provide IT professionals
with a framework with which to classify the various technologies currently available.
1.1 THE PAST: EVOLUTION OF ENTERPRISE ARCHITECTURES
The IT architecture of an enterprise at a given time depends on three main factors:
• the business requirements of the enterprise;
• the available technology at that time; and
• the accumulated investments of the enterprise from earlier technology generations.
The business requirements of an enterprise are constantly changing, and the changes
are coming at an exponential rate. Business requirements have, over the years, evolved
from the day-to-day clerical recording of transactions to the automation of business processes.
Exception reporting has shifted from tracking and correcting daily transactions that have
gone astray to the development of self-adjusting business processes.
Technology has likewise advanced by delivering exponential increases in computing
power and communications capabilities. However, for all these advances in computing
hardware, a significant lag exists in the realms of software development and architecture
definition. Enterprise Architectures thus far have displayed a general inability to gracefully
evolve in line with business requirements, without either compromising on prior technology
investments or seriously limiting their own ability to evolve further.
In hindsight, the evolution of the typical Enterprise Architecture reflects the continuous,
piecemeal efforts of IT professionals to take advantage of the latest technology to improve
the support of business operations. Unfortunately, this piecemeal effort has often resulted
in a morass of incompatible components.
1.2 THE PRESENT: THE IT PROFESSIONAL’S RESPONSIBILITY
Today, the IT professional continues to have a two-fold responsibility: to meet business
requirements through Information Technology and to integrate new technology into the existing
Enterprise Architecture.
Meet Business Requirements
The IT professional must ensure that the enterprise IT infrastructure properly supports
a myriad of requirements from different business users, each of whom has different and
constantly changing needs, as illustrated in Figure 1.1.
[Figure: speech bubbles from different business users: “I need to find out why our sales in the South are dropping...”; “We need to get this modified order quickly to our European supplier...”; “Where can I find a copy of last month’s Newsletter?”; “Someone from XYZ, Inc. wants to know what the status of their order is...”]
Figure 1.1. Different Business Needs
Take Advantage of Technology Advancements
At the same time, the IT professional must also constantly learn new buzzwords, review
new methodologies, evaluate new tools, and maintain ties with technology partners. Not all
the latest technologies are useful; the IT professional must first sift through the technology
jigsaw puzzle (see Figure 1.2) to find the pieces that meet the needs of the enterprise, then
integrate the newer pieces with the existing ones to form a coherent whole.
[Figure: jigsaw pieces labeled Decision Support, Web Technology, OLAP, OLTP, Intranet, Data Warehouse, Flash Monitoring & Reporting, Legacy, Client/Server]
Figure 1.2. The Technology Jigsaw Puzzle
One of the key constraints the IT professional faces today is the current Enterprise IT
Architecture itself. At this point, therefore, it is prudent to step back, assess the current
state of affairs and identify the distinct but related components of modern Enterprise
Architectures.
The two orthogonal perspectives of business and technology are merged to form one
unified framework, as shown in Figure 1.3.
[Figure: the InfoMotion Enterprise Architecture. Components are arranged by business need (Virtual Corp., Informational, Decisional, Operational) across three layers.]
• Logical Client Layer: Transactional Web Scripts; Informational Web Scripts; Decision Support Applications; Flash Monitoring & Reporting; OLTP Applications; Workflow Management Clients
• Logical Server Layer: Transactional Web Services; Informational Web Services; Data Warehouse; Operational Data Store; Active Database; Workflow Management Services
• Legacy Layer: Legacy Systems
Figure 1.3. The InfoMotion Enterprise Architecture
1.3 BUSINESS PERSPECTIVE
From the business perspective, the requirements of the enterprise fall into four categories,
illustrated in Figure 1.4 and described below.
Operational
Technology supports the smooth execution and continuous improvement of day-to-day
operations, the identification and correction of errors through exception reporting and
workflow management, and the overall monitoring of operations. Information retrieved
about the business from an operational viewpoint is used to either complete or optimize the
execution of a business process.
Decisional
Technology supports managerial decision-making and long-term planning. Decision-
makers are provided with views of enterprise data from multiple dimensions and in varying
levels of detail. Historical patterns in sales and other customer behavior are analyzed.
Decisional systems also support decision-making and planning through scenario-based
modeling, what-if analysis, trend analysis, and rule discovery.
Informational
Technology makes current, relatively static information widely and readily available to
as many people as need access to it. Examples include company policies, product and service
information, organizational setup, office location, corporate forms, training materials and
company profiles.
[Figure: four categories of business need: Decisional, Virtual Corporation, Informational, Operational]
Figure 1.4. The InfoMotion Enterprise Architecture
Virtual Corporation
Technology enables the creation of strategic links with key suppliers and customers to
better meet customer needs. In the past, such links were feasible only for large companies
because of economies of scale. Now, the affordability of Internet technology provides any
enterprise with this same capability.
1.4 TECHNOLOGY PERSPECTIVE
This section presents each architectural component from a technology standpoint and
highlights the business need that each is best suited to support.
Operational Needs
Legacy Systems
The term legacy system refers to any information system currently in use that was built
using previous technology generations. Most legacy systems are operational in nature, largely
because the automation of transaction-oriented business processes had long been the priority
of Information Technology projects.
OPERATIONAL
• Legacy System
• OLTP Application
• Active Database
• Operational Data Store
• Flash Monitoring and Reporting
• Workflow Management (Groupware)
OLTP Applications
The term Online Transaction Processing refers to systems that automate and capture
business transactions through the use of computer systems. In addition, these applications
traditionally produce reports that allow business users to track the status of transactions.
OLTP applications and their related active databases compose the majority of client/server
systems today.
Active Databases
Databases store the data produced by Online Transaction Processing applications. These
databases were traditionally passive repositories of data manipulated by business applications.
It is not unusual to find legacy systems with processing logic and business rules contained
entirely in the user interface or randomly interspersed in procedural code.
With the advent of client/server architecture, distributed systems, and advances in
database technology, databases began to take on a more active role through database
programming (e.g., stored procedures) and event management. IT professionals are now
able to bullet-proof the application by placing processing logic in the database itself. This
contrasts with the still-popular practice of replicating processing logic (sometimes in an
inconsistent manner) across the different parts of a client application or across different
client applications that update the same database. Through active databases, applications
are more robust and conducive to evolution.
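As a minimal sketch of the idea above, the following Python example uses the standard `sqlite3` module to place a business rule inside the database as a trigger, so that every client application writing to the table is subject to the same check. The `orders` table, the trigger name, and the rule itself are hypothetical and chosen only for illustration; a production DBMS would more likely use its own stored-procedure language.

```python
import sqlite3

# Hypothetical schema: table, column and trigger names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    id       INTEGER PRIMARY KEY,
    quantity INTEGER NOT NULL,
    status   TEXT NOT NULL DEFAULT 'OPEN'
);

-- The business rule lives in the database itself, so it cannot be
-- bypassed or implemented inconsistently by individual client programs.
CREATE TRIGGER orders_quantity_check
BEFORE INSERT ON orders
WHEN NEW.quantity <= 0
BEGIN
    SELECT RAISE(ABORT, 'quantity must be positive');
END;
""")


def place_order(quantity):
    """Try to insert an order; return True if the database accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO orders (quantity) VALUES (?)", (quantity,))
        return True
    except sqlite3.DatabaseError:
        # Raised when the trigger aborts the insert.
        return False


accepted = place_order(5)   # valid order
rejected = place_order(0)   # violates the rule enforced by the trigger
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Any other client that inserts into `orders` meets the same trigger, which is the robustness the text attributes to active databases.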
Operational Data Stores
An Operational Data Store or ODS is a collection of integrated databases designed to
support the monitoring of operations. Unlike the databases of OLTP applications (that are
function oriented), the Operational Data Store contains subject-oriented, volatile, and current
enterprise-wide detailed information; it serves as a system of record that provides
comprehensive views of data in operational systems.
Data are transformed and integrated into a consistent, unified whole as they are obtained
from legacy and other operational systems to provide business users with an integrated and
current view of operations (see Figure 1.5). Data in the Operational Data Store are constantly
refreshed so that the resulting image reflects the latest state of operations.
[Figure: Legacy Systems X, Y and Z and other systems feed the Integration and Transformation of Legacy Data, which populates the Operational Data Store]
Figure 1.5. Legacy Systems and the Operational Data Store
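The integration and transformation step can be sketched in a few lines of Python. The example assumes two hypothetical legacy feeds that describe the same customer with different field names, key formats and codes; the field names, the status code table and the `integrate` function are all illustrative assumptions, not part of any particular product.

```python
# Hypothetical records from two legacy systems that describe the same
# customer with different field names, key formats and codes.
legacy_x = [{"CUST_NO": "0042", "NAME": "ACME LTD", "STAT": "A"}]
legacy_y = [{"customer_id": 42, "balance_cents": 125000}]

STATUS_CODES = {"A": "active", "I": "inactive"}  # assumed code table


def integrate(x_rows, y_rows):
    """Merge both feeds into one subject-oriented record per customer."""
    ods = {}
    for row in x_rows:
        cid = int(row["CUST_NO"])  # normalize the key format
        ods.setdefault(cid, {"customer_id": cid}).update(
            name=row["NAME"].title(),
            status=STATUS_CODES[row["STAT"]],  # decode the legacy status
        )
    for row in y_rows:
        cid = row["customer_id"]
        ods.setdefault(cid, {"customer_id": cid}).update(
            balance=row["balance_cents"] / 100,  # convert to currency units
        )
    return ods


ods = integrate(legacy_x, legacy_y)
```

Rerunning `integrate` whenever the source systems change corresponds to the constant refresh described above; a real Operational Data Store would of course persist the result in a database rather than in a dictionary.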
Flash Monitoring and Reporting
These tools provide business users with a dashboard, that is, meaningful online information on
the operational status of the enterprise, by making use of the data in the Operational Data
Store. The business user obtains a constantly refreshed, enterprise-wide view of operations
without creating unwanted interruptions or additional load on transaction processing systems.
Workflow Management and Groupware
Workflow management systems are tools that allow groups to communicate and
coordinate their work. Early incarnations of this technology supported group scheduling,
e-mail, online discussions, and resource sharing. More advanced implementations of this
technology are integrated with OLTP applications to support the execution of business
processes.
Decisional Needs
Data Warehouse
The data warehouse concept developed as IT professionals increasingly realized that
the structure of data required for transaction reporting was significantly different from the
structure required to analyze data.
DECISIONAL
• Data Warehouse
• Decision Support Application (OLAP)
The data warehouse was originally envisioned as a separate architectural component
that converted and integrated masses of raw data from legacy and other operational systems
and from external sources. It was designed to contain summarized, historical views of data
in production systems. This collection provides business users and decision-makers with a
cross-functional, integrated, subject-oriented view of the enterprise.
The introduction of the Operational Data Store has now caused the data warehouse
concept to evolve further. The data warehouse now contains summarized, historical views
of the data in the Operational Data Store. This is achieved by taking regular “snapshots”
of the contents of the Operational Data Store and using these snapshots as the basis for
warehouse loads.
In doing so, the enterprise obtains the information required for long-term and historical
analysis, decision-making, and planning.
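The snapshot mechanism can be illustrated with a small Python sketch built on the standard `sqlite3` module. The table names (`ods_inventory`, `dw_inventory_snapshot`), the dates and the quantities are hypothetical; the point is only that the Operational Data Store holds the current state, while each dated snapshot copied into the warehouse preserves history.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical tables: the ODS holds only the current state, while the
-- warehouse table accumulates one dated copy per snapshot.
CREATE TABLE ods_inventory (product TEXT PRIMARY KEY, on_hand INTEGER);
CREATE TABLE dw_inventory_snapshot (
    snapshot_date TEXT,
    product       TEXT,
    on_hand       INTEGER,
    PRIMARY KEY (snapshot_date, product)
);
""")


def take_snapshot(snapshot_date):
    """Copy the current ODS contents into the warehouse, stamped with a date."""
    with conn:
        conn.execute(
            "INSERT INTO dw_inventory_snapshot "
            "SELECT ?, product, on_hand FROM ods_inventory",
            (snapshot_date,),
        )


# The ODS is volatile: values are overwritten as operations change.
conn.execute("INSERT INTO ods_inventory VALUES ('widget', 100)")
take_snapshot("2024-01-31")
conn.execute("UPDATE ods_inventory SET on_hand = 60 WHERE product = 'widget'")
take_snapshot("2024-02-29")

# The warehouse keeps both states, which supports historical analysis.
history = conn.execute(
    "SELECT snapshot_date, on_hand FROM dw_inventory_snapshot "
    "WHERE product = 'widget' ORDER BY snapshot_date"
).fetchall()
```

After the two snapshots, the ODS holds only the latest quantity, whereas `history` holds one row per snapshot date.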
Decision Support Applications
Also known as OLAP (Online Analytical Processing), these applications provide
managerial users with meaningful views of past and present enterprise data. User-friendly
formats, such as graphs and charts, are frequently employed to quickly convey meaningful
data relationships.
Decision support processing typically does not involve the update of data; however,
some OLAP software allows users to enter data for budgeting, forecasting, and “what-if”
analysis.
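To make the notion of viewing data from several dimensions and at varying levels of detail more concrete, the following Python sketch totals a handful of made-up sales facts over whichever dimensions the user selects. The data, the dimension names and the `roll_up` function are illustrative assumptions; real OLAP tools perform this kind of aggregation over far larger data sets and with dedicated storage structures.

```python
from collections import defaultdict

# Hypothetical sales facts: (region, product, month, amount).
SALES = [
    ("South", "Widget", "Jan", 120.0),
    ("South", "Widget", "Feb", 90.0),
    ("South", "Gadget", "Jan", 40.0),
    ("North", "Widget", "Jan", 200.0),
]
DIMENSIONS = ("region", "product", "month")


def roll_up(facts, by):
    """Total the amount over the dimensions named in `by`."""
    positions = [DIMENSIONS.index(name) for name in by]
    totals = defaultdict(float)
    for *dims, amount in facts:
        key = tuple(dims[i] for i in positions)
        totals[key] += amount
    return dict(totals)


# A coarse view by region, then a more detailed view by region and month.
by_region = roll_up(SALES, ["region"])
by_region_month = roll_up(SALES, ["region", "month"])
```

Moving from `by_region` to `by_region_month` is a simple form of drilling down: the same facts are presented at a finer level of detail.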
Informational Needs
Informational Web Services and Scripts
Web browsers provide their users with a universal tool or front-end for accessing
information from web servers. They provide users with a new ability to both explore and
publish information with relative ease. Unlike other technologies, web technology makes
any user an instant publisher by enabling the distribution of knowledge and expertise, with
no more effort than it takes to record the information in the first place.
INFORMATIONAL
• Informational Web Services
By its very nature, this technology supports a paperless distribution process. Maintenance
and update of information is straightforward since the information is stored on the web server.
Virtual Corporation Needs
Transactional Web Services and Scripts
Several factors now make Internet technology and electronic commerce a realistic option
for enterprises that wish to use the Internet for business transactions.
VIRTUAL
CORPORATION
• Transactional Web Services
• Cost. The increasing affordability of Internet access allows businesses to establish
cost-effective and strategic links with business partners. This option was originally
open only to large enterprises through expensive, dedicated wide-area networks or
metropolitan area networks.
• Security. Improved security and encryption for sensitive data now provide customers
with the confidence to transact over the Internet. At the same time, improvements
in security provide the enterprise with the confidence to link corporate computing
environments to the Internet.
• User-friendliness. Improved user-friendliness and navigability from web technology
make Internet technology and its use within the enterprise increasingly popular.
Figure 1.6 recapitulates the architectural components for the different types of business
needs. The majority of the architectural components support the enterprise at the operational
level. However, separate components are now clearly defined for decisional and informational
purposes, and the virtual corporation becomes possible through Internet technologies.
Other Components
Other architectural components are so pervasive that most enterprises have begun to
take their presence for granted. One example is the group of applications collectively known
as office productivity tools (such as Microsoft Office or Lotus SmartSuite). Components of
this type can and should be used across the various layers of the Enterprise Architecture
and, therefore, are not described here as a separate item.
[Figure: architectural components grouped by business need]
• DECISIONAL: Data Warehouse; Decision Support Applications (OLAP)
• VIRTUAL CORPORATION: Transactional Web Services
• INFORMATIONAL: Informational Web Services
• OPERATIONAL: Legacy Systems; OLTP Application; Active Database; Operational Data Store; Flash Monitoring and Reporting; Workflow Management (Groupware)
Figure 1.6. InfoMotion Enterprise Architecture Components (Applicability to Business Needs)
1.5 ARCHITECTURE MIGRATION SCENARIOS
Given the typical path that most Enterprise Architectures have followed, an enterprise
will find itself facing one or more of the following six migration needs. A recommended
approach is given for fulfilling each need.
Legacy Integration
The Need
The integration of new and legacy systems is a constant challenge because of the
architectural templates upon which legacy systems were built. Legacy systems often attempt
to meet all types of information requirements through a single architectural component;
consequently, these systems are brittle and resistant to evolution.
Despite attempts to replace them with new applications, many legacy systems remain
in use for one of three reasons: they continue to meet a set of business requirements, they
represent significant investments that the enterprise cannot afford to scrap, or their wholesale
replacement would result in unacceptable levels of disruption to business operations.
The Recommended Approach
The integration of legacy systems with the rest of the architecture is best achieved
through the Operational Data Store and/or the data warehouse. Figure 1.7 modifies
Figure 1.5 to show the integration of legacy systems.
Legacy programs that produce and maintain summary information are migrated to the
data warehouse. Historical data are likewise migrated to the data warehouse. Reporting
functionality in legacy systems is moved either to the flash reporting and monitoring tools
(for operational concerns), or to decision support applications (for long-term planning and
decision-making). Data required for operational monitoring are moved to the Operational
Data Store. Table 1.1 summarizes the migration avenues.
The Operational Data Store and the data warehouse present IT professionals with a
natural path for legacy migration. By migrating legacy systems to these two
components, enterprises can gain a measure of independence from legacy components that
were designed with old, possibly obsolete, technology. Figure 1.8 highlights how this approach
fits into the Enterprise Architecture.
[Figure: Legacy Systems 1 through N and other systems feed an "Integration and Transformation of Legacy Data" step, which supplies the Operational Data Store and the Data Warehouse.]
Figure 1.7. Legacy Integration
Operational Monitoring
The Need
Today’s typical legacy systems are not suitable for supporting the operational monitoring
needs of an enterprise. Legacy systems are typically structured around functional or
organizational areas, in contrast to the cross-functional view required by operations monitoring.
Different and potentially incompatible technology platforms may have been used for different
systems. Data may be available in legacy databases but are not extracted in the format
required by business users. Or data may be available but may be too raw to be of use for
operational decision-making (further summarization, calculation, or conversion is required).
And lastly, several systems may contain data about the same item but may examine the data
from different viewpoints or at different time frames, therefore requiring reconciliation.
Table 1.1. Migration of Legacy Functionality to the Appropriate Architectural Component
Functionality in Legacy Systems     Should be Migrated to . . .
Summary Information                 Data Warehouse
Historical Data                     Data Warehouse
Operational Reporting               Flash Monitoring and Reporting Tools
Data for Operational Monitoring     Operational Data Store
Decisional Reporting                Decision Support Applications
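The mapping in Table 1.1 can be read as a simple lookup. The Python sketch below is illustrative only: the dictionary name and the `migration_target` helper are hypothetical and do not belong to any product discussed in this chapter.

```python
# Illustrative lookup built from Table 1.1; all identifiers are hypothetical.
MIGRATION_TARGETS = {
    "summary information": "Data Warehouse",
    "historical data": "Data Warehouse",
    "operational reporting": "Flash Monitoring and Reporting Tools",
    "data for operational monitoring": "Operational Data Store",
    "decisional reporting": "Decision Support Applications",
}


def migration_target(functionality: str) -> str:
    """Return the component that a kind of legacy functionality should move to."""
    key = functionality.strip().lower()
    if key not in MIGRATION_TARGETS:
        raise KeyError(f"No migration avenue listed for: {functionality!r}")
    return MIGRATION_TARGETS[key]
```

Calling `migration_target("Historical Data")` returns the Data Warehouse entry, while a kind of functionality that the table does not list raises an error instead of guessing a destination.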
[Figure: the InfoMotion architecture grid under the title "Legacy Integration", with rows Logical Client Layer, Logical Server Layer, and Legacy Layer and columns Virtual Corp., Informational, Decisional, and Operational.]
Figure 1.8. Legacy Integration: Architectural View
The Recommended Approach
An integrated view of current, operational information is required for the successful
monitoring of operations. Extending the functionality of legacy applications to meet this
requirement would merely increase the enterprise’s dependence on increasingly obsolete
technology. Instead, an Operational Data Store, coupled with flash monitoring and reporting
tools, as shown in Figure 1.9, meets this requirement without sacrificing architectural integrity.
Like a dashboard on a car, flash monitoring and reporting tools keep business users
apprised of the latest cross-functional status of operations. These tools obtain data from the
Operational Data Store, which is regularly refreshed with the latest information from legacy
and other operational systems.
Business users are consequently able to step in and correct problems in operations
while they are still small or, better, to prevent problems from occurring altogether. Once
alerted to a potential problem, the business user can manually intervene or make use of
automated tools (i.e., control panel mechanisms) to fine-tune operational processes. Figure
1.10 highlights how this approach fits into the Enterprise Architecture.
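The dashboard idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the record fields, the item names, and the alert threshold are all invented, and a real Operational Data Store would involve far more integration and transformation work than a dictionary update.

```python
from dataclasses import dataclass


@dataclass
class StockReading:
    """One operational record; the fields are assumptions for illustration."""
    source: str   # which operational system supplied the reading
    item: str     # cross-functional key shared by all sources
    on_hand: int  # units currently available


def refresh_ods(ods: dict[str, int], readings: list[StockReading]) -> None:
    """Overwrite the ODS entry for each item with the latest reading."""
    for reading in readings:
        ods[reading.item] = reading.on_hand


def flash_report(ods: dict[str, int], minimum: int) -> list[str]:
    """List items whose current level has fallen below the alert threshold."""
    return sorted(item for item, on_hand in ods.items() if on_hand < minimum)
```

Each refresh replaces the stored value with the newest one, so the flash report always reflects current status rather than history, which is the role the text assigns to the Operational Data Store.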
[Figure: Legacy Systems 1 through N and other systems feed an "Integration and Transformation of Legacy Data" step, which supplies the Operational Data Store; Flash Monitoring and Reporting draws on the Operational Data Store.]
Figure 1.9. Operational Monitoring
Process Implementation
The Need
In the early 1990s, the popularity of business process reengineering (BPR) caused businesses
to focus on the implementation of new and redefined business processes.
Raymond Manganelli and Mark Klein, in their book The Reengineering Handbook
(AMACOM, 1994, ISBN: 0-8144-0236-4), define BPR as “the rapid and radical redesign of
strategic, value-added business processes–and the systems, policies, and organizational
structures that support them–to optimize the work flow and productivity in an organization.”
Business processes are redesigned to achieve desired results in an optimum manner.
[Figure: the InfoMotion architecture grid under the title "Operational Monitoring", with rows Logical Client Layer, Logical Server Layer, and Legacy Layer and columns Virtual Corp., Informational, Decisional, and Operational.]
Figure 1.10. Operational Monitoring: Architectural View
The Recommended Approach
With BPR, the role of Information Technology shifted from simple automation to enabling
radically redesigned processes. Client/server technology, such as OLTP applications serviced
by active databases, is particularly suited to supporting this type of business need. Technology
advances have made it possible to build and modify systems quickly in response to changes
in business processes. New policies, procedures and controls are supported and enforced by
the systems.
In addition, workflow management systems can be used to supplement OLTP
applications. A workflow management system converts business activities into a goal-directed
process that flows through the enterprise in an orderly fashion (see Figure 1.11). The
workflow management system alerts users through the automatic generation of notification
messages or reminders and routes work so that the desired business result is achieved in
an expedited manner.
Figure 1.12 highlights how this approach fits into the Enterprise Architecture.
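The routing and notification behavior of a workflow management system can be sketched as follows. This is a toy model, not the design of any particular workflow product: the `WorkItem` class, its fields, and the message format are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class WorkItem:
    """A unit of work moving through an ordered list of (step, role) pairs."""
    name: str
    route: list[tuple[str, str]]
    position: int = 0
    notifications: list[str] = field(default_factory=list)

    def notify_current(self) -> None:
        """Generate a notification for whoever owns the current step."""
        step, role = self.route[self.position]
        self.notifications.append(
            f"{role}: '{self.name}' is waiting at step '{step}'"
        )

    def complete_step(self) -> bool:
        """Finish the current step and route the item to the next one.

        Returns True once every step in the route has been completed.
        """
        self.position += 1
        if self.position >= len(self.route):
            return True
        self.notify_current()
        return False
```

The route fixes the order in which work flows through the enterprise, and every hand-off produces a notification automatically, so nobody has to remember to pass the work along.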
Figure 1.11. Process Implementation
Decision Support
The Need
It is not possible to anticipate the information requirements of decision makers for the
simple reason that their needs depend on the business situation that they face. Decision-
makers need to review enterprise data from different dimensions and at different levels of
detail to find the source of a business problem before they can attack it. They likewise need
information for detecting business opportunities to exploit.
Decision-makers also need to analyze trends in the performance of the enterprise.
Rather than waiting for problems to present themselves, decision-makers need to proactively
mobilize the resources of the enterprise in anticipation of a business situation.
[Figure: the InfoMotion architecture grid under the title "Process Implementation", with rows Logical Client Layer, Logical Server Layer, and Legacy Layer and columns Virtual Corp., Informational, Decisional, and Operational.]
Figure 1.12. Process Implementation: Architectural View
Since these information requirements cannot be anticipated, the decision maker often
resorts to reviewing pre-designed inquiries or reports in an attempt to find or derive needed
information. Alternatively, the IT professional is pressured to produce an ad hoc report from
legacy systems as quickly as possible. An unlucky IT professional will find that the data
needed for the report are scattered throughout different legacy systems. An even unluckier
one may find that the processing required to produce the report takes a toll on the operations
of the enterprise.
These delays are not only frustrating for both the decision-maker and the IT professional,
but also dangerous for the enterprise. The information that eventually reaches the
decision-maker may be inconsistent, inaccurate, or, worse, obsolete.
The Recommended Approach
Decision support applications (or OLAP) that obtain data from the data warehouse are
recommended for this particular need. The data warehouse holds transformed and integrated
enterprise-wide operational data appropriate for strategic decision-making, as shown in
Figure 1.13. The data warehouse also contains data obtained from external sources, whenever
these data are relevant to decision-making.
[Figure: Legacy Systems 1 through N feed the Data Warehouse, which serves decision support tools labeled OLAP, Report Writers, EIS/DSS, Data Mining, Alert System, and Exception Reporting.]
Figure 1.13. Decision Support
Decision support applications analyze and make data warehouse information available
in formats that are readily understandable by decision-makers. Figure 1.14 highlights how
this approach fits into the Enterprise Architecture.
Hyperdata Distribution
The Need
Past informational requirements were met by making data available in physical form
through reports, memos, and company manuals. This practice resulted in an overflow of
documents providing much data and not enough information.
Paper-based documents also have the disadvantage of becoming dated. Enterprises
encountered problems in keeping different versions of related items synchronized. There
was a constant need to update, republish and redistribute documents.
[Figure: the InfoMotion architecture grid under the title "Decision Support", with rows Logical Client Layer, Logical Server Layer, and Legacy Layer and columns Virtual Corp., Informational, Decisional, and Operational.]
Figure 1.14. Decision Support: Architectural View
In response to this problem, enterprises made data available to users over a network
to eliminate the paper. It was hoped that users could selectively view the data whenever
they needed it. This approach likewise proved to be insufficient because users still had to
navigate through a sea of data to locate the specific item of information that was needed.
The Recommended Approach
Users need the ability to browse through nonlinear presentations of data. Web technology
is particularly suitable to this need because of its extremely flexible and highly visual
method of organizing information (see Figure 1.15).
[Figure: linked content labeled "Corporate Forms, Training Materials"; "Company Profiles, Product, and Service Information"; and "Company Policies, Organizational Setup".]
Figure 1.15. Hyperdata Distribution
Web technology allows users to display charts and figures; navigate through large
amounts of data; visualize the contents of database files; seamlessly navigate across charts,
data, and annotation; and organize charts and figures in a hierarchical manner. Users are
therefore able to locate information with relative ease. Figure 1.16 highlights how this
approach fits into the Enterprise Architecture.
[Figure: the InfoMotion architecture grid under the title "Hyperdata Distribution", with rows Logical Client Layer, Logical Server Layer, and Legacy Layer and columns Virtual Corp., Informational, Decisional, and Operational.]
Figure 1.16. Hyperdata Distribution: Architectural View
Virtual Corporation
The Need
A virtual corporation is an enterprise that has extended its business processes to
encompass both its key customers and suppliers. Its business processes are newly redesigned;
its product development or service delivery is accelerated to better meet customer needs and
preferences; its management practices promote new alignments between management and
labor, as well as new linkages among enterprise, supplier and customer. A new level of
cooperation and openness is created and encouraged between the enterprise and its key
business partners.
The Recommended Approach
Partnerships at the enterprise level translate into technological links between the
enterprise and its key suppliers or customers (see Figure 1.17). Information required by
each party is identified, and steps are taken to ensure that this data crosses organizational
boundaries properly. Some organizations seek to establish a higher level of cooperation with
their key business partners by jointly redesigning their business processes to provide greater
value to the customer.
Internet and web technologies are well suited to support redesigned, transactional
processes, thanks to decreasing Internet costs, improved security measures, and improved
user-friendliness and navigability. Figure 1.18 highlights how this approach fits into the Enterprise
Architecture.
[Figure: diagram labeled Supplier, Enterprise, and Customer.]
Figure 1.17. Virtual Corporation
[Figure: the InfoMotion architecture grid under the title "Virtual Corporation", with rows Logical Client Layer, Logical Server Layer, and Legacy Layer and columns Virtual Corp., Informational, Decisional, and Operational.]
Figure 1.18. Virtual Corporation: Architectural View
1.6 MIGRATION STRATEGY: HOW DO WE MOVE FORWARD?
The strategies presented in the previous section enable organizations to move from
their current technology architectures into the InfoMotion Enterprise Architecture. This
section describes the tasks for any migration effort.
Review the Current Enterprise Architecture
As simple as this may sound, the starting point is a review of the current Enterprise
Architecture. It is important to know what is already available before
planning further work.
The IT department or division should have this information readily available, although
it may not necessarily be expressed in terms of the architectural components identified
above. A short and simple exercise of mapping the current architecture of an enterprise to
the architecture described above should quickly highlight any gaps in the current architecture.
Identify Information Architecture Requirements
Knowing that the Enterprise IT Architecture has gaps is not sufficient. It is important
to know whether these can be considered real gaps when viewed within the context of the
enterprise’s requirements. Gaps should cause concern only if the absence of an architectural
component prevents the IT infrastructure from meeting present requirements or from
supporting long-term strategies.
For example, if transactional web scripts are not critical to an enterprise given its
current needs and strategies, there should be no cause for concern.
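The two review steps above amount to a set difference followed by a filter. The Python sketch below assumes the component names that appear in the InfoMotion architecture figures; the function names and the sample inputs used to exercise them are hypothetical.

```python
# Component names as they appear in the InfoMotion architecture figures.
INFOMOTION_COMPONENTS = {
    "Transactional Web Scripts", "Informational Web Scripts",
    "Decision Support Applications", "Flash Monitoring & Reporting",
    "OLTP Applications", "Workflow Management Clients",
    "Transactional Web Services", "Informational Web Services",
    "Data Warehouse", "Operational Data Store", "Active Database",
    "Workflow Management Services", "Legacy Systems",
}


def architecture_gaps(current: set[str]) -> set[str]:
    """Components of the reference architecture that the enterprise lacks."""
    return INFOMOTION_COMPONENTS - current


def real_gaps(current: set[str], required: set[str]) -> set[str]:
    """Gaps that matter: missing components that requirements call for."""
    return architecture_gaps(current) & required
```

An enterprise that lacks transactional web scripts but has no requirement for them will see that component in `architecture_gaps` and not in `real_gaps`, which matches the point that such a gap is no cause for concern.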
Develop a Migration Plan Based on Requirements
It is not advisable for an enterprise to use this list of architectural gaps to justify a
dramatic overhaul of its IT infrastructure; such an undertaking would be expensive and
would cause unnecessary disruption of business operations. Instead, the enterprise would do
well to develop a migration plan that consciously maps coming IT projects to the InfoMotion
Enterprise Architecture.
The Natural Migration Path
While developing the migration plan, the enterprise should consider the natural migration
path that the InfoMotion architecture implies, as illustrated in Figure 1.19.
[Figure: layers labeled Internet, Intranet, Client Server, and Legacy Integration.]
Figure 1.19. Natural Migration Roadmap
• The Legacy layer lies at the very core of the Enterprise Architecture. For most companies,
this core layer is where the majority of technology investments have been made.
It should also be the starting point of any architecture migration effort; that is, the
enterprise should start from this core technology before focusing its attention on
newer forms or layers of technology.
• The Legacy Integration layer insulates the rest of the Enterprise Architecture from
the growing obsolescence of the Legacy layer. It also provides the succeeding
technology layers with a more stable foundation for future evolution.
• Each of the succeeding technology layers (i.e., Client/Server, Intranet, Internet)
builds upon its predecessors.
• At the outermost layer, the public Internet infrastructure itself supports the
operations of the enterprise.
The Customized Migration Path
Depending on the priorities and needs of the enterprise, one or more of the migration
scenarios described in the previous section will be helpful starting points. The scenarios
provide generic roadmaps that address typical architectural needs.
The migration plan, however, must be customized to address the specific needs of the
enterprise. Each project defined in the plan must individually contribute to the enterprise
in the short term, while laying the groundwork for achieving long-term enterprise and IT
objectives.
By incrementally migrating its IT infrastructure (one component and one project at a
time), the enterprise will find itself slowly but surely moving towards a modern, resilient
Enterprise Architecture, with minimal and acceptable levels of disruption in operations.
Monitor and Update the Migration Plan
The migration plan must be monitored, and the progress of the different projects fed
back into the planning task. One must not lose sight of the fact that a modern Enterprise
Architecture is a moving target; new technology makes the continuous evolution of
the Enterprise Architecture inevitable.
In Summary
An enterprise has longevity in the business arena only when its products and services
are perceived by its customers to be of value.
Likewise, Information Technology has value in an enterprise only when its cost is
outweighed by its ability to increase and guarantee quality, improve service, cut costs or
reduce cycle time, as depicted in Figure 1.20.
$$\text{Value} = \frac{\text{Quality} \times \text{Service}}{\text{Cost} \times \text{Cycle Time}}$$
Figure 1.20. The Value Equation
The Enterprise Architecture is the foundation for all Information Technology efforts. It
therefore must provide the enterprise with the ability to:
• distill information of value from the data that surrounds it and that it continuously
generates (information/data); and
• get that information to the right people and processes at the right time (motion).
These requirements form the basis for the InfoMotion equation, shown in Figure 1.21.
$$\text{InfoMotion} = \frac{\text{Information}}{\text{Data}} \times \text{Motion}$$
Figure 1.21. The InfoMotion Equation
By identifying distinct architectural components and their interrelationships, the
InfoMotion Enterprise Architecture increases the capability of the IT infrastructure to meet
present business requirements while positioning the enterprise to leverage emerging trends,
such as data warehousing, in both business and technology. Figure 1.22 shows the InfoMotion
Enterprise Architecture, the elements of which we have discussed.
[Figure: the InfoMotion Enterprise Architecture grid, with rows Logical Client Layer, Logical Server Layer, and Legacy Layer and columns Virtual Corp., Informational, Decisional, and Operational. Components shown: Transactional Web Scripts; Informational Web Scripts; Decision Support Applications; Flash Monitoring & Reporting; OLTP Applications; Workflow Management Clients; Transactional Web Services; Informational Web Services; Data Warehouse; Operational Data Store; Active Database; Workflow Management Services; Legacy Systems.]
Figure 1.22. The InfoMotion Architecture
CHAPTER 2
DATA WAREHOUSE CONCEPTS
This chapter explains how computing has changed its focus from operational to decisional
concerns. It also defines data warehousing concepts and cites the typical reasons for building
data warehouses.
2.1 GRADUAL CHANGES IN COMPUTING FOCUS
In retrospect, it is easy to see how computing has shifted its focus from operational to
decisional concerns. The differences in operational and decisional information requirements
presented new challenges that old computing practices could not meet. Below, we elaborate
on how this change in computing focus became the impetus for the development of data
warehousing technologies.
Early Computing Focused on Operational Requirements
The Business Cycle (depicted in Figure 2.1) shows that any enterprise must operate at
three levels: operational (i.e., the day-to-day running of the business), tactical (i.e., the
definition of policy and the monitoring of operations) and strategic (i.e., the definition of
the organization’s vision, goals and objectives).
[Figure: the business cycle, with labels Strategic, Tactical, Operational, Policy, Operations (Operational Systems), and Monitoring (Decisional Systems).]
Figure 2.1. The Business Cycle
As noted in Chapter 1, much of the effort and money in computing has been
focused on meeting the operational business requirements of enterprises. After all, without
the OLTP applications that record thousands, even millions of discrete transactions each
day, it would not be possible for any enterprise to meet customer needs while enforcing
business policies consistently. Nor would it be possible for an enterprise to grow without
significantly expanding its manpower base.
With operational systems deployed and day-to-day information needs being met by the
OLTP systems, the focus of computing has in recent years shifted naturally to meeting
the decisional business requirements of an enterprise. Figure 2.1 illustrates the business
cycle as it is viewed today.
Decisional Requirements Cannot be Fully Anticipated
Unfortunately, it is not possible for IT professionals to anticipate the information
requirements of an enterprise’s decision-makers, for the simple reason that their information
needs and report requirements change as the business situation changes.
Decision-makers themselves cannot be expected to know their information requirements
ahead of time; they review enterprise data from different perspectives and at different levels
of detail to find and address business problems as the problems arise. Decision-makers also
need to look through business data to identify opportunities that can be exploited. They
examine performance trends to identify business situations that can provide competitive
advantage, improve profits, or reduce costs. They analyze market data and make the tactical
as well as strategic decisions that determine the course of the enterprise.
Operational Systems Fail to Provide Decisional Information
Since these information requirements cannot be anticipated, operational systems (which
correctly focus on recording and completing different types of business transactions) are
unable to provide decision-makers with the information they need. As a result, business
managers fall back on the time-consuming and often frustrating process of going through
operational inquiries or reports already supported by operational systems in an attempt to
find or derive the information they really need. Alternatively, IT professionals are pressured
to produce an ad hoc report from the operational systems as quickly as possible.
It will not be unusual for the IT professional to find that the data needed to produce
the report are scattered throughout different operational systems and must first be carefully
integrated. Worse, it is likely that the processing required to extract the data from each
operational system will demand so much of the system resources that the IT professional
must wait until non-operational hours before running the queries required to produce the
report.
Those delays are not only time-consuming and frustrating for both the IT professionals
and the decision-makers, but also dangerous for the enterprise. When the report is finally
produced, the data may be inconsistent, inaccurate, or obsolete. There is also the very real
possibility that this new report will trigger the request for another ad hoc report.
Decisional Systems have Evolved to Meet Decisional Requirements
Over the years, decisional systems have been developed and implemented in the hope
of meeting these information needs. Some enterprises have actually succeeded in developing
and deploying data warehouses within their respective organizations, long before the term
data warehouse became fashionable.
Most decisional systems, however, have failed to deliver on their promises. This book
introduces data warehousing technologies and shares lessons learnt from the successes and
failures of those who have been on the “bleeding edge.”
2.2 DATA WAREHOUSE CHARACTERISTICS AND DEFINITION
A data warehouse can be viewed as an information system with the following attributes:
• It is a database designed for analytical tasks, using data from multiple applications.
• It supports a relatively small number of users with relatively long interactions.
• Its usage is read-intensive.
• Its content is periodically updated (mostly additions).
• It contains current and historical data to provide a historical perspective of
information.
• It contains a few large tables.
• Each query frequently returns a large result set and involves frequent full table scans
and multi-table joins.
What is a data warehouse? William H. Inmon, in Building the Data Warehouse (QED
Technical Publishing Group, 1992, ISBN: 0-89435-404-3), defines a data warehouse as “a
collection of integrated subject-oriented databases designed to supply the information required
for decision-making.”
A more thorough look at the above definition yields the following observations.
Integrated
A data warehouse contains data extracted from the many operational systems of the
enterprise, possibly supplemented by external data. For example, a typical banking data
warehouse will require the integration of data drawn from the deposit systems, loan systems,
and the general ledger.
Each of these operational systems records different types of business transactions and
enforces the policies of the enterprise regarding these transactions. If each of the operational
systems has been custom built or an integrated system is not implemented as a solution,
then it is unlikely that these systems are integrated. Thus, Customer A in the deposit
system and Customer B in the loan system may be one and the same person, but there is
no automated way for anyone in the bank to know this. Customer relationships are managed
informally through relationships with bank officers.
A data warehouse brings together data from the various operational systems to provide
an integrated view of the customer and the full scope of his or her relationship with the
bank.
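The integration step in the banking example can be sketched as record matching across two systems. The matching rule used here (normalized name plus birth date) is an assumption made purely for illustration, as are the field names and the `integrate_customers` helper; real customer matching is usually far less tidy.

```python
from collections import defaultdict


def match_key(record: dict) -> tuple[str, str]:
    """Build a matching key from name and birth date (an assumed rule)."""
    name = " ".join(record["name"].lower().split())
    return (name, record["birth_date"])


def integrate_customers(deposits: list[dict], loans: list[dict]) -> dict:
    """Group deposit and loan records that appear to describe one person."""
    view = defaultdict(lambda: {"deposit_ids": [], "loan_ids": []})
    for record in deposits:
        view[match_key(record)]["deposit_ids"].append(record["id"])
    for record in loans:
        view[match_key(record)]["loan_ids"].append(record["id"])
    return dict(view)
```

Once the records are grouped under one key, the bank can see that a depositor and a borrower are the same person without relying on what an individual bank officer happens to know.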
Subject Oriented
Traditional operational systems focus on the data requirements of a department or
division, producing the much-criticized “stovepipe” systems of model enterprises. With the
advent of business process reengineering, enterprises began espousing process-centered teams
and case workers. Modern operational systems, in turn, have shifted their focus to the
operational requirements of an entire business process and aim to support the execution of
the business process from start to finish.
A data warehouse goes beyond traditional information views by focusing on enterprise-
wide subjects such as customers, sales, and profits. These subjects span both organizational
and process boundaries and require information from multiple sources to provide a complete
picture.
Databases
Although the term data warehousing technologies is used to refer to the gamut of
technology components that are required to plan, develop, manage, implement, and use a data
warehouse, the term data warehouse itself refers to a large, read-only repository of data.
At the very heart of every data warehouse lie the large databases that store the integrated
data of the enterprise, obtained from both internal and external data sources. The term
internal data refers to all data that are extracted from the operational systems of the
enterprise. External data are data provided by third-party organizations, including business
partners, customers, government bodies, and organizations that choose to make a profit by
selling their data (e.g., credit bureaus).
Also stored in the databases are the metadata that describe the contents of the data
warehouse. A more thorough discussion on metadata and their role in data warehousing is
provided in Chapter 3.
Required for Decision-Making
Unlike the databases of operational systems, which are often normalized to preserve
and maintain data integrity, a data warehouse is designed and structured in a denormalized
manner to better support the usability of the data warehouse. Users are better able to
examine, derive, summarize, and analyze data at various levels of detail, over different
periods of time, when using a denormalized data structure.
The database is denormalized to mimic a business user’s dimensional view of the business.
For example, while a finance manager is interested in the profitability of the various products
of a company, a product manager will be more interested in the sales of the product in the
various sales regions. In data warehousing parlance, users need to “slice and dice” through
different areas of the database at different levels of detail to obtain the information they
need. In this manner, a decision-maker can start with a high-level view of the business, then
drill down to get more detail on the areas that require his attention, or vice versa.
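To make "slice and dice" concrete, the Python sketch below sums a measure along whichever dimension the user picks, optionally restricted to one slice of the data. The table contents, column names, and the `total_by` helper are invented for illustration.

```python
from collections import defaultdict

# A small denormalized fact table: every row repeats its descriptive
# attributes. Products, regions, and amounts are invented.
SALES = [
    {"product": "Savings", "region": "Asia", "sales": 120, "profit": 30},
    {"product": "Savings", "region": "Europe", "sales": 80, "profit": 10},
    {"product": "Loans", "region": "Asia", "sales": 200, "profit": 90},
    {"product": "Loans", "region": "Europe", "sales": 50, "profit": 20},
]


def total_by(rows, dimension, measure, **filters):
    """Sum `measure` per value of `dimension`, keeping rows that match `filters`."""
    totals = defaultdict(int)
    for row in rows:
        if all(row[key] == value for key, value in filters.items()):
            totals[row[dimension]] += row[measure]
    return dict(totals)
```

A finance manager might call `total_by(SALES, "product", "profit")` to compare product profitability, while a product manager might call `total_by(SALES, "region", "sales", product="Loans")` to see one product's sales by region. Both questions are answered from the same rows without any joins.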
Each Unit of Data is Relevant to a Point in Time
Every data warehouse will inevitably have a Time dimension; each data item (also
called a fact or measure) in the data warehouse is time-stamped to support queries or
reports that require the comparison of figures from prior months or years.
The time-stamping of each fact also makes it possible for decision-makers to recognize
trends and patterns in customer or market behavior over time.
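Because every fact carries its own period, a comparison with a prior year reduces to two filtered sums. The following Python sketch assumes a hypothetical fact layout with `year`, `quarter`, and `sales` fields; the helper names are likewise invented.

```python
def total_sales(facts, year, quarter):
    """Add up the sales measure for all facts stamped with one period."""
    return sum(
        fact["sales"]
        for fact in facts
        if fact["year"] == year and fact["quarter"] == quarter
    )


def change_vs_prior_year(facts, year, quarter):
    """Fractional change against the same quarter one year earlier."""
    current = total_sales(facts, year, quarter)
    prior = total_sales(facts, year - 1, quarter)
    if prior == 0:
        raise ValueError("no prior-year sales to compare against")
    return (current - prior) / prior
```

Without the time stamp on each fact, the prior-year figure could not be recovered from the same table, and the comparison would need a separate, specially prepared extract.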
A Data Warehouse Contains both Atomic and Summarized Data
Data warehouses hold data at different levels of detail. Data at the most detailed level,
i.e., the atomic level, are used to derive the summarized aggregated values. Aggregates
(presummarized data) are stored in the warehouse to speed up responses to queries at
higher levels of granularity.
If the data warehouse stores data only at summarized levels, its users will not be able
to drill down on data items to get more detailed information. However, the storage of very
detailed data results in larger space requirements.
2.3 THE DYNAMIC, AD HOC REPORT
The ideal scenario for enterprise decision-makers (and for IT professionals) is to
have a repository of data and a set of tools that will allow decision-makers to create their
own set of dynamic reports. The term dynamic report refers to a report that can be quickly
modified by its user to present either greater or lesser detail, without any additional
programming required. Dynamic reports are the only kind of reports that provide true
ad hoc reporting capabilities. Figure 2.2 presents an example of a dynamic report.
For Current Year, 2Q

Sales Region      Targets (’000s)   Actuals (’000s)
Asia                       24,000            25,550
Europe                     10,000            12,200
North America               8,000             2,000
Africa                      5,600             6,200

Figure 2.2. The Dynamic Report–Summary View
A decision-maker should be able to start with a short report that summarizes the
performance of the enterprise. When the summary calls attention to an area that bears
closer inspecting, the decision-maker should be able to point to that portion of the report,
then obtain greater detail on it dynamically, on an as-needed basis, with no further
programming. Figure 2.3 presents a detailed view of the summary shown in Figure 2.2.
For Current Year, 2Q

Sales Region    Country         Targets (’000s)   Actuals (’000s)
Asia            Philippines              14,000            15,050
                Hong Kong                10,000            10,500
Europe          France                    4,000             4,050
                Italy                     6,000             8,150
North America   United States             1,000             1,500
                Canada                    7,000               500
Africa          Egypt                     5,600             6,200

Figure 2.3. The Dynamic Report–Detailed View
By providing business users with the ability to dynamically view more or less of the
data on an ad hoc, as-needed basis, the data warehouse eliminates delays in getting
information and removes the IT professional from the report-creation loop.
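The behavior of a dynamic report can be sketched in Python using the figures from Figures 2.2 and 2.3. The function name `report` and its parameters are hypothetical; a real front-end tool would issue queries against the warehouse rather than filter an in-memory list.

```python
# Rows transcribed from Figure 2.3: (region, country, target, actual), in thousands.
DETAIL = [
    ("Asia", "Philippines", 14_000, 15_050),
    ("Asia", "Hong Kong", 10_000, 10_500),
    ("Europe", "France", 4_000, 4_050),
    ("Europe", "Italy", 6_000, 8_150),
    ("North America", "United States", 1_000, 1_500),
    ("North America", "Canada", 7_000, 500),
    ("Africa", "Egypt", 5_600, 6_200),
]

def report(rows, level="region", region=None):
    """Build report lines at the requested level of detail.

    level="region" gives the summary view; level="country" gives the
    detailed view, optionally restricted to a single region.
    """
    if level == "country":
        return [
            (reg, country, target, actual)
            for reg, country, target, actual in rows
            if region is None or reg == region
        ]
    summary = {}
    for reg, _country, target, actual in rows:
        t, a = summary.get(reg, (0, 0))
        summary[reg] = (t + target, a + actual)
    return [(reg, t, a) for reg, (t, a) in summary.items()]

summary_view = report(DETAIL)
# Regions whose actuals fall short of target are candidates for drill-down.
shortfalls = [reg for reg, target, actual in summary_view if actual < target]
detail_view = report(DETAIL, level="country", region=shortfalls[0])
print(summary_view)
print(detail_view)
```

The same function serves both views; the user changes only the level of detail, which is the sense in which no further programming is required.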
2.4 THE PURPOSES OF A DATA WAREHOUSE
At this point, it is helpful to summarize the typical reasons why enterprises undertake
data warehousing initiatives.
To Provide Business Users with Access to Data
The data warehouse provides access to integrated enterprise data previously locked
away in unfriendly, difficult-to-access environments. Business users can now establish, with
minimal effort, a secured connection to the warehouse through their desktop PC. Security
is enforced by the warehouse front-end application, by the server database, or by
both.
Because of its integrated nature, a data warehouse spares business users from the need
to learn, understand, or access operational data in their native environments and data
structures.
To Provide One Version of the Truth
The data in the data warehouse are consistent and quality assured before being released
to business users. Since a common source of information is now used, the data warehouse
puts to rest all debates about the veracity of data used or cited in meetings. The data
warehouse becomes the common information resource for decisional purposes throughout
the organization.
Note that “one version of the truth” is often possible only after much discussion and
debate about the terms used within the organization. For example, the term customer can
have different meanings to different people—it is not unusual for some people to refer to
prospective clients as “customers,” while others in the same organization may use the term
“customers” to mean only actual, current clients.
While these differences may seem trivial at first glance, the subtle nuances that
exist depending on the context may result in misleading numbers and ill-informed decisions.
For example, when the Western Region sales manager asks for the number of customers,
he probably means the “number of customers from the Western Region,” not the “number
of customers served by the entire company.”
To Record the Past Accurately
Many of the figures and numbers that managers receive have little meaning unless
compared to historical figures. For example, reports that compare the company’s present
performance with that of the previous year are quite common. Reports that show the company’s
performance for the same month over the past three years are likewise of interest to
decision-makers.
The operational systems cannot meet this kind of information need, and for
good reason. A data warehouse should be used to record the past accurately, leaving the
OLTP systems free to focus on recording current transactions and balances. Actual historical
values are neither stored on the operational system nor derived by adding or subtracting
transaction values against the latest balance. Instead, historical data are loaded and integrated
with other data in the warehouse for quick access.
To Slice and Dice Through Data
As stated earlier in this chapter, dynamic reports allow users to view warehouse data
from different angles, at different levels of detail. Business users with the means and the
ability to slice and dice through warehouse data can actively meet their own information
needs.
The ready availability of different data views also improves business analysis by reducing
the time and effort required to collect, format, and distill information from data.
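The two operations can be illustrated with a small Python sketch. The fact rows and the function names `slice_cube` and `dice_cube` are hypothetical; the sketch assumes the usual OLAP reading in which a slice fixes one dimension to a single value and a dice restricts several dimensions to sets of values.

```python
# Hypothetical fact rows with three dimensions and one measure.
FACTS = [
    {"region": "Asia", "product": "Widget", "quarter": "Q1", "sales": 10},
    {"region": "Asia", "product": "Gadget", "quarter": "Q2", "sales": 20},
    {"region": "Europe", "product": "Widget", "quarter": "Q2", "sales": 30},
    {"region": "Europe", "product": "Gadget", "quarter": "Q1", "sales": 40},
    {"region": "Africa", "product": "Widget", "quarter": "Q2", "sales": 50},
]

def slice_cube(rows, dimension, value):
    """Slice: fix a single dimension to one value."""
    return [r for r in rows if r[dimension] == value]

def dice_cube(rows, **allowed):
    """Dice: keep rows whose dimensions fall within the allowed value sets."""
    return [r for r in rows if all(r[d] in vals for d, vals in allowed.items())]

def total(rows, measure="sales"):
    """Sum one measure over the selected rows."""
    return sum(r[measure] for r in rows)

q2 = slice_cube(FACTS, "quarter", "Q2")
widgets_asia_europe = dice_cube(FACTS, product={"Widget"}, region={"Asia", "Europe"})
print(total(q2), total(widgets_asia_europe))
```

A finance manager and a product manager could run different slices and dices over the same rows, each obtaining the view relevant to their own question.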
To Separate Analytical and Operational Processing
Decisional processing and operational information processing have totally divergent
architectural requirements. Attempts to meet both decisional and operational information
needs through the same system or through the same system architecture merely increase
the brittleness of the IT architecture and create system maintenance nightmares.
Data warehousing disentangles analytical from operational processing by providing a
separate system architecture for decisional implementations. This makes the overall IT
architecture of the enterprise more resilient to changing requirements.
To Support the Reengineering of Decisional Processes
At the end of each BPR initiative come the projects required to establish the technological
and organizational systems to support the newly reengineered business process.
Although reengineering projects have traditionally focused on operational processes,
data warehousing technologies make it possible to reengineer decisional business processes
as well. Data warehouses, with their focus on meeting decisional business requirements, are
the ideal systems for supporting reengineered decisional business processes.
2.5 DATA MARTS
A discussion of data warehouses is not complete without a note on data marts. The
concept of the data mart has caused a lot of excitement and attracted much attention in the
data warehouse industry. Mostly, data marts are presented as an inexpensive alternative
to a data warehouse that takes significantly less time and money to build. However, the
term data mart means different things to different people. A rigorous definition of this term
is a data store that is subsidiary to a data warehouse of integrated data. The data mart is
directed at a partition of data (often called a subject area) that is created for the use of a
dedicated group of users. A data mart might, in fact, be a set of denormalized, summarized,
or aggregated data. Sometimes, such a set could be placed on the data warehouse database
rather than a physically separate store of data. In most instances, however, the data mart
is a physically separate store of data and is normally resident on a separate database server,
often on a local area network. In some cases, the data mart comprises relational OLAP technology, which creates highly
denormalized star schema relational designs or hypercubes of data for analysis by groups
of users with a common interest in a limited portion of the database. In other cases, the data
warehouse architecture may incorporate data mining tools that extract sets of data for a
particular type of analysis. All these types of data marts, called dependent data marts
because their data content is sourced from the data warehouse, have a high value because
no matter how many are deployed and no matter how many different enabling technologies
are used, the different users are all accessing the information views derived from the same
single integrated version of the data.
Unfortunately, the misleading statements about the simplicity and low cost of data
marts sometimes result in organizations or vendors incorrectly positioning them as an
alternative to the data warehouse. This viewpoint defines independent data marts that in
fact represent fragmented point solutions to a range of business problems in the enterprise.
This type of implementation should rarely be deployed in the context of an overall technology
or applications architecture. Indeed, it is missing the ingredient that is at the heart of the
data warehousing concept: data integration. Each independent data mart makes its own
assumptions about how to consolidate the data, and the data across several data marts may
not be consistent.
Moreover, the concept of an independent data mart is dangerous – as soon as the first
data mart is created, other organizations, groups, and subject areas within the enterprise
embark on the task of building their own data marts. As a result, an environment is created
in which multiple operational systems feed multiple non-integrated data marts that are
often overlapping in data content, job scheduling, connectivity, and management. In other
words, the complex many-to-one problem of building a data warehouse from
operational and external data sources is transformed into a many-to-many sourcing and
management nightmare.
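A back-of-the-envelope count shows how quickly the sourcing problem grows. The Python sketch below assumes the worst case, in which every independent data mart extracts from every operational source; this assumption and the function names are illustrative and do not come from the text.

```python
def feeds_with_warehouse(num_sources, num_marts):
    """Each source loads the warehouse once; each dependent mart is fed from it."""
    return num_sources + num_marts

def feeds_with_independent_marts(num_sources, num_marts):
    """Worst case: every independent mart extracts from every source."""
    return num_sources * num_marts

sources, marts = 6, 5
print(feeds_with_warehouse(sources, marts))          # feeds to build and maintain
print(feeds_with_independent_marts(sources, marts))  # feeds in the worst case
```

Under this assumption, the number of feeds grows with the sum of sources and marts when a warehouse sits in between, and with their product when it does not.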
Another consideration against independent data marts is related to the potential
scalability problem: the first simple and inexpensive data mart was most probably designed
without any serious consideration of scalability (for example, an expensive parallel
computing platform for an “inexpensive” and “small” data mart would not be considered).
But, as usage begets usage, the initial small data mart needs to grow (i.e., in data sizes and
the number of concurrent users), without any ability to do so in a scalable fashion.
Nevertheless, the point-solution independent data mart is not necessarily a bad thing,
and it is often a necessary and valid solution to a pressing business problem, thus achieving
the goal of rapid delivery of enhanced decision support functionality to end users. The
business drivers underlying such developments include:
• Extremely urgent user requirements.
• The absence of a budget for a full data warehouse strategy.
• The absence of a sponsor for an enterprise-wide decision support strategy.
• The decentralization of business units.
• The attraction of easy-to-use tools and a mind-sized project.
To address data integration issues associated with data marts, the recommended
approach proposed by Ralph Kimball is as follows. For any two data marts in an enterprise,
the common dimensions must conform to the equality and roll-up rule, which states that
these dimensions are either the same or that one is a strict roll-up of another.
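One simplified way to read the equality and roll-up rule is sketched below in Python. Here a dimension is modeled as a set of member keys, and a roll-up as a mapping from each fine-grained member to exactly one coarser member. This representation, the function name `conforms`, and the sample date keys are assumptions made for illustration, not Kimball's own formulation.

```python
def conforms(dim_a, dim_b, rollup=None):
    """Check a simplified form of the equality and roll-up rule.

    dim_a and dim_b are sets of member keys. rollup, if given, maps each
    member of dim_a to the member of dim_b that it rolls up to.
    """
    if dim_a == dim_b:
        return True  # equality: both marts share the same dimension
    if rollup is None:
        return False
    # Strict roll-up: every fine-grained member has exactly one parent,
    # and the parents are exactly the members of the coarser dimension.
    if set(rollup) != dim_a:
        return False
    return set(rollup.values()) == dim_b

# Hypothetical example: one mart keeps dates by day, the other by month.
days = {"2024-01-15", "2024-01-16", "2024-02-01"}
months = {"2024-01", "2024-02"}
day_to_month = {d: d[:7] for d in days}

print(conforms(days, months, day_to_month))  # day rolls up strictly to month
print(conforms(days, {"FY24-P1"}))           # no equality and no roll-up given
```

Two marts whose shared dimensions pass such a check can be compared along that dimension; marts that fail it would need their dimensions reconciled first.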
Thus, in a retail store chain, if the purchase orders database is one data mart and the
sales database is another data mart, the two data marts will form a coherent part of an