DATA WAREHOUSING AND DATA MINING. M. Mageshwari, Lecturer, Department of CE, M.S.P.V.L Polytechnic College
Course Overview. The course: what and how. 0. Introduction. I. Data Warehousing. II. Decision Support and OLAP. III. Data Mining. IV. Looking Ahead. Demos and Labs.
A producer wants to know… Which are our lowest/highest margin customers? Who are my customers and what products are they buying? Which customers are most likely to go to the competition? What impact will new products/services have on revenue and margins? What product promotions have the biggest impact on revenue? What is the most effective distribution channel?
Data, data everywhere, yet... I can’t find the data I need: data is scattered over the network; many versions, subtle differences. I can’t get the data I need: need an expert to get the data. I can’t understand the data I found: available data is poorly documented. I can’t use the data I found: results are unexpected; data needs to be transformed from one form to another.
What is a Data Warehouse? A single, complete and consistent store of data obtained from a variety of different sources, made available to end users in a form they can understand and use in a business context.
What are the users saying... Data should be integrated across the enterprise. Summary data has a real value to the organization. Historical data holds the key to understanding data over time. What-if capabilities are required.
What is Data Warehousing? A process of transforming data into information and making it available to users in a timely enough manner to make a difference. (Diagram: Data to Information)
Evolution. 60’s: Batch reports: hard to find and analyze information; inflexible and expensive, reprogram for every new request. 70’s: Terminal-based DSS (decision support systems) and EIS (executive information systems): still inflexible, not integrated with desktop tools.
Data Warehouse Structure. base customer (1985-87): custid, from date, to date, name, phone, dob. base customer (1988-90): custid, from date, to date, name, credit rating, employer. customer activity (1986-89): monthly summary. customer activity detail (1987-89): custid, activity date, amount, clerk id, order no. customer activity detail (1990-91): custid, activity date, amount, line item no, order no.
Definition of DSS. A decision support system (DSS) is defined as a system that helps decision makers at various levels to make decisions. This system uses data, analytical models and user-friendly software for making decisions.
Definition of EIS. An executive information system (EIS) is defined as a system that helps high-level executives to make policy decisions. This system uses higher-level data, analytical models and user-friendly software for making decisions.
Evolution. 80’s: Desktop data access and analysis tools (query tools, spreadsheets, GUIs): easier to use, but only access operational databases. 90’s: Data warehousing with integrated OLAP (online analytical processing) engines and tools.
Data Warehousing: it is a process. A technique for assembling and managing data from various sources for the purpose of answering business questions, thus making decisions that were not previously possible. A decision support database maintained separately from the organization’s operational database.
Characteristics of Data Warehouse. A data warehouse is a subject-oriented, integrated, time-varying, non-volatile collection of data that is used primarily in organizational decision making.
Subject-Oriented. A data warehouse is organized around the major subjects of the organization, such as customer, supplier, product and sales. It provides a simple and concise view of a particular subject by excluding data that are not useful to the decision support process.
Integrated. A data warehouse is constructed by integrating multiple sources of data, such as relational databases, flat files and on-line transaction records. Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attributes and so on.
Time Variant. A data warehouse maintains records of both historical and current data, so it can provide information from a historical perspective.
Non Volatile. Once the data warehouse is loaded with data, the stored data is not modified or deleted in normal operation; new data is only added.
Explorers, Farmers and Tourists. Explorers: seek out the unknown and previously unsuspected rewards hiding in the detailed data. Farmers: harvest information from known access paths. Tourists: browse information harvested by farmers.
Application-Orientation vs. Subject-Orientation. Application-orientation (operational database): Loans, Credit Card, Trust, Savings. Subject-orientation (data warehouse): Customer, Vendor, Product, Activity.
Functioning of Data Warehousing. (Diagram labels: Data Source; Cleaning; Transformation; Data Warehouse; New; Update)
Data Collection. Data warehousing collects data from various data sources, such as relational databases, flat files and on-line records. The collected data are stored in a database inside the warehouse. The type of data collection used depends on the architecture of the warehouse.
Integration. Each data source uses a different schema. The data warehouse gets data from different sources with different schemas and converts them into a common integrated schema.
Star Schema. A single fact table and, for each dimension, one dimension table. Does not capture hierarchies directly. (Diagram: dimension tables time, prod, cust and city around a fact table with columns date, custno, prodno, cityname, ...)
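The star layout can be tried out with a small in-memory database. The sketch below is only an illustration: the table names (time_dim, cust_dim, prod_dim, city_dim, sales_fact), the extra columns and the sample rows are assumptions, not part of the slides. Note how the city-to-region hierarchy is stored flat inside the city dimension, which is what "does not capture hierarchies directly" means.

```python
import sqlite3

# Hypothetical star schema: one fact table referencing four dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (date TEXT PRIMARY KEY, month TEXT, year INTEGER);
CREATE TABLE cust_dim (custno INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE prod_dim (prodno INTEGER PRIMARY KEY, prodname TEXT, category TEXT);
-- The region is kept as a plain column: the hierarchy is not modelled separately.
CREATE TABLE city_dim (cityname TEXT PRIMARY KEY, region TEXT);
CREATE TABLE sales_fact (
    date     TEXT    REFERENCES time_dim(date),
    custno   INTEGER REFERENCES cust_dim(custno),
    prodno   INTEGER REFERENCES prod_dim(prodno),
    cityname TEXT    REFERENCES city_dim(cityname),
    amount   REAL
);
""")
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?)",
                 [("2024-01-05", "2024-01", 2024), ("2024-02-10", "2024-02", 2024)])
conn.executemany("INSERT INTO cust_dim VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])
conn.executemany("INSERT INTO prod_dim VALUES (?, ?, ?)",
                 [(1, "Cola", "Beverage"), (2, "Soap", "Toiletries")])
conn.executemany("INSERT INTO city_dim VALUES (?, ?)",
                 [("Chennai", "South"), ("Delhi", "North")])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", [
    ("2024-01-05", 1, 1, "Chennai", 100.0),
    ("2024-01-05", 2, 2, "Chennai", 40.0),
    ("2024-02-10", 1, 1, "Delhi", 60.0),
    ("2024-02-10", 2, 1, "Chennai", 25.0),
])

# Every dimension is one join away from the fact table.
rows = conn.execute("""
    SELECT c.region, p.category, SUM(f.amount)
    FROM sales_fact f
    JOIN city_dim c ON c.cityname = f.cityname
    JOIN prod_dim p ON p.prodno = f.prodno
    GROUP BY c.region, p.category
    ORDER BY c.region, p.category
""").fetchall()
print(rows)
```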
Snowflake Schema. Represents the dimensional hierarchy directly by normalizing tables. Easy to maintain and saves storage. (Diagram: dimension tables time, prod, cust and city around a fact table with columns date, custno, prodno, cityname, ...; the city dimension links to a further region table)
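A minimal sketch of the snowflake idea, assuming hypothetical table names (region_dim, city_dim, sales_fact) and sample rows: the region is moved out of the city dimension into its own table, so the hierarchy city to region is explicit, at the price of one extra join per level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized hierarchy: each region name is stored exactly once.
CREATE TABLE region_dim (region_id INTEGER PRIMARY KEY, region_name TEXT);
CREATE TABLE city_dim (
    cityname  TEXT PRIMARY KEY,
    region_id INTEGER REFERENCES region_dim(region_id)
);
CREATE TABLE sales_fact (
    date TEXT, custno INTEGER, prodno INTEGER,
    cityname TEXT REFERENCES city_dim(cityname),
    amount REAL
);
""")
conn.executemany("INSERT INTO region_dim VALUES (?, ?)", [(1, "South"), (2, "North")])
conn.executemany("INSERT INTO city_dim VALUES (?, ?)",
                 [("Chennai", 1), ("Madurai", 1), ("Delhi", 2)])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", [
    ("2024-01-05", 1, 1, "Chennai", 100.0),
    ("2024-01-06", 2, 1, "Madurai", 50.0),
    ("2024-01-07", 3, 2, "Delhi", 60.0),
])

# Reaching the region level now takes two joins: fact -> city -> region.
by_region = conn.execute("""
    SELECT r.region_name, SUM(f.amount)
    FROM sales_fact f
    JOIN city_dim c   ON c.cityname = f.cityname
    JOIN region_dim r ON r.region_id = c.region_id
    GROUP BY r.region_name
    ORDER BY r.region_name
""").fetchall()
print(by_region)
```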
Data Warehouse for Decision Support & OLAP. Using information technology to help the knowledge worker make faster and better decisions. Which of my customers are most likely to go to the competition? What product promotions have the biggest impact on revenue? How did the share price of software companies correlate with profits over the last 10 years?
Decision Support. Used to manage and control the business. Data is historical or point-in-time. Optimized for inquiry rather than update. Use of the system is loosely defined and can be ad hoc. Used by managers and end users to understand the business and make judgments.
OLAP (Online Analytical Processing). A data warehouse stores data, but OLAP transforms the data warehouse data into specific, meaningful information. OLAP therefore provides a user-friendly environment for interactive data analysis.
OLAP. (Diagram labels: User; Front End Tool; OLAP Server; Data Warehouse; Request; SQL; Result set; Result)
OLAP Operations on Multidimensional Data. Roll-up (group), Drill-down (less), Slice and dice (piece), Pivot (rotate).
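The operations named above can be sketched on a tiny fact list in plain Python. Everything here is an assumption made for illustration (the column names, the sample rows and the helper aggregate); real OLAP servers implement these operations very differently.

```python
from collections import defaultdict

# Hypothetical sales facts with two hierarchies: city -> region, month -> quarter.
COLS = ("product", "city", "region", "month", "quarter", "amount")
FACTS = [
    ("Cola", "Chennai", "S", "Jan", "Q1", 10),
    ("Cola", "Delhi",   "N", "Jan", "Q1", 20),
    ("Milk", "Chennai", "S", "Feb", "Q1", 30),
    ("Milk", "Delhi",   "N", "Apr", "Q2", 40),
    ("Cola", "Chennai", "S", "Apr", "Q2", 50),
]
rows = [dict(zip(COLS, f)) for f in FACTS]

def aggregate(rows, dims):
    """Sum `amount` grouped by the given dimension columns."""
    totals = defaultdict(int)
    for r in rows:
        totals[tuple(r[d] for d in dims)] += r["amount"]
    return dict(totals)

# Roll-up: climb the time hierarchy (month -> quarter); fewer, coarser groups.
by_quarter = aggregate(rows, ["quarter"])
# Drill-down: descend again (quarter -> month); more detailed groups.
by_month = aggregate(rows, ["quarter", "month"])
# Slice: fix one dimension to a single value.
cola_slice = [r for r in rows if r["product"] == "Cola"]
# Dice: restrict two or more dimensions at once.
diced = [r for r in rows if r["region"] == "S" and r["quarter"] == "Q1"]

print(by_quarter)
print(by_month)
```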
TYPES OF OLAP. MOLAP (Multidimensional OLAP). ROLAP (Relational OLAP).
Multi-dimensional Data. “Hey… I sold $100M worth of goods.” Dimensions: Product, Region, Time. Hierarchical summarization paths: Product: Industry > Category > Product. Region: Country > Region > City > Office. Time: Year > Quarter > Month or Week > Day. (Cube diagram axes: Month 1 to 7; Product: Toothpaste, Juice, Cola, Milk, Cream, Soap; Region: W, S, N)
Data Warehouse Architecture. (Diagram labels: sources: Relational Databases, Legacy Data, Purchased Data, ERP Systems; Extraction; Cleansing; Optimized Loader; Data Warehouse Engine; Metadata Repository; Analyze; Query)
Architecture of data warehousing. (Diagram labels: External data; Data Acquisition; Data Manager; Warehouse data; Data Dictionary; Information Directory; Middleware; Design; Management; Data Access)
Architecture of
Design Component. The data warehouse designer designs the database of the data warehouse, and the warehouse administrator manages the data warehouse. The designer and the administrator use the design component to design and store data.
Types of design. Bottom-up design: business value can be returned as quickly as the first data marts can be created. Top-down design: atomic data, that is, data at the lowest level of detail, are stored in the data warehouse. Hybrid design.
Data Manager Component. The database in the data warehouse uses the data manager component for managing and accessing the data stored in the data warehouse. (RDBMS, MDBMS)
Management Component. Administering data acquisition operations. Managing backup copies of the data. Recovering lost data. Providing security to the data stored in the data warehouse. Authorizing access to the data stored in the data warehouse.
Data Acquisition Component. This component acquires data from various sources by using the data acquisition applications. The data acquisition applications are based on rules that are defined by the data warehouse developers.
The operations performed during data clean-up are: restructuring the records and fields of the database tables; removing irrelevant and redundant data; obtaining and adding missing data; verifying the integrity and consistency of the data.
The operations performed on the data for enhancement are: decoding and translating the values in fields; summarizing data; calculating derived values.
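A small sketch of clean-up followed by enhancement, under stated assumptions: the record layout, the gender code table GENDER_CODES and the fallback DEFAULT_PRICE are invented for the example and do not come from the slides. Clean-up removes a redundant record and fills a missing value; enhancement decodes a field, calculates a derived value and summarizes.

```python
# Hypothetical raw records as acquired from a source system.
FIELDS = ("custid", "gender", "qty", "unit_price")
raw = [
    {"custid": 1, "gender": "m", "qty": 2, "unit_price": 5.0},
    {"custid": 1, "gender": "m", "qty": 2, "unit_price": 5.0},   # redundant duplicate
    {"custid": 2, "gender": "f", "qty": 1, "unit_price": None},  # missing value
    {"custid": 3, "gender": "f", "qty": 4, "unit_price": 2.5},
]
GENDER_CODES = {"m": "male", "f": "female"}  # assumed decoding table
DEFAULT_PRICE = 5.0                          # assumed fallback for missing prices

def clean(records):
    """Remove redundant records and add missing data."""
    seen, out = set(), []
    for r in records:
        key = tuple(r[f] for f in FIELDS)
        if key in seen:
            continue  # drop exact duplicates
        seen.add(key)
        fixed = dict(r)
        if fixed["unit_price"] is None:
            fixed["unit_price"] = DEFAULT_PRICE
        out.append(fixed)
    return out

def enhance(records):
    """Decode coded fields and calculate derived values."""
    out = []
    for r in records:
        e = dict(r)
        e["gender"] = GENDER_CODES[e["gender"]]   # decoding and translating
        e["total"] = e["qty"] * e["unit_price"]   # derived value
        out.append(e)
    return out

cleaned = clean(raw)
enhanced = enhance(cleaned)

# Summarizing: total sales per decoded gender value.
summary = {}
for r in enhanced:
    summary[r["gender"]] = summary.get(r["gender"], 0.0) + r["total"]
print(summary)
```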
Information Directory Component. This component helps end users to know the details of the data stored in the data warehouse. This is done with the help of data about the data, called metadata. Metadata covers technical data and business data.
Middleware Component. This component connects to the local databases. An analytical server is used to analyze multidimensional data. Intelligent data warehousing middleware controls access to the warehouse database.
Data Mart. A data mart is a database that contains the data needed by a small group of users for their own department's needs. Types: dependent data mart, independent data mart.
Difference between data warehouse and data mart
Data warehouse: suitable to support an entire corporate environment. Data mart: useful for small organizations with very few departments.
Data warehouse: supports the entire information requirement of an organization. Data mart: supports the information requirement of a department in an organization.
Data warehouse: has a large data model, wider implementation, large data volume and more users. Data mart: has a small data model, shorter implementation, less data and fewer users.
Note: if you listen to some vendors, you may be left thinking that building data warehouses is a waste of time. Data mart vendors who tell you this are looking out for their own best interests.
Advantages of data mart. Since each department has its own data mart, the departments can summarize, sort, select and structure their own department’s data; this will not be confused with any other department's data. The department can do whatever DSS processing it wants. The processing cost and storage are less than for the data warehouse. The department can select software for its data mart that is powerful enough to fit its needs.
Data warehousing life cycle. (Cycle diagram stages: Design, Prototype, Deploy, Operate, Enhance)
Data Modeling (Multi-dimensional Database). “Hey… I sold $100M worth of goods.” Dimensions: Product, Region, Period. Hierarchical summarization paths: Product: Industry > Category > Product. Region: Country > Region > City > Office. Period: Year > Quarter > Month or Week > Day. (Cube diagram axes: Month 1 to 7; Product: Toothpaste, Juice, Cola, Milk, Cream, Soap; Region: W, S, N)
Building of data warehouse. The builder must forecast the usage of the warehouse by the users. The design should support accessing data with any meaningful values of the attributes. To build a good data warehouse, the data acquisition process must follow the steps given below: 1. Extract the data from multiple heterogeneous sources. 2. Format the data for consistency within the warehouse. 3. Clean the data to ensure validity. 4. Convert the data from the relational, object-oriented or hierarchical model to a multidimensional model. 5. Load the data into the warehouse. Good monitoring tools are necessary to recover from an incorrect load.
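The conversion and load steps can be illustrated with a toy example. This is a sketch under assumptions: the axis lists, the nested-list cube layout and the total-based check in load are invented for illustration, and the check is only a stand-in for real load monitoring.

```python
# Hypothetical relational rows, assumed already extracted, formatted and cleaned.
rows = [
    {"product": "Cola", "region": "N", "month": 1, "amount": 10},
    {"product": "Cola", "region": "N", "month": 1, "amount": 5},
    {"product": "Milk", "region": "S", "month": 2, "amount": 7},
]
PRODUCTS = ["Cola", "Milk"]
REGIONS = ["N", "S"]
MONTHS = [1, 2]

def to_cube(rows):
    """Convert relational rows into a product x region x month array."""
    cube = [[[0 for _ in MONTHS] for _ in REGIONS] for _ in PRODUCTS]
    for r in rows:
        p = PRODUCTS.index(r["product"])
        g = REGIONS.index(r["region"])
        m = MONTHS.index(r["month"])
        cube[p][g][m] += r["amount"]
    return cube

def load(warehouse, cube, expected_total):
    """Load the cube only if its grand total matches the source total."""
    total = sum(cell for plane in cube for line in plane for cell in line)
    if total != expected_total:
        raise ValueError("incorrect load: totals do not match the source")
    warehouse["sales_cube"] = cube
    return total

warehouse = {}
cube = to_cube(rows)
loaded_total = load(warehouse, cube, expected_total=sum(r["amount"] for r in rows))
print(loaded_total, cube)
```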
Data warehouse and views. A data warehouse is a permanent store of data in multidimensional tables. Views are created temporarily, when needed, from the data warehouse. They are used for decision support systems.
Difference between data warehouse and views
Data warehouse: a permanent store of data. Views: created from warehouse data when needed, and not permanent.
Data warehouse: multidimensional. Views: relational.
Data warehouse: can be indexed to maximize performance. Views: cannot be indexed.
Data warehouse: provides specific support to a functionality. Views: cannot give specific support to a functionality.
Data warehouse: provides a large amount of data. Views: created by extracting a minimum of data from the data warehouse.
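To make the idea of a view concrete, here is a minimal sketch using SQLite from Python. The sales table, the north_sales view and the rows are assumptions for illustration. A temporary view stores only its defining query, not a copy of the data, so it always reflects the current contents of the underlying table and disappears when the connection is closed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [("Cola", "N", 10), ("Cola", "S", 20), ("Milk", "N", 5)])

# The view is created when needed and extracts only the data of interest.
conn.execute("""
    CREATE TEMP VIEW north_sales AS
    SELECT product, amount FROM sales WHERE region = 'N'
""")
north = conn.execute(
    "SELECT product, amount FROM north_sales ORDER BY product").fetchall()

# A later insert into the base table is visible through the view.
conn.execute("INSERT INTO sales VALUES ('Soap', 'N', 7)")
north_after = conn.execute(
    "SELECT product, amount FROM north_sales ORDER BY product").fetchall()
print(north, north_after)
```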
Data warehouse future. New techniques must be introduced in data cleaning, indexing and partitioning. The manual operations involved in data acquisition, management, data quality and performance maximization must be automated. Proper business rules must be developed and incorporated in the warehouse creation and maintenance process.
Data Mining Data mining is sorting through data to identify patterns and establish relationships.
Data Mining (cont.)
Data Mining works with Warehouse Data. Data warehousing provides the enterprise with a memory. Data mining provides the enterprise with intelligence.
Data Mining Motivation. “The key in business is to know something that nobody else knows.” (Aristotle Onassis) “To understand is to perceive patterns.” (Sir Isaiah Berlin)
Application Areas (industry: application)
Finance: credit card analysis
Insurance: claims, fraud analysis
Telecommunication: call record analysis
Consumer goods: promotion analysis
Data service providers: value added data
Utilities: power usage analysis
Data Mining in Use. The US Government uses data mining to track fraud. A supermarket becomes an information broker. Basketball teams use it to track game strategy. Cross selling. Warranty claims. Routing. Holding on to good customers. Weeding out bad customers.
What is data mining technology? The process of extracting or finding hidden knowledge from a large database is called data mining. Example: from the data "Age 21" we can derive the information that the person is a major (an adult).
Data Mining Technology. (Diagram: Databases and Flat Files > Cleaning and Integration > Data Warehouse > Selection and Transformation > Data Mining > Patterns > Knowledge)
Data Mining Technology: the various steps. 1. Data cleaning: noise and inconsistent data are removed. 2. Data integration: data from multiple sources are combined. 3. Data selection: relevant data are retrieved from the database for analysis. 4. Data transformation: the selected data are prepared for mining by performing aggregation operations. 5. Data mining: intelligent methods are applied to extract data patterns. 6. Pattern evaluation: the needed patterns are identified. 7. Knowledge presentation: the mined knowledge is presented to the user.
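The seven steps can be walked through on toy data. This is a sketch, not a real mining system: the two basket lists, the threshold MIN_SUPPORT and the choice of simple pair counting as the "intelligent method" are all assumptions made for illustration, and the transformation step here only deduplicates and orders items rather than aggregating.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets from two source systems.
store_a = [["cola", "chips"], ["cola", "chips", "soap"], ["milk", None]]
store_b = [["Cola", "Chips"], ["milk", "bread"], ["cola", "chips", "milk"]]

def clean(baskets):
    """1. Data cleaning: drop missing items and normalize the case."""
    return [[item.lower() for item in b if item] for b in baskets]

# 2. Data integration: combine the cleaned sources.
baskets = clean(store_a) + clean(store_b)
# 3. Data selection: only baskets with at least two items are relevant here.
selected = [b for b in baskets if len(b) >= 2]
# 4. Data transformation: one sorted tuple of unique items per basket.
transformed = [tuple(sorted(set(b))) for b in selected]
# 5. Data mining: count how often each pair of items occurs together.
pair_counts = Counter(p for b in transformed for p in combinations(b, 2))
# 6. Pattern evaluation: keep only pairs seen in at least MIN_SUPPORT baskets.
MIN_SUPPORT = 3
patterns = {p: c for p, c in pair_counts.items() if c >= MIN_SUPPORT}
# 7. Knowledge presentation: report the result in readable form.
for (a, b), count in sorted(patterns.items()):
    print(f"{a} and {b} were bought together in {count} baskets")
```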
Loading the Warehouse Cleaning the data before it is loaded
Data Integration Across Sources. Sources: Trust, Credit card, Savings, Loans. Problems: same data, different name; different data, same name; data found here, nowhere else; different keys, same data.
Data Transformation Example (sources feeding the data warehouse)
Field: appl A: balance; appl B: bal; appl C: currbal; appl D: balcurr
Unit: appl A: pipeline in cm; appl B: pipeline in in; appl C: pipeline in feet; appl D: pipeline in yds
Encoding: appl A: m, f; appl B: 1, 0; appl C: x, y; appl D: male, female
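The three kinds of transformation in this example (field names, units, encodings) can be sketched as lookup tables applied per source application. The target field names, the choice of centimetres as the common unit, and the mapping of the codes 1/0 and x/y to male/female are assumptions; the slide does not say which code stands for which value.

```python
# Source field that holds the balance in each application.
BALANCE_FIELD = {"A": "balance", "B": "bal", "C": "currbal", "D": "balcurr"}
# Unit in which each application records the pipeline length.
PIPELINE_UNIT = {"A": "cm", "B": "in", "C": "feet", "D": "yds"}
CM_PER_UNIT = {"cm": 1.0, "in": 2.54, "feet": 30.48, "yds": 91.44}
# Assumed meaning of each application's gender codes.
GENDER_CODE = {
    "A": {"m": "male", "f": "female"},
    "B": {1: "male", 0: "female"},
    "C": {"x": "male", "y": "female"},
    "D": {"male": "male", "female": "female"},
}

def transform(app, record):
    """Map one source record onto the common warehouse format."""
    return {
        "balance": record[BALANCE_FIELD[app]],
        "pipeline_cm": round(record["pipeline"] * CM_PER_UNIT[PIPELINE_UNIT[app]], 2),
        "gender": GENDER_CODE[app][record["gender"]],
    }

from_b = transform("B", {"bal": 100, "pipeline": 10, "gender": 1})
from_c = transform("C", {"currbal": 5, "pipeline": 2, "gender": "y"})
print(from_b)
print(from_c)
```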
Structuring/Modeling Issues
Data Warehouse vs. Data Marts
From the Data Warehouse to Data Marts. (Diagram labels: Departmentally Structured; Individually Structured; Data Warehouse; Organizationally Structured; Less; More; History; Normalized; Detailed; Data; Information)
Data Warehouse and Data Marts. (Diagram labels: OLAP; Data Mart; Lightly summarized; Departmentally structured; Organizationally structured; Atomic; Detailed; Data Warehouse Data)
Characteristics of the Departmental Data Mart. OLAP. Small. Flexible. Customized by department. Source is the departmentally structured data warehouse.
Techniques for Creating a Departmental Data Mart. OLAP. Subset. Summarized. Superset. Indexed. Arrayed. (Departments shown: Sales, Mktg., Finance)
Data Mart Centric Data Marts Data Sources Data Warehouse
True Warehouse Data Marts Data Sources Data Warehouse
II.  On-Line Analytical Processing (OLAP) Making Decision Support Possible
What Is OLAP? Online Analytical Processing: term coined by E. F. Codd in a 1994 paper contracted by Arbor Software. Generally synonymous with earlier terms such as Decision Support, Business Intelligence, Executive Information System. OLAP = Multidimensional Database. MOLAP: Multidimensional OLAP (Arbor Essbase, Oracle Express). ROLAP: Relational OLAP (Informix MetaCube, Microstrategy DSS Agent).
The OLAP Market. Rapid growth in the enterprise market: 1995: $700 million; 1997: $2.1 billion. Significant consolidation activity among major DBMS vendors: 10/94: Sybase acquires ExpressWay; 7/95: Oracle acquires Express; 11/95: Informix acquires Metacube; 1/97: Arbor partners up with IBM; 10/96: Microsoft acquires Panorama. Result: OLAP shifted from a small vertical niche to a mainstream DBMS category.
Strengths of OLAP. It is a powerful visualization paradigm. It provides fast, interactive response times. It is good for analyzing time series. It can be useful for finding some clusters and outliers. Many vendors offer OLAP tools.
OLAP Is FASMI: Fast Analysis of Shared Multidimensional Information.
Data Cube Lattice. Cube lattice levels: ABC; AB, AC, BC; A, B, C; none. Can materialize some group-bys and compute others on demand. Question: which group-bys to materialize? Question: what indices to create? Question: how to organize the data (chunks, etc.)?
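A short sketch of the lattice for three dimensions A, B and C, and of the trade-off it raises. The fact rows and the helper names lattice and group_by are assumptions for illustration. The example materializes the AB group-by once and then answers the coarser A group-by from it on demand, which gives the same result as computing A from the base data.

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("A", "B", "C")
# Hypothetical base facts: (A, B, C, measure).
FACTS = [("a1", "b1", "c1", 1), ("a1", "b2", "c1", 2),
         ("a2", "b1", "c2", 3), ("a1", "b1", "c2", 4)]

def lattice(dims):
    """All group-bys, from the full set ABC down to the empty one ('none')."""
    return [combo for k in range(len(dims), -1, -1)
            for combo in combinations(dims, k)]

def group_by(rows, row_dims, dims):
    """Aggregate rows keyed on row_dims down to the coarser key dims."""
    idx = [row_dims.index(d) for d in dims]
    out = defaultdict(int)
    for *key, value in rows:
        out[tuple(key[i] for i in idx)] += value
    return dict(out)

nodes = lattice(DIMS)

# Materialize AB once ...
ab = group_by(FACTS, DIMS, ("A", "B"))
# ... then compute A on demand from the materialized AB instead of the base facts.
ab_rows = [(*key, value) for key, value in ab.items()]
a_from_ab = group_by(ab_rows, ("A", "B"), ("A",))
a_direct = group_by(FACTS, DIMS, ("A",))
print(nodes)
print(a_from_ab)
```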
Visualizing Neighbors is simpler
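A Visual Operation: Pivot (Rotate). (Cube diagram: axes Product (Juice, Cola, Milk, Cream), Region (NY, LA, SF) and Date (3/1, 3/2, 3/3, 3/4); sample cell values 10, 47, 30, 12; rotated view shown with Month, Region and Product)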
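A pivot only changes which dimension is laid out along which axis; the cell values stay the same. In the sketch below the assignment of the four values to regions is an assumption (the slide does not show it), and crosstab is a hypothetical helper.

```python
# Hypothetical cells keyed by (product, region); which value belongs to
# which region is assumed for the sake of the example.
cells = {("Juice", "NY"): 10, ("Cola", "NY"): 47,
         ("Milk", "LA"): 30, ("Cream", "SF"): 12}

def crosstab(cells, row_axis, col_axis):
    """Lay the cells out as a table; the axes are positions in the cell key."""
    row_labels = sorted({key[row_axis] for key in cells})
    col_labels = sorted({key[col_axis] for key in cells})
    table = {r: {c: 0 for c in col_labels} for r in row_labels}
    for key, value in cells.items():
        table[key[row_axis]][key[col_axis]] += value
    return table

by_product = crosstab(cells, 0, 1)  # products as rows, regions as columns
pivoted = crosstab(cells, 1, 0)     # rotated: regions as rows, products as columns
print(by_product["Cola"])
print(pivoted["NY"])
```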
“Slicing and Dicing”. (Cube diagram: Product (Household, Telecomm, Video, Audio); Sales Channel (Retail, Direct, Special); Regions (India, Far East, Europe). Highlighted: the Telecomm slice)
Roll-up and Drill Down. Hierarchy, from highest to lowest level: Sales Channel, Region, Country, State, Location Address, Sales Representative. Roll up: move to a higher level of aggregation. Drill down: move to low-level details.
Nature of OLAP Analysis Aggregation -- (total sales, percent-to-total) Comparison -- Budget vs. Expenses Ranking -- Top 10, quartile analysis Access to detailed and aggregate data Complex criteria specification Visualization
Organizationally Structured Data. Different departments look at the same detailed data in different ways. Without the detailed, organizationally structured data as a foundation, there is no reconcilability of data. (Departments shown: marketing, manufacturing, sales, finance)
Multidimensional Spreadsheets. Analysts need spreadsheets that support: pivot tables (cross-tabs); drill-down and roll-up; slice and dice; sort; selections; derived attributes. Popular in the retail domain.
OLAP Operations. (Figure © Prentice Hall; labels: Single Cell; Multiple Cells; Slice; Dice; Roll Up; Drill Down)
Relational OLAP: 3-Tier DSS. Database layer (data warehouse): store atomic data in an industry-standard RDBMS. Application logic layer (ROLAP engine): generate SQL execution plans in the ROLAP engine to obtain OLAP functionality. Presentation layer (decision support client): obtain multi-dimensional reports from the DSS client.
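The three tiers can be imitated in a few lines: SQLite plays the database layer, a small function that turns an OLAP request into SQL plays the ROLAP engine, and printing the result plays the client. This is a minimal sketch; the table name sales, its columns and the function build_sql are hypothetical, and a real ROLAP engine generates far more elaborate query plans.

```python
import sqlite3

ALLOWED = {"product", "region", "month"}  # dimension columns the engine accepts

def build_sql(dims, filters=None):
    """Application logic layer: translate an OLAP request into a GROUP BY query."""
    filters = filters or {}
    if not set(dims) <= ALLOWED or not set(filters) <= ALLOWED:
        raise ValueError("unknown dimension")
    select = ", ".join(list(dims) + ["SUM(amount)"])
    sql = f"SELECT {select} FROM sales"
    if filters:
        # Values are passed as parameters, never pasted into the SQL text.
        sql += " WHERE " + " AND ".join(f"{d} = ?" for d in filters)
    if dims:
        cols = ", ".join(dims)
        sql += f" GROUP BY {cols} ORDER BY {cols}"
    return sql, list(filters.values())

# Database layer: atomic data in a relational table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, region TEXT, month TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("Cola", "N", "Jan", 10), ("Cola", "S", "Jan", 20),
    ("Milk", "N", "Feb", 5), ("Cola", "N", "Feb", 7),
])

# Presentation layer: the client asks for Cola sales by region.
sql, params = build_sql(["region"], {"product": "Cola"})
report = conn.execute(sql, params).fetchall()
print(sql)
print(report)
```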
MD-OLAP: 2-Tier DSS. Database layer and application logic layer (MDDB engine): store atomic data in a proprietary data structure (MDDB), pre-calculate as many outcomes as possible, and obtain OLAP functionality via proprietary algorithms running against this data. Presentation layer (decision support client): obtain multi-dimensional reports from the DSS client.
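The "pre-calculate as many outcomes as possible" idea can be sketched with an ordinary dictionary standing in for the proprietary MDDB structure, which the slides do not describe. The dimensions, the fact rows and the use of "*" to mean "all values" are assumptions. Every combination of dimensions is aggregated once at load time, so a later query is a single lookup.

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("product", "region", "month")
# Hypothetical atomic facts: (product, region, month, amount).
FACTS = [("Cola", "N", "Jan", 10), ("Cola", "S", "Jan", 20), ("Milk", "N", "Feb", 5)]

def precalculate(facts):
    """Pre-aggregate every combination of dimensions; '*' stands for 'all'."""
    cube = defaultdict(int)
    positions = range(len(DIMS))
    for *coords, amount in facts:
        for k in range(len(DIMS) + 1):
            for keep in combinations(positions, k):
                key = tuple(coords[i] if i in keep else "*" for i in positions)
                cube[key] += amount
    return dict(cube)

cube = precalculate(FACTS)

# Queries are now plain lookups, with no aggregation at query time.
print(cube[("*", "*", "*")])     # grand total
print(cube[("Cola", "*", "*")])  # all Cola sales
print(cube[("*", "N", "*")])     # all sales in region N
```

The price of the fast lookups is storage: the number of pre-calculated cells grows quickly with the number of dimensions and distinct values.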
MSPVL Polytechnic College Pavoorchatram

Dataware housing

  • 1. DATA WAREHOUSING AND DATA MINING M.Mageshwari,Lecturer Lecturer,Department of CE M.S.P.V.L Polytechnic College
  • 2. Course Overview The course: what and how 0. Introduction I. Data Warehousing II. Decision Support and OLAP III. Data Mining IV. Looking Ahead Demos and Labs
  • 3. A producer wants to know…. Which are our lowest/highest margin customers ? Who are my customers and what products are they buying? Which customers are most likely to go to the competition ? What impact will new products/services have on revenue and margins? What product prom- -otions have the biggest impact on revenue? What is the most effective distribution channel?
  • 4. Data, Data everywhere yet ... I can’t find the data I need data is scattered over the network many versions, subtle differences I can’t get the data I need need an expert to get the data I can’t understand the data I found available data poorly documented I can’t use the data I found results are unexpected data needs to be transformed from one form to other
  • 5. What is a Data Warehouse? A single, complete and consistent store of data obtained from a variety of different sources made available to end users in a what they can understand and use in a business context.
  • 6. What are the users saying... Data should be integrated across the enterprise Summary data has a real value to the organization Historical data holds the key to understanding data over time What-if capabilities are required
  • 7. What is Data Warehousing? A process of transforming data into information and making it available to users in a timely enough manner to make a difference Data Information
  • 8. Evolution 60’s: Batch reports hard to find and analyze information inflexible and expensive, reprogram every new request 70’s: Terminal-based DSS(Decision Support System and EIS (executive information systems) still inflexible, not integrated with desktop tools
  • 9. Data Warehouse Structure base customer (1985-87) custid, from date, to date, name, phone, dob base customer (1988-90) custid, from date, to date, name, credit rating, employer customer activity (1986-89) -- monthly summary customer activity detail (1987-89) custid, activity date, amount, clerk id, order no customer activity detail (1990-91) custid, activity date, amount, line item no, order no
  • 10. Definition of DSS Decision support system is defined as a system that helps the decision makers in various levels to take decisions This system uses data, analytical models and user friendly software for taking decision
  • 11. Definition of EIS Executive information system(EIS) is defined as a system that helps the high level executives to take policy decisions. This system user higher level data, analytical models and user friendly software for taking decisions.
  • 12. Evolution 80’s: Desktop data access and analysis tools query tools, spreadsheets, GUIs easier to use, but only access operational databases 90’s: Data warehousing with integrated OLAP(online analytical processing)engines and tools
  • 13. Data Warehousing -- It is a process Technique for assembling and managing data from various sources for the purpose of answering business questions. Thus making decisions that were not previous possible A decision support database maintained separately from the organization’s operational database
  • 14. Characteristics of Data Warehouse A data warehouse is a subject-oriented integrated time-varying non-volatile collection of data that is used primarily in organizational decision making.
  • 15. ]\ Subject-Oriented A data warehouse is organized around the major subjects of the organization such as customer, supplier, product, sales, etc.., Data warehouse provides a simple and concise view around a particular subject by excluding data that are not useful to the decision support process .
  • 16. Integrated A data warehouse is constructed by integrating multiple sources of data such as relational database, flat files and on-line transaction records. Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attributes etc..,
  • 17. Time Variant Data warehouse maintains records of both historical and current data. So it can provide information in a historical perspective
  • 18. Non Volatile Once data warehouse is loaded with data, it is not possible to perform any modifications in the stored data.
  • 19. Explorers, Farmers and Tourists Explorers: Seek out the unknown and previously unsuspected rewards hiding in the detailed data Farmers: Harvest information from known access paths Tourists: Browse information about Tourists
  • 20. Application-Orientation vs. Subject-Orientation Application-Orientation Operational Database Loans Credit Card Trust Savings Subject-Orientation Data Warehouse Customer Vendor Product Activity
  • 21. Functioning of Data warehousing Data Source Cleaning Transformation Data Warehouse New Update
  • 22. Collection Data Data warehousing collect data from various data sources such as relational data base, flat files and on-line records The collection of data are stored in database inside the warehouse. The type of data collection used depends on the architecture of the ware house.
  • 23. Integration Each and every data source uses from different schema. Data warehouse get data from different source with different schema and convert the data from various sources into a common integrated schema .
  • 24. Star Schema A single fact table, and one dimension table for each dimension. Does not capture hierarchies directly. Diagram: dimension tables time, prod, cust and city around the fact table (date, custno, prodno, cityname, ...).
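The star layout on this slide can be sketched with Python's built-in `sqlite3` module. This is only an illustration: the table names follow the slide's diagram labels, but every column other than `date`, `custno`, `prodno` and `cityname` (for example `sales`, `category`, `state`) and all sample rows are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One dimension table per dimension (extra columns are hypothetical).
CREATE TABLE time_dim (date TEXT PRIMARY KEY, month TEXT, year INTEGER);
CREATE TABLE prod_dim (prodno INTEGER PRIMARY KEY, prodname TEXT, category TEXT);
CREATE TABLE cust_dim (custno INTEGER PRIMARY KEY, custname TEXT);
-- The city hierarchy (city, state, country) is flattened into one table,
-- which is why a star schema does not capture hierarchies directly.
CREATE TABLE city_dim (cityname TEXT PRIMARY KEY, state TEXT, country TEXT);
-- A single fact table keyed by the four dimensions.
CREATE TABLE fact (
    date TEXT REFERENCES time_dim(date),
    custno INTEGER REFERENCES cust_dim(custno),
    prodno INTEGER REFERENCES prod_dim(prodno),
    cityname TEXT REFERENCES city_dim(cityname),
    sales REAL
);
""")
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?)",
                 [("2024-03-01", "2024-03", 2024), ("2024-03-02", "2024-03", 2024)])
conn.executemany("INSERT INTO prod_dim VALUES (?, ?, ?)",
                 [(1, "Juice", "Beverage"), (2, "Soap", "Toiletry")])
conn.execute("INSERT INTO cust_dim VALUES (10, 'Asha')")
conn.execute("INSERT INTO city_dim VALUES ('Chennai', 'Tamil Nadu', 'India')")
conn.executemany("INSERT INTO fact VALUES (?, ?, ?, ?, ?)", [
    ("2024-03-01", 10, 1, "Chennai", 10.0),
    ("2024-03-02", 10, 1, "Chennai", 47.0),
    ("2024-03-02", 10, 2, "Chennai", 30.0),
])

# A typical star query: join the fact table to one dimension and aggregate.
sales_by_category = conn.execute("""
    SELECT p.category, SUM(f.sales)
    FROM fact AS f JOIN prod_dim AS p ON f.prodno = p.prodno
    GROUP BY p.category ORDER BY p.category
""").fetchall()
print(sales_by_category)
```

Every analytical query needs at most one join per dimension, which is the main attraction of the star layout.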
  • 25. Snowflake schema Represents the dimensional hierarchy directly by normalizing tables. Easy to maintain and saves storage. Diagram: dimension tables time, prod, cust and city around the fact table (date, custno, prodno, cityname, ...), with region as a further table linked to city.
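A minimal sketch of the snowflake idea, again using `sqlite3`: the city dimension is normalized so that the region level of the hierarchy lives in its own table. The column names `regionname`, `country` and `sales`, and all rows, are hypothetical; only the fact columns and the city/region tables come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The hierarchy city -> region is stored explicitly in separate tables.
CREATE TABLE region (regionname TEXT PRIMARY KEY, country TEXT);
CREATE TABLE city (
    cityname TEXT PRIMARY KEY,
    regionname TEXT REFERENCES region(regionname)
);
CREATE TABLE fact (
    date TEXT, custno INTEGER, prodno INTEGER,
    cityname TEXT REFERENCES city(cityname),
    sales REAL
);
""")
conn.executemany("INSERT INTO region VALUES (?, ?)",
                 [("South", "India"), ("West", "India")])
conn.executemany("INSERT INTO city VALUES (?, ?)",
                 [("Chennai", "South"), ("Madurai", "South"), ("Mumbai", "West")])
conn.executemany("INSERT INTO fact VALUES (?, ?, ?, ?, ?)", [
    ("2024-03-01", 10, 1, "Chennai", 10.0),
    ("2024-03-01", 11, 1, "Madurai", 12.0),
    ("2024-03-02", 12, 2, "Mumbai", 30.0),
])

# Summarizing by region follows the hierarchy through two joins.
sales_by_region = conn.execute("""
    SELECT r.regionname, SUM(f.sales)
    FROM fact AS f
    JOIN city AS c ON f.cityname = c.cityname
    JOIN region AS r ON c.regionname = r.regionname
    GROUP BY r.regionname ORDER BY r.regionname
""").fetchall()
print(sales_by_region)
```

Each region name is stored once, which is where the storage saving comes from; the price is an extra join per hierarchy level.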
  • 26. Data Warehouse for Decision Support & OLAP Putting information technology to work to help the knowledge worker make faster and better decisions. Which of my customers are most likely to go to the competition? What product promotions have the biggest impact on revenue? How did the share price of software companies correlate with profits over the last 10 years?
  • 27. Decision Support Used to manage and control the business. Data is historical or point-in-time. Optimized for inquiry rather than update. Use of the system is loosely defined and can be ad hoc. Used by managers and end users to understand the business and make judgments.
  • 28. OLAP (Online Analytical Processing) A data warehouse stores data, but OLAP transforms the data warehouse data into specific, meaningful information. OLAP therefore provides a user-friendly environment for interactive data analysis.
  • 29. OLAP Diagram: User, front-end tool, OLAP server and data warehouse. The request and the SQL flow toward the data warehouse; the result set and the result flow back to the user.
  • 30. OLAP operations on multidimensional data Roll-up (group); drill-down (less aggregation, more detail); slice and dice (piece); pivot (rotate)
  • 31. TYPES OF OLAP MOLAP (MULTIDIMENSIONAL OLAP); ROLAP (RELATIONAL OLAP)
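The operations named on this slide can be illustrated on a tiny in-memory fact list. This is a sketch in plain Python, not how an OLAP server is implemented; the rows, the column layout and the helper names `aggregate` and `select` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, city, region, month, sales)
COLUMNS = ("product", "city", "region", "month")
facts = [
    ("Juice", "NY", "East", "Jan", 10),
    ("Juice", "LA", "West", "Jan", 47),
    ("Cola", "NY", "East", "Feb", 30),
    ("Cola", "SF", "West", "Feb", 12),
]

def aggregate(rows, dims):
    """Sum the sales measure grouped by the named dimensions."""
    positions = [COLUMNS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in positions)] += row[-1]
    return dict(totals)

def select(rows, **allowed):
    """Keep rows whose value for each named dimension is in the allowed set."""
    return [
        row for row in rows
        if all(row[COLUMNS.index(dim)] in values for dim, values in allowed.items())
    ]

by_city = aggregate(facts, ["city"])                    # detailed level
by_region = aggregate(facts, ["region"])                # roll-up: city -> region
region_city = aggregate(facts, ["region", "city"])      # drill-down: region -> city
jan_slice = select(facts, month={"Jan"})                # slice: fix one dimension
west_dice = select(facts, month={"Jan", "Feb"}, region={"West"})  # dice: sub-cube

print(by_region)
print(region_city)
```

Pivot is not shown here because it only changes which dimension is laid out as rows and which as columns; the underlying cells stay the same.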
  • 32. Multi-dimensional Data “Hey… I sold $100M worth of goods.” Dimensions: Product, Region, Time. Hierarchical summarization paths: Product: Industry > Category > Product; Region: Country > Region > City > Office; Time: Year > Quarter > Month or Week > Day. Cube diagram: axes Month (1 to 7), Product (Toothpaste, Juice, Cola, Milk, Cream, Soap) and Region (W, S, N).
  • 33. Data Warehouse Architecture Sources: relational databases, legacy data, purchased data, ERP systems. Optimized loader: extraction, cleansing. Data warehouse engine with a metadata repository. Front end: analyze, query.
  • 34. Architecture of data warehousing Components: Data Acquisition, Data Manager, Data Dictionary, Information Directory, Middleware, Design, Management and Data Access. Inputs and stores: external data, warehouse data.
  • 36. Design Component The data warehouse designer designs the database of the data warehouse, and the warehouse administrator manages the data warehouse. The designer and administrator use the design component to design and store data.
  • 37. Types of design Bottom-up design: business value can be returned as soon as the first data marts are created. Top-down design: atomic data, that is, data at the lowest level of detail, are stored in the data warehouse. Hybrid design.
  • 38. Data Manager Component The database in the data warehouse uses the data manager component for managing and accessing the data stored in the data warehouse. Types: RDBMS, MDBMS.
  • 39. Management Component Administering data acquisition operations. Managing backup copies of the data. Recovering lost data. Providing security for the data stored in the data warehouse. Authorizing access to the data stored in the data warehouse.
  • 40. Data Acquisition Component This component acquires data from various sources by using data acquisition applications. These applications are based on rules that are defined by the data warehouse developers.
  • 41. Operations performed during data clean-up Restructuring the records and fields of the database tables. Removing irrelevant and redundant data. Obtaining and adding missing data. Verifying the integrity and consistency of the data.
  • 42. Operations performed on the data for enhancement Decoding and translating the values in fields. Summarizing data. Calculating derived values.
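The clean-up and enhancement operations on the two slides above can be sketched on a handful of records. Everything here is an assumption made for illustration: the field names, the code table `GENDER_CODES`, and in particular the choice to fill a missing quantity with a default of 1 (a real warehouse would obtain the missing value from a source system).

```python
# Hypothetical raw records from one source system
raw = [
    {"custid": 1, "gender": "m", "qty": 2, "unit_price": 5.0},
    {"custid": 1, "gender": "m", "qty": 2, "unit_price": 5.0},     # redundant duplicate
    {"custid": 2, "gender": "f", "qty": None, "unit_price": 3.0},  # missing quantity
    {"custid": 3, "gender": "f", "qty": 4, "unit_price": 2.5},
]
GENDER_CODES = {"m": "male", "f": "female"}

def clean(records, default_qty=1):
    """Remove duplicates, fill missing quantities, verify basic integrity."""
    seen, cleaned = set(), []
    for record in records:
        key = tuple(sorted(record.items()))
        if key in seen:                      # removing redundant data
            continue
        seen.add(key)
        fixed = dict(record)
        if fixed["qty"] is None:             # adding missing data (assumed default)
            fixed["qty"] = default_qty
        if fixed["qty"] <= 0 or fixed["unit_price"] < 0:  # integrity check
            raise ValueError(f"inconsistent record: {fixed}")
        cleaned.append(fixed)
    return cleaned

def enhance(records):
    """Decode field values and calculate a derived value per record."""
    enhanced = []
    for record in records:
        row = dict(record)
        row["gender"] = GENDER_CODES[row["gender"]]     # decoding
        row["amount"] = row["qty"] * row["unit_price"]  # derived value
        enhanced.append(row)
    return enhanced

rows = enhance(clean(raw))
total_amount = sum(row["amount"] for row in rows)       # summarizing
print(rows)
print(total_amount)
```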
  • 43. Information directory Component This component helps end users learn the details of the data stored in the data warehouse. This is done with the help of data about the data, called metadata. Types: technical data, business data.
  • 44. Middleware Component This component connects to the local databases. An analytical server is used to analyze multidimensional data. Intelligent data warehousing middleware controls access to the warehouse database.
  • 45. Data Mart A data mart is a database that contains the data needed by a small group of users for their own department's needs. Types: dependent data mart, independent data mart.
  • 46. Difference between data warehouse and data mart Data warehouse: suitable to support an entire corporate environment; supports the entire information requirement of an organization; has a large data model, wider implementation, large data and a larger number of users. Data mart: useful for small organizations with very few departments; supports the information requirement of a department in an organization; has a small data model, shorter implementation, less data and fewer users. If you listen to some vendors, you may be left thinking that building data warehouses is a waste of time; data mart vendors who tell you this are looking out for their own best interests.
  • 47. Advantages of data mart Since each department has its own data mart, the departments can summarize, sort, select, structure, etc. their own department's data, which will not be confused with any other department's data. The department can do whatever DSS processing it wants. The processing cost and storage are less than for the data warehouse. The department can select software for its data mart that is powerful enough to fit its needs.
  • 48. Data warehousing life cycle Cycle: Design, Prototype, Deploy, Operate, Enhance.
  • 49. Data Modeling (Multi-dimensional Database) “Hey… I sold $100M worth of goods.” Dimensions: Product, Region, Period. Hierarchical summarization paths: Product: Industry > Category > Product; Region: Country > Region > City > Office; Period: Year > Quarter > Month or Week > Day. Cube diagram: axes Month (1 to 7), Product (Toothpaste, Juice, Cola, Milk, Cream, Soap) and Region (W, S, N).
  • 50. Building of data warehouse The builder must forecast how users will use the warehouse. The design should support accessing data with any meaningful values of the attributes. To build a good data warehouse, the data acquisition process must follow the steps given below: (1) extract the data from multiple heterogeneous sources; (2) format the data for consistency within the warehouse; (3) clean the data to ensure validity; (4) convert the data from a relational, object-oriented or hierarchical model to a multidimensional model; (5) load the data into the warehouse. Good monitoring tools are necessary to recover from an incorrect load.
  • 51. Data warehouse and views A data warehouse is permanent storage of data in multidimensional tables. Views are created temporarily, when needed, from the data warehouse. This is used for decision support systems.
  • 52. Difference between data warehouse and views Data warehouse: permanent storage of data. Views: created from warehouse data when needed and not permanent. Data warehouse: multidimensional. Views: relational. Data warehouse: can be indexed to maximize performance. Views: cannot be indexed. Data warehouse: provides specific support for a functionality. Views: cannot give specific support for a functionality. Data warehouse: provides a large amount of data. Views: created by extracting minimal data from the data warehouse.
  • 53. Data warehouse Future New techniques must be introduced in data cleaning, indexing and partitioning. The manual operations involved in data acquisition, management, data quality and performance maximization must be automated. Proper business rules must be developed and incorporated into the warehouse creation and maintenance process.
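The contrast between stored warehouse data and a view can be tried out with `sqlite3`. This is a simplified sketch: the warehouse is stood in for by an ordinary relational table (the slide describes it as multidimensional), and the names `warehouse_sales`, `north_sales` and `idx_sales_region` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Permanent, stored data that can be indexed.
CREATE TABLE warehouse_sales (region TEXT, product TEXT, sales REAL);
CREATE INDEX idx_sales_region ON warehouse_sales(region);
INSERT INTO warehouse_sales VALUES
    ('N', 'Juice', 10.0), ('N', 'Cola', 30.0), ('S', 'Juice', 47.0);
-- A view stores only its defining query and extracts a small part of the data.
CREATE VIEW north_sales AS
    SELECT product, sales FROM warehouse_sales WHERE region = 'N';
""")

before = conn.execute("SELECT SUM(sales) FROM north_sales").fetchone()[0]
# Loading another row into the stored table changes what the view returns,
# because the view is evaluated against the table each time it is queried.
conn.execute("INSERT INTO warehouse_sales VALUES ('N', 'Milk', 12.0)")
after = conn.execute("SELECT SUM(sales) FROM north_sales").fetchone()[0]
print(before, after)
```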
  • 54. Data Mining Data mining is sorting through data to identify patterns and establish relationships.
  • 56. Data Mining works with Warehouse Data Data Warehousing provides the Enterprise with a memory. Data Mining provides the Enterprise with intelligence.
  • 57. Data Mining Motivation “The key in business is to know something that nobody else knows.” (Aristotle Onassis) “To understand is to perceive patterns.” (Sir Isaiah Berlin)
  • 58. Application Areas (Industry: Application) Finance: credit card analysis. Insurance: claims, fraud analysis. Telecommunication: call record analysis. Consumer goods: promotion analysis. Data service providers: value-added data. Utilities: power usage analysis.
  • 59. Data Mining in Use The US Government uses data mining to track fraud. A supermarket becomes an information broker. Basketball teams use it to track game strategy. Cross selling. Warranty claims. Routing. Holding on to good customers. Weeding out bad customers.
  • 60. What is data mining technology The process of extracting or finding hidden knowledge from a large database is called data mining. Ex: from the data value “Age 21” we can understand that the person is a major (data becomes information).
  • 61. Data Mining Technology Diagram: databases and flat files → cleaning and integration → data warehouse → selection and transformation → data mining → patterns → knowledge.
  • 62. Data Mining Technology: various steps Data cleaning: to remove noise and inconsistent data. Data integration: data from multiple sources are combined. Data selection: relevant data are retrieved from the database for analysis. Data transformation: the selected data are prepared for mining by performing aggregation operations. Data mining: intelligent methods are applied to extract data patterns. Pattern evaluation: identify the needed patterns. Knowledge presentation: present the mined knowledge to the user.
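The steps on this slide can be walked through on a toy example. The sketch below swaps in a deliberately simple mining method, counting how often pairs of items occur together, as a stand-in for the "intelligent methods" the slide mentions. The two sources, the basket contents and the `min_support` threshold are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets from two sources, with noise
source_a = [["Milk", "bread"], ["milk", "Bread", "butter"], []]
source_b = [["bread", "butter"], ["milk", "bread", None]]

def clean(baskets):
    """Data cleaning: drop missing items and empty baskets, normalize case."""
    cleaned = []
    for basket in baskets:
        items = {item.strip().lower() for item in basket if item}
        if items:
            cleaned.append(items)
    return cleaned

# Data integration: combine the cleaned sources.
integrated = clean(source_a) + clean(source_b)
# Data selection: only baskets with at least two items can contain a pair.
selected = [basket for basket in integrated if len(basket) >= 2]
# Data transformation and mining: turn baskets into item pairs and count them.
pair_counts = Counter()
for basket in selected:
    pair_counts.update(combinations(sorted(basket), 2))
# Pattern evaluation: keep pairs that meet an assumed minimum support.
min_support = 3
patterns = {pair: n for pair, n in pair_counts.items() if n >= min_support}
# Knowledge presentation: report the surviving patterns.
for pair, n in sorted(patterns.items()):
    print(f"{pair[0]} and {pair[1]} bought together {n} times")
```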
  • 63. Loading the Warehouse Cleaning the data before it is loaded
  • 64. Data Integration Across Sources Sources: Trust, Credit card, Savings, Loans. Problems: same data, different name; different data, same name; data found here and nowhere else; different keys, same data.
  • 65. Data Transformation Example Field: appl A: balance; appl B: bal; appl C: currbal; appl D: balcurr. Unit: appl A: pipeline in cm; appl B: pipeline in in; appl C: pipeline in feet; appl D: pipeline in yds. Encoding: appl A: m, f; appl B: 1, 0; appl C: x, y; appl D: male, female. All are transformed to one convention in the data warehouse.
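The transformation example on this slide can be expressed as three lookup tables, one per kind of inconsistency. The field names, units and codes come from the slide; the target conventions (a field called `balance`, lengths in centimetres, `male`/`female`) and the reading that the first code of each pair means male are assumptions made for this sketch.

```python
# Field naming: each application uses a different name for the balance.
BALANCE_FIELD = {"A": "balance", "B": "bal", "C": "currbal", "D": "balcurr"}
# Units: cm, in, feet, yds converted to centimetres (assumed target unit).
TO_CM = {"A": 1.0, "B": 2.54, "C": 30.48, "D": 91.44}
# Encoding: first code of each pair is assumed to mean male.
GENDER = {
    "A": {"m": "male", "f": "female"},
    "B": {1: "male", 0: "female"},
    "C": {"x": "male", "y": "female"},
    "D": {"male": "male", "female": "female"},
}

def to_warehouse(app, record):
    """Convert one source record to the assumed warehouse convention."""
    return {
        "balance": record[BALANCE_FIELD[app]],
        "pipeline_cm": round(record["pipeline"] * TO_CM[app], 2),
        "gender": GENDER[app][record["gender"]],
    }

row_b = to_warehouse("B", {"bal": 100.0, "pipeline": 10, "gender": 1})
row_d = to_warehouse("D", {"balcurr": 5.0, "pipeline": 1, "gender": "female"})
print(row_b)
print(row_d)
```

Keeping the conventions in tables rather than in branching code means a new source application only needs three new entries.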
  • 67. Data Warehouse vs. Data Marts
  • 68. From the Data Warehouse to Data Marts Departmentally Structured Individually Structured Data Warehouse Organizationally Structured Less More History Normalized Detailed Data Information
  • 69. Data Warehouse and Data Marts OLAP data mart: lightly summarized, departmentally structured. Data warehouse: organizationally structured, atomic, detailed data.
  • 70. Characteristics of the Departmental Data Mart OLAP. Small. Flexible. Customized by department. Source is the departmentally structured data warehouse.
  • 71. Techniques for Creating Departmental Data Mart OLAP. Subset. Summarized. Superset. Indexed. Arrayed. Example marts: Sales, Mktg., Finance.
  • 72. Data Mart Centric Data Marts Data Sources Data Warehouse
  • 73. True Warehouse Data Marts Data Sources Data Warehouse
  • 74. II. On-Line Analytical Processing (OLAP) Making Decision Support Possible
  • 75. What Is OLAP? Online Analytical Processing: term coined by E. F. Codd in a 1994 paper contracted by Arbor Software. Generally synonymous with earlier terms such as Decision Support, Business Intelligence, Executive Information System. OLAP = Multidimensional Database. MOLAP: Multidimensional OLAP (Arbor Essbase, Oracle Express). ROLAP: Relational OLAP (Informix MetaCube, Microstrategy DSS Agent).
  • 76. The OLAP Market Rapid growth in the enterprise market: 1995: $700 million; 1997: $2.1 billion. Significant consolidation activity among major DBMS vendors: 10/94: Sybase acquires ExpressWay; 7/95: Oracle acquires Express; 11/95: Informix acquires Metacube; 1/97: Arbor partners up with IBM; 10/96: Microsoft acquires Panorama. Result: OLAP shifted from a small vertical niche to a mainstream DBMS category.
  • 77. Strengths of OLAP It is a powerful visualization paradigm. It provides fast, interactive response times. It is good for analyzing time series. It can be useful for finding some clusters and outliers. Many vendors offer OLAP tools.
  • 78. OLAP Is FASMI Fast Analysis of Shared Multidimensional Information
  • 79. Data Cube Lattice Cube lattice: ABC; AB, AC, BC; A, B, C; none. Can materialize some group-bys and compute others on demand. Question: which group-bys to materialize? Question: what indices to create? Question: how to organize the data (chunks, etc.)?
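The lattice on this slide, and the idea of materializing some group-bys while computing others on demand, can be sketched in a few lines. The rows and the measure are hypothetical, and the policy shown (materialize only the finest group-by ABC and derive everything else from it) is just one possible answer to the slide's first question, chosen to keep the example short.

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("A", "B", "C")
# Hypothetical detail rows with one numeric measure
rows = [
    {"A": "a1", "B": "b1", "C": "c1", "measure": 5},
    {"A": "a1", "B": "b2", "C": "c1", "measure": 3},
    {"A": "a2", "B": "b1", "C": "c2", "measure": 7},
]

def lattice(dims):
    """All group-bys, from the full set ABC down to the empty one (none)."""
    return [combo for size in range(len(dims), -1, -1)
            for combo in combinations(dims, size)]

def group_by(detail_rows, dims):
    """Sum the measure over the given dimensions."""
    totals = defaultdict(int)
    for row in detail_rows:
        totals[tuple(row[d] for d in dims)] += row["measure"]
    return dict(totals)

# Materialize only the finest group-by.
materialized = {DIMS: group_by(rows, DIMS)}

def answer(dims):
    """Return a group-by, deriving it from the ABC cuboid if not stored."""
    if dims in materialized:
        return materialized[dims]
    keep = [DIMS.index(d) for d in dims]
    totals = defaultdict(int)
    for key, value in materialized[DIMS].items():
        totals[tuple(key[i] for i in keep)] += value
    return dict(totals)

nodes = lattice(DIMS)
print(nodes)
print(answer(("A",)))
print(answer(()))
```

With n dimensions the lattice has 2^n nodes, which is why choosing what to materialize matters as the number of dimensions grows.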
  • 81. A Visual Operation: Pivot (Rotate) Cube diagram: axes Product (Juice, Cola, Milk, Cream), Region (NY, LA, SF) and Date (3/1, 3/2, 3/3, 3/4; Month), with sample cell values 10, 47, 30, 12.
  • 82. “Slicing and Dicing” Cube diagram: axes Product (Household, Telecomm, Video, Audio), Sales Channel (Retail, Direct, Special) and Regions (India, Far East, Europe). Highlighted: the Telecomm slice.
  • 83. Roll-up and Drill Down Levels shown: Sales Channel, Region, Country, State, Location Address, Sales Representative. Roll up: higher level of aggregation. Drill-down: low-level details.
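Pivoting only changes which dimension is laid out as rows and which as columns. The sketch below uses the product names, region names and the four values from the slide, but the pairing of values with cells is an assumption, since the diagram's layout did not survive in the transcript; the helper name `crosstab` is also hypothetical.

```python
# Hypothetical cells keyed by (product, region)
cells = {
    ("Juice", "NY"): 10,
    ("Cola", "NY"): 47,
    ("Milk", "LA"): 30,
    ("Cream", "SF"): 12,
}

def crosstab(cube_cells, row_axis):
    """Lay the cells out as nested dicts: rows first, then columns.

    row_axis 0 puts products on the rows, row_axis 1 puts regions on the rows.
    """
    table = {}
    for key, value in cube_cells.items():
        row_label, col_label = key[row_axis], key[1 - row_axis]
        table.setdefault(row_label, {})[col_label] = value
    return table

by_product = crosstab(cells, 0)   # products as rows
by_region = crosstab(cells, 1)    # pivoted: regions as rows
print(by_product)
print(by_region)
```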
  • 84. Nature of OLAP Analysis Aggregation -- (total sales, percent-to-total) Comparison -- Budget vs. Expenses Ranking -- Top 10, quartile analysis Access to detailed and aggregate data Complex criteria specification Visualization
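Three of the analysis types on this slide (aggregation, ranking and comparison) reduce to short calculations. All figures and department names below are hypothetical and chosen so that the total is a round number.

```python
# Hypothetical sales per product
sales = {"Juice": 10, "Cola": 47, "Milk": 30, "Cream": 12, "Soap": 1}

# Aggregation: total sales and percent-to-total
total_sales = sum(sales.values())
percent_to_total = {p: 100 * v / total_sales for p, v in sales.items()}

# Ranking: top 3 products by sales
top3 = sorted(sales, key=sales.get, reverse=True)[:3]

# Comparison: budget vs. expenses per department (positive = over budget)
budget = {"marketing": 50, "sales": 40}
expenses = {"marketing": 55, "sales": 35}
variance = {dept: expenses[dept] - budget[dept] for dept in budget}

print(total_sales, percent_to_total["Cola"])
print(top3)
print(variance)
```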
  • 85. Organizationally Structured Data Different departments look at the same detailed data in different ways. Without the detailed, organizationally structured data as a foundation, there is no reconcilability of data. Departments shown: marketing, manufacturing, sales, finance.
  • 86. Multidimensional Spreadsheets Analysts need spreadsheets that support pivot tables (cross-tabs), drill-down and roll-up, slice and dice, sort, selections and derived attributes. Popular in the retail domain.
  • 87. OLAP Operations (© Prentice Hall) Single cell; multiple cells; slice; dice; roll up; drill down.
  • 88. Relational OLAP: 3 Tier DSS Database layer (data warehouse): store atomic data in an industry-standard RDBMS. Application logic layer (ROLAP engine): generate SQL execution plans in the ROLAP engine to obtain OLAP functionality. Presentation layer (decision support client): obtain multi-dimensional reports from the DSS client.
  • 89. MD-OLAP: 2 Tier DSS Database and application logic layers (MDDB engine): store atomic data in a proprietary data structure (MDDB), pre-calculate as many outcomes as possible, and obtain OLAP functionality via proprietary algorithms running against this data. Presentation layer (decision support client): obtain multi-dimensional reports from the DSS client.
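The middle tier's job of turning a multidimensional request into SQL can be sketched as a small query builder running against `sqlite3`. This is a toy stand-in, not how any ROLAP product works: the table `sales_fact`, its columns, the function `build_rollup_sql` and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical dimension columns of an assumed fact table named sales_fact
ALLOWED_DIMS = ("region", "product", "month")

def build_rollup_sql(dims):
    """Application logic layer: build a GROUP BY query for the requested dimensions."""
    if not dims:
        raise ValueError("at least one dimension is required")
    unknown = [d for d in dims if d not in ALLOWED_DIMS]
    if unknown:
        # Only whitelisted column names are ever placed into the SQL text.
        raise ValueError(f"unknown dimensions: {unknown}")
    cols = ", ".join(dims)
    return f"SELECT {cols}, SUM(sales) FROM sales_fact GROUP BY {cols} ORDER BY {cols}"

# Database layer: atomic data in a relational table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (region TEXT, product TEXT, month TEXT, sales REAL)")
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", [
    ("N", "Juice", "Jan", 10.0),
    ("N", "Cola", "Jan", 30.0),
    ("S", "Juice", "Feb", 47.0),
])

# Presentation layer: the client asks for sales by region and gets a report.
sql = build_rollup_sql(["region"])
report = conn.execute(sql).fetchall()
print(sql)
print(report)
```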
  • 90. MSPVL Polytechnic College Pavoorchatram