COMPUTER APPLICATION IN MANAGEMENT
Session 4
Shivani Tiwari
Data Management for
Decision Making
Introduction
 An organization is nothing but an information processing system
 The ability to provide relevant, accurate and timely information is critical to the success of any organization
 Any successful organization should have an integrated database from which to create information
 The characteristics of organization-wide database
 Sharable
 Consistent
 Reduced redundancy
 Standardized
Types of Databases
Transaction Databases :
 are used to enter raw data and transactions from original sources
 are created by OLTP (Online Transaction Processing) systems
 must be standardized, sharable, consistent and with reduced redundancy across the organization
Operational Databases :
 Built from transaction databases
 Large databases that support all the applications for day-to-day transaction and reporting processes
 Not designed to store historic data or to support ad-hoc queries
Data Warehouses :
 Designed for strategic decision support and built from operational databases
 Contain vast amounts of data
 Smaller, local data warehouses are called Data marts
 Necessary in organizations where
 a high volume of data processing is required
 cross-functional flow of information is required
 a single, centralized data source is a necessity
 increased quality and consistency of the organization’s data is a must
Types of Databases
Unstructured databases :
 Business data exists in various unstructured formats
 Text formats
 GIS data in the form of maps & locations
 Chemical data in the form of protein structures, molecule structures etc.
 Software engineering data in the form of program statements
 Multimedia data in the form of audio, video, images etc.
 The WWW is a universal repository ranging from strictly structured data to completely unstructured pages
Data Warehousing
 A data warehouse is a single, centralized, enterprise-wide repository that combines all data from all legacy
systems and theoretically gives all users access to appropriate information.
 The basic concept of a Data Warehouse is to facilitate a single version of truth for a company for decision
making and forecasting.
 A data warehouse is an information system that contains historical and cumulative data from single or
multiple sources.
 Data Warehouse Concepts simplify the reporting and analysis process of organizations.
Data Warehousing
Characteristics of a good Data warehouse
A data warehouse is governed by some specific rules.
Time Dependent :
 A data warehouse contains information collected over a period of time, i.e. historic information.
 There is a connection between information stored in a warehouse & the time when it was entered.
 Every record entered contains an element of time, explicitly or implicitly.
 Once data is inserted in the warehouse, it can't be updated or changed.
Data Warehousing
Characteristics of a good Data warehouse
Non-volatile :
 Data warehouse is also non-volatile. The previous data is not erased or overwritten when new data is
entered in it.
 Data is read-only and periodically refreshed. This also helps to analyze historical data and understand what
happened and when.
 It does not require transaction processing, recovery, or concurrency control mechanisms.
 Only two types of data operations performed in the Data Warehousing are
 Data loading (insertion)
 Data access (retrieval)
Data Warehousing
Characteristics of a good Data warehouse
Subject Oriented :
 A data warehouse is subject oriented in the sense that it offers information regarding a business function
instead of the company’s operational transactions. The functions can be sales, marketing, distribution, etc.
 A data warehouse never focuses on the day-to-day operations.
 It emphasizes modeling and analysis of data for decision making.
 It also provides a simple and concise view around the specific subject by excluding data that is not helpful
to the decision process.
Data Warehousing
Characteristics of a good Data warehouse
Integrated :
 Data in a data warehouse comes from various operations and sources across the organization and must be
standardized and made consistent
 A data warehouse is developed by integrating data from varied sources such as mainframes, relational
databases, flat files, etc.
 It must keep consistent naming conventions, formats, and coding
 The integration helps in effective analysis of data. Consistency in naming conventions, attribute measures,
encoding structures, etc. has to be ensured.
 After the transformation and cleaning process, all this data is stored in a common format in the data warehouse
Data Warehousing
Components of Data warehouse
There are five main components of Data Warehouse Architecture:
1. Database
2. ETL Tools
3. Meta Data
4. Query Tools
5. DataMarts
Data Warehousing
Components of Data warehouse
1. Database
 The central database is the foundation of the data warehousing environment.
 This database is implemented using RDBMS technology.
 However, it is constrained by the fact that a traditional RDBMS is optimized for transactional
database processing and not for data warehousing. For instance, ad-hoc queries, multi-table joins, and
aggregates are resource intensive and slow down performance.
 New index structures are used to bypass relational table scans and improve speed
 Multidimensional databases (MDDBs) are used to overcome the limitations imposed by the relational
data warehouse model
Data Warehousing
Components of Data warehouse
2. ETL (Extract, Transform and Load) Tools
 The data sourcing, transformation, and migration tools are used for performing all the conversions,
summarizations, and all the changes needed to transform data into a unified format in the data
warehouse. These are also called Extract, Transform and Load (ETL) Tools.
 Their functionality includes:
 Anonymizing data as per regulatory stipulations.
 Eliminating unwanted data from operational databases before loading it into the data warehouse.
 Searching for and replacing common names and definitions for data arriving from different sources.
 Calculating summaries and derived data.
 Populating missing data with defaults.
 De-normalizing repeated data arriving from multiple data sources.
 These ETL tools have to deal with the challenges of database and data heterogeneity. A minimal sketch of the extract-transform-load steps appears below.
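A minimal ETL sketch in Python, assuming pandas is available; the file names, column names, and cleaning rules are illustrative assumptions, not part of the original slides:

import pandas as pd

# Extract: read raw transactions from a hypothetical operational export.
raw = pd.read_csv("sales_2024.csv")  # assumed columns: region, amount, sale_date

# Transform: standardize coding, fill missing values with defaults, unify formats,
# and calculate a derived summary.
raw["region"] = raw["region"].str.upper().str.strip()    # consistent naming/coding
raw["amount"] = raw["amount"].fillna(0.0)                 # defaults for missing data
raw["sale_date"] = pd.to_datetime(raw["sale_date"])       # unified date format
monthly = (raw.groupby([raw["sale_date"].dt.to_period("M"), "region"])["amount"]
              .sum()
              .reset_index(name="monthly_sales"))         # pre-computed summary

# Load: append the transformed data into the warehouse (here simply a file).
monthly.to_csv("warehouse_monthly_sales.csv", index=False)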
Data Warehousing
Components of Data warehouse
3. Metadata
• Metadata provides the details necessary for data legibility, use, and administration.
• It contains data about data, activity, and knowledge. In other words, metadata is data about data which
defines the data warehouse.
• It is used for building, maintaining, and managing the data warehouse. It is like an encyclopedia of the
data warehouse and sets the framework for it.
• The ultimate goal of metadata is to corral, catalogue, integrate, guide and support various
transformations and loading processes, schema layouts, system tables, partition settings, indices, view
definitions etc.
• Metadata can be classified into following categories:
 Technical Meta Data: This kind of metadata contains information about the warehouse that is used
by data warehouse designers and administrators.
 Business Meta Data: This kind of metadata contains detail that gives end users an easy way to
understand the information stored in the data warehouse.
Data Warehousing
4. Query Tools
 One of the primary objectives of data warehousing is to provide information to businesses to make
strategic decisions. Query tools allow users to interact with the data warehouse system.
 These tools fall into four different categories:
i. Query and reporting tools
 Reporting tools - Report writers & Production reporting
 Managed query tools - SQL
ii. Application Development tools - custom reports are developed using Application development
tools
iii. Data mining tools - Data mining is a process of discovering meaningful new correlations, patterns,
and trends by mining large amounts of data. Data mining tools are used to make this process
automatic.
iv. OLAP tools - These tools are based on the concepts of a multidimensional database. They allow users
to analyze the data using elaborate and complex multidimensional views
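As an illustration of the multidimensional views that OLAP tools provide, a rough pandas pivot over made-up sales data (all figures invented):

import pandas as pd

sales = pd.DataFrame({
    "region":  ["North", "North", "South", "South", "South"],
    "product": ["A", "B", "A", "A", "B"],
    "year":    [2023, 2023, 2023, 2024, 2024],
    "amount":  [100, 150, 200, 250, 300],
})

# A two-dimensional "slice" of the cube: regions x products, summed over years.
cube = sales.pivot_table(values="amount", index="region",
                         columns="product", aggfunc="sum", fill_value=0)
print(cube)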
Data Warehousing
5. DataMarts
 A data mart contains data from the data warehouse tailored to support the specific analytical
requirements of a given business unit or function.
 A data mart is a subsidiary of the data warehouse; it holds a partition of the data created for a specific
group of users or functions.
 Data marts can be created in the same database as the data warehouse or in a physically separate
database.
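A hypothetical sketch of carving a data mart out of the warehouse as a subject-specific subset; the file and column names follow the ETL sketch above and are assumptions:

import pandas as pd

warehouse = pd.read_csv("warehouse_monthly_sales.csv")   # hypothetical warehouse extract

# Marketing data mart: only the rows and columns the marketing team needs.
marketing_mart = warehouse[warehouse["region"] == "NORTH"][["sale_date", "monthly_sales"]]
marketing_mart.to_csv("marketing_datamart.csv", index=False)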
DATA MINING
Data Mining
 Data mining is one of the most innovative and most widely used concepts related to database management
techniques.
 Data mining, also known as the knowledge discovery process, is used to analyze data from different
perspectives and summarize it into useful information.
 Just as mining is the process or industry of obtaining coal or other minerals from a mine, data mining extracts
the specific information we need by analyzing huge volumes of organizational/business data.
 In DM, large amounts of data are inspected, and facts are discovered and brought to the attention of the person
doing the mining.
 The data mining process uses various tools to predict behavior from huge data sets, and these predictions are
then used to make decisions
 While large-scale information technology has been evolving separate transaction and analytical systems, data
mining provides the link between the two.
 Data mining software analyzes relationships and patterns in stored transaction data based on open-ended user
queries.
 DM is a more efficient mode of finding useful facts about data.
Data Mining
Data Mining Definition : (Different types)
Type 1 :
Data Mining is the process used for the extraction of hidden predictive data from huge databases.
Type 2 :
Data mining is the process of discovering patterns in very large data sets using methods such as machine
learning, statistics, and database systems.
Type 3:
Data mining is defined as a process used to extract usable data from a larger set of raw data, which implies
analyzing data patterns in large batches of data using one or more software tools.
Data Mining
Type 4 :
Data mining is the automated extraction of hidden data from large databases.
Type 5 :
Data mining refers to the process of extracting valid and previously unknown information from a large
database to make crucial business decisions.
Features of Data Mining
 Automatic pattern predictions based on trend and behaviour analysis.
 Prediction based on likely outcomes.
 Creation of decision-oriented information.
 Focus on large data sets and databases for analysis.
 Clustering based on finding and visually documenting groups of facts not previously known.
Data Mining Techniques
Data mining involves effective data collection and warehousing as well as computer processing.
 For segmenting the data and evaluating the probability of future events, data mining uses sophisticated
mathematical algorithms.
 Data mining is also known as Knowledge Discovery in Databases (KDD).
 There are various Techniques of Data Mining as follows.
1. Decision Trees
2. Sequential Patterns
3. Clustering
4. Prediction
5. Association
6. Classification
1. Decision Trees
 A tree shaped structure that represents a set of decisions. These decisions generate rules for the
classification of the data set.
 A decision tree is a predictive model that can be viewed as a tree structure
 Each branch of the tree is a classification question, and the leaves are partitions of the data set with their
classification.
 It is a predictive model that makes predictions on the basis of a series of decisions.
 It divides up the data at each branch point without losing any data, i.e., the total number of records in a given
parent node is equal to the sum of the records contained in its children.
 Because of their structure and ability to easily generate rules, they are the favoured technique for
building understandable models
 Because of their clarity, they also allow for more complex profit & ROI models to be added easily on
top of the predictive model.
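A tiny decision-tree sketch using scikit-learn (assumed to be available); the features, labels, and numbers are invented for illustration:

from sklearn.tree import DecisionTreeClassifier, export_text

# Toy training data: [age, income] -> whether the customer bought the product.
X = [[25, 30000], [40, 60000], [35, 80000], [50, 20000], [23, 25000], [45, 90000]]
y = ["no", "yes", "yes", "no", "no", "yes"]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# The tree can be printed as readable classification rules.
print(export_text(tree, feature_names=["age", "income"]))
print(tree.predict([[30, 70000]]))    # prediction for a new customer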
2. Sequential Patterns
 A sequential pattern function analyses a collection of related records and detects frequently occurring
patterns in these records over time.
 Sequential pattern mining functions are quite powerful & can be used to detect the set of records
associated with some patterns.
For example, use of this function could be in the discovery of a rule that states that
 70% of the time when Stock X increased its value by a maximum of 10% over a 5 day trading
period,
 and Stock Y increased its value by 10% - 20% during the same period
 then the value of Stock Z also increased by 17% - 20% in the subsequent week.
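A rough plain-Python sketch of measuring how often the example rule above holds; the weekly stock movements are made up:

# Each tuple is one 5-day window: (% change of X, % change of Y, % change of Z next week).
windows = [(9.5, 15.0, 18.0), (8.0, 12.0, 19.5), (9.0, 14.0, 5.0), (4.0, 11.0, 2.0)]

# Windows where the antecedent holds: X up by at most 10% and Y up 10-20%.
antecedent = [w for w in windows if 0 < w[0] <= 10 and 10 <= w[1] <= 20]
# Of those, how often Z rose 17-20% the following week (the consequent).
consequent = [w for w in antecedent if 17 <= w[2] <= 20]

confidence = len(consequent) / len(antecedent)
print(f"Rule holds in {confidence:.0%} of matching windows")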
3. Clustering
 Clustering is the method by which similar records are grouped together.
 Usually this is done to give the end user a top level view or bird’s eye view of what is going on in the
database.
 Demographic data such as income, age, occupation, housing, religion and caste, taken from census
report are usually clustered.
 Clustering is one of the oldest techniques used in data mining.
 In order to predict a value in a record, records with similar predictor values in the historical database
should be looked into and the prediction value from the record nearest to the unclassified record
should be used. This is known as nearest-neighbor technique.
 The input to a clustering operator is a collection of untagged records. No classes are known at the time
the clustering operator is applied.
 The goal of a clustering function is to produce a reasonable segmentation of the set of input records
according to some criterion.
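A minimal clustering sketch with scikit-learn's KMeans on invented demographic-style records:

from sklearn.cluster import KMeans

# Toy records: [age, annual income in thousands].
records = [[25, 30], [27, 32], [45, 90], [48, 95], [60, 40], [62, 42]]

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(records)
print(kmeans.labels_)                  # cluster id assigned to each record
print(kmeans.predict([[50, 92]]))      # nearest cluster for a new record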
4. Prediction
This method discovers the relationship between independent and dependent variables. For example, in the
area of sales, to predict future profit, sales act as the independent variable and profit as the dependent one.
Based on historical data of sales and profit, the associated profit is then predicted.
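A short sketch of the sales-to-profit prediction using linear regression; the figures are hypothetical:

from sklearn.linear_model import LinearRegression

# Historical data: sales (independent) and profit (dependent), in thousands.
sales  = [[100], [150], [200], [250], [300]]
profit = [12, 18, 25, 30, 37]

model = LinearRegression().fit(sales, profit)
print(model.predict([[400]]))   # predicted profit for future sales of 400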
Data Mining Process :
The Knowledge Discovery in Databases (KDD) process is commonly defined with the stages:
1. Selection
2. Pre-processing
3. Transformation
4. Data Mining
5. Interpretation/Evaluation
However, many variations on this theme exist. For example, the Cross-Industry Standard Process for Data
Mining (CRISP-DM) defines six phases of the knowledge discovery process:
1. Business Understanding
2. Data Understanding
3. Data Preparation
4. Modeling
5. Evaluation
6. Deployment
or a simplified process such as
1. pre-processing 2. data mining 3. results validation
The CRISP-DM methodology is the leading methodology used by the majority of data miners.
Data Mining
Data Mining Process :
A simplified version of the process:
(1) pre-processing, (2) data mining, and (3) results validation.
(1) pre-processing :
 Before data mining algorithms can be used, a target data set must be assembled.
 As data mining can only uncover patterns actually present in the data, the target data set must be large
enough to contain these patterns while remaining concise enough to be mined within an acceptable time
limit.
 A common source for data is a data mart or data warehouse.
 Pre-processing is essential to analyze the multivariate data sets before data mining.
 The target set is then cleaned. Data cleaning removes the observations containing noise and those
with missing data.
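A brief pandas sketch of assembling and cleaning a target set; the file name, column names, and noise rule are assumptions, not part of the original slides:

import pandas as pd

target = pd.read_csv("customer_transactions.csv")   # hypothetical target data set

# Remove observations with missing values and obvious noise (e.g. negative amounts).
target = target.dropna()
target = target[target["amount"] > 0]

# Keep only the variables the mining step will use.
target = target[["customer_id", "amount", "visits_per_month"]]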
Data Mining
Data Mining Process :
(2) Data Mining :
 Data mining involves six common classes of tasks:
1. Anomaly detection (Outlier/change/deviation detection)
 The identification of unusual data records that might be interesting, or of data errors that require
further investigation.
2. Association rule learning (Dependency modeling)
 Searches for relationships between variables.
 For example, a supermarket might gather data on customer purchasing habits. Using association rule
learning, the supermarket can determine which products are frequently bought together and use this
information for marketing purposes. This is sometimes referred to as market basket analysis.
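A tiny market-basket sketch in plain Python that computes support and confidence for one rule over made-up baskets:

baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]

# Rule: bread -> butter
with_bread = [b for b in baskets if "bread" in b]
with_both  = [b for b in with_bread if "butter" in b]

support    = len(with_both) / len(baskets)       # how often both appear together
confidence = len(with_both) / len(with_bread)    # how often butter appears given bread
print(f"support={support:.2f}, confidence={confidence:.2f}")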
Data Mining
Data Mining Process :
(2) Data Mining : (six common tasks)
3. Clustering
It is the task of discovering groups and structures in the data that are in some way or another “similar”,
without using known structures in the data.
4. Classification
It is the task of generalizing known structure to apply to new data. For example, an e-mail program might
attempt to classify an e-mail as “legitimate” or as “spam” (a small sketch follows this list).
5. Regression
Attempts to find a function which models the data with the least error.
6. Summarization
Providing a more compact representation of the data set, including visualization and report generation.
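A small classification sketch for the spam example above, using a naive Bayes text classifier from scikit-learn; the e-mails and labels are invented:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = ["win money now", "cheap pills offer", "meeting at noon", "project status report"]
labels = ["spam", "spam", "legitimate", "legitimate"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)          # bag-of-words features

classifier = MultinomialNB().fit(X, labels)
print(classifier.predict(vectorizer.transform(["win a cheap offer"])))   # likely 'spam'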
Data Mining
Data Mining Process :
(3) Results validation :
 The final step of knowledge discovery from data is to verify that the patterns produced by the data mining
algorithms occur in the wider data set.
 Not all patterns found by the data mining algorithms are necessarily valid. It is common for the data mining
algorithms to find patterns in the training set which are not present in the general data set. This is called overfitting.
 To overcome this, the evaluation uses a test set of data on which the data mining algorithm was not trained. The
learned patterns are applied to this test set and the resulting output is compared to the desired output.
For example, a data mining algorithm trying to distinguish “spam” from “legitimate” emails would be trained on
a training set of sample e-mails. Once trained, the learned patterns would be applied to the test set of e-mails on
which it had not been trained. The accuracy of the patterns can then be measured from how many e-mails they
correctly classify.
 A number of statistical methods may be used to evaluate the algorithm.
 If the learned patterns do not meet the desired standards, then it is necessary to re-evaluate and change the pre-
processing and data mining steps. If the learned patterns do meet the desired standards, then the final step is to
interpret the learned patterns and turn them into knowledge.
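A sketch of validating learned patterns on a held-out test set with scikit-learn; the data and features are illustrative assumptions:

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Toy data: [number of suspicious words, number of links] -> 1 = spam, 0 = legitimate.
X = [[5, 3], [4, 2], [0, 0], [1, 0], [6, 4], [0, 1], [5, 2], [1, 1]]
y = [1, 1, 0, 0, 1, 0, 1, 0]

# Hold back a test set the algorithm is never trained on.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Accuracy on unseen data reveals overfitting that training accuracy would hide.
print("train accuracy:", accuracy_score(y_train, model.predict(X_train)))
print("test accuracy:",  accuracy_score(y_test,  model.predict(X_test)))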