COMPUTER APPLICATION IN MANAGEMENT
Session 4
Shivani Tiwari
Data Management for
Decision Making
Introduction
 An organization is nothing but an information processing system
 The ability to provide relevant, accurate and timely information is critical to the success of any organization
 Any successful organization should have an integrated database from which to create information
 The characteristics of organization-wide database
 Sharable
 Consistent
 Reduced redundancy
 Standardized
Types of Databases
Transaction Databases :
 are used to enter raw data and transactions from original sources
 are created by OLTP (Online Transaction Processing) systems
 must be standardized, sharable, consistent and with reduced redundancy across the organization
Operational Databases :
 Built from transaction databases
 Large databases that support all the applications for day-to-day transaction and reporting processes
 Not designed to store historic data or to support ad-hoc queries
Data Warehouses :
 Designed for strategic decision support and built from operational databases
 Contain vast amounts of data
 Smaller, local data warehouses are called Data marts
 Necessary in organizations where
 a high volume of data processing is required
 cross-functional flow of information is required
 a single, centralized data source is a necessity
 increased quality and consistency of the organization’s data is a must
Types of Databases
Unstructured databases :
 Business data exists in various unstructured formats
 Text formats
 GIS data in the form of maps & locations
 Chemical data in the form of protein structures, molecule structures etc.
 Software engineering data in the form of program statements
 Multimedia data in the form of audio, video, images etc.
 The WWW is a universal repository ranging from strictly structured data to completely unstructured pages
Data Warehousing
 A data warehouse is a single, centralized, enterprise-wide repository that combines all data from all legacy
systems and theoretically gives all users access to appropriate information.
 The basic concept of a Data Warehouse is to facilitate a single version of truth for a company for decision
making and forecasting.
 A data warehouse is an information system that contains historical and cumulative data from single or
multiple sources.
 Data Warehouse Concepts simplify the reporting and analysis process of organizations.
Data Warehousing
Characteristics of a good Data warehouse
A data warehouse is governed by some specific rules.
Time Dependent :
 A data warehouse contains information collected over a period of time, i.e. historic information.
 There is a connection between information stored in a warehouse & the time when it was entered.
 Every record entered contains an element of time, explicitly or implicitly.
 Once data is inserted in the warehouse, it can't be updated or changed.
Data Warehousing
Characteristics of a good Data warehouse
Non-volatile :
 Data warehouse is also non-volatile. The previous data is not erased or overwritten when new data is
entered in it.
 Data is read-only and periodically refreshed. This also helps to analyze historical data and understand what
happened and when.
 It does not require transaction processing, recovery, or concurrency control mechanisms.
 Only two types of data operations performed in the Data Warehousing are
 Data loading (insertion)
 Data access (retrieval)
Data Warehousing
Characteristics of a good Data warehouse
Subject Oriented :
 A data warehouse is subject oriented in the sense that it offers information regarding a business function
instead of the company’s operational transactions. The functions can be sales, marketing, distribution, etc.
 A data warehouse never focuses on the day-to-day operations.
 It emphasizes modeling and analysis of data for decision making.
 It also provides a simple and concise view around the specific subject by excluding data that is not helpful
to the decision process.
Data Warehousing
Characteristics of a good Data warehouse
Integrated :
 Data in a data warehouse comes from various operations and sources across the organization and must be
standardized and made consistent
 A data warehouse is developed by integrating data from varied sources such as mainframes, relational
databases, flat files, etc.
 It must keep consistent naming conventions, formats, and coding
 The integration helps in effective analysis of data. Consistency in naming conventions, attribute measures,
encoding structures, etc. has to be ensured.
 After the transformation and cleaning process, all this data is stored in a common format in the data warehouse
Data Warehousing
Components of Data warehouse
There are five main components of Data Warehouse Architecture:
1. Database
2. ETL Tools
3. Meta Data
4. Query Tools
5. DataMarts
Data Warehousing
Components of Data warehouse
1. Database
 The central database is the foundation of the data warehousing environment.
 This database is implemented using RDBMS technology.
 However, it is constrained by the fact that a traditional RDBMS is optimized for transactional
database processing and not for data warehousing. For instance, ad-hoc queries, multi-table joins, and
aggregates are resource intensive and slow down performance.
 New index structures are used to bypass relational table scans and improve speed
 Multidimensional databases (MDDBs) are used to overcome the limitations imposed by the relational
data warehouse model
Data Warehousing
Components of Data warehouse
2. ETL (Extract, Transform and Load) Tools
 The data sourcing, transformation, and migration tools are used for performing all the conversions,
summarizations, and all the changes needed to transform data into a unified format in the data
warehouse. These are also called Extract, Transform and Load (ETL) Tools.
 Their functionality includes:
 Anonymizing data as per regulatory stipulations.
 Eliminating unwanted data from operational databases before loading it into the data warehouse.
 Searching for and replacing common names and definitions for data arriving from different sources.
 Calculating summaries and derived data.
 Populating missing data with defaults.
 De-normalizing repeated data arriving from multiple data sources.
 These ETL tools have to deal with the challenges of database and data heterogeneity. A minimal sketch of the extract-transform-load steps appears below.
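A minimal ETL sketch in Python, assuming pandas is available; the file names, column names, and cleaning rules are illustrative assumptions, not part of the original slides:

import pandas as pd

# Extract: read raw transactions from a hypothetical operational export.
raw = pd.read_csv("sales_2024.csv")  # assumed columns: region, amount, sale_date

# Transform: standardize coding, fill missing values with defaults, unify formats,
# and calculate a derived summary.
raw["region"] = raw["region"].str.upper().str.strip()    # consistent naming/coding
raw["amount"] = raw["amount"].fillna(0.0)                 # defaults for missing data
raw["sale_date"] = pd.to_datetime(raw["sale_date"])       # unified date format
monthly = (raw.groupby([raw["sale_date"].dt.to_period("M"), "region"])["amount"]
              .sum()
              .reset_index(name="monthly_sales"))         # pre-computed summary

# Load: append the transformed data into the warehouse (here simply a file).
monthly.to_csv("warehouse_monthly_sales.csv", index=False)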
Data Warehousing
Components of Data warehouse
3. Metadata
• Metadata provides the details necessary for data legibility, use, and administration.
• It contains data about data, activity, and knowledge. In other words, metadata is data about data which
defines the data warehouse.
• It is used for building, maintaining, and managing the data warehouse. It is like an encyclopedia of the
data warehouse and sets the framework for it.
• The ultimate goal of metadata is to corral, catalogue, integrate, guide and support various
transformations and loading processes, schema layouts, system tables, partition settings, indices, view
definitions etc.
• Metadata can be classified into following categories:
 Technical Meta Data: This kind of metadata contains information about the warehouse that is used
by data warehouse designers and administrators.
 Business Meta Data: This kind of metadata contains detail that gives end users an easy way to
understand the information stored in the data warehouse.
Data Warehousing
4. Query Tools
 One of the primary objectives of data warehousing is to provide information to businesses to make
strategic decisions. Query tools allow users to interact with the data warehouse system.
 These tools fall into four different categories:
i. Query and reporting tools
 Reporting tools - Report writers & Production reporting
 Managed query tools - SQL
ii. Application Development tools - custom reports are developed using Application development
tools
iii. Data mining tools - Data mining is a process of discovering meaningful new correlations, patterns,
and trends by mining large amounts of data. Data mining tools are used to make this process
automatic.
iv. OLAP tools - These tools are based on the concepts of a multidimensional database. They allow users
to analyze the data using elaborate and complex multidimensional views
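As an illustration of the multidimensional views that OLAP tools provide, a rough pandas pivot over made-up sales data (all figures invented):

import pandas as pd

sales = pd.DataFrame({
    "region":  ["North", "North", "South", "South", "South"],
    "product": ["A", "B", "A", "A", "B"],
    "year":    [2023, 2023, 2023, 2024, 2024],
    "amount":  [100, 150, 200, 250, 300],
})

# A two-dimensional "slice" of the cube: regions x products, summed over years.
cube = sales.pivot_table(values="amount", index="region",
                         columns="product", aggfunc="sum", fill_value=0)
print(cube)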
Data Warehousing
5. DataMarts
 A data mart contains data from the data warehouse tailored to support the specific analytical
requirements of a given business unit or function.
 A data mart is a subsidiary of the data warehouse; it holds a partition of the data created for a specific
group of users or functions.
 Data marts can be created in the same database as the data warehouse or in a physically separate
database.
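A hypothetical sketch of carving a data mart out of the warehouse as a subject-specific subset; the file and column names follow the ETL sketch above and are assumptions:

import pandas as pd

warehouse = pd.read_csv("warehouse_monthly_sales.csv")   # hypothetical warehouse extract

# Marketing data mart: only the rows and columns the marketing team needs.
marketing_mart = warehouse[warehouse["region"] == "NORTH"][["sale_date", "monthly_sales"]]
marketing_mart.to_csv("marketing_datamart.csv", index=False)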
DATA MINING
Data Mining
 Data mining is one of the most innovative and most widely used concepts related to database management
techniques.
 Data mining, also known as the knowledge discovery process, is used to analyze data from different
perspectives and summarize it into useful information.
 Just as mining is the process or industry of obtaining coal or other minerals from a mine, data mining extracts
the specific information we need by analyzing huge volumes of organizational/business data.
 In DM, large amounts of data are inspected, and facts are discovered and brought to the attention of the person
doing the mining.
 The data mining process uses various tools to predict behavior from huge data sets, and these predictions are
then used to make decisions
 While large-scale information technology has been evolving separate transaction and analytical systems, data
mining provides the link between the two.
 Data mining software analyzes relationships and patterns in stored transaction data based on open-ended user
queries.
 DM is a more efficient mode of finding useful facts about data.
Data Mining
Data Mining Definition : (Different types)
Type 1 :
Data Mining is the process used for the extraction of hidden predictive data from huge databases.
Type 2 :
Data mining is the process of discovering patterns in very large data sets using methods such as machine
learning, statistics, and database systems.
Type 3:
Data mining is defined as a process used to extract usable data from a larger set of raw data, which implies
analyzing data patterns in large batches of data using one or more software tools.
Data Mining
Type 4 :
Data mining is the automated extraction of hidden data from large databases.
Type 5 :
Data mining refers to the process of extracting valid and previously unknown information from a large
database to make crucial business decisions.
Features of Data Mining
 Automatic pattern predictions based on trend and behaviour analysis.
 Prediction based on likely outcomes.
 Creation of decision-oriented information.
 Focus on large data sets and databases for analysis.
 Clustering based on finding and visually documenting groups of facts not previously known.
Data Mining Techniques
Data mining involves effective data collection and warehousing as well as computer processing.
 For segmenting the data and evaluating the probability of future events, data mining uses sophisticated
mathematical algorithms.
 Data mining is also known as Knowledge Discovery in Databases (KDD).
 There are various Techniques of Data Mining as follows.
1. Decision Trees
2. Sequential Patterns
3. Clustering
4. Prediction
5. Association
6. Classification
1. Decision Trees
 A tree shaped structure that represents a set of decisions. These decisions generate rules for the
classification of the data set.
 A decision tree is a predictive model that can be viewed as a tree structure
 Each branch of the tree is a classification question, and the leaves are partitions of the data set with their
classification.
 It is a predictive model that makes predictions on the basis of a series of decisions.
 It divides up the data at each branch point without losing any data, i.e., the total number of records in a given
parent node is equal to the sum of the records contained in its children.
 Because of their structure and ability to easily generate rules, they are the favoured technique for
building understandable models
 Because of their clarity, they also allow for more complex profit & ROI models to be added easily on
top of the predictive model.
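A tiny decision-tree sketch using scikit-learn (assumed to be available); the features, labels, and numbers are invented for illustration:

from sklearn.tree import DecisionTreeClassifier, export_text

# Toy training data: [age, income] -> whether the customer bought the product.
X = [[25, 30000], [40, 60000], [35, 80000], [50, 20000], [23, 25000], [45, 90000]]
y = ["no", "yes", "yes", "no", "no", "yes"]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# The tree can be printed as readable classification rules.
print(export_text(tree, feature_names=["age", "income"]))
print(tree.predict([[30, 70000]]))    # prediction for a new customer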
2. Sequential Patterns
 A sequential pattern function analyses a collection of related records and detects frequently occurring
patterns in these records over time.
 Sequential pattern mining functions are quite powerful & can be used to detect the set of records
associated with some patterns.
For example, use of this function could be in the discovery of a rule that states that
 70% of the time when Stock X increased its value by a maximum of 10% over a 5 day trading
period,
 and Stock Y increased its value by 10% - 20% during the same period
 then the value of Stock Z also increased by 17% - 20% in the subsequent week.
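A rough plain-Python sketch of measuring how often the example rule above holds; the weekly stock movements are made up:

# Each tuple is one 5-day window: (% change of X, % change of Y, % change of Z next week).
windows = [(9.5, 15.0, 18.0), (8.0, 12.0, 19.5), (9.0, 14.0, 5.0), (4.0, 11.0, 2.0)]

# Windows where the antecedent holds: X up by at most 10% and Y up 10-20%.
antecedent = [w for w in windows if 0 < w[0] <= 10 and 10 <= w[1] <= 20]
# Of those, how often Z rose 17-20% the following week (the consequent).
consequent = [w for w in antecedent if 17 <= w[2] <= 20]

confidence = len(consequent) / len(antecedent)
print(f"Rule holds in {confidence:.0%} of matching windows")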
3. Clustering
 Clustering is the method by which similar records are grouped together.
 Usually this is done to give the end user a top level view or bird’s eye view of what is going on in the
database.
 Demographic data such as income, age, occupation, housing, religion and caste, taken from census
report are usually clustered.
 Clustering is one of the oldest techniques used in data mining.
 In order to predict a value in a record, records with similar predictor values in the historical database
should be looked into and the prediction value from the record nearest to the unclassified record
should be used. This is known as nearest-neighbor technique.
 The input to a clustering operator is a collection of untagged records. No classes are known at the time
the clustering operator is applied.
 The goal of a clustering function is to produce a reasonable segmentation of the set of input records
according to some criterion.
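A minimal clustering sketch with scikit-learn's KMeans on invented demographic-style records:

from sklearn.cluster import KMeans

# Toy records: [age, annual income in thousands].
records = [[25, 30], [27, 32], [45, 90], [48, 95], [60, 40], [62, 42]]

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(records)
print(kmeans.labels_)                  # cluster id assigned to each record
print(kmeans.predict([[50, 92]]))      # nearest cluster for a new record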
4. Prediction
This method discovers the relationship between independent and dependent variables. For example, in the
area of sales, to predict future profit, sales act as the independent variable and profit as the dependent one.
Based on historical data of sales and profit, the associated profit is then predicted.
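A short sketch of the sales-to-profit prediction using linear regression; the figures are hypothetical:

from sklearn.linear_model import LinearRegression

# Historical data: sales (independent) and profit (dependent), in thousands.
sales  = [[100], [150], [200], [250], [300]]
profit = [12, 18, 25, 30, 37]

model = LinearRegression().fit(sales, profit)
print(model.predict([[400]]))   # predicted profit for future sales of 400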
Data Mining Process :
The Knowledge Discovery in Databases (KDD) process is commonly defined with the stages:
1. Selection
2. Pre-processing
3. Transformation
4. Data Mining
5. Interpretation/Evaluation
However, many variations on this theme exist. For example, the Cross-Industry Standard Process for Data
Mining (CRISP-DM) defines six phases of the knowledge discovery process:
1. Business Understanding
2. Data Understanding
3. Data Preparation
4. Modeling
5. Evaluation
6. Deployment
or a simplified process such as
1. pre-processing 2. data mining 3. results validation
The CRISP-DM methodology is the leading methodology used by the majority of data miners.
Data Mining
Data Mining Process :
A simplified version of the process:
(1) pre-processing, (2) data mining, and (3) results validation.
(1) pre-processing :
 Before data mining algorithms can be used, a target data set must be assembled.
 As data mining can only uncover patterns actually present in the data, the target data set must be large
enough to contain these patterns while remaining concise enough to be mined within an acceptable time
limit.
 A common source for data is a data mart or data warehouse.
 Pre-processing is essential to analyze the multivariate data sets before data mining.
 The target set is then cleaned. Data cleaning removes the observations containing noise and those
with missing data.
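A brief pandas sketch of assembling and cleaning a target set; the file name, column names, and noise rule are assumptions, not part of the original slides:

import pandas as pd

target = pd.read_csv("customer_transactions.csv")   # hypothetical target data set

# Remove observations with missing values and obvious noise (e.g. negative amounts).
target = target.dropna()
target = target[target["amount"] > 0]

# Keep only the variables the mining step will use.
target = target[["customer_id", "amount", "visits_per_month"]]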
Data Mining
Data Mining Process :
(2) Data Mining :
 Data mining involves six common classes of tasks:
1. Anomaly detection (Outlier/change/deviation detection)
 The identification of unusual data records that might be interesting, or of data errors that require
further investigation.
2. Association rule learning (Dependency modeling)
 Searches for relationships between variables.
 For example, a supermarket might gather data on customer purchasing habits. Using association rule
learning, the supermarket can determine which products are frequently bought together and use this
information for marketing purposes. This is sometimes referred to as market basket analysis.
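A tiny market-basket sketch in plain Python that computes support and confidence for one rule over made-up baskets:

baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]

# Rule: bread -> butter
with_bread = [b for b in baskets if "bread" in b]
with_both  = [b for b in with_bread if "butter" in b]

support    = len(with_both) / len(baskets)       # how often both appear together
confidence = len(with_both) / len(with_bread)    # how often butter appears given bread
print(f"support={support:.2f}, confidence={confidence:.2f}")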
Data Mining
Data Mining Process :
(2) Data Mining : (six common tasks)
3. Clustering
It is the task of discovering groups and structures in the data that are in some way or another “similar”,
without using known structures in the data.
4. Classification
It is the task of generalizing known structure to apply to new data. For example, an e-mail program might
attempt to classify an e-mail as “legitimate” or as “spam” (a small sketch follows this list).
5. Regression
Attempts to find a function which models the data with the least error.
6. Summarization
Providing a more compact representation of the data set, including visualization and report generation.
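A small classification sketch for the spam example above, using a naive Bayes text classifier from scikit-learn; the e-mails and labels are invented:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = ["win money now", "cheap pills offer", "meeting at noon", "project status report"]
labels = ["spam", "spam", "legitimate", "legitimate"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)          # bag-of-words features

classifier = MultinomialNB().fit(X, labels)
print(classifier.predict(vectorizer.transform(["win a cheap offer"])))   # likely 'spam'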
Data Mining
Data Mining Process :
(3) Results validation :
 The final step of knowledge discovery from data is to verify that the patterns produced by the data mining
algorithms occur in the wider data set.
 Not all patterns found by the data mining algorithms are necessarily valid. It is common for the data mining
algorithms to find patterns in the training set which are not present in the general data set. This is called overfitting.
 To overcome this, the evaluation uses a test set of data on which the data mining algorithm was not trained. The
learned patterns are applied to this test set and the resulting output is compared to the desired output.
For example, a data mining algorithm trying to distinguish “spam” from “legitimate” emails would be trained on
a training set of sample e-mails. Once trained, the learned patterns would be applied to the test set of e-mails on
which it had not been trained. The accuracy of the patterns can then be measured from how many e-mails they
correctly classify.
 A number of statistical methods may be used to evaluate the algorithm.
 If the learned patterns do not meet the desired standards, then it is necessary to re-evaluate and change the pre-
processing and data mining steps. If the learned patterns do meet the desired standards, then the final step is to
interpret the learned patterns and turn them into knowledge.
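A sketch of validating learned patterns on a held-out test set with scikit-learn; the data and features are illustrative assumptions:

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Toy data: [number of suspicious words, number of links] -> 1 = spam, 0 = legitimate.
X = [[5, 3], [4, 2], [0, 0], [1, 0], [6, 4], [0, 1], [5, 2], [1, 1]]
y = [1, 1, 0, 0, 1, 0, 1, 0]

# Hold back a test set the algorithm is never trained on.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Accuracy on unseen data reveals overfitting that training accuracy would hide.
print("train accuracy:", accuracy_score(y_train, model.predict(X_train)))
print("test accuracy:",  accuracy_score(y_test,  model.predict(X_test)))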