Edwin S. Garcia – October 2019
DATA MINING
Big Data Trends 2019
OUTLINE
Introduction
Data Mining
Terminologies
Data Warehouse
Data Warehouse Architecture
Data Mining Applications
On-line Transaction Processing (OLTP)
On-line Analytical Processing (OLAP)
DW Vendors / DM Software Tools
“Everything in this world can be learned.”
DATA MINING
Definitions
Business Intelligence
The process of analyzing and sorting through large volumes of
data to identify patterns to discover new business information
and establish relationships in data.
Encyclopedia of Business Analytics and Optimization
Data mining is the process of analyzing data from different
perspectives and summarizing it into useful and actionable
information. Data mining software is one of a number of analytical
tools for analyzing data.
In other words…
Data mining (knowledge discovery from data): the extraction of
interesting (non-trivial, implicit, previously unknown, and potentially
useful) patterns or knowledge from huge amounts of data.
DATA MINING
Other Names or Also Known As (AKA)
1. Data Exploration and Analysis
2. Business Intelligence
3. Information Harvesting
4. Data Dredging
5. Knowledge Discovery (mining) in Databases (KDD)
6. Knowledge Extraction
7. Data / Pattern Analysis
8. Data Archeology
Data are any facts,
numbers, or text that can
be processed by a
computer.
DATA
01
The patterns,
associations, or
relationships among all
this data can provide
information.
Facts provided or learned
about something or
someone.
INFORMATION
02
Information can be
converted into
knowledge about
historical patterns and
future trends.
KNOWLEDGE
03
Knowledge Discovery in
Databases
refers to the broad
process of finding
knowledge in data, and
emphasizes the "high-
level" application of
particular data mining
methods.
KDD
04
Terminologies
DATA NEVER SLEEPS
How Much Data is Generated Every Minute?
In 2019, our lives are filled and surrounded with data of
all kinds. And this data never sleeps.
Data is generated in ad clicks, likes on social media,
shares, rides, transactions, streaming content, and so
much more. And when you put data in the hands of
everyone, it can transform the way you think about
business. Domo wants to empower every person in
your organization by creating a new relationship
between your people, data and systems.
In our seventh edition of Data Never Sleeps, you’ll find
out exactly how much data is generated in every
minute of every day with some of the most popular
platforms and companies in 2019.
What is DATABASE?
Purpose of Database Systems / Stages of Database Systems
Unprocessed
Information
DATA
01
Processed Data
INFORMATION
02
Evaluated
Information using
Measures
KNOWLEDGE
03
Data Analytics and
Future Predictions
ACTION
04
A database is any organized collection of data: a shared collection of logically related data (and a description of this data),
designed to meet the information needs of the organization.
Data Mining works with Warehouse Data
Data Warehousing provides the Enterprise with a Memory.
Data Mining provides the Enterprise with Intelligence.
Data Mining
Sample Scenario
What impact will new
products/services have on revenue
and margins?
What product promotions have the
biggest impact on revenue?
What is the most effective
distribution channel?
A PRODUCER WANTS TO KNOW…
Which are our lowest/highest margin
customers?
Who are my customers and what
products are they buying?
Which customers are most likely to
go to the competition?
Data Mining
Data, Data everywhere, yet…
I can’t find the data I need
• Data is scattered over the network
• Many versions, subtle differences
I can’t get the data I need
• Need an expert to get the data
I can’t understand the data I found
• Available data is poorly documented
I can’t use the data I found
• Results are unexpected
• Data needs to be transformed from one form to another
Knowledge Discovery in Databases (KDD)
SELECTION
Data Extraction: obtaining data from heterogeneous data sources: databases, data warehouses, the World Wide Web, or other information repositories.
PREPROCESSING
Data Cleaning: incomplete, noisy, or inconsistent data must be cleaned. Missing data may be ignored or predicted; erroneous data may be deleted or corrected.
TRANSFORMATION
Data Integration: combines data from multiple sources into a coherent store. Data can be encoded in common formats, normalized, and reduced.
DATA MINING
Apply algorithms to the transformed data and extract patterns.
INTERPRETATION
Pattern Evaluation: evaluate the interestingness of resulting patterns, or apply interestingness measures to filter out discovered patterns.
Knowledge Presentation: present the mined knowledge; visualization techniques can be used.
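The KDD stages above can be sketched as a toy pipeline. The function names and sample records below are purely illustrative, not a real data mining API:

```python
# Minimal KDD sketch: selection -> cleaning -> transformation -> mining.

def extract():
    # Selection: obtain raw records from (here, hardcoded) sources.
    return [
        {"customer": "A", "amount": 120.0},
        {"customer": "B", "amount": None},   # missing value
        {"customer": "A", "amount": 80.0},
        {"customer": "C", "amount": -5.0},   # erroneous value
    ]

def clean(records):
    # Preprocessing: drop records with missing or impossible values.
    return [r for r in records if r["amount"] is not None and r["amount"] >= 0]

def transform(records):
    # Transformation: aggregate into a reduced, common form per customer.
    totals = {}
    for r in records:
        totals[r["customer"]] = totals.get(r["customer"], 0.0) + r["amount"]
    return totals

def mine(totals, threshold=100.0):
    # Data mining: apply a (trivial) algorithm to extract a pattern --
    # here, customers whose total spending exceeds a threshold.
    return {c: t for c, t in totals.items() if t > threshold}

patterns = mine(transform(clean(extract())))
print(patterns)  # {'A': 200.0}
```

Pattern evaluation and knowledge presentation would then judge and visualize `patterns`, which is out of scope for this sketch.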
What is Data Warehouse?
“A Data Warehouse is simply a single, complete and consistent store of data obtained from a variety of
sources and made available to end-users in a way that they can understand and use in a business context.”
-- Dr. Barry Devlin (Founder of the Data Warehousing industry defining its first architecture in 1985.)
A “data warehouse” is a repository of historical data that is organized by subject to support decision makers
in an organization. Data warehouses are systems used to store data from one or more disparate sources in a
centralized place where it can be accessed for reporting and data analytics. The data in the data warehouse
may be current or historical, and may be in its original raw data form or processed / summarized.
1970
• Mainframe computers
• Simple data entry
• Routine reporting
• Primitive database structures
• Teradata incorporated
1980
• Mini/personal computers (PCs)
• Business applications for PCs
• Distributed DBMS / Relational DBMS
• Teradata ships commercial DBs
• Business Data Warehouse coined
2000
• Exponentially growing data, Web data
• Consolidation of DW/BI industry
• Data warehouse appliances emerged
• Business intelligence popularized
• Data mining and predictive modeling
• Open source software
• SaaS, PaaS, Cloud Computing
1990
• Centralized data storage
• Data warehousing was born
• Inmon, Building the Data Warehouse
• Kimball, The Data Warehouse Toolkit
• EDW architecture design
2010
• Big Data analytics
• Social media analytics
• Text and Web Analytics
• Hadoop, MapReduce, NoSQL
• In-memory, in-database
Data Warehouse Evolution
Data Warehouses are Very Large Databases
Source: META Group, Inc.
[Chart: Very Large Databases vs. Data Warehouses]
1 TB An automated tape robot, 10 TB The printed collection of
the US Library of Congress, 24 TB WalMart, 50 TB The
contents of a large Mass Storage System
TERABYTES (10^12 Bytes)
1 PB 5 years of EOS data (at 46 Mbps), 20 PB Production of
hard-disk drives in 1995, 200 PB Production of digital magnetic
tape in 1995, Geographic Information Systems
PETABYTES (10^15 Bytes)
5 Exabytes: All words ever spoken by human beings
National Medical Records
EXABYTES (10^18 Bytes)
Weather Images, Mark Liberman calculated the storage
requirements for all human speech ever spoken at 42
zettabytes if digitized as 16 kHz 16-bit audio.
ZETTABYTES (10^21 Bytes)
The SI prefix is “yotta-” (often jokingly linked to Yoda). To store that many bytes you would need a data
center as big as the states of Delaware and Rhode Island combined.
What is Data Warehouse?
A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in
support of management's decision making process. [Bill Inmon, the “Father of Data Warehousing,”]
Subject-Oriented
Data that gives information about a particular subject instead of about a company's ongoing operations.
Integrated
Data that is gathered into the data warehouse from a variety of sources and merged into a coherent whole.
Time-Variant
All data in the data warehouse is identified with a particular time period.
Non-Volatile
Data is stable in a data warehouse. More data is added but data is never removed. This enables management to gain a consistent picture of the business.
Types of Data Warehouse User Classes
Explorers, Farmers, Tourists [Data Miners, Operators]
“Explorers are users who do not know what they are looking for.”
An explorer digs deep into large data sets to seek out unknown patterns and unsuspected information that can be very valuable to the organization.
“A farmer is a user who harvests information from known data.”
Farmers know what they are looking for and what they need at all times. They are the most frequent users of the data warehouse, as they always search known data to find what they need.
“Tourists browse information harvested by farmers.”
A tourist browses information as and when needed. Tourists use the data warehouse very rarely; they are its least frequent users.
Data Warehouses Architecture
Three-Tier Architecture
Traditional data warehouse architecture employs a three-tier
structure composed of the following tiers.
• Bottom tier: This tier contains the database server used to
extract data from many different sources, such as from
transactional databases used for front-end applications.
• Middle tier: The middle tier houses an OLAP server, which
transforms the data into a structure better suited for analysis
and complex querying. The OLAP server can work in two ways:
either as an extended relational database management
system that maps the operations on multidimensional data to
standard relational operations (Relational OLAP), or using a
multidimensional OLAP model that directly implements the
multidimensional data and operations.
• Top tier: The top tier is the client layer. This tier holds the tools
used for high-level data analysis, querying, reporting, and data
mining.
Data Warehouse Models
Traditional Architecture
01
Virtual Data
Warehouse
A virtual data
warehouse is a set
of separate
databases, which
can be queried
together, so a user
can effectively
access all the data
as if it was stored
in one data
warehouse.
02
Data Mart
Model
A data mart model
is used for
business-line
specific reporting
and analysis. In
this data
warehouse model,
data is aggregated
from a range of
source systems
relevant to a
specific business
area, such as sales
or finance.
03
Enterprise
Data
Warehouse
An enterprise data
warehouse model
prescribes that the
data warehouse
contain aggregated
data that spans the
entire organization.
This model sees
the data
warehouse as the
heart of the
enterprise’s
information
system, with
integrated data
from all business
units.
Star Schema vs. Snowflake Schema
The star schema and snowflake schema are
two ways to structure a data warehouse.
The star schema has a centralized data
repository, stored in a fact table. The
schema splits the fact table into a series of
denormalized dimension tables. The fact
table contains aggregated data to be used
for reporting purposes while the dimension
table describes the stored data.
Denormalized designs are less complex
because the data is grouped. The fact table
uses only one link to join to each dimension
table. The star schema’s simpler design
makes it much easier to write complex
queries.
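To make the single-join point concrete, here is a minimal star-schema sketch using SQLite's in-memory database; the table and column names (`fact_sales`, `dim_product`) are illustrative:

```python
# Star schema: one fact table, one denormalized dimension, one join.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Denormalized dimension: all product attributes in one wide table.
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        name TEXT, category TEXT, brand TEXT
    );
    -- Fact table: measures plus one foreign key per dimension.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        units_sold INTEGER, revenue REAL
    );
    INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware', 'Acme');
    INSERT INTO fact_sales VALUES (1, 10, 99.5);
""")

# Reporting query: a single join per dimension keeps queries simple.
row = con.execute("""
    SELECT d.category, SUM(f.revenue)
    FROM fact_sales f JOIN dim_product d USING (product_id)
    GROUP BY d.category
""").fetchone()
print(row)  # ('Hardware', 99.5)
```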
Star Schema vs. Snowflake Schema
The snowflake schema is different because
it normalizes the data. Normalization
means efficiently organizing the data so
that all data dependencies are defined,
and each table contains minimal
redundancies. Single dimension tables
thus branch out into separate dimension
tables.
The snowflake schema uses less disk space
and better preserves data integrity. The
main disadvantage is the complexity of
queries required to access data—each
query must dig deep to get to the relevant
data because there are multiple joins.
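A contrasting sketch of the snowflake layout, again in in-memory SQLite with illustrative names: the product dimension is normalized into `dim_product` and `dim_brand`, so reaching the brand requires an extra join:

```python
# Snowflake schema: the dimension is normalized, so queries need
# multiple joins to reach the relevant attribute.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_brand (brand_id INTEGER PRIMARY KEY, brand TEXT);
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        name TEXT,
        brand_id INTEGER REFERENCES dim_brand(brand_id)
    );
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        revenue REAL
    );
    INSERT INTO dim_brand VALUES (1, 'Acme');
    INSERT INTO dim_product VALUES (1, 'Widget', 1);
    INSERT INTO fact_sales VALUES (1, 99.5);
""")

# Each query digs through two joins to get from fact to brand.
row = con.execute("""
    SELECT b.brand, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_product p USING (product_id)
    JOIN dim_brand b USING (brand_id)
    GROUP BY b.brand
""").fetchone()
print(row)  # ('Acme', 99.5)
```

Storing `Acme` once in `dim_brand` is what saves disk space and preserves integrity, at the cost of the extra join.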
ETL vs. ELT
ETL and ELT are two different methods of loading data into a warehouse.
Extract, Transform, Load (ETL) first extracts the data from a pool of data sources, which are typically
transactional databases. The data is held in a temporary staging database. Transformation operations are then
performed, to structure and convert the data into a suitable form for the target data warehouse system. The
structured data is then loaded into the warehouse, ready for analysis.
ETL vs. ELT
With Extract Load Transform (ELT), data is immediately loaded after being extracted from the source data pools.
There is no staging database, meaning the data is immediately loaded into the single, centralized repository.
The data is transformed inside the data warehouse system for use with business intelligence tools and analytics.
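The ordering difference can be sketched with toy stand-in functions; real pipelines would use actual extraction and transformation tooling, so everything below is illustrative:

```python
# ETL vs. ELT: same stages, different order relative to the load step.

def extract():
    return ["  alice ", "BOB "]              # raw source records

def transform(records):
    return [r.strip().title() for r in records]

def etl(warehouse):
    # ETL: transform in a staging step, then load structured data.
    staging = extract()
    warehouse += transform(staging)
    return warehouse

def elt(warehouse):
    # ELT: load raw data immediately, then transform inside the
    # warehouse itself (here, in place on the warehouse list).
    warehouse += extract()
    warehouse[:] = transform(warehouse)
    return warehouse

print(etl([]))  # ['Alice', 'Bob']
print(elt([]))  # ['Alice', 'Bob']
```

Both end with the same structured data; what differs is where the raw, untransformed records are held in the meantime.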
Organization Data Warehouse
The structure of an organization’s
data warehouse also depends on its
current situation and needs.
The basic structure lets end users of
the warehouse directly access
summary data derived from source
systems and perform analysis,
reporting, and mining on that data.
This structure is useful when data sources derive from the same types of database systems.
Organization Data Warehouse
A warehouse with a staging
area is the next logical step
in an organization with
disparate data sources with
many different types and
formats of data. The
staging area converts the
data into a summarized
structured format that is
easier to query with
analysis and reporting
tools.
Organization Data Warehouse
A variation on the staging structure is the addition of data marts to the data warehouse. The data marts store
summarized data for a particular line of business, making that data easily accessible for specific forms of
analysis. For example, adding data marts can allow a financial analyst to more easily perform detailed queries
on sales data, to make predictions about customer behavior. Data marts make analysis easier by tailoring data
specifically to meet the needs of the end user.
Data mining is used to explore
increasingly large databases
and to improve market
segmentation. By analysing the
relationships between
parameters such as customer
age, gender, tastes, etc., it is
possible to guess their
behaviour in order to direct
personalised loyalty
campaigns.
Amazon, Lazada,
Shopee
MARKETING
DATA MINING APPLICATIONS
Supermarkets, for example,
use joint purchasing patterns
to identify product
associations and decide how
to place them in the aisles and
on the shelves. Data mining
also detects which offers are
most valued by customers or
increase sales at the checkout
queue.
SM, Robinson,
Walmart
RETAIL
Banks use data mining to
better understand market
risks. It is commonly applied to
credit ratings and to intelligent
anti-fraud systems to analyse
transactions, card transactions,
purchasing patterns and
customer financial data. Data
mining also allows banks to
learn more about our online
preferences or habits to
optimise the return on their
marketing campaigns.
Banks, Credit Cards
BANKING
Data mining enables more
accurate diagnostics. Having all of
the patient's information, such as
medical records, physical
examinations, and treatment
patterns, allows more effective
treatments to be prescribed. It
also enables more effective,
efficient and cost-effective
management of health resources
by identifying risks, predicting
illnesses in certain segments of
the population
Hospital, Pharmacy
MEDICINE
There are networks that apply
real time data mining to measure
their online television (IPTV) and
radio audiences. These systems
collect and analyse, on the fly,
anonymous information from
channel views, broadcasts and
programming. Data mining allows
networks to make personalised
recommendations to radio
listeners and TV viewers, as well
as get to know their interests and
activities in real time and better
understand their behaviour.
TV, Radio
Networks, ISP
TV and RADIO
DATA MINING:
A PROFESSION OF THE FUTURE
Today, data search, analysis and management are
markets with enormous employment
opportunities. Data mining professionals work with
databases to evaluate information and discard any
information that is not useful or reliable. This
requires knowledge of big data, computing and
information analysis, and the ability to handle
different types of software.
LinkedIn's Annual Report on emerging jobs noted
that three of the most in-demand jobs in the
United States were positions related to big data.
Likewise, IBM forecasts that the demand for these
professionals will grow by 28% between
now and 2020.
FINANCE INDUSTRY
Application: Credit Card Analysis
INSURANCE INDUSTRY
Application: Claims, Fraud Analysis
TELECOMMUNICATION INDUSTRY
Application: Call Record Analysis
DATA MINING
Other Industry and Application Areas
TRANSPORT INDUSTRY
Application: Logistic Management
CONSUMER GOODS
Application: Promotion Analysis
DATA SERVICE PROVIDERS
Application: Value Added Data
UTILITIES INDUSTRY
Application: Power Usage Analysis
DATA WAREHOUSE VS DATA MINING
Key Differences between Data Warehousing vs Data Mining
DATA WAREHOUSE APPLICATION
Application of Data Warehouse in Real-Life
An operational system is a term used in data warehousing to refer to a system that is used
to process the day-to-day transactions of an organization. These systems are designed in a
manner that processing of day-to-day transactions is performed efficiently and the
integrity of the transactional data is preserved.
Used to run a business!
What are Operational Systems
They are OLTP systems.
Run mission critical
applications.
Need to work with stringent
performance requirements
for routine tasks.
Operational Systems
Optimized to handle large numbers of
simple read/write transactions
Based on up-to-the-second data
Run the business in real time
They are increasingly used by customers
Used by people who deal with customers,
products – Clerks, Salespeople etc.
Optimized for fast response to predefined
transactions
Examples of Operational Data
Application-Orientation VS Subject-Orientation
What is OLTP?
OLTP (Online Transactional Processing) is a category of data processing that is focused on transaction-oriented tasks. OLTP
typically involves inserting, updating, and/or deleting small amounts of data in a database. OLTP mainly deals with large
numbers of transactions by a large number of users. Examples of OLTP Transactions:
Online banking
Purchasing a book online
Booking an airline ticket
Sending a text message
Telemarketers entering telephone survey results
Call center staff viewing and updating customers’ details
Order entry
Characteristics of OLTP
OLTP transactions are usually very specific in the task that they perform, and they usually involve a single record or a small
selection of records. For example, an online banking customer might send money from his account to his wife’s account. In
this case, the transaction only involves two accounts – his account and his wife’s. It does not involve the other bank
customers.
This is in contrast to Online Analytical Processing (OLAP), which usually involves querying many records (even all records)
in a database for analytical purposes. An OLAP banking example could be a bank manager performing a query across all
customer accounts, so that he can see which suburbs had the most active online banking customers during a certain
period.
OLAP is often used to provide analytics on data that was captured via an OLTP application. So, while OLTP and OLAP often
work with the same data sets, they have different characteristics.
OLTP applications typically possess the following characteristics:
• Transactions that involve small amounts of data
• Indexed access to data
• A large number of users
• Frequent queries and updates
• Fast response times
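As a minimal illustration of such a transaction, the banking transfer described above might look like this in SQLite (the schema and amounts are invented for the sketch):

```python
# OLTP-style transaction: touches only two rows, committed atomically.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL)")
con.executemany("INSERT INTO account VALUES (?, ?)", [(1, 500.0), (2, 100.0)])

# One transaction: debit one account, credit the other; the context
# manager commits both updates together (or rolls back on error).
with con:
    con.execute("UPDATE account SET balance = balance - 50 WHERE id = 1")
    con.execute("UPDATE account SET balance = balance + 50 WHERE id = 2")

print(con.execute("SELECT balance FROM account ORDER BY id").fetchall())
# [(450.0,), (150.0,)]
```

An OLAP query over the same data would instead scan every account, which is exactly the contrast drawn above.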
What is OLAP?
OLAP (Online Analytical Processing) is the technology behind many Business Intelligence (BI) applications. OLAP is a powerful
technology for data discovery, including capabilities for limitless report viewing, complex analytical calculations, and
predictive “what if” scenario (budget, forecast) planning.
How is OLAP Technology Used?
OLAP performs multidimensional analysis
of business data and provides the capability
for complex calculations, trend analysis,
and sophisticated data modeling. It is the
foundation for many kinds of business
applications for Business Performance
Management, Planning, Budgeting,
Forecasting, Financial Reporting, Analysis,
Simulation Models, Knowledge Discovery,
and Data Warehouse Reporting.
OLAP enables end-users to perform ad hoc
analysis of data in multiple dimensions,
thereby providing the insight and
understanding they need for better
decision making.
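A toy sketch of the multidimensional analysis described above, using only the standard library; the sales records and dimension names are illustrative:

```python
# Multidimensional roll-up and slice over two dimensions.
from collections import defaultdict

sales = [
    {"region": "North", "quarter": "Q1", "revenue": 100},
    {"region": "North", "quarter": "Q2", "revenue": 150},
    {"region": "South", "quarter": "Q1", "revenue": 80},
]

# Roll up the revenue measure along the (region, quarter) dimensions.
cube = defaultdict(float)
for row in sales:
    cube[(row["region"], row["quarter"])] += row["revenue"]

# Slice: fix one dimension (region = 'North') and read across quarters.
north = {q: v for (r, q), v in cube.items() if r == "North"}
print(north)  # {'Q1': 100.0, 'Q2': 150.0}
```

A real OLAP server does this over many dimensions at scale (and pre-aggregates), but the slice/roll-up operations are the same idea.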
Online Transaction
Processing (OLTP)
VS
Online Analytical
Processing (OLAP)
OLTP and OLAP Applications
Data Mining Software Tools
Big Data Vendors
Big Data is a term that describes the
large volume of data – both structured
and unstructured – that inundates a
business on a day-to-day basis. But it’s
not the amount of data that’s
important. It’s what organizations do
with the data that matters. Big data
can be analyzed for insights that lead
to better decisions and strategic
business moves.
While the term “big data” is relatively
new, the act of gathering and storing
large amounts of information for
eventual analysis is ages old. The
concept gained momentum in the
early 2000s when industry analyst
Doug Laney articulated the now-
mainstream definition of big data as
the three Vs: Volume, Velocity, and Variety.
LINKEDIN MOST PROMISING JOBS
For the Year 2019
The Digital Skills Gap and the Future of Jobs 2020
Contact Me
Edwin Santos Garcia
Work 044-791-3451
Home: 044-762-6924
facebook.com/EdwinGarciaPH75
+639231142814
Edwin_GarciaPH@outlook.com
0411 Purok 4 Mercado
Hagonoy, Bulacan
THANK YOU
“People create their own success by learning what they need to learn
and then by practicing it until they become proficient at it.”
Data Mining @ BSU Malolos 2019

  • 1. Edwin S. Garcia – October 2019 DATA MINING Big Data Trends 2019
  • 2. OUTLINE Introduction Data Mining Terminologies Data Warehouse Data Warehouse Architecture Data Mining Applications On-line Transaction Processing (OLTP) On-line Analytical Processing (OLAP) DW Vendors / DM Software Tools “Everything in this world can be learned.”
  • 3. DATA MINING Definitions Business Intelligence The process of analyzing and sorting through large volume of data to identify patterns to discover new business information and establish relationships in data. Encyclopedia of Business Analytics and Optimization Data mining is the process of analyzing data from different perspectives and summarizing it into useful and actionable information. Data mining software is one of a number of analytical tools for analyzing data. In other words… Data mining (knowledge discovery from data) Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amount of data
  • 4. Data Exploration and Analysis Data Mining is AKA Business Intelligence Data Mining is AKA Information Harvesting Data Mining is AKA Data Dredging Data Mining is AKA Knowledge Discovery (mining) in Databases (KDD) Data Mining is AKA Knowledge Extraction Data Mining is AKA Data / Pattern Analysis Data Mining is AKA Data Archeology Data Mining is AKA DATA MINING Other Names or Also Known As (AKA) 1 2 3 45 6 7 8
  • 5. Data are any facts, numbers, or text that can be processed by a computer. DATA 01 The patterns, associations, or relationships among all this data can provide information. Facts provided or learned about something or someone. INFORMATION 02 Information can be converted into knowledge about historical patterns and future trends. KNOWLEDGE 03 Knowledge Discovery in Databases refers to the broad process of finding knowledge in data, and emphasizes the "high- level" application of particular data mining methods. KDD 04 Terminologies
  • 6. DATA NEVER SLEEPS How Much Data is Generated Evey Minute? In 2019, our lives are filled and surrounded with data of all kinds. And this data never sleeps. Data is generated in ad clicks, likes on social media, shares, rides, transactions, streaming content, and so much more. And when you put data in the hands of everyone, it can transform the way you think about business. Domo wants to empower every person in your organization by creating a new relationship between your people, data and systems. In our seventh edition of Data Never Sleeps, you’ll find out exactly how much data is generated in every minute of every day with some of the most popular platforms and companies in 2019.
  • 7. What is DATABASE? Purpose of Database Systems / Stages of Database Systems Unprocessed Information DATA 01 Processed Data INFORMATION 02 Evaluated Information using Measures KNOWLEDGE 03 Data Analytics and Future Predictions ACTION 04 Database is any organized collections of data. Shared collection of logically related data (and a description of this data), designed to meet the information needs of the organization.
  • 8. Data Mining works with Warehouse Data Data Warehousing provides the Enterprise with a Memory. Data Mining provides the Enterprise with Intelligence.
  • 9. Data Mining Sample Scenario What impact will new products/services have on revenue and margins? What product promotions have the biggest impact on revenue? What is the most effective distribution channel? A PRODUCER WANTS TO KNOW… Which are our lowest/highest margin customers? Who are my customers and what products are they buying? Which customers are most likely to go to the competition?
  • 10. Data Mining Data, Data everywhere yet… @ I can’t get the data I need • Need an expert to get the data @ I can’t understand the data I found • Available data poorly documented @ I can’t find the data I need • Data is scattered over the network • Many version, subtle differences @ I can’t use the data I found • Results are unexpected • Data needs to be transformed from one form to other
  • 11. Knowledge Discovery in Databases (KDD) Data mining Apply algorithms to transformed data an extract patterns. Data Integration Combines data from multiple sources Combines data from multiple sources into a coherent store. Data can be encoded in common into a coherent store -Data can be encoded in common formats, normalized, reduced. Data Cleaning Incomplete , noisy, inconsistent data to be cleaned. Missing data may be ignored or predicted, erroneous data may be deleted or corrected. Data Extraction Obtaining Data from heterogeneous data sources: Databases, Data warehouses, World wide web or other information repositories. SELECTION PREPROCESSING TRANSFORMATION DATAMINING Pattern Evaluation Evaluatethe interestingness of resultingpatterns or apply interestingness measuresto filter out discovered patterns. Knowledge presentation present the mined knowledge- visualization techniques can be used. INTERPRETATION
  • 12. What is Data Warehouse? “A Data Warehouse is simply a single, complete and consistent store of data obtained from a variety of sources and made available to end-users in a way that they can understand and use in a business context.” -- Dr. Barry Devlin (Founder of the Data Warehousing industry defining its first architecture in 1985.) A “data warehouse” is a repository of historical data that is organized by subject to support decision makers in an organization. Data warehouses are systems used to store data from one or more disparate sources in a centralized place where it can be accessed for reporting and data analytics. The data in the data warehouse may be current or historical, and may be in its original raw data form or processed / summarized.
  • 13. Data Warehouse Evolution. 1970: mainframe computers; simple data entry; routine reporting; primitive database structures; Teradata incorporated. 1980: mini/personal computers (PCs); business applications for PCs; distributed DBMS; relational DBMS; Teradata ships commercial DBs; the term "Business Data Warehouse" is coined. 1990: centralized data storage; data warehousing is born; Inmon, Building the Data Warehouse; Kimball, The Data Warehouse Toolkit; EDW architecture design. 2000: exponentially growing data; Web data; consolidation of the DW/BI industry; data warehouse appliances emerge; business intelligence popularized; data mining and predictive modeling; open source software; SaaS, PaaS, cloud computing. 2010: Big Data analytics; social media analytics; text and Web analytics; Hadoop, MapReduce, NoSQL; in-memory, in-database analytics.
  • 14. Data Warehouses are Very Large Databases Source: META Group, Inc.
  • 15. Very Large Databases. TERABYTES (10^12 bytes): 1 TB, an automated tape robot; 10 TB, the printed collection of the US Library of Congress; 24 TB, WalMart; 50 TB, the contents of a large mass storage system. PETABYTES (10^15 bytes): 1 PB, 5 years of EOS data (at 46 Mbps); 20 PB, production of hard-disk drives in 1995; 200 PB, production of digital magnetic tape in 1995; geographic information systems. EXABYTES (10^18 bytes): 5 EB, all words ever spoken by human beings; national medical records. ZETTABYTES (10^21 bytes): weather images; Mark Liberman calculated the storage requirements for all human speech ever spoken at 42 zettabytes, if digitized as 16 kHz 16-bit audio. YOTTABYTES (10^24 bytes): to save that many bytes you would need a data center as big as the states of Delaware and Rhode Island.
  • 16. What is a Data Warehouse? A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process. [Bill Inmon, the "Father of Data Warehousing"] Subject-oriented: data that gives information about a particular subject instead of about a company's ongoing operations. Integrated: data that is gathered into the data warehouse from a variety of sources and merged into a coherent whole. Time-variant: all data in the data warehouse is identified with a particular time period. Non-volatile: data is stable in a data warehouse; more data is added, but data is never removed, which enables management to gain a consistent picture of the business.
  • 17. Types of Data Warehouse User Classes: Explorers, Farmers, Tourists [also Data Miners, Operators]. "Explorers are users who do not know what they are looking for." An explorer digs deep into large data sets to seek unknown patterns and unsuspected information that can be very valuable to the organization. "Farmers are users who harvest information from known data." Farmers know what they are looking for and what they need at all times; they are the most frequent users of the data warehouse, as they always search known data to find what they need. "Tourists browse information harvested by farmers." A tourist browses information as and when needed; tourists use the data warehouse rarely and are its least frequent users.
  • 18. Data Warehouse Architecture: Three-Tier Architecture. Traditional data warehouse architecture employs a three-tier structure composed of the following tiers. • Bottom tier: This tier contains the database server used to extract data from many different sources, such as from transactional databases used for front-end applications. • Middle tier: The middle tier houses an OLAP server, which transforms the data into a structure better suited for analysis and complex querying. The OLAP server can work in two ways: either as an extended relational database management system that maps operations on multidimensional data to standard relational operations (Relational OLAP), or as a multidimensional OLAP model that directly implements the multidimensional data and operations. • Top tier: The top tier is the client layer. This tier holds the tools used for high-level data analysis, querying, reporting, and data mining.
  • 19. Data Warehouse Models Traditional Architecture 01 Virtual Data Warehouse A virtual data warehouse is a set of separate databases, which can be queried together, so a user can effectively access all the data as if it was stored in one data warehouse. 02 Data Mart Model A data mart model is used for business-line specific reporting and analysis. In this data warehouse model, data is aggregated from a range of source systems relevant to a specific business area, such as sales or finance. 03 Enterprise Data Warehouse An enterprise data warehouse model prescribes that the data warehouse contain aggregated data that spans the entire organization. This model sees the data warehouse as the heart of the enterprise's information system, with integrated data from all business units.
  • 20. Star Schema vs. Snowflake Schema. The star schema and snowflake schema are two ways to structure a data warehouse. The star schema has a centralized data repository, stored in a fact table, which is connected to a series of denormalized dimension tables. The fact table contains aggregated data to be used for reporting purposes, while the dimension tables describe the stored data. Denormalized designs are less complex because the data is grouped; the fact table uses only one link to join each dimension table. The star schema's simpler design makes it much easier to write complex queries.
  • 21. Star Schema vs. Snowflake Schema The snowflake schema is different because it normalizes the data. Normalization means efficiently organizing the data so that all data dependencies are defined, and each table contains minimal redundancies. Single dimension tables thus branch out into separate dimension tables. The snowflake schema uses less disk space and better preserves data integrity. The main disadvantage is the complexity of queries required to access data—each query must dig deep to get to the relevant data because there are multiple joins.
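The contrast between the two schemas can be made concrete with Python's built-in sqlite3 module. This is a minimal sketch; the table and column names are illustrative, not from the slides. Note the snowflake query needs one extra join to reach the normalized category attribute:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

cur.executescript("""
-- Star schema: fact table joins directly to a denormalized dimension
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY,
                          name TEXT, category TEXT);       -- category repeated per row
CREATE TABLE fact_sales (product_id INTEGER, amount REAL);

-- Snowflake schema: the same dimension, normalized into two tables
CREATE TABLE sf_category (category_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE sf_product  (product_id INTEGER PRIMARY KEY,
                          name TEXT, category_id INTEGER);
""")
cur.executemany("INSERT INTO dim_product VALUES (?,?,?)",
                [(1, "milk", "dairy"), (2, "cheese", "dairy")])
cur.executemany("INSERT INTO fact_sales VALUES (?,?)",
                [(1, 5.0), (2, 7.0), (1, 3.0)])
cur.execute("INSERT INTO sf_category VALUES (1, 'dairy')")
cur.executemany("INSERT INTO sf_product VALUES (?,?,?)",
                [(1, "milk", 1), (2, "cheese", 1)])

# Star query: one join per dimension
star = cur.execute("""SELECT d.category, SUM(f.amount)
                      FROM fact_sales f JOIN dim_product d USING (product_id)
                      GROUP BY d.category""").fetchall()

# Snowflake query: an extra join to reach the normalized attribute
snow = cur.execute("""SELECT c.category, SUM(f.amount)
                      FROM fact_sales f
                      JOIN sf_product p USING (product_id)
                      JOIN sf_category c USING (category_id)
                      GROUP BY c.category""").fetchall()
print(star, snow)  # [('dairy', 15.0)] [('dairy', 15.0)]
```

Both queries return the same totals; the trade-off is purely the slides' point about redundancy (star) versus join depth (snowflake).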
  • 22. ETL vs. ELT ETL and ELT are two different methods of loading data into a warehouse. Extract, Transform, Load (ETL) first extracts the data from a pool of data sources, which are typically transactional databases. The data is held in a temporary staging database. Transformation operations are then performed, to structure and convert the data into a suitable form for the target data warehouse system. The structured data is then loaded into the warehouse, ready for analysis.
  • 23. ETL vs. ELT With Extract Load Transform (ELT), data is immediately loaded after being extracted from the source data pools. There is no staging database, meaning the data is immediately loaded into the single, centralized repository. The data is transformed inside the data warehouse system for use with business intelligence tools and analytics.
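The ETL/ELT distinction on these two slides can be sketched in a few lines. This toy example uses in-memory lists as stand-ins for the staging database and the warehouse; the data and names are illustrative assumptions:

```python
# Extracted source rows: unstructured strings with inconsistent whitespace
raw = ["  Alice,100 ", "Bob,25", "  Carol,  7"]

def transform(rows):
    """Structure and convert rows into the form the warehouse expects."""
    return [(name.strip(), int(value))
            for name, value in (r.strip().split(",") for r in rows)]

# ETL: transform in a temporary staging step, then load the structured result
staging = list(raw)                       # temporary staging copy
etl_warehouse = transform(staging)        # warehouse receives structured data

# ELT: load the raw rows first (no staging), transform inside the warehouse
elt_warehouse = list(raw)                 # loaded as-is into the warehouse
elt_warehouse = transform(elt_warehouse)  # transformed later, in place, for analytics

print(etl_warehouse == elt_warehouse)  # True
```

The end state is identical; what differs is where the transformation runs, which is the slides' point about ELT trading a staging database for transformation capacity inside the warehouse.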
  • 24. Organization Data Warehouse The structure of an organization's data warehouse also depends on its current situation and needs. The basic structure lets end users of the warehouse directly access summary data derived from source systems and perform analysis, reporting, and mining on that data. This structure is useful when the data sources derive from the same types of database systems.
  • 25. Organization Data Warehouse A warehouse with a staging area is the next logical step in an organization with disparate data sources with many different types and formats of data. The staging area converts the data into a summarized structured format that is easier to query with analysis and reporting tools.
  • 26. Organization Data Warehouse A variation on the staging structure is the addition of data marts to the data warehouse. The data marts store summarized data for a particular line of business, making that data easily accessible for specific forms of analysis. For example, adding data marts can allow a financial analyst to more easily perform detailed queries on sales data, to make predictions about customer behavior. Data marts make analysis easier by tailoring data specifically to meet the needs of the end user.
  • 27. DATA MINING APPLICATIONS. MARKETING (Amazon, Lazada, Shopee): Data mining is used to explore increasingly large databases and to improve market segmentation. By analysing the relationships between parameters such as customer age, gender, and tastes, it is possible to predict customer behaviour and direct personalised loyalty campaigns. RETAIL (SM, Robinson, Walmart): Supermarkets, for example, use joint purchasing patterns to identify product associations and decide how to place products in the aisles and on the shelves. Data mining also detects which offers are most valued by customers or increase sales at the checkout queue. BANKING (Banks, Credit Cards): Banks use data mining to better understand market risks. It is commonly applied to credit ratings and to intelligent anti-fraud systems that analyse transactions, card payments, purchasing patterns and customer financial data. Data mining also allows banks to learn more about customers' online preferences and habits, to optimise the return on their marketing campaigns. MEDICINE (Hospitals, Pharmacies): Data mining enables more accurate diagnostics. Having all of a patient's information, such as medical records, physical examinations, and treatment patterns, allows more effective treatments to be prescribed. It also enables more effective, efficient and cost-effective management of health resources by identifying risks and predicting illness in certain segments of the population. TV and RADIO (TV, Radio Networks, ISPs): Some networks apply real-time data mining to measure their online television (IPTV) and radio audiences. These systems collect and analyse, on the fly, anonymous information from channel views, broadcasts and programming. Data mining allows networks to make personalised recommendations to radio listeners and TV viewers, as well as learn their interests and activities in real time and better understand their behaviour.
  • 28. DATA MINING: A PROFESSION OF THE FUTURE Today, data search, analysis and management are markets with enormous employment opportunities. Data mining professionals work with databases to evaluate information and discard any information that is not useful or reliable. This requires knowledge of big data, computing and information analysis, and the ability to handle different types of software. LinkedIn's Annual Report on emerging jobs noted that three of the most in-demand jobs in the United States were positions related to big data. Likewise, IBM forecasts that the demand for this type of professionals will grow by 28% between now and 2020.
  • 29. FINANCE INDUSTRY Application: Credit Card Analysis INSURANCE INDUSTRY Application: Claims, Fraud Analysis TELECOMMUNICATION INDUSTRY Application: Call Record Analysis DATA MINING Other Industry and Application Areas TRANSPORT INDUSTRY Application: Logistic Management CONSUMER GOODS Application: Promotion Analysis DATA SERVICE PROVIDERS Application: Value Added Data UTILITIES INDUSTRY Application: Power Usage Analysis
  • 30. DATA WAREHOUSE VS DATA MINING Key Differences between Data Warehousing vs Data Mining
  • 31. DATA WAREHOUSE APPLICATION Application of Data Warehouse in Real-Life
  • 32. What are Operational Systems? An operational system is a term used in data warehousing to refer to a system that processes the day-to-day transactions of an organization. These systems are designed so that day-to-day transactions are processed efficiently and the integrity of the transactional data is preserved. Used to run a business! They are OLTP systems, run mission-critical applications, and need to work with stringent performance requirements for routine tasks.
  • 33. Operational Systems: optimized to handle large numbers of simple read/write transactions; based on up-to-the-second data; run the business in real time; increasingly used by customers; used by people who deal with customers and products (clerks, salespeople, etc.); optimized for fast response to predefined transactions.
  • 36. What is OLTP? OLTP (Online Transactional Processing) is a category of data processing that is focused on transaction-oriented tasks. OLTP typically involves inserting, updating, and/or deleting small amounts of data in a database. OLTP mainly deals with large numbers of transactions by a large number of users. Examples of OLTP Transactions: Online banking Purchasing a book online Booking an airline ticket Sending a text message Telemarketers entering telephone survey results Call center staff viewing and updating customers’ details Order entry
  • 37. Characteristics of OLTP OLTP transactions are usually very specific in the task that they perform, and they usually involve a single record or a small selection of records. For example, an online banking customer might send money from his account to his wife’s account. In this case, the transaction only involves two accounts – his account and his wife’s. It does not involve the other bank customers. This is in contrast to Online Analytical Processing (OLAP), which usually involves querying many records (even all records) in a database for analytical purposes. An OLAP banking example could be a bank manager performing a query across all customer accounts, so that he can see which suburbs had the most active online banking customers during a certain period. OLAP is often used to provide analytics on data that was captured via an OLTP application. So, while OLTP and OLAP often work with the same data sets, they have different characteristics. OLTP applications typically possess the following characteristics: • Transactions that involve small amounts of data • Indexed access to data • A large number of users • Frequent queries and updates • Fast response times
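The banking example above (a transfer touching only two accounts, leaving all other customers' rows untouched) can be sketched as a single OLTP-style transaction using Python's built-in sqlite3 module. The table, owners, and amounts are illustrative assumptions:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (owner TEXT PRIMARY KEY, balance INTEGER)")
con.executemany("INSERT INTO accounts VALUES (?, ?)",
                [("husband", 500), ("wife", 200), ("other", 999)])

# One OLTP transaction: small, specific, touches exactly two rows, atomic
try:
    with con:  # commits on success, rolls back on any exception
        con.execute("UPDATE accounts SET balance = balance - 100 "
                    "WHERE owner = 'husband'")
        con.execute("UPDATE accounts SET balance = balance + 100 "
                    "WHERE owner = 'wife'")
except sqlite3.Error:
    pass  # on failure the transfer is rolled back and balances stay unchanged

print(con.execute("SELECT owner, balance FROM accounts ORDER BY owner").fetchall())
# [('husband', 400), ('other', 999), ('wife', 300)]
```

The `with con:` block gives the atomicity the slide's characteristics imply: either both updates commit or neither does, and the unrelated `other` account is never read or written.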
  • 38. What is OLAP? OLAP (Online Analytical Processing) is the technology behind many Business Intelligence (BI) applications. OLAP is a powerful technology for data discovery, including capabilities for limitless report viewing, complex analytical calculations, and predictive “what if” scenario (budget, forecast) planning. How is OLAP Technology Used? OLAP performs multidimensional analysis of business data and provides the capability for complex calculations, trend analysis, and sophisticated data modeling. It is the foundation for many kinds of business applications for Business Performance Management, Planning, Budgeting, Forecasting, Financial Reporting, Analysis, Simulation Models, Knowledge Discovery, and Data Warehouse Reporting. OLAP enables end-users to perform ad hoc analysis of data in multiple dimensions, thereby providing the insight and understanding they need for better decision making.
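The multidimensional analysis described above boils down to aggregating a measure along chosen dimensions. A minimal stdlib sketch of two common OLAP operations, roll-up and slice, on toy sales facts (the data and function names are illustrative assumptions):

```python
from collections import defaultdict

# Toy sales facts with two dimensions (region, quarter) and one measure (amount)
facts = [
    ("North", "Q1", 100), ("North", "Q2", 150),
    ("South", "Q1", 80),  ("South", "Q2", 120),
]

def rollup(facts, dim):
    """Roll-up: aggregate the measure along one dimension (0=region, 1=quarter)."""
    totals = defaultdict(int)
    for row in facts:
        totals[row[dim]] += row[2]
    return dict(totals)

def slice_(facts, dim, value):
    """Slice: fix one dimension to a single value, keeping the sub-cube."""
    return [row for row in facts if row[dim] == value]

print(rollup(facts, 0))                   # {'North': 250, 'South': 200}
print(rollup(slice_(facts, 1, "Q1"), 0))  # {'North': 100, 'South': 80}
```

An OLAP server does exactly this at scale, over pre-aggregated cubes rather than row scans, which is what makes the ad hoc trend and "what if" analysis on the slide interactive.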
  • 39. Online Transaction Processing (OLTP) VS Online Analytical Processing (OLAP)
  • 40. OLTP and OLAP Applications
  • 41. Data Mining Software Tools
  • 42. Big Data Vendors Big Data is a term that describes the large volume of data – both structured and unstructured – that inundates a business on a day-to-day basis. But it’s not the amount of data that’s important. It’s what organizations do with the data that matters. Big data can be analyzed for insights that lead to better decisions and strategic business moves. While the term “big data” is relatively new, the act of gathering and storing large amounts of information for eventual analysis is ages old. The concept gained momentum in the early 2000s when industry analyst Doug Laney articulated the now-mainstream definition of big data as the three Vs: Volume, Velocity, Variety
  • 43. LINKEDIN MOST PROMISING JOBS For the Year 2019
  • 44. The Digital Skills Gap and the Future of Jobs 2020
  • 45. Contact Me Edwin Santos Garcia Work 044-791-3451 Home: 044-762-6924 facebook.com/EdwinGarciaPH75 +639231142814 Edwin_GarciaPH@outlook.com 0411 Purok 4 Mercado Hagonoy, Bulacan
  • 46. THANK YOU “People create their own success by learning what they need to learn and then by practicing it until they become proficient at it.”