Applications of Data Mining
Issues in Data Mining
Applications
 Financial Data Analysis
 Retail Industry
 Telecommunication Industry
 Biomedical and DNA Data Analysis
 Other Scientific Applications
 Intrusion Detection
Financial Data Analysis
 Financial Data
 Collected from Banks and Financial Institutions
 Usually complete and reliable
 Design and construction of data warehouses for multidimensional data analysis and mining
 Analysis: changes by month, by region, by sector, summarized with measures such as max, min, total, average, and trend
 Characteristic and Comparative analysis, Outlier Analysis
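The multidimensional analysis described above can be illustrated with a small roll-up. This is a minimal sketch in plain Python, not a data warehouse: the rows, the dimension names, and the `roll_up` helper are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical fact rows: (month, region, sector, amount). All values are made up.
transactions = [
    ("2024-01", "North", "Retail", 120.0),
    ("2024-01", "North", "Energy", 300.0),
    ("2024-01", "South", "Retail", 80.0),
    ("2024-02", "North", "Retail", 150.0),
    ("2024-02", "South", "Retail", 60.0),
    ("2024-02", "South", "Energy", 210.0),
]

# Position of each dimension inside a row (an assumption of this sketch).
DIMENSIONS = {"month": 0, "region": 1, "sector": 2}


def roll_up(rows, *dims):
    """Group rows by the chosen dimensions and summarise the amounts."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[DIMENSIONS[d]] for d in dims)
        groups[key].append(row[3])
    return {
        key: {"max": max(v), "min": min(v), "total": sum(v), "average": mean(v)}
        for key, v in groups.items()
    }


by_region = roll_up(transactions, "region")
by_month_sector = roll_up(transactions, "month", "sector")
```

Calling `roll_up` with a different set of dimensions corresponds to viewing the same data at another level of the cube.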
 Loan payment and customer credit policy analysis
 Feature Selection and attribute relevance ranking (Debt ratio,
credit history, income, education level …)
 Loan-granting policy can then be adjusted
 Low-risk customers can be granted loans
 Classification and Clustering of customers for targeted
marketing
 Customer group identification
 Multidimensional clustering techniques
 Can associate new customer with existing groups
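One common way to rank attribute relevance for loan repayment is information gain. The slides do not name a specific measure, so this choice is an assumption. The records, attribute names, and the `repaid` label below are hypothetical.

```python
from collections import Counter
from math import log2

# Hypothetical, hand-made loan records; "repaid" is the class label.
records = [
    {"debt_ratio": "low", "credit_history": "good", "education": "college", "repaid": "yes"},
    {"debt_ratio": "low", "credit_history": "good", "education": "school", "repaid": "yes"},
    {"debt_ratio": "high", "credit_history": "good", "education": "college", "repaid": "yes"},
    {"debt_ratio": "high", "credit_history": "bad", "education": "school", "repaid": "no"},
    {"debt_ratio": "low", "credit_history": "bad", "education": "college", "repaid": "no"},
    {"debt_ratio": "high", "credit_history": "bad", "education": "school", "repaid": "no"},
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target="repaid"):
    """Reduction in label entropy obtained by splitting rows on attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


attributes = ["debt_ratio", "credit_history", "education"]
# Most relevant attribute first.
ranking = sorted(attributes, key=lambda a: information_gain(records, a), reverse=True)
```

In this toy data `credit_history` separates the two classes perfectly, so it ranks first. Real data would rarely be this clean.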
 Detection of money laundering and financial crimes
 Data from several sources – integrated
 Data Analysis tools can be used to detect unusual patterns
 Data Visualization tools, Linkage Analysis tools
 Classification tools, Clustering tools
 Outlier Analysis tools
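Outlier analysis on transaction amounts can be sketched with a robust score based on the median and the median absolute deviation (MAD). The slides do not prescribe a method, so the modified z-score used here, the threshold of 3.5, and the sample amounts are assumptions.

```python
from statistics import median


def modified_z_scores(values):
    """Robust outlier scores based on the median and the MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # All deviations are (almost) identical; no value stands out.
        return [0.0 for _ in values]
    # 0.6745 rescales the MAD so scores are comparable to standard z-scores
    # when the data are roughly normally distributed.
    return [0.6745 * (v - med) / mad for v in values]


# Hypothetical daily transfer totals for one account; the last one is deliberately extreme.
amounts = [120.0, 95.0, 130.0, 110.0, 105.0, 125.0, 9800.0]
THRESHOLD = 3.5  # assumed cut-off, not taken from the slides

scores = modified_z_scores(amounts)
flagged = [a for a, s in zip(amounts, scores) if abs(s) > THRESHOLD]
```

A flagged amount is only a candidate for review. In practice it would be combined with linkage and classification results before any conclusion is drawn.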
Retail Industry
 Sales Data, Customer Shopping history, Goods
Transportation, E-Commerce
 Mining can help to
 Identify buying behaviour, discover shopping trends
 Improve the quality of customer service, retain customers
 Design and Construction of data warehouses
 Several ways to design a warehouse
 Entities involved: sales, customers, employees, goods transportation…
 Preliminary data mining exercises can help guide the design process
 They indicate which dimensions and levels to include and what preprocessing to perform
 Multi-dimensional analysis of sales, customers, products,
time and region
 Multi-feature data cube
 Visualization tools
 Analysis of effectiveness of sales campaigns
 Compare sales and transaction volume
 Multidimensional analysis
 Compare sales amount, number of transactions containing same items before
and after the campaign
 Association Analysis
 Identify items likely to be purchased together
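Association analysis over market baskets is usually expressed through support and confidence. The sketch below counts item pairs directly rather than running a full algorithm such as Apriori. The baskets and the minimum support are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets, one set of items per transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "tea"},
    {"bread", "butter", "tea"},
]

item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)


def rule_stats(antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent."""
    pair = tuple(sorted((antecedent, consequent)))
    support = pair_counts[pair] / len(baskets)
    confidence = pair_counts[pair] / item_counts[antecedent]
    return support, confidence


MIN_SUPPORT = 0.4  # assumed threshold
frequent_pairs = sorted(
    pair for pair, n in pair_counts.items() if n / len(baskets) >= MIN_SUPPORT
)
```

Here `rule_stats("butter", "bread")` reports how often butter and bread occur together and how often a basket with butter also contains bread.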
 Customer Retention
 Customer loyalty and trends
 Sequential pattern mining
 Adjust pricing strategy and goods range
 Purchase recommendation and cross-reference of items
 Recommender Systems
 Sales promotion by displaying deal information in association
with items of interest
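Sequential pattern mining asks how many customers bought a given series of items in order. The sketch below only measures the support of one candidate pattern. It simplifies each purchase to a single item, and the purchase histories are hypothetical.

```python
def contains_in_order(sequence, pattern):
    """True if the items of pattern occur in sequence in the same order (gaps allowed)."""
    remaining = iter(sequence)
    # "item in remaining" advances the iterator, so later items must appear later.
    return all(item in remaining for item in pattern)


def sequence_support(sequences, pattern):
    """Fraction of sequences that contain the pattern as an ordered subsequence."""
    return sum(contains_in_order(s, pattern) for s in sequences) / len(sequences)


# Hypothetical purchase histories, one ordered list per customer.
histories = [
    ["laptop", "mouse", "laptop bag"],
    ["laptop", "laptop bag", "printer"],
    ["mouse", "laptop"],
    ["laptop", "printer", "ink"],
]

bag_after_laptop = sequence_support(histories, ["laptop", "laptop bag"])
```

A pattern with high support, such as a laptop followed by a laptop bag, is a candidate for the purchase recommendations mentioned above.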
Telecommunication Industry
 Computer and Web data transmission, fax, Mobile
phone, Telephone services
 Multidimensional analysis of telecommunication data
 Helps to identify and compare data traffic, system workload, resource usage, user group behavior, and profit
 Time-of-day usage patterns
 Fraudulent pattern analysis
 Identify fraudulent users and atypical usage patterns
 Illegal Customer account access
 Automatic Dial-out equipment
 Switch and route congestion patterns
 Multidimensional association and sequential pattern
analysis
 Usage patterns for a set of communication services by customer
group, time of day
 Sales Promotion
 Mobile Telecommunication Services
 Spatio-temporal data mining
 Use of visualization tools
Biomedical and DNA Data Analysis
 Research in DNA Analysis has led to
 Development of new drugs
 Cancer therapies
 Human genome study
 Discovery of genetic causes for many diseases
 Genome Research
 Study of DNA Sequences
 Adenine, Cytosine, Guanine, Thymine
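 About 100,000 genes were estimated at the time (current estimates are nearer 20,000 protein-coding genes); each gene has hundreds of nucleotides, which can be ordered in a very large number of ways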
 Identifying Gene Sequence patterns is challenging
 Semantic Integration of Heterogeneous, distributed
genome databases
 Highly distributed generation and use of DNA data
 Integrated data warehouses and distributed federated databases
 Efficient Data Cleaning and Integration methods
 Similarity Search and Comparison among DNA
Sequences
 Gene sequences – isolated from healthy and diseased tissues
 Compare frequently occurring patterns in each class
 Help to identify the genetic factors of the disease and immune factors
 Non-numeric nature of data poses difficulties
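Comparing frequently occurring patterns between classes can be sketched by counting short substrings (k-mers) in each group of sequences. The slides do not specify this technique, so it is an assumption, and the sequences and the value of `K` are hypothetical.

```python
from collections import Counter


def kmer_counts(sequences, k):
    """For each k-mer, count the number of sequences in which it appears at least once."""
    counts = Counter()
    for seq in sequences:
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts


# Hypothetical short sequences; real gene sequences are far longer.
healthy = ["ACGTACGT", "ACGTTTGA", "TTGACGTA"]
diseased = ["ACGGATCC", "GGATCCTA", "TACGGATC"]

K = 4  # assumed pattern length
healthy_counts = kmer_counts(healthy, K)
diseased_counts = kmer_counts(diseased, K)

# k-mers present in every diseased sample and in no healthy sample.
candidates = sorted(
    kmer
    for kmer, n in diseased_counts.items()
    if n == len(diseased) and healthy_counts[kmer] == 0
)
```

Exact substring matching sidesteps the non-numeric nature of the data, but it ignores mutations and gaps. Alignment-based similarity search would be needed to handle those.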
 Association Analysis: Identification of co-occurring gene
sequences
 Diseases – triggered by a combination of genes acting together
 Association analysis helps to detect the kinds of genes that may
co-occur
 Study interactions and relationships between them
 Path Analysis: Linking genes to different stages of
disease development
 Different genes become active at different stages of the disease
 Develop drug interventions that target specific stages
 Visualization tools and genetic data analysis
 Complex Gene structures – Graphs, trees, Cuboids and
visualization tools
 Support better understanding and interactive data exploration
Intrusion Detection
 Intrusions
 Any set of actions that threaten the integrity, availability, or confidentiality of a
network resource
 Misuse detection: use patterns of well-known attacks to identify
intrusions
 Signatures – Must be updated
 Classification based on known intrusions
 E.g., three consecutive login failures: password guessing.
 Anomaly detection: use deviation from normal usage patterns to
identify intrusions
 Any significant deviations from the expected behavior are reported as possible
attacks
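The password-guessing signature mentioned above (three consecutive login failures) can be written as a simple misuse-detection rule. The log format and the user names are hypothetical, and events are assumed to arrive in time order.

```python
from collections import defaultdict


def password_guessing_suspects(events, limit=3):
    """Return users with `limit` or more consecutive failed logins."""
    streak = defaultdict(int)  # current run of failures per user
    suspects = set()
    for user, succeeded in events:  # events are assumed to be in time order
        streak[user] = 0 if succeeded else streak[user] + 1
        if streak[user] >= limit:
            suspects.add(user)
    return suspects


# Hypothetical audit log: (user, login succeeded?)
log = [
    ("alice", False), ("bob", False), ("alice", True),
    ("bob", False), ("alice", False), ("bob", False),
    ("carol", True), ("alice", False),
]

suspects = password_guessing_suspects(log)
```

A rule like this only catches the attack it encodes, which is why signatures must be kept up to date.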
 Data Mining Algorithms
 Misuse detection
 Training data are labeled as normal or intrusion
 Classifier can be used to detect known intrusions
 Classification algorithms, Association rule mining
 Anomaly detection
 Builds models of normal behavior and detects significant deviations
 Supervised: the model is trained on data known to be normal
 Unsupervised: no labels are available for the training data
 Classification, clustering
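The supervised form of anomaly detection can be sketched by building a profile from sessions known to be normal and scoring new sessions against it. The command traces, the `rare` cut-off, and the scoring rule are assumptions made for illustration.

```python
from collections import Counter


def build_profile(normal_sessions):
    """Relative frequency of each command across sessions known to be normal."""
    counts = Counter(cmd for session in normal_sessions for cmd in session)
    total = sum(counts.values())
    return {cmd: n / total for cmd, n in counts.items()}


def anomaly_score(session, profile, rare=0.05):
    """Fraction of commands in the session that are rare or unseen in the profile."""
    unusual = sum(1 for cmd in session if profile.get(cmd, 0.0) < rare)
    return unusual / len(session)


# Hypothetical command traces from normal use.
normal_sessions = [
    ["ls", "cd", "ls", "cat"],
    ["cd", "ls", "vim", "ls"],
    ["ls", "cat", "cd", "ls"],
]
profile = build_profile(normal_sessions)

typical = anomaly_score(["ls", "cd", "cat"], profile)
unusual = anomaly_score(["wget", "chmod", "ls", "nc"], profile)
```

Sessions whose score exceeds a chosen threshold would be reported as possible attacks. Unlike a signature, this can flag behavior that has never been seen before, at the cost of more false alarms.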
 Association and Correlation Analysis
 Finds relationships between system attributes describing the
network data
 Helps in selection of useful attributes
 Analysis of Stream data
 Transient and dynamic nature of intrusions
 An event may be normal on its own but malicious when viewed as part of a sequence
 Distributed Data Mining
 Analysis of data from several locations
 Visualization and Querying tools
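The point that a single event can look normal while the sequence is malicious can be illustrated with a sliding window over a connection stream. In this sketch each connection attempt is unremarkable, but many distinct ports from one source within a short window is flagged. The window size, the port limit, and the addresses are hypothetical.

```python
from collections import defaultdict, deque


def scan_suspects(events, window=10, max_ports=3):
    """Flag sources that contact more than `max_ports` distinct ports within `window` seconds."""
    recent = defaultdict(deque)  # source -> (timestamp, port) pairs still inside the window
    suspects = set()
    for timestamp, source, port in events:  # assumed sorted by timestamp
        queue = recent[source]
        queue.append((timestamp, port))
        # Drop events that have fallen out of the time window.
        while queue and timestamp - queue[0][0] > window:
            queue.popleft()
        if len({p for _, p in queue}) > max_ports:
            suspects.add(source)
    return suspects


# Hypothetical connection attempts: (seconds, source address, destination port)
stream = [
    (0, "10.0.0.5", 22), (1, "10.0.0.9", 443), (2, "10.0.0.5", 23),
    (3, "10.0.0.5", 80), (4, "10.0.0.5", 8080), (30, "10.0.0.9", 443),
    (60, "10.0.0.9", 80), (90, "10.0.0.9", 22), (120, "10.0.0.9", 25),
]

suspects = scan_suspects(stream)
```

Only the events inside the window are kept in memory, which suits stream data that cannot be stored and rescanned.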
Data Mining in other Scientific Applications
 Old Scenario: Small, homogeneous data sets
 Formulate hypothesis, build model, evaluate results
 Current Scenario: High-dimensional data, stream data,
heterogeneous data (spatial, temporal)
 Collect and store data, mine for new hypotheses, confirm with
data or experimentation
 Vast amounts of data have been collected from Scientific
domains
 Climate and ecosystem modeling, Chemical engineering, fluid
dynamics, structural mechanics…
 Data Warehouses and data preprocessing
 Scientific applications – methods are needed for integrating
data from heterogeneous sources (Geospatial data
warehouse) and identifying events (Climate and Ecosystem
data)
 Mining complex data types
 Scientific data – Semi-structured and unstructured
 Multimedia and Spatial data
 Graph-based mining
 Labeled graphs – capture spatial, topological, geometric and
other relational characteristics present in scientific data
 Nodes – objects to be mined; edges – relationships
 Scalable and efficient mining methods are needed
 Visualization tools and domain specific knowledge
 High-level GUIs and visualization tools are needed
 Integrated with existing domain-specific systems and database
systems
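Graph-based mining over labeled graphs can be sketched at its simplest level: counting which labeled edges recur across a collection of graphs. Full frequent-subgraph miners grow larger patterns from such edges; this sketch stops at single edges. The graph encoding and the molecule-like labels are hypothetical.

```python
from collections import Counter

# Each hypothetical graph maps node ids to labels and lists undirected edges (u, v, label).
graphs = [
    {"nodes": {1: "C", 2: "O", 3: "H"}, "edges": [(1, 2, "double"), (1, 3, "single")]},
    {"nodes": {1: "C", 2: "O", 3: "N"}, "edges": [(1, 2, "double"), (1, 3, "single")]},
    {"nodes": {1: "C", 2: "H", 3: "H"}, "edges": [(1, 2, "single"), (1, 3, "single")]},
]


def edge_patterns(graph):
    """Set of (node label, edge label, node label) triples, with endpoint labels sorted."""
    patterns = set()
    for u, v, edge_label in graph["edges"]:
        a, b = sorted((graph["nodes"][u], graph["nodes"][v]))
        patterns.add((a, edge_label, b))
    return patterns


def frequent_edge_patterns(graph_list, min_support):
    """Edge patterns that occur in at least `min_support` graphs."""
    support = Counter(p for g in graph_list for p in edge_patterns(g))
    return {p: n for p, n in support.items() if n >= min_support}


frequent = frequent_edge_patterns(graphs, min_support=2)
```

Each pattern is counted once per graph, so the support is the number of graphs that contain it, not the number of occurrences.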
Issues in Data Mining
 Mining methodology and user interaction
 Mining different kinds of knowledge in databases
 Interactive mining of knowledge at multiple levels of abstraction
 Incorporation of background knowledge
 Data mining query languages and ad-hoc data mining
 Expression and visualization of data mining results
 Handling noise and incomplete data
 Pattern evaluation
 Issues relating to the diversity of data types
 Handling relational and complex types of data
 Mining information from heterogeneous databases and global
information systems (WWW)
 Performance and scalability
 Efficiency and scalability of data mining algorithms
 Parallel, distributed and incremental mining methods
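Incremental mining means updating results as new data arrive instead of rescanning everything. A minimal sketch, assuming the quantity of interest is item support; the class name `IncrementalItemCounter` and the sample batches are hypothetical.

```python
from collections import Counter


class IncrementalItemCounter:
    """Maintains item supports as new batches of transactions arrive."""

    def __init__(self):
        self.counts = Counter()
        self.n_transactions = 0

    def update(self, batch):
        """Fold a new batch into the running counts without revisiting old data."""
        for transaction in batch:
            self.counts.update(set(transaction))
        self.n_transactions += len(batch)

    def frequent(self, min_support):
        """Items whose support over all transactions seen so far meets the threshold."""
        return sorted(
            item
            for item, n in self.counts.items()
            if n / self.n_transactions >= min_support
        )


miner = IncrementalItemCounter()
miner.update([["a", "b"], ["a", "c"]])
first = miner.frequent(0.6)

miner.update([["b", "c"], ["b"]])
second = miner.frequent(0.6)
```

The set of frequent items changes between the two calls even though the first batch is never read again. The same counters could be kept per partition and merged, which is the idea behind the parallel and distributed variants.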

1.3 applications, issues
