Data Mining and Knowledge Discovery in Large Databases

  “We are drowning in data, but we are starving for knowledge”

Outline
  Part 2: Clustering
          - Hierarchical Clustering
          - Divisive Clustering
          - Density-based Clustering

Erik Kropat
University of the Bundeswehr
Munich, Germany
Why “Data Mining”?
• Companies are collecting massive amounts of data on customers,
  operations, and the competitive landscape.

        Firms can gain a competitive advantage from these data


• But there is far too much data
   − Online shops record purchase behaviours for millions of customers
     (sometimes with hundreds of features for each customer)
   − Phone companies keep info on hundreds of millions of accounts
     (each with thousands of transactions)
   − Databases can often be hundreds of terabytes in size
     (this will be peanuts in the future).
Why “Data Mining”?

     “We are drowning in data, but we are starving for knowledge”
                                                          (John Naisbitt)
Knowledge Discovery in Large Databases

      Process of finding valuable and useful patterns in datasets
Analysis of data sets from …
•   businesses & investments
•   finance & economics
•   science & technology
•   bioinformatics
•   telecommunications



                               … or more complex data sets
                               • multimedia & sound
                               • images & video
                               • automatic news analysis
                               • social media analysis.
What are the data sources?
Consumer data
−   Credit card transaction data
−   Supermarket transaction data
−   Loyalty cards
−   Web server logs
−   Social media

                                    Variety of features
                                    − Name and address
                                    − History of shopping and purchases
                                    − Demographics
                                    − Credit rating
                                    − Quality & market share of products
Business Intelligence ‒
Customer Data Analytics & Market Analysis

  −   customer segmentation
  −   market basket analysis
  −   target marketing
  −   geo-marketing
  −   cross-selling / up-selling
  −   customer relationship management
Market Basket Analysis ‒ Cross Selling
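As an illustration of what market basket analysis computes, the sketch below derives simple one-to-one association rules (item A -> item B) with their support, confidence, and lift. The transactions, the function name `pair_rules`, and the `min_support` threshold are hypothetical and chosen only for this example; they do not come from the slides.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy transactions (one set of items per shopping basket).
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]


def pair_rules(transactions, min_support=0.4):
    """Return rules (lhs, rhs, support, confidence, lift) for item pairs."""
    n = len(transactions)
    # How many baskets contain each single item.
    item_counts = Counter(item for t in transactions for item in t)
    # How many baskets contain each unordered pair of items.
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]      # estimate of P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)  # > 1: bought together more often than by chance
            rules.append((lhs, rhs, support, confidence, lift))
    # Strongest rules (highest confidence) first.
    return sorted(rules, key=lambda r: r[3], reverse=True)


for lhs, rhs, support, confidence, lift in pair_rules(transactions):
    print(f"{lhs} -> {rhs}: support={support:.2f}, "
          f"confidence={confidence:.2f}, lift={lift:.2f}")
```

A rule with high confidence and a lift above 1 (for example, customers who buy milk also buy butter) is a candidate for a cross-selling offer. Production systems typically use algorithms such as Apriori or FP-Growth to handle larger itemsets efficiently.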
Key Tasks
               Decision Trees


              Association Rule
                 Learning

             Neural Networks


             Digital Forensics

            Automatic Derivation
               of Ontologies
Retail
• Customer segmentation
   Identify purchase patterns of “typical” customers
   Targeted advertisement, customized pricing, cost-effective promotions

• Market basket analysis
   Identify the purchase behaviour of groups of customers

• Sales promotions
   Identify likely responders to sales promotions
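One common way to approach customer segmentation is clustering. The following is a minimal k-means sketch, assuming each customer is described by two hypothetical features (purchases per month and average basket value); the data, the function name `kmeans`, and the choice of initial centroids are assumptions made for illustration, not the method used in the lecture.

```python
import math

# Hypothetical customers: (purchases per month, average basket value in EUR).
customers = [
    (1, 20.0), (2, 25.0), (1, 30.0),      # occasional buyers, small baskets
    (10, 22.0), (12, 28.0), (11, 25.0),   # frequent buyers, small baskets
    (2, 150.0), (3, 160.0), (1, 140.0),   # occasional buyers, large baskets
]


def kmeans(points, centroids, iterations=20):
    """Plain k-means: alternate assignment and centroid update steps."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged, nothing moved
            break
        centroids = new_centroids
    return centroids, clusters


# Initial centroids are picked by hand here; real code would randomise them.
initial = [customers[0], customers[3], customers[6]]
centroids, clusters = kmeans(customers, initial)
for centroid, cluster in zip(centroids, clusters):
    print(centroid, cluster)
```

In practice the features would usually be standardized first, since basket value dominates the Euclidean distance on raw scales, and the number of segments would be chosen by inspecting the data rather than fixed in advance.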
Banking

• Credit rating
   Given a large number of names, which persons are likely
   to default on their credit cards?

• Fraud detection
   −   Credit card fraud detection
   −   Network intrusion detection
Telecommunications
Companies are facing escalating competition and are forced to
aggressively market special pricing programs aimed at retaining
existing customers and attracting new ones.

• Call detail record analysis
     Identify customer segments with similar use patterns.
     Offer attractive pricing and feature promotions.

• Customer loyalty / customer churn management
     Some customers repeatedly “churn” (switch providers).
     Identify those who are likely to switch or who are likely to remain loyal.
     Companies can target their spending on customers who will produce the most profit.

• Set pricing strategies in a highly competitive market.
Big Data is Big Business
Companies are using their data sets to aim their services
and products with increasing precision.


Business Intelligence
  −   SAP AG is a German global software corporation
      that provides enterprise software applications.
  −   SAP AG is one of the largest enterprise software companies.

  −   In October 2007, SAP AG announced a $6.8 billion deal to acquire “Business Objects”.
  −   Since 2009, “Business Objects” has been a division of SAP AG rather than a separate company.
Outline

  Part 1: Introduction
          - What is “Data Mining”?
          - Examples

  Part 2: Formal Concept Analysis
          - Contexts and Concepts
          - Concept Lattices

  Part 3: Clustering
          - Hierarchical Clustering
          - Partitional Clustering
          - Fuzzy Clustering
          - Graph-Based Clustering

  Part 4: Classification
          - k-Nearest Neighbors
          - Support Vector Machines

  Part 5: Spatial Data Mining
          - DBSCAN
          - Density & Connectivity

  Part 6: Regulatory Networks
          - Eco-Finance Networks
          - Gene-Environment Networks
Questions ?

              For more information after today
                Email me at   Erik.Kropat@unibw.de
