DMDW Lesson 05 + 06 + 07 - Data Mining Applied
STUDY AND TAKE OFF.
Data Mining Applied
Author I: Dipl.-Inf. (FH) Johannes Hoppe
Author II: M.Sc. Johannes Hofmeister
Author III: Prof. Dr. Dieter Homeister
Dates: 01.04.2011, 08.04.2011, 15.04.2011
01 Applications of Data Mining
Applications of Data Mining
- Database marketing
- Time-series prediction, detecting "trends"
- Detection (of whatever is detectable)
- Probability estimation
- Information compression
- Sensitivity analysis

Applications of Data Mining: Database Marketing (1/2)
- Response modeling: a model of how specific customers respond.
- Systematic selection of existing and potential customers.
- Advertising and promotions based on these results (CRM).
- Visualization: a "lift chart" shows how successful the selection is expected to be (later topic: DM validation).

Lift Chart Example
“If we contact 10% of the customers, we should reach 10% of all responders without a model, and 30% of all responders with the given model.”

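The lift idea can be sketched in a few lines of Python. This is a minimal illustration, not the SSAS lift chart: the function names `cumulative_gain` and `lift` and the toy data are hypothetical. Customers are ranked by model score, and the share of all responders found in the top fraction is compared with what a random selection would find (the fraction contacted itself). In the slide's example, 30% of responders at a 10% contact depth corresponds to a lift of 30 / 10 = 3.

```python
def cumulative_gain(scores, responded, depth):
    """Share of all responders reached when contacting the top `depth` fraction
    of customers, ranked by model score (highest score first)."""
    ranked = sorted(zip(scores, responded), key=lambda pair: pair[0], reverse=True)
    n_contacted = round(depth * len(ranked))
    total_responders = sum(responded)
    reached = sum(r for _, r in ranked[:n_contacted])
    return reached / total_responders


def lift(scores, responded, depth):
    # Contacting customers at random reaches about `depth` of the responders,
    # so the lift is the model's gain divided by that baseline.
    return cumulative_gain(scores, responded, depth) / depth


# Hypothetical toy data: model scores and whether each customer responded (1) or not (0).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
responded = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
```

Evaluating `cumulative_gain` for a range of depths (10%, 20%, …, 100%) gives the points of the lift (gains) curve; at a depth of 100% every model reaches all responders, so the lift there is always 1.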
Applications of Data Mining: Database Marketing (2/2)
Cross-selling: selling additional products to existing customers
- Question: which customer might buy which other product?
- Uses historical purchase data.
- Uses credit card information, lifestyle data, demographic data, etc.
- Other possible information: Did the customer request specific information? How did the customer hear about the company?

Applications of Data Mining: Database Marketing (2/2)
Cross-selling: selling additional products to existing customers
- The results are used for direct marketing, mailing lists and direct advertising (e.g. Amazon).
- Amazon: "Customers who bought this item also bought" and "personalized recommendations".

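A very simple way to produce a "customers who bought this item also bought" list is to count how often other items appear in the same basket as the given item. The sketch below assumes purchase histories are available as lists of item names; `also_bought` and the sample baskets are hypothetical, and real recommender systems are considerably more elaborate.

```python
from collections import Counter


def also_bought(baskets, item, top_n=3):
    """Rank the items that most often appear in the same basket as `item`."""
    co_counts = Counter()
    for basket in baskets:
        products = set(basket)
        if item in products:
            # Count every other product bought together with `item`.
            co_counts.update(products - {item})
    return [other for other, _ in co_counts.most_common(top_n)]


# Hypothetical purchase history, one basket per order.
baskets = [
    ["book", "lamp", "pen"],
    ["book", "pen"],
    ["book", "pen", "mug"],
    ["lamp", "mug"],
]
```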
Applications of Data Mining: Time-Series Prediction
- Time series: stock prices, market shares, …
- Extrapolation of future values.
- Detection of newly arising trends, such as customers moving to other products.
- Own experience: German print magazines.

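Extrapolating future values can be illustrated with the simplest possible model: a straight-line trend fitted by least squares over the time index. This is an assumption for illustration only (the MS Time Series algorithm uses more sophisticated models), and `linear_trend_forecast` is a hypothetical helper.

```python
def linear_trend_forecast(values, steps_ahead):
    """Fit y = a + b*t by least squares (t = 0, 1, ..., n-1) and extrapolate
    the fitted line `steps_ahead` periods beyond the last observation."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    intercept = y_mean - slope * t_mean
    return [intercept + slope * (n - 1 + k) for k in range(1, steps_ahead + 1)]
```

For the series 10, 12, 14, 16 the fitted line is y = 10 + 2t, so the next two values are extrapolated as 18 and 20. A straight line cannot capture seasonality or sudden trend changes, which is exactly what more advanced time-series models address.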
Applications of Data Mining: Detection
Identifying the existence or occurrence of a condition
- Fraud detection: identifying patterns/criteria that indicate credit card fraud.
- Estimating creditworthiness (e.g. the German Schufa).
- Predicting mail orders that will not be paid.

Applications of Data Mining: Detection
Identifying the existence or occurrence of a condition
- Intrusion detection (in computer networks): find patterns that indicate an attack on a network.
- E.g. clustering: small clusters are of particular interest, because they point to unusual cases.
- Defining classes may be useful, e.g. harmless, possibly harmful, harmful, close the LAN immediately.

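Once a clustering algorithm has assigned each network event to a cluster, the "small clusters are of interest" heuristic can be applied mechanically. A minimal sketch, assuming cluster labels are already available; `small_clusters` and the 5% default threshold are hypothetical choices.

```python
from collections import Counter


def small_clusters(labels, max_share=0.05):
    """Return the ids of clusters whose share of all cases is at most `max_share`.
    These rare groups are candidates for closer inspection (possible attacks)."""
    sizes = Counter(labels)
    total = len(labels)
    return sorted(c for c, size in sizes.items() if size / total <= max_share)
```

The flagged clusters are only candidates: an expert still has to decide whether a rare pattern is an attack or just an unusual but harmless case.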
Applications of Data Mining: Detection
Identifying the existence or occurrence of a condition
Typical difficulties:
- Requires domain knowledge.
- Costs of data mining.
- Cost of missing a fraud.
- Cost of false positives (e.g. falsely accusing someone of fraud, damage to the company's image).

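The trade-off between missed frauds and false positives can be made explicit by assigning a cost to each kind of error and picking the decision threshold with the lowest total cost. The costs, scores and function names below are hypothetical placeholders; in practice the costs have to come from the business.

```python
def detection_cost(flagged, actual, cost_missed, cost_false_alarm):
    """Total cost of a set of decisions (True = flagged as fraud / is fraud)."""
    missed = sum(1 for f, a in zip(flagged, actual) if a and not f)
    false_alarms = sum(1 for f, a in zip(flagged, actual) if f and not a)
    return missed * cost_missed + false_alarms * cost_false_alarm


def best_threshold(scores, actual, cost_missed, cost_false_alarm):
    """Try every observed score as a cut-off (flag if score >= cut-off)
    and return the cut-off with the lowest total cost."""
    return min(
        sorted(set(scores)),
        key=lambda t: detection_cost(
            [s >= t for s in scores], actual, cost_missed, cost_false_alarm
        ),
    )
```

If a missed fraud is assumed to cost ten times as much as a false alarm, the cheapest threshold tends to be low: it is better to flag a few honest customers than to let a fraud through. With a high cost for false accusations, the threshold moves up.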
Applications of Data Mining: Probability Estimation
- Approximate the likelihood of an event given an observation.
- E.g. to classify a potential customer into an A, B or C range before doing any business.

Applications of Data Mining: Information Compression
- Can be viewed as a special type of estimation problem.
- For a given set of data, estimate the key components that can be used to reconstruct the data.

Applications of Data Mining: Sensitivity Analysis
- Understand how changes in one variable affect others.
- Identify the sensitivity of one variable to another (find out whether dependencies exist).

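A simple form of sensitivity analysis changes one input at a time by a small amount and measures how much the output of a model moves. A minimal sketch, assuming the model is available as a Python function of named inputs; `sensitivities`, `toy_model` and its coefficients are hypothetical.

```python
def sensitivities(model, inputs, delta=1e-3):
    """One-at-a-time sensitivity: for each input, the change in model output per
    unit change of that input (a finite-difference slope), others held fixed."""
    base = model(inputs)
    result = {}
    for name, value in inputs.items():
        changed = dict(inputs)
        # Use a small relative step, or an absolute one if the value is zero.
        step = delta * value if value != 0 else delta
        changed[name] = value + step
        result[name] = (model(changed) - base) / step
    return result


def toy_model(x):
    # Hypothetical model: output depends on income only, not on age.
    return 3 * x["income"] + 0 * x["age"]
```

A sensitivity close to zero (as for `age` here) suggests that no dependency exists within the tested range; for non-linear models the result depends on the point at which it is measured.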
02 Data Mining Algorithms
Data Mining Algorithms
- Different algorithms for different uses; algorithms can also be combined.
- The choice of algorithm depends on what you want to do.
- Not every algorithm is suited to every task.

Data Mining Algorithms: Algorithm Groups in SSAS (SQL Server Analysis Services)
- Classification algorithms
- Regression algorithms
- Association algorithms
- Segmentation algorithms
- Sequence analysis algorithms
- Plug-in algorithms

Data Mining Algorithms: Classification Algorithms
- Predict discrete attributes.
- Learn from historical cases whose outcome is known.
- Algorithms in SSAS: Naive Bayes, Decision Trees, Neural Networks.

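To make "learning from historical cases" concrete, here is a textbook categorical Naive Bayes classifier with Laplace smoothing. It is a sketch under our own assumptions (hypothetical function names and training data), not the implementation of the SSAS Naive Bayes algorithm.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and, per class, how often each attribute value occurs."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, attribute index) -> value counts
    distinct_values = defaultdict(set)   # attribute index -> all values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
            distinct_values[i].add(value)
    return class_counts, value_counts, distinct_values


def predict_naive_bayes(model, row):
    """Return the class with the highest P(class) * product of P(value | class)."""
    class_counts, value_counts, distinct_values = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior probability of the class
        for i, value in enumerate(row):
            seen = value_counts[(label, i)][value]
            # Laplace smoothing: unseen values get a small, non-zero probability.
            score *= (seen + 1) / (count + len(distinct_values[i]))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical historical cases: (region, commutes by bike) -> known outcome.
rows = [("Europe", "yes"), ("Europe", "yes"), ("Pacific", "no"),
        ("Pacific", "no"), ("Europe", "no")]
labels = ["buyer", "buyer", "non-buyer", "non-buyer", "non-buyer"]
model = train_naive_bayes(rows, labels)
```

The "naive" part is the assumption that attributes are independent given the class, which is what allows the per-attribute probabilities to be simply multiplied.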
Data Mining Algorithms: Regression Algorithms
- Predict continuous attributes.
- Work like classification algorithms, except that the predicted attribute is continuous.
- Algorithms in SSAS: Linear Regression (line), Logistic Regression (curve), MS Time Series.

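The "line" versus "curve" distinction can be seen in logistic regression, which fits an S-shaped curve that maps an input to a probability between 0 and 1. The sketch below fits one input by plain batch gradient descent; it is an illustration under our own assumptions (hypothetical names, data, learning rate and epoch count), and SSAS implements its logistic regression differently.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit P(y=1 | x) = sigmoid(a + b*x) by batch gradient descent on the log-loss.

    xs: one numeric input per case, ys: observed outcomes (0 or 1).
    The input is centred first, which keeps gradient descent well conditioned.
    """
    n = len(xs)
    x_mean = sum(xs) / n
    centred = [x - x_mean for x in xs]
    a_c, b = 0.0, 0.0
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for x, y in zip(centred, ys):
            error = sigmoid(a_c + b * x) - y
            grad_a += error / n
            grad_b += error * x / n
        a_c -= learning_rate * grad_a
        b -= learning_rate * grad_b
    # Convert the intercept back to the original (uncentred) input scale.
    return a_c - b * x_mean, b


def predict_probability(a, b, x):
    return sigmoid(a + b * x)


# Hypothetical data: number of mailings received vs. whether the customer bought (1) or not (0).
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]
a, b = fit_logistic(xs, ys)
```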
Data Mining Algorithms: Association Algorithms
- Predict likely combinations.
- Find elements that occur together.
- Algorithms in SSAS: MS Association Algorithm (Apriori).

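The core of Apriori is level-wise search: only itemsets whose subsets are all frequent can themselves be frequent, so candidates of size k are built from frequent itemsets of size k-1. This is a compact textbook version for illustration (hypothetical names and transactions), not the SSAS implementation.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return all itemsets whose support (share of transactions containing them)
    is at least `min_support`, as a dict frozenset -> support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(1 for b in baskets if itemset <= b) / n

    items = {item for b in baskets for item in b}
    current = {s for s in (frozenset([i]) for i in items) if support(s) >= min_support}
    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


# Hypothetical shopping transactions.
transactions = [{"bread", "milk"}, {"bread", "butter"},
                {"bread", "milk", "butter"}, {"milk"}]
```

From the frequent itemsets, association rules such as "bread → milk" can then be derived and ranked by confidence, i.e. support({bread, milk}) / support({bread}).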
Data Mining Algorithms: Segmentation Algorithms
- Also called "clustering algorithms".
- Group data with similar properties.
- Algorithms in SSAS: MS Clustering Algorithm (e.g. K-Means).

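K-Means alternates two steps: assign every point to its nearest centroid, then move every centroid to the mean of its points, until nothing changes. A minimal sketch with hypothetical names and data; the random initial centroids (controlled by `seed`) are our assumption, and the SSAS implementation differs in its details.

```python
import math
import random


def k_means(points, k, seed=0, max_iter=100):
    """Hard clustering: every point is assigned to exactly one (the nearest) centroid."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for each point.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical 2-D data with two obvious groups.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
labels, centroids = k_means(points, 2)
```

Because the start is random, running the algorithm again with a different seed can produce different clusters on less clear-cut data.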
Data Mining Algorithms: Sequence Analysis Algorithms
- These are clustering algorithms.
- They take the order (the sequence) of values into account while clustering.
- They do not group by similar properties, but by similar sequences.
- Algorithms in SSAS: MS Sequence Clustering.

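One common building block for comparing sequences is a first-order Markov chain: for each event, the probability of each possible next event. Cases whose transition probabilities look alike could then be grouped together. The sketch only estimates the transition probabilities; `transition_probabilities` and the click-stream data are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate first-order Markov transition probabilities P(next | current)
    from a list of event sequences (e.g. web pages visited in order)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: c / sum(followers.values()) for nxt, c in followers.items()}
        for state, followers in counts.items()
    }


# Hypothetical click streams of three web sessions.
sessions = [
    ["home", "products", "cart"],
    ["home", "products", "home"],
    ["home", "support"],
]
```

Unlike plain clustering on attributes, the order matters here: a session "home → products" and a session "products → home" contain the same pages but produce different transition counts.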
Data Mining Algorithms: Plug-In Algorithms
- .NET wrapper for COM objects.
- Use ANY algorithm.
- Provided as an assembly (possible workshop to create one).

03 Review - Data Types, Content Types
Review - Data Types, Content Types
Applying an algorithm depends on the data types and the content types of the attributes.

Review - Data Types, Content Types
Data types
- Define the structure of the values.
- Available data types: Text, Long, Boolean, Double, Date.

Review - Data Types, Content Types
Content types
- Define the behavior of the values.
- Discrete, Continuous, Discretized, Key, Key Sequence, Key Time, Ordered, Cyclical.

Review - Data Types, Content Types
Content type: Discrete
- Fixed set of values.
- Examples:
  - Commute distance: 1-2, 2-5, 5-10
  - Region: Pacific, Northern America, Europe
  - Name: … … …
- Boolean values are always discrete.
- Text is most likely discrete.

Review - Data Types, Content Types
Content type: Continuous
- Unlimited set of values (infinitely many values possible).
- Examples: income, age.
- The difference between continuous and discrete is the most important one.

Review - Data Types, Content Types
Content type: Discretized
- Continuous values converted into discrete values.
- Examples:
  - Income to categories: A, B, C, …
  - Age to groups: 0-20, 21-30, 31-40, …

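Discretizing means mapping each continuous value to one of a fixed set of buckets, as in the age groups above. A minimal sketch; `age_group` is a hypothetical helper, and the upper bound for the last group ("41+") is our own choice since the slide leaves the list open.

```python
def age_group(age, upper_bounds=(20, 30, 40)):
    """Map a continuous age to a discrete group label such as '21-30'."""
    lower = 0
    for upper in upper_bounds:
        if age <= upper:
            return f"{lower}-{upper}"
        lower = upper + 1
    # Everything above the last bound falls into one open-ended group.
    return f"{lower}+"
```

After this conversion the attribute behaves like a discrete one: an algorithm sees only the group labels, and the exact age within a group is lost.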
Review - Data Types, Content Types
Content type: Key
- Key: uniquely identifies a row.
- Key Sequence (sequence clustering models): a series of events, sorted.
- Key Time (time series models): identifies values on a time scale.

Review - Data Types, Content Types
Content type: Ordered
- Discrete values that have a sort order.
- No distances or relations between the values are defined.
- Example: "One Star" to "Five Stars".

Review - Data Types, Content Types
Content type: Cyclical
- Discrete values that have a cyclical sort order.
- Examples:
  - Weekdays: Monday, Tuesday, …, Sunday, Monday, … (1, 2, 3, …, 7, 1, …)
  - Months: Jan, Feb, Mar, …, Dec, Jan, … (1, 2, 3, …, 12, 1, …)

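The practical consequence of a cyclical content type is that distances wrap around: Sunday (7) and Monday (1) are neighbours, not six steps apart. A small sketch, assuming the values are already encoded as the positions 1..7 or 1..12 shown above; `cyclical_distance` is a hypothetical helper.

```python
def cyclical_distance(a, b, period):
    """Distance between two positions on a cycle, e.g. weekdays 1..7 (period 7)."""
    diff = abs(a - b) % period
    # Go the shorter way round the cycle.
    return min(diff, period - diff)
```

For example, `cyclical_distance(7, 1, 7)` (Sunday to Monday) and `cyclical_distance(12, 1, 12)` (December to January) both give 1, whereas treating the same numbers as plain ordered values would give 6 and 11.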
Available Combinations
04 Data Mining Algorithms - Decision Trees
Applied Data Mining - Decision Trees
Applied Data Mining - Decision Trees
In general
- Also known as classification trees.
- Goal: partition the data sequentially.
- Can detect non-linear relationships.
- Machine learning technique:
  - Separate the data into a training set and a test set.
  - The training set is used to build the model based on certain criteria.
  - The test set is used to verify the model.

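"Sequentially partitioning the data" means repeatedly choosing the split that makes the resulting groups as pure as possible. One step of this can be sketched with the Gini impurity as the purity criterion. This is a common textbook criterion chosen here as an assumption; SSAS decision trees use their own scoring methods, and the function names and data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0 = all labels identical)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(rows, labels, attribute):
    """Find the threshold on a numeric attribute that minimizes the weighted Gini
    impurity of the two resulting partitions (one step of growing a tree).
    Returns (threshold, weighted impurity), or None if no split is possible."""
    best = None
    for threshold in sorted({row[attribute] for row in rows}):
        left = [lab for row, lab in zip(rows, labels) if row[attribute] <= threshold]
        right = [lab for row, lab in zip(rows, labels) if row[attribute] > threshold]
        if not left or not right:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical training cases: income and whether the customer responded to a mailing.
rows = [{"income": 20000}, {"income": 25000}, {"income": 40000}, {"income": 50000}]
labels = ["no", "no", "yes", "yes"]
```

A full tree applies the same search to every attribute, splits on the best one, and then repeats the process inside each partition until a stopping criterion is met.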
Applied Data Mining - Decision Trees
Tree for the response to a mailing campaign
All persons (total: 10,000): 2.6% response rate
- Male (total: 4,677): 3.2%
  - Income > $30,000: 3.6%
  - Income < $30,000: 2.3%
- Female (total: 5,323): 2.1%
  - Age > 40: 3.8%
  - Age < 40: 3.2%

Applied Data Mining - Decision Trees
Using the trained tree
Example: management decides to mail only to groups with a response rate > 3.5%.
According to the trained tree, these groups are:
- Males with an income > $30,000
- Females aged 40+

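Applying a trained tree is just a sequence of tests from the root to a leaf, followed by the business rule. The sketch below hard-codes the mailing tree from the slides; the customer field names and the handling of the exact boundaries ($30,000 and age 40) are assumptions, since the slides do not specify them.

```python
# Leaf response rates in percent, read off the mailing tree from the slides.
LEAF_RATES = {
    ("male", "income > 30000"): 3.6,
    ("male", "income <= 30000"): 2.3,
    ("female", "age >= 40"): 3.8,
    ("female", "age < 40"): 3.2,
}


def leaf_for(customer):
    """Route a customer through the tree: gender first, then income (men) or age (women).
    Boundary handling (exactly $30,000 or exactly 40) is an assumption."""
    if customer["gender"] == "male":
        return ("male", "income > 30000" if customer["income"] > 30000 else "income <= 30000")
    return ("female", "age >= 40" if customer["age"] >= 40 else "age < 40")


def should_mail(customer, min_rate=3.5):
    # Management rule: mail only to groups with a response rate above 3.5%.
    return LEAF_RATES[leaf_for(customer)] > min_rate
```

This is the white-box property of decision trees: the model is a set of readable rules, so the mailing decision for any customer can be explained by the path through the tree.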
Applied Data Mining - Decision Trees
Pros
- Very flexible, white-box model.
- KISS: keep it simple, stupid!
- Little preparation and few resources needed.
Cons
- Can be tuned endlessly.
- Takes a long time to build.
- Requires wisely selected training data: poor training data yields wrong results.
- A big tree might require disk swapping (computation might be difficult if the tree does not fit into main memory).

Project: “DMDW Mining Test”
Project: “DMDW Mining Test”
(explanation of one node)
Project: “DMDW Mining Test”
(shows connections; more useful if there are more predictable values)
Project: “DMDW Mining Test”
(Generic Content Tree Viewer; DMX = Data Mining Extensions)

References for Decision Trees
- Olivia Parr Rud et al.: Data Mining Cookbook - Modeling Data for Marketing, Risk, and Customer Relationship Management, Wiley, 2001.
- David A. Grossman, Ophir Frieder: Introduction to Data Mining, Illinois Institute of Technology, 2005.
- Andrew W. Moore: Decision Trees, Carnegie Mellon University, http://www.autonlab.org/tutorials/dtree16.pdf
- Nong Ye (ed.): The Handbook of Data Mining, Lawrence Erlbaum Associates, 2003.
- Sushmita Mitra, Tinku Acharya: Data Mining - Multimedia, Soft Computing, and Bioinformatics, Wiley, 2003.
- http://en.wikipedia.org/wiki/Classification_tree

05 Data Mining Algorithms - Clustering
Data Mining Algorithms - Clustering
[Figure: data plotted over the variables X1 and X2]

Data Mining Algorithms - Clustering
Clustering
- Segmentation algorithm.
- Finds homogeneous groups within a data set.
- Finds cases with similar variable values.
- Identifies new relationships that were unclear before (heuristics), e.g. "A person who rides a bike to work doesn't live far from his workplace" (this is not obvious).

[Figure: independent variables (X1, X2) are grouped into homogeneous subsets (1. Clustering: identify), which are then described as classes (2. Classification: classify)]

Clustering
1. Clustering
- Reduces the data to classes of equal types.
- Become friends with the data.
- Iterative process: clustering → validate → classify → apply.
Source: http://msdn.microsoft.com/en-us/library/ms174879.aspx

Data Mining Algorithms - Clustering
2. Classification
- Create a description of a group.
- Give it a "name".
- Also: characterization.

Process
1. Clustering
- Start with random values.
- Rerunning will create different sets and different groups.
- A different clustering technique/algorithm will create different groups.
- Rerun on the same data set with a new seed.
Evaluate, check
- An expert evaluates the classes found and their plausibility.
- Good? Good classes are used for predictions.
2. Classify, apply (predict)

Clustering
MS Clustering Algorithm: offers two methods
- K-Means (hard): a data point can be in only one cluster.
- Expectation Maximization (soft): a data point belongs to several clusters; a membership probability is calculated for each cluster.
Source: http://msdn.microsoft.com/en-us/library/cc280445.aspx

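The difference between hard and soft clustering shows up in what the algorithm returns: instead of one cluster per point, EM returns a membership probability for every cluster. Below is a textbook EM for a one-dimensional Gaussian mixture, written as a sketch under our own assumptions (hypothetical names, data and initial means); it is not the SSAS implementation.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_1d(xs, means, iterations=50, min_var=1e-6):
    """Soft clustering of 1-D values with a Gaussian mixture fitted by EM.
    Returns, for each value, its membership probability for every cluster."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: probability that each value belongs to each cluster.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances from the soft memberships.
        for c in range(k):
            r = [row[c] for row in resp]
            n_c = sum(r)
            weights[c] = n_c / len(xs)
            means[c] = sum(ri * x for ri, x in zip(r, xs)) / n_c
            var = sum(ri * (x - means[c]) ** 2 for ri, x in zip(r, xs)) / n_c
            variances[c] = max(var, min_var)  # avoid a collapsing variance
    return resp


# Hypothetical values and initial cluster means.
xs = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
resp = em_1d(xs, means=[0.0, 6.0])
```

For clearly separated values the memberships are close to 0 or 1, so the result looks like hard clustering; a value lying between two clusters would instead receive substantial probabilities for both.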
Clustering
Pros
- No predictable variable to choose.
- Trains itself without much effort.
- Easy to configure.
"Cons"
- Interpretation is everything.
- A good eye is needed.
- An expert has to check for plausibility.

Project: “DMDW Mining Test”
(strongest relations only; number of matching cases for Region Europe)
Project: “DMDW Mining Test”
(good to know: continuous attributes are shown by their arithmetic average)
Project: “DMDW Mining Test”
(comparing two clusters)

THANK YOU FOR YOUR ATTENTION
