Machine Learning, Data Mining   INFO 629 Dr. R. Weber
The picnic game
- How did you reason to find the rule?
- According to Michalski (1983), A theory and methodology of inductive learning, in Machine Learning, chapter 4: “inductive learning is a heuristic search through a space of symbolic descriptions (i.e., generalizations) generated by the application of rules to training instances.”
Learning
- Rote Learning
  - Learn multiplication tables
- Supervised Learning
  - Examples are used to help a program identify a concept
  - Examples are typically represented with attribute-value pairs
  - Notion of supervision originates from guidance from examples
- Unsupervised Learning
  - Human efforts at scientific discovery, theory formation
Inductive Learning
- Learning by generalization
- Performance of classification tasks
  - Classification, categorization, clustering
- Rules indicate categories
- Goal: characterize a concept
Concept Learning is a Form of Inductive Learning
- Learner uses:
  - positive examples (instances that ARE examples of a concept), and
  - negative examples (instances that ARE NOT examples of a concept)
Concept Learning
- Needs empirical validation
- Dense or sparse data determine quality of different methods
Validation of Concept Learning i
- The learned concept should be able to correctly classify new instances of the concept
- When it succeeds on a real instance of the concept, the result is a true positive
- When it fails on a real instance of the concept, the result is a false negative
Validation of Concept Learning ii
- The learned concept should be able to correctly classify new instances of the concept
- When it succeeds on a counterexample, the result is a true negative
- When it fails on a counterexample, the result is a false positive
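
The four outcomes on the two validation slides can be tallied with a few lines of Python. This is a minimal sketch, not part of the original slides: the function name `confusion_counts` and the boolean encoding of instances are assumptions made for illustration.

```python
def confusion_counts(actual, predicted):
    """Tally validation outcomes for a learned concept.

    actual[i]    -- True if instance i really is an example of the concept
    predicted[i] -- True if the learned concept classified it as an example
    (Both names and the boolean encoding are illustrative assumptions.)
    """
    counts = {"TP": 0, "FN": 0, "TN": 0, "FP": 0}
    for is_instance, said_instance in zip(actual, predicted):
        if is_instance and said_instance:
            counts["TP"] += 1   # succeeded on a real instance
        elif is_instance:
            counts["FN"] += 1   # failed on a real instance
        elif said_instance:
            counts["FP"] += 1   # failed on a counterexample
        else:
            counts["TN"] += 1   # succeeded on a counterexample
    return counts


# Three real instances and two counterexamples
print(confusion_counts([True, True, False, False, True],
                       [True, False, False, True, True]))
```

Keeping the four counts separate, rather than a single accuracy figure, shows whether the learned concept is too general (many false positives) or too specific (many false negatives).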
Basic classification tasks
- Classification
- Categorization
- Clustering
Categorization
Classification
Clustering
Clustering
- Data analysis method applied to data
- Data should naturally possess groupings
- Goal: group data into clusters
- Resulting clusters are collections where objects within a cluster are similar to each other
- Objects outside the cluster are dissimilar to objects inside
- Objects from one cluster are dissimilar to objects in other clusters
- Distance measures are used to compute similarity
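
To illustrate how a distance measure drives cluster membership, the sketch below assigns each point to its nearest cluster centre using Euclidean distance. It is only one step of a clustering method, and it assumes numeric points and centres that are already given; the function name `assign_to_clusters` is hypothetical and the slides do not name a specific algorithm.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def assign_to_clusters(points, centroids):
    """Group points by their nearest centroid (smaller distance = more similar)."""
    clusters = {i: [] for i in range(len(centroids))}
    for point in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: dist(point, centroids[i]))
        clusters[nearest].append(point)
    return clusters


points = [(0, 0), (0, 1), (5, 5), (6, 5)]
centroids = [(0, 0), (5, 5)]  # assumed to be given; a full method would also update them
print(assign_to_clusters(points, centroids))
```

Other distance measures (Manhattan, cosine) can be swapped in for `dist` without changing the structure.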
Rule Learning
- Learning widely used in data mining
- Version Space Learning is a search method to learn rules
- Decision Trees
Version Space i
- A=1, B=1, C=1 -> Outcome=1
- A=0, B=.5, C=.5 -> Outcome=0
- A=0, B=0, C=.3 -> Outcome=.5
- Creates a tree that includes all possible combinations
- Does not learn rules with disjunctions (i.e., OR statements)
- Incremental method: trains on additional data without the need to retrain on all data
Decision trees
- Knowledge representation formalism
- Represent mutually exclusive rules (disjunction)
- A way of breaking up a data set into classes or categories
- Classification rules that determine, for each instance with attribute values, whether it belongs to one or another class
Decision trees consist of:
- leaf nodes (classes)
- decision nodes (tests on attribute values)
- branches that grow from decision nodes for each possible outcome of the test
From Cawsey, 1997
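
The three components can be mirrored directly in a small data structure. In the sketch below a leaf is a class label (a string) and a decision node is a pair of an attribute name and a dictionary of branches. The tree itself is hand-written for illustration, and the attribute names `first_last_year` and `drinks` are assumptions; it is not presented as the output of any induction algorithm.

```python
# Decision node: (attribute to test, {outcome of the test: subtree})
# Leaf node: a class label given as a string
TREE = ("first_last_year", {
    "yes": "yes",                      # leaf: class "yes"
    "no": ("drinks", {                 # nested decision node
        "yes": "no",
        "no": "yes",
    }),
})


def classify(tree, instance):
    """Follow one branch per test until a leaf (class label) is reached."""
    while not isinstance(tree, str):
        attribute, branches = tree
        tree = branches[instance[attribute]]
    return tree


print(classify(TREE, {"first_last_year": "no", "drinks": "no"}))
```

Because each instance follows exactly one path from the root to a leaf, the rules the tree encodes are mutually exclusive.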
Decision tree induction
- Goal is to correctly classify all example data
- Several algorithms to induce decision trees: ID3 (Quinlan 1979), CLS, ACLS, ASSISTANT, IND, C4.5
- Constructs decision tree from past data
- Not incremental
- Attempts to find the simplest tree (not guaranteed because it is based on heuristics)
ID3 algorithm
- From:
  - a set of target classes
  - training data containing objects of more than one class
- ID3 uses tests to refine the training data set into subsets that contain objects of only one class each
- Choosing the right test is the key
How does ID3 choose tests?
- Information gain or ‘minimum entropy’
- Maximizing information gain corresponds to minimizing entropy
- Predictive features (good indicators of the outcome)
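
For reference, the usual textbook definitions behind these bullets are given below. The notation is an assumption, since the slides give no formulas: $S$ is the set of training examples at a node, $p_c$ the fraction of $S$ in class $c$, and $S_v$ the subset of $S$ for which attribute $A$ takes value $v$.

```latex
H(S) = -\sum_{c} p_c \log_2 p_c
\qquad
\mathrm{Gain}(S, A) = H(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\, H(S_v)
```

Since $H(S)$ is fixed at a given node, choosing the attribute with the largest gain is the same as choosing the one whose subsets have the smallest weighted entropy, which is why maximizing information gain corresponds to minimizing entropy.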
ID3 algorithm

No.  Student  First last year?  Male?  Works hard?  Drinks?  First this year?
1    Richard  yes               yes    no           yes      yes
2    Alan     yes               yes    yes          no       yes
3    Alison   no                no     yes          no       yes
4    Jeff     no                yes    no           yes      no
5    Gail     yes               no     yes          yes      yes
6    Simon    no                yes    yes          yes      no
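
As a worked example of choosing the first test, the sketch below computes the information gain of each attribute on the six students from the ID3 slide, with "First this year?" as the class. The rows are transcribed from the slide's table; the snake_case keys and function names are assumptions introduced for illustration, and the code covers only the choice of the root test, not the full ID3 recursion.

```python
from collections import Counter
from math import log2

ATTRIBUTES = ["first_last_year", "male", "works_hard", "drinks"]
TARGET = "first_this_year"

# One dict per student, transcribed from the slide's table
COLUMNS = ["student"] + ATTRIBUTES + [TARGET]
STUDENTS = [dict(zip(COLUMNS, row)) for row in [
    ("Richard", "yes", "yes", "no",  "yes", "yes"),
    ("Alan",    "yes", "yes", "yes", "no",  "yes"),
    ("Alison",  "no",  "no",  "yes", "no",  "yes"),
    ("Jeff",    "no",  "yes", "no",  "yes", "no"),
    ("Gail",    "yes", "no",  "yes", "yes", "yes"),
    ("Simon",   "no",  "yes", "yes", "yes", "no"),
]]


def entropy(rows):
    """Entropy (in bits) of the class distribution in rows."""
    counts = Counter(row[TARGET] for row in rows)
    return -sum((n / len(rows)) * log2(n / len(rows)) for n in counts.values())


def information_gain(rows, attribute):
    """Entropy before the test minus the weighted entropy of its subsets."""
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(rows) - remainder


GAINS = {a: information_gain(STUDENTS, a) for a in ATTRIBUTES}
BEST = max(GAINS, key=GAINS.get)

for attribute, gain in GAINS.items():
    print(f"{attribute}: {gain:.3f}")
print("first test:", BEST)
```

On this data "First last year?" has the highest gain (about 0.459 bits), because all three students who got a first last year also got one this year, so that branch is already pure.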
Explanation-based learning
- Incorporates domain knowledge into the learning process
- Feature values are assigned a relevance factor if their values are consistent with domain knowledge
- Features that are assigned relevance factors are considered in the learning process
Familiar Learning Task
- Learn relative importance of features
- Goal: learn individual weights
- Commonly used in case-based reasoning
- Methods:
  - Feedback methods: include a similarity measure to get feedback about (and verify) the features' relative importance
  - Search methods: gradient descent
  - ID3
Classification using Naive Bayes
- Naïve Bayes classifier uses two sources of information to classify a new instance
  - The distribution of the training dataset (prior probability)
  - The region surrounding the new instance in the dataset (likelihood)
- Naïve because it assumes conditional independence, which is not always applicable
  - The assumption is made to simplify the computation, and in this sense it is considered “naïve”
- Conditional independence reduces the requirement for a large number of observations
- Bias in estimating probabilities often may not make a difference in practice -- it is the order of the probabilities, not their exact values, that determines the classifications
- Comparable in performance with classification trees and with neural networks
- Highly accurate and fast when applied to large databases
- Some links:
  - http://www.resample.com/xlminer/help/NaiveBC/classiNB_intro.htm
  - http://www.statsoft.com/textbook/stnaiveb.html
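
A minimal sketch of the idea for categorical features follows: the score of each class is its prior multiplied by one per-feature likelihood, which is where the conditional independence assumption enters. The function names and the tiny data set (two features per student, taken from the ID3 table) are assumptions for illustration. The sketch uses raw frequency estimates with no smoothing, so a feature value never seen with a class drives that class's score to zero.

```python
from collections import Counter, defaultdict


def train(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    priors = Counter(labels)
    counts = defaultdict(Counter)  # (label, feature index) -> Counter of values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            counts[(label, i)][value] += 1
    return priors, counts


def predict(model, row):
    """Return the class with the highest prior * product of likelihoods."""
    priors, counts = model
    total = sum(priors.values())
    scores = {}
    for label, n in priors.items():
        score = n / total                           # prior probability
        for i, value in enumerate(row):
            score *= counts[(label, i)][value] / n  # likelihood of each feature
        scores[label] = score
    # Scores are not normalized: only their order decides the class
    return max(scores, key=scores.get)


# Features: (first last year?, drinks?); label: first this year?
ROWS = [("yes", "yes"), ("yes", "no"), ("no", "no"),
        ("no", "yes"), ("yes", "yes"), ("no", "yes")]
LABELS = ["yes", "yes", "yes", "no", "yes", "no"]

MODEL = train(ROWS, LABELS)
print(predict(MODEL, ("no", "yes")))
```

Leaving the scores unnormalized reflects the point made on the slide: the ranking of the class probabilities, not their exact values, determines the classification.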
KDD: definition
- Knowledge Discovery in Databases (KDD) is the non-trivial process of identifying valid, novel, and potentially useful and understandable patterns in data (R. Feldman, 2000).
- KDD is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data (Fayyad, Piatetsky-Shapiro, Smyth 1996, p. 6).
- Data mining is one of the steps in the KDD process.
- Text mining concerns applying data mining techniques to unstructured text.
The KDD Process
[Figure: process diagram. Data stages: DATA, SELECTED DATA, PROCESSED DATA, TRANSFORMED DATA, patterns, KNOWLEDGE. Step labels: browsing, filtering, preprocessing, transformation, data mining, interpretation.]
Data mining tasks i
- Predictive modeling/risk assessment: classification, decision trees
- Database segmentation: Kohonen nets, clustering techniques
Data mining tasks ii
- Link analysis: rules (association generation), relationships between entities
- Deviation detection: how things change over time, trends
KDD applications
- Fraud detection
  - Telecom (calling cards, cell phones)
  - Credit cards
  - Health insurance
- Loan approval
- Investment analysis
- Marketing and sales data analysis
  - Identify potential customers
  - Effectiveness of sales campaign
  - Store layout
Text mining
- The problem starts with a query, and the solution is a set of information (e.g., patterns, connections, profiles, trends) contained in several different texts that are potentially relevant to the initial query.
Text mining applications
- IBM Text Navigator
  - Cluster documents by content
  - Each document is annotated by the 2 most frequently used words in the cluster
- Concept Extraction (Los Alamos)
  - Text analysis of medical records
  - Uses a clustering approach based on trigram representation
  - Documents in vectors, cosine for comparison
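
The last two bullets can be sketched as follows: each document becomes a vector of trigram counts, and two documents are compared by the cosine of the angle between their vectors. The slide does not say whether the trigrams are over characters or words; character trigrams are assumed here, and the function names are hypothetical rather than taken from the Los Alamos system.

```python
from collections import Counter
from math import sqrt


def trigram_vector(text):
    """Represent a document as counts of its character trigrams (an assumption)."""
    text = text.lower()
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[gram] * b[gram] for gram in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


record_a = trigram_vector("patient reports chest pain")
record_b = trigram_vector("patient reported chest pains")
record_c = trigram_vector("follow-up visit scheduled")

print(cosine(record_a, record_b) > cosine(record_a, record_c))
```

A clustering method can then group documents whose pairwise cosine similarity is high, which connects this representation back to the distance-based clustering described earlier in the deck.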



Editor's Notes

  • #34: What is predictive modeling? Predictive modeling uses demographic, medical and pharmacy claims information to determine the range and intensity of medical problems for a given population of insured persons. This assessment of risk allows health plans, payers and provider groups to plan, evaluate and fund health care management programs more effectively. From: http://www.dxcgrisksmart.com/faq.html