Data mining knowledge representation
1 What Defines a Data Mining Task?
• Task relevant data: where and how to retrieve the data to be used
for mining
• Background knowledge: Concept hierarchies
• Interestingness measures: informal and formal selection techniques
to be applied to the output knowledge
• Representing input data and output knowledge: the structures used
to represent the input and the output of the data mining techniques
• Visualization techniques: needed to best view and document the
results of the whole process
2 Task relevant data
• Database or data warehouse name: where to find the data
• Database tables or data warehouse cubes
• Conditions for data selection, relevant attributes or dimensions, and
data grouping criteria: all of these are used in the SQL query that
retrieves the data
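As an illustration, here is a minimal Python sketch of such a retrieval query; the database, table, and attribute names (sales, region, item, amount, year) are hypothetical.

```python
import sqlite3

# Minimal sketch: task-relevant data retrieved with an SQL query.
# The table and attribute names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, item TEXT, amount REAL, year INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)",
                 [("east", "pen", 9.5, 2023), ("east", "pen", 10.0, 2023),
                  ("west", "ink", 3.0, 2022)])

query = """
    SELECT region, item, SUM(amount)   -- relevant attributes / dimensions
    FROM sales
    WHERE year = 2023                  -- condition for data selection
    GROUP BY region, item              -- data grouping criteria
"""
task_relevant_data = conn.execute(query).fetchall()
print(task_relevant_data)              # [('east', 'pen', 19.5)]
```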
3 Background knowledge: Concept hierarchies
The concept hierarchies are induced by a partial order¹ over the values
of a given attribute. Depending on the type of the ordering relation we
distinguish several types of concept hierarchies.
3.1 Schema hierarchy
• Relating concept generality. The ordering reflects the generality of
the attribute values, e.g. street < city < state < country.
3.2 Set-grouping hierarchy
• The ordering relation is the subset relation (⊆). Applies to set
values.
• Example:
{13, ..., 39} = young; {13, ..., 19} = teenage;
{13, ..., 19} ⊆ {13, ..., 39} ⇒ teenage < young.
• Theory:
– power set 2^X: the set of all subsets of a set X.
– lattice (2^X, ⊆): for subsets A, B ∈ 2^X, sup(A, B) = A ∪ B and inf(A, B) = A ∩ B.
  [Hasse diagram: A ∪ B on top, A and B below it, A ∩ B at the bottom]
– top element ⊤ = X (the whole set), bottom element ⊥ = {} (the empty set).
¹ Consider a set A and an ordering relation R. R is a total (full) order if any two elements x, y ∈ A are comparable, i.e. either xRy or yRx holds. R is a partial order if only some pairs of elements need to be comparable.
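To make the set-grouping ordering concrete, here is a minimal Python sketch of the example above:

```python
# Set-grouping hierarchy: the ordering is the subset relation.
young = set(range(13, 40))    # {13, ..., 39}
teenage = set(range(13, 20))  # {13, ..., 19}

# Python's '<' on sets tests for proper subset, so it directly
# expresses "teenage < young" in the hierarchy.
print(teenage < young)        # True
print(young < teenage)        # False
```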
3.3 Operation-derived hierarchy
Produced by applying an operation (encoding, decoding, information
extraction). For example:
markovz@cs.ccsu.edu
instantiates the hierarchy user-name < department < university <
usa-university.
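A small Python sketch of such an information-extraction operation; the level names and the "edu" test for usa-university are assumptions made for illustration.

```python
# Operation-derived hierarchy: extract hierarchy levels from an e-mail address.
def email_hierarchy(address):
    user, host = address.split("@")
    parts = host.split(".")                    # e.g. ['cs', 'ccsu', 'edu']
    return {
        "user-name": user,
        "department": parts[0],
        "university": parts[1],
        "usa-university": parts[-1] == "edu",  # crude heuristic, assumed
    }

print(email_hierarchy("markovz@cs.ccsu.edu"))
```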
3.4 Rule-based hierarchy
Using rules to define the partial order, for example:
if antecedent then consequent
defines the order antecedent < consequent.
4 Interestingness measures
Criteria to evaluate hypotheses (knowledge extracted from data when
applying data mining techniques). This issue will be discussed in more
detail in Lecture Notes - Chapter 9: "Evaluating what's been learned".
4.1 Bayesian evaluation
• E - data
• H = {H_1, H_2, ..., H_n} - hypotheses
• H_best = argmax_i P(H_i|E)
• Bayes theorem:

  P(H_i|E) = P(H_i) P(E|H_i) / Σ_{j=1..n} P(H_j) P(E|H_j)
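To make the computation concrete, here is a minimal Python sketch; the prior and likelihood values are made up for illustration.

```python
# Bayesian evaluation of hypotheses with made-up numbers.
priors = [0.5, 0.3, 0.2]          # P(H_i), assumed
likelihoods = [0.1, 0.4, 0.3]     # P(E|H_i), assumed

evidence = sum(p * l for p, l in zip(priors, likelihoods))
posteriors = [p * l / evidence for p, l in zip(priors, likelihoods)]

h_best = max(range(len(posteriors)), key=lambda i: posteriors[i])
print(h_best, posteriors)         # index of the most probable hypothesis
```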
4.2 Simplicity
Occam’s Razor
Consider, for example, association rule length, decision tree size, or the
number and length of classification rules. Intuition suggests that the
best hypothesis is the simplest (shortest) one. This is the so-called
Occam's Razor Principle, also expressed as a mathematical theorem
(Occam's Razor Theorem). Here is an example of applying this principle
to grammars:
• Data:
E = {0, 000, 00000, 0000000, 000000000}
• Hypotheses:
G1 : S → 0|000|00000|0000000|000000000
G2 : S → 00S|0
• Best hypothesis: G2 (fewer and simpler rules)
However, since simplicity is a subjective notion, we need formal criteria
to define it.
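To make the comparison concrete, here is a small Python sketch; counting rule symbols as a measure of grammar size is an assumption made only for illustration.

```python
# A concrete look at the simplicity comparison above.
E = {"0", "000", "00000", "0000000", "000000000"}

g1_rules = ["0", "000", "00000", "0000000", "000000000"]  # G1: one rule per string
g2_rules = ["00S", "0"]                                   # G2: S -> 00S | 0

def generated_by_g2(s):
    # G2 derives exactly the odd-length strings of 0s: S -> 0, S -> 00S.
    return set(s) == {"0"} and len(s) % 2 == 1

print(all(generated_by_g2(s) for s in E))                 # True: G2 covers the data
print(sum(map(len, g1_rules)), sum(map(len, g2_rules)))   # 25 vs 4: G2 is simpler
```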
Formal criteria for simplicity
• Bayesian approach: needs a large volume of experimental results
(statistics) to define the prior probabilities.
• Algorithmic (Kolmogorov) complexity of an object (bit string): the
length of the shortest program for a Universal Turing Machine that
generates the string. Problem: it is not computable in general.
• Information-based approaches: the Minimum Description Length
Principle (MDL). Most often used in practice.
4.3 Minimum Description Length Principle (MDL)
• Bayes Theorem:

  P(H_i|E) = P(H_i) P(E|H_i) / Σ_{j=1..n} P(H_j) P(E|H_j)

• Take −log_2 of both sides of Bayes' theorem (C is a constant):

  −log_2 P(H_i|E) = −log_2 P(H_i) − log_2 P(E|H_i) + C

• I(A) - information in message A, L(A) - minimum length of A in bits:

  −log_2 P(A) = I(A) = L(A)

• Then: L(H_i|E) = L(H_i) + L(E|H_i) + C
• MDL: The hypothesis must reduce the information needed to encode
the data, i.e.

  L(E) > L(H_i) + L(E|H_i)

• The best hypothesis must maximize information compression:

  H_best = argmax_i (L(E) − L(H_i) − L(E|H_i))
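The selection itself is easy to sketch in code; all description lengths below are invented numbers used only to show the computation.

```python
# Minimal sketch of MDL-based hypothesis selection.
L_E = 1000                        # L(E): bits to encode the data directly

hypotheses = {                    # (L(H_i), L(E|H_i)) per hypothesis, assumed
    "H1": (50, 700),
    "H2": (200, 400),
    "H3": (900, 20),
}

def compression(l_h, l_e_given_h):
    return L_E - l_h - l_e_given_h    # information saved by using the hypothesis

best = max(hypotheses, key=lambda h: compression(*hypotheses[h]))
print(best)                       # 'H2': largest compression, hence MDL-best
```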
4.4 Certainty
• Confidence of association "if A then B":

  P(B|A) = (# of tuples containing both A and B) / (# of tuples containing A)
• Classification accuracy: Use a training set to generate the hypothesis,
then test it on a separate test set.

  Accuracy = (# of correct classifications) / (# of tuples in the test set)
• Utility (support) of association "if A then B":

  P(A, B) = (# of tuples containing both A and B) / (total # of tuples)
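A short Python sketch of the confidence and support computations over a made-up set of transactions:

```python
# Confidence and support of "if A then B" over a toy transaction set.
transactions = [
    {"A", "B"}, {"A", "B", "C"}, {"A"}, {"B"}, {"A", "B"},
]

n_A = sum(1 for t in transactions if "A" in t)
n_AB = sum(1 for t in transactions if {"A", "B"} <= t)

confidence = n_AB / n_A                  # P(B|A)
support = n_AB / len(transactions)       # P(A, B)
print(confidence, support)               # 0.75, 0.6
```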
5 Representing input data and output knowledge
5.1 Concepts (classes, categories, hypotheses): things to be mined/learned
• Classification mining/learning: predicting a discrete class, a kind
of supervised learning, success is measured on new data for which
class labels are known (test data).
• Association mining/learning: detecting associations between attributes;
it can be used to predict the value of any attribute, or of several
attributes at once, so many more rules can be generated; hence we
need constraints (minimum support and minimum confidence).
• Clustering: grouping similar instances into clusters, a kind of
unsupervised learning, success is measured subjectively or by objective
functions.
• Numeric prediction: predicting a numeric quantity, a kind of
supervised learning, success is measured on test data.
• Concept description: output of the learning scheme
5.2 Instances (examples, tuples, transactions)
• Things to be classified, associated, or clustered.
• Individual, independent examples of the concept to be learned (target
concept).
• Described by predetermined set of attributes.
• Input to the learning scheme: set of instances (dataset), represented
as a single relation (table).
• Independence assumption: no relationships between attributes.
• Positive and negative examples for a concept; Closed World
Assumption (CWA): {negative} = {all} \ {positive} (see the sketch
after this list).
• Relational (First Order Logic) descriptions:
– Using variables (more compact representation). For example:
< a, b, b >, < a, c, c >, < b, a, a > can be represented as one
relational tuple < X, Y, Y >.
– Multiple relation concepts (FOIL, Inductive Logic Programming,
see Lecture Notes - Chapter 11). Example:

  grandfather(X, Z) ← father(X, Y) ∧ (father(Y, Z) ∨ mother(Y, Z))
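Here is the sketch referred to above: a minimal Python illustration of the Closed World Assumption and of matching the relational pattern < X, Y, Y >; the instance set is made up.

```python
# Closed World Assumption and the relational pattern <X, Y, Y>.
all_instances = {("a", "b", "b"), ("a", "c", "c"), ("b", "a", "a"), ("a", "b", "c")}

positive = {t for t in all_instances if t[1] == t[2]}   # instances matching <X, Y, Y>
negative = all_instances - positive                     # CWA: {negative} = {all} \ {positive}

print(positive)   # {('a', 'b', 'b'), ('a', 'c', 'c'), ('b', 'a', 'a')}
print(negative)   # {('a', 'b', 'c')}
```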
5.3 Attributes (features)
• Predefined set of features to describe an instance.
• Nominal (categorical, enumerated, discrete) attributes:
– Values are distinct symbols.
– No relation among nominal values.
– Only equality test can be performed.
– Special case: boolean attributes, transforming nominal to boolean.
• Structured:
– Partial order among nominal values
– Example: concept hierarchy
• Numeric:
– Continuous: full order (e.g. integer or real numbers).
– Interval: partial order.
5.4 Output knowledge representation
• Association rules
• Decision trees
• Classification rules
• Rules with relations
• Prediction schemes:
– Nearest neighbor
– Bayesian classification
– Neural networks
– Regression
• Clusters:
– Type of grouping: partitions/hierarchical
– Grouping or describing: agglomerative/conceptual
– Type of descriptions: statistical/structural
6 Visualization techniques: Why visualize data?
• Identifying problems:
– Histograms for nominal attributes: is the distribution consistent
with background knowledge?
– Graphs for numeric values: detecting outliers.
• Visualization shows dependencies
• Consulting domain experts
• If there is too much data, take a sample
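A minimal sketch of these checks, assuming pandas and matplotlib are available; the data frame, its columns, and the sampling fraction are made up for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Toy data; the column names ("color", "price") and values are made up.
df = pd.DataFrame({"color": ["red", "red", "blue", "green"],
                   "price": [9.5, 10.0, 250.0, 9.9]})

sample = df.sample(frac=0.5, random_state=0)   # if data are too large, work on a sample

df["color"].value_counts().plot(kind="bar")    # histogram for a nominal attribute
plt.show()

df["price"].plot(kind="box")                   # boxplot of a numeric attribute: 250.0 stands out as an outlier
plt.show()
```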