Learning Sets of Rules
SEQUENTIAL COVERING ALGORITHMS
LEARNING RULE SETS
LEARNING FIRST ORDER RULES
LEARNING SETS OF FIRST ORDER RULES
Learning Rules
• One of the most expressive and human-readable representations for learned hypotheses (target functions) is a set of production rules (if-then rules).
• Rules can be derived from other representations:
  • Decision trees
  • Genetic algorithms
• or they can be learned directly (direct methods).
• An important aspect of direct rule-learning algorithms is that they can learn sets of first-order rules, which have much more representational power than the propositional rules that can be derived from decision trees.
Propositional versus First-Order Logic
• Propositional logic does not include variables and thus cannot express general relations among the values of the attributes.
• Example 1: in propositional logic, you can write:
  • IF (Father1=Bob) ^ (Name2=Bob) ^ (Female1=True) THEN Daughter1,2=True.
  This rule applies only to a specific family!
• Example 2: in first-order logic, you can write:
  • IF Father(y,x) ^ Female(y) THEN Daughter(x,y)
  This rule (which you cannot write in propositional logic) applies to any family!
Sequential Covering Algorithms
• A sequential covering algorithm (SCA) is any of a family of algorithms that learn a rule set by learning one rule, removing the data it covers, and iterating the process on the remaining examples.
• Example (explains why it is called sequential covering):
  • Assume a subroutine Learn-one-rule
  • Input: a set of positive and negative examples
  • Expected output: a rule that covers many positive and few negative examples
  • Requirement: the output rule should have high accuracy; it need not have high coverage
• Approach:
  • Learn a rule and remove the positive examples it covers
  • Invoke the subroutine again to learn a second rule from the remaining positive examples
  • Iterate as many times as required, accumulating the rules as a disjunctive set of rules
Learning Propositional Rules: Sequential Covering Algorithms
Sequential-Covering(Target_attribute, Attributes, Examples, Threshold)
• Learned_rules ← { }
• Rule ← Learn-one-rule(Target_attribute, Attributes, Examples)
• While Performance(Rule, Examples) > Threshold, do
  • Learned_rules ← Learned_rules + Rule
  • Examples ← Examples − {examples correctly classified by Rule}
  • Rule ← Learn-one-rule(Target_attribute, Attributes, Examples)
• Learned_rules ← sort Learned_rules according to Performance over Examples
• Return Learned_rules
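To make the control flow concrete, here is a minimal Python sketch of the Sequential-Covering pseudocode above. It is a sketch under stated assumptions, not the canonical implementation: Learn-one-rule and Performance are passed in as placeholder functions (they are described on later slides), examples are assumed to be dictionaries of attribute values, and a rule is assumed to be a (preconditions, prediction) pair.

```python
# Minimal sketch of Sequential-Covering. Assumptions: examples are dicts of
# attribute values, a rule is (preconditions_dict, predicted_value), and
# learn_one_rule / performance are supplied by the caller.

def covers_correctly(rule, example, target_attribute):
    preconditions, prediction = rule
    matches = all(example.get(a) == v for a, v in preconditions.items())
    return matches and example.get(target_attribute) == prediction

def sequential_covering(target_attribute, attributes, examples,
                        learn_one_rule, performance, threshold):
    learned_rules = []
    rule = learn_one_rule(target_attribute, attributes, examples)
    while performance(rule, examples) > threshold:
        learned_rules.append(rule)
        # Remove the examples the new rule classifies correctly.
        examples = [ex for ex in examples
                    if not covers_correctly(rule, ex, target_attribute)]
        rule = learn_one_rule(target_attribute, attributes, examples)
    # Sort so the most accurate rules are consulted first at classification time.
    learned_rules.sort(key=lambda r: performance(r, examples), reverse=True)
    return learned_rules
```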
Learning Propositional Rules: Sequential Covering Algorithms
• The algorithm is called a sequential covering algorithm because it sequentially learns a set of rules that together cover the whole set of positive examples.
• It has the advantage of reducing the problem of learning a disjunctive set of rules to a sequence of simpler problems, each requiring that a single conjunctive rule be learned.
• The final set of rules is sorted so that the most accurate rules are considered first at classification time.
• However, because it does not backtrack, this algorithm is not guaranteed to find the smallest or best set of rules ---> Learn-one-rule must be very effective!
Learning Propositional Rules: Learn-one-rule
General-to-Specific Search:
• Consider the most general rule (hypothesis), which matches every instance in the training set.
• Repeat
  • Add the attribute test that most improves rule performance measured over the training set.
• Until the hypothesis reaches an acceptable level of performance.
• The difference between this algorithm and ID3 is that it follows only the single best descendant at each step.
• It has no backtracking, and hence there is a chance of making a suboptimal choice.
• A solution is to use beam search rather than keeping only the single best candidate.
General-to-Specific Beam Search (CN2):
• Rather than considering a single candidate at each search step, keep track of the k best candidates.
• The beam keeps track of the most promising alternatives to the current top-rated hypothesis.
• Learn-one-rule(Target_attribute, Attributes, Examples, k)
  • Returns a single rule that covers some of the examples.
  • Conducts a general-to-specific greedy beam search for the best rule, guided by the performance metric.
Implementation of Learn-one-rule using general-to-specific beam search
Learn-one-rule(Target_attribute, Attributes, Examples, k)
1. Best-Hypothesis ← the most general hypothesis ∅ (no constraints)
2. Candidate-Hypotheses ← {Best-Hypothesis}
3. While Candidate-Hypotheses is not empty, do
   a. Generate the next, more specific candidate hypotheses
      • All-Constraints ← the set of all constraints of the form (a = v), where a is a member of Attributes and v is a value of a occurring in Examples
      • New-Candidate-Hypotheses ← for each h in Candidate-Hypotheses and each c in All-Constraints, the specialization of h obtained by adding the constraint c
      • Remove from New-Candidate-Hypotheses any hypotheses that are duplicates, inconsistent, or not maximally specific
   b. Update Best-Hypothesis
      • For all h in New-Candidate-Hypotheses do
        • If Performance(h, Examples, Target_attribute) > Performance(Best-Hypothesis, Examples, Target_attribute)
          then Best-Hypothesis ← h
   c. Update Candidate-Hypotheses
      • Candidate-Hypotheses ← the k best members of New-Candidate-Hypotheses, according to the Performance measure
4. Return a rule of the form
   "IF Best-Hypothesis THEN prediction"
   where prediction is the most frequent value of Target_attribute among the examples that match Best-Hypothesis.

Performance(h, Examples, Target_attribute)
• h-examples ← the subset of Examples that match h
• Return −Entropy(h-examples), where entropy is computed with respect to Target_attribute

Variations:
1. Learn rules only for the positive examples, and treat anything not covered as negative (negation as failure).
2. Use a single positive example as a seed, and consider only the attribute constraints that this example satisfies.
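Below is a hedged Python sketch of the Learn-one-rule beam search just described, using the negative-entropy Performance measure from the slide. The dictionary-based hypothesis representation is an assumption made for this sketch; with k = 1 the same code reduces to the greedy single-descendant search described earlier.

```python
import math
from collections import Counter

def entropy(examples, target_attribute):
    """Entropy of the target-attribute values among the given examples."""
    counts = Counter(ex[target_attribute] for ex in examples)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def performance(hypothesis, examples, target_attribute):
    """-Entropy of the examples matched by the hypothesis (higher is better)."""
    matched = [ex for ex in examples
               if all(ex.get(a) == v for a, v in hypothesis.items())]
    if not matched:
        return float("-inf")
    return -entropy(matched, target_attribute)

def learn_one_rule(target_attribute, attributes, examples, k):
    best = {}                                   # most general hypothesis: no constraints
    candidates = [best]
    all_constraints = {(a, ex[a]) for ex in examples for a in attributes}
    while candidates:
        # a. Generate the next, more specific candidate hypotheses.
        new_candidates = []
        for h in candidates:
            for a, v in all_constraints:
                if a not in h:                  # skip inconsistent specializations
                    specialized = {**h, a: v}
                    if specialized not in new_candidates:   # drop duplicates
                        new_candidates.append(specialized)
        # b. Update the best hypothesis seen so far.
        for h in new_candidates:
            if (performance(h, examples, target_attribute) >
                    performance(best, examples, target_attribute)):
                best = h
        # c. Keep only the k best candidates (the beam).
        new_candidates.sort(key=lambda h: performance(h, examples, target_attribute),
                            reverse=True)
        candidates = new_candidates[:k]
    # Predict the most frequent target value among examples matching Best-Hypothesis.
    matched = [ex for ex in examples
               if all(ex.get(a) == v for a, v in best.items())]
    prediction = Counter(ex[target_attribute] for ex in matched).most_common(1)[0][0]
    return (best, prediction)
```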
Comments and Variations regarding the Basic Rule Learning Algorithms
• Sequential versus simultaneous covering: sequential covering algorithms (CN2) make a larger number of independent choices than simultaneous covering ones (ID3). Which is preferable depends on how much training data is available.
• Direction of the search: CN2 uses a general-to-specific search strategy. Other systems (GOLEM) use a specific-to-general strategy. General-to-specific search has the advantage of having a single maximally general hypothesis from which to start.
• Generate-then-test versus example-driven: CN2 (Clark and Niblett's Learn-one-rule) is a generate-then-test method, so the impact of noise is minimized.
  • Other methods (Find-S, Candidate-Elimination) are example-driven. Generate-then-test systems are more robust to noise, whereas example-driven methods can be misled by a single noisy training example.
Comments and Variations regarding the Basic Rule Learning Algorithms, Cont'd
• Post-pruning: preconditions can be removed from a rule whenever this leads to improved performance over a set of pruning examples distinct from the training set.
• Performance measure: different evaluation functions can be used, for example:
  • Relative frequency: nc/n, where n is the number of examples the rule matches and nc the number it classifies correctly.
  • m-estimate of accuracy: (nc + m·p)/(n + m), where p is the prior probability of the predicted class and m is a weight on that prior (used in certain versions of CN2).
  • Entropy (used in the original CN2).
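As a small illustration of the first two measures, here is a hedged Python sketch; the (preconditions, prediction) rule shape matches the earlier sketches, and the names n, nc, m, and p follow the slide (nc correct classifications out of n matched examples, p the class prior, m the weight on that prior).

```python
def _matched(rule, examples):
    preconditions, _ = rule
    return [ex for ex in examples
            if all(ex.get(a) == v for a, v in preconditions.items())]

def relative_frequency(rule, examples, target_attribute):
    """nc / n: fraction of matched examples the rule classifies correctly."""
    _, prediction = rule
    matched = _matched(rule, examples)
    n = len(matched)
    nc = sum(ex.get(target_attribute) == prediction for ex in matched)
    return nc / n if n else 0.0

def m_estimate(rule, examples, target_attribute, m, p):
    """(nc + m*p) / (n + m): relative frequency smoothed toward the prior p."""
    _, prediction = rule
    matched = _matched(rule, examples)
    n = len(matched)
    nc = sum(ex.get(target_attribute) == prediction for ex in matched)
    return (nc + m * p) / (n + m)
```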
Learning Sets of First-Order Rules: FOIL (Quinlan, 1990)
FOIL is similar to the propositional rule-learning approach except for the following:
• FOIL accommodates first-order rules and thus needs to handle variables in the rule preconditions.
• FOIL uses a special performance measure (Foil_Gain, shown below) which takes into account the different variable bindings.
• FOIL seeks only rules that predict when the target literal is True (instead of predicting when it is True or when it is False).
• FOIL performs a simple hill-climbing search rather than a beam search.
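The slide names Foil_Gain without defining it; for reference, the standard definition from Quinlan (1990) evaluates the effect of adding a candidate literal L to a rule R:

```latex
\mathrm{Foil\_Gain}(L, R) \;=\; t \left( \log_2 \frac{p_1}{p_1 + n_1} \;-\; \log_2 \frac{p_0}{p_0 + n_0} \right)
```

where p0 and n0 are the numbers of positive and negative bindings of rule R, p1 and n1 are the corresponding counts for the specialized rule R + L, and t is the number of positive bindings of R that are still covered after adding L.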
Induction as Inverted Deduction
• Let D be a set of training examples, each of the form <xi, f(xi)>. Then learning is the problem of discovering a hypothesis h such that the classification f(xi) of each training instance xi follows deductively from the hypothesis h, the description of xi, and any other background knowledge B known to the system.
Example:
• xi: Male(Bob), Female(Sharon), Father(Sharon, Bob)
• f(xi): Child(Bob, Sharon)
• B: Parent(u,v) ← Father(u,v)
• We want to find h such that (B ∧ h ∧ xi) ⊢ f(xi). Two hypotheses that satisfy this are:
  h1: Child(u,v) ← Father(v,u)
  h2: Child(u,v) ← Parent(v,u)
• Training example (attribute-value form):
  Name1=Sharon, Mother1=Louise, Father1=Bob, Male1=False, Female1=True,
  Name2=Bob, Father2=Victor, Mother2=Nora, Male2=True, Female2=False,
  Daughter1,2=True
• Propositional rule:
  IF (Father1=Bob) ^ (Name2=Bob) ^ (Female1=True) THEN Daughter1,2=True
• First-order rule:
  IF Father(y, x) ^ Female(y) THEN Daughter(x, y)
• Exercise: IF Father(y, z) ^ Mother(z, x) ^ Female(y) THEN __?__(x, y)
  Answer: Granddaughter
Basic Definitions from First-Order Logic
• Constants (capitalized symbols)
  • Ex: Bob, Sharon
• Variables (lowercase symbols)
  • Ex: x, y, z
• Predicate symbols (capitalized symbols)
  • Ex: Female, Male, Married
• Function symbols (lowercase symbols)
  • Ex: age
• Term: any constant, variable, or function applied to a term
  • Example: Mary, x, age(Mary), age(x)
• Literal: any predicate, or its negation, applied to terms
  • Example: Married(Bob) → positive literal
    ¬Greater_Than(age(Sharon), 20) → negative literal
• Ground literal: a literal that contains no variables
• Clause: a disjunction of literals
Basic Definitions from First-Order Logic
• Horn clause: a clause with at most one positive literal
  • H ∨ ¬L1 ∨ ¬L2 ∨ … ∨ ¬Ln
  • Equivalently, H ← (L1 ∧ L2 ∧ … ∧ Ln), i.e., IF (L1 ∧ L2 ∧ … ∧ Ln) THEN H
• In a Horn clause:
  • H is the head or consequent of the Horn clause.
  • (L1 ∧ L2 ∧ … ∧ Ln) is the body or antecedents of the Horn clause.
• Well-formed expressions are composed of:
  1. Constants  2. Variables  3. Functions  4. Predicates
  • Example: Female(Joe)
• Substitution: a function that replaces variables by terms
  • Example: {x/3, y/z} replaces x by 3 and y by z
  • In general, if θ is a substitution and L is a literal, Lθ denotes the result of applying θ to L.
• Unifying substitution: for two literals L1 and L2, a substitution θ such that L1θ = L2θ
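To make the substitution notation concrete, here is a small Python sketch; the tuple-based literal representation and the lowercase-variable convention mirror the slide's conventions but are assumptions of this sketch, not part of any particular library.

```python
# Literals are nested tuples, e.g. ("Father", "y", "x"); variables are
# lowercase strings, constants are capitalized strings (slide convention).

def is_variable(term):
    return isinstance(term, str) and term[:1].islower()

def apply_substitution(literal, theta):
    """Return L·theta: replace each variable in the literal using theta."""
    return tuple(theta.get(t, t) if is_variable(t)
                 else (apply_substitution(t, theta) if isinstance(t, tuple) else t)
                 for t in literal)

# Example: {y -> Sharon, x -> Bob} applied to Father(y, x)
theta = {"y": "Sharon", "x": "Bob"}
print(apply_substitution(("Father", "y", "x"), theta))   # ('Father', 'Sharon', 'Bob')

# theta is a unifying substitution for L1 and L2 if L1·theta == L2·theta.
L1 = ("Daughter", "x", "y")
L2 = ("Daughter", "Sharon", "y")
theta2 = {"x": "Sharon"}
print(apply_substitution(L1, theta2) == apply_substitution(L2, theta2))  # True
```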
Learning Sets of First-Order Rules (FOIL)
FOIL(Target_Predicate, Predicates, Examples)
• Pos ← examples for which Target_Predicate is True
• Neg ← examples for which Target_Predicate is False
• Learned_rules ← { }
• While Pos is not empty, do            ; outer loop: learn a new rule
  • NewRule ← the rule that predicts Target_Predicate with no preconditions
  • NewRuleNeg ← Neg
  • While NewRuleNeg is not empty, do   ; inner loop: add a new literal to specialize NewRule
    • Candidate_literals ← generate candidate new literals for NewRule, based on Predicates
    • Best_literal ← argmax over L ∈ Candidate_literals of Foil_Gain(L, NewRule)
    • Add Best_literal to the preconditions of NewRule
    • NewRuleNeg ← subset of NewRuleNeg that satisfies NewRule's preconditions
  • Learned_rules ← Learned_rules + NewRule
  • Pos ← Pos − {members of Pos covered by NewRule}
• Return Learned_rules

Comments:
• Each iteration of the outer loop adds a new rule to Learned_rules, performing a specific-to-general search at the level of the rule set (each added rule generalizes the overall disjunction).
• The inner loop performs a finer-grained, general-to-specific hill-climbing search to determine the exact definition of each rule.
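Finally, a highly simplified Python skeleton of FOIL's two nested loops, following the pseudocode above. Generating first-order candidate literals and counting variable bindings are the hard parts of a real implementation, so they are left as placeholder functions (generate_candidate_literals, covers, gain) supplied by the caller; the example format with a "label" field is an assumption of this sketch, and foil_gain follows the standard Quinlan (1990) definition shown earlier.

```python
import math

def foil_gain(p0, n0, p1, n1, t):
    """t * (log2(p1/(p1+n1)) - log2(p0/(p0+n0))), per the definition above."""
    return t * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

def foil(target_predicate, predicates, examples,
         generate_candidate_literals, covers, gain):
    pos = [ex for ex in examples if ex["label"]]        # Target_Predicate is True
    neg = [ex for ex in examples if not ex["label"]]    # Target_Predicate is False
    learned_rules = []
    while pos:                                          # outer loop: learn a new rule
        new_rule = (target_predicate, [])               # head with empty preconditions
        new_rule_neg = list(neg)
        while new_rule_neg:                             # inner loop: specialize the rule
            candidates = generate_candidate_literals(new_rule, predicates)
            best_literal = max(candidates,
                               key=lambda L: gain(L, new_rule, pos, new_rule_neg))
            new_rule[1].append(best_literal)            # add to preconditions
            new_rule_neg = [ex for ex in new_rule_neg if covers(new_rule, ex)]
        learned_rules.append(new_rule)
        pos = [ex for ex in pos if not covers(new_rule, ex)]
    return learned_rules
```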