Learning Sets of Rules
Based on Tom M. Mitchell, Machine Learning [1], McGraw Hill, 1997, Chapter 10, and his slides [2].
1 Learning Rules
We can learn sets of rules by using ID3 and then converting the tree to rules.
We can also use a genetic algorithm that encodes the rules as bit strings.
But these approaches only work with propositional rules (no variables).
They also consider the set of rules as a whole, rather than learning one rule at a time.
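For the first approach, each root-to-leaf path of the learned tree becomes one rule whose preconditions are the attribute tests along that path. A minimal Python sketch of this conversion (the tree representation and the PlayTennis-style example tree are only illustrative, not from the chapter):

def tree_to_rules(node, path=()):
    # A leaf holds a class label; an internal node is a dict {attribute: {value: subtree}}.
    if not isinstance(node, dict):
        return [(list(path), node)]
    rules = []
    (attribute, branches), = node.items()
    for value, subtree in branches.items():
        rules.extend(tree_to_rules(subtree, path + ((attribute, value),)))
    return rules

tree = {"Outlook": {"Sunny": {"Humidity": {"High": False, "Normal": True}},
                    "Overcast": True,
                    "Rain": {"Wind": {"Strong": False, "Weak": True}}}}
for pre, label in tree_to_rules(tree):
    print("IF", " AND ".join(f"{a}={v}" for a, v in pre), "THEN PlayTennis =", label)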
2 Rules
First-order predicate logic (calculus) [3] formalizes statements using predicates (boolean-valued functions) and functions. Both can take variables as arguments.
A rule set can look like:
Ancestor(x,y) ← Parent(x,y)
Ancestor(x,y) ← Parent(x,z) ∧ Ancestor(z,y)
3 Sequential Covering
The idea in a sequential covering algorithm is
to learn one rule, remove the data it covers, then
repeat.
Sequential-Covering(target_attribute, attributes, examples, threshold)   ; e.g. target_attribute = PlayTennis(x)
  Learned_rules = {}
  Rule = Learn-One-Rule(target_attribute, attributes, examples)
  while Performance(Rule, examples) > threshold do
    Learned_rules = Learned_rules + Rule
    examples = examples - {examples correctly classified by Rule}
    Rule = Learn-One-Rule(target_attribute, attributes, examples)
  return Learned_rules
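A rough Python rendering of the same covering loop; the learn_one_rule and performance callables and the rule.covers interface are assumptions for illustration, not part of the original algorithm statement:

def sequential_covering(examples, learn_one_rule, performance, threshold=0.6):
    # Learn one rule, drop the examples it covers, repeat while rules stay good enough.
    all_examples = list(examples)
    learned_rules = []
    rule = learn_one_rule(examples)
    while examples and performance(rule, examples) > threshold:
        learned_rules.append(rule)
        examples = [ex for ex in examples if not rule.covers(ex)]
        if examples:
            rule = learn_one_rule(examples)
    # Sort rules by their performance on the full training set before returning them.
    learned_rules.sort(key=lambda r: performance(r, all_examples), reverse=True)
    return learned_rules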
3.1 Learn-One-Rule
Idea: organize the hypothesis space search in a general-to-specific fashion.
Start with the most general rule precondition, then greedily add the attribute that most improves performance measured over the training examples.
Learn-One-Rule(target_attribute, attributes, examples, k)
  best-hypothesis = the most general hypothesis
  candidate-hypotheses = {best-hypothesis}
  while candidate-hypotheses is not empty do
    ;Generate the next more specific candidate-hypotheses
    new-candidate-hypotheses = all specializations of hypotheses in candidate-hypotheses
        formed by adding one attribute constraint
    ;Update best-hypothesis
    best-hypothesis = argmax h∈new-candidate-hypotheses Performance(h, examples, target_attribute)
    ;Update candidate-hypotheses
    candidate-hypotheses = the k best members of new-candidate-hypotheses
  return a rule: "IF best-hypothesis THEN predict the most common value of target_attribute
        among the examples that match best-hypothesis"
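A small Python sketch of this greedy general-to-specific search (beam width 1, hypotheses as dicts of attribute=value constraints, boolean target values; all of these representation choices are mine):

def performance(h, examples, target):
    # Fraction of examples matched by h whose target value is True.
    covered = [ex for ex in examples if all(ex.get(a) == v for a, v in h.items())]
    return sum(ex[target] for ex in covered) / len(covered) if covered else 0.0

def learn_one_rule(examples, attributes, target):
    best = {}                                   # most general precondition: no constraints
    improved = True
    while improved:
        improved = False
        candidates = [{**best, a: ex[a]}        # add one attribute constraint
                      for a in attributes if a not in best for ex in examples]
        for h in candidates:
            if performance(h, examples, target) > performance(best, examples, target):
                best, improved = h, True
    return best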
3.2 CN2
∀ x, y : Ancestor(x,y) ← Parent(x,y)
Ancestor(x,y) :- Parent(x,y)
Female(Mary), ¬Female(x)
∀ x : Female(x) ∨ Male(x)
H ← L1 ∧ L2 ∧ ... ∧ Ln
A ← B ⇒ A ∨ ¬B
L1θ = L2θ
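The last line above is about a unifying substitution θ: a binding of variables that makes two literals identical (L1θ = L2θ). A tiny Python sketch, assuming a literal is a (predicate, args) tuple and θ is a dict from variable names to terms:

def apply_sub(literal, theta):
    # Replace each variable in the argument list by its binding, if any.
    pred, args = literal
    return (pred, tuple(theta.get(a, a) for a in args))

L1 = ("Female", ("x",))
L2 = ("Female", ("Mary",))
theta = {"x": "Mary"}
print(apply_sub(L1, theta) == apply_sub(L2, theta))   # True: theta unifies L1 and L2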
If we give a bunch of these examples to CN2 or C4.5, they will output a set of rules like:
6.3 FOIL
FOIL(target-predicate, predicates, examples)
  pos = those examples for which target-predicate is true
  neg = those examples for which target-predicate is false
  learnedRules = {}
  while pos do
    ;learn a new rule
    newRule = the rule that predicts target-predicate with no preconditions
    newRuleNeg = neg
    while newRuleNeg do
      ;add a new literal to specialize newRule
      candidateLiterals = generate candidate literals for newRule, based on predicates
      bestLiteral = argmax l∈candidateLiterals Foil-gain(l, newRule)
      add bestLiteral to the preconditions of newRule
      newRuleNeg = the subset of newRuleNeg that satisfies the preconditions of newRule
    learnedRules = learnedRules + newRule
    pos = pos - {members of pos covered by newRule}
  return learnedRules
NewRule = GrandDaughter(x,y) ←
To specialize it, generate these candidate additions to the preconditions: literals built from the known predicates over the rule's variables (allowing one new variable), and their negations. Choosing the one with the best Foil-gain, say Father(y,z), gives
NewRule = GrandDaughter(x,y) ← Father(y,z).
FOIL now considers all the literals from the previous step as well as literals that mention the newly introduced variable z.
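A rough Python sketch of this candidate-literal generation, assuming a candidate is a (predicate, args, positive?) tuple and predicates maps each predicate name to its arity (the helper and its representation are illustrative, not from the slides):

from itertools import product

def candidate_literals(rule_vars, predicates, max_new_vars=1):
    # Every predicate applied to the rule's variables plus up to max_new_vars fresh ones,
    # keeping only literals that mention at least one existing variable; each literal
    # is generated in both positive and negated form.
    pool = list(rule_vars) + [f"v{i}" for i in range(max_new_vars)]
    out = []
    for pred, arity in predicates.items():
        for args in product(pool, repeat=arity):
            if any(a in rule_vars for a in args):
                out.append((pred, args, True))
                out.append((pred, args, False))
    return out

# e.g. specializing GrandDaughter(x,y) <- Father(y,z), whose variables are {x, y, z}
print(len(candidate_literals({"x", "y", "z"}, {"Father": 2, "Female": 1})))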
6.7 Foil-Gain
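Mitchell (Chapter 10) defines the gain of adding literal l to rule R as Foil-gain(l, R) = t · (log2(p1/(p1+n1)) − log2(p0/(p0+n0))), where p0, n0 are the positive and negative bindings of R, p1, n1 are those of the specialized rule, and t is the number of positive bindings of R still covered after adding l. A direct Python sketch of that computation (function name and argument order are my own):

from math import log2

def foil_gain(p0, n0, p1, n1, t):
    # t times the improvement in log-probability that a covered binding is positive.
    return t * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))

print(foil_gain(p0=4, n0=6, p1=4, n1=1, t=4))   # 4.0: most negatives removed, positives kept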
If the target predicate is allowed among the candidate literals, FOIL can also learn recursive rule sets, e.g.:
Ancestor(x,y) ← Parent(x,y)
Ancestor(x,y) ← Parent(x,z) ∧ Ancestor(z,y)
So, it is possible to learn sets of recursive rules.
Induction is finding a hypothesis h that satisfies ∀ ⟨x_i, f(x_i)⟩ ∈ D : B ∧ h ∧ x_i → f(x_i), where
x_i is the ith training instance,
f(x_i) is the target function value for x_i, and
B is other background knowledge.
For example, consider learning Child given the positive example Child(Bob,Sharon) and the background knowledge Parent(u,v) ← Father(u,v).
So we have:
x_i: Male(Bob), Female(Sharon), Father(Sharon,Bob)
f(x_i): Child(Bob,Sharon)
B: Parent(u,v) ← Father(u,v)
What satisfies ∀ ⟨x_i, f(x_i)⟩ ∈ D : B ∧ h ∧ x_i → f(x_i)?
h_1: Child(u,v) ← Father(v,u)
h_2: Child(u,v) ← Parent(v,u)
Notice that h_1 does not require B.
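These entailment claims can be checked mechanically with brute-force forward chaining. A minimal Python sketch, assuming a clause is a (head, body) pair of (predicate, args) literals and lowercase argument names are variables (the representation is mine, not the slides'):

from itertools import product

def apply_sub(literal, sub):
    pred, args = literal
    return (pred, tuple(sub.get(a, a) for a in args))

def clause_vars(clause):
    head, body = clause
    return sorted({a for _, args in [head] + body for a in args if a[0].islower()})

def forward_chain(facts, clauses):
    # Repeatedly apply the clauses under all constant bindings until nothing new appears.
    facts, changed = set(facts), True
    consts = sorted({c for _, args in facts for c in args})
    while changed:
        changed = False
        for clause in clauses:
            head, body = clause
            for binding in product(consts, repeat=len(clause_vars(clause))):
                sub = dict(zip(clause_vars(clause), binding))
                if all(apply_sub(lit, sub) in facts for lit in body):
                    fact = apply_sub(head, sub)
                    if fact not in facts:
                        facts.add(fact)
                        changed = True
    return facts

x_i = {("Male", ("Bob",)), ("Female", ("Sharon",)), ("Father", ("Sharon", "Bob"))}
B   = [(("Parent", ("u", "v")), [("Father", ("u", "v"))])]   # Parent(u,v) <- Father(u,v)
h1  = (("Child", ("u", "v")), [("Father", ("v", "u"))])      # Child(u,v) <- Father(v,u)
h2  = (("Child", ("u", "v")), [("Parent", ("v", "u"))])      # Child(u,v) <- Parent(v,u)
f   = ("Child", ("Bob", "Sharon"))
print(f in forward_chain(x_i, [h1]))        # True:  h_1 works without B
print(f in forward_chain(x_i, B + [h2]))    # True:  h_2 works given B
print(f in forward_chain(x_i, [h2]))        # False: h_2 alone does not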
This process of augmenting the set of predicates, based on background knowledge, is often referred to as constructive induction.
The relationship between induction and deduction has been known for a while (Jevons 1874).
Pros:
Cons:
P ∨ L
¬L ∨ R
--------
P ∨ R
More generally, the (propositional) resolution operator:
1. Given initial clauses C1 and C2, find a literal L from C1 such that ¬L is in C2.
2. Form the resolvent C by including all literals from C1 and C2 except for L and ¬L:
   C = (C1 - {L}) ∪ (C2 - {¬L})
Inverting this, given C and C1:
   C2 = (C - (C1 - {L})) ∪ {¬L}
For example, with C = A ∨ B and C1 = B ∨ D, taking L = D gives
C2 = A ∨ ¬D.
C2 can also be A ∨ ¬D ∨ B.
In general, inverse resolution can produce multiple
clauses C2.
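A small Python sketch of this operator, assuming a clause is a frozenset of (symbol, positive?) literals; the example clauses are chosen so that the output matches the C2 shown above (C = A ∨ B and C1 = B ∨ D are assumed values):

from itertools import combinations

def inverse_resolvents(C, C1):
    # Given the resolvent C and one parent C1, enumerate every C2 that resolves with C1 to C.
    out = set()
    for L in C1 - C:                      # the literal resolved away cannot survive in C
        not_L = (L[0], not L[1])
        required = C - (C1 - {L})         # literals of C that only C2 can contribute
        optional = C & (C1 - {L})         # literals of C that C2 may also contain
        for r in range(len(optional) + 1):
            for extra in combinations(optional, r):
                out.add(frozenset(required | set(extra) | {not_L}))
    return out

A, B, D = ("A", True), ("B", True), ("D", True)
for c2 in inverse_resolvents(frozenset({A, B}), frozenset({B, D})):
    print(sorted(c2))                     # A ∨ ¬D, and A ∨ B ∨ ¬D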
In the general, first-order case:
1. Find a literal L1 from clause C1, a literal L2 from clause C2, and a substitution θ such that L1θ = ¬L2θ.
2. Form the resolvent C by including all literals from C1θ and C2θ, except for L1θ and L2θ:
   C = (C1 - {L1})θ ∪ (C2 - {L2})θ
More precisely, the set of literals occurring in the conclusion C is C = (C1 - {L1})θ1 ∪ (C2 - {L2})θ2, where the unifying substitution factors as θ = θ1θ2.
For example, resolving C1 = White(x) ← Swan(x) with C2 = Swan(Fred), using the unifying substitution θ = {x/Fred}, gives C = White(Fred).
Since C - (C1 - {L1})θ1 = (C2 - {L2})θ2, and by definition L2 = ¬L1θ1θ2⁻¹, we get
C2 = (C - (C1 - {L1})θ1)θ2⁻¹ ∪ {¬L1θ1θ2⁻¹}
In applying this we will find multiple choices for L1, θ1, and θ2.
D = GrandChild(Bob,Shanon)
B = {Father(Shanon,Tom), Father(Tom,Bob)}.
C = GrandChild(Bob,Shanon)
C1 = Father(Shanon,Tom)
L1 = Father(Shanon,Tom)
θ1⁻¹ = {}
θ2⁻¹ = {Shanon/x}
So C2 = GrandChild(Bob,x) ∨ ¬Father(x,Tom), i.e. GrandChild(Bob,x) ← Father(x,Tom).
This inferred clause may now be used as the conclusion C
for a second inverse resolution.
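A tiny Python sketch of the inverse substitution step used above, where θ2⁻¹ = {Shanon/x} turns the constant Shanon back into the variable x (literals as (predicate, args) tuples, negation marked with a leading ~; both conventions are mine):

def apply_inverse_sub(clause, inv_sub):
    # Replace constants by variables in every literal of the clause.
    return [(pred, tuple(inv_sub.get(a, a) for a in args)) for pred, args in clause]

C      = [("GrandChild", ("Bob", "Shanon"))]
not_L1 = [("~Father", ("Shanon", "Tom"))]
print(apply_inverse_sub(C + not_L1, {"Shanon": "x"}))
# [('GrandChild', ('Bob', 'x')), ('~Father', ('x', 'Tom'))]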
9 PROGOL
The idea in the PROGOL system is to reduce the combinatorial explosion by generating the most specific acceptable h.
The user specifies H by stating the predicates, functions, and forms of arguments allowed for each.
PROGOL uses a sequential covering algorithm. For each ⟨x_i, f(x_i)⟩ it finds the most specific hypothesis h_i such that B ∧ h_i ∧ x_i → f(x_i) (actually, it considers only k-step entailment).
It then conducts a general-to-specific search bounded by the specific hypothesis h_i, choosing hypotheses with minimum description length.
URLs
1. Machine Learning book at Amazon,
http://www.amazon.com/exec/obidos/ASIN/0070428077/multiagentcom/
2. Slides by Tom Mitchell on Machine Learning, http://www-2.cs.cmu.edu/~tom/mlbook-chapter-slides.html
3. wikipedia:First-order_logic, http://www.wikipedia.org/wiki/First-order_logic
4. wikipedia:Prolog, http://www.wikipedia.org/wiki/Prolog
5. CN2 Homepage, http://www.cs.utexas.edu/users/pclark/software.html#cn2
6. wikipedia:Soundness, http://www.wikipedia.org/wiki/Soundness
7. wikipedia:Completeness, http://www.wikipedia.org/wiki/Completeness