Logic: A Primer
Lecture Notes for Linguistics
Version of October 24, 2014
Erich H. Rast
Universidade Nova de Lisboa
Contents

1 Sets, Relations, Functions
   1.1 Sets
      1.1.1 Defining sets
      1.1.2 Operations on Sets
      1.1.3 Basic Properties of Sets
      1.1.4 Venn Diagrams
      1.1.5 Exercises
   1.2 Relations
      1.2.1 Ordered Tuples
      1.2.2 Characterization of Relations
      1.2.3 Other Important Notions
      1.2.4 Properties of Relations
      1.2.5 Exercises
   1.3 Functions
      1.3.1 Characterization of a Function
      1.3.2 Properties of Functions
      1.3.3 Further Notions
      1.3.4 Exercises
   1.4 Literature

2 Propositional Logic
   2.1 Syntax of Propositional Logic
      2.1.1 Basic Expressions
      2.1.2 Well-Formed Formulas
      2.1.3 Exercises
   2.2 Semantics of Propositional Logic
      2.2.1 Models and Truth in a Model
      2.2.2 Truth Tables
      2.2.3 Important Notions
   2.3 Proof Theory
      2.3.1 Tableaux Rules for Propositional Logic
      2.3.2 How to Use Tableaux
      2.3.3 Alternative Notation
      2.3.4 Selected Theorems
      2.3.5 Exercises
   2.4 Deductive Arguments
      2.4.1 Valid Argument Schemes
      2.4.2 Sound Arguments, Fallacies, Good Arguments
      2.4.3 Exercises
   2.5 Metatheorems
   2.6 Concluding Remarks
   2.7 Literature

3 First-Order Logic
   3.1 Syntax of First-Order Predicate Logic with Identity
      3.1.1 Basic Expressions
      3.1.2 Well-Formed Formulas
      3.1.3 Exercises
   3.2 Semantics of First-Order Logic with Identity
      3.2.1 Variable Assignments and Variants
      3.2.2 Models and Truth in a Model
      3.2.3 Explanation of the Rules for Predication and Quantification
      3.2.4 Exercises
   3.3 Proof Theory
      3.3.1 Tableaux Rules for First-Order Predicate Logic
      3.3.2 Rules for Identity
      3.3.3 Using the Tableaux Rules
      3.3.4 Selected Theorems
      3.3.5 Exercises
   3.4 Defined Notions
      3.4.1 Russellian Descriptions
      3.4.2 Relativized Quantifiers
      3.4.3 Many-sorted Logic
      3.4.4 The Existence Predicate
   3.5 Applications to Natural Languages
      3.5.1 Truth-Conditions and Pre-Montegovian Semantics
      3.5.2 Some Problems
      3.5.3 Deductive Arguments
      3.5.4 Exercises
   3.6 Metatheorems
   3.7 Literature

4 Higher-Order Logic
   4.1 Syntax of Simple Type Theory
      4.1.1 Types
      4.1.2 Terms
   4.2 Semantics of Higher-Order Logic
      4.2.1 General Models and Truth in a Model
      4.2.2 Interdefinability of Quantifiers and Identity
      4.2.3 More Definitions
   4.3 Typed λ-Calculus
      4.3.1 Conversion Rules
      4.3.2 λ-Abstraction at Work
      4.3.3 Exercises
   4.4 Applicative Categorial Grammar
      4.4.1 Introduction
      4.4.2 Type-driven Evaluation
   4.5 Applications
      4.5.1 Verbs, Proper Names
      4.5.2 Generalized Quantifiers
      4.5.3 Generalized Quantifiers and the Finite Verb Phrase
      4.5.4 Quantifier Scope Ambiguities
      4.5.5 Outlook and Limits
   4.6 Metatheorems

Solutions to Exercises

Index
Preface
This text is a short introduction to logic that was used to accompany an introductory course in Logic for Linguists held at the New University of Lisbon (UNL) in fall 2010.

The main idea of this course was to give students the formal background and skills needed to assess the literature in logic, semantics, and related fields, and perhaps even to use logic on their own for the purpose of doing truth-conditional semantics. This course in logic does not replace a proper introduction to semantics and is not intended as such, although parts of Chapters 1 and 4 could be used to supplement an introductory course in semantics. In contrast to other introductions it has a certain focus on 'writing things down correctly', which is why the simplest notation is not always used; for example, Greek letters are used as metavariables to encourage students to learn the Greek alphabet. However, proofs of metatheorems have been omitted entirely.

This text is not recommended for self-study, because it is in (eternal?) draft status and still contains errors and typos. For self-study it is better to rely on material that has been reviewed more extensively. Please report typos and errors to erich@snafu.de. This manuscript was written hastily by a non-native speaker, and so I'd like to apologize for any odd uses of the English language.
Erich Rast,
15 December 2011
The Greek Alphabet

α   A   alpha
β   B   beta
γ   Γ   gamma
δ   Δ   delta
ε   E   epsilon
ζ   Z   zeta
η   H   eta
θ   Θ   theta
ι   I   iota
κ   K   kappa
λ   Λ   lambda
μ   M   mu
ν   N   nu
ξ   Ξ   xi
ο   O   omicron
π   Π   pi
ρ   P   rho
σ   Σ   sigma
τ   T   tau
υ   Υ   upsilon
φ   Φ   phi
χ   X   chi
ψ   Ψ   psi
ω   Ω   omega
List of Symbols
;
a∈B
A∪B
A∩B
A⊂B
A⊆B
A \B
A
P (A)
An
A1 × · · · × A n
BA
f : A→B
a 7→ b
f −1
R −1
=
∼
≥
>
¬
∨
˙
∨
∧
→
↔
empty set
membership,
a is a member of B
union of A and B
intersection of A and B
proper subset,
A is a proper subset of B
(improper) subset,
A is a subset of B
set difference, A without B
complement of A
powerset of A
n-ary Cartesian product of A
Cartesian Product of A 1 , . . . , A n
the functions from A to B
function f from A to B
a is mapped by a function to b
inverse function of f
inverse relation of R
identity
equivalence relation
greater than or equal
greater than
truth-functional negation
(inclusive) disjunction
exclusive disjunction
conjunction
conditional
biconditional
↑
↓
Cn(.)
M Íφ
Íφ
⊢φ
φ1 , . . . , φ n ⊢ ψ
φ M,g
∀
∃
ι
ι
∃!
g[x/a]
φ[x/t]
g ≈x h
λ
φ⇒ψ
φ⇔ψ
Sheffer stroke
Peirce stroke
deductive closure
φ is true in M
φ is a valid
φ is provable
ψ is provable
from φ1 , . . . , φn
evaluation of φ
in M under assignment g
universal quantifier
existential quantifier
iota operator
iota quantifier
there is exactly one
modified assignment,
s.t. g(x) = a
replace free x by t in φ
x-variant h of assignment g
λ-abstraction operator
rewrite φ as ψ
φ ⇒ ψ and ψ ⇒ φ
Chapter 1

Sets, Relations, Functions
The topics of this chapter are sets, relations, and functions. It is very likely that you already have a fairly good idea of what these are from your inevitable exposure to school mathematics, in which case you may use this chapter and the easy exercises in it to refresh your knowledge a bit. Set theory is a necessary prerequisite for formulating logical languages and models for them in a modern way and for assessing the literature in logic and formal semantics. Even if you think you already know what functions are, you ought not to skip this chapter entirely, for two reasons. First, the notation used here might differ from the one you're used to, and the way sets, relations, and functions are used throughout might differ from what you already know. Second, we will take a look at a number of linguistic examples that you might find interesting.
1.1 Sets
Sets are abstract collections of objects.
1.1.1 Defining sets
When characterizing a set you need to be precise and avoid the ambiguities of natural language. In the literature, sets are often defined using notation from mathematics and logic, but the level of formality depends on the style of the author, and many great mathematicians and logicians have the ability to express themselves precisely with only a minimal amount of 'notational clutter'. There
is always a tension between notational correctness and readability, and the goal
is to find the right equilibrium between them. That being said, let us take a look
at the ways to define sets. English paraphrases are given in quotes; they also
represent how you would read those definitions aloud.
Enumeration. All the members of the set are listed within curly braces { and }, where list items are separated by commas.
• {1, 2, 3, 4}
“the set containing the numbers 1, 2, 3, 4”
• A = {Peter, John, Mary}
“the set A contains Peter, John, and Mary (and nothing else)”
• C := {1, 2, 3, . . . , 20}
“let C be a set containing the integer numbers from 1 to 20”
☞ Note 1 (Identity versus Definition) The sign ':=' indicates a definition, which ought to be understood as a mere syntactic abbreviation. In contrast to this, '=' denotes identity.
Abstraction, Set-builder Notation. A variable – usually x, y, z – is used in
combination with a condition that this variable must satisfy.
• A := { x | x is a natural number}
“A is the set of all natural numbers”
• B := { x | x is a male person}
“B is the set of all male persons”
• C := { x | x ∈ N and x ≥ 20}
“C is the set of all natural numbers that are greater than or equal to 20”
• D := { s | someone jumps in s at the time of s}
“D is the set of jumping situations”
Here the notation x ∈ N means that x is a member of the set N, which by convention denotes the set of natural numbers. Sometimes the domain of the variable is specified in the first part of the definition using the membership relation ∈ (explained in more detail below).
• { x ∈ R | x > 0}
“the set of real numbers greater than 0”
• { x ∈ N | there is a k ∈ N such that x = 2k}
“the set of even natural numbers”
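These two styles of definition have direct analogues in any programming language with built-in sets. The following Python sketch (not part of the original notes) mirrors them: enumeration corresponds to a set literal, and set-builder notation to a set comprehension; since a computer cannot hold all of N, the comprehensions below are restricted to a finite range.

```python
# Defining a set by enumeration: list the members between curly braces.
A = {1, 2, 3, 4}

# Set-builder notation: C := {x | x ∈ N and x ≥ 20}, restricted to x < 100.
C = {x for x in range(100) if x >= 20}

# {x ∈ N | there is a k ∈ N such that x = 2k}, again over a finite range.
evens = {x for x in range(10) if x % 2 == 0}

print(sorted(evens))  # [0, 2, 4, 6, 8]
```

Note that, just as with mathematical sets, duplicates and order play no role: `{1, 2, 2, 1}` evaluates to the same Python set as `{1, 2}`.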
Recursive Definitions. Sets can be built from more basic sets by a recursive definition:
• t∈ A
“t is an element of A”
• If α ∈ A and β ∈ A, then (αβ) ∈ A
• Nothing else is in A
Giving a Precise Description. Sets are also often just described precisely
without using any special notation:
• Let A be the set of all prime numbers.
• B is the set of positive natural numbers greater than 2.
Empty Set. The empty set ∅ is the set that does not contain any element. There is only one empty set. Its existence is either postulated explicitly or can be inferred in common axiomatic formulations of set theory.
L History 1 The symbol '∅' was introduced by the Bourbaki Group, an influential group of mathematicians publishing under the pseudonym 'Nicolas Bourbaki' starting from 1935. The symbol is based on the letter ø used in Danish and Norwegian.
It should be clear that just defining a set by abstraction doesn’t give you any
guarantee that the set actually has members. The members of sets stand
for concrete entities, but the sets themselves are abstract entities and not
aggregates of actual objects. You can see this from the fact that the empty set is
itself a set. Someone may for example clearly and unambiguously describe the
set of coins he has in his left pocket and yet it may happen that he has no coins
in his pocket – and a collection of 0 coins is not an aggregate of any actual coins!
When a set A has members, i.e. A ≠ ∅, you have to think of these members as the actual objects and not symbols standing for them, unless we're talking about a set of symbols in the first place. For example, consider a set {Pedro, Afonso, Maria}; this set consists of Pedro, Afonso, and Maria and not of the proper names 'Pedro', 'Afonso', and 'Maria'. In other words, the names are used and not mentioned. Of course, in linguistics it is also common to talk about sets of symbols, and in that case care has to be taken to make clear that the objects in question are symbols. For example, {'Pedro', 'Afonso', 'Maria'} is the set of proper names 'Pedro', 'Afonso', and 'Maria'.
The actual members of a set are sometimes called its extension. The empty
set has an empty extension, because it has no members. When a set is described
or characterized using set-builder notation, we can think of the description of
the set as characterizing the set’s intension apart from describing its extension.
Take for example the two sets A := { x | x is a being that has a liver} and B := { x | x
is a being that has a heart}. For all we know A and B have the same extension:
There is no being with a liver that has no heart and there is no being with a
heart that has no liver. Nevertheless, A and B have different intensions. In
order to show that A and B have the same extension you have to look at all
beings with livers and hearts; you cannot infer this from the definitions of the
sets alone.
You may also look at the distinction in the following way: Suppose John is a
bit crazy and makes a list of all beings with a heart by naming not just their
species but each of them individually. Then by just looking at this huge list and
investigating the world you could not figure out whether he wanted to give you
the set of all beings with a liver, the set of all beings with a heart, the set of all
beings with a liver or a heart, or the set of all beings with a liver and a heart.
An intension contains more information than the extension. However, there
are many different ways to formally capture the notion of an intension, each of
them has its quirks, and there is a long history of philosophical controversies
about the notion of an intension. For the time being, we will not attempt to give
a precise account of what an intension is and instead characterize it negatively:
whatever is not clearly extensional is intensional. This rule of thumb is tied to
the identity conditions associated with a certain kind of entity. Even though they can be defined by abstraction in terms of their intension, sets are extensional in the sense that only their members are taken into account when comparing them.
Identity Between Sets. Two sets A and B are identical, i.e. A = B, if all
members of A are also members of B and all members of B are also members of
A.
L History 2 (Extension versus Intension) Many attempts at implementing intensions in a logical language were inspired by Gottlob Frege's work, in particular his well-known article 'Über Sinn und Bedeutung' (1892). Frege distinguishes between the sense of an expression, which corresponds to what is nowadays called an intension, and its reference, which corresponds to the extension.
The above identity condition is a deliberate choice. There are logical systems in
which entities that are in many ways similar to sets do not have an extensionality
principle like the above one. Notice that from the definition of identity between sets it follows that {a, a, b, a} = {a, b} = {b, a}. Duplicate entries don't count and are usually omitted, and the order of the elements in the specification of a set doesn't matter. Notice further that ∅ = ∅ is a special case. All members of ∅ are also members of ∅ simply because ∅ has no members. This way of understanding 'all' is based on mathematical conventions.
☞ Note 2 (Presuppositional Readings of Quantifiers) In natural languages quantifiers are often ambiguous between a strict ‘logical’ reading
and a presuppositional reading. Consider the following example:
(1.1) Situation: Pedro is talking to Maria about the homework assignments. Next to him is a small wooden box that is closed. The box is empty. He is pointing to the box and utters:

If you do this exercise for me, I'll give you all the money in this box.
According to the non-presuppositional reading there is nothing wrong with
Pedro’s suggestion. In this reading ‘all the money in this box’ can denote
the empty set. However, when somebody uses the English quantifier ‘all’ it
is often silently presupposed that there is at least one object satisfying the
quantifier restriction, i.e. satisfying the property being money in this box
in the above example.1 The presupposition that there is an object satisfying
the quantifier restriction is not commonly made in mathematical parlance.
1.1.2 Operations on Sets
Membership. a ∈ D expresses the fact that a is a member of the set D. If a ∈ D we also say that D contains a, that a is an element of D, or simply that a is in D. Note that while the empty set is a subset of every set (see below), it is not in general a member of a set.
Intersection. A ∩ B denotes the set containing all elements that are in A and
in B, i.e. A ∩ B := { x | x ∈ A and x ∈ B}. A ∩ B is called the intersection of A and B.
Union. A ∪ B denotes the set containing all elements that are in A or in B or
in both, i.e. A ∪ B := { x | x ∈ A or x ∈ B}. A ∪ B is called the union of A and B.
Subset. A ⊆ B holds if and only if all elements of A are elements of B. If A ⊆ B
is the case we say that A is a subset of B.
1 This presupposition can also be explained by using Gricean conversational maximes. Do
you know how?
Proper Subset. A is a proper subset of B, written A ⊂ B, if and only if A is a
subset of B and B is not a subset of A, i.e. A ⊆ B and not B ⊆ A.2
Set Difference and Complement. The difference of two sets A, B is written A \ B, spoken 'A without B' or 'A minus B', and contains all members of A that are not in B: A \ B := {x ∈ A | x ∉ B}. When B ⊆ A is the case, A \ B is also called the complement of B in A. Alternative notation: A − B. Sometimes a base domain D such that A ⊆ D is known or presumed, and then Ā is used to denote the complement of A in D, i.e. Ā := {x ∈ D | x ∉ A}, where D must be clear from previous definitions or easily inferable from the context.
Cardinality. The cardinality of a set is a measure of its size. It is usually written |A|. For example, for A = {a, b, c}, |A| = 3. Obviously, |∅| = 0.
✧ Remark 1 (Cardinality and Infinity) The cardinality of the set of natural numbers N is called ℵ₀ (pronounced 'aleph null'; ℵ is a letter of the Hebrew alphabet), standing for infinitely many elements that can be counted. Such a set is said to be countable or denumerable. The cardinality of the set of real numbers R is 2^ℵ₀. This cardinal number, called the cardinality of the continuum, is distinct from ℵ₀, because it can be proved that there is no one-to-one mapping from the set of real numbers onto the set of natural numbers. This means that even though both sets contain infinitely many numbers, there are more real numbers than natural numbers.
L History 3 (Georg Cantor (1845–1918)) Among the numerous important results by Georg Cantor was a famous proof, using a technique called diagonalization, that there are more real numbers than natural numbers. He also was the first to advance the continuum hypothesis, which states that there is no set with a cardinality lying strictly between the cardinality of the set of natural numbers and the cardinality of the set of real numbers. In modern notation this would be expressed as the hypothesis that ℵ₁ = 2^ℵ₀. The hypothesis was later shown, by results of Gödel and Cohen, to be independent of the standard axioms of set theory: it can neither be proved nor refuted from them.
2 It is a general convention to strike through symbols to indicate their negation, so the second part of the above condition could have been written B ⊈ A.
❉ Example 1 Consider a situation with the following domain D := {Pedro, Afonso, Maria, Ana, Elisabeta, Erich, Rui}. Suppose, for example, D represents the persons in a classroom at a given time. Using slightly non-standard notation, let students := {Pedro, Afonso, Maria, Ana, Elisabeta}, yawn := {Erich, Ana}, laugh := {Maria, Afonso}, and work := {Pedro, Afonso, Maria, Ana, Elisabeta, Rui}.

(1.2) Erich and Rui are not students (the complement of students in D):
D \ students = {Erich, Rui}

(1.3) No one is laughing and yawning:
yawn ∩ laugh = ∅

(1.4) All students work:
students ⊆ work

(1.5) The set of people that laugh or yawn:
laugh ∪ yawn = {Maria, Afonso, Erich, Ana}

(1.6) The set of students that do not laugh:
students \ laugh = {Pedro, Ana, Elisabeta}

(1.7) All laughing people (in the classroom, at the given time) are students:
laugh ⊆ students

(1.8) Some students yawn:
(students ∩ yawn) ≠ ∅
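The classroom example can also be checked mechanically. The following Python sketch (using the names from Example 1) mirrors (1.2)–(1.8) with Python's set operators: `-` is set difference, `&` intersection, `|` union, and `<=` the subset relation.

```python
D = {"Pedro", "Afonso", "Maria", "Ana", "Elisabeta", "Erich", "Rui"}
students = {"Pedro", "Afonso", "Maria", "Ana", "Elisabeta"}
yawn = {"Erich", "Ana"}
laugh = {"Maria", "Afonso"}
work = {"Pedro", "Afonso", "Maria", "Ana", "Elisabeta", "Rui"}

assert D - students == {"Erich", "Rui"}       # (1.2) complement of students in D
assert yawn & laugh == set()                  # (1.3) no one laughs and yawns
assert students <= work                       # (1.4) all students work
assert laugh | yawn == {"Maria", "Afonso", "Erich", "Ana"}  # (1.5)
assert students - laugh == {"Pedro", "Ana", "Elisabeta"}    # (1.6)
assert laugh <= students                      # (1.7) all laughing people are students
assert students & yawn != set()               # (1.8) some students yawn
print("all checks passed")
```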
☞ Note 3 (Generalized Quantifiers) Natural language expressions like 'todos estudantes' (all students), 'drei Könige' (three kings), or 'some linguists' are from a semantic perspective considered generalized quantifiers. Correspondingly, formulations of the meanings of expressions like 'todos', 'drei', and 'some' used in these generalized quantifiers are sometimes called generalized (quantifying) determiners. In a phrase like 'Some students yawn' the expression 'some' is a generalized quantifying determiner, the plural NP 'students' is the quantifier restriction, and the finite verb phrase 'yawn' is the body of the quantifier. Ignoring for the time being the intricacies of the syntax–semantics interface, it is easy to give truth-conditions for many generalized determiners using set theory. Here are some examples, where A stands for the meaning of the quantifier restriction and B for the meaning of the quantifier body, both taken as sets of objects:
• some A B: (A ∩ B) 6= ;
Example: Some philosophers are linguists.
• all A B: A ⊆ B
Example: All linguists hate philosophy.
• no A B: (A ∩ B) = ;
Example: No student likes logic.
• most A B: |(A ∩ B)| > |(A \B)|
Example: Most linguists like syntax.
• three= A B: |(A ∩ B)| = 3
Example: Three dogs bark.
A side note about ‘three’. I have marked this definition with a = in order
to indicate that according to this definition the quantifying determiner is
read ‘exactly three’. This reading seems to be prevalent.3 But in certain
circumstances it may also be possible to understand numerals like ‘three’
as in ‘x or more’, i.e. based on the definition |(A ∩ B)| ≥ x. For example,
when it is known by everyone in a conversation that a family with two or
more children benefits from reduced taxes an utterance of ‘Maria has two
children, so she’ll get tax benefits’ seems to be perfectly true if Maria has
four children. Or what do you think? What about Portuguese numerals? Do
they have the same default interpretation?
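For finite extensions, the set-theoretic truth-conditions above translate directly into executable predicates. Here is a minimal Python sketch (the function names and example sets are mine, not from the notes):

```python
def some(A, B):
    """some A B: (A ∩ B) ≠ ∅"""
    return len(A & B) != 0

def all_(A, B):          # trailing underscore: 'all' is a Python built-in
    """all A B: A ⊆ B"""
    return A <= B

def no(A, B):
    """no A B: (A ∩ B) = ∅"""
    return len(A & B) == 0

def most(A, B):
    """most A B: |A ∩ B| > |A \\ B|"""
    return len(A & B) > len(A - B)

def three_exactly(A, B):
    """three= A B: |A ∩ B| = 3"""
    return len(A & B) == 3

linguists = {"Ana", "Rui", "Maria", "Pedro"}
syntax_fans = {"Ana", "Rui", "Maria"}

print(some(linguists, syntax_fans))           # True
print(most(linguists, syntax_fans))           # True: 3 > 1
print(three_exactly(linguists, syntax_fans))  # True
```

A determiner is thus modelled as a relation between two sets; the 'x or more' reading of a numeral would simply replace `==` by `>=` in `three_exactly`.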
Powerset. The powerset P(A) of a set A is the set of all subsets of A, i.e. P(A) = {X | X ⊆ A}. Alternative notations for the powerset are P(A), ℘(A), or 2^A. Notice that ∅ ∈ P(A) and A ∈ P(A) for any set A.
✧ Remark 2 (The Power of the Powerset) As the name suggests, the powerset operation is very powerful. Generally, the powerset of a set with n elements has 2^n elements. Recall that the set of natural numbers N is countably infinite, i.e. |N| = ℵ₀. The powerset P(N) thus has cardinality 2^ℵ₀, i.e. it has the cardinality of the set of real numbers. This implies that P(N) is not countable.
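For finite sets the powerset can of course be computed. A standard sketch using `itertools.combinations` (this implementation is mine, not from the notes; subsets are returned as `frozenset`s because ordinary Python sets are unhashable and therefore cannot be members of a set):

```python
from itertools import chain, combinations

def powerset(A):
    """Return P(A) as a set of frozensets."""
    s = list(A)
    subsets = chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))
    return {frozenset(t) for t in subsets}

P = powerset({"a", "b", "c"})
print(len(P))                           # 2^3 = 8
print(frozenset() in P)                 # True: ∅ ∈ P(A)
print(frozenset({"a", "b", "c"}) in P)  # True: A ∈ P(A)
```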
3 Once more Gricean maxims can give an explanation as to why this reading is prevalent.
Bear in mind, though, that the question whether this reading is prevalent or not is empirical and
premature judgments ought to be avoided.
1.1.3 Basic Properties of Sets

Here are a few propositions that hold for any sets A, B, C, where complements are taken relative to a fixed domain:

1. ∅ ⊆ A
2. (A ∪ ∅) = A
3. (∅ ∪ ∅) = ∅
4. (A ∩ ∅) = ∅
5. (A ∩ A) = A
6. (A ∪ A) = A
7. (A ∪ B) = (B ∪ A)
8. (A ∩ B) = (B ∩ A)
9. ((A ∪ B) ∪ C) = (A ∪ (B ∪ C))
10. A ⊆ (A ∪ B)
11. (A ∩ B) ⊆ A
12. the complement of (A ∩ B) is (Ā ∪ B̄) (De Morgan's law)
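Each of these identities can be spot-checked on concrete finite sets. A quick Python sanity check (example sets are mine; complements are taken relative to an explicitly chosen domain D, as the last law requires):

```python
A = {1, 2, 3}
B = {3, 4}
C = {2, 4, 5}
D = {1, 2, 3, 4, 5, 6}        # fixed domain for complements

def comp(X):
    """Complement of X relative to the domain D."""
    return D - X

assert set() <= A                        # 1.  ∅ ⊆ A
assert A | set() == A                    # 2.
assert A & set() == set()                # 4.
assert A | B == B | A                    # 7.  commutativity of ∪
assert (A | B) | C == A | (B | C)        # 9.  associativity of ∪
assert A <= A | B                        # 10.
assert A & B <= A                        # 11.
assert comp(A & B) == comp(A) | comp(B)  # 12. De Morgan's law
print("all properties hold on this example")
```

Such checks are of course no substitute for a proof, which must hold for arbitrary sets, but they are a useful way to catch a mis-stated law.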
1.1.4 Venn Diagrams

Venn diagrams are helpful tools for visualizing relations between sets. Examples of Venn diagrams are given in Figures 1.1 to 1.5.

Figure 1.1: Union of sets A = {a, b, f, e} and B = {d, c, f, e}.

Figure 1.2: Intersection of sets A = {a, b, f, e} and B = {d, c, f, e}.
1.1.5 Exercises
✐ Exercise 1 Define the following sets by enumeration:
Figure 1.3: Set A = {f, d, c} is a proper subset of B = {a, b, c, d, e, f}.
a. all possible outcomes of a single throw of a standard die

b. all outcomes that a single throw of two dice might show, where the dice are not distinguished (for example, the case where the first die shows 1 and the second 3 is treated as equal to the case where the first die shows 3 and the second 1)
✐ Exercise 2 Define the following sets by specifying their intension, using set
abstraction:
a. the set of odd natural numbers greater than 3
b. the set of all subsets of a set S
c. the set of all blue sports cars in Lisbon today
d. the set of all subsets of a set A whose intersection is non-empty
✐ Exercise 3 Let A := {1, 3, 4, 5, 2}, B := {a, 3, 4, 5, b, c}, C := {1, 9}. Determine
the following sets by enumerating their members:
a. (A ∪ B) ∩ C
c. (A \C) ∩ B
b. (A ∩ B) ∪ C
d. (C \ A) ∪ ((B ∩ A) ∪ ;)
Figure 1.4: Let A = {a, c, f} and the domain D = {g, b, d, e, a, c, f}. Then Ā = {g, b, d, e}.
✐ Exercise 4
a. Does the phrase ‘the first three members of the set {a, b, c, d, e, f , g}’ make
sense? Explain!
b. Does it make sense to speak of the union of a given set of apples and a given
set of bananas? Explain!
c. Express the phrase ‘All employees that are not members of the union get a
higher salary’ in the language of set theory.
d. Express the phrases 'Há estudantes que trabalham' (there are students who work) and 'Há estudantes que não trabalham' (there are students who do not work) in the language of set theory.

e. Express the phrase 'Todos estudantes trabalham ou não trabalham' (all students work or do not work) in the language of set theory.
f. Is the phrase of the previous exercise (e) always true or can it also be false?
g. Express the fact that the intersection of three sets A, B, C is non-empty in
the language of set theory.
h. Suppose a set A is empty and a set B is non-empty. Does A ⊆ B hold?
Figure 1.5: The hatched area depicts (A \ B) ∩ C. The grey area depicts the complement of (A \ B) ∩ C in D.
✐ Exercise 5 Let A := {a, b, c} and B := {d, e, a}, and let the total domain D under consideration be A ∪ B ∪ {f}. Draw Venn diagrams to illustrate the following sets, and state whether the relations among them hold:
a. A ∪ B
b. A ∪ B
c. A ∩ B
d. D \{ f }
e. D \(A ∪ B)
f. A ⊂ B
g. A ⊆ B and B ⊆ A
✐ Exercise 6 Use Venn diagrams to show whether or not the following propositions hold:
a. A ⊆ (A ∩ B)
b. If (A ∪ B) ⊆ B then A ⊆ B.
c. (A ∪ B) = (A ∩ B)
✐ Exercise 7 Specify the answer to the following questions by enumerating
all members of the answer set.
a. A := {1, 2}. What is P (A)?
b. A := ;. What is 2 A ?
c. A := {a, b, c} and B := { c, a}. What is P (A ∩ B)?
✐ Exercise 8 Formulate truth conditions like in Note 3 for the following generalized determiners:
a. at least five A B
c. no more than three A B
b. exactly one A B
d. not one A B
1.2 Relations
We will now take a closer look at properties and relations. A property can
formally be represented by its extension: the set of objects that have the property.
For example, the property of being a student may be represented by the set of
all students and the property of being a record produced between 1970 and 1980
and liked by Erich can be represented as one huge, yet still finite, countable,
and enumerable set of records. But can we also represent relations between
objects extensionally? The answer is, of course, Yes. The extension of an n-ary
relation can be given by a set of ordered n-tuples. An ordered 2-tuple is called
an ordered pair.
1.2.1 Ordered Tuples
Ordered Pair. We write 〈a, b〉 for the ordered pair consisting of a in the first place and b in the second place. Notice that, as the name implies, the order now matters. This means that if a ≠ b, then 〈a, b〉 ≠ 〈b, a〉. Bear in mind, however, that the case that a = b might sometimes have to be taken into account. The notation (a, b) is also sometimes used for an ordered pair.
Ordered n-Tuple. We write 〈a, b, c〉 for the ordered triple consisting of a, b,
c (in that order), write 〈a, b, c, d 〉 for the ordered quadruple consisting of a, b,
c, and d (in that order), write 〈a, b, c, d, e〉 for the ordered quintuple consisting
of a, b, c, d, and e (in that order), and generally 〈a 1 , a 2 , . . . , a n 〉 for the ordered
n-tuple consisting of a 1 , a 2 , . . . , a n (in that order).4
4 Ordered pairs can be defined within set theory, for example by the Kuratowski definition 〈a, b〉 := {{a}, {a, b}}, with longer tuples built by nesting, e.g. 〈a, b, c〉 := 〈a, 〈b, c〉〉, and so on. These tricks do not matter for our purpose, but similar encodings are used for the definition of list data structures in programming languages.
1.2.2 Characterization of Relations
Relation and Arity. A relation between two sorts of entities is called a binary
relation. A relation between three sorts of entities is called a ternary relation.
The number of arguments of a relation is called its arity. So we can generally
speak of n-ary relations (n ≥ 1). Notice that unary predicates have arity 1 and
are sometimes considered a special case of a relation.
Sometimes authors suggest that an n-ary predicate (n ≥ 2) expresses a relation
similar to saying that a unary predicate expresses a property. We avoid this
way of talking, because it carries some perhaps undesirable ontological baggage.
Instead, we use the term ‘relation’ sometimes for the symbol or expression like
‘R’ or ‘to know’ and sometimes for its meaning: Something that holds between
two or more objects. The term ‘predicate’ is here mostly used in the context of
talking about unary predicates (i.e. of arity 1). If only syntactic entities like ‘P’
or ‘R’ are meant we can make this explicit by calling them predicate symbols
and relation symbols respectively. In chapter 3 the connection between predicate
and relation symbols and their extensional meaning will be made precise by
specifying concrete interpretation rules that map prima facie meaningless and
arbitrary symbols to their extension, thereby specifying their meaning in a
mathematically precise way.
Extensional Representation of a Relation. The extension of an n-ary relation can be represented by a set of n-tuples of entities.
❉ Example 2 Here are some examples of relations:
(1.9) ≥ is a binary relation between two numbers
(1.10) As long as tense and aspect are ignored, giving something to someone
is a ternary relation between an agent, the recipient, and the object that
is given.
{〈 x, y, z〉 | x gives y to z in Sala 4.02 on October 24, 2014}
(1.11) x ≤ (y + z) is a ternary relation between numbers5
{〈 x, y, z〉 | x ≤ (y + z)}
(1.12) As long as tense and aspect are ignored, buying something from
someone is a relation of arity 4: A buyer buys something at a certain price
from a seller.
{〈 x1 , x2 , y, z〉 | x1 buys x2 from y at price z}
5 I’m using variables x, y, z for indicating arguments in this example. More about this will be
said in the following sections.
1.2. RELATIONS
1.2.3 Other Important Notions
Cartesian Product. The Cartesian product A × B between two sets A and B
is the set of all ordered pairs 〈 x, y〉 such that x ∈ A and y ∈ B, i.e. A × B = {〈 x, y〉 |
x ∈ A and y ∈ B}. By convention × is also commonly used for specifying the set
of n-tuples of several sets A 1 , A 2 , . . . , A n (n > 2).6
(1.13) Let A := {a, b, c}, B := { e, f }.
Then A × B = {〈a, e〉, 〈 b, e〉, 〈 c, e〉, 〈a, f 〉, 〈 b, f 〉, 〈 c, f 〉}.
❉ Example 3 (Cartesian Product)
(1.14) Let A := { x | x is a positive integer} and B := { x | x is a positive even
integer}. Then A × B = {〈 x, y〉 | x and y are positive integers and y is even}.
(1.15) A := { Mar y, J ohn} and B := { J ohn, P eter }. Then
A × B := {〈 Mar y, J ohn〉, 〈 Mar y, P eter 〉, 〈 J ohn, J ohn〉, 〈 J ohn, P eter 〉}.
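Example 3 can be reproduced mechanically: `itertools.product` in Python computes exactly the set of pairs A × B. A sketch for illustration:

```python
from itertools import product

A = {"Mary", "John"}
B = {"John", "Peter"}

# A x B = { <x,y> | x in A and y in B }
AxB = set(product(A, B))
assert AxB == {("Mary", "John"), ("Mary", "Peter"),
               ("John", "John"), ("John", "Peter")}
assert len(AxB) == len(A) * len(B)  # |A x B| = |A| * |B|
```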
Extensions of Predicates and Relations. Suppose there is a fixed domain
D of objects we’re talking about. Then the extension of a predicate is a member
of P (D), the extension of a binary relation is an element of P (D × D), the
extension of a ternary relation is an element of P (D × D × D), and so forth.
Inverse Relation. If R is an n-ary relation, then its inverse relation R −1 is
{〈 x1 , . . . , xn−1 , xn 〉 | R(xn , xn−1 , . . . , x1 )}. Notice that we used the notation R(x, y)
for indicating that x stands in relation R to y. Later we will make this usage
more precise by defining a formal language with relation symbols like R and
variables like x, y, z, which are interpreted correspondingly as relations and
individuals on a base domain. For the time being, let us adopt this notation as a
shortcut. Alternatively, we could have taken R to directly stand for its extension
and formulated the condition as R −1 := {〈 x1 , . . . , xn 〉 | 〈 xn , xn−1 , . . . , x1 〉 ∈ R }.
❉ Example 4 Consider the relation ‘x knows y’. Suppose we have a group
D of people, say D = { Maria, P edro, Anna}. We know that the extension of
our relation is an element in P (D × D), i.e. a subset of all ordered pairs on
D. Perhaps nobody in the group knows anyone else in the group. Then the
extension is the empty set. (Recall that ∅ ∈ P (A) for any set A, no matter what
it contains.) Let’s assume instead that the extension of ‘knows’ in our situation (with respect to D) is {〈 Maria, Anna〉, 〈 Anna, Maria〉, 〈 Maria, Maria〉,
〈P edro, Anna〉, 〈P edro, P edro〉}. Anna does not know herself – let us set aside
for the moment the interesting question whether it is adequate to represent
this fact in the above way or whether perhaps knowing oneself has a different
meaning than knowing someone and ought to be a relation on its own.7
6 Notice that for example A × (B × C) can be used for A × B × C. Even though 〈a, b, c〉 and
〈a, 〈 b, c〉〉 are not the same entities, they can be used for the same purpose and sometimes this is
done silently. Likewise, a 1-tuple 〈a〉 is in many applications silently taken as being equal to a
itself, even though this is formally not quite correct.
7 What’s your opinion?
The inverse relation in this case can be paraphrased as ‘x is known by y’ and
its extension is {〈 Anna, Maria〉, 〈 Maria, Anna〉, 〈 Maria, Maria〉, 〈 Anna, P edro〉,
〈P edro, P edro〉}. Since Maria knows Anna, Anna is known by Maria and since
Pedro knows Anna, Anna is also known by Pedro. Maria and Pedro also know
themselves and hence are known by themselves. But Anna does not know Pedro,
even though she is known by him.
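Computing the inverse of a finite relation amounts to reversing every tuple in its extension. A Python sketch of the ‘knows’ example above (illustrative only):

```python
# 'x knows y' on D = {Maria, Pedro, Anna}, as in the example above
knows = {("Maria", "Anna"), ("Anna", "Maria"), ("Maria", "Maria"),
         ("Pedro", "Anna"), ("Pedro", "Pedro")}

# The inverse relation reverses every tuple: 'x is known by y'
known_by = {tuple(reversed(t)) for t in knows}
assert known_by == {("Anna", "Maria"), ("Maria", "Anna"), ("Maria", "Maria"),
                    ("Anna", "Pedro"), ("Pedro", "Pedro")}
# Anna is known by Pedro, even though she does not know him:
assert ("Anna", "Pedro") in known_by and ("Anna", "Pedro") not in knows
```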
1.2.4 Properties of Relations
In the following list I will give the alternative symbolic notation in first-order
predicate logic. This logical language will be introduced in the next part, so for
the time being you may ignore this notation if you’re not yet familiar with it.
Common Properties of Relations.
1. A relation R is reflexive if and only if R(x, x) for all x.
∀ x[R(x, x)]
2. A relation R is irreflexive if and only if for all x it is not the case that
R(x, x). Notice that this is not the same as just not being reflexive!
∀ x[¬R(x, x)]
3. A relation R is symmetric if and only if for all x and y: If R(x, y) then
R(y, x).
∀ x, y[R(x, y) → R(y, x)]
4. A relation R is antisymmetric if and only if for all x and y: If R(x, y) and
R(y, x) then x = y. Notice that this is not the same as not being symmetric!
∀ x, y[(R(x, y) ∧ R(y, x)) → x = y]
5. A relation R is asymmetric if and only if for all x and y: If R(x, y) then it
is not the case that R(y, x).
∀ x, y[R(x, y) → ¬R(y, x)]
6. A relation R is transitive if and only if for all x, y, and z: If R(x, y) and
R(y, z) then R(x, z).
∀ x, y, z[(R(x, y) ∧ R(y, z)) → R(x, z)]
7. A relation R is Euclidean if and only if for all x, y, and z: If R(x, y) and
R(x, z) then R(y, z).
∀ x, y, z[(R(x, y) ∧ R(x, z)) → R(y, z)]
8. A relation R is a preorder (quasi-order) if and only if R is reflexive and
transitive.
9. A relation R is total if and only if for all x and y: R(x, y) or R(y, x).
∀ x, y[R(x, y) ∨ R(y, x)]
10. A relation R is a partial order if and only if R is reflexive, antisymmetric,
and transitive.
11. A relation R is a total order if and only if R is total, reflexive, antisymmetric, and transitive.
12. A relation R is an equivalence relation if and only if R is reflexive,
symmetric, and transitive.
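Over a finite domain, the defining conditions above can be checked mechanically by quantifying over all tuples. A Python sketch (the helper names are my own):

```python
from itertools import product

def is_reflexive(R, D):
    # for all x in D: R(x,x)
    return all((x, x) in R for x in D)

def is_symmetric(R, D):
    # for all x, y: R(x,y) implies R(y,x)  (D unused; kept for uniformity)
    return all((y, x) in R for (x, y) in R)

def is_transitive(R, D):
    # for all x, y, z: R(x,y) and R(y,z) imply R(x,z)
    return all((x, z) in R
               for (x, y) in R for (y2, z) in R if y == y2)

D = {1, 2, 3}
leq = {(x, y) for x, y in product(D, D) if x <= y}
assert is_reflexive(leq, D) and is_transitive(leq, D)
assert not is_symmetric(leq, D)            # (1,2) in leq, (2,1) not

same_parity = {(x, y) for x, y in product(D, D) if x % 2 == y % 2}
# reflexive + symmetric + transitive = equivalence relation
assert all(f(same_parity, D) for f in
           (is_reflexive, is_symmetric, is_transitive))
```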
Reflexivity, symmetry, and transitivity are such common and important conditions
that you should be able to formulate them by heart!
Usually, when a relation is said to have one of the above properties this is
not meant to hold just relative to some specific domain, situation, or example.
For example, suppose we are looking at a specific set of four people that, it
happens just so, all love themselves. We surely wouldn’t say that this particular,
possibly exceptional case shows that the relation ‘x loves y’ is reflexive in
general. Likewise, even though it is often the case that when x is a friend
of y and y is a friend of z then x is also a friend of z, this need not be so in
general. You have probably met a friend of one of your friends
whom you wouldn’t consider your friend. However, not only mathematical but
also ‘everyday’ relations sometimes have one or more of the above properties in
general, i.e. regardless of what domain we’re considering.
❉ Example 5 Here are some examples of relations and their properties:
(1.16) If in general the Neocortex is part of the brain and the brain is part
of human beings, then in general the Neocortex is part of human beings.
This example suggests that the part-of relation is transitive.8
(1.17) If x is a relative of y, then y is also a relative of x, hence being the
relative of someone is symmetric.
(1.18) If x is a descendant of y and y is a descendant of z then x is also a
descendant of z. Being a descendant of someone is transitive.
(1.19) If x meets y then y meets x. Thus, meeting someone is symmetric.
(1.20) If x is the father of y then y is not the father of x. Being the father
of someone is antisymmetric.
It is sometimes helpful to visualize properties of relations by drawing diagrams.
Figure 1.6 depicts some common properties.
8 This claim has been disputed from time to time. Can you think of a counter-example?
Figure 1.6: Common properties of relations
1.2.5 Exercises
✐ Exercise 9 Consider a domain containing four individuals a, b, c, d, where
a is a constant for Afonso, b denotes Maria’s book, c stands for Afonso’s car, and
d stands for Maria.
a. Maria likes Afonso but doesn’t like his car. Afonso likes Maria and likes her
book. Define a binary relation x likes y that satisfies these constraints. Make
additionally sure that inanimate objects cannot like anything.
b. Define a relation x belongs to y to analyze the possessives ‘Afonso’s car’ and
‘Maria’s book’.
✐ Exercise 10 Determine which of the following relations are (in general)
transitive:
a. place x can be reached via Lisbon public transportation from place y
b. x is greater than y
c. x knows y
d. x is identical to y
e. x is similar to y
f. x is the mother of y
g. {〈 x, x〉 | x ∈ D } for any domain D
✐ Exercise 11 Check which of the following relations are reflexive, symmetric,
antisymmetric, Euclidean, and transitive respectively:
a. x is a cousin of y
b. x has the same hair color as y
c. x is better than y
d. x is bigger than y
e. x is a multiple of y (x, y ∈ N)
f. x has never heard of y
g. x has had a conversation with y
✐ Exercise 12 List five different equivalence relations.
Figure 1.7: Graphical representation of a relation between three objects a, b, c.
✐ Exercise 13 In simple models preferences are sometimes considered a total
preorder. Suppose that an agent, say John, only has preferences between three
alternatives a, b, c.
a. Can John’s preference relation contain cycles?
b. Suppose Figure 1.7 depicts John’s preferences. Is this a preference relation
in the above sense?
1.3 Functions
1.3.1 Characterization of a Function
Function. A function maps the elements from one set, the domain of the
function, to elements of another set, the codomain or image of the function, such
that an element in the domain is never mapped to more than one element of the
codomain. A function is partial if some elements in the domain are not mapped
to anything and a function is total if all elements of the domain are mapped
to an element in the codomain. There may also be elements in the codomain
to which no element of the domain is mapped. Figures 1.9 and 1.10 illustrate
the difference between a function and a binary relation. Figure 1.8 depicts a
function as a mapping from its domain to its codomain.
If we consider a total function and ignore elements of the codomain to which
nothing is mapped (i.e. we treat the function as surjective, a notion explained
below), then a relation suffices to describe the function. The set of ordered
pairs representing that relation must not contain two ordered pairs with the
same first element but different second elements. For example, 〈a, b〉
and 〈a, c〉 would violate this condition. A more general way to describe a function
is to define it as an ordered triple 〈D, C, R 〉, where D is a set representing the
Figure 1.8: A function maps elements from the domain once to an element in
the codomain.
domain, C is a set representing the codomain, and R is a relation consisting
of ordered pairs 〈a, b〉 where a ∈ D and b ∈ C and additionally R satisfies the
aforementioned condition that it may not contain two distinct pairs with the
same first element. The key point to remember is: Two different elements of
the function domain may be mapped to the same element in the codomain, but
it is not allowed to map one and the same element of the domain to different
elements of the codomain. In the latter case, we would only have a relation but
not a function.
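The condition on R (no two pairs with the same first element) can be stated as a one-line test. A Python sketch (the function name is my own choice):

```python
def is_function(R):
    """A set of pairs describes a function iff no first element
    occurs with two different second elements."""
    firsts = [x for (x, _) in R]
    return len(firsts) == len(set(firsts))

assert is_function({("a", "b"), ("c", "b")})      # many-to-one: fine
assert not is_function({("a", "b"), ("a", "c")})  # a mapped twice: only a relation
```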
☞ Note 4 (Specifying Functions.) It is common to use small letters as
function names. It is also common to write f : A → B to specify that f is
a function from domain A to codomain B. There are then many ways to
actually define a function and you probably know most of them from school
mathematics. Only a few things that are important shall be noted here.
First, the symbol ↦ is frequently used to specify the mapping of one
particular element of the domain to an element in the codomain. Thus,
for example a ↦ g means that the particular element a is mapped to a
particular g. So it is better not to use ↦ instead of → in the notation
f : A → B, even though, as we will see later, → is also commonly used for
the truth-functional conditional in logical language.
Figure 1.9: A function: Elements of the domain are mapped to exactly one
element in the codomain. For example f (7) ≈ 4.7.
Secondly, case distinctions may be used when defining a function. For
example,

f (x) = 3 if x = 4, and f (x) = x otherwise

is a correct definition, although it might disturb someone’s sense of mathematical beauty – rightly so in this particular case, but in many applications
in logic and linguistics case distinctions are unavoidable.
Third, a special notation intrinsically tied to logic is that of λ-calculus,
which goes back to work by Alonzo Church. In this notation, (λ x.P x)
is a function that ‘takes an x’ and applies P to it. For example, assuming
that a is a constant for the same kind of entities as the variable x, (λ x.P x)a
reduces to Pa, which, when P is an ordinary predicate, might for example
Figure 1.10: A binary relation: Elements of the first set are mapped to one or
more elements of the second set. For example R(5, 1.5), R(5, 4.3), and R(5, 7.04).
yield 1 or 0 when being evaluated. We will see in chapter 4 how this works
in detail. For the time being, it suffices to be aware that λ x.P x informally
stands for a function that consumes entities of the sort of x and applies
whatever there is to apply to them in its body. To give another example,
(λ x.x + 2)3 would yield 3 + 2 via syntactic reduction rules, and interpreting +
and the numbers would result in 5. So in this case the meaning of (λ x.x + 2)
is the function f (x) = x + 2.
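Python’s `lambda` keyword is a direct descendant of Church’s notation, so these reduction steps can be tried out literally. A small sketch (illustrative):

```python
# (lambda x . x + 2) 3  reduces to  3 + 2, which evaluates to 5
f = lambda x: x + 2
assert f(3) == 5

# (lambda x . P x) a  reduces to  P a: application plugs in the argument
P = lambda x: x > 0          # a 'predicate' yielding a truth value
assert (lambda x: P(x))(3) == P(3) == True
```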
1.3.2 Properties of Functions
Common Properties of Functions.
In the following definitions f : A → B.
Figure 1.11: A function that is injective and not surjective.
Figure 1.12: A function that is surjective and not injective.
1. A function is surjective or onto if all members of the codomain are
mapped to from one or more members of the domain. See Figure 1.12.
∀ y ∈ B∃ x ∈ A[ f (x) = y]
2. A function is injective or one-to-one if no two distinct elements of the
domain are mapped to the same element in the codomain. See Figure
1.11.
∀ x, y ∈ A[( f (x) = f (y)) → x = y]
3. A function is bijective or one-to-one and onto if it is surjective and
injective. See Figure 1.13.
Figure 1.14 depicts a function that is neither surjective nor injective.9
1.3.3 Further Notions
9 Many thanks to Ana Sofia Rocha for having pointed out a mistake in an earlier version of
this picture.
Figure 1.13: A bijective function.
Figure 1.14: A function that is neither surjective nor injective.
Inverse Function. Let f : A → B be injective. The inverse function f −1 is
that function f −1 : B → A such that f −1 (b) = a if and only if f (a) = b.
Notice that f −1 is guaranteed to be a unique function, because f is injective. If
f were not injective, then the reverse mapping would only result in a relation.
If f is a bijection, then f −1 is also a bijection, hence the name bijection. Bear
in mind, though, that from the fact that a function is injective (or bijective)
alone nothing can be concluded about how easy it is to construct or compute the
inverse function – unless we are speaking about functions based on finite sets
whose members can be listed easily.
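For a finite injective function, the inverse is obtained simply by swapping the pairs. A Python sketch with a small made-up mapping (illustrative):

```python
f = {"a": 1, "b": 2, "c": 3}           # an injective function as a dict

assert len(set(f.values())) == len(f)  # injectivity: no value occurs twice
f_inv = {b: a for a, b in f.items()}   # swap every pair
assert f_inv == {1: "a", 2: "b", 3: "c"}
assert all(f_inv[f[x]] == x for x in f)   # f^-1(f(x)) = x for all x
```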
Characteristic Function / Indicator Function. A characteristic function
or indicator function is a function 1_A : B → {1, 0} (A ⊆ B) defined as

1_A(x) = 1 if x ∈ A, and 1_A(x) = 0 if x ∉ A.
In other words, the characteristic function of a set A yields true if its argument
is an element of A, false otherwise. We use 1, 0 for truth and falsity here and
will continue to do so in the subsequent sections.
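The characteristic function of a finite set is easy to realize as a higher-order function. A Python sketch (the name `indicator` is mine):

```python
def indicator(A):
    """Characteristic function 1_A: returns 1 iff the argument is in A."""
    return lambda x: 1 if x in A else 0

one_A = indicator({1, 0, -1})   # the set A from Exercise 19 b.
assert one_A(0) == 1
assert one_A(7) == 0
```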
1.3.4 Exercises
✐ Exercise 14 Determine whether the following functions are total, partial,
surjective, injective, or bijective respectively, where Dom is the domain and Cod
the codomain of the function:10
a. f (x) = x² (x ∈ N)
b. f (x) = 2x − 1 (x ∈ N)
c. f (x) = 1/x² (x ∈ N)
d. Dom = {a, b, c}, Cod = {1, 0}, f = {(a, 1), (b, 1), (c, 1)}
e. Dom = {a, b, c}, Cod = Dom, f = {(a, a), (c, c), (b, c)}
f. f : {a, b, c} → {a, b, c} s.t. f (x) = x.
g. f : {a, b, c} → {a, b, c, d } s.t. f = {(a, b), (b, c), (c, d)}
h. Dom = R, Cod = {1, 0}, f (x) = 1 if x > 2, and f (x) = 0 otherwise
i. Dom = {a, b, c, d } = Cod, f = 〈Dom, Cod, {〈a, c〉, 〈 b, c〉, 〈 c, d 〉}〉
✐ Exercise 15 Specify the inverse function if there is one:
a. f (x) = x/2
b. f (x) = x²
c. f (x) = x
d. f (x) = √x
e. f (x) = 1.0 if x < 0, 0.5 if x = 0, and 0 if x > 0
f. Cod = {Maria, Thomas, Peter}, Dom = {Ana, Klaus, Teresa},
f = {(Ana, Thomas), (Teresa, Peter), (Klaus, Maria)}
✐ Exercise 16 Determine which of the following relations is also a function
and specify the inverse function if there is one:
a. the relation between all Turkish proper names and their referents
b. the relation between all Portuguese sentences to their possible translations
into English
c. the relation between all passport owners to the number of their passport
10 Different notations are used deliberately.
d. the relation between all grammatically well-formed English sentences to
their meanings
e. the relation between all dog owners and their dogs
✐ Exercise 17 Let D = {Ana, Pedro, Mustafa, Joe, Lisa}, where everybody knows
himself, Ana knows Pedro and Mustafa, Pedro knows Ana, Mustafa, and Joe,
Mustafa knows Joe and Lisa, and Lisa knows everyone. Specify a function f (x, y)
with domain D × D that yields true if x knows y and false otherwise.
✐ Exercise 18 If f (x) = x then x is called a fixed point of f . Specify two
functions that have a fixed point and their fixed point. (Do not use Google to
look them up!)
✐ Exercise 19 Specify the characteristic function of the following sets:
a. the set of all native speakers of German
b. A = {1, 0, −1}
c. the set of any ordered pair of two persons x and y, where x says ‘Hi!’ to y
d. the set of all raining events
e. the set A = { x | x ∈ {1, 2, 3, 4, 5}} (A ⊂ B), where B = {1, 2, 3, 4, 5, 6, 7}
f. the union of A and B, where A = {a, b, c, d } and B = {a, b, d, f }
g. the set of ordered quintuples 〈 x, y, z1 , z2 , t〉 such that x buys y from z1 at price
z2 at time t
h. the set of all non-black ravens
✐ Exercise 20 Determine which of the following relations can also be regarded
as functions:
a. {〈a, a〉, 〈 b, b〉, 〈 c, a〉, 〈 d, 1〉}
b. {〈a, a〉, 〈 b, a〉, 〈 c, a〉, 〈 d, 1〉}
c. {〈a, a〉, 〈 b, b〉, 〈a, b〉, 〈 d, 1〉}
d. {〈 b, b〉, 〈a, a〉, 〈 d, 1〉, 〈 c, a〉, 〈 e, f 〉}
e. f −1 , where f (x) = x²/x (x > 0)
f. R = {〈 x, y〉 | x, y ∈ N and x > y}
g. to know someone
h. the relation between a sales item and its price
i. the relation between the number of letters printed in a book and the
physical thickness of the book
✐ Exercise 21 In natural language syntax, the notion of c-command in a syntactic derivation tree is defined as follows. A node A dominates a node B if
A is higher than B in the tree and you can draw a line from A to B by only
going downwards.11 Node A c-commands node B if and only if (i) A does not
dominate B, (ii) B does not dominate A, and (iii) every node that dominates A,
also dominates B. For example, in the following tree NP1 c-commands V and
NP2 :
[S [NP1 John] [VP [V takes] [NP2 [Det the] [N key]]]]
a. Can ‘x c-commands y’ be represented by a function?
b. Can ‘x is c-commanded by y’ be represented by a function?
1.4 Literature
There are many introductions to set theory and discrete mathematics. Just
about any text you find appealing will do. I have for example used the following
ones for self-study and preparations:
• Kenneth H. Rosen (2007). Discrete Mathematics and its Applications.
McGraw-Hill.
• Barbara H. Partee, Alice ter Meulen, and Robert E. Wall (1990). Mathematical Methods in Linguistics. Springer.
Generalized quantifiers were first investigated in detail from a formal point
of view by A. Mostowski (1957). On a generalization of quantifiers. Fundamenta Mathematicae, Vol.
44, pp. 12–36. Partee/ter Meulen/Wall provide a good and accessible overview.
Another accessible overview can be found in the following general introduction
to semantics:
• Irene Heim and Angelika Kratzer (1998). Semantics in Generative Grammar. Blackwell.
11 Giving a more precise definition is more complicated and best done recursively.
Frege’s articles are a must-read for anyone interested in meaning. For linguistics
his most important articles are:
• Gottlob Frege (1892). Über Sinn und Bedeutung. Zeitschrift für Philosophie und philosophische Kritik 100, 25–50.
• Gottlob Frege (1918–1919). Der Gedanke. Beiträge zur Philosophie des
deutschen Idealismus 2 (1918–1919), 58–77.
English translations are ubiquitous, see for example Peter Geach and Max Black
(1980). Translations from the Philosophical Writings of Gottlob Frege. Blackwell.
Chapter 2
Propositional Logic
In this chapter we take a look at propositional logic (often abbreviated PC for
‘propositional calculus’). In propositional logic there are no quantifiers: It only
deals with constants that stand for entire natural language sentences and the
ways these constants may be combined to form more complex expressions. For
example, let p and q stand for sentences; then we can express p and q by the
formula p ∧ q in the language of propositional logic. It is clear that propositional
logic alone is not of much use for linguistic theorizing. However, it is the basis
for more expressive logical languages in which predicates, relations, variables
and constants for objects, and ways to quantify over variables are available.
Everything that can be said about propositional logic will transfer neatly to
those more expressive languages.
2.1 Syntax of Propositional Logic
We start by defining a formal language, i.e. a set of strings that characterizes
all well-formed expressions of that language, and call this language PC. Later
on, the goal will be to approximate the syntax of the formal language as nearly
as possible to that of a natural language or at least provide an approximation
that allows us to obtain good semantic representations from an underlying
syntactic theory and the specification of a lexicon for a natural language. Since
propositional logic is only concerned with combinations of sentences,
without taking quantifiers into account, we will not even come close to this goal
in this section.
2.1.1 Basic Expressions
Propositional Constants. We use p, q, r and their indexed variants such as
p′ , q′′ , r ′′′ , p 1 , q 2 , and r 99 as propositional constants. Let L C be the set containing
them. We additionally assume that L C contains two special constants: ⊤, called
Verum, and ⊥, called Falsum.1
Connectives. Connectives are used to combine propositional constants into
more complex expressions. Let L B := {∨, ∧, →, ↔, ↑, ↓, ∨̇} be the set of binary
connectives for PC and L N := {¬} be the set of unary connectives, containing
only the negation symbol ¬. We will soon see what exactly these mean. For now,
here is a list of their common names and some alternative notation:
Conjunction. ∧, &, AND
Disjunction. ∨, OR, |
more precisely called inclusive disjunction, very rarely also called adjunction
Negation. ¬, ∼, −, NOT, NEG, p̄ (overlining the expression to be negated)
more precisely called truth-functional negation, very rarely also called
outer negation
Conditional. →, ⊃, ⇒, IMPLIES, IF . . . THEN
sometimes also called implication, material implication, rarely also called
subjunction
Biconditional. ↔, ≡, IFF, ⇔
sometimes also called equivalence, material equivalence, if and only if, iff.,
rarely also called bisubjunction, biimplication
Exclusive Disjunction. ∨̇, XOR
sometimes also called exclusive OR
Sheffer Stroke. ↑, NAND, | and similar symbols
Peirce Stroke. ↓, NOR, †, ! and similar symbols
sometimes also called Quine dagger or referred to as one of the Sheffer
strokes
Parentheses are used for making scope distinctions; to enhance readability we
will allow [ and ] as notational variants of ( and ).
1 The symbols ⊤ and ⊥ are sometimes used for other purposes. Don’t bother too much with
the choice of symbols.
✧ Remark 3 (Arrow or horseshoe?) The conditional is of particular importance in logic and many excellent logicians use the horseshoe ⊃ instead
of the right arrow for the conditional. In a sense this notation is a bit
misleading; there is a close connection between the conditional → and the
subset relation ⊆ of set theory, and so the horseshoe points in the
wrong direction. On the other hand, the horseshoe brings luck, and so the
final word has not yet been spoken on this issue of utmost importance.
L History 4 (A Well-Known Anecdote) This is not really related to the
use of ‘⊃’ in logic. A visitor is said to once have asked Niels Bohr, the famous
physicist, whether he really believed the horseshoe above the door of his
house would bring him luck. Bohr replied: “Of course not. . . but I am told it
works even if you don’t believe in it.”2
2.1.2 Well-Formed Formulas
Given the basic entities defined in the previous section, the syntax of our formal
language may be defined in various ways. A very common one is by giving a
recursive definition like the following one.
(Syn1) Well-Formed Formula.
defined as follows:
The set L S of well-formed formulas of PC is
1. If φ ∈ L C then φ ∈ L S .
2. If φ ∈ L S and ν ∈ L N , then νφ ∈ L S .
3. If φ, ψ ∈ L S and ◦ ∈ L B then (φ ◦ ψ) ∈ L S .
4. Nothing else is in L S .
Let’s abbreviate well-formed formula as wff and go through the definition. The
Greek letters φ and ψ are used as meta-variables for arbitrary formulas,
ν stands for a unary junctor – which in our case is only ¬ – and ◦ stands for
any of the binary junctors such as ∧, ∨, or →. The definition first states that
any propositional constant alone is a wff. Then it states that when we put the
2 It is disputed whether this dictum can really be attributed to Niels Bohr; it might in fact
have been made by one of his neighbors and he liked the reply.
negation sign ¬ in front of a wff, the result will be a wff and that any two wffs can
be combined by an infix binary junctor and when wrapped into parentheses will
result in a wff. So the definition defines for example the following strings as wffs:
p, r 78 , (p → q), ((p ∧ q) → p), ¬(¬ p ∧ ¬ q), (((p ∧ q) ∨ (¬ p ∧ ¬ q)) ↔ (p ↔ q)). These
are just a few examples and it is easy to see that |L S | = ℵ0 . We can enumerate
all wffs by starting with the simplest ones and combine them systematically to
produce more complex wffs but there is no finite limit to the size of a wff; the
above definition produces countably infinitely many wffs.3
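Definition (Syn1) translates directly into a recursive well-formedness test. The sketch below is my own simplification (only the constants p, q, r, and without the dotted exclusive-or symbol); it scans for a binary junctor at parenthesis depth 0:

```python
BINARY = {"∧", "∨", "→", "↔", "↑", "↓"}

def is_wff(s):
    """Check well-formedness per (Syn1), for constants p, q, r (a sketch)."""
    if s in ("p", "q", "r"):
        return True                       # clause 1: a constant is a wff
    if s.startswith("¬"):
        return is_wff(s[1:])              # clause 2: negation of a wff
    if s.startswith("(") and s.endswith(")"):
        body = s[1:-1]
        depth = 0
        for i, c in enumerate(body):      # clause 3: find the junctor at depth 0
            if c == "(":
                depth += 1
            elif c == ")":
                depth -= 1
            elif c in BINARY and depth == 0:
                if is_wff(body[:i]) and is_wff(body[i+1:]):
                    return True
        return False
    return False

assert is_wff("(p→q)")
assert is_wff("¬(¬p∧¬q)")
assert not is_wff("p∧q")      # outer parentheses are required by (Syn1)
assert not is_wff("(p→)")
```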
It is common to leave out outer parentheses to make wffs more readable
and we will do that, too. So instead of (p ∧ q) we may write p ∧ q and instead
of ((p ∧ q) → r) we may write (p ∧ q) → r. It is also common to define an order
of precedence for the connectives, often one according to which ∧ and ∨ bind
more strongly than → and ↔. According to such conventions, it might for example
be allowed to write p ∧ q → r for (p ∧ q) → r. This frequently confuses beginners
and so we will not adopt precedence rules in this chapter. If you encounter a text
with precedence rules and are in doubt about the right grouping of the symbols,
apply the following general rule:
In case of doubt, refer to the definition!
Although full precedence rules are not used in this chapter, we will allow a
notational convention in order to make formulas more readable. When we have
several subsequent applications of conjunction we may leave out parentheses
except the outermost if they are necessary. For example, instead of (p ∧ (q ∧ r)) →
p we may write (p ∧ q ∧ r) → p. This is admissible because, as we shall soon be
able to prove, (φ1 ∧ (φ2 ∧ φ3 )) ↔ ((φ1 ∧ φ2 ) ∧ φ3 ). Not only that, as it will become
apparent when a precise semantics for ∧ has been given, in any sequence of
conjunctions the arguments can be permuted arbitrarily without having any
effect on the truth value of the formula as a whole. Since so far only syntax void
of any meaning is available these claims have to be taken for granted for the
time being.4
As you probably know or might easily imagine, there are many other ways
to define the syntax of a formal language. Because of its importance one more
method deserves mentioning. Some logicians and many computer scientists
sometimes use a very compact notation for defining wffs that looks like this:
S := (φ ∧ ψ) | (φ ∨ ψ) | (φ → ψ) | (φ ↔ ψ) | ¬φ
or, slightly more correctly,
3 A diligent logician would prove this claim by induction on the size of formulas, but this level
of detail is beyond the scope of this introductory course.
4 When asked about some issue early during his talk, a logician whose name I cannot
remember once answered at a conference: “I cannot answer this question, because I have only
introduced the syntax by now.” (It was a joke.)
S := (S 1 ∧ S 2 ) | (S 1 ∨ S 2 ) | (S 1 → S 2 ) | (S 1 ↔ S 2 ) | ¬S
These are essentially just concise variants of the definition given above.
Main Junctor. The main junctor or connective in a formula is the occurrence
of a connective at the topmost priority level of that formula. For example, → is
the main junctor in p → (q ∧ r) and ↑ is the main junctor in (p ↓ q) ↑ (p ↓ q). Of
course, a formula can contain more than one occurrence of a connective. Intuitively it is not very hard to understand what is meant by ‘topmost priority level.’
If we want to make the notion more precise, it is best to define it simultaneously
with the definition of a wff:
(Syn2) Well-formed Formula and Main Junctor.
1. If φ ∈ L C then φ ∈ L S and φ has no main junctor.
2. If φ ∈ L S and ν ∈ L N , then νφ ∈ L S and the main junctor of νφ is ν.
3. If φ, ψ ∈ L S and ◦ ∈ L B then (φ ◦ ψ) ∈ L S and the main junctor of (φ ◦ ψ) is ◦.
4. Nothing else is in L S .
The idea here is that previous determinations of the main junctor are overruled
by the very last one. So for example, p and q have no main junctor, the main
junctor of p ∧ q is ∧, the main junctor of p ∨ q is ∨, and finally the main junctor
of (p ∧ q) → (p ∨ q) is →.
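The notion ‘topmost priority level’ can be made operational by scanning a formula at parenthesis depth 0. A Python sketch (illustrative; it assumes its input is well-formed, with outer parentheses possibly omitted):

```python
BINARY = {"∧", "∨", "→", "↔", "↑", "↓"}

def main_junctor(s):
    """Main junctor of a wff; None for a bare propositional constant."""
    depth = 0
    for c in s:
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
        elif depth == 0 and c in BINARY:
            return c                      # binary junctor at top level
    if s.startswith("¬"):
        return "¬"                        # negation is the last step applied
    if s.startswith("(") and s.endswith(")"):
        return main_junctor(s[1:-1])      # strip redundant outer parentheses
    return None

assert main_junctor("(p∧q)→(p∨q)") == "→"
assert main_junctor("¬(p↔q)") == "¬"
assert main_junctor("q") is None
assert main_junctor("((p∧q)∨(r∧(q∨r)))") == "∨"
```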
L History 5 (Polish and Reverse Polish Notation) Before fancy typesetting was available, authors often used more compact notation without any parentheses. The famous Polish logician Jan Łukasiewicz introduced
the Polish notation depicted in table 2.1. For example, the propositional formula
(p ∧ q) ↔ ¬(¬ p ∨ ¬ q) for one of DeMorgan’s laws is written EKpqNANpNq
in Polish notation.
In reverse Polish notation, the arguments are written first and then the
functor last. So for instance (p ∧ q) is written pqK. This notation is still
used for some scientific calculators and programming languages like Forth.
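The fixed arity of each functor is what makes parenthesis-free notation unambiguous. As a hypothetical illustration (not part of the text), Polish notation can be parsed with a single recursive left-to-right pass:

```python
# Sketch: each capital letter is a connective with a fixed arity,
# lower-case letters are propositional constants, so a Polish-notation
# string can be parsed without any parentheses.

ARITY = {"N": 1, "K": 2, "A": 2, "C": 2, "E": 2, "D": 2}

def parse_polish(s, i=0):
    """Parse s[i:]; return (nested-tuple formula, next index)."""
    c = s[i]
    if c not in ARITY:                 # lower-case letter: constant
        return c, i + 1
    args, j = [], i + 1
    for _ in range(ARITY[c]):          # read exactly arity-many arguments
        arg, j = parse_polish(s, j)
        args.append(arg)
    return (c, *args), j

tree, _ = parse_polish("EKpqNANpNq")   # (p ∧ q) ↔ ¬(¬p ∨ ¬q)
print(tree)  # ('E', ('K', 'p', 'q'), ('N', ('A', ('N', 'p'), ('N', 'q'))))
```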
2.1.3 Exercises
✐ Exercise 22 Determine which of the following formulas are well-formed
according to definition S1 and which aren’t. (Outer parentheses may have been
left out.)
Common Notation    Polish Notation
¬φ                 Nφ
φ ∧ ψ              Kφψ
φ ∨ ψ              Aφψ
φ → ψ              Cφψ
φ ↔ ψ              Eφψ
φ ↑ ψ              Dφψ

Table 2.1: Polish notation for propositional logic.
a. ((p → q) ∧ (r ↔ q)) → p
b. p ∧ q ∨ r
c. ((p ↔ q)(r ∧ p))
d. ((¬ p) ∧ (q → r 2 ))
e. (a ∧ (b ∨ c))
f. (P(x) ↔ Q(x)
g. p → (q → (p → q))
h. p ↑ (q ∨̇ ¬(r 3 ↓ r 3 ))
i. (p ∧ (q → (q ↔ p))))
j. (((p ∨ q) ∧ r) ∨ q 2 ) ↔ p′
k. p ↔ p
l. (q¬ ↔ (p ∨ r))
✐ Exercise 23 Underline the main junctor of the following wffs if there is one:
a. ((p ∧ q) ∨ (r ∧ (q ∨ r)))
b. q
c. ¬(p ↔ q)
d. (p ∧ q) ↑ (¬ p ∧ ¬ q)
e. (p ∧ q) → p
f. (((p → q) ∨ (q → r)) → ((p ∨ ¬ q) ∨ (¬ p ∨ q)))
g. ¬(¬ p ∨ ¬ q) ↔ (p ∧ q)
h. ¬ p
2.2 Semantics of Propositional Logic
So far we have only defined a way of notating formulas of a concise formal language. These are supposed to stand for sentences, sentence negation, and combinations of sentences. So we now have to define how well-formed formulas, i.e. sentences of our formal language PC, are to be interpreted. We will now specify the semantics of PC by giving rules that assign a meaning to each of the infinitely many well-formed formulas.
A crucial thing to note about this semantics is that it is based on a strong
but powerful idealization: The meaning of a complete sentence, here simply
represented by a propositional constant, is considered to be either truth or falsity
and nothing else. This might sound inadequate at first glance, but we will see
that it is amazingly powerful. In any case, propositional logic will turn out to
be the basis of many more powerful semantic representations. We begin by
specifying a model for PC and subsequently define what it means for a formula
to be true in a model.
2.2.1 Models and Truth in a Model
Model. A model M for propositional logic consists of an interpretation function
I : L C → {1, 0} for propositional constants.
Truth in a Model. We define truth in a model, writing M ⊨ φ for the fact that
a well-formed formula φ is true in model M. (And we write M ⊭ φ for the fact
that it is not the case that M ⊨ φ.) ⊨ is defined recursively as follows:

• M ⊨ φ iff. I(φ) = 1 (for φ ∈ L C ).
• M ⊨ ¬φ iff. M ⊭ φ.
• M ⊨ (φ ∧ ψ) iff. M ⊨ φ and M ⊨ ψ.
• M ⊨ (φ ∨ ψ) iff. M ⊨ φ or M ⊨ ψ (or both).
• M ⊨ (φ ∨̇ ψ) iff. either M ⊨ φ or M ⊨ ψ (but not both).
• M ⊨ (φ → ψ) iff. M ⊭ φ or M ⊨ ψ.
• M ⊨ (φ ↔ ψ) iff. either M ⊨ φ and M ⊨ ψ, or both M ⊭ φ and M ⊭ ψ.
• M ⊨ (φ ↑ ψ) iff. it is not the case that: M ⊨ φ and M ⊨ ψ.
• M ⊨ (φ ↓ ψ) iff. it is not the case that: M ⊨ φ or M ⊨ ψ.
• M ⊨ ⊤ always holds.
• M ⊨ ⊥ never holds.
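The recursive clauses above can be turned almost line by line into a small evaluator. The following sketch is not part of the text; the tuple encoding of formulas (e.g. ("and", "p", "q") for (p ∧ q)) is an assumption made for illustration:

```python
# A minimal sketch of the truth definition: I maps propositional
# constants to 1 or 0, and `true_in` follows the recursive clauses
# for ⊨ one connective at a time.

def true_in(I, f):
    if isinstance(f, str):                      # propositional constant
        return I[f] == 1
    op = f[0]
    if op == "not":  return not true_in(I, f[1])
    if op == "and":  return true_in(I, f[1]) and true_in(I, f[2])
    if op == "or":   return true_in(I, f[1]) or true_in(I, f[2])
    if op == "xor":  return true_in(I, f[1]) != true_in(I, f[2])
    if op == "imp":  return (not true_in(I, f[1])) or true_in(I, f[2])
    if op == "iff":  return true_in(I, f[1]) == true_in(I, f[2])
    if op == "nand": return not (true_in(I, f[1]) and true_in(I, f[2]))
    if op == "nor":  return not (true_in(I, f[1]) or true_in(I, f[2]))
    raise ValueError(f"unknown connective: {op}")

I = {"p": 1, "q": 0}
print(true_in(I, ("imp", "q", "p")))   # True: q → p holds when q is false
```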
☞ Note 5 (Object versus Meta-Language) The above denotational semantics provides an interpretation for the well-formed formulas that have been defined in the previous section. When talking about logical systems the object-language has to be distinguished from the meta-language. In the case of PC the object language is the set of well-formed formulas L S defined in the previous section, or, if we take a look at the formal language plus its interpretation, L S in combination with the above denotational semantics. The meta-language, on the other hand, is English with a bit of mathematical vocabulary. It is also possible to define a formal language on the basis of another formal language.
2.2.2 Truth Tables
According to the above definition of truth in a model all connectives are interpreted as truth-functions taking well-formed formulas as arguments and yielding a truth-value. Instead of the above semantics we can also use truth tables to specify these functions. For example, truth-functional negation flips the truth-value:
Negation
φ   ¬φ
1    0
0    1
Here are the tables for the binary connectives:
Conjunction      Disjunction      Conditional
φ ψ  φ ∧ ψ       φ ψ  φ ∨ ψ       φ ψ  φ → ψ
1 1    1         1 1    1         1 1    1
1 0    0         1 0    1         1 0    0
0 1    0         0 1    1         0 1    1
0 0    0         0 0    0         0 0    1

Biconditional    Exclusive Disjunction   Sheffer Stroke
φ ψ  φ ↔ ψ       φ ψ  φ ∨̇ ψ              φ ψ  φ ↑ ψ
1 1    1         1 1    0               1 1    0
1 0    0         1 0    1               1 0    1
0 1    0         0 1    1               0 1    1
0 0    1         0 0    0               0 0    1
Peirce Stroke
φ ψ  φ ↓ ψ
1 1    0
1 0    0
0 1    0
0 0    1
If you study these tables carefully, you’ll likely come to the conclusion that they
do not specify all possible truth functions.5 It is not hard to see that there are
2² = 4 unary truth-functions, among which negation is just one. The others are
not very interesting, though. For example, the identity function maps 1 to 1
and 0 to 0. The other two just yield false or true respectively, no matter what
input they get. Likewise, there are 2⁴ = 16 binary truth-functions, but the most
interesting ones are displayed above.
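The counting argument can be replayed mechanically. This sketch (an illustration, not part of the text) enumerates all binary truth-functions as columns of four output values:

```python
# A binary truth function assigns 0 or 1 to each of the four input
# pairs, so there are 2**4 = 16 of them. We enumerate them as 4-tuples
# of output values for the inputs (1,1), (1,0), (0,1), (0,0).
from itertools import product

pairs = list(product([1, 0], repeat=2))          # (1,1), (1,0), (0,1), (0,0)
functions = list(product([1, 0], repeat=4))      # one output per input pair
print(len(functions))                            # 16

# Example: conjunction is the column <1, 0, 0, 0>
conj = tuple(p & q for (p, q) in pairs)
print(conj in functions)                         # True
```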
☞ Note 6 (All Truth Functions) Here is a list of all 16 binary truth functions:
p q   ①  ∨  ②  →  ↑  ③  ④  ⑤  ⑥  ↔  ∨̇  ↓  ⑦  ⑧  ∧  ⑨
1 1   1  1  1  1  0  1  1  0  0  1  0  0  0  0  1  0
1 0   1  1  1  0  1  1  0  1  0  0  1  0  0  1  0  0
0 1   1  1  0  1  1  0  1  0  1  0  1  0  1  0  0  0
0 0   1  0  1  1  1  0  0  1  1  1  0  1  0  0  0  0
The common functions are really just ∧, ∨, →, and ↔. The functions
marked with circled numbers are often not named. If you absolutely can’t
live without giving them names, here are some suggestions:
① Verum function; ② converse conditional; ③ projection of first argument; ④
projection of second argument; ⑤ negation of second argument; ⑥ negation
of first argument; ⑦ converse nonconditional; ⑧ nonconditional; ⑨ Falsum
function.
Don’t use these names! Everybody will understand the truth table, but not
many people will associate anything useful with the names for ①-⑨.
Notice that according to the definition of truth in a model and the above tables the specific semantic content or meaning of a sentence is disregarded and only its truth value in a model is taken into account, by using the valuation function to interpret the corresponding propositional constant. Take D t = {1, 0} as the set of truth-values. The interpretation of a unary connective is a function f : D t → D t and the interpretation of a binary connective is a function f : (D t × D t ) → D t . Hence, it is correct to say that unary connectives are interpreted as unary truth-functions from truth-values to truth-values and binary connectives are interpreted as binary functions from the set of all ordered pairs of truth-values to truth-values. However, the distinction between the syntactic entity, i.e. a connective, and its interpretation is often mixed up, and in a more loose way of talking it is quite common to speak about truth-functions both for the connectives as syntactic entities and for their interpretations.

5 Or, you don’t study them carefully and instead skim through Wikipedia. However, from just looking things up you do not learn how to think and solve problems on your own.
As it turns out on closer inspection, the truth-functions are mostly interdefinable. Take for example the conditional p → q, which is only false if the antecedent p is true and the succedent q is false, and compare it with the following expression: ¬ p ∨ q. They have exactly the same truth-values, irrespective of what the actual values of p and q are. We prove this by constructing a table that takes into account all 4 possible combinations of values of p and q:
φ → ψ      ¬ φ ∨ ψ
1 1 1      0 1 1 1
1 0 0      0 1 0 0
0 1 1      1 0 1 1
0 1 0      1 0 1 0
As you can see the main junctors of both formulas have exactly the same
entry 〈1, 0, 1, 1〉. So the truth conditions of these formulas are exactly the
same. This proves that → can be defined in terms of ¬ and ∨ by the purely
syntactic abbreviation scheme (φ → ψ) := (¬φ ∨ ψ). What about interdefinability
in general? We capture the interdefinability of connectives in propositional logic
by the notion of a base for propositional logic.
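The equivalence just proved can also be checked by brute force. The encoding below (0/1 for truth values, `max` for ∨) is an assumption of this sketch, not part of the text:

```python
# For every valuation, p → q and ¬p ∨ q receive the same value,
# so their truth-table columns coincide.
from itertools import product

def row_values(p, q):
    imp = 0 if (p == 1 and q == 0) else 1   # truth table of →
    neg_or = max(1 - p, q)                  # ¬p ∨ q on {0, 1}
    return imp, neg_or

same = all(a == b for a, b in (row_values(p, q)
                               for p, q in product([1, 0], repeat=2)))
print(same)  # True: both columns are <1, 0, 1, 1>
```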
Base. A set S of unary or binary connectives is a base for classical 2-valued propositional logic if and only if all other unary and binary connectives can be defined in terms of the ones in S by a syntactic abbreviation scheme.
It is not hard, but also not trivial, to show in detail exactly which combinations of connectives form a base and which don’t. Of course, it is very simple to show of a given set of junctors S that it is indeed a base: just define all other connectives in terms of the ones in S and prove that the definition is correct by using the method of truth tables illustrated above. One clear criterion for a base is that we can express negation in it. So for example, the set {∧, ↔} is not a base, because we cannot express negation solely in terms of conjunction and biconditional. In contrast to this, {¬, →} is a base.
☞ Note 7 (Sheffer Strokes) Does every base have to contain at least two connectives? As it happens, the answer is no. Sometimes students are surprised to hear or find out on their own that the Sheffer and Peirce strokes are bases on their own. Let us take a look why this is so in the case of the Sheffer stroke ↑. (The corresponding line of reasoning for the Peirce stroke is quite similar.) Notice that the truth-table of ↑ is like the one for conjunction except that all result values have been ‘flipped.’ Hence φ ↑ ψ should be equivalent to ¬(φ ∧ ψ), and indeed it is. You can check this by using the method of truth-tables. Now let’s try to define negation. Obviously, negation is a unary truth-function and so it only takes one argument. To simulate this, we assume only one propositional constant in whatever definition we come up with. Let’s try p ↑ p. The truth-table for this is:
φ ↑ φ
1 0 1
0 1 0
That is indeed negation. If you’re not yet convinced, take a look at the following, more verbose table that proves that the formula ¬ p ↔ (p ↑ p) is true for any value of p:
p    ¬ p ↔ (p ↑ p)
1    0 1 1  1 0 1
0    1 0 1  0 1 0
To show that the dyadic truth-functions are definable by ↑, larger tables with two propositional constants and all combinations of truth values between them have to be constructed.
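Those tables can be generated mechanically. In the following sketch (an illustration, not part of the text) NAND(a, b) = 1 − ab plays the role of ↑:

```python
# Verify Sheffer-stroke definitions exhaustively:
#   ¬p      =  p ↑ p
#   p ∧ q   =  (p ↑ q) ↑ (p ↑ q)
#   p ∨ q   =  (p ↑ p) ↑ (q ↑ q)
from itertools import product

def nand(a, b):
    return 1 - a * b

for p in (1, 0):
    assert nand(p, p) == 1 - p                        # negation
for p, q in product((1, 0), repeat=2):
    assert nand(nand(p, q), nand(p, q)) == p * q      # conjunction
    assert nand(nand(p, p), nand(q, q)) == max(p, q)  # disjunction
print("↑ defines ¬, ∧ and ∨")
```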
Once it has been shown that a set of connectives S 1 is a base, then in order to show that another combination S 2 of connectives is a base it suffices to show that all the connectives in S 1 can be defined in terms of those in S 2 . Exhaustively proving of a set of connectives (possibly just containing ↑ or ↓) that it is a base is easy but tedious; we omit the proof here and merely point out that {↑}, {↓}, {¬, ∧}, {¬, ∨}, and {¬, →} are common bases. That is also the reason why the syntax and semantics of propositional logic and systems extending it often just provide syntactic and semantic rules for a base like {¬, ∧}, and all the other connectives are defined as syntactic abbreviations.
2.2.3 Important Notions
Validity, Tautology. A wff φ is valid if and only if it is true in every model. It is common to omit reference to any particular model and write ⊨ φ to express the fact that φ is valid. A valid wff is also called a tautology.
Satisfiability. A formula φ is satisfiable if and only if there is a model M such that M ⊨ φ.
This definition might look weird at first glance, because it relies so much on the
notion of a model. The idea is, however, simply that a satisfiable formula that
is not valid may be true or false depending on the state of the world, which is
represented by the valuation function of the model. Compare this with a valid
formula. Models represent arbitrary interpretations of our language and in a
sense define the ‘logical space’, what is logically possible, with respect to what
can be expressed in the language. If φ is true in all models this also means that
there is no model in which φ is false. Consequently, it is not (logically) possible
for φ to be false. In other words, a valid formula is true for purely logical reasons.
In contrast to this, when a wff that is satisfiable and not valid turns out to be
true in some particular model, then it is true because of the state of affairs of
the world represented by that model. A wff that is satisfiable but not valid is
sometimes called contingent.
For example, consider p to stand for ‘O Pedro ama a Maria’ (‘Pedro loves Maria’) and q to stand for ‘A Maria gosta do Pedro’ (‘Maria likes Pedro’). How do we find out whether p and q are true or false respectively? We could, for example, ask Maria and Pedro and build a model accordingly. Suppose they both agree on the respective statement. Then in our model M, I(p) = 1 and I(q) = 1. Hence, M ⊨ p ∧ q. Clearly p ∧ q is not valid, though, as we can easily build a model M′ according to which for example I(q) = 0. Contrast this with the statement ‘O Pedro ama a Maria ou o Pedro não ama a Maria’ (‘Pedro loves Maria or Pedro does not love Maria’). Understood literally, this statement can be translated to p ∨ ¬ p and is clearly valid, as the following truth table proves:
p ∨ ¬ p
1 1 0 1
0 1 1 0
✧ Remark 4 (Dependence on Analysis) The formula p ∨ ¬ p is valid in propositional logic, where only whole sentences and their truth values, and no quantifiers, are taken into account. According to a more fine-grained reading of the sentence ‘O Pedro ama a Maria ou o Pedro não ama a Maria’ one could argue that the respective translation of this sentence into a logical language ought not to be valid, because there does not seem to be any rule about Portuguese proper names that warrants that a person named ‘Pedro’ or a person named ‘Maria’ actually exists; after all, whether the bearer of a proper name exists or not can hardly be a matter of logic alone. This is indeed a good point about proper names and has been discussed extensively in the literature on empty names and existence presuppositions. There is no way to express this view in the relatively impoverished language of propositional logic, though.
Care should be taken not to prematurely reject a theoretical tool just because it is not fully adequate for describing a certain phenomenon. As long as it can be used to adequately describe other phenomena it might have its place in the theoretical apparatus of a scientist. Ultimately it is a virtue of any good theory to be restricted to a particular problem domain and to ignore other phenomena. Take for example classical mechanics. In classical, Newtonian mechanics bodies are sometimes regarded as points in space, whereas we all know that any real object is spatially extended. Nevertheless, nobody would reject mechanical theories just because they don’t take into account the extendedness of physical bodies. In other words, bear in mind the following deep wisdom:

Don’t expect a hammer to be useful for drilling holes!
Contradiction. A wff is a contradiction if it is not satisfiable, i.e. if it is true in no model. For example, p ∧ ¬ p is a contradiction, as the following truth table proves:
p ∧ ¬ p
1 0 0 1
0 0 1 0
Consistency. Two wffs are consistent with each other if they are jointly satisfiable, i.e. if there is a model in which both formulas are true. In other words, two wffs are consistent if they are not contradictory to each other. Likewise, a set of wffs is consistent if there is a model in which all of them are true, or, put in other terms, if the conjunction of these wffs is not a contradiction.

Notice that the term ‘coherence’ is sometimes used in a similar sense as consistency, but has a much broader meaning. For example you could claim that a sentence like ‘If the moon is made of green cheese then 1 equals 1’ is consistent, but that there is no coherent connection between the antecedent and the succedent of ‘If. . . then’ in that phrase.
Equivalence Between Formulas. Two wffs are equivalent if and only if they are satisfied by exactly the same models. In other words, two formulas φ and ψ are equivalent if and only if for all models M, M ⊨ φ iff M ⊨ ψ. Despite some obvious similarities, this way of talking about equivalence between formulas in a logical language needs to be kept apart from the general notion of an equivalence relation. (See note 8 for more details.)

It is useful to memorize the following rules about the connection between satisfiability, validity, and contradictions:

• A valid wff is satisfiable (indeed, it is true in every model).
• The negation of a valid wff is a contradiction.
• The negation of a contradiction is valid, i.e. a tautology.
• The negation of a tautology is a contradiction.
☞ Note 8 (Abbreviation vs. Equivalence vs. Biconditional) Recall
our use of the notation := to indicate syntactic abbreviations. Now that
a formal language is available with a syntax and its interpretation in a
model this is clearly distinct from similar and closely related symbols like =
or ↔. The notation := really just indicates a syntactic abbreviation that
allows you to replace the string on the left-hand side in a formula by the
string on the right-hand side, as long as the syntax of the language and
the parentheses are respected. For example, if φ → ψ := ¬φ ∨ ψ, the wff
(p → q) → p must be rewritten as ¬(¬ p ∨ q) ∨ p no matter what →, ¬, or ∨
actually mean.
In contrast to this, a = b stands for identity, meaning that the constants
‘a’ and ‘b’ denote the same object.6
Yet in contrast to this, p ↔ q is a truth-function that yields 1 if either
the value of both p and q is 1 or both have the value 0. Bear in mind that
↔ takes propositional constants and not constants or variables for objects,
whereas the arguments of = are objects in the very broad sense.
Notice the close relation between all of these notions, though. We could have defined M ⊨ φ ↔ ψ as holding if I(ψ) = I(φ), and as not holding otherwise. Moreover, a formula defined by syntactic abbreviation just is the right-hand side and thus only gets its meaning from the meaning of the right-hand side. But what about equivalence? Recall from the last chapter that any symmetric, reflexive, and transitive relation is an equivalence relation. So when we say a and b are equivalent it always means equivalent with respect to a particular equivalence criterion. Now identity is an equivalence relation between objects based on the criterion that they are really one and the same object, i.e. {〈 x, x〉 | x ∈ D } for a given domain D of objects, and the biconditional is an equivalence relation based on the criterion that the formulas on each of its sides have identical truth values. For this reason, it is also common to call the truth-function ↔ equivalence or speak of equivalence between formulas when both of them are satisfied in exactly the same models.
Theory, Intended Model. A set of sentences (wffs) of a logical language is often called a theory. Usually this is meant in such a way that the wffs in the set are interpreted in intended models; these are the models in which all the non-logical constants have their intended meaning, as it is characterized by paraphrases. Of course, regarding a set of formulas as a theory is an idealization.
The Method of Truth Tables. It is useful to memorize the following rules for the use of truth tables:
• If the column of the main junctor of a complete truth table has the value 1 for every combination of values of the propositional constants, then the whole wff is valid.
• If the column of the main junctor of a complete truth table has the value 0 for every combination of values of the propositional constants, then the whole wff is a contradiction.
• If the column of the main junctor of a complete truth table contains both 0’s and 1’s, then the wff is satisfiable but not valid.
• If the columns of the main junctors of two complete truth tables for two formulas φ and ψ coincide, then φ and ψ are equivalent.
2.3 Proof Theory
In this section we will look at a proof theory for propositional logic. Proof theories are more or less mechanical methods for testing whether a wff is valid or not. We have already seen the method of truth tables, which indeed presents a limited form of a proof theory, but truth tables cannot be regarded as a practical proof theory in general. Why not? The tables we have seen so far were based on wffs with two distinct propositional constants and had four rows. A table for 3 propositional constants has 8 rows. To construct it, take a table for two propositional constants, duplicate the rows, and add the column 〈1, 1, 1, 1, 0, 0, 0, 0〉 for the third propositional constant. In general, a table for n propositional constants has 2ⁿ rows. Thus, a table for four constants has 16 rows, and a table for five constants has 32 rows. That’s about the maximum number of rows that is practical without the aid of a machine. Since the growth in size is exponential, even fast computers soon get to their limits. Take for example a truth table for a formula with 32 propositional constants. This has 2³² = 4294967296 rows; I’m not sure whether my computer can handle a table of this size efficiently. If you take 265 propositional constants you will get 2²⁶⁵ rows, which is roughly the same as the number of atoms in the universe.7

6 It is not that simple. In ‘Über Sinn und Bedeutung’ Gottlob Frege considers this so-called meta-linguistic view of identity counter-intuitive, because symbols are arbitrary, and instead suggested that the senses (or, in another parlance, meanings) of a and b determine the same object. Which notion of identity is the right one is still a controversial issue in philosophy and the difference in opinion matters a lot for linguistics, because Neo-Fregean views about meaning have given rise to intensional semantics. I’ll leave this issue open for now.
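A brute-force validity test is easy to write, and it makes the 2ⁿ blow-up concrete: the loop below visits every row of the truth table. Representing formulas as Python functions from interpretations to truth values is an assumption of this sketch, not part of the text:

```python
# Truth-table validity check: a formula is valid iff it comes out
# true in all 2**n valuations of its n propositional constants.
from itertools import product

def valid(constants, f):
    """f maps an interpretation dict (constant -> 1/0) to True/False."""
    return all(f(dict(zip(constants, row)))
               for row in product([1, 0], repeat=len(constants)))

# p ∨ ¬p is valid; p → q is not:
print(valid(["p"], lambda I: I["p"] == 1 or not I["p"] == 1))   # True
print(valid(["p", "q"], lambda I: I["p"] == 0 or I["q"] == 1))  # False
```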
Fortunately there are other methods for testing for tautologies, and these methods can also be used for logical systems for which there is no practical method of truth tables. With a lot of additional trickery by computer scientists some proof theories can also be implemented efficiently. Common strands of proof theories are:
• Axiom Systems
• Sequent Calculi
• Natural Deduction
• Semantic Tableaux
All of them are similar to each other and have their advantages and disadvantages depending on what they are used for. For example, axiom systems are
often used for capturing informal theories in a formal setting in a first step
of formalizing the theory by means of logical tools. Sequent calculi are often
used for proving important metatheorems about new logical systems. Natural
deduction rules to some extent reflect the way people reason logically (insofar as
they do so) and can be regarded as more accessible versions of sequent calculi.
Semantic tableaux are easy to use and often provide the basis for the implementation of automatic theorem provers. We will learn how to use semantic
tableaux. Once you have mastered semantic tableaux, you will have the skills
necessary to understand other proof theories even if you do not appreciate their
complexity.
Theorem. A wff that has been proved using proof theoretic methods is called
a theorem.8
7 According to the Wolfram search engine, which gives the number 10⁸⁰ as an estimate.
8 Used in a more general sense, the term ‘theorem’ is also often used for any interesting proposition that is provably true and of a certain importance. Whether a valid formula in some interpreted logical language or a theorem in general is meant is usually clear from the context, but in case of doubt theorems about a logical language should be called metatheorems. When the terms are not used for logical languages in particular, a less interesting theorem is usually called lemma, corollary, or just proposition.

φ ∧ ψ        ¬(φ ∨ ψ)      ¬¬φ       ¬(φ → ψ)
  φ             ¬φ           φ           φ
  ψ             ¬ψ                      ¬ψ

φ ∨ ψ        ¬(φ ∧ ψ)      φ → ψ
φ  |  ψ      ¬φ  |  ¬ψ     ¬φ  |  ψ

φ ↔ ψ        ¬(φ ↔ ψ)
φ  |  ¬φ      φ  |  ¬φ
ψ  |  ¬ψ     ¬ψ  |   ψ

Table 2.2: Tableaux rules for propositional logic.

2.3.1 Tableaux Rules for Propositional Logic

A semantic tableau is an at most binary branching tree containing formulas at each node. Table 2.2 lists the rules for constructing such trees given some initial wff. The tableaux rules are very easy to read. Suppose you have a wff χ with main connective ∧ and conjuncts φ and ψ. Each conjunct could be a propositional constant or a complex formula – that is the reason why we’ve used metavariables for elements in our set L S of well-formed formulas. The rule for ∧ then allows you to put φ and ψ under χ within the same branch of the tree as χ. Here is an example of a single application of the rule for conjunction:

p ∧ (q ∨ r)
    p
  q ∨ r

Nothing branches so far, but as you can see the tree will start to branch when we apply the rule for disjunction to q ∨ r, leading to the following tree:

p ∧ (q ∨ r)
    p
  q ∨ r
 q     r
Let’s take a look at another example. We want to prove the wff ¬(p ∧ ¬ p). First, we negate it and start a new tree. Then we apply the rules that are applicable. The resulting tree is:

¬¬(p ∧ ¬ p)
 p ∧ ¬ p
    p
   ¬p

First the rule for double negation and then the rule for conjunction was applied. As it happens, p and ¬ p occur on the same branch. It follows from the rules of our calculus that this means that there is no model for ¬¬(p ∧ ¬ p). From this we can conclude that ¬(p ∧ ¬ p) is a theorem. Let’s cross-check with a truth table:
¬ (p ∧ ¬ p)
1  1 0 0 1
1  0 0 1 0
The main junctor is the outer negation and for all combinations of truth values
of the propositional constant p the truth value of the whole formula is 1. Thus,
the formula is valid and our result was correct.
2.3.2 How to Use Tableaux
A few more definitions will make our intuitive assessment of how semantic
tableaux work more precise.
Complete Tree. A tree constructed with the tableaux rules is complete if the leaves of all branches only contain formulas of the form φ or ¬φ, i.e. if no more rules can be applied to formulas to which no rule has been applied so far.

Closed Branch. A branch of a tree is closed if it contains a formula φ and its negation ¬φ.

Open Branch. A branch is open if it is not closed.
Closed Tree. A tree constructed with the tableaux rules is closed if all
branches of the tree are closed.
Proving a Theorem. In order to prove that a wff φ is a theorem, start a new
tree with the negated wff ¬φ. Then subsequently extend the tree by applying
the rules to all open branches until all branches are closed or the tree is complete.
If the tree is closed then φ is a theorem; otherwise it is not a theorem.
Provability. When there is a proof for a wff φ we write ⊢ φ for the fact that φ is provable.
You might wonder why I talk about proving theorems instead of checking for
tautologies. I have claimed that the tableaux rules check whether a wff is valid
or not, but from a strictly logical point of view this hasn’t been shown yet. All I
have done is to smear some rules on paper – who knows whether they are the
right ones? Proving that they are the right ones has two aspects, soundness
and completeness, which will be briefly covered in section 2.5. I’ll assume these
proofs from now on – for propositional calculus and first-order predicate logic
they are part of the folklore, whereas you certainly have to show them for any
new logical system. If the proof theory is sound and complete it suffices to show ⊢ φ in order to show that ⊨ φ, and vice versa.
Tableaux are systematic procedures for finding counter-models, i.e. valuations for all the propositional constants involved for which a given formula is false. A counter-model can also be regarded as the (partial) description of a situation in which the whole formula is definitely false. In order to prove φ we then start searching for models of ¬φ, i.e. we begin the tree by assuming ¬φ. When the tree is complete, every open branch of the tree represents a model of ¬φ, i.e. a situation in which φ is false. If all branches are closed, then ¬φ is a contradiction. Therefore, φ is a tautology. This is a special form of the proof method called reductio ad absurdum.
Let us take a look at a sample proof now. We want to prove that p → (p ∨ q) is a theorem of propositional logic. We prove this by showing that ¬(p → (p ∨ q)) is a contradiction using tableaux:

¬(p → (p ∨ q))
      p
  ¬(p ∨ q)
     ¬p
     ¬q

The tree contains only one branch, which is closed by p and ¬ p. QED.
Let us take a look at an example of a longer proof with a branching tree. Let’s prove that (¬ p ∧ (q → p)) → ¬ q is a theorem. This formula exemplifies an argumentation scheme that is often found in real world reasoning.9 We assume the negation of this formula and show by using tableaux that it is not satisfiable, i.e. that it is a contradiction:

¬((¬ p ∧ (q → p)) → ¬ q)
    ¬ p ∧ (q → p)
        ¬¬ q
         q
        ¬p
       q → p
    ¬q      p

Both branches close: the ¬ q leaf is in conflict with the q node further above, and the p leaf is in conflict with the ¬ p node further above in the tree.
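The tableaux procedure can be mimicked by a short program. The following is a hypothetical sketch, not the calculus itself: formulas are nested tuples, only the rules for ¬, ∧, ∨ and → are implemented (enough, since {¬, ∧} is a base), and a branching rule succeeds only if both branches close:

```python
# Miniature tableau prover: start with the negated formula and check
# whether every branch of the tree closes.

def is_literal(f):
    return isinstance(f, str) or (f[0] == "not" and isinstance(f[1], str))

def closes(todo, literals):
    if not todo:
        return False                       # complete open branch: counter-model
    f, rest = todo[0], todo[1:]
    if is_literal(f):
        neg = f[1] if isinstance(f, tuple) else ("not", f)
        if neg in literals:
            return True                    # φ and ¬φ on the branch: it closes
        return closes(rest, literals + [f])
    op = f[0]
    if op == "and":
        return closes([f[1], f[2]] + rest, literals)
    if op == "or":                         # branching rule: both must close
        return closes([f[1]] + rest, literals) and closes([f[2]] + rest, literals)
    if op == "imp":                        # branches into ¬φ | ψ
        return closes([("not", f[1])] + rest, literals) and \
               closes([f[2]] + rest, literals)
    g = f[1]                               # op == "not": look one level down
    if g[0] == "not":                      # double negation
        return closes([g[1]] + rest, literals)
    if g[0] == "and":                      # ¬(φ ∧ ψ): branch into ¬φ | ¬ψ
        return closes([("not", g[1])] + rest, literals) and \
               closes([("not", g[2])] + rest, literals)
    if g[0] == "or":                       # ¬(φ ∨ ψ): ¬φ and ¬ψ, same branch
        return closes([("not", g[1]), ("not", g[2])] + rest, literals)
    return closes([g[1], ("not", g[2])] + rest, literals)   # ¬(φ → ψ)

def provable(phi):
    """Start the tree with ¬φ; φ is a theorem iff the tree closes."""
    return closes([("not", phi)], [])

# (¬p ∧ (q → p)) → ¬q, the theorem proved above:
t = ("imp", ("and", ("not", "p"), ("imp", "q", "p")), ("not", "q"))
print(provable(t))                     # True
print(provable(("imp", "p", "q")))     # False
```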
Hints. Remember the following points:
• To prove that a formula φ is a tautology, start the tree with ¬φ and apply
the rules until the tree is completed.
• When negating φ make sure you negate the whole formula by putting
parentheses around it when needed. For example, the negation of p → q
is ¬(p → q).
• You need to evaluate each subformula in each branch unless the branch is
already closed.
• If the tree branches into two subtrees and you apply a rule to a formula
higher in the tree (i.e. before it branches), you need to put the results into
both branches. Here is an example:
¬((p ∨ q) → (q ∨ p))
      p ∨ q
    ¬(q ∨ p)
  p         q
 ¬q        ¬q
 ¬p        ¬p
9 Deductive arguments are discussed later. Anyway, here is an example: If it rains, the
streets are wet. The streets aren’t wet. Hence, it doesn’t rain.
• You should always evaluate formulas that do not branch first. The above proof could have been written more simply as:

¬((p ∨ q) → (q ∨ p))
      p ∨ q
    ¬(q ∨ p)
      ¬q
      ¬p
    p     q
Logical Consequence, Deductive Argument. When ψ can be proved given that the assumptions φ1 , . . . , φn hold, then we write φ1 , . . . , φn ⊢ ψ. We say that ψ is a (logical) consequence of φ1 , . . . , φn . When a conclusion ψ logically follows from (is a logical consequence of) a number of premises φ1 , . . . , φn this is also often called a deductively valid argument.

In order to prove φ1 , . . . , φn ⊢ ψ using tableaux, a new tree is started by putting the premises φ1 , . . . , φn in nodes below each other, adding ¬ψ, and then continuing according to the rules as in the case of checking for a tautology. This means that the conjunction of the premises and the negated conclusion is checked for being contradictory. To see that this procedure is sound, consider the simplest case φ ∧ ¬ψ first. This is equivalent to ¬(¬φ ∨ ψ) by DeMorgan’s law and double negation elimination, which in turn is equivalent to ¬(φ → ψ). If we replace φ by the conjunction (φ1 ∧ · · · ∧ φn ) the same equivalences hold. So the above proof procedure is based on the validity of the scheme ¬((φ1 ∧ · · · ∧ φn ) → ψ) ↔ (φ1 ∧ · · · ∧ φn ∧ ¬ψ).
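The reduction just described can be checked by brute force over all valuations. Representing premises and conclusion as Python functions from interpretations to truth values is an assumption of this sketch, not part of the text:

```python
# φ1, ..., φn ⊢ ψ holds iff there is no valuation that makes all
# premises true and the conclusion false, i.e. iff the conjunction
# of the premises with ¬ψ is contradictory.
from itertools import product

def entails(constants, premises, conclusion):
    """premises: list of functions I -> bool; conclusion: I -> bool."""
    for row in product([True, False], repeat=len(constants)):
        I = dict(zip(constants, row))
        if all(p(I) for p in premises) and not conclusion(I):
            return False      # counter-model: premises true, conclusion false
    return True

# ¬p, q → p ⊢ ¬q  (the argument scheme proved above)
print(entails(["p", "q"],
              [lambda I: not I["p"], lambda I: (not I["q"]) or I["p"]],
              lambda I: not I["q"]))     # True
```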
Deductive Closure. With respect to the consequence relation ⊢ of a logical language L with a set of wffs L we can define the deductive closure Cn : P (L) → P (L) of a set S of wffs (i.e. of a theory) as follows:

Cn(S) := {φ ∈ L | there are ψ1 , . . . , ψn ∈ S such that ψ1 , . . . , ψn ⊢ φ}    (2.1)

The deductive closure of a set of formulas represents all formulas that logically follow from that set in a given logical system (such as our language PC), and the concept of deductive closure plays an important role in many applications of logic (e.g. in so-called belief revision theory). A special, deviant case to watch out for is when the set of formulas is contradictory, e.g. { p ∧ ¬ p}. In that case, the result is all sentences of the language, i.e. Cn({ p ∧ ¬ p}) = L, because in classical logic everything follows from a contradiction – and L is, of course, also contradictory as long as the language contains a negation. This issue is discussed in more detail below.
2.3.3 Alternative Notation
Consider this proof from the last section:
¬((¬ p ∧ (q → p)) → ¬ q)
¬ p ∧ (q → p)
¬¬ q
q
¬p
q→p
Branch: ¬ q | p
This proof can be represented in a more verbose and failsafe way as follows:
To show: (¬ p ∧ (q → p)) → ¬ q
Proof:
1.  ¬((¬ p ∧ (q → p)) → ¬ q)      assumption
2.  ¬ p ∧ (q → p)                 1: ¬→
3.  ¬¬ q                          1: ¬→
4.  q                             3: ¬¬
5.  ¬p                            2: ∧
6.  q → p                         2: ∧
7a. ¬ q | 7b. p                   6: →
The tree is closed by 7a, 4 and 7b, 5. QED.
While this notation is more cumbersome, it is less error-prone and recommended
for larger proofs. Typesetting these tables in LaTeX is not very pleasant, but
writing them by hand is not so hard. I often use a large sheet of paper sideways and
start with the formula centered in the middle at the top.
2.3.4 Selected Theorems
In this section a number of theorems of propositional logic are laid out. The
missing proofs are left as exercise 24.
T1. ¬(p ∨ q) ↔ (¬ p ∧ ¬ q)
Proof:
2.3. PROOF THEORY
1. ¬(¬(p ∨ q) ↔ (¬ p ∧ ¬ q))                 assumption
Left branch (1: ¬↔): 2a. ¬(p ∨ q), 3a. ¬(¬ p ∧ ¬ q)
  4a. ¬ p                                    2a: ¬∨
  5a. ¬ q                                    2a: ¬∨
  Branch (3a: ¬∧): 6a. ¬¬ p | 6b. ¬¬ q
  7a. p (6a: ¬¬) | 7b. q (6b: ¬¬)
Right branch (1: ¬↔): 2b. ¬¬(p ∨ q), 3b. ¬ p ∧ ¬ q
  4b. p ∨ q                                  2b: ¬¬
  5b. ¬ p                                    3b: ∧
  6c. ¬ q                                    3b: ∧
  Branch (4b: ∨): 7c. p | 7d. q
The tree is closed by (7a, 4a), (7b, 5a), (7c, 5b), and (7d, 6c). QED.
T2. ¬(p ∧ q) ↔ (¬ p ∨ ¬ q)
Proof:
1. ¬(¬(p ∧ q) ↔ (¬ p ∨ ¬ q))                 assumption
Left branch (1: ¬↔): 2a. ¬(p ∧ q), 3a. ¬(¬ p ∨ ¬ q)
  4a. ¬¬ p                                   3a: ¬∨
  5a. ¬¬ q                                   3a: ¬∨
  6a. p                                      4a: ¬¬
  7a. q                                      5a: ¬¬
  Branch (2a: ¬∧): 8a. ¬ p | 8b. ¬ q
Right branch (1: ¬↔): 2b. ¬¬(p ∧ q), 3b. ¬ p ∨ ¬ q
  4b. p ∧ q                                  2b: ¬¬
  5b. p                                      4b: ∧
  6b. q                                      4b: ∧
  Branch (3b: ∨): 7b. ¬ p | 7c. ¬ q
The tree is closed by (8a, 6a), (8b, 7a), (7b, 5b), and (7c, 6b). QED.
T3. p → (p ∨ q)
Proof:
1. ¬(p → (p ∨ q))                 assumption
2. p                              1: ¬→
3. ¬(p ∨ q)                       1: ¬→
4. ¬ p                            3: ¬∨
5. ¬ q                            3: ¬∨
The tree is closed by 4, 2. QED.
T4. ((p ∧ q) ∨ r) ↔ ((p ∨ r) ∧ (q ∨ r))
Proof: Table 2.3 on page 54. The tree closes with (7a, 4a), (7b, 5a), (6c, 3b), (6d,
3b), (8c, 7c), (8d, 4d), (9a, 7d), (9b, 4d), and (8f, 4d). QED.
T5. ((p ∨ q) ∧ r) ↔ ((p ∧ r) ∨ (q ∧ r))
T6. (p → q) ↔ (¬ q → ¬ p)
T7. p ∨ ¬ p
Proof:
Table 2.3: Tableaux proof of theorem T4.

0. ¬(((p ∧ q) ∨ r) ↔ ((p ∨ r) ∧ (q ∨ r)))    assumption
Left branch (0: ¬↔): 1a. (p ∧ q) ∨ r, 2a. ¬((p ∨ r) ∧ (q ∨ r))
  Branch (1a: ∨): 3a. p ∧ q | 3b. r
  Under 3a: 4a. p, 5a. q                     3a: ∧
    Branch (2a: ¬∧): 6a. ¬(p ∨ r) | 6b. ¬(q ∨ r)
    Under 6a: 7a. ¬ p, 8a. ¬ r               6a: ¬∨
    Under 6b: 7b. ¬ q, 8b. ¬ r               6b: ¬∨
  Under 3b: Branch (2a: ¬∧): 4b. ¬(p ∨ r) | 4c. ¬(q ∨ r)
    Under 4b: 5b. ¬ p, 6c. ¬ r               4b: ¬∨
    Under 4c: 5c. ¬ q, 6d. ¬ r               4c: ¬∨
Right branch (0: ¬↔): 1b. ¬((p ∧ q) ∨ r), 2b. (p ∨ r) ∧ (q ∨ r)
  3c. ¬(p ∧ q), 4d. ¬ r                      1b: ¬∨
  5d. p ∨ r, 6e. q ∨ r                       2b: ∧
  Branch (3c: ¬∧): 7c. ¬ p | 7d. ¬ q
  Under 7c: Branch (5d: ∨): 8c. p | 8d. r
  Under 7d: Branch (5d: ∨): 8e. p | 8f. r
    Under 8e: Branch (6e: ∨): 9a. q | 9b. r
    Under 8f: Branch (6e: ∨): 9c. q | 9d. r
1. ¬(p ∨ ¬ p)                     assumption
2. ¬ p                            1: ¬∨
3. ¬¬ p                           1: ¬∨
4. p                              3: ¬¬
The tree is closed by 4, 2. QED.
T8. p ↔ (p ∧ (q ∨ p))
T9. p ↔ (p ∨ (q ∧ p))
T10. ((p ∧ q) → r) ↔ (p → (q → r))
T11. ((p → q) → p) → p
T12. ((p → q) ∧ (¬ p → r)) ↔ ((p ∧ q) ∨ (¬ p ∧ r))
2.3.5 Exercises
✐ Exercise 24 Prove the following theorems using semantic tableaux:
a. T5, T6
b. T8, T9
c. T10, T11, T12
2.4 Deductive Arguments
2.4.1 Valid Argument Schemes
Deductively Valid Argument Schemes. When it has been shown that a
particular wff is a logical consequence of a number of premises, this holds in
general. For example, since p, p → q ⊢ q is provable, q, q → p ⊢ p holds as well.
This can be shown by conducting the proof with meta-variables such as φ and ψ
in place of propositional constants. We can therefore speak of deductively valid
argument schemes.
In case you are in doubt about the validity of some purported argument scheme,
use truth tables or tableaux to check whether the scheme is valid. As
opposed to what was usual in the Middle Ages, there is no longer any need to
learn dozens of arcane names for valid or invalid argument schemes by heart.
Nevertheless, the following names for valid argument schemes are so common
that you should know them:
• Modus Ponens: φ, φ → ψ ⊢ ψ
• Modus Tollens: ¬ψ, φ → ψ ⊢ ¬φ
• Contraposition: φ → ψ ⊢ ¬ψ → ¬φ
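Checking an argument scheme by truth table, as suggested above, is easy to mechanize. The following sketch verifies the three schemes just listed; representing schematic formulas as Python functions is a convention chosen here:

```python
from itertools import product

# An argument scheme premises |- conclusion is valid iff no assignment
# to its schematic letters makes all premises true and the conclusion false.

def valid(premises, conclusion, letters=2):
    for bits in product([True, False], repeat=letters):
        if all(p(*bits) for p in premises) and not conclusion(*bits):
            return False
    return True

imp = lambda a, b: (not a) or b   # the material conditional

# Modus Ponens:   phi, phi -> psi |- psi
print(valid([lambda a, b: a, lambda a, b: imp(a, b)], lambda a, b: b))
# Modus Tollens:  ~psi, phi -> psi |- ~phi
print(valid([lambda a, b: not b, lambda a, b: imp(a, b)], lambda a, b: not a))
# Contraposition: phi -> psi |- ~psi -> ~phi
print(valid([lambda a, b: imp(a, b)], lambda a, b: imp(not b, not a)))
```

All three calls print `True`: each scheme is valid under every assignment to its schematic letters.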
2.4.2 Sound Arguments, Fallacies, Good Arguments
Of course, in order to argue correctly – not to speak of convincingly, which is
an entirely different matter! – it does not suffice to use a deductively valid
argument scheme. The arguer also has to make it plausible to the audience
that the premises are true. Although it might have happened occasionally in
the humanities (e.g. in philosophy), correctly deducing in painstaking detail a
conclusion from completely idiotic premises is quite pointless.
Sound Argument. A sound argument is a deductively valid argument whose
premises are true.
Since in many cases truth cannot be established with absolute certainty this
definition must be weakened to be useful in real life. An arguer must attempt
to make the premises as plausible as possible such that the audience agrees
with them. Once this has been achieved real-world arguments that are also
deductively valid are usually considered sound. Of course, people often disagree
about the truth of the premises, and so the soundness of an argument is generally
harder to establish than deductive validity. Consider for example the following
deductive argument:
• God is the most supreme being.
• If something is the most supreme being, then it must also have the property of existing.
• Therefore, God exists.
In propositional logic this argument can only be approximated. Let p be read
as God is the most supreme being and q be read as God exists. The argument is
then just an application of modus ponens p, p → q ⊢ q and clearly deductively
valid. But perhaps not everyone buys into the premises.
On a side note, quantification is not the only thing that has been ignored by the
translation to propositional logic. Suppose the gradable adjective ‘supreme’ expresses a
preorder relation. If this is so, one might argue that there may be more than
one most supreme being. When using the superlative of a gradable adjective,
people commonly assume that there is only one object satisfying the superlative,
yet this need not be so. It certainly doesn’t follow from the appealing
assumption that the meaning of the adjective is based on a preorder. Consider
for example ‘tallest mountain’. Although it is natural to speak of ‘the tallest
mountain’, there could be two or more mountains with exactly the same height.
Mathematically speaking, a preorder on a set does not in general ensure the
existence of a single minimum or maximum element in the set with respect
to that order. The set could extend infinitely in one or both directions
(called an infinite chain), or there could be several maximal or
minimal elements that are equivalent with respect to the equivalence relation generated
by the preorder. So the above argument is incomplete in a nontrivial way.
I mention this example in order to make clear that there is much more to
deductive arguments than just their deductive validity and that there always
remains room for disputes about the soundness of the premises and the correct
interpretation of the argument.
Logical Fallacy. In argumentation sometimes deductive argument schemes
are applied that are not valid. If someone uses such an invalid argument, he
commits a logical fallacy.
Here is a typical logical fallacy:
If it rains the street is wet.
It doesn’t rain.
Therefore, the street is not wet.
This popular fallacy is sometimes called denying the antecedent. The fallacy is
of the form ((φ → ψ) ∧ ¬φ) → ¬ψ. You know enough about logic to see that this
scheme is not valid, but in case of doubt you may check it using a truth table or
semantic tableaux. Here is another common logical fallacy:
If it rains the street is wet.
The street is wet.
Therefore, it rains.
This fallacy is called affirming the consequent. The inference is not valid, as
there could be a multitude of other reasons why the street is wet. The fallacy is of
the form ((φ → ψ) ∧ ψ) → φ. Again, you can check the invalidity of this argument
scheme easily using truth tables or tableaux. It is, however, important to bear
in mind that even deductively invalid argument schemes may convey useful
information about the subject matter or may confirm the conclusion to some
degree. For example, in the absence of any reasonable alternative antecedents
φ i , where φ i → ψ, assuming that ψ suggests φ can make sense. (See note 9
below.) Likewise, even though (p ∧ q) → r together with p does not allow you to
conclude r, the two formulas tell you that you might want to check whether q
holds in order to establish r.
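Both fallacies can also be refuted mechanically by searching for a countermodel, i.e. a valuation that makes all premises true and the conclusion false. A sketch; the variable names `rain` and `wet` simply mirror the running example:

```python
from itertools import product

# Search all valuations of (rain, wet) for one that makes every premise
# true and the conclusion false. If one exists, the scheme is invalid.

def countermodel(premises, conclusion):
    for a, b in product([True, False], repeat=2):
        if all(p(a, b) for p in premises) and not conclusion(a, b):
            return {"rain": a, "wet": b}
    return None

imp = lambda a, b: (not a) or b   # rain -> wet

# Denying the antecedent: rain -> wet, ~rain |- ~wet  (invalid)
print(countermodel([imp, lambda a, b: not a], lambda a, b: not b))
# Affirming the consequent: rain -> wet, wet |- rain  (invalid)
print(countermodel([imp, lambda a, b: b], lambda a, b: a))
```

Both searches find the valuation in which it does not rain but the street is wet anyway, which is exactly the situation the text describes: the street may be wet for other reasons.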
✧ Remark 5 (Good Arguments) There is much more to putting forward
a good argument than just using a deductively valid argument scheme and
making the premises plausible. A good presentation is just as important.
On one hand, a logical fallacy will never make a good deductive argument.
On the other hand, it is not reasonable to demand that any good argument
must be deductively valid. It is quite common in real argumentation to
leave premises implicit, and making all premises explicit would sometimes
not be economical and would make the argument long and dull. Moreover, truth-conducive
albeit fallible heuristic reasoning patterns are used both in science
and everyday reasoning with some success, and discarding them altogether
would be too strict. See Note 9 for more information.
Deductively valid arguments are truth-preserving: If the premises are true
then the conclusion is also true. Deductively valid arguments are also monotonic, because classical logical consequence is monotonic: If φ1 , . . . , φn ⊢ ψ then
φ1 , . . . , φn , φn+1 ⊢ ψ. Notice that there are two cases in such a situation:
• The additional premise φn+1 is consistent with φ1 , . . . , φn , i.e. φ1 ∧ · · · ∧
φn ∧ φn+1 is satisfiable. Then φn+1 is redundant. (But it is also possible
that one of φ1 , . . . , φn is redundant when we decide to keep φn+1 .)
• The additional premise φn+1 is inconsistent with φ1 , . . . , φn , i.e. φ1 ∧ · · · ∧
φn ∧ φn+1 is a contradiction. Then ψ follows because anything follows from
a contradiction, but the argument is not sound.
The rule of classical logic that anything follows from a contradiction, i.e. φ ∧ ¬φ ⊢
ψ, is sometimes called ex falso quodlibet. It has given rise to numerous alternative
accounts of logical consequence and alternative logical systems. Notice
that there is nothing mysterious about the fact that (φ ∧ ¬φ) → ψ, although
people have sometimes mistaken the above problem for one about the meaning of
the conditional and have given it the misleading label ‘paradox of the material
implication’. The truth-function →, however, is simply defined the way it is and
is one of the 16 possible binary connectives of classical bivalent logic. Messing
around with → is therefore bound to get you into trouble at one point or another,
usually due to the interdefinability of truth-functions or the lack thereof when
→ has been tweaked in a non-classical logic.
Regarding logical consequence, however, ex falso quod libet does indeed not
seem to be a very desirable rule. Still it seems wise not to change the notion
of logical consequence itself but rather emphasize that there is much more to
a good argument than just being deductively valid.10 Since ex falso quodlibet
never leads to a sound argument, the problem is not very pressing. Likewise,
valid argument schemes like φ ⊢ φ ∨ ψ have been overly criticized. From a
logical point of view they do no harm, although they may lead astray and violate
principles of sound argumentation.
Notice, finally, that establishing the soundness of premises is not monotonic:
You might have good reasons for believing premises φ1 , . . . , φn at some time but
when you take into account additional evidence this might weaken some of the
premises you’ve previously considered well-established. The relation between
evidence, premises, and possible degrees of plausibility or credibility attributed
to them cannot be modeled adequately in classical bivalent logic.
☞ Note 9 (Induction and Abduction) Classical deductive arguments
represent a strictly monotonic and truth-preserving reasoning pattern. No
matter what additional information is taken into account, as long as the
premises of a valid deductive argument are true the conclusion will hold.
This reasoning plays a crucial role both in ‘everyday’ argumentation and
particularly in mathematical reasoning. There are two other reasoning patterns
that are sometimes regarded as indispensable for gathering
knowledge: abduction and induction.
Inductive arguments form the basis of genuine probabilistic reasoning.
From a small number of observed occurrences of events it is inferred that
the observed frequencies hold for the respective classes of events in general.
This kind of reasoning is non-monotonic, because the inference might no
longer hold after additional observations have been made. While it had
been controversial for a long time whether inductive reasoning is acceptable
in general or whether it is really needed, this issue has been pretty much
settled with the advent and success of sophisticated statistical methods and
developments in physics according to which genuine randomness can be
found in nature.
On the other hand, abduction is still much more controversial. Abduction
is sometimes also called inference to the best explanation and has for a
long time been discussed under the general idea of a ‘logic of discovery’. One
strand of abduction, the most popular one, is indirectly based on the fallacy
of affirming the consequent. Assume there are hypotheses p 1 , p 2 , . . . , p n
such that p i ⊢ c and c is observed to be the case, whereas it is not known
10 If you change the interpretation of ⊢ in the meta language (and correspondingly, the proof
theory) but keep the truth-functions in the object language classical, the deduction theorem
might no longer hold, and that might not be desirable. This theorem says that if A ∪ {φ} ⊢ ψ then
A ⊢ φ → ψ for a set of wffs A, i.e. if you can prove the consequent of a conditional from a set of
assumptions plus the antecedent of the conditional, then you can prove the whole conditional
from the assumptions alone.
which of the hypotheses holds. By an abductive inference one concludes
from c that the most plausible p i , say p 3 , is the case. Unless only finitely
many hypotheses can and need to be taken into account, which rarely happens,
abduction is also non-monotonic, as there could be some hypothesis p m (m > n)
that is more plausible than p 3 . Formally, abduction can be implemented
by ordering the set of possible hypotheses with a preorder relation.
The crucial problem is, however, where this ordering comes from, i.e. what
‘most plausible’ means in this context. Note that abduction seems to be
pretty common in everyday reasoning, though: Lady Buttersworth has been
murdered with an axe. Lord Worcester was found in the same building with
a bloody axe in his hand, and the DNA of the blood on the axe matches that
of Lady Buttersworth. Hence, Lord Worcester murdered Lady Buttersworth
– unless the dear old Lord found the poor Lady, grabbed an axe from the wall
with the intention of pursuing the fleeing would-be assassin and thereby
contaminated the axe with her blood. Or, perhaps someone planted false
evidence on him. (‘My dear, would you mind holding this bloody axe for
me for a minute or two, please? I’ll be right back...’ – ‘Of course not, Sir
Farnsdale. I shall guard this marvelous gardening instrument until you
return.’)
2.4.3 Exercises
✐ Exercise 25 Translate the following deductive argumentation fragments
into propositional logic, idealizing the statements as is deemed appropriate, and
determine whether they are deductively valid or based on a fallacy.
a. If John did it, so did I. I didn’t do it, so John didn’t do it either.
b. Either Jack the Ripper or someone imitating him was responsible for this
crime. If Jack the Ripper had done it, the cut would have been deep and
precise. The cut is not deep and precise, hence the crime was committed by
someone imitating Jack the Ripper.
c. Either Bob ate all the cookies or it is not the case that Alice went swimming.
There are still some cookies left, so Alice went swimming.
d. John is both a chess player and an excellent boxer. If someone is a chess
player, then he is not physically strong. Therefore, John is not physically
strong.
e. Something that nobody likes is bad. Therefore, Capitalism is good. For many
people like it and if it were bad, nobody would like it.
f. If Sir John was killed by a blunt instrument like a vase or a candlestick, then
he died of a head trauma. He did not die of a head trauma. Therefore, Sir
John was not killed by a blunt instrument.
g. John likes Deep Purple or Led Zeppelin. If he likes Deep Purple, he is stupid.
He is not stupid, so he likes Led Zeppelin.
h. That English proper names are rigid is a contingent fact of the English
language and not a consequence of the Millian view. For suppose it was a
consequence of the Millian view; then any formal or informal implementation
of the Millian view in which proper names or their formal counterparts were
non-rigid would be inconsistent, but this is clearly not the case.
✐ Exercise 26 CIA special analyst Jack Thompson is running queries on a big
secret geospatial database in order to track down the whereabouts of known
terrorists. The search engine works with boolean queries and can be used
to confirm whether a given conclusion follows from a number of geospatial
conditions expressed in propositional logic. After simplifying the queries for
‘Osama bin Laden’ and eliminating a number of possibilities, he obtains the
following test condition:
NOT (NOT In(Osama, province:Nimruz) OR NOT In(Osama,
province:Balochistan, loc:Bolan Pass) OR (In(Osama,
province:Nimruz) OR NOT In(Osama, province:Balochistan,
loc:Bolan Pass)))
Jack Thompson enters his query ‘?In(Osama, province:Balochistan, loc:
Bolan Pass)’ and obtains the answer ‘YES’. Full of excitement he presents the
result to his superiors, who immediately request an airstrike. The airstrike is
conducted swiftly, all travelers at the Bolan Pass, Balochistan, are killed and the
pass is reduced to rubble. A later investigation concludes that the operation was
a failure and among the many civilian victims no traces of Osama bin Laden’s
DNA were found.
a. Show what mistake Jack Thompson made when he queried the database.
b. During the investigations, representatives of the company that created the
database and inference engine claim that their software functioned correctly
and Jack Thompson misused it. Can you suggest a way how Jack Thompson
could have prevented the disaster by entering another query and without
changes to the costly geospatial database?
c. Is the claim of the company justified or can you suggest a technical change
to the inference engine that, notwithstanding any other bugs it might have,
would prevent such scenarios from occurring in the future?
✐ Exercise 27 (difficult) Implement the syntax and denotational semantics
for an abductive conditional operator M ⊨ φ ; ψ that is true if ψ is among the
‘most plausible’ wffs such that M ⊨ ψ → φ holds.
2.5 Metatheorems
Proofs of important metatheorems go beyond the scope of this introduction.
However, even for just applying a logical system as a theoretical tool it is
necessary to be aware of its most important properties. For propositional logic
with a suitable proof theory like the tableaux rules of the previous section it is
well-known that the following metatheorems hold. In what follows, the term
‘logical system’ is meant as the ensemble of a logical language, its denotational
semantics, and its proof theory.
Soundness. A logical system is sound iff. it follows from ⊢ φ that ⊨ φ, i.e. if φ
can be proved then φ is also true in all models.
Completeness. A logical system is complete iff. from ⊨ φ it follows that ⊢ φ,
i.e. if φ is true in all models then φ can also be proved.
Decidability. A logical system is decidable iff. there is an algorithm that decides
in finite time, for any wff, whether or not it is a tautology.
Compactness. Let a set Γ of wffs be satisfiable if there is a model M such
that M ⊨ φ i for all φ i ∈ Γ. A logical system is compact iff. any set Γ of wffs of
that system is satisfiable if and only if every finite subset A ⊆ Γ is satisfiable.
Propositional logic with the tableaux rules of section 2.3.2 is sound, complete,
decidable, and compact.
2.6 Concluding Remarks
Propositional logic might seem too simplified to be of much use, but
in fact it has many useful applications. For one thing, it is fair to say that
calculations with numbers in computers work on the basis of propositional logic
(see Note 6).
There is also a rich tradition of using propositional logic in the study of
argumentation that started with Aristotle’s writings. At his time propositional
logic and a limited form of quantification in the form of syllogisms were already
known, and since then they have been studied extensively. The most important thing
about propositional logic is, however, that it is the basis for other important
logical systems like modal logic and first-order predicate logic. The latter is the
topic of the next chapter.
✧ Remark 6 (Computers and Propositional Logic) Almost all computers
represent numbers in the binary system, using only 1 and 0. So
the number 0 is represented by 0, the number 1 by 1, the number 2 by
the sequence 〈1, 0〉, the number 3 by the sequence 〈1, 1〉, the number 4 by
the sequence 〈1, 0, 0〉, the number 5 by the sequence 〈1, 0, 1〉, and so on.11
You get the picture. Now let us represent these sequences by propositional
constants: p 0 represents 1 if it is true and 0 otherwise, p 1 represents 2 if
it is true and 0 otherwise, p 2 represents 4 if it is true and 0 otherwise, and
so on. According to this convention, for example, the number 5 can be
represented by a model of propositional logic in which the
formula (p 2 ∧ ¬ p 1 ) ∧ p 0 is true. Consider now arithmetic operations like,
for example, addition. Addition of binary numbers without overflow can
be represented by ∨̇ : 0 + 0 = 0, 0 + 1 = 1, and 1 + 1 = 0. See table 2.4 for
an example. When an overflow occurs (1 + 1 = 0), an overflow bit, say c n ,
has to be stored that is taken into account when adding the next significant
bit. So for example p 0 ∨̇ q 0 represents the addition of the first bits of p
and q without overflow, and the overflow is c 0 := p 0 ∧ q 0 . For the next bit,
(p 1 ∨̇ q 1 ) ∨̇ c 0 represents the result of the addition without overflow and the
overflow is c 1 := ((p 1 ∧ q 1 ) ∨ (p 1 ∧ c 0 )) ∨ (q 1 ∧ c 0 ) – and so on for the other
bits.
bits. The resulting representations of arithmetic operations can often be
simplified extensively and are implemented in hardware and combined into
more complex components. Figure 2.1 shows what is called an ‘adder’. Other
operations of computers are implemented in a similar way.
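The addition scheme just described can be sketched as a little ripple-carry adder; the list-of-bits encoding with the least significant bit first is a convention chosen here:

```python
# Sum bit by exclusive or (the connective written as a dotted ∨ in the text),
# carry by the majority formula c_{i+1} = (p_i ∧ q_i) ∨ (p_i ∧ c_i) ∨ (q_i ∧ c_i).

def ripple_add(p, q):
    """Add two little-endian bit lists (index 0 = least significant bit)."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))        # pad to equal length
    q = q + [0] * (n - len(q))
    result, carry = [], 0
    for pi, qi in zip(p, q):
        result.append(pi ^ qi ^ carry)                   # sum without carry
        carry = (pi & qi) | (pi & carry) | (qi & carry)  # carry to next bit
    if carry:
        result.append(carry)
    return result

print(ripple_add([0, 1, 0], [1, 0, 1]))   # 2 + 5 -> [1, 1, 1], i.e. 7
print(ripple_add([1, 1, 0], [1, 0, 1]))   # 3 + 5 -> [0, 0, 0, 1], i.e. 8
```

These two calls reproduce examples (a) and (b) of Table 2.4; in hardware the same formulas appear as the XOR and AND/OR gates of the adder in Figure 2.1.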
However, you should bear in mind that the example is simplified in many
ways. Notice that input and output as well as how storage works have been
ignored. In order to describe the workings of computers in general, a more
expressive logical system than propositional logic is needed, such as, for
example, the so-called untyped λ-calculus. To avoid potential misunderstandings:
the version of λ-calculus to be introduced in chapter 4 is typed and
not expressive enough for that purpose. To put it in technical terms, simple
untyped λ-calculus is Turing complete whereas simply typed λ-calculus is
not.
11 Obviously, it is merely a matter of convention/implementation whether the least significant
bit is the leftmost or rightmost bit of such a sequence. In fact, both orders of representation
have been used in different types of microprocessors, and failures to convert numbers between
the two formats when they were transferred from one system to another have occasionally caused
problems. As you probably know, the issue is further complicated by the fact that modern
computers don’t store numbers as sequences of arbitrary length but always in blocks of 4, 8, 16,
32, 64 or 128 bits.
(a)   0 0 1 0            (b)   0 0 1 1
    + 0 1 0 1                + 0 1 0 1
    = 0 1 1 1                = 1 0 0 0

Table 2.4: Binary representation of (a) 2 + 5 and (b) 3 + 5. The rightmost bit is
the least significant. So for example, 5 = (1 · 1) + (0 · 2) + (1 · 4) + (0 · 8).
Figure 2.1: Circuit diagram of an adder. (Source: Wikimedia Commons)
2.7 Literature
Just about any introduction to logic covers propositional logic, and regarding
the basics, introductions to logic only differ in perspective and in the level of
formal rigor. Here are some classics that are less formal than or at most as formal as this
introduction:
• Alfred Tarski (1995). Introduction to Logic. Dover.
• Wilfried Hodges (1977, 2001). Logic: an introduction to elementary logic.
Penguin.
I highly recommend the book by Hodges, which is easy to read and covers a
vast range of material that is particularly relevant for linguists. He also uses
tableaux as a proof theory. Other seminal texts can be found in the literature
section of the next chapter. Please notice that books on ‘informal logic’ or ‘critical
thinking’ are not suitable for learning logic.
Chapter 3
First-Order Logic
This chapter is about first-order predicate logic (often abbreviated FOL). In
addition to the connectives of propositional logic it allows for predicates, relations,
constants and variables for individuals, and quantifiers like ∀ with reading for all and ∃ with
reading there is at least one. As in the last chapter, we first introduce the
syntax of the formal language and then its model-theoretic semantics. Finally,
tableaux rules will be introduced as a proof theory.
3.1 Syntax of First-order Predicate Logic with
Identity
Let us call our version of first order predicate logic FOL. Usual formulations
contain constants for predicates, constants and variables for a domain, and
the quantifiers in addition to the connectives of propositional logic. Having
functions in the object language is optional, as they can be defined as restricted
relations. Many authors don’t include functions, as they make definitions a bit
more complicated. (See for example the definition of terms below.)
3.1.1 Basic Expressions.
Property, Relation, and Function Symbols. Let L R be the set of property
and relation symbols consisting of sequences of letters starting with a capital
letter, usually just P,Q, R, S and their indexed variants.1 All symbols in L R
have a fixed arity, which is an integer number n (1 ≤ n). We stipulate that = of
arity 2 is in L R . Let L F be the set of function symbols consisting of f , g, h and
their indexed variants. Function symbols also have an arity, which is a positive
integer number.
1 We will also sometimes use predicate symbols like Man or Mortal in order to indicate their
intended interpretation.
Terms. Let L V be the set of variables consisting of x, y, z and their indexed
variants. Let L C be the set of individual constants consisting of a, b, c, d and
their indexed variants. Let L T be the set of terms defined as follows. If α ∈
(L C ∪ L V ) then α ∈ L T . If α1 , . . . , αn ∈ L T and σ ∈ L F is of arity n, then
σ(α1 , . . . , αn ) ∈ L T .
Ground Term. A term that does not contain any variable is a ground term.
For example, a, b, f (a), g( f (a, b), c), and g(g(a)) are ground terms and x, y,
f (a, x), g( f (a), y), and g(g(x)) are not ground terms.
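The distinction can be sketched in code; representing terms as nested tuples and fixing a small set of variable names are conventions chosen here for illustration:

```python
# Terms as nested tuples: a 1-tuple like ("a",) is a constant or variable,
# a longer tuple like ("f", ("a",)) applies a function symbol to argument terms.

VARIABLES = {"x", "y", "z"}   # everything else in a 1-tuple counts as a constant

def is_ground(term):
    """A term is ground iff it contains no variable."""
    if len(term) == 1:                          # constant or variable
        return term[0] not in VARIABLES
    # function application: ground iff all argument terms are ground
    return all(is_ground(arg) for arg in term[1:])

print(is_ground(("g", ("f", ("a",), ("b",)), ("c",))))   # g(f(a, b), c) -> True
print(is_ground(("g", ("f", ("a",)), ("y",))))           # g(f(a), y)   -> False
```

The recursion mirrors the inductive definition of terms: the base case handles constants and variables, the recursive case handles function symbols applied to terms.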
Connectives. The set of truth-functional connectives is the same as for propositional
logic, i.e. L B := {∨, ∧, →, ↔, ↑, ↓, ∨̇} and L N := {¬}.
Quantifiers. The set of quantifiers is defined as L Q := {∀, ∃}.
3.1.2 Well-formed Formula.
The set L S of well-formed formulas of FOL is defined as follows:
1. If P ∈ L R and of arity n (n > 0) and α1 , . . . , αn ∈ L T then P(α1 , . . . , αn ) ∈ L S .
Example: R(a, x, b) (arity 3)
2. If φ ∈ L S and ν ∈ L N , then νφ ∈ L S .
Example: ¬P(x, y)
3. If φ, ψ ∈ L S and ◦ ∈ L B then (φ ◦ ψ) ∈ L S .
Example: (P(a) ∧ R(x, a))
4. If φ ∈ L S , x ∈ L V , and ξ ∈ L Q then ξ x(φ) ∈ L S .
Example: ∃ x(P(x) ∨ Q(x, y))
5. Nothing else is in L S .
This time I have sacrificed correctness a little bit for better readability and
used P both as a meta-variable for relations and a symbol for a relation or
predicate and x both as a meta-variable for variables and the name of a variable.
According to the above rules, for example, the following formulas are wffs: P(x),
P(a) ∧ P(a, b), ∃ xP(a), ∀ y(∃ x(R( f (x), y) ∨ R(y, g(x, y)))), ∃ x(P(x) ∧ ∀ yQ(y, a)).
✧ Remark 7 (Simpler definition of the syntax) The above definition
of the syntax is a bit complicated. The idea is that you should be able
to correctly formulate a syntax on the basis of sets for syntactic entities, as
this can be useful for certain kinds of proofs, and also be able to understand
similar definitions in the literature. With a bit less precision, and using
object-language expressions as metavariables, the same syntax as the above
one could also be defined as follows:
• Terms
– x, y, z and indexed variants are variables, P,Q, R and indexed
variants are predicates with a given arity, a, b, c, d, e and indexed
variants are constants, and f , g, h and indexed variants are function symbols with a given arity.
– Variables and constants are terms; if f is a function symbol of
arity n and t 1 , . . . , t n are terms, then f (t 1 , . . . , t n ) is also a term.
• Well-formed Formulas
– If P is a predicate of arity n and t 1 , . . . , t n are terms, then
P(t 1 , . . . , t n ) is a wff.
– If A is a wff then ¬ A is also a wff.
– If A and B are wffs then (A ∧ B), (A ∨ B), (A → B), (A ↔ B), (A ∨̇ B),
(A ↓ B), and (A ↑ B) are also wffs.
– If A is a wff and x is a variable, then ∃ x(A) and ∀ x(A) are also
wffs.
Notational Conventions. Redundant parentheses may be left out. Square
brackets [ and ] may be used instead of parentheses ( and ). The identity sign =
may be used in infix notation, i.e. we generally write a = b instead of = (a, b),
and we write x ≠ y for ¬ = (x, y). A dot . opens a parenthesis that is closed as
far to the right as possible in the formula. Without a dot or enclosing parentheses, the
argument of a quantifier is always taken as the shortest wff, i.e. for example
∃ xP x ∧ Qx must be read ∃ x[P x] ∧ Qx. The arguments of a predicate or relation
may be written without enclosing parentheses. Conjunction and disjunction bind
more strongly than the conditional and biconditional. Several subsequent applications of
∃ or ∀ may be contracted.
❉ Example 6 (Shortcut Notation)
1. P x ∧ Qx → Rx may be written for ((P(x) ∧ Q(x)) → R(x))
2. ∃ x.P(x, y) may be written for ∃ x(P(x, y))
3. P x, y ∨ P y, x may be written instead of P(x, y) ∨ P(y, x)
4. ∀ x∃ y.P x → R(x, y) ∧ Qx may be written for ∀ x∃ y((P(x) → (R(x, y) ∧ Q(x))))
5. ∀ x yz.P(x, y, z) may be written for ∀ x∀ y∀ zP(x, y, z)
There is no need to use the shortcut notation and if you feel uncomfortable with
it the best thing to do is to resort to full bracketing and add missing parentheses
in formulas. Both too many and too few parentheses can confuse at times, and
the right balance is a matter of personal taste. Notice that some logicians or
logic teachers do not like or allow certain shortcut notations. In particular,
for whatever reason some people seem to wholeheartedly dislike writing P x, y
instead of P(x, y) and I will only use this notation sparingly.
☞ Note 10 (Yet another way to specify the syntax) Using a very
dense notation the above grammar could have been specified as

S := R n (x1 , . . . , xn ) | (S ∧ S) | ¬S | ∀ xS,

where the remaining functors and quantifiers would have to be defined by
abbreviation.
Free vs. Bound Variables. A central concept in first-order logic is the distinction
between free and bound variables. This distinction is purely syntactical but has important
semantic ramifications. It is best to define free and bound variables recursively based
on the structure of wffs. Here is how. Let fvar : (L S ∪ L T ) → P (L V ) be a function
determining the free variables of a wff and bvar : (L S ∪ L T ) → P (L V ) a function
determining the bound variables of a wff as follows:
1. If α ∈ L V then fvar(α) = {α} and bvar(α) = ∅.
2. If α ∈ L C then fvar(α) = ∅ and bvar(α) = ∅.
3. If P ∈ L R is of arity n (n > 0) and α1, . . . , αn ∈ L T then fvar(P(α1, . . . , αn)) =
fvar(α1) ∪ · · · ∪ fvar(αn) and bvar(P(α1, . . . , αn)) = ∅.
4. If φ ∈ L S and ν ∈ L N , then fvar(νφ) = fvar(φ) and bvar(νφ) = bvar(φ).
5. If φ, ψ ∈ L S and ◦ ∈ L B then fvar(φ ◦ ψ) = fvar(φ) ∪ fvar(ψ) and bvar(φ ◦ ψ) =
bvar(φ) ∪ bvar(ψ).
6. If φ ∈ L S , x ∈ L V , and ξ ∈ L Q then fvar(ξx(φ)) = fvar(φ) \ {x} and
bvar(ξx(φ)) = bvar(φ) ∪ {x} if x ∈ fvar(φ), and bvar(ξx(φ)) = bvar(φ) otherwise.
As scary as this definition might seem at first glance, it is only a precise and
mathematically correct way of defining what is intuitively easy to see. Intuitively,
in a formula like ∃ x.P(x, y) the variable x is bound by the existential quantifier ∃,
whereas y remains free. If we only look at P(x, y), on the other hand, intuitively
both variables are free (= not bound). Now how does the above definition work?
Clause 1 puts every variable into the set of free variables. Constants contain
neither bound nor free variables, which is stated by clause 2. Clauses 3 to 5 then
just accumulate the sets of free and bound variables for compound expressions
by taking into account the free and bound variables of their parts. The final and
crucial clause 6 says that a quantifier binds a variable; so if the variable was
previously free it is now no longer free, i.e. removed from the set of free variables,
and put in the set of bound variables. To understand the case distinction in this
definition, consider the following formulas, which are wffs but ‘strange’:
• ∀ x.Pa
• ∀ x.P x → ∃ x[Qx]
• ∀ x∃ x.P x
The first wff illustrates vacuous quantification. It has no semantic effect but we
haven’t disallowed it in our syntax. The second wff illustrates a case of variable
reuse. This is usually allowed and some authors love to use it extensively in order
to confuse their readers. The last wff illustrates both vacuous quantification
and variable reuse. The universal quantifier doesn’t have any effect, because
the variable is immediately rebound by the existential quantifier. The case
distinction of the above definition formally captures the idea that in case of
vacuous quantification the variable is not really bound. A wff may contain
free variables, in which case it is called open:
Open Formula. A formula A that contains variables that are free in A is
called an open formula.
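The recursive definition of fvar and bvar translates almost clause by clause into code. The following sketch uses a hypothetical tuple-based formula encoding that is not part of these notes: terms are ("var", name) or ("const", name), and formulas are tagged tuples for predication, the connectives, and the quantifiers.

```python
# Clauses 1-6 of the fvar/bvar definition, transcribed into Python.
# The encoding of formulas is an illustrative assumption, not the notes' own.

def fvar(e):
    tag = e[0]
    if tag == "var":                          # clause 1
        return {e[1]}
    if tag == "const":                        # clause 2
        return set()
    if tag == "pred":                         # clause 3: union over the argument terms
        return set().union(*[fvar(t) for t in e[2:]])
    if tag == "not":                          # clause 4
        return fvar(e[1])
    if tag in ("and", "or", "impl", "iff"):   # clause 5
        return fvar(e[1]) | fvar(e[2])
    if tag in ("forall", "exists"):           # clause 6: the quantified variable is removed
        return fvar(e[2]) - {e[1]}
    raise ValueError(tag)

def bvar(e):
    tag = e[0]
    if tag in ("var", "const", "pred"):       # clauses 1-3: no bound variables
        return set()
    if tag == "not":                          # clause 4
        return bvar(e[1])
    if tag in ("and", "or", "impl", "iff"):   # clause 5
        return bvar(e[1]) | bvar(e[2])
    if tag in ("forall", "exists"):           # clause 6: bind x only if x was free in the body
        x, phi = e[1], e[2]
        return bvar(phi) | ({x} if x in fvar(phi) else set())
    raise ValueError(tag)

# ∃x.P(x, y): x is bound, y remains free
f = ("exists", "x", ("pred", "P", ("var", "x"), ("var", "y")))
print(fvar(f))  # {'y'}
print(bvar(f))  # {'x'}

# Vacuous quantification ∀x.Pa: x is not put into the bound set
g = ("forall", "x", ("pred", "P", ("const", "a")))
print(fvar(g), bvar(g))  # set() set()
```

Note how the case distinction of clause 6 makes the vacuous-quantification example come out with an empty bound-variable set, exactly as discussed above.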
Finally, a further important syntactic concept is that of the scope of a quantifier,
which is indicated by the parentheses around the quantifier body. I only specify
it informally as follows.
Scope. A variable is in the scope of the occurrence of a quantifier if it could be
bound by that occurrence of the quantifier.
3.1.3 Exercises
✐ Exercise 28 Determine which of the following terms are ground:
a. f(a, b, x)
b. c
c. f(g(x, a), b)
d. f(f(f(a)))
e. g(c, f(a))
f. h(x90, y29)
g. f(g(a), h(a))
✐ Exercise 29 Indicate the scope of the quantifiers in the following formulas
by underlining and overlining them as in the example. (You may also use colors,
of course.)
Example: ∀ x.P(x) ∧ ∃ y[P(y) ∨ Q(x, y)]
a. ∀ xP x → ∃ xP x
b. ∀ x.P x → ∃ xP x
c. ∀ x.P x → ∃ yP y
d. ∀ x.P x → ∃ yQ(x, y)
e. ∀ x yz.P x ∧ [R(x, z) → Q(y, z)]
f. ∀ x yz[R(x, y) ∧ R(y, z) → R(x, z)]
g. ∃ xP x ∧ ∀ y.P y → x = y
h. P(x, y) ∧ ∀ x∀ y[P(x, y)]
i. ∀ x yP(x, y) ∧ R(x, y)
j. ∀ x∃ y.P x → P( f (x, y))
k. ¬∀ x∃ y[x = f (y)]
3.2 Semantics of First-Order Logic with Identity
We will now specify the semantics of first-order logic. Not very surprisingly,
the connectives are defined in exactly the same way as in propositional logic.
However, we need additional rules for predication and for at least one quantifier.
We have in a sense already learned how to interpret first-order logic in the first
chapter and only have to apply this knowledge now.
3.2.1 Variable Assignments and Variants
Assignment. A variable assignment is a function g : L V → D e , where D e is
a set of individuals specified by a model (see below).2
Variant of an Assignment. An x-variant h of an assignment g is the same
function as g except that it is possible that h(x) ≠ g(x). As a shortcut, we write
h ≈x g when h is an x-variant of g.
2 I use e for ‘entities’ according to a convention from higher-order logic. More on that in the
next chapter.
Notice that h(x) = g(x) is not disallowed when h ≈x g; it is only possible that
h(x) ≠ g(x).
3.2.2 Models and Truth in a Model
Model. A model for FOL is an ordered pair consisting of a non-empty domain
D e for individuals and an interpretation function I g (.) that maps non-logical
constants to their extensions in dependence of a variable assignment g as follows:
1. Variables: If α ∈ L V then I g (α) = g(α), where g(α) ∈ D e .
2. Constants: If α ∈ L C then I g (α) ∈ D e .
3. Functions: If ξ ∈ L F is of arity n then I g (ξ) is an n-ary total function
f : D e^n → D e , i.e. a function taking an n-tuple 〈a1, . . . , an〉 (a1, . . . , an ∈ D e)
and yielding some b ∈ D e .
4. Properties, Relations: If P ∈ L R is of arity n then I g (P) ⊆ D e^n , i.e. a
subset of the set of n-tuples over D e .
N.B. It is common to leave out g in cases when it is not relevant. (The assignment
is only used for variables.) Also, sometimes authors use separate functions for
term interpretations and the interpretation of relations, but this distinction
is usually not made explicit in the model. It is very common not to include
functions, as they are not essential to the logic.3
Watch out for the requirement in the above definition that the domain D e is non-empty, which of course means that D e ≠ ∅. As innocuous as it might seem, this
requirement is important and omitting it would have unexpected consequences.
✧ Remark 8 (Propositional Constants in First-order Logic)
Sometimes authors allow relation symbols of arity 0, which are effectively interpreted as propositional constants. We do not do this here,
because it unnecessarily complicates the syntax and semantics. However, as
long as tense is ignored it may be argued that English uses of the expletive
‘it’ sometimes take no arguments. Here is a typical example:
(3.1) It rains.
The pronoun ‘it’ in this example may be regarded as a pseudo-subject in
the sense that it is grammatically in subject position but does not indicate
the presence of a logical subject to which the property of raining would be
ascribed. In Portuguese this is clearer, since an expletive use of ‘ele’ is not
grammatical:
3 The functions are total and we have seen in the first chapter that they can be defined in
terms of a suitably restricted relation.
(3.2) Chove.
(3.3) * Ele chove.
When tenses or situations are taken into account the predicate for ‘to rain’
and ‘chover’ will likely take other arguments, though. For example, in event
semantics the above examples could be expressed as ∃ e.R(e), where e is a
special sort of variable that stands for an event. So in such a framework the
paraphrase for ‘chove’ would be there is a raining event.
Truth in a Model. The evaluation function ⟦.⟧ maps wffs in L S to {1, 0} in
a model M in dependence of an assignment g. To avoid misunderstandings I
specify the complete definitions this time, including the clause for 0 (false) that
is often omitted. This time we use ⟦.⟧ as the symbol for evaluation.4

⟦P(α1, . . . , αn)⟧M,g = 1 if 〈I g (α1), . . . , I g (αn)〉 ∈ I g (P), and 0 otherwise. (3.4)

⟦α = β⟧M,g = 1 if I g (α) = I g (β), and 0 otherwise. (3.5)

⟦¬φ⟧M,g = 1 if ⟦φ⟧M,g = 0, and 0 otherwise. (3.6)

⟦φ ∧ ψ⟧M,g = 1 if ⟦φ⟧M,g = 1 and ⟦ψ⟧M,g = 1, and 0 otherwise. (3.7)

⟦φ ∨ ψ⟧M,g = 1 if ⟦φ⟧M,g = 1 or ⟦ψ⟧M,g = 1 (or both), and 0 otherwise. (3.8)

⟦φ → ψ⟧M,g = 1 if ⟦φ⟧M,g = 0 or ⟦ψ⟧M,g = 1, and 0 otherwise. (3.9)

⟦φ ↔ ψ⟧M,g = 1 if ⟦φ⟧M,g = 0 and ⟦ψ⟧M,g = 0, or ⟦φ⟧M,g = 1 and ⟦ψ⟧M,g = 1, and 0 otherwise. (3.10)

⟦∃xφ⟧M,g = 1 if there is an h ≈x g such that ⟦φ⟧M,h = 1, and 0 otherwise. (3.11)

⟦∀xφ⟧M,g = 1 if for all h ≈x g it is the case that ⟦φ⟧M,h = 1, and 0 otherwise. (3.12)

4 It is common in linguistics to use ⟦.⟧ for the evaluation function (or interpretation in general),
because most general semanticists use higher-order logic and ⟦.⟧ has traditionally been used for
higher-order logic.
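The evaluation clauses translate almost line by line into a small evaluator. The Python sketch below uses a hypothetical tuple encoding of formulas (an illustration, not the notes' own formalism); predicates are interpreted as sets of tuples, and the x-variants h ≈x g appear as updated assignments {**g, x: d}.

```python
# A sketch of the evaluation clauses (3.4)-(3.12). The encoding is assumed:
# terms are ("var", name) or ("const", name); I maps constant names to
# individuals and predicate names to sets of tuples; g is the assignment.

def interp(term, I, g):
    tag, name = term
    return g[name] if tag == "var" else I[name]   # variables via g, constants via I

def ev(phi, domain, I, g):
    tag = phi[0]
    if tag == "pred":                                         # clause (3.4)
        args = tuple(interp(t, I, g) for t in phi[2:])
        # unary predicates are encoded as sets of 1-tuples here, a uniform
        # rendering of the convention discussed for the unary case
        return 1 if args in I[phi[1]] else 0
    if tag == "eq":                                           # clause (3.5)
        return 1 if interp(phi[1], I, g) == interp(phi[2], I, g) else 0
    if tag == "not":                                          # clause (3.6)
        return 1 - ev(phi[1], domain, I, g)
    if tag == "and":                                          # clause (3.7)
        return min(ev(phi[1], domain, I, g), ev(phi[2], domain, I, g))
    if tag == "or":                                           # clause (3.8)
        return max(ev(phi[1], domain, I, g), ev(phi[2], domain, I, g))
    if tag == "impl":                                         # clause (3.9)
        return max(1 - ev(phi[1], domain, I, g), ev(phi[2], domain, I, g))
    if tag == "iff":                                          # clause (3.10)
        return 1 if ev(phi[1], domain, I, g) == ev(phi[2], domain, I, g) else 0
    if tag in ("exists", "forall"):                           # clauses (3.11), (3.12)
        x, body = phi[1], phi[2]
        # {**g, x: d} for d in the domain enumerates exactly the x-variants of g;
        # the domain is non-empty, as the model definition requires
        vals = [ev(body, domain, I, {**g, x: d}) for d in domain]
        return max(vals) if tag == "exists" else min(vals)
    raise ValueError(tag)

# Model: domain {1, 2, 3}, I(P) = {1, 2} (as 1-tuples), I(a) = 1
domain = [1, 2, 3]
I = {"P": {(1,), (2,)}, "a": 1}
print(ev(("exists", "x", ("pred", "P", ("var", "x"))), domain, I, {}))  # 1
print(ev(("forall", "x", ("pred", "P", ("var", "x"))), domain, I, {}))  # 0
```

Because the quantifier clauses loop over the whole domain, this only works for finite models, but it makes the role of the assignment variants very concrete.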
3.2.3 Explanation of the Rules for Predication and Quantification
As you can see, there is a reason why the ‘otherwise’ alternative is usually
left out; it’s always the same. What do the clauses in the above definition
mean? First of all, notice that the only new clauses are (3.4), (3.5), (3.11), and
(3.12). The truth-functional connectives have exactly the same interpretation
as in propositional calculus, and in this sense first-order predicate logic is an
extension of propositional logic. What about the new rules then?
The predication rule (3.4) simply states that a predication of the form P(a 1 ,
. . . , a n ) is true if the n-tuple consisting of the interpretations of a 1 to a n is an
element of the interpretation of P, and P is interpreted as a set of n-tuples over
the domain D e , of course. So any n-ary predicate is defined in a fully extensional
way by specifying a set of n-tuples when a model is specified. One thing to
note about (3.4) is that the case of a unary predicate is sort of special: There
is no 1-tuple. The rule must be understood in such a way that P(a) is true if
I g (a) ∈ I g (P). This is a common convention.
Rule (3.5) ensures that the identity sign is interpreted as identity. Bear in
mind that despite the infix notation the identity sign is an ordinary predicate and
could have been written = (a, b). However, a special rule is needed to ensure its
proper semantics, because identity is strictly-speaking not first-order definable,
i.e. it cannot be defined as an abbreviation on the basis of the quantifiers and
the other rules in a first-order setting. A correct definition of identity requires
second-order quantification over predicates as in the following formula:

∀P∀xy.(Px ↔ Py) → (x = y) (3.13)

This formula goes back to Leibniz (Discourse on Metaphysics, Section 9) and is
called the identity of indiscernibles. The converse formula

∀P∀xy.(x = y) → (Px ↔ Py) (3.14)

is called the indiscernibility of identicals. The corresponding biconditional

∀P∀xy.(Px ↔ Py) ↔ (x = y) (3.15)

is often called Leibniz' Law. None of these formulas is a wff of first-order predicate
logic. Therefore, rule (3.5) is required, whereas we already know that we could
do without explicit rules for most of the truth-functional connectives once a base
like the Sheffer stroke has been chosen.
The rules (3.11) and (3.12) quantify (in the meta language) over variants
of assignments. This is one way of formally expressing variable binding. Take
for example an evaluation of the formula ∃y.Py in a model M with respect
to an assignment g. This wff is true if there is a y-variant h of g such that
⟦Py⟧M,h = 1, which is the case if and only if I h (y) ∈ I h (P), which is in turn the
case if and only if h(y) ∈ I h (P). Since assignments are defined relative to the
domain D e of a model, the clauses effectively quantify over the objects in D e
while binding the respective variable.
3.2.4 Exercises
✐ Exercise 30 Write definitions as in 3.4–3.12 for the following truth-functions
and quantifiers:
a. the Sheffer stroke
b. the Peirce stroke
c. negated conditional (i.e. corresponding to ¬(φ → ψ))
d. negated biconditional (i.e. corresponding to ¬(φ ↔ ψ))
e. ∃!xφ with reading there is exactly one x such that φ
f. ∃3 xφ with reading there are exactly 3 x such that φ
3.3 Proof Theory
3.3.1 Tableaux Rules for First-Order Predicate Logic
Since the truth-functional connectives work exactly the same as in propositional
logic the tableaux rules from propositional logic can be used in first-order logic
with only minor changes. As far as the truth-functional connectives are concerned, theorems from propositional calculus are also theorems of first-order
predicate logic. But of course additional rules for the quantifiers are needed.
Table 3.1 depicts the tableaux rules for first-order predicate logic, where I have
more or less copied and pasted the corresponding rules from the last chapter.
In the above rules, φ(x) stands for any wff in which x occurs freely one or
more times and φ[x/c] is the same formula as φ except that all free occurrences of
x in φ have been replaced by c. For example, Px and ∀y.P(x, a) → (Qx ∧ R(y, x))
are formulas in which x occurs freely; thus, Px[x/a] = Pa and (∀y.P(x, a) →
(Qx ∧ R(y, x)))[x/c] = ∀y.P(c, a) → (Qc ∧ R(y, c)).5 Notice that in the last formula
we could only have replaced x by a (instead of c) if the corresponding rule was
a universal quantification rule, because the constant a already occurs in the
original formula.
5 Since square brackets are allowed as a substitute for ‘(’ and ‘)’ for better readability, they
are now used for two different purposes. But it shouldn’t be too hard to differentiate between the
two different usages.
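The substitution operation φ[x/c] can itself be sketched in code, reusing the hypothetical tuple encoding of formulas assumed in the earlier sketches. The key point mirrors the definition above: only free occurrences of x are replaced, so a quantifier that binds x stops the recursion.

```python
# φ[x/c]: replace all *free* occurrences of variable x by constant c.
# The tuple encoding of formulas is an illustrative assumption.

def subst(phi, x, c):
    tag = phi[0]
    if tag == "var":
        return ("const", c) if phi[1] == x else phi
    if tag == "const":
        return phi
    if tag == "pred":
        return (tag, phi[1]) + tuple(subst(t, x, c) for t in phi[2:])
    if tag == "not":
        return ("not", subst(phi[1], x, c))
    if tag in ("and", "or", "impl", "iff"):
        return (tag, subst(phi[1], x, c), subst(phi[2], x, c))
    if tag in ("forall", "exists"):
        # occurrences of x inside the scope of a quantifier over x are not free,
        # so substitution does not descend into such a subformula
        if phi[1] == x:
            return phi
        return (tag, phi[1], subst(phi[2], x, c))
    raise ValueError(tag)

# Px[x/a] = Pa
print(subst(("pred", "P", ("var", "x")), "x", "a"))
# → ('pred', 'P', ('const', 'a'))
```

Applied to a formula like ∀x.Px, the function returns it unchanged, since none of the occurrences of x in it are free.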
Rules of propositional calculus:

φ ∧ ψ: add φ and ψ to the branch.
¬(φ ∧ ψ): split the branch into ¬φ and ¬ψ.
φ ∨ ψ: split the branch into φ and ψ.
¬(φ ∨ ψ): add ¬φ and ¬ψ.
φ → ψ: split the branch into ¬φ and ψ.
¬(φ → ψ): add φ and ¬ψ.
φ ↔ ψ: split the branch into (φ, ψ) and (¬φ, ¬ψ).
¬(φ ↔ ψ): split the branch into (φ, ¬ψ) and (¬φ, ψ).
¬¬φ: add φ.

Existential quantification rules, where the constant c must be new on the branch:

∃x.φ(x): add φ[x/c].
¬∀x.φ(x): add ¬φ[x/c].

Universal quantification rules, where t is any ground term:

∀x.φ(x): add φ[x/t].
¬∃x.φ(x): add ¬φ[x/t].

Table 3.1: Tableaux rules for first-order predicate logic without identity.
As you can see from the table, only the rules for the quantifiers are new.
How do they work? Let’s start with existential quantification. When there is a
quantified wff on a branch, say ∃ x.P x, then we can eliminate the quantifier by
introducing a constant, but this constant must be new to the branch. Suppose
it wasn't new to the branch. Then we could, for example, derive a ≠ a from the
claim ∃x∃y.x ≠ y – but the former is a contradiction whereas the latter is clearly
satisfiable! We know that some object, say a, exists such that Pa if ∃x.Px is
true, but ∃x.Px does not allow us to make any additional assumptions about a.
In order not to introduce any unwarranted assumptions, we are only allowed to
introduce a new constant – one about which nothing has been said yet on the
same branch – when simplifying ∃x.Px in a truth-preserving way. The same
applies for the negated universal quantifier. Why is this so? Take for example
the statement ‘not all students hate logic’. How would you confirm that this
claim is true? The answer is fairly obvious. Find a student (in the domain of
the model) who doesn’t hate logic. So it seems that saying ‘not all students
hate logic’ means just the same as saying ‘there is a student who doesn’t hate
logic’. As this consideration shows ¬∀ behaves like an existential quantifier, or
to put it in other terms, a negated universal quantifier actually gives rise to
an instance of existential quantification. This explains why the rule for ¬∀ is
analogous to the one for ∃ and requires an individual constant that is new to
the branch.
In contrast to this, the rule for the universal quantifier ∀ allows you to
introduce any constant for the fairly obvious reason that for example ∀ x.P x
says that any object in the domain satisfies P – including the objects about
which some other claims have been made already. Analogously to the previous
case of existential quantification, the negated existential quantifier actually
gives rise to an instance of universal quantification. To see this, take a correct
literal paraphrasing of a wff involving a negated existential quantifier. Such
paraphrases sound rather clumsy. Consider for example ‘it is not the case
that there is a student who failed the exam’. Speaking less like Mr. Spock
this could be more adequately expressed as ‘all students passed the exam’.
This consideration intuitively explains why the rule for ¬∃ allows choosing
any constant as a replacement for the variable when the negated quantifier is
eliminated in order to simplify the formula. Given a statement such as ¬∃ x.P x
we pick any individual named by a constant, say a, in our language we like
and transform the statement into the claim that a does not satisfy P: ¬Pa.
Negated existentials are hidden instances of universal quantification. The rules
for ∀ and ¬∃ not only place no restriction on the choice of the constant, their
application may also be repeated as often as you like. After all we’re dealing
with universal quantification and, surely, if ∀ x.P x then Pa holds, P b holds, P c
holds, and so forth for all objects in the domain. From the fact that the universal
rules may be repeated as often as one likes it follows that a tree might never
be completed if one or more of its branches contain one or more instances of
the unnegated universal or the negated existential quantifier and, in addition,
there is an infinite supply of constants naming objects in the domain D e . To put
it in other words, when each object in D e has a unique name and D e has
infinitely many members, then the tree cannot be completed.
Notice finally that the rules for universal quantification not only allow substitution by constants, but also substitution by functions that only take constants
as arguments – for instance, g(a) or f (c, d). Why can they be substituted and
why aren’t they allowed in the rules for the existential quantifiers? Take a
look at the definition of the interpretation function I(.) for functions in the
beginning of section 3.2.2. A function symbol is interpreted as a function from
elements in D e , i.e. from D e × · · · × D e , to one element in D e . So if for example
∀ xP x is assumed to hold then P( f (a, b, c)) must also hold, because we know
that I g ( f (a, b, c)) denotes an object in the domain D e . However, in a rule for
existential quantification we cannot assume that there are enough objects to
satisfy the existential claim if we substituted a ground function term instead
of a constant, because we could no longer check that the side condition
is fulfilled. Recall that ∃xφ has the reading 'there are one or more x such that
φ (of x)'. There might only be one such object. Suppose we have ∃xPx on a
branch. If we were allowed to obtain P(f(a)) and later perhaps P(g(a)), this
would be unwarranted, because f(a) and g(a) might denote different entities
but the original wff did not state that there are two objects that satisfy P. We
could allow substitution by one ground term once on the branch, but that would
be unnecessary, because the result of that function could just be named by a
constant c and we could use that constant instead. So it makes sense to only
allow substitution by a constant that is new on the branch in the case of the rules
for existential quantification.
3.3.2 Rules for Identity
According to 3.5 the symbol ‘=’ is interpreted as identity in the meta-language,
but the above tableaux rules don’t specify any special way how to deal with
identity statements. Clearly, identity is not just like any other predicate. In
an axiomatic setting one would add axiom schemes that express the logical
properties of identity, i.e. transitivity, symmetry, and reflexivity because identity
is an equivalence relation. For the tableaux we must add rules that allow us
to take advantage of an identity statement once it has occurred on a branch.
If, for example, a = b occurs on a branch and Pa ∧ ¬∀x[R(a, x)] occurs somewhere
else on the same branch, we should be allowed to write Pb ∧ ¬∀x[R(b, x)].
Moreover, we know that t ≠ t is a contradiction for any ground term t; so if this
occurs on a branch, the branch closes. Table 3.2 shows the additional rules we need.
For any ground terms t, u:

φ, t = u: add φ[t/u].
φ, t = u: add φ[u/t].

Table 3.2: Tableaux rules for identity.
3.3.3 Using the Tableaux Rules
The tableaux rules are used in the same way as those for propositional logic,
with one important difference. Because of the rules for universal quantification,
i.e. the rules for ∀ and ¬∃, the tree might never be completed. Some care is
needed to substitute the right constant when applying one of the universal
rules. Usually, this will be the same constant as was already used before when
applying one of the existential rules. Because of the restriction for the existential
rules, i.e. the rules for ∃ and ¬∀, it is generally a good strategy to first apply
an existential rule and then apply a universal rule whenever there is a choice
between the order of two such rule applications. Otherwise it might be necessary
to apply the same universal rule twice in order to close the branch. When there
are two universal quantifications, it is often a good strategy to use the same
constant for simplifying both of them. Here are a few example proofs.
❉ Example 7 To show: ∀ xP x → ∃ xP x. Proof: We start with the negated wff.
Recall that the formula could also be written as ∀ x[P x] → ∃ x[P x]. The scope of
∀ is narrow and the main junctor is →. So we apply the rule for ¬ → and then
the universal rules for ∀ and ¬∃ with the same constant.
¬(∀ xP x → ∃ xP x)
∀ x.P x
¬∃ xP x
Pa
¬Pa
The tree closes, hence ∀ xP x → ∃ xP x is a tautology. QED.
❉ Example 8 To show: ∃ x¬P x → ¬∀ xP x. Proof: The main junctor of the formula to prove is →. We negate the formula and use tableaux rules:
1. ¬(∃x[¬Px] → ¬∀x[Px])    assumption
2. ∃x.¬Px                  1: ¬→
3. ¬¬∀x.Px                 1: ¬→
4. ∀x.Px                   3: ¬¬
5. ¬Pa                     2: ∃
6. Pa                      4: ∀
The tree closes with 5 and 6, hence ∃ x¬P x → ¬∀ xP x is a tautology. QED.
Notice that it would not have been correct to first use ∀ and afterward ∃, because
the rule for ∃ would have required us to use a new constant instead of a! More
examples can be found in the next section.
Hints.
Remember the following points:
• If you don’t know whether it is a theorem or not, quickly assess the original
formula before using the proof method: Should this hold? Can I find a
counterexample beforehand?
• Whenever possible, use existential rules ∃ and ¬∀ first, universal rules
afterward.
• Use a previously used constant when applying a universal rule ∀ or ¬∃.
• Be prepared to apply a universal rule several times.
• Never ignore the side condition of the existential rules: You need to use a
new constant that does not yet occur on the same branch – no exceptions!
3.3.4 Selected Theorems
T1. ∀ xP x → ∃ xP x
Proof: see above.
T2.
∀ x yR(x, y) ↔ ∀ yxR(x, y)
T3.
∀ x[P x → ∃ yQ y] ↔ ∀ x∃ y[P x → Q y]
T4. ¬∃x[x ≠ x]
Proof:
¬¬∃x[x ≠ x]
∃x[x ≠ x]
a ≠ a
The tree closes, because a ≠ a is a contradiction. QED.
T5.
∀ x[P x → Qx] → (∀ x[P x] → ∀ x[Qx])
T6. ∃xPx → ¬∀x¬Px
Proof:
¬(∃xPx → ¬∀x¬Px)
∃xPx
¬¬∀x¬Px
∀x¬Px
Pa
¬Pa
The tree closes. QED.
T7. ∀x∃y.Px → Py
Proof:
¬∀x∃y[Px → Py]
¬∃y[Pa → Py]
¬[Pa → Pa]
Pa
¬Pa
The tree closes. QED.
T8.
∀ x.P x → ∃ xP x
T9.
∀ x[P x ∧ Qx] ↔ ∀ x[P x] ∧ ∀ x[Qx]
T10. [∀x(Px) ∨ ∀x(Qx)] → [∀x(Px ∨ Qx)]
Proof:
¬([∀x(Px) ∨ ∀x(Qx)] → [∀x(Px ∨ Qx)])
∀x(Px) ∨ ∀x(Qx)
¬[∀x(Px ∨ Qx)]
¬(Pa ∨ Qa)
¬Pa
¬Qa
  ∀x(Px)      ∀x(Qx)
  Pa          Qa
Both branches of the tree close. QED.
T11.
∃ x[P x ∨ Qx] ↔ ∃ x[P x] ∨ ∃ x[Qx]
T12.
∃ x[P x ∧ Qx] → [∃ x(P x) ∧ ∃ x(Qx)]
Proof:
¬(∃x[Px ∧ Qx] → [∃x(Px) ∧ ∃x(Qx)])
∃x[Px ∧ Qx]
¬[∃x(Px) ∧ ∃x(Qx)]
Pa ∧ Qa
Pa
Qa
  ¬∃xPx      ¬∃xQx
  ¬Pa        ¬Qa
Both branches of the tree close. QED.
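Validity claims like the theorems above can also be double-checked mechanically, though not by the tableaux method itself: the sketch below simply enumerates all models up to a small domain size and searches for a counterexample. Finding none is of course no proof of validity, since a counterexample might require a larger domain, but the search quickly catches non-theorems. The encoding and function names are illustrative assumptions, not part of the notes.

```python
from itertools import chain, combinations, product

# Minimal evaluator over the assumed tuple encoding (predicates as sets of tuples).
def ev(phi, domain, I, g):
    tag = phi[0]
    if tag == "pred":
        return 1 if tuple(g[n] if t == "var" else I[n] for t, n in phi[2:]) in I[phi[1]] else 0
    if tag == "not":
        return 1 - ev(phi[1], domain, I, g)
    if tag == "and":
        return min(ev(phi[1], domain, I, g), ev(phi[2], domain, I, g))
    if tag == "impl":
        return max(1 - ev(phi[1], domain, I, g), ev(phi[2], domain, I, g))
    if tag in ("exists", "forall"):
        vals = [ev(phi[2], domain, I, {**g, phi[1]: d}) for d in domain]
        return max(vals) if tag == "exists" else min(vals)
    raise ValueError(tag)

def counterexample(phi, preds, max_size=2):
    """Enumerate every model with domain {0,..,n-1}, n <= max_size, and every
    extension of the unary predicates in `preds`; return a falsifying model
    or None. A finite search, not a proof of validity."""
    for n in range(1, max_size + 1):
        dom = list(range(n))
        tuples = [(d,) for d in dom]
        subsets = list(chain.from_iterable(combinations(tuples, k) for k in range(n + 1)))
        for exts in product(subsets, repeat=len(preds)):
            I = {p: set(e) for p, e in zip(preds, exts)}
            if ev(phi, dom, I, {}) == 0:
                return (dom, I)
    return None

P = lambda v: ("pred", "P", ("var", v))
Q = lambda v: ("pred", "Q", ("var", v))

# T12: ∃x[Px ∧ Qx] → [∃x(Px) ∧ ∃x(Qx)] -- no counterexample in small models
t12 = ("impl", ("exists", "x", ("and", P("x"), Q("x"))),
       ("and", ("exists", "x", P("x")), ("exists", "x", Q("x")))) 
print(counterexample(t12, ["P", "Q"]))  # None

# The converse is NOT a theorem: a two-element model falsifies it
conv = ("impl", ("and", ("exists", "x", P("x")), ("exists", "x", Q("x"))),
        ("exists", "x", ("and", P("x"), Q("x"))))
print(counterexample(conv, ["P", "Q"]) is not None)  # True
```

The falsifying model for the converse is exactly what one would construct by hand: a domain {0, 1} with P = {0} and Q = {1}.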
3.3.5 Exercises
✐ Exercise 31 Prove the following theorems.
a. T2
b. T3
c. T5
d. T8
e. T9
f. T11
✐ Exercise 32 Check, using semantic tableaux, whether the following formulas are valid or not.
a. P x ∨ ¬P z
✐ Exercise 33 Prove that the following holds.
a. (Hodges 1977) Every irreflexive and transitive binary relation is asymmetric:
∀ x.¬Rxx, ∀ x yz.[Rx y ∧ R yz] → Rxz ⊢ ∀ x y.Rx y → ¬R yx
3.4 Defined Notions
3.4.1 Russellian Descriptions
Going back to work by Russell, the iota operator is a term-building operator
that takes a formula and yields the unique object that satisfies this formula. A
term ιx.Px is read as the x such that P of x. It can be defined as follows.

Syntax: If φ ∈ L S and x ∈ L V then ιxφ ∈ L T. (3.16)

Semantics: I g (ιxφ) = h(x) if there is exactly one h ≈x g such that
⟦φ⟧M,h = 1, and undefined otherwise. (3.17)
However, this makes I(.) a partial function from terms to denotations. If there
is more than one object satisfying φ or there is no object satisfying φ, then I g (ιxφ)
is undefined. Consequently, rule 3.4 on page 72 would have to be adjusted as
follows.

⟦P(α1, . . . , αn)⟧M,g = 1 if all of I g (α1), . . . , I g (αn) are defined and
〈I g (α1), . . . , I g (αn)〉 ∈ I g (P), and 0 otherwise. (3.18)
There is a way to achieve the same effect as with the iota operator but without
resorting to partial interpretations. We can define the following two-place
quantifier.

ιx[φ]ψ := ∃x[φ ∧ ∀y(φ[x/y] → x = y) ∧ ψ] (3.19)

This can be read as the x that uniquely satisfies φ also satisfies ψ or in a similar
way.6 The formulas P(ιx.Qx) and ιx[Qx]Px are equivalent in any model, and
so the quantifier can replace uses of the iota operator in predicative clauses.
Strangely enough, there doesn't seem to be any common name for this two-place
quantifier even though it has been well known since Russell's time. I have used and
will continue to use the term 'iota quantifier' for it.
6 One advantage of the iota operator over the quantifier is that it is easier to paraphrase. It
is hard to express unambiguously in a paraphrase of the above quantifier that the uniqueness
condition is only put on φ, not on ψ.
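The partiality in (3.17) is easy to see in code. In the sketch below (an illustration, not the notes' formalism) the formula is represented for brevity as a Python predicate over the domain, and None plays the role of 'undefined':

```python
# I(ιxφ) as a partial function: return the unique witness, or None ("undefined")
# when there is no witness or more than one.

def iota(domain, phi):
    witnesses = [d for d in domain if phi(d)]
    return witnesses[0] if len(witnesses) == 1 else None

domain = [1, 2, 3]
print(iota(domain, lambda d: d > 2))   # 3 -- exactly one witness
print(iota(domain, lambda d: d > 1))   # None -- witness not unique
print(iota(domain, lambda d: d > 5))   # None -- no witness at all
```

The adjusted predication rule (3.18) then corresponds to checking all argument values for None before testing set membership.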
☞ Note 11 (Definability, Characterizability.) When an expression
such as the iota quantifier is definable as an abbreviation like 3.19 in
first-order logic, we say that it is first-order definable. Generally, proofs
that an expression with a certain semantics can be defined with the
means of a given logical system are called characterization results and
play an important role in logical research, because they are one way to
circumscribe the expressive power of a formal system. Not all natural
language quantifiers are first-order definable. For example, the quantifier
‘most’, whose definition was given in set-theoretic terms in note 3 (on page 7
of chapter 1), is not first-order definable.
3.4.2 Relativized Quantifiers
It is possible to define relativized quantifiers, which are restricted to a certain domain of objects. Let for example D(x) be a special domain predicate. Relativized
quantifiers could then be defined as:7

∀*xφ := ∀x[Dx → φ] (3.20)
∃*xφ := ∃x[Dx ∧ φ] (3.21)
D is a quantifier domain restriction for the quantifiers. This restriction could
for example be regarded as being contextually provided in order to get a more
reasonable account of natural language quantification. For one may argue
that an utterance like 3.22 doesn’t mean that every student on earth is in the
classroom and it is hard to imagine a context in which an utterance of 3.23 is
meant to be read as every bottle in the universe is empty.
(3.22) Every student is in the classroom.
(3.23) Every bottle is empty.
However, we are now speaking of utterances instead of sentences, and no specific
contextual resolution mechanism is provided by an analysis based on a simple
domain predicate. The status of quantifier domain restrictions and similar
phenomena regarding the semantics/pragmatics distinction, and the correct way
of modeling them on the basis of a finite lexicon and grammar, are still subject
to philosophical debate. Nevertheless it is commonly assumed in linguistics
that natural language quantifiers are contextually restricted. Using relativized
quantifiers is one means of achieving this.
7 Why the definition for the universal quantifier looks different from the one of the existential
quantifier will be explained in detail further below.
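Definitions 3.20 and 3.21 amount to a purely syntactic rewriting, which can be sketched as follows (reusing the hypothetical tuple encoding of formulas; "D" stands for the domain predicate):

```python
# Rewrite every quantifier as its relativized version:
#   ∀*xφ := ∀x[Dx → φ]   and   ∃*xφ := ∃x[Dx ∧ φ]

def relativize(phi):
    tag = phi[0]
    if tag in ("var", "const", "pred", "eq"):   # atoms contain no quantifiers
        return phi
    if tag == "not":
        return ("not", relativize(phi[1]))
    if tag in ("and", "or", "impl", "iff"):
        return (tag, relativize(phi[1]), relativize(phi[2]))
    if tag == "forall":   # (3.20): restriction via a conditional
        return ("forall", phi[1],
                ("impl", ("pred", "D", ("var", phi[1])), relativize(phi[2])))
    if tag == "exists":   # (3.21): restriction via a conjunction
        return ("exists", phi[1],
                ("and", ("pred", "D", ("var", phi[1])), relativize(phi[2])))
    raise ValueError(tag)

print(relativize(("forall", "x", ("pred", "P", ("var", "x")))))
# → ('forall', 'x', ('impl', ('pred', 'D', ('var', 'x')), ('pred', 'P', ('var', 'x'))))
```

The asymmetry between the conditional in (3.20) and the conjunction in (3.21) is visible directly in the two quantifier branches.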
3.4.3 Many-sorted Logic
Instead of one domain D e for objects you could introduce another domain, say
D s for situations, add variables s, s1, s2, . . . for situations, and an extra pair of
quantifiers that take situation variables and range over D s . This implementation
of first-order logic would be two-sorted. In the same manner any number of other
extra domains could be added, making the logic many-sorted.
While having many different variables with corresponding quantifiers can be
handy from a notational point of view, it does not add anything to the expressivity
of the logic in general. Instead, you may introduce a domain predicate into the
object language for each sort and define corresponding restricted quantifiers by
abbreviation. Whether many-sorted first-order logic (for finitely many sorts) or
relativized quantifiers are used is merely a matter of convenience.
3.4.4 The Existence Predicate
Nothing in the formulation of first-order logic prevents us from introducing an
existence predicate and possibly also define restricted quantifiers on the basis of
it just like in 3.20 and 3.21. Let E(x) be such a unary existence predicate with
intended interpretation x exists. We can then represent 3.24 as 3.25, where a is
a constant for Santa Claus and P(x) a predicate with reading x has a long, white
beard.
(3.24) Santa Claus doesn’t exist but he has a long, white beard.
(3.25) ¬Ea ∧ Pa
Using the relativized quantifiers we may quantify over existing objects, whereas
the unrestricted quantifiers run over all objects.
There is a long philosophical tradition of criticizing such uses of an existence
predicate. Some logicians have a strong philosophical conviction that actualism
is the right position towards non-existence: We can only ascribe (positive)
properties to things that exist. In contrast to this, allowing meaningful talk
about nonexistent objects is a decidedly possibilist position. One might ask how
we can know which properties a particular nonexistent object has. This concern
is legitimate and many different answers have been given to it, but at one point
or another one has to opt for a possibilist position if one is interested
in natural language semantics. It is pointless to claim, from a fundamentalist
point of view, that 3.24 cannot be true despite the fact that we commonly regard
it as being true. Theories of fictional objects deal with this problem.
Another concern with the existence predicate is based on the idea that unsound proofs like the ontological proof for the existence of God mentioned in
section 2.4 of the last chapter would go through if existence were a predicate.
Despite its persistence this criticism is not justified. First, the fact that the
3.5. APPLICATIONS TO NATURAL LANGUAGES
85
concludion of a valid argument that is intuitively judged sound might be undesirable or appear to be implausible doesn’t for itself suffice for challenging
any of its premises, although we sometimes take an implausible conclusion of a
valid argument as an indicator that something must be wrong with one of its
premises. The mere conclusion that God exists doesn’t seem to suffice to turn
the argument into a reductio ad absurdum. From a logical point of view, in a
reductio ad absurdum we merely assume the premises without considering them
plausible and then derive a contradiction, whereas in this case some people
might merely find the conclusion somewhat implausible (depending on their
beliefs). Second, it is an open question whether the premises of (particular
versions of) the ontological argument are sound. Saying that the most supreme
being must exist, because not to exist is a flaw, is not substantially different
from saying that Erich’s best toaster must exist, because a nonexistent toaster
cannot be the best toaster. This premise is not true and, moreover, I don’t have
a toaster. But if the premises of the ontological argument are not true anyway,
then it doesn’t matter whether they establish the conclusion or not. Third, ‘to
exist’ is a verb that in finite clauses occurs as a grammatical predicate; there is
no linguistic evidence that it should be translated to anything else than a logical
predicate.
Two uses of the existence predicate have to be distinguished: In a strict
actualist setting the existence predicate must be reducible. ‘Reducible’ here
means that a rule P t ⊢ Et is added and ⊢ Et holds for any term t. This makes
the existence predicate redundant, but it may still be used to analyze 'to exist'.
In a possibilist setting, on the other hand, no restriction is put on the existence
predicate and the existence predicate has no special logical properties. It is just
another predicate with a certain intended meaning and simply divides the total domain D e into two halves: the set of objects a for which Ex is true relative to M and g[x/a], and the set of objects for which it is false. This use of the existence predicate is not
redundant but also does not change the logic, because in this use the existence
predicate has no special logical properties. This use is exemplified by 3.25.
3.5 Applications to Natural Languages
3.5.1 Truth-Conditions and Pre-Montagovian Semantics
Before Richard Montague and others popularized higher-order logic and categorial grammar in linguistics at the end of the sixties and the beginning of the seventies of the last century, sentence-level semantics was mostly based on
the specification of truth-conditions of sentences in first-order logic. In fact, a
large number of natural language expressions have first-order characterizable
truth-conditions and it is possible to specify reasonable semantic representations of many natural language sentences in first-order predicate logic. The
representations are less elegant than the ones in higher-order logic that will be
discussed in the next chapter and it is very hard to provide a mechanical mapping from natural language to first-order logic, because many natural language
constructions can only be expressed by ‘tricks’ in first-order logic. We will take a
look at some examples and some of these ‘tricks’ in the following paragraphs.
The Main Verb. A main verb with n obligatory argument places can be
represented by an n-ary relation. For example, ‘give’ may be represented by a
predicate P(x, y, z) with reading x gives y to z or ‘to buy’ may be represented by
a predicate P(x, y, z, z′ ) with reading x buys y from z at the price z′ .
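The relational representation of a main verb can be sketched in Python. The domain, the names, and the extension of 'give' below are illustrative assumptions, not part of the text:

```python
# A toy model: the ternary 'give' relation P(x, y, z) as a set of triples.
# Domain and extension are hypothetical, chosen only for illustration.
D = {"erich", "maria", "joao", "book"}       # domain of individuals
GIVE = {("erich", "book", "maria")}          # x gives y to z

def give(x, y, z):
    """P(x, y, z): true iff the triple is in the interpretation of 'give'."""
    return (x, y, z) in GIVE

print(give("erich", "book", "maria"))  # True
print(give("maria", "book", "erich"))  # False
```

The same pattern extends to 'to buy' with a four-place relation, one tuple position per obligatory argument.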
Proper Names. A proper name can be represented by a constant, e.g. ‘Erich’
may be represented by a, 'Maria' by b, 'João Gonçalves' by c, and so forth. Alternatively, one could use a unary predicate in combination with an iota operator
or iota quantifier. For example, ‘João laughs’ could be expressed as L(ι x.P x) or
ι x[P x]Lx. This use of descriptions for proper names was advocated by Russell
and later criticized by Saul Kripke in Naming and Necessity on the basis of
philosophical and linguistic intuitions about how proper names are understood
in modal claims such as ‘Aristotle might not have been the teacher of Alexander
the Great’. There is an extensive literature about this topic and in the aftermath
of Kripke’s work proper names are most commonly represented by constants.
Sentence Connectives. It is natural to translate ‘and’ and ‘but’ to ∧, ‘or’ to
∨, ‘if. . . then. . . ’ to → and so on – as long as you keep in mind that there are
many exceptions to these rules!
Quantifiers. 'All' and 'every' can be expressed by ∀ and English 'a' by ∃. Some, though by no means all, uses of the English determiner 'the' can be represented by
the iota quantifier or operator. Other quantifiers like ‘exactly three’ are also
first-order definable. Others like ‘most’ cannot be expressed in first-order logic.
☞ Note 12 (Correct Use of ∃ and ∀) As a general rule when using ∀ it
must be combined with the conditional in order to get the intended reading.
For example, 'all men are mortal' becomes ∀ x[Man(x) → Mortal(x)]. The
existential quantifier ∃, on the other hand, usually needs to be combined
with conjunction in order to get the intended reading. For example, ‘there is
a mortal man’ becomes ∃ x[Man(x) ∧ Mortal(x)].
Why is this so? It is clear in case of the existential quantifier that in order
to be a mortal man you need to be mortal and a man. (This translation of
adjectives does not work in general, though. See below.) But if we used conjunction for 'all men are mortal' we would get a completely inadequate reading: ∀ x[Man(x) ∧ Mortal(x)] says that all things in the domain of the model are both men and mortal, which is clearly not what 'all men are mortal' means.
Recall from chapter 1 that set-theoretically ‘all men are mortal’ can
be represented as Men ⊆ Mortals, i.e. the set of men is a subset of the set
of mortal things. ∀ x[Man(x) → Mortal(x)] encodes exactly the same truth
conditions. Suppose Men = ∅. Then the above statement is true, because ∅ ⊆ A for any set A. Likewise, if Man(x) turns out false, Man(x) → Mortal(x) is true. If on the other hand Men ≠ ∅ then if Men ⊆ Mortals holds, Mortals ≠ ∅ must hold as well – because obviously a non-empty set cannot be a subset of the empty set. Likewise, if ∀ x[Man(x) → Mortal(x)] and ∃ x.Man(x) (which
could be read as ‘the set of men is non-empty’), then it is also the case that
∃ x.Mortal(x) (which could be read as ‘the set of mortal things is non-empty’).
Notice also the following equivalences and the role that negation plays in them:

∀ x[P x → Qx] ↔ ¬∃ x¬[P x → Qx]    (3.26)
              ↔ ¬∃ x¬[¬P x ∨ Qx]    (3.27)
              ↔ ¬∃ x[¬¬P x ∧ ¬Qx]    (3.28)
              ↔ ¬∃ x[P x ∧ ¬Qx],    (3.29)

and

∃ x[P x ∧ Qx] ↔ ¬∀ x¬[P x ∧ Qx]    (3.30)
              ↔ ¬∀ x[¬P x ∨ ¬Qx]    (3.31)
              ↔ ¬∀ x[P x → ¬Qx].    (3.32)
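The ∀/→ and ∃/∧ translation rules and the duality equivalences can be checked mechanically. The following sketch brute-forces every interpretation of two unary predicates over a small domain {0, 1, 2}; the domain size is an illustrative choice:

```python
from itertools import combinations, product

D = [0, 1, 2]

def forall(pred):
    return all(pred(x) for x in D)

def exists(pred):
    return any(pred(x) for x in D)

# Every interpretation of the unary predicates P and Q is a subset of D.
subsets = [set(s) for r in range(len(D) + 1) for s in combinations(D, r)]

for P, Q in product(subsets, repeat=2):
    p = lambda x: x in P
    q = lambda x: x in Q
    # 'All P are Q': the subset claim coincides with the ∀/→ translation
    # and with its dual ¬∃x[Px ∧ ¬Qx].
    assert (P <= Q) == forall(lambda x: (not p(x)) or q(x))
    assert forall(lambda x: (not p(x)) or q(x)) == (not exists(lambda x: p(x) and not q(x)))
    # The ∃/∧ translation and its ∀-dual.
    assert exists(lambda x: p(x) and q(x)) == (not forall(lambda x: (not p(x)) or (not q(x))))

print("all equivalences hold on every model over D")
```

The material conditional P x → Qx is encoded as `(not p(x)) or q(x)`, which is also why the empty-antecedent cases come out true.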
Adjectives. Some adjectives can be represented in a straightforward way
while others are pretty hard to express out-of-the-box in first-order logic. For
example, ‘red ball’ may be represented as Rx ∧ Bx (‘R’ for redness and ‘B’ for
being a ball), but such a conjunctive analysis is inadequate for ‘famous pianist’.
Why? Someone can be famous for his virtuosity with the sledgehammer and at the same time be a lousy pianist, but this doesn't make him a famous pianist.
Indexicals and Anaphora. As long as they stand for ordinary individuals,
indexicals and anaphora can be expressed in first-order logic either as open
variables or as functions from some suitable type of context ‘individual’ to an
ordinary object. For example, ‘I’ in ‘I’m hungry’ could be represented as Hx with
the convention that x represents the speaker of the utterance and ‘he’ in ‘John is
hungry. He eats a banana’ could be represented as Ha ∧∃ x∃ y.B y ∧ E(x, y) ∧ x = a.
In a more elaborate account, one could stipulate that contexts (whatever they
are) are in D e and express ‘I’m hungry’ as ∃ x.H( f (x)), where f in the model
represents a function taking an utterance situation and yielding the speaker
in that utterance situation. For a more convenient notation it would also be
possible to introduce an additional domain D c for contexts and make the logic
two-sorted.
Tenses. As in the case of adverbs, tenses require the introduction of time
intervals or situations, events, or other objects on which a temporal ordering
relation is defined. When time intervals or other temporally ordered entities
are available a denotational semantics for the basic absolute tenses can be
implemented. Suppose we modify FOL and introduce a new sort of variables t,
t 1 , t 2 ,. . . for time intervals with corresponding quantifiers ∃ t and ∀ t . Let < be a
temporal ordering of the domain of time intervals D t in the meta language such
that t 1 < t 2 if t 1 ends before t 2 starts. We can then add an argument place for a
time interval to each predicate that corresponds to a finite verb and formulate
rules for basic absolute tenses:
Past(t 1 , t 2 )^(M,g) = 1 if g(t 1 ) < g(t 2 ), and 0 otherwise    (3.33)

Fut(t 1 , t 2 )^(M,g) = 1 if g(t 2 ) < g(t 1 ), and 0 otherwise    (3.34)
Assuming a convention that the variable t 0 is interpreted as the time of utterance we may then express 3.35 as 3.36 and 3.37 as 3.38.
(3.35) John was hungry.
(3.36) ∃ t 1 .Past(t 1 , t 0 ) ∧ Hungry(t 1 , j)
(3.37) John will meet Mary.
(3.38) ∃ t 1 .F ut(t 1 , t 0 ) ∧ Meet(t 1 , j, m)
For a more detailed treatment of tenses and aspect more relations between time
intervals are needed. We need to be able to express that one time interval is a
subinterval of another and when two time intervals overlap. These relations are
first-order axiomatizable. See van Benthem (1991) The Logic of Time for more
details.
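Rules 3.33 and 3.34 can be sketched with time intervals encoded as (start, end) pairs; the particular intervals, the extension of Hungry, and the numeric encoding are illustrative assumptions:

```python
# Time intervals as (start, end) pairs; t1 precedes t2 iff t1 ends
# before t2 starts, mirroring the ordering < of the meta language.
def precedes(t1, t2):
    return t1[1] < t2[0]

def Past(t1, t2):
    return 1 if precedes(t1, t2) else 0

def Fut(t1, t2):
    return 1 if precedes(t2, t1) else 0

t0 = (10, 11)                      # time of utterance (hypothetical)
intervals = [(3, 5), (10, 11), (20, 21)]
HUNGRY = {((3, 5), "john")}        # extension of Hungry(t, x), hypothetical

# 'John was hungry' as in 3.36: ∃t1[Past(t1, t0) ∧ Hungry(t1, john)]
was_hungry = any(Past(t1, t0) and (t1, "john") in HUNGRY for t1 in intervals)
print(was_hungry)  # True
```

The future-tense rule works the same way with the order of the two intervals swapped.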
3.5.2 Some Problems
Let me now give a few examples that illustrate the complexity of natural languages and show that even simple sentences may elicit complicated semantic problems whose solutions may be approached in different ways. There are numerous similar examples, of course, and the work of the general semanticist is to a large extent to classify different readings of natural language expressions and somehow determine how they might be represented in first-order logic, higher-order logic, or using other tools.
Belief and other Attitudes. It is very hard to correctly implement belief
and similar attitudes, including the meaning of factive verbs such as ‘to know’,
in out-of-the-box first-order predicate logic. They are usually implemented on the basis of modal logic and are generally classified as so-called intensional verbs.
Consider the following sentence:
(3.39) John believes that Mary is hungry.
The verb ‘to believe’ is clearly not truth-functional. The truth value of the whole
sentence does not directly depend on the truth-value of the embedded sentence.
It is possible to introduce an operator into first-order logic that builds a name
for a wff. Suppose – for the sake of the argument and without going into the details – that ⌜φ⌝ builds a constant for the formula φ. Then we can analyze belief as a predicate as in the following formula:
belief as a predicate as in the following formula:
Bel( j, ⌜H(m)⌝)    (3.40)
However, this way of analyzing attitudes is not very popular for a variety of
reasons. First, it relates John to a formula of a specific formal language. But
why? Perhaps John doesn’t know anything about logic, but the relation is really
stipulated to hold between him and the formula itself. Shouldn’t such a belief
relation rather be a relation between John and what the formula means? The
formula itself is just a string of symbols! Second, a belief predicate of this kind
only works as long as no strong introspection principles are assumed. Such
principles do, for example, assert that whenever someone believes that φ, he
also believes that he believes that φ (positive introspection) or that whenever
someone does not believe that φ he believes that he does not believe that φ
(negative introspection). Another such principle is for example needed for the factivity of 'to know', namely that if someone knows that φ, then φ is the case (the factivity of knowledge). These principles are
available in different versions of modal logic and can be integrated into first-order predicate logic by embedding modal logics into it. The resulting systems are known under the label first-order modal logic.
Montague (1974) showed that when a belief predicate is added to first-order
predicate logic, a way of quantifying over embedded propositions is added (as in
‘Everything John believes is false’ ∀ p[Bel( j, p) → ¬ p]), and sufficiently strong
introspection principles are assumed, then the logic becomes inconsistent.8
These problems can be circumvented with some fairly technical trickery and
then the syntactic treatment of belief is more expressive than the modal logical
one (see Bolander 2003). Nevertheless, the usual, established way to deal with
so-called propositional attitudes like belief is to use a system of modal logic (often
one called KD45) or more elaborate approaches that have descended from modal
8 Montague, R. (1974). Syntactical Treatments of Modality, with Corollaries on Reflexion Principles and Finite Axiomatizability. In Thomason, R. (ed.), Formal Philosophy: Selected Papers of Richard Montague (pp. 287–302). Yale University Press.
logic. The bottom line of this section is that you ought not to attempt to analyze
belief, knowing that, and other attitudes that take an embedded sentence as
complement (e.g. ‘to fear’, ‘to doubt’) as predicates unless you know what you’re
doing.
Asymmetric Conjunction. Asymmetric conjunction is a special reading of
a use of the seemingly innocuous word ‘and’ at sentence level. Consider the
following utterances:
(3.41) Lea grabbed the bottle of whiskey and took a big sip.
(3.42) Lea took a big sip and grabbed the bottle of whiskey.
Speakers tend to read 3.41 such that Lea first grabbed the bottle of whiskey and
then took a big sip out of that bottle of whiskey, whereas this interpretation is not
so readily available in 3.42. Traditional truth-conditional semanticists would
consider this a pragmatic phenomenon as in the contrastive reading expressed
by ‘but’. On the other hand, it is pretty hard to interpret 3.41 in a way such that
Lea took a big sip out of a completely different bottle.
Adverbs. Without introducing contexts, situations, events, or possible worlds, specifying an adequate truth-conditional meaning for adverbs is difficult or even
impossible in classical first-order logic. Sentence adverbs like ‘presumably’ or
‘timely’ have a complicated meaning and modify the meaning of the sentence
as a whole just like tenses and moods. Even adverbs that only modify a verb
or adjective are hard to implement in first-order logic, because it doesn’t allow
us to quantify over predicates or express a function from a predicate to a new
predicate. Consider for example the intensifier ‘very’ in ‘Maria walks very fast’.
At least prima facie it seems that ‘very’ modifies the meaning of ‘fast’ – and it
would be mind-boggling to think of this as a conjunctive condition, because there
are no ‘very’ things or events.
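The point about 'very' can be made concrete once predicate modifiers are allowed: a modifier is a function from a predicate to a predicate, which is exactly what first-order logic cannot express. The threshold meaning of 'fast' and the scaling used by 'very' are toy assumptions for illustration only:

```python
# 'very' as a function from predicates to predicates - a higher-order
# object. Both the threshold for 'fast' and the scaling factor are
# hypothetical stand-ins for a serious degree semantics.
def fast(speed_kmh):
    return speed_kmh > 8          # walking faster than 8 km/h counts as 'fast'

def very(pred):
    # toy meaning: 'very P' holds if P would hold even at a reduced degree
    return lambda speed: pred(speed) and pred(speed / 1.5)

walks_fast = fast(10)             # True
walks_very_fast = very(fast)(10)  # 10/1.5 is not > 8, so False
print(walks_fast, walks_very_fast)
```

Note that `very` neither denotes a set of individuals nor a truth function; it takes the predicate itself as an argument, which is the move first-order logic forbids.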
Counterfactual Conditionals. Consider counterfactual conditionals like the following one.
(3.43) If Kennedy had pressed the button, the world would have been destroyed
in a nuclear Armageddon.
This sentence cannot be represented by an ordinary truth-functional conditional,
because it involves certain deliberations about counterfactual scenarios. Many
different non-classical logics for counterfactual conditionals have been proposed
and there is not much of an agreement about what exactly the truth-conditions
of sentences like 3.43 are and whether they are first-order definable or not.9
NP-Conjunction. Consider the following uses of 'and' for NP-conjunction:
(3.44) Ontem Maria e João foram no cinema.
(3.45) John and Peter dragged the washing machine six floors over the stairs to
my apartment.
Ordinary truth-functional conjunction in first-order logic only allows us to
combine sentences with sentences. But even if we ignore for a moment how to
obtain it in a systematic way from the natural language sentence, does 3.44
even allow a reading as Yesterday Maria went to the cinema and yesterday João
went to the cinema (but not necessarily to the same cinema or together)? Can it express a
stronger condition, saying Yesterday Maria and João went to the cinema together,
and if so, how can this be represented in FOL? Moreover, anyone who has ever
carried a washing machine will be aware that only the stronger, second reading
can be present in 3.45. But how is this reading triggered and is it part of the
truth-conditional meaning of the sentence?
3.5.3 Deductive Arguments
Deductive arguments are represented and defined the same way as in propositional logic except that the first-order quantifiers are now available. For example,
we can now represent the following classical argument.
(3.46) All men are mortal.
(3.47) Socrates is a man.
(3.48) Hence, Socrates is mortal.
The translation to first-order predicate logic is: (∀ x[P x → Qx] ∧ Pa) → Qa. Using
first-order tableaux we can prove the validity of the argument by assuming the
premises in the antecedent of the main conditional, denying the consequent,
and checking that the tree closes.
9 Bear in mind that first-order definability is mostly a matter of the quantification involved.
In one account of counterfactuals, 3.43 would be true if in the most plausible scenarios in which Kennedy has pressed the button there is a nuclear Armageddon.
1. ∀ x[P x → Qx]    (premise)
2. Pa    (premise)
3. ¬Qa    (negated conclusion)
4. Pa → Qa    (from 1, instantiated with a)
5. ¬Pa | Qa    (branching from 4; the left branch closes with 2, the right branch closes with 3)
The tree closes, hence the argument is valid. Again, the soundness of the
premises must be established in order to make the argument convincing at
all. Notice that this task is particularly difficult when universal quantification
is at play, i.e. either ∀ or ¬∃ are used. While from a strictly logical point of
view one counterexample to a universal claim suffices to disprove the universal
claim it is in many real-world cases almost impossible to show that no such
counterexample exists. That is the reason why fallibilists like Karl Popper have
emphasized that empirical scientific theories – which usually involve universal
statements – cannot be verified once and for all. They can only be confirmed
by positive evidence and perhaps later disproved by counterexamples. From a
practical perspective, this is a good rule of thumb for many scientific theories
even if the domain is restricted and ultimately finite. For example, a biologist
might not be able to check whether animals of a certain species have a certain
property and be absolutely certain that he didn’t miss one, although there are
only finitely many beings on earth. In many other cases, however, the domain
is not only clearly finite but it is also practical to make use of this fact. For
example, I can easily verify a universal claim about the coins in my wallet by
exhaustively checking each one of them.
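Since the argument is valid, no countermodel exists in any domain. The following sketch confirms this by brute force for all models with up to three elements; the size bound is an illustrative choice, and it is the tableau proof, not the search, that covers all domains:

```python
from itertools import combinations, product

# Brute-force search for a countermodel to (∀x[Px → Qx] ∧ Pa) → Qa:
# a model where both premises are true but the conclusion is false.
def countermodel_exists(max_size=3):
    for n in range(1, max_size + 1):
        D = list(range(n))
        # every possible extension of the unary predicates P and Q
        subsets = [set(s) for r in range(n + 1) for s in combinations(D, r)]
        for P, Q, a in product(subsets, subsets, D):
            premises = all((x not in P) or (x in Q) for x in D) and (a in P)
            if premises and a not in Q:
                return True
    return False

print(countermodel_exists())  # False: no countermodel found
```

The same exhaustive strategy is what verifies a universal claim over a small finite domain, like the coins in a wallet; it is only for large or open-ended domains that exhaustive checking becomes impractical.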
3.5.4 Exercises
✐ Exercise 34 Using the predicates Sx (‘x can solve this problem’), Mx (‘x
is a mathematician’), J x (‘x is Joe’), translate the following arguments into
first-order predicate logic and check whether they are valid using semantic
tableaux. (taken from Simpson (2004))
a. If anyone can solve this problem, some mathematician can solve it. Joe is a
mathematician and cannot solve it. Therefore, nobody can solve it.
b. Any mathematician can solve this problem if anyone can. Joe is a mathematician and cannot solve it. Therefore, nobody can solve it.
✐ Exercise 35 Define predicates and relations with adequate readings for
each of the following sentences and formulate truth-conditions for at least one
reading of each of them in first-order predicate logic. (Tenses and possessive
pronouns like ‘his’ should be ignored.)
a. Peter likes Mary even though she can’t stand him.
b. O Pedro tem um gato ou um cão.
c. Either Ana or Maria has a car.
d. Every sailor has a ship.
e. All Cretans except Epimenides are liars.
f. Everything Midas touches turns into gold.
g. O Pedro dá o seu livro à Ana.
h. Anyone who counterfeits this $10 bill has to pay a fine or go to prison.
i. It was the gardener or the butler who killed Lady Buttersworth. If it was the
gardener then the housemaid was an accomplice. If it was the butler then
the taxi driver must have seen him. The taxi driver did not see the butler.
Hence, Lady Buttersworth was killed by the gardener and the housemaid
was an accomplice.
j. Not everything made of gold shines.
k. O Afonso gosta de peixe porque peixe é saudável.
l. No student deserves a beer unless he has finished his logic exercises.
m. Cada profesor tem um carro mas há estudantes que não têm um.
n. Se este livro não conter um erro os estudantes o encontram.
✐ Exercise 36 So-called donkey sentences have been one of the main motivations for developing dynamic predicate logic (DPL). Here is such a sentence:
(3.49) Every farmer who owns a donkey beats it.
Provide a wff that expresses the correct truth-conditions for (at least) one
reading of the sentence.
✐ Exercise 37 This is one of my favorite ‘joke’ proofs:
Nothing is better than a steak.
A salad is better than nothing.
A salad is better than a steak.
a. What is the main mistake in the argument? Why does the conclusion not follow?
b. Formalize the alleged proof and prove that the result is contradictory.
✐ Exercise 38 Translate the following deductive argumentation fragments
into first-order predicate logic, idealizing the statements as is deemed appropriate, and
determine whether they are deductively valid or based on a fallacy.
a. Anyone who eats animals is evil. If someone is a vegetarian then he does
not eat animals. John is not a vegetarian or he just accidentally ate a big,
yummy steak. Therefore, John is evil.
b. Eating dolphins does not amuse fourteen year old girls. João just ate a
dolphin and it amused Patricia. Therefore, Patricia is not a fourteen year old
girl.
c. Smoking marihuana is prohibited to anyone unless he has a painful medical
condition. Jack does not have a painful medical condition. Therefore, Jack is
not allowed to smoke marihuana.
d. Every real American who owns a TV also owns a car. Everyone who owns a
car earns more than $ 100000 a year. The Jacksons don’t make more than $
100000 a year and don’t even own a TV. Hence, they are not real Americans.
e. Every decent chap likes coffee or cigarettes. Jacky, a convicted mass-murderer
and baby-eater, likes coffee and cigarettes. Therefore, Jacky is a decent chap.
f. Mushrooms can cause hallucinations. Maria had scrambled eggs with mushrooms, orange juice, and a slice of toast for breakfast and is hallucinating.
Therefore, the mushrooms caused her hallucinations.
3.6 Metatheorems
First-order predicate logic is sound, complete, and compact. However, first-order
predicate logic is undecidable, or, more precisely, it is semidecidable. Being
semidecidable means the following: While there is a terminating procedure for
determining that a given formula is valid (if it is one), there is no procedure that
determines in finite time that a given formula is not valid.10
According to the Löwenheim-Skolem theorem any satisfiable first-order formula is satisfiable in a countable domain, i.e. in a domain whose cardinality is
not larger than ℵ0 .11 When this was first proved it was a matter of great concern
to many logicians. In side note 1 (page 6, chapter 1) it was mentioned that the
cardinality of the set of real numbers is higher than ℵ0 . However you put it, there
10 Being undecidable means that there is neither a terminating procedure for determining
that a given formula is valid nor one for determining that it is not valid. Since in the case of a
semidecidable logic for a formula whose status with respect to validity is unknown there is no
procedure that is guaranteed to decide in finite time whether a given wff is valid or not – such
a procedure might not halt – the difference between semi- and full decidability doesn’t matter
much and first-order logic is often just labeled as being undecidable.
11 Obviously, it could be smaller in case the formula is satisfiable in a finite domain.
are ‘more’ real numbers than natural numbers, because you cannot create a
bijection between N and R. Now it follows from the Löwenheim-Skolem theorem
that any (consistent) arithmetic theory of real numbers formulated in first-order
logic has a countable domain, i.e. all the formulas of the theory are satisfiable
in a countable domain even though they are supposed to characterize the real
numbers. (This is sometimes referred to as ‘Skolem’s paradox’.) First-order logic
cannot distinguish between countable and uncountable domains.
3.7 Literature
Semantic tableaux are also sometimes called Smullyan calculi, because Raymond Smullyan pioneered them. He has written numerous books on logic
including ones intended for a general audience. His logical puzzles have entertained generations of professionals and laymen.
• Raymond M. Smullyan (1995). First-order Logic. Dover.
Smullyan’s introduction is a classic and highly recommended. I have also
already praised Hodges’ introduction in the last chapter, which is particularly
well-suited for linguists.
• Wilfried Hodges (1977, 2001). Logic: an introduction to elementary logic.
Penguin.
For people interested in a little bit more mathematical background information,
the following two books are a good start. Ebbinghaus et al. (1994) is a standard
text aimed at people with a strong mathematical background interested in
metatheorems and properties of classical logics. If you want to buy it, get the
latest edition. Andrews (2002) is a very thorough and good introduction to
mathematical logic in which all theorems are numbered; it also introduces higher-order logic (see next chapter). Unfortunately, the book is rather
expensive.
• H.-D. Ebbinghaus, J. Flum, and W. Thomas (1994). Mathematical Logic. Springer.
• Andrews, Peter B. (2002). An Introduction to Mathematical Logic and
Type Theory: To Truth Through Proof. Kluwer.
Chapter 4

Higher-Order Logic
In this chapter we will take a look at higher-order logic insofar as it is used
in linguistic theorizing. The chapter provides more of an overview and does
not go into all technical details. The version of higher-order logic we will be
concerned with is also sometimes called type theory, because every expression in
this logic must have a semantic type that regulates how it is interpreted. One
of the most influential type theories was Church’s simple theory of types, which
will also be the basis of this chapter. Type theory is surprisingly simple and
elegant; many concepts are definable in it because of its expressivity. This comes
at a price: Higher-order logic with standard models is not compact and not
complete. There are, however, proof theories that are complete for higher-order
logic with so-called General Henkin Models and there are many automated
theorem provers for higher-order logic.
4.1 Syntax of Simple Type Theory
We call our type theoretic language HOL. The formulation will differ from the
seminal one introduced by Church (1940).
4.1.1 Types
Types. Let T be the set of types of HOL. For the time being, we only use the base types e for entities and t for truth values. If α, β ∈ T then (αβ) ∈ T. Nothing else is in T.
Notice that the base types are the same as the ones we used for first-order logic,
but compound types hint at some richer expressivity. An ordinary first-order
unary predicate has type (et); in HOL there are infinitely many types on top of the base types. For example, ((et)t) is a type, which will later be interpreted as a
predicate of a predicate.
Notational Convention. Outer parentheses around types may be left out.
Within types parentheses may be left out, in case of which right-associativity is
assumed. This means that for example eet may be written for e(et) and (et)(et)
for ((et)(et)).
4.1.2 Terms
Base Terms. Base terms consist of sequences of alphanumeric characters
and special symbols like ∀, ∃, ∧, and so forth. Let L T be the set of terms of
HOL containing the base terms. Every base term has a base type that may be
indicated as a superscript.
Variables. We assume that for every term of type α there is a variable of
type α, where by convention x, y, z are used as variables of type e and P,Q, R as
variables of type et as long as no other type is indicated as a superscript.
Constants. N (of type tt), A (of type t(tt)), and Q (of type α(αt)) are logical constants, where α is any type.
We use φ, ψ as metavariables for terms in what follows.
Compound Terms.
i. If φ is a term of type (βα) and ψ a term of type β, then (φψ) is a term of type α.
ii. If φ is a term of type α and x is a variable of type β, then (λ xφ) is a term of
type (βα).
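The two formation rules for compound terms double as typing rules and can be sketched as follows; encoding a compound type (βα) as the pair (β, α) is an assumption made only for this illustration:

```python
# Simple types: the base types 'e' and 't', and pairs (beta, alpha)
# standing for the compound type (βα).
E, T = "e", "t"

def apply_type(fun_type, arg_type):
    """Rule (i): applying a term of type (βα) to a term of type β yields type α."""
    beta, alpha = fun_type            # fails on a base type, which cannot be applied
    if beta != arg_type:
        raise TypeError(f"cannot apply {fun_type} to {arg_type}")
    return alpha

def abstract_type(var_type, body_type):
    """Rule (ii): if φ has type α and x has type β, then (λxφ) has type (βα)."""
    return (var_type, body_type)

print(apply_type((E, T), E))            # t   - a unary predicate applied to an entity
print(abstract_type(E, (E, T)))         # ('e', ('e', 't')) - i.e. e(et)
print(apply_type(((E, T), T), (E, T)))  # t   - a predicate of predicates applied to (et)
```

Type mismatches, such as applying an (et)-term to a t-term, raise an error instead of producing a term, mirroring the fact that no such compound term exists.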
✧ Remark 9 (Alternative Notations.) Using e as type for objects and
t for truth-values is common in the linguistic literature. There is less
agreement on so-called intensional types, i.e. types for terms that denote
entities like possible worlds or situations. The letter i (for intension) is
sometimes used; I personally have used s for situation and c for context
elsewhere. Linguists often write types like the tuples by means of which
they can be represented: ⟨⟨e, t⟩, t⟩ instead of (et)t. Computer scientists often
compose types with → as a symbol: (e → t) → t instead of (et)t. (They also
sometimes use additional type constructors like for example × for so-called
product types. For example the type (e × e)t would denote a binary predicate.
We do not use these types here.) In the logical literature on simple type
theory Church's notation seems to be prevalent. He uses ι for objects, o for truth-values, left-associativity is assumed, and the order is reversed in comparison to our types. That is, o(oι) in Church's notation corresponds to (et)t in our notation.
☞ Note 13 (Type Theory vs. Higher-Order Logic) You might have realized that there are no wffs in type theory. This is so because, as we will see below, the types suffice for understanding what a term means. Moreover, type theory alone doesn't require us to have a type t for truth-values. While our semantics in the next section is for higher-order logic, where there is at least a type e for individuals and a type t for truth-values, the λ-calculus that will briefly be sketched in section 4.3 works independently of whether there is a type t or not. In this sense, higher-order logic is a special application of simple type theory. For completeness it must be mentioned, however,
that the above way of using types is just one popular choice among many possible
ways of formulating a paradox-free higher-order logic. Some formulations
of higher-order logic do not use types at all and instead avoid paradoxes by
restricting a comprehension principle that regulates the concepts to which a
higher-order variable may refer in a suitable way.
The reason why it is relatively common to use A, N, Q, and Π as symbols is
that the functions are Schönfinkelized (see page 101 below) and some syntactic
sugar needs to be added to make terms more readable.
☞ Note 14 (Prefix vs. Operator-Argument Syntax) Following a tradition based on Church's simple theory of types, the above syntax specifies prefix notation for functions instead of the familiar functor-argument syntax. In prefix notation, the parentheses are put around the functor and its argument, i.e. ( f a) is written instead of f (a) for the functional application of f to its argument a. This syntax was popularized by the programming language LISP and its derivatives; LISP was originally based on λ-calculus plus a few built-in functions.
Notational Conventions. We may write ¬φ for (N φ), (φ∨ψ) for ((A φ)ψ), and (φ = ψ) for ((Q φ)ψ). Let us also allow functor-argument syntax as a notational variant and other usual syntactic conventions such as infix notation for =, dot-notation, and leaving out redundant parentheses. Let us further write functions as taking n arguments instead of subsequent applications of unary functions. That is, we may write P(x, y) instead of ((P x) y). Finally, it is common to contract multiple λ-abstractions into one, i.e. for example to write λ x y.P x y for λ xλ y.P x y.
Because of the rules for the λ-calculus that will be introduced in section 4.3, great
care must be taken not to confuse the scope of λ-terms and their arguments
when the notations are mixed. In case of doubt, it is best to resort to
the original prefix notation, and I will use only a few notational simplifications in
what follows. You should also keep in the back of your mind that n-ary function
notation is only syntactic sugar in traditional non-relational type theory. For
example, when we write P(a, b) as usual, where P is of type eet and a, b are of
type e, it must be kept in mind that there is an intermediate term (P a) whose
meaning is well-defined: it is a function taking an object and yielding a function
that takes an object and yields a truth value.
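The point about the intermediate term (P a) can be made concrete in Python, where Schönfinkelization corresponds to a function that returns a function. The relation and the names here are toy assumptions, not part of the logic itself:

```python
def curry2(f):
    """Turn a two-argument function into nested one-argument functions."""
    return lambda x: lambda y: f(x, y)

# A binary predicate over a toy domain, written relationally ...
likes = {("john", "mary"), ("mary", "john")}
P = curry2(lambda x, y: (x, y) in likes)

# ... and P("john") is the intermediate term (P a): a well-defined
# unary function from objects to truth values.
step = P("john")
print(step("mary"))       # corresponds to ((P a) b), i.e. P(a, b)
print(P("mary")("anna"))
```

Applying the curried function to one argument and inspecting the result makes the "intermediate term" of the prose tangible.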
4.2 Semantics of Higher-Order Logic
4.2.1 General Models and Truth in a Model
General Models. A general model M of HOL contains a base domain D_α for
each base type α and an interpretation function I^g(.) that maps expressions to
their domain in dependence of a variable assignment g as follows:

I^g(φ_α) ∈ D_α    (4.1)

I^g(φ_(αβ)) ∈ D_(αβ), where D_(αβ) ⊆ D_β^{D_α}    (4.2)

I^g(λ x_α.φ_β) is the function f such that for any a ∈ D_α, f(a) = I^{g[x/a]}(φ)    (4.3)
Definition 4.2 is of particular importance. Notice that it defines the domain
of terms of a function type (αβ) as a subset of all functions from D_α to D_β. In
practice, when a concrete model is specified, this means that we have to choose
a particular subset of the set of all functions from D_α to D_β for each compound
type (αβ) in the model, and consequently the quantifiers for variables of type
(αβ) only run over this subset. If we had instead defined D_(αβ) = D_β^{D_α}, then the
model would be a so-called standard model and the logic as a whole would have
very different properties from the one we have defined here! (See section 4.6 for
more details.) General models go back to Henkin (1950) and are often called
Henkin models.
Truth in a Model. To define the logical constants N, A, and Q, a function
⟦.⟧^{M,g} evaluates expressions in dependence of I of M and an assignment g as
follows:

⟦(N φ)⟧^{M,g} = 1 if ⟦φ⟧^{M,g} = 0, and 0 otherwise    (4.4)

⟦A⟧^{M,g} is the function f such that f(x)(y) = 0 if x = 0 and y = 0, and otherwise f(x)(y) = 1    (4.5)

⟦Q⟧^{M,g} is the function f such that f(x)(y) = 1 if x = y, and otherwise f(x)(y) = 0    (4.6)

⟦(φψ)⟧^{M,g} = I^g(φ)(I^g(ψ))    (4.7)

⟦(λ xφ)⟧^{M,g} = I^g(λ xφ)    (4.8)
This is one way of defining a minimal higher-order logic. N is obviously just
truth-functional negation, A is disjunction, and Q is identity. The other rules
just pass on functional application and λ-abstraction to the previously defined
interpretation function. The formulations of disjunction and identity might
seem odd at first glance, and there would have been numerous other ways to
define them, but they make sense given that HOL only contains unary functions.
In the above formulations, the dyadic functions have been Schönfinkelized or,
according to more common terminology, curried.
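The curried truth functions (4.4) to (4.6) can be sketched directly in Python, with truth values encoded as 0 and 1 (the numeric encoding is an assumption of this sketch):

```python
# Schönfinkelized (curried) logical constants, mirroring rules (4.4)-(4.6).
N = lambda phi: 1 if phi == 0 else 0                    # negation, (4.4)
A = lambda x: lambda y: 0 if x == 0 and y == 0 else 1   # disjunction, (4.5)
Q = lambda x: lambda y: 1 if x == y else 0              # identity, (4.6)

# ((A x) y) mirrors the object-language application ((A φ) ψ):
print(A(0)(1))     # "false or true" is true
print(N(Q(1)(0)))  # it is not the case that 1 = 0
```

Note how the dyadic functions A and Q only ever consume one argument at a time, exactly as in the object language.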
4.2.2 Interdefinability of Quantifiers and Identity
Where are the quantifiers? As it happens, identity and the universal or existential quantifiers are interdefinable in higher-order logic. Here is how to define
the universal quantifier:
⊤ := ((Q_{(ttt)((ttt)t)} Q_{ttt}) Q_{ttt})    (4.9)

(Π(λ xφ)) := ((λ xφ) = (λ x.⊤))    (4.10)
The auxiliary definition (4.9) is a cumbersome but correct way of defining
the Verum, i.e. a term that is always true: it asserts that Q_{ttt} is identical
to itself. Definition (4.10) then defines the universal quantifier Π by asserting
that the term λ xφ is identical to a function that takes an x and yields true.
Bear in mind that ⟦λ x_α φ⟧^{M,g} is the function f such that for any a ∈ D_α,
f(a) = ⟦φ⟧^{M,g[x/a]}. The identity in definition (4.10) asserts that this function
f yields true for any given a, i.e. for all a ∈ D_α, ⟦φ⟧^{M,g[x/a]} = 1.
That’s universal quantification. Since the symbol Π looks a bit weird, it is
customary to write ∀ xφ instead of Π(λ xφ). The existential quantifier can be
defined as ∃ xφ := ¬Π(λ x¬φ).
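On a finite domain, definition (4.10) can be mimicked in Python: Π holds of a predicate just in case the predicate agrees with the constant-true function everywhere. The domain and the predicates are toy assumptions for illustration:

```python
D = [1, 2, 3, 4]   # a toy finite domain

def Pi(pred):
    """Π(λx.φ): pred coincides with λx.⊤ on every element of D."""
    return all(pred(a) == True for a in D)

def Ex(pred):
    """∃xφ := ¬Π(λx.¬φ)."""
    return not Pi(lambda a: not pred(a))

print(Pi(lambda a: a > 0))   # identical to the constant-true function on D
print(Pi(lambda a: a > 2))
print(Ex(lambda a: a > 3))
```

The point of the sketch is only the shape of the definitions: universal quantification as identity with the constant-true function, and ∃ defined from Π by double negation.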
You already know from the discussion of Leibniz’ Law in the last chapter how
to define identity in terms of the universal quantifier. Instead of (4.6) we could
have introduced a Schönfinkelized definition for Π, defined ∀ as a notational
shortcut and then used formula 3.15 on page 73 to introduce identity.
Notational Convention.
We may write ∀ xφ instead of Π(λ x.φ).
4.2.3 More Definitions
The other truth-functional connectives are defined as usual. For completeness,
here are some definitions you could use:
(φ ∧ ψ) := ¬(¬φ ∨ ¬ψ)    (4.11)

(φ → ψ) := (¬φ ∨ ψ)    (4.12)

(φ ↔ ψ) := ((φ → ψ) ∧ (ψ → φ))    (4.13)
4.3 Typed λ-Calculus
There are tableaux systems for higher-order logics, and they look very similar to
the ones for first-order logic plus some rules for λ-abstractions. Proof systems
like tableaux or natural deduction rules for higher-order logics are primarily
used (in combination with more specialized implementation methods) in higher-order logic theorem provers – see, for example, the following ones:
• Isabelle http://www.cl.cam.ac.uk/research/hvg/isabelle/
• Leo-II http://www.ags.uni-sb.de/~leo/
• TPS http://gtps.math.cmu.edu/tps.html1
Using a proof theory for higher-order logic by hand is cumbersome and we will
not take a look at tableaux for higher-order logic here. However, using the rules
of typed λ-calculus is very common in semantics. Lambda calculus only deals
with λ-terms, which according to rule 4.3 express functional abstractions. As we
shall see soon, they can be used to express arbitrary scope distinctions and for
this reason play a crucial role in semantics. The typed λ-calculus allows us to
simplify λ-terms in a mechanical way and thereby resolve scope distinctions.
4.3.1 Conversion Rules
The rules of lambda calculus are known as α-, β-, and η-conversion. Here they
are:
(λ x.φ) ⇔_α (λ y.φ[x/y])    α-conversion    (4.14)

((λ x.φ)ψ) ⇔_β φ[x/ψ]    β-conversion    (4.15)

(λ x.φ) ⇔_η φ    η-conversion    (4.16)
Here, x may not be free in φ in 4.16; in 4.15, x must be free in φ and x and ψ must
be of the same type; and φ[x/y] is the same term as φ except that all free
occurrences of x in φ have been substituted by y.
The rules express equivalences, but the ⇔ sign must be understood as a
purely syntactic operation. In other words, the rules really specify a calculus, i.e.
a mechanical system for calculating something that is like a proof theory but
1 If it weren’t against other regulations, I wouldn’t mind if you used one of these theorem
provers during exams. Learning and understanding how to use them is not substantially easier
than learning and using tableaux. The educational variant of TPS called ETPS is used at Carnegie
Mellon University for introductory logic courses.
more general (every proof theory is a calculus but not vice versa). The expression
on the left hand side may be rewritten as an expression on the right hand side
and vice versa. Since in rules 4.15 and 4.16 the expressions on the right hand
side are less complex than the ones on the left hand side, they are usually used
in the direction from the left to the right in order to simplify terms. Used from
left to right, we call the rewrite rules reduction rules. Thus, it is common to
speak of β-reduction or η-reduction.
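The substitution and single-step rewriting at the heart of rule 4.15 can be sketched as a minimal leftmost β-reducer in Python. The term representation and the simplifying assumption that all bound variables have distinct names (so that no α-renaming is needed) are choices of this sketch:

```python
# Terms: ('var', x), ('lam', x, body), ('app', f, a).

def subst(term, x, arg):
    """phi[x/psi]: replace free occurrences of variable x in term by arg."""
    tag = term[0]
    if tag == 'var':
        return arg if term[1] == x else term
    if tag == 'lam':
        if term[1] == x:   # x is rebound here: no free occurrences below
            return term
        return ('lam', term[1], subst(term[2], x, arg))
    return ('app', subst(term[1], x, arg), subst(term[2], x, arg))

def beta_step(term):
    """One leftmost beta-reduction step; None if term is in normal form."""
    tag = term[0]
    if tag == 'app':
        f = term[1]
        if f[0] == 'lam':
            return subst(f[2], f[1], term[2])   # ((lam x.phi) psi) => phi[x/psi]
        for i in (1, 2):
            r = beta_step(term[i])
            if r is not None:
                return ('app', r, term[2]) if i == 1 else ('app', term[1], r)
        return None
    if tag == 'lam':
        r = beta_step(term[2])
        return ('lam', term[1], r) if r is not None else None
    return None

def normalize(term):
    while True:
        r = beta_step(term)
        if r is None:
            return term
        term = r

# ((lam x.x) a) reduces to a:
I_a = ('app', ('lam', 'x', ('var', 'x')), ('var', 'a'))
print(normalize(I_a))   # ('var', 'a')
```

Repeatedly applying the left-to-right direction of rule 4.15, as `normalize` does, is exactly the "reduction" use of the calculus described above.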
4.3.2 λ-Abstraction at Work
The rule for β-reduction is used very often. It says that in an application
((λ x.φ)ψ) you may substitute the argument ψ, which must be of the same type
as the variable x, for the free occurrences of x in φ. To see why this is relevant
to linguistic theorizing, consider the following example.
the following example.
❉ Example 9 (β-Reduction.) We would like to specify the truth-conditional
content of the following sentence (ignoring tense and aspect):

(4.17) O João gosta da Maria.

Let ‘Maria’ and ‘João’ be terms of type e. Let ‘like’ be a term of type eet. Then:

((λ yλ x.((like x) y)) Maria) João    (4.18)
⇒_β (λ x.((like x) Maria)) João    (4.19)
⇒_β ((like João) Maria)    (4.20)
As you can see the order of argument application can be easily reversed by
using λ-abstraction. Of course, in the above example this is not really necessary,
because the function ‘like’ could simply have been defined with the arguments
reversed, i.e. reading like x y iff y likes x instead of the ‘more natural’ order of
arguments. However, in many other cases λ-abstraction is needed. The following
example further illustrates the power of λ-abstraction.
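First, though, the argument-reversal trick of Example 9 can be replayed with Python lambdas. The toy relation and the names are my assumptions, not part of the formal system:

```python
# A curried binary predicate over a toy extension: like(x)(y) is read here
# as "x likes y" (the reading is an arbitrary choice for this sketch).
facts = {("joao", "maria")}
like = lambda x: lambda y: (x, y) in facts

# (lam y lam x.((like x) y)): the same function with its arguments flipped.
flip = lambda f: lambda y: lambda x: f(x)(y)

print(like("joao")("maria"))        # direct order
print(flip(like)("maria")("joao"))  # same fact, arguments supplied in reverse
```

As in the example, no new lexical entry is needed: λ-abstraction alone reorders the arguments.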
❉ Example 10 (VP-Conjunction.) We would like to specify a semantic representation for the following sentence (ignoring tense and aspect):

(4.21) O João ama e odia a Maria.

Let ‘love’ and ‘hate’ be functions of type eet and ‘and’ be of type (eet)((eet)(eet)), defined
as λP_eet λQ_eet λ yλ x.((P x) y) ∧ ((Q x) y).

(((and love) hate) Maria) João    (4.22)
= ((((λP_eet λQ_eet λ yλ x.((P x) y) ∧ ((Q x) y)) love) hate) Maria) João    (4.23)
⇒_β (((λQ_eet λ yλ x.((love x) y) ∧ ((Q x) y)) hate) Maria) João    (4.24)
⇒_β ((λ yλ x.((love x) y) ∧ ((hate x) y)) Maria) João    (4.25)
⇒_β (λ x.((love x) Maria) ∧ ((hate x) Maria)) João    (4.26)
⇒_β ((love João) Maria) ∧ ((hate João) Maria)    (4.27)
As you can see, we’re getting closer and closer to natural languages. We have
just learned about a way to specify a reasonable semantic representation for
any phrase of the form VP and VP, where the VPs are transitive verbs. Before
exploring the possibilities of representing the truth-conditional semantics of
natural language expressions in higher-order logic in more detail, an important
remark has to be made about the examples. It is absolutely crucial to be aware of
the fact that the names of our object-language expressions such as ‘Maria’, ‘like’, or
‘and’ play no role in actual linguistic theorizing. Just like in the previous chapter,
we could have used letters like P or symbols like ∧ instead. The semantic type
of these expressions counts, as would any special meaning rules we might specify
for them, and because of the fixed meaning of ∧ and functional application the
result 4.27 expresses some structural constraint – but the names of the functions
are insignificant.
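The ‘and’ combinator of Example 10 has a direct Python counterpart: it takes two curried binary predicates and yields one that applies both. The toy extensions are assumptions of the sketch:

```python
loves = {("joao", "maria")}
hates = {("joao", "maria")}
love = lambda x: lambda y: (x, y) in loves
hate = lambda x: lambda y: (x, y) in hates

# lam P lam Q lam y lam x.((P x) y) and ((Q x) y)
AND = lambda P: lambda Q: lambda y: lambda x: P(x)(y) and Q(x)(y)

# (((and love) hate) Maria) Joao:
print(AND(love)(hate)("maria")("joao"))
```

The nesting of lambdas mirrors the object-language term exactly; only the β-reductions are carried out by Python's evaluator instead of by hand.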
The α-conversion rule is also important because of the restriction for β-reduction
that x must be free in φ. Suppose we have a term λP_et.∃ x[P x], where
x is bound by the existential quantifier. Suppose we want to apply this term
to λ x.x = x; we are not allowed to apply the β-reduction rule, because x is not
free in ∃ x[P x]. But we are allowed to exchange variables in either of the terms
using α-conversion, e.g. we may convert λ x.x = x to λ y.y = y and then apply
β-reduction as follows:
(λP_et.(∃ x[P x])) (λ x.x = x)    (4.28)
⇒_α (λP_et.(∃ x[P x])) (λ y.y = y)    (4.29)
⇒_β ∃ x[(λ y.(y = y)) x]    (4.30)
⇒_β ∃ x[x = x]    (4.31)
Finally, the purpose of the η-conversion rule is easy to see. It allows us to get
rid of vacuous λ-abstraction. Take a term like λ x.like. The x is never applied to
the function and so the abstraction is vacuous, or spurious as it is sometimes
also called. Using η-reduction, we can rewrite this term as ‘like’.
4.3.3 Exercises
✐ Exercise 39 Simplify the following terms in prefix notation using rules
4.14 to 4.16:

a. (λ x.x) a
b. (((λ x y z.(P z y x)) c) a) b
c. (((λ xλ y.(R x y) ∧ R(y x)) a) b)
d. (((λP λ x.P x ∨ ¬P x)(λ x.x = x)) a)
e. (((λP_eet x.P x)(λ xλ y.x = y)) a)
f. λ x.(P a)
g. (λP.(P x a))(λ xλ y.Q x y)
h. (((λP Q.∀ x[P x → Q x]) Q_0) P_0)
4.4 Applicative Categorial Grammar
4.4.1 Introduction
Without going too much into the formal details, let us now take a brief look
at categorial grammar (CG). Categorial grammar goes back to the work of
Kazimierz Ajdukiewicz.2 Ajdukiewicz used categorial grammar for natural
language syntax. We will take a look at a slightly extended version of his
original proposal, as it was used later by Yehoshua Bar-Hillel and David Lewis.3
Let there be a finite set of syntactic base categories. The set of categories C is
defined recursively: every base category is in C, and if a, b ∈ C then (a/b) ∈ C
and (a\ b) ∈ C. We then assign to our lexical items (aka ‘words’) either a base
category or a compound category and write a : A for a source language expression
A of category a. The forward concatenation operator / is used as follows: an
expression of syntactic category a/b followed by an expression of syntactic
category b yields an expression of category a. The backward concatenation
operator \ does the same but takes its argument from the left-hand side: an
expression of category b followed by an expression of syntactic category b\ a
yields an expression of category a. The following examples illustrate how this
works.
❉ Example 11 Let S, NP, and N be our syntactic base categories. Rule applications
can be depicted by trees; here the trees are rendered as labeled bracketings:

a. John walks.

[S [NP John] [NP\S walks]]
2 Ajdukiewicz, K. (1935). Die syntaktische Konnexität. Studia Philosophica, 1, 1–27.
3 See Bar-Hillel, Y. (1950). On syntactical categories. Journal of Symbolic Logic, 15, 1–16; Lewis, D.
(1970). General Semantics. Synthese, 22, 18–67.
b. John gives Maria the book.

[S [NP John] [NP\S [(NP\S)/NP [((NP\S)/NP)/NP gives] [NP Maria]] [NP [NP/N the] [N book]]]]
c. The spy saw the woman with the telescope.

d. [S [NP [NP/N the] [N spy]] [NP\S [NP\S [(NP\S)/NP saw] [NP [NP/N the] [N woman]]] [(NP\S)\(NP\S) [((NP\S)\(NP\S))/NP with] [NP [NP/N the] [N telescope]]]]]
e. [S [NP [NP/N the] [N spy]] [NP\S [(NP\S)/NP saw] [NP [NP [NP/N the] [N woman]] [NP\NP [(NP\NP)/NP with] [NP [NP/N the] [N telescope]]]]]]
Notice that ‘with’ has been given two different syntactic categories in order to
account for the two different readings. With category ((NP \S)\(NP \S))/NP it
takes an NP and yields a verb-phrase modifier, which then combines with the
verb phrase. When it has category (NP \ NP)/NP it combines with two NPs and
yields a new NP, giving the other reading. For the semantics in the next section
this scheme will have to be slightly modified, but the general idea will remain
the same.
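The two concatenation operations can be sketched in Python, with compound categories encoded as nested tuples (an encoding chosen for this sketch, not part of categorial grammar itself):

```python
# ('/', a, b) encodes a/b (argument b expected to the right, result a);
# ('\\', b, a) encodes b\a (argument b expected to the left, result a).

def combine(left, right):
    """Forward application (a/b) b => a; backward application b (b\\a) => a."""
    if isinstance(left, tuple) and left[0] == '/' and left[2] == right:
        return left[1]
    if isinstance(right, tuple) and right[0] == '\\' and right[1] == left:
        return right[2]
    return None   # the two categories do not combine

NP, N, S = 'NP', 'N', 'S'
the   = ('/', NP, N)
book  = N
gives = ('/', ('/', ('\\', NP, S), NP), NP)   # ((NP\S)/NP)/NP
john = maria = NP

# 'gives Maria the book' builds an NP\S, which then combines with 'John':
vp = combine(combine(gives, maria), combine(the, book))
print(combine(john, vp))   # S
```

Each `combine` call corresponds to one local step in the trees of Example 11.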
4.4.2 Type-driven Evaluation
We now add the semantic component to our categorial grammar. To do this,
we interpret the lexical items directly as terms of higher-order logic and make
sure that to the compound syntactic categories a/b and b\ a belongs a corresponding
semantic type (βα), where α and β are the types corresponding to a and b. As a
result, it is possible to derive the semantics of an expression directly in parallel
to its syntax by interpreting the syntactic operations of forward and backward
concatenation as functional application. Consider sequences of terms. These
are interpreted according to the following phrase structure rules:

(a/b) : A_(βα)   b : B_β   ⇒_f   a : (A B)    (4.32)

b : B_β   (b\ a) : A_(βα)   ⇒_b   a : (A B)    (4.33)
b
The rules are to be read as follows: to evaluate a sequence of two terms A B of
the respective syntactic categories and semantic types, rewrite them as on the
right-hand side, where the new syntactic category is indicated. This process
is known as type-driven evaluation. For this to work we must assure a close
correspondence between syntactic categories and semantic types, which may
be called a category-type well-formedness principle. Only sequences of terms
with categories and types like in the above schemes are well-formed. This way,
syntax and semantics are kept in parallel. To see how this works, take a look at
the examples from the previous section, where now the semantics is annotated
together with the syntactic categories.4
❉ Example 12 (Categorial Grammar) Let S, NP, and N be our syntactic base
categories. Rule applications can be depicted by trees; here each node is given
as ‘category : meaning’, with its daughters indented below it. Tense, mood, and
aspect are ignored, and I use traditional operator-argument syntax for applications
of functions like walk or give that are not further analyzed. Moreover, the
semantic type of terms is only annotated once and not repeated.

a. John walks.

S : walk(John)
  NP : John_e    (John)
  NP\S : walk_(et)    (walks)

b. John gives Maria the book.

S : give(John, Maria, ι x.book(x))
  NP : John_e    (John)
  NP\S : λ x.give(x, Maria, ι x.book(x))
    (NP\S)/NP : λ zλ x.give(x, Maria, z)
      ((NP\S)/NP)/NP : λ yλ zλ x.give(x, y, z)    (gives)
      NP : Maria_e    (Maria)
    NP : ι x.book(x)
      NP/N : λP_(et).ι x.P(x)    (the)
      N : book_(et)    (book)

c. The spy saw the woman with the telescope.

d. (reading on which ‘with the telescope’ modifies the verb phrase)

S : see(ι x′.spy(x′), ι z.woman(z)) ∧ instrumentOf(see, ι x.telescope(x))
  NP : ι x.spy(x)
    NP/N : λP_(et).ι x[P x]    (the)
    N : spy_(et)    (spy)
  NP\S : λ y.see(y, ι z.woman(z)) ∧ instrumentOf(see, ι x.telescope(x))
    NP\S : λ x.see(x, ι z.woman(z))
      (NP\S)/NP : λ yλ x.see_(e(et))(x, y)    (saw)
      NP : ι x.woman(x)
        NP/N : λP_(et).ι x[P x]    (the)
        N : woman_(et)    (woman)
    (NP\S)\(NP\S) : λP_(et) λ y.P y ∧ instrumentOf(P, ι x.telescope(x))
      ((NP\S)\(NP\S))/NP : λ xλP_(et) λ y.P y ∧ instrumentOf(P, x)    (with)
      NP : ι x.telescope(x)
        NP/N : λP_(et).ι x[P x]    (the)
        N : telescope_(et)    (telescope)

e. (reading on which ‘with the telescope’ modifies the noun)

S : see(ι x.spy(x), ι y[woman(y) ∧ has(y, ι x.telescope(x))])
  NP : ι x.spy(x)
    NP/N : λP_(et).ι x[P(x)]    (the)
    N : spy_(et)    (spy)
  NP\S : λ x.see(x, ι y[woman(y) ∧ has(y, ι x.telescope(x))])
    (NP\S)/NP : λ yλ x.see_(e(et))(x, y)    (saw)
    NP : ι y[woman(y) ∧ has(y, ι x[telescope(x)])]
      NP/N : λP_(et).ι x[P(x)]    (the)
      N : λ y.woman(y) ∧ has(y, ι x[telescope(x)])
        N : woman_(et)    (woman)
        N\N : λP_(et) λ y.P(y) ∧ has(y, ι x[telescope(x)])
          (N\N)/NP : λ xλP_(et) λ y.P(y) ∧ has(y, x)    (with)
          NP : ι x.telescope(x)
            NP/N : λP_(et).ι x[P(x)]    (the)
            N : telescope_(et)    (telescope)

4 For better readability, each leaf’s lexical string is given in parentheses after its category and
meaning, much as if there were lexical insertion rules like in phrase structure grammar.
Actually, example c is not quite correct but at least close to what would be
desirable. The reason why the semantic representation is strictly speaking not
correct is that according to the end result instrumentOf modifies see in general.
In reality, however, the telescope is only the instrument for watching the woman
in this particular situation at the given time described by the sentence. For a
good solution to this problem we need to introduce situations or events.
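The two evaluation rules (4.32) and (4.33) can be sketched in Python, with each constituent represented as a (category, meaning) pair. The tuple encoding of categories and the toy lexicon are assumptions of the sketch:

```python
# ('/', a, b) encodes a/b; ('\\', b, a) encodes b\a.

def apply_pair(left, right):
    """Type-driven evaluation: combining categories triggers
    functional application on the meanings, as in (4.32)/(4.33)."""
    lcat, lsem = left
    rcat, rsem = right
    if isinstance(rcat, tuple) and rcat[0] == '\\' and rcat[1] == lcat:
        return (rcat[2], rsem(lsem))   # b  (b\a)  =>  a : (A B)
    if isinstance(lcat, tuple) and lcat[0] == '/' and lcat[2] == rcat:
        return (lcat[1], lsem(rsem))   # (a/b)  b  =>  a : (A B)
    raise ValueError("categories do not combine")

walk_ext = {"john"}                                # toy extension of 'walks'
john  = ('NP', "john")                             # category NP, type e
walks = (('\\', 'NP', 'S'), lambda x: x in walk_ext)  # NP\S, type et

cat, val = apply_pair(john, walks)
print(cat, val)   # the category S paired with a truth value
```

Syntax and semantics stay in lockstep: the same rule that checks the categories also performs the functional application.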
4.5 Applications
4.5.1 Verbs, Proper Names
Strictly speaking, in the present setting n-ary relations cannot be expressed
and need to be encoded by Schönfinkelizing them into multiple functions of
one argument. As mentioned earlier, we can, however, write P(x, y) for ((P x) y).
For all practical purposes we can use the logic as if it had relations directly in
the object language. That being said, the translations of verbs are like in the
previous chapter. Not taking into account tense, aspect, or sentence mood verbs
can be expressed as follows:
• an intransitive verb is represented by a unary predicate,
• a transitive verb is represented by a binary relation,
• a ditransitive verb is represented by a ternary relation, and
• in general a verb that requires n mandatory arguments is represented by
an n-ary relation.
The maximum number of obligatory arguments is not very high in natural
languages; four or five seems to be the maximum, though these numbers depend
on the criteria chosen to determine when an argument is ‘obligatory’. You should
consult the literature in lexical semantics for more (authoritative) information.

Once we also take into account possible worlds, situations, events, or contexts,
the number of argument places of predicates changes accordingly. One or even
two argument places might be added to each predicate in such an intensional
framework. However, even in these frameworks adding additional argument
places is not always necessary, and whether it is done is theory-dependent.
4.5.2 Generalized Quantifiers
Recall generalized quantifiers introduced in the first chapter. For example, the
truth conditions for the quantifying determiner ‘some’ could be expressed in set
theory in terms of conditions between a set for the quantifier restriction and
the quantifier body. ‘Some teachers are lazy’ can be expressed in terms of a
condition between the set of teachers and the set of lazy ‘objects’:
{ x | x is a teacher} ∩ { x | x is lazy} ≠ ∅
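The set-theoretic condition can be checked directly with Python sets (the extensions are toy assumptions for illustration):

```python
teachers = {"ana", "bruno", "carla"}
lazy     = {"bruno", "dmitri"}

# 'Some teachers are lazy': the intersection of restrictor and body is non-empty.
some_teachers_are_lazy = bool(teachers & lazy)
print(some_teachers_are_lazy)
```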
We can now express generalized quantifiers directly in the object language,
meaning that we can directly derive the appropriate semantic representation
from the syntax and the lexicon. However, in the above examples the syntactic
category of an intransitive verb phrase like ‘giggles’ or, for that matter, ‘are
lazy’ (simplified, of course) is NP \S and the corresponding semantic type was
et, i.e. a function taking an individual and yielding a truth value. These types
don’t work for generalized quantifiers, because for example the quantifier ‘all
students’ represents a set of students and not just one individual.
A solution is to ‘shift up’ the type of the generalized quantifier so that it takes
the meaning of the verb phrase (instead of vice versa) and yields a sentence-type
meaning with the correct truth-conditions. Since the type of the verb phrase
is et, the type of the generalized quantifier must be (et)t, i.e. a function that
takes a unary predicate of type et and yields a truth-value. Correspondingly, the
syntactic category of a generalized quantifier must be S/(NP \S). What about
quantifying determiners themselves then, i.e. expressions like ‘some’, ‘most’, ‘a’,
‘no’, or ‘all’?
Since the type of a countable noun like ‘cat’, ‘dog’, ‘teacher’, or ‘student’
is also et, a unary predicate expressing a property or a set of objects in the
extensional view, and its syntactic category in the present setting is N, the type
of a generalized determiner must be (et)((et)t), i.e. a function that takes a unary
predicate (the meaning of the noun) and yields a function that takes another
unary predicate (the meaning of the intransitive verb) and yields a truth value
as a meaning of the whole sentence. Correspondingly, the syntactic category
of a generalized determiner in languages like English or Portuguese must be
(S/(NP \S))/N: it takes a noun from the right and yields an expression that
consumes a verb phrase from the right to yield a sentence. Here are example
entries:
every := (S/(NP \S))/N : λP_et λQ_et.∀ x[P x → Q x]    (4.34)

some := (S/(NP \S))/N : λP_et λQ_et.∃ x[P x ∧ Q x]    (4.35)

no := (S/(NP \S))/N : λP_et λQ_et.¬∃ x[P x ∧ Q x]    (4.36)

a := like ‘some’    (4.37)
❉ Example 13 Every dog barks.

S : ∀ x[Dog(x) → Bark(x)]
  S/(NP\S) : λQ_et.∀ x[Dog(x) → Q(x)]
    (S/(NP\S))/N : λP_et λQ_et.∀ x[P x → Q x]    (every)
    N : λ x.Dog(x)    (dog)
  NP\S : λ x.Bark(x)    (barks)
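Entries (4.34) to (4.36) can be mirrored as curried Python functions over a finite toy domain (the domain and the extensions are assumptions of the sketch):

```python
D = ["rex", "fido", "tweety"]          # toy domain
dog  = lambda x: x in {"rex", "fido"}  # et
bark = lambda x: x in {"rex", "fido"}  # et

every = lambda P: lambda Q: all((not P(x)) or Q(x) for x in D)  # forall x[Px -> Qx]
some  = lambda P: lambda Q: any(P(x) and Q(x) for x in D)       # exists x[Px and Qx]
no    = lambda P: lambda Q: not any(P(x) and Q(x) for x in D)   # not exists x[Px and Qx]

print(every(dog)(bark))   # every dog barks
print(no(dog)(bark))
```

The determiner consumes the noun meaning first and the verb-phrase meaning second, exactly as the category (S/(NP\S))/N dictates.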
4.5.3 Generalized Quantifiers and the Finite Verb Phrase
A transitive verb can be handled in the same manner as an intransitive verb,
except that it first needs to combine with the direct object. This, however, means
that unless we provide for additional mechanisms we need to type-shift generalized
quantifiers to account for their occurrence in the position of the
direct object. Assuming that the type of individuals e is the one we base our
entries for verbs on, the type of a transitive verb must be e(et) and the
corresponding syntactic category is (NP \S)/NP.

We get lexicon entries that are even further ‘shifted up’: the generalized
quantifier in direct-object position takes the transitive verb and applies the meaning
of the direct-object NP to it. The result is an expression of category NP \S and type
et:
every := (((NP \S)/NP)\(NP \S))/N : λP_et λQ_e(et) λ y.∀ x[P x → Q y x]    (4.38)

some := (((NP \S)/NP)\(NP \S))/N : λP_et λQ_e(et) λ y.∃ x[P x ∧ Q y x]    (4.39)

no := (((NP \S)/NP)\(NP \S))/N : λP_et λQ_e(et) λ y.¬∃ x[P x ∧ Q y x]    (4.40)

a := like ‘some’    (4.41)
To see how this works, let us take a look at an example in which only the direct
object is semantically represented by a generalized quantifier.
❉ Example 14 Giacomo Casanova loves every woman.

S : ∀ x[Woman(x) → Love(a, x)]
  NP : a    (Giacomo Casanova)
  NP\S : λ y.∀ x[Woman(x) → Love(y, x)]
    (NP\S)/NP : λ yλ x.Love(x, y)    (loves)
    ((NP\S)/NP)\(NP\S) : λQ_e(et) λ y.∀ x[Woman(x) → Q(y, x)]
      (((NP\S)/NP)\(NP\S))/N : λP_et λQ_e(et) λ y.∀ x[P x → Q(y, x)]    (every)
      N : λ x.Woman(x)    (woman)
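The shifted-up object-position entry (4.38) can likewise be sketched over a toy model (the domain, extensions, and the pair encoding of ‘love’ are assumptions):

```python
D = ["casanova", "anna", "berta"]
woman = lambda x: x in {"anna", "berta"}
pairs = {("casanova", "anna"), ("casanova", "berta")}   # (lover, beloved)
love  = lambda y: lambda x: (y, x) in pairs             # type e(et): ((love y) x)

# lam P lam Q lam y.forall x[Px -> Qyx], as in entry (4.38):
every_obj = lambda P: lambda Q: lambda y: all((not P(x)) or Q(y)(x) for x in D)

# ((every woman) love) casanova ~ 'Casanova loves every woman'
print(every_obj(woman)(love)("casanova"))
```

The quantifier now consumes the transitive-verb meaning Q of type e(et) and binds its object argument, leaving the subject slot y open.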
4.5.4 Quantifier Scope Ambiguities
Ditransitive verbs like in ‘Many teachers give the book to every student’ require
quantifiers and quantifying determiners to be shifted up even higher.5 General
type-shifting principles have been stipulated that yield the desired lexicon entries
in a systematic way. However, there is another problem with the approach,
caused by quantifier scope ambiguities. Consider the following sentence:

(4.42) Every sailor loves a woman.

In the first and most prevalent reading, every sailor loves some woman (in a
contextually restricted domain), but not necessarily the same one. In a second reading,
there is one woman, say Rosy, whom every sailor (in the contextually restricted
domain) loves. The above type-shifted lexicon entries only account for the first
reading:
∀ x[Sailor(x) → ∃ y(Woman(y) ∧ Love(x, y))]    (4.43)

To derive the second reading, ∃ needs to have scope over ∀:

∃ x[Woman(x) ∧ ∀ y(Sailor(y) → Love(y, x))]    (4.44)
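In a small model in which the two readings come apart, both truth conditions can be computed directly (the model is a toy assumption):

```python
D = ["s1", "s2", "w1", "w2"]
sailor = lambda x: x in {"s1", "s2"}
woman  = lambda x: x in {"w1", "w2"}
loves  = {("s1", "w1"), ("s2", "w2")}   # each sailor loves a *different* woman

# forall-exists, reading (4.43):
surface = all((not sailor(x)) or any(woman(y) and (x, y) in loves for y in D)
              for x in D)
# exists-forall, reading (4.44):
inverse = any(woman(x) and all((not sailor(y)) or (y, x) in loves for y in D)
              for x in D)

print(surface, inverse)
```

In this model the surface reading is true while the inverse reading is false, which is exactly why the grammar needs some mechanism to generate both.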
It is possible to obtain this reading using only the mechanisms introduced so far
by giving the quantifying indefinite article ‘a’ in direct-object position a syntactic
category that basically consumes the whole rest of the sentence from the left.
5 Also notice the peculiar use of the definite determiner ‘the’ in this example, which cannot
be represented adequately in the Russellian way as a iota term or quantifier.
The resulting syntactic and semantic composition is monstrous, though. Since
this kind of shifting is ad hoc and undesirable from a formal point of view many
other solutions to quantifier scope ambiguities have been explored:
• In the transformational and generative grammar tradition, the categorial
grammar is only used on the semantic side to approximate the semantic
representation to the actual syntax, which is based on the constituent
structure of the sentence. Readings like (4.44) can be obtained by transformation rules when deriving the logical form from an underlying syntactic
representation while keeping the amount of type shifting as minimal as
possible.
• In computational linguistics, algorithms have been developed that can be
used to derive all possible readings triggered by quantifier scope ambiguities – and not all readings one might naively think are possible are in
fact possible. These algorithms and their corresponding stack-based data
structures are known as Cooper storage and Keller storage.
• In the tradition of Type Logical Grammar, the above applicative categorial
grammar is only considered a fragment of the more expressive full-fledged
categorial grammar based on so-called Lambek calculus, which also allows
for assigning meanings to non-constituent expressions like ‘Every sailor
likes.’ Moreover, many more ways of combining meanings are available
that allow for resolving quantifier scope ambiguities more elegantly.
4.5.5 Outlook and Limits
Notice that problems like quantifier scope ambiguity concern the syntax–semantics
interface, and solutions to these kinds of problems generally depend on the
underlying syntactic theory. Semantic construction can look quite different from
the perspective of theories like Chomsky’s Minimalism, Lexical Functional
Grammar, Tree Adjoining Grammar, Type-Logical Grammar and Combinatory
Categorial Grammar, or HPSG.
It is, however, reasonable to believe that practically all semantic phenomena
can be represented adequately in higher-order logics, simply because these logics
are so expressive. In particular, all sorts of modification can be expressed
adequately. It was, for example, mentioned in the last chapter that ‘famous’ in
‘famous pianist’ is not an intersective adjective. In a higher-order logic, ‘famous’
can be elegantly represented as a function (et)(et) from a function of type et
to a function of type et. Intensifiers like ‘very’ or ‘pretty’ as in ‘it’s very hot in
here’ can be analyzed by the same token: for example, ‘very’ can be considered
a function of type ((et)(et))((et)(et)), i.e. a function that takes an adjective
meaning and yields an (intensified) adjective meaning. While in many cases the
same result can be achieved with some trickery (keyword: reification) in first-order
predicate logic, the analysis in simple type theory is usually more natural and
elegant.
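A crude sketch of ‘famous’ as a non-intersective modifier of type (et)(et): whether x counts as a famous P depends on the predicate P itself, not on a fixed set of famous things. The lookup table `famous_as` and all extensions are toy assumptions:

```python
pianist   = lambda x: x in {"clara", "otto"}
beekeeper = lambda x: x in {"otto"}

# Which individuals are famous *as* a given predicate (functions are hashable,
# so they can serve as dictionary keys here).
famous_as = {pianist: {"clara"}, beekeeper: set()}

# type (et)(et): takes a predicate, yields a predicate.
famous = lambda P: lambda x: P(x) and x in famous_as.get(P, set())

print(famous(pianist)("clara"))
print(famous(pianist)("otto"))
print(famous(beekeeper)("otto"))   # famous as a pianist is not famous as a beekeeper
```

Because the result depends on P and not just on a set of famous individuals, the modifier is not simply intersective, which is the point of giving it the higher type.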
[Figure 4.1: The Church-Rosser property. A term A reduces to two terms B and C, which both reduce to a common term D, forming a diamond.]
4.6 Metatheorems
Higher-order logic differs in expressivity depending on the models that are
allowed. Generally speaking, very powerful concepts can be expressed in higher-order
logic, and models can be chosen accordingly. For example, one might pick
models in which the axiom of choice of set theory is true or allow only those in
which it turns out false. Or one might allow only those models in which the
continuum hypothesis is true (or false, respectively). As one might imagine,
even seemingly small changes like that of going from standard to Henkin models
can have huge consequences, and a large part of research on higher-order logic
is intertwined with research on the foundations of mathematics. Proofs of
metatheorems are generally harder than for first-order logic, and the properties
of a particular formulation of higher-order logic and its models depend very
much on the details.
Church-Rosser Theorem. According to the Church-Rosser theorem,6 when
the evaluation of a term A of the λ-calculus splits up into two paths then the
two resulting terms B and C will always be reducible to the same third term D.
This is also called the diamond property and is illustrated by figure 4.1.
When a rewrite system has this property it is also said to be confluent: The rules
of the rewrite system guarantee that when there are two ways to proceed with
the rewriting (for example by order of rule application) then the two rewriting
‘paths’ will eventually flow together.
Normalization Theorems. According to the weak normalization theorem
every term of simply typed λ-calculus can be brought into a normal form. According to the strong normalization theorem no term of simply typed λ-calculus has
an infinite reduction sequence, i.e. no term requires infinitely many reduction
6 See Alonzo Church and J. Barkley Rosser. Some properties of conversion. Transactions of
the American Mathematical Society, vol. 39, No. 3. (May 1936), 472–482.
steps in order to bring it into a normal form.
Taken together, these theorems ensure that every λ-term can be brought into
a normal form in finitely many steps: the reduction does not suddenly go
astray or continue ad infinitum. It also means that there is an equivalence
between terms modulo variable renaming by α-conversion that can be used
to check syntactically whether two terms are equal.
Completeness and Incompleteness Results. Higher-order logic with standard
models is incomplete. There are complete proof theories for versions of
higher-order logic based on simple type theory with general models (Henkin
1950).

Compactness and Lack of Compactness. Higher-order logic with standard
models is not compact. There are versions of higher-order logic based on simple
type theory with generalized Henkin models that are weakly compact, where
‘weakly compact’ means compactness with respect to general models as opposed
to standard models (see Andrews 2002: Ch. 5).
Literature
The technical intricacies of higher-order logic are laid out in the following
seminal works:
• Church, Alonzo (1940). A Formulation of the Simple Theory of Types.
The Journal of Symbolic Logic, 5, 56–68. Reprinted in Benzmüller et al.
(2008), 35–47.
• Henkin, Leon (1950). Completeness in the Theory of Types. The Journal of
Symbolic Logic, 15, 81–91. Reprinted in Benzmüller et al. (2008), 49–59.
• Benzmüller, C.; Brown, C. E.; Siekmann, J. & Statman, R. (eds.) (2008).
Reasoning in Simple Type Theory: Festschrift in Honor of Peter B. Andrews
on His 70th Birthday. College Publications.
• Andrews, Peter B. (2002). An Introduction to Mathematical Logic and
Type Theory: To Truth Through Proof. Kluwer.
None of them is particularly easy to read. As a start, I would recommend
the book by Andrews to anyone who is seriously interested in mathematical
logic and type theory. Benzmüller et al. (2008) contains reprints of the most
important articles on type theory.
The following works are first and foremost mentioned for historical interest.
They are targeted at mathematicians and not suitable for beginners:
• Schönfinkel, Moses (1924). Über die Bausteine der mathematischen Logik. Mathematische Annalen 92, 305–316. Translation: On the building blocks of mathematical logic. In Jean van Heijenoort (ed.) (1967). From Frege to Gödel: A Source Book in Mathematical Logic, 1879–1931. Harvard University Press, 355–366.
• Haskell Curry and Robert Feys (1958). Combinatory Logic I. North Holland.
• Haskell Curry, J. R. Hindley and J. P. Seldin (1972). Combinatory Logic II. North-Holland.
Moses Schönfinkel was a Russian mathematician who is known for his work on combinatory logic. According to Wikipedia, ‘His later life was spent in poverty, and he died in Moscow some time in 1942. His papers were burned by his neighbors for heating.’ (English Wikipedia entry for ‘Moses Schönfinkel’ of 2010-08-27) Curry also worked on combinatory logic and is considered one of the most important contributors to the foundations of functional programming.
• Jon Barwise and Robin Cooper (1981). Generalized quantifiers and natural
language. Linguistics and Philosophy 4, 159-219.
• Andrzej Mostowski (1957). On a generalization of quantifiers. Fund.
Math. Vol. 44, 12-36.
• Barbara H. Partee, Alice ter Meulen, and Robert E. Wall (1990). Mathematical Methods in Linguistics. Springer.
• L.T.F. Gamut (1991). Logic, language, and meaning. Univ. of Chicago
Press.
• Irene Heim and Angelika Kratzer (1998). Semantics in Generative Grammar. Blackwell.
Generalized quantifiers go back to Mostowski (1957), were used by Montague (1974) in The Proper Treatment of Quantification in Ordinary English (PTQ), and have been investigated systematically by Barwise and Cooper (1981). Introductions can be found, in ascending order of difficulty and detail, in Heim and Kratzer (1998), Partee et al. (1990), and Gamut (1991). By the way, the name ‘L.T.F. Gamut’ is a pseudonym for the collective of authors Johan van Benthem, Jeroen Groenendijk, Dick de Jongh, Martin Stokhof and Henk Verkuyl – an impressive collection of famous logicians.
• Richmond Thomason (ed.) (1974). Formal Philosophy. Yale University
Press.
• David R. Dowty, Robert E. Wall and Stanley Peters (1981). Introduction to
Montague Semantics. Kluwer.
The influence of Richard Montague’s work on natural language semantics can hardly be overestimated.7 The papers that are most important for linguists can be found in Thomason (1974). They are short, but very dense, and presume a high level of technical expertise. For this reason Montague’s work was mostly spread by some of his students such as Richmond Thomason, Barbara Partee, and David Dowty. Dowty et al. (1981) is still the standard introduction to Montague Semantics and a good place to start.
Here is some of the seminal literature on type-logical grammar and combinatory categorial grammar mentioned above.
• Bob Carpenter (1997). Type-Logical Semantics. MIT Press.
• Michael Moortgat (1997). Categorial Type Logics. In Johan van Benthem and Alice ter Meulen (eds.). Handbook of Logic and Language. MIT Press, 93–178.
• Glynn Morrill (1994). Type Logical Grammar: Categorial Logic of Signs. Kluwer Academic Publishers.
• Mark Steedman (1996). Surface Structure and Interpretation. MIT Press.
• Mark Steedman (2000). The Syntactic Process. MIT Press.
Carpenter (1997) is a very good introduction to type-logical grammar. It covers everything that has been covered in this chapter in detail in the first few chapters, introduces the Lambek calculus using a sequent calculus and a natural deduction system, and then discusses the semantic modeling of a vast range of linguistic phenomena. Moortgat (1997) is a survey handbook article on categorial grammars in general and as such a good and formally rigorous reference, but it is not suitable for beginners. Morrill (1994) is a bit older but definitely worth reading. He starts from Montague’s logic IL and quickly proceeds to more advanced topics; the book is very dense. (Morrill (2010) is not yet available at the time of this writing.) The books by Steedman are easy to read and intended for both beginning and experienced linguists interested in Combinatory Categorial Grammar. Steedman (2000) is more detailed and a good starting point.
7 He was murdered in 1971; the killer has never been found.
Solutions to Exercises
Chapter 1
Exercise 1, page 9:
a. {1, 2, 3, 4, 5, 6}
b. {{1, 1}, {1, 2}, {1, 3}, {1, 4}, {1, 5}, {1, 6}, {2, 2}, {2, 3}, {2, 4}, {2, 5}, {2, 6}, {3, 3}, {3, 4}, {3, 5}, {3, 6}, {4, 4}, {4, 5}, {4, 6}, {5, 5}, {5, 6}, {6, 6}}
Note: Apart from these 21 unordered outcomes there are also 36 possible ordered outcomes of a throw of two standard dice. The case with 36 ordered pairs as an outcome is relevant for the calculation of the probability of an outcome.
Exercise 2, page 10:
a. { n ∈ N | there is a k ∈ N such that n = 2k + 1}
or, using the remainder function mod frequently available in programming
languages:
{ n ∈ N | n mod 2 = 1 and n > 3}
b. { X | X ⊆ S }
Or simply: P (S)
c. { x | x is a blue sports car in Lisbon on October 24, 2014}
Note: Using the indexical ‘today’ is not precise enough in a definition. Depending on the application, you would probably build this set out of sets representing the parts, e.g. the intersection of the set of objects in Lisbon on a certain day with the set of blue objects and with the set of sports cars.
d. { X | X ⊆ A and there is a Y ⊆ A such that (X ∩ Y) ≠ ∅}
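The mod-based characterization translates directly into code; here is a small sketch (Python’s % is the remainder operation mentioned above):

```python
# Sketch: the set { n ∈ N | n mod 2 = 1 } as a membership test.
def is_odd(n: int) -> bool:
    """n is odd iff dividing n by 2 leaves remainder 1."""
    return n % 2 == 1

odds_up_to_20 = [n for n in range(1, 21) if is_odd(n)]
assert odds_up_to_20 == [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
```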
Exercise 3, page 10:
a. (A ∪ B) ∩ C = {1}
b. (A ∩ B) ∪ C = {1, 3, 4, 5, 9}
c. (A \C) ∩ B = {3, 4, 5}
d. (C \ A) ∪ ((B ∩ A) ∪ ∅) = {9, 3, 4, 5}
Note: A ∪ ∅ = A for any A
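Python’s built-in set type supports all of these operations directly, so computations like the above can be replayed mechanically. A small sketch (the sets A, B, C below are my own illustrative choices, not the ones from Exercise 3):

```python
# Sketch: set algebra with Python's built-in set type.
# A, B, C here are illustrative choices, not the sets from Exercise 3.
A, B, C = {1, 2, 3}, {2, 3, 4}, {3, 4, 5}

assert (A | B) & C == {3, 4}        # (A ∪ B) ∩ C
assert (A & B) | C == {2, 3, 4, 5}  # (A ∩ B) ∪ C
assert (A - C) & B == {2}           # (A \ C) ∩ B
assert A | set() == A               # A ∪ ∅ = A, for any A
```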
Exercise 4, page 11:
a. It doesn’t make sense, because sets are not ordered.
b. Depending on the application it makes sense. Suppose you want to model an inventory of fruits.
c. Let A be the set of employees, B be the set of union members, and C be the
set of persons that get a higher salary. Then: (A ∩ B) ⊆ C
d. Let A be the set of students and B be the set of workers.
‘Há estudantes que trabalham’: (A ∩ B) ≠ ∅
‘Há estudantes que não trabalham’: (A ∩ B̄) ≠ ∅, where B̄ is the complement of B
e. Let A be the set of students and B be the set of workers.
‘Todos estudantes trabalham ou não trabalham’: A ⊆ (B ∪ B̄)
f. It is always true under the given analysis, because A ⊆ (B ∪ B̄) is always true. Notice, however, that for the presuppositional reading of the quantifier ‘todos’ we could add the restriction that A ≠ ∅; under that presuppositional reading the sentence would be false if A = ∅.
g. (A ∩ B) ∩ C ≠ ∅
Note: The order does not matter; we could have written A ∩ (B ∩ C) ≠ ∅, and in fact the parentheses could be left out in this case.
h. Yes, because the empty set is a subset of any set.
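Analyses like those in (c)–(f) can be checked against a small finite model. A sketch (the individuals and set assignments are invented for illustration; the complement is taken relative to the domain):

```python
# Sketch: checking the set-theoretic analyses against a toy model.
# The individuals and the two sets are invented for illustration.
students = {"ana", "pedro", "lisa"}
workers = {"pedro", "lisa", "joe"}
domain = students | workers | {"maria"}

# 'Há estudantes que trabalham': (A ∩ B) ≠ ∅
assert students & workers != set()
# 'Há estudantes que não trabalham': (A ∩ B̄) ≠ ∅, complement taken in the domain
assert students & (domain - workers) != set()
# 'Todos estudantes trabalham ou não trabalham': A ⊆ (B ∪ B̄), trivially true
assert students <= workers | (domain - workers)
```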
Exercise 5, page 12:
a. (Venn diagram)
b. (Venn diagram)
c. (Venn diagram)
d. (Venn diagram)
e. (Venn diagram)
f. This does not hold, since a ∈ A and a ∈ B. It is easy to see this from the diagram for B. (Venn diagram)
g. This does not hold, since A ≠ B.
Exercise 6, page 12:
a. (Venn diagram)
Note: This does not hold in general. It only holds when 1.) A ⊆ B, because then A ∩ B = A and thus A ⊆ A, or when 2.) A = ∅. The Venn diagram depicts the first case.
b. (Venn diagram)
c. (Venn diagram)
Note: The grey area depicts A ∪ B. It is apparent from the picture that everything that is not in the grey area, i.e. the ‘complement’ of the grey area, is exactly Ā ∩ B̄.
Exercise 7, page 13:
a. {∅, {1}, {2}, {2, 1}}
b. {∅}
c. {∅, {c}, {a}, {a, c}}
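Power sets can also be enumerated mechanically; a sketch using Python’s itertools (the helper name powerset is my own):

```python
from itertools import chain, combinations

def powerset(s):
    """Enumerate all subsets of s, from the empty set up to s itself."""
    items = list(s)
    all_sizes = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))
    return [set(t) for t in all_sizes]

ps = powerset({1, 2})
assert len(ps) == 4
assert set() in ps and {1, 2} in ps   # P(S) always contains ∅ and S itself
```

A set with n elements always yields 2^n subsets, which the length check above reflects.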
Exercise 8, page 13:
a. |A ∩ B| ≥ 5
b. |A ∩ B| = 1
c. |A ∩ B| ≤ 3
d. ‘not one’ has (at least) two readings:
i. A ∩ B = ∅ (read as ‘no. . . ’)
ii. |A ∩ B| ≠ 1 (read as ‘it is not the case that exactly one. . . ’)
Exercise 9, page 19:
a. Let R(x, y) have the reading x likes y. R = {〈d, a〉, 〈a, b〉, 〈a, d〉}. Let A = {a, d} be the animate objects in the domain D = {a, b, c, d}. We stipulate that R ⊆ A × D, i.e. the first argument of R must be animate.
b. Let P ⊆ D × D have the reading x belongs to y. In the given example, P = {〈a, c〉, 〈d, b〉}.
Exercise 10, page 19:
a. transitive
b. transitive
c. not transitive
d. transitive (identity is an equivalence relation)
e. probably not transitive in general
Note: This case is controversial and it depends on what kind of similarity one
has in mind. In cases of so-called Sorites paradoxes similarity does not seem
to be transitive. Take for example, having a similar color. Color a might be
similar to color b and b similar to c, but perhaps a is no longer considered
similar to c. (Think of a smooth transition from red to orange.)
f. not transitive (the mother of the mother of x is the grandmother of x)
g. transitive (this is the identity relation, which is an equivalence relation)
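For a finite relation given as a set of pairs, transitivity can be decided by brute force. A sketch (the encoding and the helper name are mine):

```python
# Sketch: testing transitivity of a finite relation given as a set of pairs.
def is_transitive(R):
    """R is transitive iff 〈a, b〉 ∈ R and 〈b, c〉 ∈ R imply 〈a, c〉 ∈ R."""
    return all((a, d) in R
               for (a, b) in R
               for (c, d) in R
               if b == c)

assert is_transitive({(1, 2), (2, 3), (1, 3)})
assert not is_transitive({(1, 2), (2, 3)})           # 〈1, 3〉 is missing
assert is_transitive({(x, x) for x in range(5)})     # identity, cf. (d) and (g)
```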
Exercise 11, page 19:
a. not reflexive, symmetric, not antisymmetric, not Euclidean, not transitive
b. reflexive, symmetric, not antisymmetric, Euclidean, transitive
Note: ‘sameness’ is understood in the sense of an equivalence relation, but
there might be weaker readings that are not Euclidean and not transitive.
c. not reflexive, not symmetric, antisymmetric, not Euclidean, transitive
d. not reflexive, not symmetric, antisymmetric, not Euclidean, transitive
e. reflexive, not symmetric, antisymmetric, not Euclidean, transitive
f. not reflexive, not symmetric, not antisymmetric, not Euclidean, not transitive
g. not reflexive, symmetric, not antisymmetric, not Euclidean, not transitive
Exercise 12, page 19: Any relation based on a strict and precise understanding
of ‘sameness’ is an equivalence relation, e.g.: x has the same age as y, x has the
same birthday as y, x and y have the same number of pets, x and y have the
same number of children. Identity is also an equivalence relation.
Exercise 13, page 20:
a. Yes, it even must contain cycles, because a preorder is reflexive. Hence, in
the graph representation of the relation every node points to itself, which is a
cycle.
b. No, the depicted relation is not a preorder. By transitivity from R(a, b) and
R(b, c) it would follow that R(a, c), but there is no link from a to c in the
picture (only in the opposite direction).
Exercise 14, page 26:
a. total, not surjective, injective, since N denotes the set of positive integers (not injective if we include negative numbers)
b. total, not surjective, injective
c. total, not surjective, injective
Note: not total when 0 is included because 1/0 is not defined (whether N
contains 0 or not varies from author to author; usually it doesn’t)
d. total, not surjective, not injective
e. total, not surjective, not injective
f. total, surjective, injective, bijective (identity function)
g. total, not surjective, injective
h. total, surjective, not injective
i. not total, not surjective, not injective
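On finite domains, injectivity and surjectivity can likewise be decided by enumeration. A sketch (the helper names are mine; a finite initial segment stands in for N):

```python
# Sketch: deciding injectivity and surjectivity for functions on finite sets.
def is_injective(f, domain):
    """f is injective iff no two distinct arguments share a value."""
    values = [f(x) for x in domain]
    return len(values) == len(set(values))

def is_surjective(f, domain, codomain):
    """f is surjective iff every codomain element is f(x) for some x."""
    return {f(x) for x in domain} == set(codomain)

N = range(1, 11)  # a finite stand-in for the positive integers
assert is_injective(lambda x: 2 * x, N)           # doubling never collides
assert not is_surjective(lambda x: 2 * x, N, N)   # odd numbers are never hit
assert not is_injective(lambda x: x * x, range(-3, 4))  # (−3)² = 3²
```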
Exercise 15, page 26:
a. f (x) = 2x
b. There is strictly speaking no inverse function, because the square root can be both positive and negative ((−2)² = 4 and 2² = 4). By convention, often the positive square root f(x) = √x is taken as the inverse of the power function.
c. f (x) = x
d. f(x) = x²
e. not injective, so the inverse is a relation, not a function
f. f = {〈Thomas, Ana〉, 〈Peter, Teresa〉, 〈Maria, Klaus〉}
Exercise 16, page 26:
a. not a function; inverse: the function from the bearers of Turkish proper names to their names (in the ideal case where every name bearer has only one name)
Note: At least unofficially people can have two names, and then there is no
inverse function. The set of Turkish proper names might not be well-defined.
b. not a function; the relation between a Portuguese sentence and its possible translations is not a function; the inverse relation is also not a function
c. function; inverse: the function from all passport numbers to the respective
owner
d. not a function; the relation between a grammatically well-formed English sentence and its meaning is (usually) one to many
e. not a function; one owner can have many dogs
Exercise 17, page 27: Let R = {〈Ana, Ana〉, 〈Pedro, Pedro〉, 〈Mustafa, Mustafa〉,
〈Joe, Joe〉, 〈Lisa, Lisa〉, 〈Ana, Pedro〉, 〈Ana, Mustafa〉, 〈Pedro, Ana〉, 〈Pedro, Mustafa〉,
〈Pedro, Joe〉, 〈Mustafa, Joe〉, 〈Mustafa, Lisa〉, 〈Lisa, Ana〉, 〈Lisa, Pedro〉, 〈Lisa, Mustafa〉,
〈Lisa, Joe〉}. Then:
f(x, y) = 1 if 〈x, y〉 ∈ R, and 0 otherwise
Exercise 18, page 27: 1.) f(x) = 3x, fixed point 0: f(0) = 0. 2.) f(x) = x^x, fixed point 1: f(1) = 1^1 = 1.
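For functions over a finite search space, fixed points can be found by brute force; a sketch covering the two examples above (the helper name is mine):

```python
# Sketch: finding fixed points f(x) = x by brute force over a finite range.
def fixed_points(f, candidates):
    """All x among the candidates with f(x) = x."""
    return [x for x in candidates if f(x) == x]

assert fixed_points(lambda x: 3 * x, range(-10, 11)) == [0]   # f(x) = 3x
assert fixed_points(lambda x: x ** x, range(1, 11)) == [1]    # f(x) = x^x
```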
Exercise 19, page 27:
a. f(x) = 1 if x is a native speaker of German, 0 otherwise
b. 1_A(x) = 1 if x = 1, x = 0, or x = −1, 0 otherwise
c. f(x) = 1 if x = 〈a, b〉 such that a says ‘Hi!’ to b, 0 otherwise
d. f(x) = 1 if x is a raining event, 0 otherwise
e. A := {1, 2, 3, 4, 5}
1_A(x) = 1 if x ∈ A, 0 otherwise
Note: The original formulation of A was deliberately obfuscated and B is not needed at all.
f. 1_(A ∪ B)(x) = 1 if x ∈ A or x ∈ B, 0 otherwise
Or: f(x) = 1 if x ∈ {a, b, c, d, f}, 0 otherwise
g. f(x) = 1 if x = 〈a, b, c, d, e〉 such that a buys b from c at price d at time e, 0 otherwise
h. f(x) = 1 if x is a raven and x is not black, 0 otherwise
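Characteristic (indicator) functions like these are one-liners in a programming language. A sketch (the helper name indicator is my own):

```python
# Sketch: characteristic (indicator) functions of finite sets.
def indicator(A):
    """Return 1_A, the function mapping x to 1 if x ∈ A and to 0 otherwise."""
    return lambda x: 1 if x in A else 0

one_A = indicator({1, 2, 3, 4, 5})             # cf. solution (e)
assert one_A(3) == 1 and one_A(9) == 0

A, B = {'a', 'b', 'c'}, {'c', 'd', 'f'}        # cf. solution (f)
one_union = indicator(A | B)                   # 1_(A ∪ B)
assert all(one_union(x) == 1 for x in A | B)
assert one_union('z') == 0
```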
Exercise 20, page 27:
a. function
b. function
c. not a function
d. function
e. function
f. not a function
g. not a function
h. function
i. function
Exercise 21, page 28:
a. no, it’s a relation
b. no, it’s a relation
Chapter 4
Exercise 39, page 104:
a. (λx.x)a ⇒β a
b. (((λxyz(P z y x))c)a)b ⇒β ((λyz(P z y c))a)b ⇒β (λz(P z a c))b ⇒β (P b a c)
c. (((λxλy.(R x y) ∧ (R y x))a)b) ⇒β ((λy.(R a y) ∧ (R y a))b) ⇒β (R a b) ∧ (R b a)
d. (((λPλx.P x ∨ ¬P x)(λx.x = x))a) ⇒α (((λPλy.P y ∨ ¬P y)(λx.x = x))a) ⇒β ((λy.(λx.x = x)y ∨ (¬(λx.x = x))y)a) ⇒β ((λy.y = y ∨ ¬(y = y))a) ⇒β a = a ∨ ¬(a = a)
e. (((λP_eet λx.P x)(λxλy.x = y))a) ⇒α (((λPλz.P z)(λxλy.x = y))a) ⇒β ((λz.(λxλy.x = y)z)a) ⇒β ((λzλy.z = y)a) ⇒β λy.a = y
f. λx(P a) ⇒η P a
g. (λP(P x a))(λxλy.Q x y) ⇒α (λP(P z a))(λxλy.Q x y) ⇒β (((λxλy.Q x y)z)a) ⇒β ((λy.Q z y)a) ⇒β Q z a
h. (((λPQ.∀x[P x → Q x])Q′)P′) ⇒β ((λQ.∀x[Q′ x → Q x])P′) ⇒β ∀x[Q′ x → P′ x]
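Applying a curried Python function performs exactly the substitution that β-reduction describes, so results like those of (a) and (b) can be checked mechanically. A sketch (P is a stand-in that merely records its arguments):

```python
# Sketch: β-reduction mimicked by applying curried Python functions.
# P merely records its arguments so reduced and unreduced forms can be compared.
P = lambda z: lambda y: lambda x: ('P', z, y, x)

# (a): (λx.x)a  ⇒β  a
identity = lambda x: x
assert identity('a') == 'a'

# (b): (((λxyz(P z y x))c)a)b  ⇒β ... ⇒β  (P b a c)
term = lambda x: lambda y: lambda z: P(z)(y)(x)
assert term('c')('a')('b') == P('b')('a')('c')
```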
Index
Łukasiewicz, 35
for PC, 40
belief, 88
Benthem, 88
beta conversion, 102
beta reduction, 103
biconditional, 32, 38
biimplication, 32
binary numbers, 62
bisubjunction, 32
Bohr, 33
Bourbaki, 3
branch
closed, 48
open, 48
abduction, 58
abstraction, 103
actualism, 84
adder, 62
adjunction, 32
adverb, 90
affirming the consequent, 56
Ajdukiewicz, 105
aleph null, 6
all, 5, 7
alpha conversion, 102, 104
analysis
logical, 42
anaphora, 87
Andrews, 95, 117
argument
deductive, 50, 55, 91
good, 57
scheme, 55
arity, 13
assignment, 70
axiom of choice, 116
axiom system, 45
calculus, 102
Cantor, 6
cardinality, 6, 94
Carpenter, 119
Cartesian Product, 14
categorial grammar, 105, 119
characteristic function, 25
characterizability, 83
Chomsky, 115
Church, 22, 97, 98, 117
Church-Rosser theorem, 116
Combinatory Categorial Grammar,
115, 119
Compactness
Bar-Hillel, 105
Barwise, 118
base
of PC, 61
compactness, 97
of FOL, 94
of HOL, 117
complement, 6
completeness
of FOL, 94
of HOL, 97, 117
of PC, 61
compound category
in CG, 105
concatenation, 105
conditional, 32, 38, 57
converse, 39
confluency, 116
conjunction, 32, 38, 103
asymmetric, 89
of NPs, 90
connective, 32, 66
consistency, 43
constant
propositional, 31, 71
contingency
logical, 42
continuum, 6
continuum hypothesis, 6, 116
contradiction, 43, 57
contraposition, 55
Cooper, 118
corollary, 46
countable, 6, 8, 94
counter-model, 48, 79
counterfactual conditional, 90
credibility, 58
critical thinking, 63
Curry, 118
definition, 34
recursive, 3
DeMorgan, 50
denumerable, 6, see countable
denying the antecedent, 56
description, 3, 82
diamond property, 116
difference, 6
disjunction, 32
exclusive, 32, 38
inclusive, 32, 38
domain
non-empty, 71
enumeration, 2
equivalence, 32
of wffs, 43
vs. biconditional, 43
equivalence relation, 17
eta conversion, 102, 104
eta reduction, 103
event, 72
every, 83
ex falso quod libet, 57
exclusive OR, 32
existence, 55, 84
expletive it, 71
expressive power, 83
extension, 4, 14, 15
extensionality principle, 4
factivity, 89
fallacy, 56
Falsum function, 39
Feys, 118
first-order modal logic, 89
Frege, 4, 28
function, 20
bijective, 24, 25
characteristic, 25
indicator, 25
injective, 23, 24
inverse, 24
partial, 20
decidability
of PC, 61
deduction
natural, 45
deductive closure, 51
definability, 83
definite description, 82
lambda calculus, 22, 102
Lambek calculus, 119
Leibniz’ Law, 73, 101
lemma, 46
Leo-II, 102
Lewis, 105
Lexical Functional Grammar, 115
logic
‘informal’, 63
logical consequence, 58
Löwenheim-Skolem theorem, 94
surjective, 23
total, 20
functional application, 107
future, 88
God, 55, 84
hammer, 42
Heim, 28
Henkin, 100, 117
Hodges, 63, 95
horseshoe, 32
HPSG, 115
main junctor, 34, 40
many-sorted logic, 84
material equivalence, 32
material implication, 32, 57
maximum, 56
membership, 5
metatheorem, 46
minimalism, 115
minimum, 56
modal logic, 89
model
generalized Henkin, 100, 116
intended, 44
of FOL, 71
of HOL, 100, 116
standard, 100
modification, 90, 115
modus ponens, 55
modus tollens, 55
monotonicity, 57, 59
Montague, 85, 89, 119
Moortgat, 119
Morrill, 119
most, 8
Mostowski, 28, 118
murder, 119
identity, 44, 73
between sets, 4
identity of indiscernibles, 73
if and only if, 32
iff, 32
implication, 32
indexical, 87
indicator function, 25
indiscernibility of identicals, 73
induction, 58
infinite chain, 56
infinity, 6
intension, 4
intensionality, 88
interdefinability
of quantifiers and identity, 101
of truth functions, 39, 57
intersection, 5
introspection principle, 89
inverse function, 24
inverse relation, 15
iota operator, 82, 86
iota quantifier, 82, 86
Isabelle, 102
junctor, 32
junctor main, 34
name, 86
natural deduction, 45
negation, 32, 38, 41
double, 50
negative introspection, 89
KK principle, 89
Kratzer, 28
Kripke, 86
propositional attitude, 89
prove, 48
Newton, 42
no, 7, 8
non-classical logic, 57
nonconditional, 39
normalization theorem, 116
notation
Polish, 35
numerals, 8
quantification
first-order, 73
higher-order, 83
relativized, 83
second-order, 73
vacuous, 69
quantifier, 5
body, 111
existential, 75
first-order, 86
generalized, 7, 28, 111–114, 118
in FOL, 66
relativized, 83
restricted, 83
restriction, 5, 111
universal, 75, 76
quantifier domain restriction, 83
quantifier scope ambiguity, 114–115
quasi-order, 16
Quine dagger, 32
one-to-one, 23
onto, 23
ontological proof, 55, 84
open formula, 69
order
partial, 17
preorder, 16
quasi-order, 16
total, 17
ordered pair, 13
ordered tuple, 13
paradox of the material implication,
57
Partee, 28
partial order, 17
partiality, 82
past, 88
Peirce stroke, 32, 38
plausibility, 58, 59
Polish notation, 35
positive introspection, 89
possibilism, 84
possibility
logical, 41
powerset, 8
predication, 73
preference, 20
preorder, 16, 20, 56, 59
presupposition, 5
projection, 39
proof theory, 45
proper name, 86
proposition
in mathematics, 46
recursion, 3
reduction, 103
reductio ad absurdum, 48
reification, 115
relation, 13
antisymmetric, 16
asymmetric, 16
equivalence, 17
Euclidean, 16
irreflexive, 16
part-of, 17
reflexive, 16
symmetric, 16
total, 16
transitive, 16
rewrite system, 102
Russell, 82
satisfiability, 41, 43
Schönfinkel, 118
Schönfinkelization, 99, 101
of FOL, 79
of PC, 52
theorem prover, 102
theory, 44
three, 8
time interval, 88
todos, 7
total order, 17
total relation, 16
TPS, 102
transitive verb, 104
tree
closed, 48
complete, 48
incomplete, 78
Tree Adjoining Grammar, 115
truth conditions, 85
truth function, 39, 44, 57
truth in a model, 72, 100
truth preservation, 57
truth table, 44
truth-functionality, 89
two-sorted logic, 84
type, 97
Type-Logical Grammar, 115, 119
type-shifting, 112–114
scope
of a quantifier, 69
semantics
lexical, 111
of FOL, 70
of PC, 36
semidecidability, 94
sequent calculus, 45
set, 1
abstraction, 2
empty, 3
Sheffer stroke, 32, 38, 40
situation, 72
Smullyan, 94
some, 7
soundness
of a premise, 58
of an argument, 55
of FOL, 94
of PC, 61
Steedman, 119
subject
logical vs. grammatical, 71
subjunction, 32
subset, 5
syllogism, 61
syntactic category
in CG, 105
syntax
of FOL, 65, 68
of HOL, 97
of PC, 31, 33
union, 5
validity, 41, 43
variable, 98
assignment, 70
binding, 69, 73
free vs. bound, 68
reuse, 69
variant, 70, 73
Venn diagram, 9
verb, 85
ditransitive, 111, 114
intransitive, 111
transitive, 111, 113
Verum function, 39
VP-conjunction, 103
tableaux, 45
for FOL, 74–79
for PC, 46–50
Tarski, 63
tautology, 41, 43
tense, 72, 87
ter Meulen, 28
term, 66, 98
compound, 98
ground, 66
theorem, 46
Wall, 28
wff
of FOL, 66
of PC, 33