Bayesian Cognition Explored
Indeed, the very title of Bernoulli's4 seminal book Ars Conjectandi, 'The Art of Conjecture', embodies the idea that probability captures how people actually make conjectures, as well as providing a calculus for helping people to make conjectures more accurately. Thus, one important strand in the development of probability theory viewed it directly as a theory of thought, as well as a helpful mathematical calculus.

The probabilistic approach can be adopted at three different levels, corresponding to Marr's5 three levels of explanation. Computational level explanation aims to specify the nature of the problem that the brain faces: the goals of the system and the structure of the environment in which these goals must be achieved. At the computational level, then, probabilistic methods are used to specify the problem that the brain faces. Thus, learning to control an arm, or to use a language, might be viewed as problems of probabilistic inference, given certain prior assumptions and in the light of data gleaned from experience. Modern engineering, machine learning, and artificial intelligence typically view a wide range of information processing problems faced by the brain, from motor control, to speech perception, to object recognition, from this probabilistic perspective.

Algorithmic level explanation requires specifying the representations, and the computational operations over those representations, that constitute cognition. Even if the brain faces probabilistic challenges, it may be that it solves them using some set of heuristics or approximations which do not involve actually carrying out probabilistic calculations. On the other hand, the modern technology of probabilistic inference, as explored in state-of-the-art engineering and artificial intelligence systems, does provide a rich set of hypotheses about human cognition. Cognitive science is, after all, a process of reverse engineering; and reverse engineering inevitably draws on the best engineering solutions to the information processing problems that the brain faces.

Finally, even if the brain is probabilistic at the computational and algorithmic levels, this does not necessarily imply that it is probabilistic at the third of Marr's levels of explanation, the implementational level. Indeed, probabilistic algorithms used in speech engineering or computer vision run on the binary logic of digital computers. But some neuroscientists have begun to conjecture that the brain may be probabilistic at its very foundations—that individual neurons may convey probabilistic information, that neural populations may capture probability distributions, and that basic neural processes might be understood as directly carrying out elementary probabilistic inference.6

After providing a brief overview of Bayesian inference, in the rest of this article we survey some of the burgeoning research applying Bayesian models to cognition and perception. Seven sections cover Bayesian models in Perception, Categorization, Learning and Causality, Language Processing, Inductive Reasoning, Deductive Reasoning, and Argumentation.

BAYESIAN INFERENCE

From a probabilistic standpoint, beliefs are a matter of degree. Each hypothesis, Hi, can be associated with a degree of belief P(Hi); and very modest consistency constraints require that these degrees of belief must obey the laws of probability. Thus, the probability distribution over the various Hi can be viewed as characterizing prior beliefs. Suppose that Hi has implications for the data we expect to encounter (e.g., Hi states that the floodlights are on, which, if true, makes some sensory inputs—roughly, the bright ones—more likely than others). These implications can be captured by P(D|Hi), the probability of the data, given the hypothesis. In the light of D, we need to update the priors P(Hi) to P(Hi|D), the probabilities of the hypotheses, given that the data is known. A simple identity of probability theory, Bayes' theorem, shows how this can be done:

    P(Hi|D) = P(D|Hi)P(Hi) / P(D)

The probability of the data P(D) is not, of course, known independently of the hypotheses that might generate that data—so in practice P(D) is typically expanded using the probabilistic identity:

    P(D) = Σj P(D|Hj)P(Hj)
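To make the two identities above concrete, here is a minimal numerical sketch (not from the article itself) of Bayesian updating over a small, discrete hypothesis space, using the floodlight example; the hypotheses, priors, and likelihoods are invented for illustration.

```python
# Minimal sketch of Bayesian updating over a discrete hypothesis space.
# The hypotheses, priors, and likelihoods below are invented for illustration.

def posterior(priors, likelihoods):
    """Return P(H_i | D) for each hypothesis, given priors P(H_i) and
    likelihoods P(D | H_i), using Bayes' theorem with
    P(D) = sum_j P(D | H_j) P(H_j)."""
    p_data = sum(likelihoods[h] * priors[h] for h in priors)
    return {h: likelihoods[h] * priors[h] / p_data for h in priors}

# Hypotheses about the scene: are the floodlights on or off?
priors = {"floodlights_on": 0.1, "floodlights_off": 0.9}

# D = "the sensory input is bright"; bright input is much more likely
# if the floodlights are on.
likelihoods = {"floodlights_on": 0.95, "floodlights_off": 0.05}

print(posterior(priors, likelihoods))
# {'floodlights_on': 0.678..., 'floodlights_off': 0.321...}
```

Even with a low prior, the hypothesis that the floodlights are on comes to dominate once the bright input is observed, which is just the updating pattern the two identities describe.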
Because of the centrality of the problem of updating beliefs in the light of new information, Bayes' theorem has very broad application, so much so, indeed, that the interpretation of degrees of belief in terms of probabilities is often known as the Bayesian approach.

If we quantify 'degrees of belief' numerically, as the Bayesian approach presupposes, why should the laws of probability theory, rather than some other principles, define the calculus of degrees of belief? From the point of view of cognitive science, there are two strong arguments for adopting a probabilistic approach. The first, mentioned above, is that violation of the laws of probability leads to paradoxical conclusions. Indeed, the laws of probability can be derived from a variety of plausible, modest, but very different assumptions concerning how degrees of belief should behave. Perhaps the best known such derivation is the Dutch book theorem,7 which shows that, under fairly general conditions, gamblers whose degrees of belief violate the laws of probability will happily accept a combination of bets which are, nonetheless, guaranteed to lose money, whatever their outcomes—which appears to be an unequivocally irrational choice. This type of argument suggests that, given that brains reason spectacularly well about uncertainty, they are unlikely to depart systematically from the norms of good probabilistic reasoning by too much: any good uncertain reasoner is, the argument might go, to some degree a good Bayesian, that is, probabilistic, reasoner.

In addition to this a priori line of argument, a second consideration, perhaps more persuasive from the point of view of the practicing neuroscientist and cognitive scientist, is that the Bayesian approach is widely used in engineering approaches to solving the types of problem faced by the brain. Thus, the fields of computer vision, speech recognition, computational linguistics, robotics, machine learning, information retrieval and expert systems, and many more, have seen a dramatic upsurge in the application of probabilistic methods. To the extent that the project of understanding the mind/brain is reverse engineering, that is, attempting to find the engineering principles that underpin neural and cognitive function, then any credible scientific theory has to be good engineering; and the Bayesian approach seems plausibly to pass this test.

Below, we briefly describe the Bayesian approach to cognition in a number of domains, ranging from perception to learning about causal relations, to Bayesian models of higher-level reasoning and argumentation.
PERCEPTION

From a computational level perspective, the problem of perception is that of inferring the structure of the world from sensory input. This problem may seem to be ill-posed, because any given sensory input may have been generated by an infinity of possible states of the world.8 From a probabilistic perspective, the infinity of possible interpretations is not in itself problematic. Rather, the challenge of probabilistic inference in perception is to assign probabilities to each of these possible interpretations, based not only on the sensory input itself, but on prior knowledge. This is a problem of Bayesian inference par excellence.

The Bayesian approach in perception has its beginnings in Helmholtz's9 notion of 'unconscious inference', although he did not explicitly use Bayes' rule.10 More recently, this perspective has become increasingly influential throughout the brain and cognitive sciences, as well as in computer vision. Moreover, the Bayesian approach is consistent with a broader tradition in perceptual research, the idea that perception is analysis-by-synthesis.11 That is, the perceptual data is presumed to be analyzed (i.e., calculating P(H|D)) from a knowledge of the perceptual data that would be generated by various possible scene interpretations (i.e., from a knowledge of P(D|H), and of course a prior distribution P(H) over the hypotheses concerning the scenes)—a transformation which requires the application of Bayes' theorem. In practice, the process of finding an interpretation from which the perceptual data can reasonably be generated requires a combination of bottom-up and top-down perceptual inferences,12 a process that can be captured computationally by recent methods such as Data-Driven Markov Chain Monte Carlo.13 Thus, the Bayesian approach to perception requires that the perceptual system is able to generate sensory input, as well as being able to perceive it; and it hence provides a natural explanation of the existence of imagery, consistent with some existing psychological theories,14 and with experimental data indicating the influence of top-down perceptual processes.15

Bayesian models of perception have been subjected to direct experimental test in a number of domains (e.g., the integration of sensory cues16). And a wide variety of computational models of empirical findings in perception have been put forward, ranging from low-level image interpretation,17 shape from shading,8,18 and shape from texture,19 to boundary interpolation.20,21 There has also been explosive growth in theories in the field of computational neuroscience which view specific neural mechanisms as carrying out probabilistic computations, from lateral inhibition in the retina,22 to the activity of single cells in the blow-fly,23 or to populations of neurons, including the accumulation of sensory evidence.6

Indeed, it turns out that a large class of apparently non-probabilistic models of perception can also be accommodated within the Bayesian framework. A long tradition in perception, often viewed as standing in direct opposition to the Bayesian approach, is based on simplicity: the perceptual system is assumed to choose an interpretation of sensory input that provides the briefest encoding of the sensory data. Here, the starting point for the perceiver is a coding language: a representational system in which scenes, and the sensory inputs that they deliver, can be represented. According to simplicity-based explanations, for example, Gestalt principles, such as common fate (grouping objects with the same movement together, such as a flock of birds) or good continuation (assuming that alignment between items, even when occluded, typically indicates they should be grouped, or perhaps are part of the same object, for example when the outline of an animal is seen through dense foliage), arise because of a preference for simple codes—codes which specify a single motion direction for the entire flock, rather than for each bird individually; or specify the position of a single occluded object, rather than independently coding the positions of each object fragment. Yet it turns out that simplicity-based approaches to perception24–31 are mathematically equivalent to the Bayesian approach, under mild conditions.32 The choice of coding language can be viewed as implicitly specifying a prior probability distribution—such that items that have a brief representation in the language have relatively high prior probability.
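The equivalence between simplicity and probability can be made concrete with a small sketch. Assuming, as in standard coding-theoretic treatments, that a hypothesis with a code of length L bits receives prior probability proportional to 2^(-L), choosing the interpretation with the shortest total code is the same as choosing the interpretation with the highest posterior; the toy code lengths below are invented for illustration.

```python
# Sketch of the simplicity/probability equivalence: if a hypothesis with a
# code of length L bits is given prior P(H) proportional to 2**-L, and the
# data cost given the hypothesis is L(D|H) bits with P(D|H) proportional to
# 2**-L(D|H), then minimising total code length L(H) + L(D|H) is the same as
# maximising the (unnormalised) posterior P(H)P(D|H).  Code lengths invented.

interpretations = {
    # hypothesis: (code length of hypothesis, code length of data given it)
    "one flock moving together":     (5.0, 2.0),   # brief: one shared motion
    "40 birds moving independently": (3.0, 40.0),  # data costly to encode
}

def total_code_length(h):
    l_h, l_d_given_h = interpretations[h]
    return l_h + l_d_given_h

def unnormalised_posterior(h):
    l_h, l_d_given_h = interpretations[h]
    return 2 ** -l_h * 2 ** -l_d_given_h   # prior * likelihood

best_by_simplicity = min(interpretations, key=total_code_length)
best_by_probability = max(interpretations, key=unnormalised_posterior)
assert best_by_simplicity == best_by_probability
print(best_by_simplicity)   # 'one flock moving together'
```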
CATEGORIZATION

Understanding perceptual input involves the creation of categories. Categorization allows generalization from one category member to another; and it also allows the formulation of abstract relations defined over categories, rather than concrete items. From a formal point of view, categorization is an aspect of high-level perception, where categorization of the items in the scene is just one of many pieces of information that must be recovered from sensory input. In cognitive psychology, early theories of categorization focused on supervised categorization—that is, learning a category from a set of examples, labeled with their category. The two main theoretical approaches both focused on the similarity of the item to be classified either to a prototypical category exemplar,33,34 or, alternatively, to one of a set of category exemplars.35 While not initially formulated in probabilistic terms, both types of theory have increasingly been formulated from a Bayesian point of view.36–40 Roughly speaking, the prototype view of categorization can be viewed as assuming that categories correspond to Gaussian (or similar) blobs, which may potentially overlap, in some feature space; and the problem of categorization is to work out, given an item, the probability distribution over the Gaussian blobs that may have generated it. According to the simplest formulation, we assume that the participant is certain that the new item is generated by one of the previously encountered categories; but in reality, of course, it is possible that a new item is generated by a category that has not been previously encountered. Thus one extension of the prototype approach, from a probabilistic point of view, is to allow that, in response to a new item, an agent may postulate a new category; and therefore that the number of categories may grow, perhaps unboundedly, as the number of items categorized increases. This type of 'nonparametric' categorization model is widely used in Bayesian models of categorization, from Anderson41 through to Griffiths et al.42 and Goodman et al.43 Exemplar models can then be seen as a limiting case of this class of model.44

Viewing the problem of categorization as a matter of probabilistic inference provides more than an interesting notational variant of the initial non-probabilistic formulations. On the one hand, it provides a fresh perspective on the explanation of classic psychological data. So, to take a simple example, the finding that people are usually able to classify more typical category members more rapidly than less typical category members34 has a natural interpretation: the features of prototypical items provide more unequivocal evidence for the specific category membership than do those of less prototypical items; and hence fewer such features need to be processed, on average, for a category judgment to be made reliably. Moreover, the probabilistic framework provides a starting point for a wide range of generalizations, which may take account of the fact, for example, that a single item may be a member of multiple categories45; that the prior assumptions that underpin categorization may be powerfully influenced by background theories46; or that the relative importance of different features, and even the choice of appropriate features, may itself depend on the category being considered, and may have to be learned.45
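As a concrete, deliberately simplified sketch of the prototype-as-Gaussian view and its nonparametric extension, the following code computes the posterior over two known one-dimensional Gaussian categories plus a residual 'new category' option that carries a small prior and a very broad likelihood. The parameter values are invented for illustration and this is not a re-implementation of any specific published model.

```python
import math

# One-dimensional sketch of categorization as probabilistic inference.
# Each known category is a Gaussian 'blob' in feature space; an extra
# 'new category' option gets a small prior and a very broad likelihood,
# roughly in the spirit of nonparametric models.  Numbers are invented.

def gaussian(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

categories = {
    "category_A":   {"prior": 0.55, "mean": 0.0, "sd": 1.0},
    "category_B":   {"prior": 0.40, "mean": 4.0, "sd": 1.0},
    "new_category": {"prior": 0.05, "mean": 0.0, "sd": 10.0},  # broad, vague
}

def classify(x):
    scores = {name: c["prior"] * gaussian(x, c["mean"], c["sd"])
              for name, c in categories.items()}
    z = sum(scores.values())
    return {name: s / z for name, s in scores.items()}

print(classify(0.5))    # dominated by category_A
print(classify(20.0))   # far from both known blobs: 'new_category' wins
```

An item far from every known blob ends up assigned to the 'new category' option, which is the probabilistic counterpart of postulating a category that has not previously been encountered.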
LEARNING AND CAUSALITY

Conditioning in animals has traditionally been conceived as a matter of the formation of associations, which might be presumed to form on the basis of, for example, the constant conjunction of two events, or their spatial and temporal proximity. Nonetheless, a wide variety of empirical findings has indicated that the animal may be viewed as an intelligent problem solver,47 attempting to figure out the structure of the world from available contingency data. Thus, for example, the discovery of blocking48 (the finding that once an animal has learned that an outcome is predicted by one cue, it is less liable to associate that outcome with a second cue added later, however reliable that second cue may be) suggests that the animal already has an 'explanation' of the outcome, and hence that no further explanation, for example in terms of the second cue, is required. To the extent that
the animal is regarded as making inferences about the structure of the environment from observed data concerning the arrival of lights, tones, food pellets, or shocks, the problem that the animal faces appears closely analogous to the general problem of scientific inference, and hence to be naturally modeled within a Bayesian framework.49–51 From this point of view, well-known conditioning phenomena, such as the finding that a contingency that has been reliably reinforced is extinguished more rapidly than a contingency that has been partially reinforced, have a natural probabilistic explanation. If a contingency is typically reliable, then after a few 'extinction' trials there is already strong evidence that the state of the world has changed and that the contingency is no longer in operation; on the other hand, if the contingency is initially unreliable, then a few such trials are to be expected by chance in any case, and hence the animal will be slower to reach the conclusion that the world has changed, and that the contingency is no longer in operation. This type of phenomenon is difficult to account for on some mechanistic associative accounts, because the association formed by partial reinforcement is simply assumed to be weaker, and for this reason should be expected to be eliminated more rapidly.
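The extinction asymmetry can be illustrated with a toy calculation (not from the article): compare the posterior probability that the world has changed after a run of non-reinforced trials, when the contingency was previously reinforced on 90% versus 50% of trials. All numbers are invented.

```python
# Toy illustration of why a reliably reinforced contingency should be
# 'extinguished' faster: a run of non-reinforced trials is strong evidence
# of change if reinforcement used to be reliable, but is expected by chance
# if reinforcement was only partial.  All numbers are invented.

def p_changed(p_reinforce, n_unreinforced, prior_change=0.1):
    # P(data | world unchanged): every trial unreinforced despite contingency
    p_data_unchanged = (1 - p_reinforce) ** n_unreinforced
    # P(data | world changed): contingency gone, so non-reinforcement is certain
    p_data_changed = 1.0
    num = p_data_changed * prior_change
    den = num + p_data_unchanged * (1 - prior_change)
    return num / den

for p in (0.9, 0.5):                 # reliable vs. partial reinforcement
    for n in (1, 3, 5):              # number of extinction trials so far
        print(f"reinforced {p:.0%} of trials, {n} extinction trials:"
              f" P(changed) = {p_changed(p, n):.2f}")
```

After the same number of extinction trials, the posterior probability of change is far higher when the contingency had been reliable, which is the asymmetry described above.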
Similarly, a variety of probabilistic models have been put forward to explain human judgments of contingency and causality when learning from experience. Cheng,52 for example, has put forward a 'probabilistic contrast' model of human causal judgment, according to which the strength of a causal relationship is assumed to be measured by the contrast between the probability of the effect in the presence of the cause and the probability of the effect in the absence of the cause. Griffiths and Tenenbaum53 have proposed a Bayesian model in which the existence of, and the nature of, a potential causal relationship between events is itself inferred from the observed data. This account aims to explain empirical data concerning both how the structure of causal relationships can be learned, as well as the strength of those relationships, which is the primary concern of Cheng's model. Sloman and Lagnado,54 moreover, have directly studied the role of intervention in human causal judgments.
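For concreteness, the following sketch computes the probabilistic contrast, ΔP = P(e|c) − P(e|¬c), and, for a generative cause, the associated causal power ΔP / (1 − P(e|¬c)), from a small invented contingency table. It is an illustration of the quantities discussed above, not a re-implementation of any specific published model.

```python
# Probabilistic contrast (Delta-P) and causal power from contingency data.
# The contingency counts are invented for illustration.

def contingency_stats(e_with_c, n_with_c, e_without_c, n_without_c):
    p_e_given_c = e_with_c / n_with_c            # P(effect | cause present)
    p_e_given_not_c = e_without_c / n_without_c  # P(effect | cause absent)
    delta_p = p_e_given_c - p_e_given_not_c      # probabilistic contrast
    # Causal power for a generative cause (a Cheng-style quantity):
    power = delta_p / (1 - p_e_given_not_c) if p_e_given_not_c < 1 else float("nan")
    return delta_p, power

# e.g. the effect occurred on 18 of 20 trials with the cause present,
# and on 6 of 20 trials with the cause absent.
delta_p, power = contingency_stats(18, 20, 6, 20)
print(f"Delta-P = {delta_p:.2f}, causal power = {power:.2f}")
# Delta-P = 0.60, causal power = 0.86
```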
According to many standard philosophical accounts of causality, the existence of a causal relation between two events A and B depends on counterfactual claims about whether, for example, B would still have occurred even if A had been 'blocked', leaving everything else unchanged as far as possible. Thus, for example, pressing the 'alarm set' button on the alarm clock appears to be causally related to the alarm clock going off many hours later, in view of our belief that, had the button not been pressed, the alarm would not have sounded. On the other hand, we do not assume that the alarm clock's sounding is caused by the chiming of the church clock next door, even if this regularly occurs a few seconds before, because we know that if some intervention occurred to stop the church clock chiming, the alarm would sound nonetheless. It turns out that it is possible to construct a calculus of causal intervention within a probabilistic framework55,56; and there has been recent experimental work attempting to determine how far this framework can provide a useful model of human causality judgments when intervention is allowed.54
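The difference between observing and intervening can be sketched with a tiny two-variable example in the spirit of the causal framework cited above; the structure and numbers are invented. If the church clock's chiming and the alarm's sounding are correlated only via the time of day, then observing the chime predicts the alarm, but forcing (intervening on) the chime does not change the probability that the alarm sounds.

```python
# Observation vs. intervention in a tiny common-cause structure:
#     time_is_7am  -->  church_clock_chimes
#     time_is_7am  -->  alarm_sounds
# The structure and numbers are invented for illustration.

P_SEVEN = 0.01                      # P(it is exactly 7am) at a random minute
P_CHIME_GIVEN_SEVEN, P_CHIME_OTHERWISE = 1.0, 0.0
P_ALARM_GIVEN_SEVEN, P_ALARM_OTHERWISE = 1.0, 0.0

def p_alarm_given_observed_chime():
    # Condition on the chime: infer the time of day, then predict the alarm.
    p_chime = P_CHIME_GIVEN_SEVEN * P_SEVEN + P_CHIME_OTHERWISE * (1 - P_SEVEN)
    p_seven_given_chime = P_CHIME_GIVEN_SEVEN * P_SEVEN / p_chime
    return (P_ALARM_GIVEN_SEVEN * p_seven_given_chime
            + P_ALARM_OTHERWISE * (1 - p_seven_given_chime))

def p_alarm_given_do_chime():
    # Intervening on the chime cuts its link to the time of day,
    # so the alarm keeps its unconditional probability.
    return P_ALARM_GIVEN_SEVEN * P_SEVEN + P_ALARM_OTHERWISE * (1 - P_SEVEN)

print(p_alarm_given_observed_chime())  # 1.0  (observation is informative)
print(p_alarm_given_do_chime())        # 0.01 (intervention is not)
```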
Finally, there has been a very promising line of research in cognitive development, exploring Bayesian network models of contingency learning, causal learning, and learning from intervention, throughout development.57 For example, Gopnik et al.57 discuss a variety of experiments58,59 which demonstrate that pre-school children have the ability to learn causal structures. In particular, this knowledge can be revealed by the nature of the interventions children choose to perform on the experimental apparatus embodying the causal relationships. This knowledge is independent of the frequency information available in the experimental set-up and does not appear to be learnable within non-Bayesian frameworks.

Note, though, that contingency is a relatively weak source of information about causal relationships. In observing the relationship between an object and its shadow, for example, the fact that the shadow has roughly the same shape as the object that casts it, that the shadow moves predictably when the object moves, and that, in many cases at least, the shadow and object connect smoothly at the object's base, provides powerful indications of the existence of a relationship between the two; a trail of footprints in the sand can reasonably be causally attributed to the recent passage of feet purely in virtue of their shape and arrangement. Indeed, a variety of classic psychological demonstrations of 'perceptual' causality,60 and even of causal relations underpinned by social interactions,61 appear to be perceived essentially instantaneously, without requiring prior learning. A strength of the Bayesian approach is that it is, in principle, possible to build models which include richer representations of the physical structure of the environment, or prior knowledge about other aspects of the physical and social world, such that examples of this kind can readily be captured. Such work is at an early stage62; but, for example, there has already been significant progress in constructing computational models of the attribution of intentions to an agent, from observing the agent's behavior.63
LANGUAGE PROCESSING

Probabilistic approaches have also been influential in recent accounts of language processing and acquisition.64 Within linguistics, it has been standard to view probabilistic aspects of language as of marginal importance, particularly within the study of syntax. Language is often viewed as a set of well-formed strings, which are generated by a symbolic grammar, and associated, through systems of symbolic rules, with phonological and semantic representations. The mappings between phonology, syntax, and semantics can be fully described, according to this point of view, without reference to probabilities. Probability is, nonetheless, fundamentally involved in language processing and acquisition in a number of ways.

Notice, for example, that the problem of analog-to-digital conversion, that is, turning an extremely rich and complex acoustic waveform into a discrete phonological representation, is an enormously challenging problem of uncertain inference. The speech wave is typically highly locally ambiguous, and can only be disambiguated by piecing together large numbers of locally ambiguous cues, together with background knowledge concerning the speaker, the topic being discussed, and so on. Unsurprisingly, speech technology draws on a rich repertoire of probabilistic methods, including hidden Markov models and neural networks.65 Probability plays a similar role in helping to construct a globally coherent parse (and associated semantic representation), in the light of the notorious local ambiguity of natural language, whether such ambiguity is lexical (e.g., bank as financial institution or geographical feature), syntactic [e.g., I saw the man (with the telescope) vs. I saw (the man with the telescope)], or semantic (e.g., all the witnesses saw a burglar running from the scene, which might or might not be interpreted as implying that each witness saw the same burglar). Again, a globally coherent parse and interpretation of a sentence can only be achieved by integrating these locally ambiguous cues, together with relevant background knowledge; and, just as in the problem of perception, the natural framework in which to consider such integration is probabilistic inference.

Traditional theories of parsing have not, however, taken a probabilistic standpoint; indeed, such accounts have often, instead, focused purely on structural features of the competing parses.66 Research over the last decade and a half has, however, increasingly suggested that a probabilistic integration of multiple cues is used by the language processing system in order to determine the most probable parse and interpretation of the input.67–69

As with other aspects of learning, it is also natural to view the problem of acquiring a language as an example of uncertain inference. Any finite set of linguistic data available to the child will be compatible with an infinite number of languages; and the child must learn to generalize from the observed input to be able to successfully produce and understand linguistic material that has never previously been encountered.

From a non-probabilistic point of view, the problem of learning a language appears almost insuperably difficult; it will, for example, be extremely hard for the learner to distinguish between, say, normal English and a version of English with one additional constraint, for example, that it is not grammatically acceptable to begin and end a sentence with the word fish, to include more than five adjectives in a noun phrase, or to use a sentence whose sequence of words forms a palindrome (disallowing dogs chase dogs). These possible variants of English would be extremely difficult to rule out, because the structures that they disallow are extremely rare, and might not be expected to occur more than a few times, if at all, during childhood. From a probabilistic point of view, these variations need not be ruled out unequivocally, but rather assigned a very low prior probability (e.g., on the basis that prior probability should be inversely related to complexity); from a non-probabilistic point of view, such possibilities either need to be ruled out entirely, or they pose genuine problems for the learner. Note, though, that languages do exhibit numerous apparently arbitrary constraints, which learners are able to learn successfully. So, for example, the child must infer that, while it is acceptable to say I made the clock break, I broke the clock, and I made the clock disappear, it is not acceptable to say I disappeared the rabbit, even though the meaning of this string of words is entirely clear. Learning the absence of certain linguistic possibilities has often been viewed as posing 'logical' problems for language acquisition, however much data the child receives.70 From a probabilistic standpoint, it is possible to show that learning is possible in principle, given sufficient data.71 More important, perhaps, Bayesian analysis of language acquisition provides the tools to assess the prior information that the learner must possess, in order to learn these and other regularities, given realistic estimates of the data available to the child.72
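As a toy rendering (not from the article) of the idea that prior probability can be inversely related to complexity, the sketch below assigns each candidate 'grammar' a prior proportional to 2^(-description length), so contrived variants of English carry very little weight unless the data demand them; the description lengths are invented.

```python
# Toy illustration: prior probability inversely related to complexity.
# Each candidate language hypothesis is scored by an (invented) description
# length in bits; the prior is proportional to 2**-length, so contrived
# variants of English start out with very little probability mass.

hypotheses = {
    "English": 1000.0,
    "English + 'no sentence may begin and end with fish'": 1030.0,
    "English + 'no more than five adjectives per noun phrase'": 1025.0,
    "English + 'no palindromic word sequences'": 1040.0,
}

def complexity_prior(lengths):
    shortest = min(lengths.values())
    weights = {h: 2.0 ** -(l - shortest) for h, l in lengths.items()}
    z = sum(weights.values())
    return {h: w / z for h, w in weights.items()}

for hypothesis, prior in complexity_prior(hypotheses).items():
    print(f"{prior:.2e}  {hypothesis}")
# Plain English gets essentially all of the prior mass; the variants are
# not ruled out, just heavily discounted until data favour them.
```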
There has, moreover, been increasing interest in building statistical computational models, although not always within a strictly Bayesian framework, which can potentially model the acquisition of a variety of aspects of phonology, syntax, and semantics, ranging from the acquisition of morphology, to syntactic categories, and broad semantic classes73–76; and there has been substantial progress in developing computational models that are able to learn phrase structure and dependency relations from corpora of untagged text.77 From the point of view of a Bayesian analysis, the problem of language acquisition remains formidable indeed; but significant progress has been made both in developing specific models of learning, and in defining methods for determining what is learnable in principle.
INDUCTIVE REASONING

Inductive reasoning involves drawing conclusions that are probably true, given a set of premises. Consequently, a rational Bayesian approach seems uniquely suited to model induction. Inductive reasoning contrasts with deductive reasoning, in which the conclusion must necessarily follow from a set of premises. In contrast, two inductive arguments can each have some degree of inductive strength (Figure 1).

(a) Cows have sesamoid bones
    All mammals have sesamoid bones

(b) Ferrets have sesamoid bones
    All mammals have sesamoid bones

FIGURE 1 | Inductive arguments vary in strength. The conclusion in argument (a) may seem stronger, or more probable given the evidence, than the conclusion in (b).

There is now a well-documented set of empirical regularities on inductive reasoning (see Ref 78 for a more extensive review). These demonstrations all use inference patterns like that in Figure 1. Rips79 looked at how people project properties of one category of animals to another (Figure 2(a) and (b)). He found that the more similar the premise category is to the conclusion category, the stronger the inference (Figure 2(a)). He also found that the more typical the premise category [bluejays (typical) vs. geese (atypical)], the stronger the inference (Figure 2(b)). Using multiple regression analyses, Rips found distinct contributions of premise-conclusion similarity and premise typicality (see Ref 80 for further investigations of similarity and typicality effects).

(a) Rabbits have sesamoid bones
    Dogs (Bears) have sesamoid bones

(b) Bluejays (Geese) have sesamoid bones
    Blue tits have sesamoid bones

(c) This Barratos islander is obese
    All Barratos islanders are obese

(d) This Shreeble is blue
    All Shreebles are blue

(e) Cows require vitamin K for the liver to function
    Horses require vitamin K for the liver to function
    All mammals require vitamin K for the liver to function

(f) Cows require vitamin K for the liver to function
    Ferrets require vitamin K for the liver to function
    All mammals require vitamin K for the liver to function

FIGURE 2 | Empirical effects. (a) Similarity: when premise and conclusion are more similar (rabbits-dogs) inference is stronger than when they are less similar (rabbits-bears). (b) Typicality: typical categories (bluejays) lead to stronger inferences than less typical categories (geese). Variability: variable categories (c) lead to weaker inferences than less variable categories (d). Diversity: diverse categories (f) lead to stronger inferences than less diverse categories (e).

Using similar materials, Nisbett et al.81 found that participants were very sensitive to the perceived variability of the conclusion category. After just one case, variable categories (Figure 2(c)), for example, people on an imaginary island (Barratos) with respect to obesity, lead to weaker inferences than non-variable categories, such as imaginary birds (Shreebles) with respect to color (Figure 2(d)). Nisbett et al.81 also systematically varied the given number of observations. For example, participants were told that 1, 3, or 20 Shreebles had been observed. Inferences were stronger with increased sample size (see also Ref 80). Osherson et al.80 showed that diversity of cases also affects inductive strength, that is, Figure 2(f) is considered stronger than Figure 2(e). This diversity effect runs in the opposite direction to the typicality effect: whereas a typical premise category leads to a fairly strong inductive argument (Figure 2(b)), an argument with two typical premise categories (Figure 2(e)) is weaker than an argument with a typical premise and an atypical premise (Figure 2(f)).

A rational Bayesian model82 views evaluating an inductive argument as learning for which categories a property is true or false. In Figure 1(a), the goal is to learn which animals have sesamoid bones. For this novel property, hypotheses must be derived from prior knowledge about familiar properties. People know some facts that are true of all mammals (including cows), but they also know some facts that are true of cows but not of some other mammals. The question is which of these known kinds of properties the novel property, 'has sesamoid bones', resembles most: an all-mammal property, or a cow-only property? Crucially, it is assumed that novel properties follow the same distribution as known properties. Because many known properties of cows are also true of other mammals, argument Figure 1(a) seems fairly strong.

As well as typicality, a Bayesian model also addresses the other key results in inductive reasoning. Similarity effects arise because, given that rabbits have sesamoid bones, it is more likely that dogs have them than that bears do, because rabbits and dogs share more known properties than rabbits and bears. Diversity effects are also addressed. Figure 2(e) will access many idiosyncratic properties true just of large farm animals, and so a novel property of cows and horses may seem idiosyncratic to farm animals. In contrast, Figure 2(f) could not access familiar idiosyncratic properties true of just these two animals, so prior hypotheses must be derived from known properties that are true of all mammals or all animals. We have focused here on a narrow class of inductive inference problems that have been especially well-studied empirically. But recent Bayesian models have analyzed a wide range of inductive problems, which can be naturally formulated and modeled in probabilistic terms.83,84
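The flavor of the Bayesian treatment just described can be conveyed with a small sketch, loosely in the spirit of Ref 82 but not a re-implementation of it: hypotheses about which animals carry the novel property are generated from a handful of invented 'known properties', and the strength of an argument is the posterior probability that the conclusion category has the property, given that the premise categories do.

```python
# Toy Bayesian property induction.  Hypotheses about which animals have the
# novel property are drawn from invented 'known properties'; argument
# strength = P(conclusion has the property | premise categories have it).

ANIMALS = {"cow", "horse", "ferret", "rabbit", "dog", "bear"}

# Each known property defines the set of animals it is true of; the novel
# property is assumed to be distributed like one of these.
KNOWN_PROPERTIES = [
    ANIMALS,                # an all-mammal property
    ANIMALS,                # all-mammal properties are common
    {"cow", "horse"},       # a large-farm-animal property
    {"cow"},                # a cow-only property
    {"rabbit", "dog"},      # a small-mammal-ish property
]

def argument_strength(premises, conclusion):
    prior = 1.0 / len(KNOWN_PROPERTIES)        # uniform prior over hypotheses
    consistent = [h for h in KNOWN_PROPERTIES if premises <= h]
    p_premises = len(consistent) * prior
    p_premises_and_conclusion = sum(prior for h in consistent if conclusion <= h)
    return p_premises_and_conclusion / p_premises

# 'Cows and horses have it' -> 'all mammals have it' (non-diverse premises)
print(argument_strength({"cow", "horse"}, ANIMALS))   # 0.67
# 'Cows and ferrets have it' -> 'all mammals have it' (diverse premises)
print(argument_strength({"cow", "ferret"}, ANIMALS))  # 1.0
```

Diverse premises rule out the idiosyncratic hypotheses and leave only the all-mammal ones, so the diverse argument comes out stronger, as in the empirical diversity effect.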
DEDUCTIVE REASONING

Work on ostensibly deductive reasoning tasks reveals many apparent errors and biases when performance is compared to classical logical standards.85 The recent emergence of rational Bayesian models casts this performance in a better light by comparing performance to a probabilistic standard.86,87 Such models have been developed in all three main areas investigated in the psychology of reasoning: conditional inference,88 data selection,89 and syllogistic reasoning.90 The key idea behind them all is that the conditional probability, P(q|p), provides the meaning of conditional statements, if p then q (e.g., if you turn the key then the car starts), and so P(if p then q) = P(q|p). This latter identity is called The Equation.91,92 To illustrate the application of rational Bayesian models in this area, we concentrate on conditional inference, which is currently the most researched topic in the area.

Four inference patterns have mainly been studied: two which are logically valid, modus ponens (MP) and modus tollens (MT), and two fallacies, denying the antecedent (DA) and affirming the consequent (AC) (Figure 3). Classical logic predicts endorsement of the valid inferences and rejection of the fallacies. However, all four inferences are endorsed above 50% and in the characteristic order MP > MT > AC > DA,93 revealing a large discrepancy between performance and logical expectations.

(MP)  p ⇒ q, p   ∴ q
(MT)  p ⇒ q, ¬q  ∴ ¬p
(DA)  p ⇒ q, ¬p  ∴ ¬q
(AC)  p ⇒ q, q   ∴ p

FIGURE 3 | The valid inferences, modus ponens (MP) and modus tollens (MT), and the fallacies, denying the antecedent (DA) and affirming the consequent (AC), investigated in conditional inference. Each schema is to be read as saying that if the premises (listed before '∴') are true, so must be the conclusion (after '∴').

The core intuition behind a rational Bayesian model of conditional inference is that it must account for the non-monotonicity of everyday informal reasoning with conditionals.94,95 Classical logic is monotonic (Figure 4(a)) and hence is unable to account for the ability of additional information to defeat previously derived conclusions (Figure 4(b)). The only recourse is to question the premises, for example, in Figure 4(b), to suggest that birds fly is false. But surely, while defeasible, this is a very useful generalization that we would not want to reject as false.

The Bayesian approach is to adopt The Equation and to treat conditional inference as Bayesian conditionalization.87,88 That is, people are trying to determine the posterior probability of the conclusion, P1(flys(a)), given they now know that the categorical premise holds with certainty, P1(bird(a)) = 1 (Figure 4(a)). By Bayesian conditionalization, P1(flys(a)) = P0(flys(a)|bird(a)); that is, the posterior probability of the conclusion equals the prior conditional probability of the conclusion given the categorical premise. Note that this approach easily handles non-monotonicity: for example, P0(flys(a)|bird(a)) = 0.9 and P0(flys(a)|bird(a), Ostrich(a)) = 0 are perfectly probabilistically consistent (Figure 4(b)).

FIGURE 4 | Monotonic (a) and non-monotonic (b) conditional inference by MP. In (a), the additional information, that the particular triangle a is red, cannot override the original conclusion that, qua triangle, a has three sides. In contrast, in (b), the additional information, that the particular bird a is an Ostrich, does override the original conclusion that, qua bird, a can fly.
This approach cannot immediately apply to MT and the fallacies because, for example, DA requires knowledge of P0(¬flys(a)|¬bird(a)), and there is insufficient information in the premises to determine this probability. This is actually also true of P0(flys(a)|bird(a)) for MP, which, on the subjective view of probability (see Introductory text), must be determined by reference to global world knowledge via the Ramsey test: that is, add the antecedent, bird(a), to one's stock of beliefs, make minimal adjustments to incorporate it, and then read off the probability of the consequent, flys(a); this is P0(flys(a)|bird(a)) (Figure 5). To determine the conditional probabilities for DA, AC, and MT requires the assumption that the priors P0(flys(x)) and P0(bird(x)) are also available from global world knowledge. Figure 6 shows how well the Bayesian conditionalization model accounts for the principal data on conditional inference.

FIGURES 5 and 6 | (Figures not reproduced here; only fragments of their axis labels survive in the source text, including 'Prob (Conclusion)', 'P (Endorse)', and 'Data'/'Model' series labels.)
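A minimal sketch of this conditionalization account, in the spirit of Refs 87 and 88 though with invented parameter values: given P0(q|p) together with priors P0(p) and P0(q) drawn from world knowledge, each of the four inferences is scored by the relevant conditional probability. The published models add further assumptions, not included here, to capture the empirical MP > MT > AC > DA ordering.

```python
# Sketch of conditional inference as Bayesian conditionalization.  Each
# inference is scored by the relevant conditional probability computed from
# P0(p), P0(q) and P0(q|p); the parameter values below are invented, and the
# full published models add further assumptions (not included here) to
# capture the empirical MP > MT > AC > DA ordering.

def conditional_inferences(p_p, p_q, p_q_given_p):
    p_pq = p_p * p_q_given_p                  # P0(p and q)
    p_not_p_not_q = 1 - p_p - p_q + p_pq      # P0(not-p and not-q)
    return {
        "MP: q from p":          p_q_given_p,
        "MT: not-p from not-q":  p_not_p_not_q / (1 - p_q),
        "DA: not-q from not-p":  p_not_p_not_q / (1 - p_p),
        "AC: p from q":          p_pq / p_q,
    }

# 'If it is a bird, it flies', with birds and flying things both fairly rare.
for inference, prob in conditional_inferences(0.1, 0.2, 0.9).items():
    print(f"{inference}: {prob:.2f}")
```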
ARGUMENTATION

Reasoning and decision making often take place in the service of argumentation, that is, the attempt to persuade yourself or others of a particular, perhaps controversial, position.97 The rational Bayesian approach has been extended to at least some aspects of argumentation.98 On this view, concern centers on how the premises, P, of an argument affect the probability of the conclusion, C. If P(C|P) is high then the argument has high inductive strength.

This account has been applied most directly to reasoning fallacies, in the attempt to understand how some instances seem to be good arguments while others do not.99 For example, the classical so-called argument from ignorance, or argumentum ad ignorantiam, has many seemingly very weak exemplars:

    Ghosts exist, because nobody has proven that they don't. (1)

However, other exemplars of this argument form seem quite strong in scientific and everyday discourse:

    This drug is safe, because no-one has found any toxic effects. (2)
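The contrast between examples (1) and (2) can be given a toy Bayesian reading; the numbers are invented, and this is only an illustration of the P(C|P) idea, not a re-implementation of the model in Ref 100. A search that would very probably have found toxic effects if they existed makes 'no effects found' strong evidence for safety, whereas the absence of a disproof of ghosts carries almost no evidential weight.

```python
# Toy Bayesian reading of the argument from ignorance: how much does
# 'nothing was found' raise the probability of the conclusion?
# All numbers are invented for illustration.

def p_conclusion_given_nothing_found(prior, p_find_if_false, p_find_if_true=0.0):
    """P(conclusion | no disconfirming evidence found), where
    p_find_if_false = P(evidence would have been found | conclusion false)."""
    p_none_if_true = 1 - p_find_if_true
    p_none_if_false = 1 - p_find_if_false
    num = p_none_if_true * prior
    return num / (num + p_none_if_false * (1 - prior))

# (2) 'This drug is safe, because no-one has found any toxic effects':
#     extensive trials would very likely have detected toxicity if present.
print(p_conclusion_given_nothing_found(prior=0.5, p_find_if_false=0.95))   # ~0.95

# (1) 'Ghosts exist, because nobody has proven that they don't':
#     a disproof would be very unlikely to turn up even if ghosts do not exist.
print(p_conclusion_given_nothing_found(prior=0.01, p_find_if_false=0.02))  # ~0.01
```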
REFERENCES
1. Gregory R. The Intelligent Eye. London: Weidenfeld and Nicolson; 1970.
2. Johansson P, Hall L, Sikström S, Olsson A. Failure to detect mismatches between intention and outcome in a simple decision task. Science 2005, 310:116–119.
3. Quine WVO. Word and Object. Cambridge, MA: Harvard University Press; 1960.
4. Bernoulli J. Ars Conjectandi (The Art of Conjecture), 1713. (Translation and notes by Edith Dudley Sylla, Baltimore, MD: Johns Hopkins University Press, 2005.)
5. Marr D. Vision. San Francisco, CA: W. H. Freeman; 1982.
6. Beck J, Ma WJ, Kiani R, Hanks T, Churchland AK, et al. Probabilistic population codes for Bayesian decision making. Neuron 2008, 60:1142–1152.
7. Lehman R. On confirmation and rational betting. J Symbolic Logic 1955, 20:251–262.
8. Freeman WT. The generic viewpoint assumption in a framework for visual perception. Nature 1994, 368:542–545.
9. Helmholtz H. Treatise on Physiological Optics, vol. 3. New York: Dover; 1910/1962. (English translation by JPC Southall for the Optical Society of America (1925) from the 3rd German edition of Handbuch der physiologischen Optik, Hamburg: Voss, 1910; first published in 1867, Leipzig: Voss.)
10. Westheimer G. Was Helmholtz a Bayesian? Perception 2008, 37:642–650.
11. Liberman AM, Mattingly IG. The motor theory of speech perception revised. Cognition 1985, 21:1–36.
12. Yuille A, Kersten D. Vision as Bayesian inference: analysis by synthesis? Trends Cogn Sci 2006, 10:301–308.
13. Tu Z, Zhu S-C. Image segmentation by data-driven Markov chain Monte Carlo. IEEE Trans Pattern Anal Mach Intell 2002, 24:657–673.
14. Shepard RN. Ecological constraints on internal representation. Psychol Rev 1984, 91:417–447.
15. Bar M, Kassam KS, Ghuman AS, Boshyan J, Schmid AM. Top-down facilitation of visual recognition. Proc Natl Acad Sci USA 2006, 103:449–454.
16. Ernst MO, Banks MS. Humans integrate visual and haptic information in a statistically optimal fashion. Nature 2002, 415:429–433.
17. Weiss Y. Interpreting images by propagating Bayesian beliefs. In: Mozer MC, Jordan MI, Petsche T, eds. Advances in Neural Information Processing Systems 9. Cambridge, MA: MIT Press; 1997, 908–915.
18. Adelson EH, Pentland AP. The perception of shading and reflectance. In: Knill D, Richards W, eds. Perception as Bayesian Inference. Cambridge: Cambridge University Press; 1996, 409–423.
19. Blake A, Bulthoff HH, Sheinberg D. Shape from texture: ideal observers and human psychophysics. In: Knill D, Richards W, eds. Perception as Bayesian Inference. Cambridge: Cambridge University Press; 1996, 287–321.
20. Feldman J. Bayesian contour integration. Percept Psychophys 2001, 63:1171–1182.
21. Feldman J, Singh M. Information along curves and closed contours. Psychol Rev 2005, 112:243–252.
22. Barlow HB. Sensory mechanisms, the reduction of redundancy, and intelligence. In: The Mechanisation of Thought Processes. London: HMSO; 1959, 535–539.
23. Snippe HP, Poot L, van Hateren JH. A temporal model for early vision that explains detection thresholds for light pulses on flickering backgrounds. Vis Neurosci 2000, 17:449–462.
24. Attneave F. Some informational aspects of visual perception. Psychol Rev 1954, 61:183–193.
25. Hochberg JE, McAlister E. A quantitative approach to figural "goodness". J Exp Psychol 1953, 46:361–364.
26. Leeuwenberg ELJ. Quantitative specification of information in sequential patterns. Psychol Rev 1969, 216–220.
27. Leeuwenberg E. A perceptual coding language for perceptual and auditory patterns. Am J Community Psychol 1971, 84:307–349.
28. Leeuwenberg E, Boselie E. Against the likelihood principle in visual form perception. Psychol Rev 1988, 95:485–491.
29. Mach E. The Analysis of Sensations and the Relation of the Physical to the Psychical. New York: Dover Publications; 1959. (Original work published 1914.)
30. Restle F. Theory of serial pattern learning: structural trees. Psychol Rev 1970, 77:481–495.
31. Van der Helm PA, Leeuwenberg PA. Goodness of visual regularities: a non-transformational approach. Psychol Rev 1996, 103:429–496.
32. Chater N. Reconciling simplicity and likelihood principles in perceptual organisation. Psychol Rev 1996, 103:566–581.
33. Reed SK. Pattern recognition and categorization. Cognit Psychol 1972, 3:382–407.
34. Rosch E, Mervis CB. Family resemblances: studies in the internal structure of categories. Cognit Psychol 1975, 7:573–605.
35. Medin DL, Schaffer MM. Context theory of classification learning. Psychol Rev 1978, 85:207–238.
36. Ashby FG, Gott RE. Decision rules in the perception and categorization of multidimensional stimuli. J Exp Psychol Learn Mem Cogn 1988, 14:33–53.
37. Ashby FG, Townsend JT. Varieties of perceptual independence. Psychol Rev 1986, 93:154–179.
38. Fried LS, Holyoak KJ. Induction of category distributions: a framework for classification learning. J Exp Psychol Learn Mem Cogn 1984, 10:234–257.
39. Lamberts K. Information-accumulation theory of speeded categorization. Psychol Rev 2000, 107:227–260.
40. Nosofsky RM. Attention, similarity, and the identification-categorization relationship. J Exp Psychol Gen 1986, 115:39–57.
41. Anderson JR. The adaptive nature of human categorization. Psychol Rev 1991, 98:409–429.
42. Griffiths TL, Sanborn AN, Canini KR, Navarro DJ. Categorization as nonparametric Bayesian density estimation. In: Oaksford M, Chater N, eds. The Probabilistic Mind: Prospects for Rational Models of Cognition. Oxford: Oxford University Press; 2008.
43. Goodman ND, Tenenbaum JB, Griffiths TL, Feldman J. Compositionality in rational analysis: grammar-based induction for concept learning. In: Oaksford M, Chater N, eds. The Probabilistic Mind: Prospects for Rational Models of Cognition. Oxford: Oxford University Press; 2008.
44. Rosseel Y. Mixture models of categorization. J Math Psychol 2002, 46:178–210.
45. Heller KA, Sanborn A, Chater N. Hierarchical learning of dimensional biases in human categorization. Neural Inf Process Syst 2009.
46. Tenenbaum JB, Griffiths TL, Kemp C. Theory-based Bayesian models of inductive learning and reasoning. Trends Cogn Sci 2006, 10:309–318.
47. Dickinson A. Contemporary Animal Learning Theory. Cambridge: Cambridge University Press; 1980.
48. Kamin LJ. "Attention-like" processes in classical conditioning. In: Jones MR, ed. Miami Symposium on the Prediction of Behavior, 1967: Aversive Stimulation. Coral Gables, FL: University of Miami Press; 1968, 9–31.
49. Courville AC, Daw ND, Touretzky DS. Bayesian theories of conditioning in a changing world. Trends Cogn Sci 2006, 10:294–300.
50. Gallistel CR, Gibbon J. Time, rate, and conditioning. Psychol Rev 2000, 107:289–344.
51. Kakade S, Dayan P. Acquisition and extinction in autoshaping. Psychol Rev 2002, 109:533–544.
52. Cheng PW. From covariation to causation: a causal power theory. Psychol Rev 1997, 104:367–405.
53. Griffiths TL, Tenenbaum JB. Structure and strength in causal induction. Cognit Psychol 2005, 51:354–384.
54. Sloman SA, Lagnado DA. Do we "do"? Cogn Sci 2005, 29:5–39.
55. Spirtes P, Glymour C, Scheines R. Causation, Prediction and Search. Cambridge, MA: MIT Press; 1993.
56. Pearl J. Causality: Models, Reasoning and Inference. Cambridge: Cambridge University Press; 2000.
57. Gopnik A, Glymour C, Sobel DM, Schulz LE, Kushnir T, et al. A theory of causal learning in children: causal maps and Bayes nets. Psychol Rev 2004, 111:3–32.
58. Gopnik A, Sobel DM, Schulz LE, Glymour C. Causal learning mechanisms in very young children: two-, three-, and four-year-olds infer causal relations from patterns of variation and covariation. Dev Psychol 2001, 37:620–629.
59. Sobel D, Tenenbaum J, Gopnik A. Children's causal inferences from indirect evidence: backwards blocking and Bayesian reasoning in preschoolers. Cogn Sci 2004, 28:303–333.
60. Michotte A. The Perception of Causality. New York: Basic Books; 1963.
61. Heider F, Simmel M. An experimental study of apparent behavior. Am J Community Psychol 1944, 57:243–259.
62. Sanborn AN, Mansinghka VK, Griffiths TL. A Bayesian framework for modeling intuitive dynamics. In: Taatgen NA, van Rijn H, eds. Proceedings of the 31st Annual Conference of the Cognitive Science Society. Austin, TX: Cognitive Science Society; 2009, 1145–1150.
63. Baker CL, Tenenbaum JB, Saxe RR. Action as inverse planning. Cognition, in press.
64. Chater N, Manning C. Probabilistic models of language processing and acquisition. Trends Cogn Sci 2006, 10:335–344.
65. Rabiner L, Juang L. Fundamentals of Speech Recognition. New York: Prentice Hall; 1993.
66. Frazier L. On Comprehending Sentences: Syntactic Parsing Strategies. PhD Dissertation, University of Connecticut, 1979.
67. Chater N, Crocker MJ, Pickering MJ. The rational analysis of inquiry: the case of parsing. In: Oaksford M, Chater N, eds. Rational Models of Cognition. Oxford: Oxford University Press; 1998, 441–468.
68. Narayanan S, Jurafsky D. A Bayesian model predicts human parse preference and reading time in sentence processing. In: Dietterich TG, Becker S, Ghahramani Z, eds. Advances in Neural Information Processing Systems, vol. 14. Cambridge, MA: MIT Press; 2002, 59–65.
69. McRae K, Spivey-Knowlton MJ, Tanenhaus MK. Modeling the influence of thematic fit (and other constraints) in online sentence comprehension. J Mem Lang 1998, 38:283–312.
70. Pinker S. Formal models of language learning. Cognition 1979, 7:217–283.
71. Chater N, Vitányi P. 'Ideal learning' of natural language: positive results about learning from positive evidence. J Math Psychol 2007, 51:135–163.
72. Foraker S, Regier T, Khetarpal N, Perfors A, Tenenbaum J. Indirect evidence and the poverty of the stimulus: the case of anaphoric one. Cogn Sci 2009, 33:287–300.
73. Goldsmith J. An algorithm for the unsupervised learning of morphology. Nat Lang Eng 2006, 12:353–371.
74. Landauer TK, Dumais ST. A solution to Plato's problem: the Latent Semantic Analysis theory of acquisition, induction and representation of knowledge. Psychol Rev 1997, 104:211–240.
75. Redington M, Chater N, Finch S. Distributional information: a powerful cue for acquiring syntactic categories. Cogn Sci 1998, 22:425–469.
76. Griffiths TL, Steyvers M. Finding scientific topics. Proc Natl Acad Sci USA 2004, 101:5228–5235.
77. Klein D, Manning C. A generative constituent-context model for improved grammar induction. In: Proceedings of the Annual Conference of the Association for Computational Linguistics (ACL 40), University of Pennsylvania, Philadelphia, PA, USA, 2002, 128–135.
78. Heit E. Properties of inductive reasoning. Psychon Bull Rev 2000, 7:569–592.
79. Rips LJ. Inductive judgments about natural categories. J Verbal Learn Verbal Behav 1975, 14:665–681.
80. Osherson DN, Smith EE, Wilkie O, Lopez A, Shafir E. Category-based induction. Psychol Rev 1990, 97:185–200.
81. Nisbett RE, Krantz DH, Jepson C, Kunda Z. The use of statistical heuristics in everyday inductive reasoning. Psychol Rev 1983, 90:339–363.
82. Heit E. A Bayesian analysis of some forms of inductive reasoning. In: Oaksford M, Chater N, eds. Rational Models of Cognition. Oxford: Oxford University Press; 1998, 248–274.
83. Kemp C, Jern A. A taxonomy of inductive problems. In: Taatgen N, van Rijn H, eds. Proceedings of the 31st Annual Conference of the Cognitive Science Society. Mahwah, NJ: Lawrence Erlbaum Associates; 2009, 255–260.
84. Kemp C, Tenenbaum JB. Structured statistical models of inductive reasoning. Psychol Rev 2009, 116:20–58.
85. Evans J.St.B.T. Heuristics and analytic processes in reasoning. Br J Health Psychol 1984, 75:541–568.
86. Oaksford M, Chater N. The probabilistic approach to human reasoning. Trends Cogn Sci 2001, 5:349–357.
87. Oaksford M, Chater N. Bayesian Rationality: The Probabilistic Approach to Human Reasoning. Oxford: Oxford University Press; 2007.
88. Oaksford M, Chater N, Larkin J. Probabilities and polarity biases in conditional inference. J Exp Psychol Learn Mem Cogn 2000, 26:883–889.
89. Oaksford M, Chater N. A rational analysis of the selection task as optimal data selection. Psychol Rev 1994, 101:608–631.
90. Chater N, Oaksford M. The probability heuristics model of syllogistic reasoning. Cognit Psychol 1999, 38:191–258.
91. Adams EW. The utility of truth and probability. In: Weingartner P, Schurz G, Dorn G, eds. The Role of Pragmatics in Contemporary Philosophy. Vienna: Holder-Pichler-Tempsky; 1998, 176–194.
92. Edgington D. On conditionals. Mind 1995, 104:235–329.
93. Schroyens W, Schaeken W. A critique of Oaksford, Chater and Larkin's (2000) conditional probability model of conditional reasoning. J Exp Psychol Learn Mem Cogn 2003, 29:140–149.
94. Oaksford M, Chater N. Against logicist cognitive science. Mind Lang 1991, 6:1–38.
95. Oaksford M, Chater N. Rationality in an Uncertain World. Hove, England: Psychology Press; 1998.
96. Oaksford M, Chater N. Probability logic and the Modus Ponens-Modus Tollens asymmetry in conditional inference. In: Chater N, Oaksford M, eds. The Probabilistic Mind: Prospects for Bayesian Cognitive Science. Oxford: Oxford University Press; 2008, 97–120.
97. van Eemeren FH, Grootendorst R. Argumentation, Communication, and Fallacies: A Pragma-Dialectical Perspective. Mahwah, NJ: Lawrence Erlbaum Associates; 1992.
98. Hahn U, Oaksford M. The rationality of informal argumentation: a Bayesian approach to reasoning fallacies. Psychol Rev 2007, 114:704–732.
99. Hamblin CL. Fallacies. London: Methuen; 1970.
100. Oaksford M, Hahn U. A Bayesian analysis of the argument from ignorance. Can J Exp Psychol 2004, 58:75–85.