Unexpectedness
and Bayes’ rule
6 December 2021, CIFMA workshop
Giovanni Sileno Jean-Louis Dessalles
g.sileno@uva.nl jean-louis.dessalles@telecom-paris.fr
University of Amsterdam Télécom Paris -- Institut Polytechnique de Paris
We live in a “probabilistic” world /1
● Human experience unfolds in patterns (tendencies, rules, laws, …) as much as
in a lack of determinism, even without taking quantum mechanics into account.
We live in a “probabilistic” world /2
● Born from the study of gambling, probability theory has grown to be the most
important ingredient of formal accounts of how rational agents (artificial or
natural) reason under conditions of uncertainty.
● It is also the fundamental basis of Shannon’s theory of information.
Bayes’ rule
● The probabilistic formula named after Thomas Bayes (Bayes’ rule) has a special role
in this success, as it is used in
○ Bayesian models (e.g. Bayesian networks),
○ Bayesian inference,
○ maximum a posteriori (MAP) estimation in statistics,
○ core components of machine learning methods (e.g. variational autoencoders),
○ …
Uses of Bayes’ rule
● PRESCRIPTIVE accounts (how agents should reason): applications supporting or
reproducing human decision-making, e.g.
○ medical diagnosis
○ evidential reasoning (e.g. in criminal court settings)
○ …
● DESCRIPTIVE accounts (how agents do produce inferences): cognitive models of
○ animal learning
○ visual perception
○ motor control
○ language processing
○ forms of social cognition
○ …
PROs of probability theory …and CONs
● PROs: clarity of the theoretical framework, proven practical value.
● CONs, as a FORMAL system: probability theory relies on a series of axioms,
e.g. a measurable space of events, but our experience of the world defies this closure.
● CONs, as a MODELLING framework: several cognitive patterns (often called biases or
fallacies) are not predicted by probability theory; in particular, there is a mismatch
between what humans perceive as informative and Shannon’s notion of information.
Simplicity Theory
● Simplicity Theory (ST) is a computational model of cognition whose
investigation started by observing this “informativity” mismatch.
[figure: a noise source, maximally informative according to Shannon’s theory of information]
● ST predicts diverse human phenomena related to relevance:
○ unexpectedness
○ narrative interest
○ coincidences
○ near-miss experiences
○ emotional interest
○ responsibility
● ST has also been used for experiments in artificial creativity.
Simplicity Theory: formal background
● Formally, ST builds upon Algorithmic Information Theory (AIT).
● In AIT, the complexity of a string is the minimal length of a program that, given
a certain optional input parameter, produces that string as output
(Kolmogorov complexity). Writing T for the underlying Turing machine, s for the
target string, y for an additional input in support, and p for an executable program:
C_T(s | y) = min { l(p) : T(p, y) = s }
● Note: this measures how much information is needed by a program constructing the
object, which is distinct from how much time or space is needed to run it
(algorithmic or time complexity).
● Kolmogorov complexity is generally incomputable (due to the halting problem),
but it is computable on bounded Turing machines. The complexities used in what
follows are bounded in this sense.
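As an aside, here is a minimal Python sketch of bounded complexity on a toy machine (the two-instruction “machine” below, with literal emission and output doubling, is an illustrative assumption of this writeup, not a machine used in ST or in the paper): the complexity of a target string is simply the length of the shortest program found by exhaustive, shortest-first search.

    import itertools

    def run(program):
        # toy machine: a literal '0' or '1' appends itself; '*' doubles the output so far
        out = ""
        for op in program:
            out = out + out if op == "*" else out + op
        return out

    def bounded_complexity(target, max_len=12):
        # search programs from shortest to longest; the bound max_len keeps this computable
        for n in range(1, max_len + 1):
            for p in itertools.product("01*", repeat=n):
                if run(p) == target:
                    return n
        return None  # no program within the bound

    print(bounded_complexity("01010101"))  # 4, e.g. the program "01**"
    print(bounded_complexity("01101001"))  # 8: nothing shorter than writing the 8 literals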
Unexpectedness
● ST starts from the observation that humans are highly susceptible to
complexity drops, i.e. for them
situations are relevant if they are simpler to describe than to explain
● Formally, this is captured by the formula of unexpectedness, expressed as a
divergence between complexities computed on two distinct machines. Writing s for the
situation, C_w(s) for its causal complexity (via the world machine) and C_d(s) for its
description complexity (via the description machine):
U(s) = C_w(s) − C_d(s)
Unexpectedness: examples
● remarkable lottery draws: 11111 is more unexpected than 64178, even if the
lottery is fair
● coincidences: meeting by chance an old friend of yours abroad is more
unexpected than meeting there any random unknown person
● deterministic yet unexpected events: e.g. a lunar eclipse
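For the lottery example, a minimal Python sketch (the fair-lottery world machine and the run-length description machine below are illustrative assumptions, not the machines used in ST):

    import math

    def causal_complexity(draw, n_digits=5):
        # in a fair lottery, generating any specific draw costs log2(10^n) bits
        return n_digits * math.log2(10)

    def description_complexity(draw):
        # toy description machine: a constant draw is described as (digit, length),
        # any other draw is spelled out digit by digit
        if len(set(draw)) == 1:
            return math.log2(10) + math.log2(len(draw))
        return len(draw) * math.log2(10)

    def unexpectedness(draw):
        return causal_complexity(draw) - description_complexity(draw)

    print(round(unexpectedness("11111"), 1))  # 11.0 bits: a large complexity drop
    print(round(unexpectedness("64178"), 1))  # 0.0 bits: nothing remarkable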
Aim of the paper
● Provide further arguments in support of non-probabilistic computational
models in cognition, in particular focusing on the following conjecture:
Bayes’ rule is a specific instantiation of a more general
template captured in ST by Unexpectedness
Bayes’ rule
● From the definition of conditional probability (with c a model/cause and s an observation),
P(c, s) = P(c | s) P(s) = P(s | c) P(c)
we can obtain the formula of Bayes simply:
P(c | s) = P(s | c) P(c) / P(s)
often informally rewritten as: posterior ∝ likelihood × prior
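A quick numeric sanity check with made-up numbers (an illustration, not from the slides): taking P(c) = 0.01, P(s | c) = 0.9 and P(s) = 0.05,

    P(c | s) = 0.9 × 0.01 / 0.05 = 0.18

so a likely observation under a rare model still yields only a modest posterior.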
Unexpectedness as posterior
● In previous works, it has been hypothesized that ST’s Unexpectedness offers a
non-extensional measure of posterior subjective probability:
p(s) = 2^(−U(s))
● Starting from this hypothesis, we looked for a mapping from Unexpectedness to
Bayes’ rule, and indeed we see that 2^(−U(s)) plays the role of the posterior P(c | s),
with C_w(s) corresponding to −log₂ [P(s | c) P(c)] and C_d(s) to −log₂ P(s).
● Problem: Unexpectedness has 1 parameter (the situation s), while the posterior
has 2 (cause and situation). Let’s investigate the two terms of Unexpectedness…
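Schematically, with c* the best cause and under the correspondences above (which are reconstructed for this writeup):

    2^(−U(s)) = 2^(−C_w(s)) / 2^(−C_d(s)) ↔ [P(s | c*) P(c*)] / P(s) = P(c* | s)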
Causal complexity
● The causal complexity C_w(s) is the length in bits of the shortest path that, according to the
agent's world model, produces the situation.
● The causal path is temporally unfolded. The chain rule has the form
C_w(c ∙ s) = C_w(c) + C_w(s | c)
where ∙ denotes sequential composition and C_w(s | c) is the causal link
(the causal path implicitly starts from the current situation).
● Being a Kolmogorov complexity, the cause can be omitted if it lies on the shortest path:
C_w(s) = C_w(c ∙ s) for the best cause. The Unexpectedness formula thus abstracts the
causally explanatory factor.
Description complexity
● The description complexity C_d(s) is the length in bits of the shortest program that,
leveraging mental resources, determines the situation.
○ e.g. determination could correspond to retrieving the situation from memory; informationally,
we then need to specify the address where to look (an encoding).
● In the proposed mapping, C_d(s) corresponds to −log₂ P(s), where P(s) is the probability of
observing that situation.
A theoretical link can then be established through optimal encoding in
Shannon’s terms, where probability is assessed through frequency.
● Complexity is however a more general measure, as it allows us to consider
compositional effects (e.g. à la Gestalt) via adequate mental operations.
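A minimal Python sketch of this Shannon link (the tiny “memory” of observed situations below is an illustrative assumption): if descriptions behave like optimal codes over remembered situations, C_d(s) approximates −log₂ of the frequency of s.

    import math
    from collections import Counter

    # toy memory of previously observed situations
    memory = ["sun", "sun", "rain", "sun", "snow", "sun", "rain", "sun"]
    counts = Counter(memory)

    def description_complexity(s):
        # optimal (Shannon) code length for s, with probability assessed by frequency
        return -math.log2(counts[s] / len(memory))

    print(round(description_complexity("sun"), 2))   # 0.68 bits: frequent, cheap to single out
    print(round(description_complexity("snow"), 2))  # 3.0 bits: rare, costlier to single out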
Bayes’ rule vs Unexpectedness
● Bayes’ rule is a specific instantiation of ST’s Unexpectedness that:
○ makes a candidate “cause” explicit, rather than automatically selecting the best one;
○ takes a frequentist-like approach for encoding observables.
Why is this relevant?
● Unexpectedness is a more generally applicable measure.
● In the paper we show that it can be used to build:
○ an informational principle of framing
○ a model of derived likelihood
○ an explanation of the prosecutor’s fallacy
All prior is posterior of some other prior
● Let us consider an additional prior in Bayes’ formula, a sort of ‘environmental context’ e.
Following probability theory, we have two equivalent formulations for the posterior:
(1) P(c | s, e) = P(s | c, e) P(c | e) / P(s | e)
(2) P(c | s, e) = P(s, e | c) P(c) / P(s, e)
● These formulations are not equivalent when expressed in complexity terms!
Abstracting c as before:
(1) U₁(s) = C_w(s | e) − C_d(s | e)
(2) U₂(s) = C_w(e ∙ s) − C_d(e, s)
● Let us compute the difference between the two formulations. Two distinct chain rules
apply, on the world machine and on the description machine:
C_w(e ∙ s) = C_w(e) + C_w(s | e)
(sequential composition: the causal path is temporally ordered)
C_d(e, s) ≤ C_d(e) + C_d(s | e)
(the temporal constraint is dropped: describing e and s together may be simpler
than fully determining one term before the other, cf. informed search)
● Applying the chain rules (spelled out below), a necessary condition for the two
formulations to be equivalent is that the contextual prior is not unexpected
(not unexpected: shared facts, defaults, and also improbable but descriptively
complex situations).
→ informational principle of framing:
all contextual situations which are not unexpected can be neglected; the remaining
situations provide the “relevant” context for the situation in focus.
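Spelled out, under the chain rules above (as reconstructed for this writeup):

    U₂(s) − U₁(s) = [C_w(e ∙ s) − C_d(e, s)] − [C_w(s | e) − C_d(s | e)]
                  = C_w(e) + C_d(s | e) − C_d(e, s)
                  ≥ C_w(e) − C_d(e) = U(e)

so the difference vanishes only if U(e) ≤ 0, i.e. only if the context e is not unexpected.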
Derived likelihood
● Following ST, we do not have direct access to the causal complexity, as we always need
to pass through a descriptive step to identify what to compute.
● So, how can we estimate likelihood? By counting back the description complexity!
Derived likelihood: examples
● Consider the estimation of the likelihood that the wall changes colour if I close the door:
○ the description complexity is low, because these elements are just in front of me;
○ the causal complexity is high, because this never occurred;
→ it is improbable (to occur), and it would also be implausible (if it occurred).
● The likelihood that a stone in the world moves if I close the door:
○ the description complexity is high, because I need to specify which stone I am talking about;
○ the causal complexity is high, because this never occurred;
→ it is improbable (to occur), yet plausible (if it occurred).
NOTE: if the stone e.g. is in the room, or was already described, we return to the first case!
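A toy Python sketch of these two cases (all numbers and modelling choices below, such as the 30-bit cost for a never-observed causal transition and the 2^30 candidate stones, are illustrative assumptions):

    import math

    NEVER_OBSERVED = 30.0  # bits: arbitrary large causal cost for a never-observed transition

    def description_cost(in_focus, n_candidates=1):
        # items already in focus cost ~0 bits; otherwise we single one out among candidates
        return 0.0 if in_focus else math.log2(n_candidates)

    def unexpectedness_if_occurred(causal_bits, description_bits):
        return causal_bits - description_bits

    # the wall changes colour when I close the door: wall and door are right in front of me
    u_wall = unexpectedness_if_occurred(NEVER_OBSERVED, description_cost(in_focus=True))
    # some stone, somewhere in the world, moves when I close the door: which stone?
    u_stone = unexpectedness_if_occurred(NEVER_OBSERVED,
                                         description_cost(in_focus=False, n_candidates=2**30))

    print(u_wall)   # 30.0 bits: implausible if it occurred (and improbable to occur)
    print(u_stone)  #  0.0 bits: plausible if it occurred (yet still improbable to occur)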
Prosecutor’s fallacy
● Suppose that, following forensic studies, the probability that a certain piece of DNA
evidence appears if the defendant is guilty is deemed very high.
● The prosecutor’s fallacy occurs when the probability that the defendant is guilty (given
that there is DNA evidence) is also concluded to be comparatively high.
This is a fallacy, as it neglects the base rates.
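A quick Python illustration of the base-rate point (the population size and match rates are made-up numbers, not from the slides or the paper):

    # one true culprit in a city of 1,000,000; DNA test with a 1-in-100,000 random match rate
    population = 1_000_000
    p_evidence_given_guilty = 1.0        # P(s | c): forensic likelihood, deemed very high
    p_evidence_given_innocent = 1e-5     # random match probability
    p_guilty = 1 / population            # prior that a given individual is the culprit

    p_evidence = (p_evidence_given_guilty * p_guilty
                  + p_evidence_given_innocent * (1 - p_guilty))
    p_guilty_given_evidence = p_evidence_given_guilty * p_guilty / p_evidence

    print(round(p_guilty_given_evidence, 3))  # 0.091: far from certain, despite P(s | c) = 1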
Prosecutor’s fallacy: an explanation
● Let us reframe the problem in terms of complexity, introducing the definition of causally
constrained unexpectedness, computed before the selection of the best cause in
unexpectedness (it maps to the posterior):
U(s; c) = C_w(c) + C_w(s | c) − C_d(s)
● Applying the chain rule C_w(c ∙ s) = C_w(c) + C_w(s | c), the term C_w(s | c) maps to the
likelihood P(s | c). Because the forensic studies make this likelihood very high,
C_w(s | c) ≈ 0, so
U(s; c) ≈ C_w(c) − C_d(s)
● If the prosecutor finds it plausible that the suspect is guilty, considering the limited
number of suspects and their proximity to the victim, then C_w(c) is small as well, and
U(s; c) ≈ −C_d(s) ≤ 0
so the posterior 2^(−U(s; c)) appears maximal and the conclusion of guilt feels warranted.
● What the fallacy neglects are the base rates: when the defendant is not already causally
simple to reach (e.g. singled out from a large population), C_w(c) is in fact large, and a
high likelihood alone no longer makes the constrained unexpectedness low.
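With the same illustrative numbers as in the sketch above (assumptions of this writeup, not of the slides), the complexity reading gives:

    C_w(c) ≈ −log₂(10⁻⁶) ≈ 19.9 bits         (reaching “this individual is the culprit”)
    C_w(s | c) ≈ 0 bits                       (the forensic likelihood is ≈ 1)
    C_d(s) ≈ −log₂(1.1 · 10⁻⁵) ≈ 16.5 bits    (observing the DNA evidence at all)
    U(s; c) ≈ 19.9 + 0 − 16.5 ≈ 3.5 bits  →  2^(−3.5) ≈ 0.09

which matches the probabilistic computation: a very high likelihood does not, by itself, make guilt nearly certain.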
Conclusions
● The proposed conjecture provides further arguments in support of non-probabilistic
computational models of cognition.
● A complexity-based account allows distinguishing between relevant and irrelevant
contextual elements, while the probabilistic account treats them equally.
● A remaining open question is how the underlying machines should be defined.
● Yet, the abstraction level of algorithmic information theory is already sufficient to draw
insights on cognitive processes, as we have shown here e.g. with the analysis of the
prosecutor’s fallacy.