Unexpectedness
and Bayes’ rule
6 December 2021, CIFMA workshop
Giovanni Sileno Jean-Louis Dessalles
g.sileno@uva.nl jean-louis.dessalles@telecom-paris.fr
University of Amsterdam Télécom Paris -- Institut Polytechnique de Paris
We live in a “probabilistic” world /1
● Human experience unfolds in patterns (tendencies, rules, laws, …) as much as
in a lack of determinism, even without taking quantum mechanics into account.
We live in a “probabilistic” world /2
● Born from the study of gambling, probability theory has grown to be the most
important ingredient of formal accounts of how rational agents (artificial or
natural) reason under conditions of uncertainty.
● It is also the fundamental basis of Shannon’s theory of information.
Bayes’ rule
● The probabilistic formula named after Thomas Bayes (Bayes’ rule) has a special role
in this success, as it is used in
○ Bayesian models (e.g. Bayesian networks),
○ Bayesian inference,
○ maximum a posteriori (MAP) estimation in statistics,
○ core components of machine learning methods (e.g. variational autoencoders),
○ …
Uses of Bayes’ rule
● PRESCRIPTIVE accounts (how agents should reason): applications supporting or
reproducing human decision-making, e.g.
○ medical diagnosis
○ evidential reasoning (e.g. in criminal court settings)
○ …
● DESCRIPTIVE accounts (how agents do produce inferences): cognitive models of
○ animal learning
○ visual perception
○ motor control
○ language processing
○ forms of social cognition
○ …
PROs of probability theory …and CONs
● PROs: clarity of the theoretical framework, proven practical value.
● CONs, as a FORMAL system: probability theory relies on a series of axioms,
e.g. a measurable space of events, but our experience of the world defies this closure.
● CONs, as a MODELLING framework: several cognitive patterns (often called biases or
fallacies) are not predicted by probability theory; in particular, there is a mismatch
between what humans perceive as informative and Shannon’s notion of information.
Simplicity Theory
● Simplicity Theory (ST) is a computational model of cognition whose
investigation started by observing this “informativity” mismatch.
[figure: a noise source, maximally informative according to Shannon’s theory of information]
● ST predicts diverse human phenomena related to relevance:
○ unexpectedness
○ narrative interest
○ coincidences
○ near-miss experiences
○ emotional interest
○ responsibility
● ST has also been used for experiments in artificial creativity.
Simplicity Theory: formal background
● Formally, ST builds upon Algorithmic Information Theory (AIT).
● In AIT, the complexity of a string is the minimal length of a program that, given
a certain optional input parameter, produces that string as output
(Kolmogorov complexity). Writing T for the underlying Turing machine, s for the
target string, y for an additional input in support, and p for an executable program:
C_T(s | y) = min { l(p) : T(p, y) = s }
● Note: this measures how much information is needed by a program constructing the
object, which is distinct from how much time or space is needed to run it
(algorithmic or time complexity).
● Kolmogorov complexity is generally incomputable (due to the halting problem),
but it is computable on bounded Turing machines. The complexities used in what
follows are bounded in this sense.
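As an aside, here is a minimal Python sketch of bounded complexity on a toy machine (the two-instruction “machine” below, with literal emission and output doubling, is an illustrative assumption of this writeup, not a machine used in ST or in the paper): the complexity of a target string is simply the length of the shortest program found by exhaustive, shortest-first search.

    import itertools

    def run(program):
        # toy machine: a literal '0' or '1' appends itself; '*' doubles the output so far
        out = ""
        for op in program:
            out = out + out if op == "*" else out + op
        return out

    def bounded_complexity(target, max_len=12):
        # search programs from shortest to longest; the bound max_len keeps this computable
        for n in range(1, max_len + 1):
            for p in itertools.product("01*", repeat=n):
                if run(p) == target:
                    return n
        return None  # no program within the bound

    print(bounded_complexity("01010101"))  # 4, e.g. the program "01**"
    print(bounded_complexity("01101001"))  # 8: nothing shorter than writing the 8 literals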
Unexpectedness
● ST starts from the observation that humans are highly susceptible to
complexity drops, i.e. for them
situations are relevant if they are simpler to describe than to explain
● Formally, this is captured by the formula of unexpectedness, expressed as a
divergence between complexities computed on two distinct machines. Writing s for the
situation, C_w(s) for its causal complexity (via the world machine) and C_d(s) for its
description complexity (via the description machine):
U(s) = C_w(s) − C_d(s)
Unexpectedness: examples
● remarkable lottery draws: 11111 is more unexpected than 64178, even if the
lottery is fair
● coincidences: meeting by chance an old friend of yours abroad is more
unexpected than meeting there any random unknown person
● deterministic yet unexpected events: e.g. a lunar eclipse
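For the lottery example, a minimal Python sketch (the fair-lottery world machine and the run-length description machine below are illustrative assumptions, not the machines used in ST):

    import math

    def causal_complexity(draw, n_digits=5):
        # in a fair lottery, generating any specific draw costs log2(10^n) bits
        return n_digits * math.log2(10)

    def description_complexity(draw):
        # toy description machine: a constant draw is described as (digit, length),
        # any other draw is spelled out digit by digit
        if len(set(draw)) == 1:
            return math.log2(10) + math.log2(len(draw))
        return len(draw) * math.log2(10)

    def unexpectedness(draw):
        return causal_complexity(draw) - description_complexity(draw)

    print(round(unexpectedness("11111"), 1))  # 11.0 bits: a large complexity drop
    print(round(unexpectedness("64178"), 1))  # 0.0 bits: nothing remarkable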
Aim of the paper
● Provide further arguments in support of non-probabilistic computational
models in cognition, in particular focusing on the following conjecture:
Bayes’ rule is a specific instantiation of a more general
template captured in ST by Unexpectedness
Bayes’ rule
● From the definition of conditional probability (with c a model/cause and s an observation),
P(c, s) = P(c | s) P(s) = P(s | c) P(c)
we can obtain the formula of Bayes simply:
P(c | s) = P(s | c) P(c) / P(s)
often informally rewritten as: posterior ∝ likelihood × prior
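A quick numeric sanity check with made-up numbers (an illustration, not from the slides): taking P(c) = 0.01, P(s | c) = 0.9 and P(s) = 0.05,

    P(c | s) = 0.9 × 0.01 / 0.05 = 0.18

so a likely observation under a rare model still yields only a modest posterior.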
Unexpectedness as posterior
● In previous works, it has been hypothesized that ST’s Unexpectedness offers a
non-extensional measure of posterior subjective probability:
p(s) = 2^(−U(s))
● Starting from this hypothesis, we looked for a mapping from Unexpectedness to
Bayes’ rule, and indeed we see that 2^(−U(s)) plays the role of the posterior P(c | s),
with C_w(s) corresponding to −log₂ [P(s | c) P(c)] and C_d(s) to −log₂ P(s).
● Problem: Unexpectedness has 1 parameter (the situation s), while the posterior
has 2 (cause and situation). Let’s investigate the two terms of Unexpectedness…
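Schematically, with c* the best cause and under the correspondences above (which are reconstructed for this writeup):

    2^(−U(s)) = 2^(−C_w(s)) / 2^(−C_d(s)) ↔ [P(s | c*) P(c*)] / P(s) = P(c* | s)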
Causal complexity
● The causal complexity C_w(s) is the length in bits of the shortest path that, according to the
agent's world model, produces the situation.
● The causal path is temporally unfolded. The chain rule has the form
C_w(c ∙ s) = C_w(c) + C_w(s | c)
where ∙ denotes sequential composition and C_w(s | c) is the causal link
(the causal path implicitly starts from the current situation).
● Being a Kolmogorov complexity, the cause can be omitted if it lies on the shortest path:
C_w(s) = C_w(c ∙ s) for the best cause. The Unexpectedness formula thus abstracts the
causally explanatory factor.
Description complexity
● The description complexity C_d(s) is the length in bits of the shortest program that,
leveraging mental resources, determines the situation.
○ e.g. determination could correspond to retrieving the situation from memory; informationally,
we then need to specify the address where to look (an encoding).
● In the proposed mapping, C_d(s) corresponds to −log₂ P(s), where P(s) is the probability of
observing that situation.
A theoretical link can then be established through optimal encoding in
Shannon’s terms, where probability is assessed through frequency.
● Complexity is however a more general measure, as it allows us to consider
compositional effects (e.g. à la Gestalt) via adequate mental operations.
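A minimal Python sketch of this Shannon link (the tiny “memory” of observed situations below is an illustrative assumption): if descriptions behave like optimal codes over remembered situations, C_d(s) approximates −log₂ of the frequency of s.

    import math
    from collections import Counter

    # toy memory of previously observed situations
    memory = ["sun", "sun", "rain", "sun", "snow", "sun", "rain", "sun"]
    counts = Counter(memory)

    def description_complexity(s):
        # optimal (Shannon) code length for s, with probability assessed by frequency
        return -math.log2(counts[s] / len(memory))

    print(round(description_complexity("sun"), 2))   # 0.68 bits: frequent, cheap to single out
    print(round(description_complexity("snow"), 2))  # 3.0 bits: rare, costlier to single out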
Bayes’ rule vs Unexpectedness
● Bayes’ rule is a specific instantiation of ST’s Unexpectedness that:
○ makes a candidate “cause” explicit, rather than automatically selecting the best one;
○ takes a frequentist-like approach for encoding observables.
Why is this relevant?
● Unexpectedness is a more generally applicable measure.
● In the paper we show that it can be used to build:
○ an informational principle of framing
○ a model of derived likelihood
○ an explanation of the prosecutor’s fallacy
All prior is posterior of some other prior
● Let us consider an additional prior in Bayes’ formula, a sort of ‘environmental context’ e.
Following probability theory, we have two equivalent formulations for the posterior:
(1) P(c | s, e) = P(s | c, e) P(c | e) / P(s | e)
(2) P(c | s, e) = P(s, e | c) P(c) / P(s, e)
● These formulations are not equivalent when expressed in complexity terms!
Abstracting c as before:
(1) U₁(s) = C_w(s | e) − C_d(s | e)
(2) U₂(s) = C_w(e ∙ s) − C_d(e, s)
● Let us compute the difference between the two formulations. Two distinct chain rules
apply, on the world machine and on the description machine:
C_w(e ∙ s) = C_w(e) + C_w(s | e)
(sequential composition: the causal path is temporally ordered)
C_d(e, s) ≤ C_d(e) + C_d(s | e)
(the temporal constraint is dropped: describing e and s together may be simpler
than fully determining one term before the other, cf. informed search)
● Applying the chain rules (spelled out below), a necessary condition for the two
formulations to be equivalent is that the contextual prior is not unexpected
(not unexpected: shared facts, defaults, and also improbable but descriptively
complex situations).
→ informational principle of framing:
all contextual situations which are not unexpected can be neglected; the remaining
situations provide the “relevant” context for the situation in focus.
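Spelled out, under the chain rules above (as reconstructed for this writeup):

    U₂(s) − U₁(s) = [C_w(e ∙ s) − C_d(e, s)] − [C_w(s | e) − C_d(s | e)]
                  = C_w(e) + C_d(s | e) − C_d(e, s)
                  ≥ C_w(e) − C_d(e) = U(e)

so the difference vanishes only if U(e) ≤ 0, i.e. only if the context e is not unexpected.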
Derived likelihood
● Following ST, we do not have direct access to the causal complexity, as we always need
to pass through a descriptive step to identify what to compute.
● So, how can we estimate likelihood? By counting back the description complexity!
Derived likelihood: examples
● Consider the estimation of the likelihood that the wall changes colour if I close the door:
○ the description complexity is low, because these elements are just in front of me;
○ the causal complexity is high, because this never occurred;
→ it is improbable (to occur), and it would also be implausible (if it occurred).
● The likelihood that a stone in the world moves if I close the door:
○ the description complexity is high, because I need to specify which stone I am talking about;
○ the causal complexity is high, because this never occurred;
→ it is improbable (to occur), yet plausible (if it occurred).
NOTE: if the stone e.g. is in the room, or was already described, we return to the first case!
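A toy Python sketch of these two cases (all numbers and modelling choices below, such as the 30-bit cost for a never-observed causal transition and the 2^30 candidate stones, are illustrative assumptions):

    import math

    NEVER_OBSERVED = 30.0  # bits: arbitrary large causal cost for a never-observed transition

    def description_cost(in_focus, n_candidates=1):
        # items already in focus cost ~0 bits; otherwise we single one out among candidates
        return 0.0 if in_focus else math.log2(n_candidates)

    def unexpectedness_if_occurred(causal_bits, description_bits):
        return causal_bits - description_bits

    # the wall changes colour when I close the door: wall and door are right in front of me
    u_wall = unexpectedness_if_occurred(NEVER_OBSERVED, description_cost(in_focus=True))
    # some stone, somewhere in the world, moves when I close the door: which stone?
    u_stone = unexpectedness_if_occurred(NEVER_OBSERVED,
                                         description_cost(in_focus=False, n_candidates=2**30))

    print(u_wall)   # 30.0 bits: implausible if it occurred (and improbable to occur)
    print(u_stone)  #  0.0 bits: plausible if it occurred (yet still improbable to occur)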
Prosecutor’s fallacy
● Suppose that, following forensic studies, the probability that a certain piece of DNA
evidence appears if the defendant is guilty is deemed very high.
● The prosecutor’s fallacy occurs when the probability that the defendant is guilty (given
that there is DNA evidence) is also concluded to be comparatively high.
This is a fallacy, as it neglects the base rates.
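A quick Python illustration of the base-rate point (the population size and match rates are made-up numbers, not from the slides or the paper):

    # one true culprit in a city of 1,000,000; DNA test with a 1-in-100,000 random match rate
    population = 1_000_000
    p_evidence_given_guilty = 1.0        # P(s | c): forensic likelihood, deemed very high
    p_evidence_given_innocent = 1e-5     # random match probability
    p_guilty = 1 / population            # prior that a given individual is the culprit

    p_evidence = (p_evidence_given_guilty * p_guilty
                  + p_evidence_given_innocent * (1 - p_guilty))
    p_guilty_given_evidence = p_evidence_given_guilty * p_guilty / p_evidence

    print(round(p_guilty_given_evidence, 3))  # 0.091: far from certain, despite P(s | c) = 1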
Prosecutor’s fallacy: an explanation
● Let us reframe the problem in terms of complexity, introducing the definition of causally
constrained unexpectedness, computed before the selection of the best cause in
unexpectedness (it maps to the posterior):
U(s; c) = C_w(c) + C_w(s | c) − C_d(s)
● Applying the chain rule C_w(c ∙ s) = C_w(c) + C_w(s | c), the term C_w(s | c) maps to the
likelihood P(s | c). Because the forensic studies make this likelihood very high,
C_w(s | c) ≈ 0, so
U(s; c) ≈ C_w(c) − C_d(s)
● If the prosecutor finds it plausible that the suspect is guilty, considering the limited
number of suspects and their proximity to the victim, then C_w(c) is small as well, and
U(s; c) ≈ −C_d(s) ≤ 0
so the posterior 2^(−U(s; c)) appears maximal and the conclusion of guilt feels warranted.
● What the fallacy neglects are the base rates: when the defendant is not already causally
simple to reach (e.g. singled out from a large population), C_w(c) is in fact large, and a
high likelihood alone no longer makes the constrained unexpectedness low.
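With the same illustrative numbers as in the sketch above (assumptions of this writeup, not of the slides), the complexity reading gives:

    C_w(c) ≈ −log₂(10⁻⁶) ≈ 19.9 bits         (reaching “this individual is the culprit”)
    C_w(s | c) ≈ 0 bits                       (the forensic likelihood is ≈ 1)
    C_d(s) ≈ −log₂(1.1 · 10⁻⁵) ≈ 16.5 bits    (observing the DNA evidence at all)
    U(s; c) ≈ 19.9 + 0 − 16.5 ≈ 3.5 bits  →  2^(−3.5) ≈ 0.09

which matches the probabilistic computation: a very high likelihood does not, by itself, make guilt nearly certain.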
Conclusions
● The proposed conjecture provides further arguments in support of non-probabilistic
computational models of cognition.
● A complexity-based account allows distinguishing between relevant and irrelevant
contextual elements, while the probabilistic account treats them equally.
● A remaining open question is how the underlying machines should be defined.
● Yet, the abstraction level of algorithmic information theory is already sufficient to draw
insights on cognitive processes, as we have shown here e.g. with the analysis of the
prosecutor’s fallacy.