markovian sequential decision-making in
non-stationary environments
application to argumentative debates
Emmanuel Hadoux
director: Nicolas Maudet
supervisors: Aurélie Beynier and Paul Weng
November 26th, 2015
LIP6 / UPMC - ED 130
sequential decision-making problem?
Example
• What do I want to eat? (one shot)
• Which color should I wear? (one shot)
• Which way to go to work? (sequential)
2
sequential decision-making problem under uncertainty?
A more precise definition
An agent (real or virtual) makes decisions in an environment.
The state evolves with the actions performed by the agent.
The transitions from one state to another can be:
1. deterministic (known in advance) → closing a door
2. stochastic (with probabilities) → the door may be locked
3. etc.
Do the probabilities evolve over time?
no → the environment is stationary
yes → the environment is non-stationary
3
the whole context
We are interested in:
1. Solving sequential decision-making problems
2. under uncertainty (with stochastic dynamics)
3. in non-stationary environments
Many problems fall into this category (MAS, exogenous events,
etc.).
The non-stationarity makes the problem very hard to solve.
4
table of contents
Decision-making in non-stationary environments
1. Markov Decision Models
2. Non-stationary environments
Application to argumentation problems
1. Strategic debate
2. Mediation problems
5
markov decision models
markov decision process
Markov Decision Process (MDP)1 ⟨S, A, T, R⟩ such that:
S a finite set of observable states,
A a finite set of actions,
T : S × A → Pr(S) a transition function over the
states,
R : S × A → ℝ a reward function.
1
Martin L. Puterman. Markov Decision Processes: Discrete dynamic
stochastic programming. John Wiley Chichester, 1994.
7
markov decision process
Markov Decision Process (MDP)1 ⟨ S, A, T, R⟩ such that:
[State diagram of the door example: from closed, the action open succeeds with probability 0.8 and fails (stays closed) with probability 0.2, while close keeps the door closed; from open, close succeeds with probability 1 and open leaves the door open.]
1
Martin L. Puterman. Markov Decision Processes: Discrete dynamic
stochastic programming. John Wiley Chichester, 1994.
7
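To make the tuple concrete, here is a minimal sketch of the door example written as dictionaries. The transition probabilities are those of the diagram above; the reward values are illustrative assumptions, not taken from the slides.

```python
import random

# A minimal sketch of the door example as an MDP <S, A, T, R>.
S = ["closed", "open"]
A = ["open", "close"]

# T[s][a] is the distribution Pr(s' | s, a) over successor states.
T = {
    "closed": {"open": {"open": 0.8, "closed": 0.2}, "close": {"closed": 1.0}},
    "open":   {"open": {"open": 1.0},                "close": {"closed": 1.0}},
}

# R[s][a]: hypothetical rewards, e.g. we are rewarded for opening the door.
R = {
    "closed": {"open": 1.0, "close": 0.0},
    "open":   {"open": 0.0, "close": -1.0},
}

def step(state, action):
    """Sample a successor state and return it with the immediate reward."""
    dist = T[state][action]
    next_state = random.choices(list(dist), weights=list(dist.values()))[0]
    return next_state, R[state][action]

state = "closed"
for _ in range(3):
    state, reward = step(state, "open")
    print(state, reward)
```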
partially observable markov decision process
Partially Observable Markov Decision Process (POMDP)2
⟨S, A, T, R, O, Q⟩ such that:
S, A, T, R as in MDPs, with S non-observable,
O a finite set of observations,
Q : S → Pr(O) an observation function.
As the state is not observable → belief state, a probability
distribution over all possible current states.
2
Martin L. Puterman. Markov Decision Processes: Discrete dynamic
stochastic programming. John Wiley Chichester, 1994.
8
partially observable markov decision process
Partially Observable Markov Decision Process (POMDP)2
⟨S, A, T, R, O, Q ⟩ such that:
[State diagram of the door example as a POMDP: states closed, open and locked; closed and locked both emit observation cl while open emits op; opening fails on a locked door, which must first be unlocked (unlock, 1).]
2
Martin L. Puterman. Markov Decision Processes: Discrete dynamic
stochastic programming. John Wiley Chichester, 1994.
8
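To make the notion of belief state concrete, here is a small sketch of the Bayesian update b′(s′) ∝ Q(s′, o) Σ_s T(s, a, s′) b(s) on a door-with-lock encoding. The tables below are assumptions consistent with the diagram, and the unlock action is omitted.

```python
# Sketch of the POMDP belief update: after action a and observation o,
# b'(s') ∝ Q(s', o) * sum_s T(s, a, s') * b(s).
T = {  # T[s][a][s'] = Pr(s' | s, a), assumed for illustration
    "closed": {"open": {"open": 0.8, "closed": 0.2}, "close": {"closed": 1.0}},
    "open":   {"open": {"open": 1.0}, "close": {"closed": 1.0}},
    "locked": {"open": {"locked": 1.0}, "close": {"locked": 1.0}},
}
Q = {  # Q[s][o] = Pr(o | s): closed and locked both look "cl", open looks "op"
    "closed": {"cl": 1.0, "op": 0.0},
    "open":   {"cl": 0.0, "op": 1.0},
    "locked": {"cl": 1.0, "op": 0.0},
}

def belief_update(b, a, o):
    new_b = {}
    for s_next in T:
        pred = sum(T[s][a].get(s_next, 0.0) * b.get(s, 0.0) for s in b)
        new_b[s_next] = Q[s_next][o] * pred
    norm = sum(new_b.values())
    return {s: p / norm for s, p in new_b.items()} if norm > 0 else new_b

# Unsure whether the door is closed or locked; try to open it, observe "cl".
b0 = {"closed": 0.5, "locked": 0.5}
print(belief_update(b0, "open", "cl"))  # belief shifts towards "locked"
```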
mixed observability markov decision process
Mixed Observability Markov Decision Process (MOMDP)3
⟨Sv, Sh, A, T, R, Ov, Oh, Q⟩ such that:
Sv, Sh the visible and hidden parts of the state,
Ov, Oh the observations on the visible part and the
hidden part of the state,
A, T, R, Q as before.
Note that ⟨Sv × Sh = S, A, T, R, Ov × Oh = O, Q⟩ is a POMDP.
3
S.C.W. Ong et al. “Planning under uncertainty for robotic tasks with mixed
observability”. In: The International Journal of Robotics Research. 2010.
9
mixed observability markov decision process
Let us consider, building on the previous example, that a key may
be present in the door lock:
Sv {key, no key},
Sh {open, closed, locked},
Ov {k, n-k},
Oh {op, cl}.
10
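A short sketch of how the state factors in this example: the presence of the key is directly visible, so the belief only needs to range over the hidden door status. The encoding below is an assumption for illustration.

```python
from itertools import product

# Factored state of the door-with-key example: the key is directly visible,
# the door status is hidden behind the observations {op, cl}.
S_v = ["key", "no key"]
S_h = ["open", "closed", "locked"]

# A MOMDP state is a pair (visible, hidden); the flat POMDP state space is
# the full product, but the belief only needs to cover S_h.
S = list(product(S_v, S_h))
belief = {h: 1.0 / len(S_h) for h in S_h}   # uniform belief over the hidden part

print(len(S), "flat states, belief over", len(belief), "hidden states")
```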
markov decision models
All those models have a common limitation: mandatory
stationarity.
Stationarity is limiting in many cases, but we cannot take into
account all types of non-stationarity either.
One assumption
The non-stationarity is limited to a set of stationary modes,
or contexts.
11
non-stationary environments
hidden-mode markov decision process
To address this subclass of problems, we can use
Hidden-Mode Markov Decision Processes (HM-MDPs)4.
⟨M, C⟩ such that:
M a set of modes → mi = ⟨S, A, Ti, Ri⟩, ∀mi ∈ M,
C : M → Pr(M) a transition function over modes.
S and A are common to all modes mi ∈ M.
S is observable, M is not.
4
S.P.-M. Choi, N.L. Zhang, and D.-Y. Yeung. “Solving Hidden-Mode Markov
Decision Problems”. In: Proceedings of the 8th International Workshop on
Artificial Intelligence and Statistics (AISTATS). 2001, pp. 19–26.
13
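A minimal sketch of one HM-MDP decision step, assuming two hypothetical modes over a two-state problem: the observable state evolves under the current mode's transition function, then the hidden mode may switch according to C (one common convention for the ordering).

```python
import random

def sample(dist):
    """Sample a key from a {value: probability} dictionary."""
    return random.choices(list(dist), weights=list(dist.values()))[0]

# Two hypothetical modes sharing the same states {0, 1} and one action "a".
T = {
    "m1": {0: {"a": {0: 0.9, 1: 0.1}}, 1: {"a": {0: 0.9, 1: 0.1}}},
    "m2": {0: {"a": {0: 0.1, 1: 0.9}}, 1: {"a": {0: 0.1, 1: 0.9}}},
}
C = {"m1": {"m1": 0.95, "m2": 0.05}, "m2": {"m1": 0.05, "m2": 0.95}}

def hmmdp_step(mode, state, action):
    """One decision step: the state evolves under the current mode's T,
    then the environment may switch modes according to C."""
    next_state = sample(T[mode][state][action])
    next_mode = sample(C[mode])
    return next_mode, next_state

mode, state = "m1", 0
for _ in range(5):
    mode, state = hmmdp_step(mode, state, "a")
    print(mode, state)
```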
an example as an hm-mdp
2 modes
8 states
2 actions
Figure 1: Traffic light problem (drawing by T. Huraux)
14
an example as an hm-mdp
s’
s
Tm1(s, a, s′
)
m1
s′
s
Tm2(s, a, s′
)
m2
C(m1, m2) C(m2, m1)
C(m1, m1)
C(m2, m2)
S {light side} × {car left?} ×
{car right?}
A {left light, right light}
T car arrivals and departures
depending on the light
R cost if cars are waiting on any
side
M majority flow of cars on the left
or the right
C a transition function over
modes
15
another limitation
Each time a decision is made, the environment may switch
modes.
In the previous example: each time the system chooses which
light to turn on, the busy side may change.
16
hidden-semi-markov mode markov decision process
A Hidden-Semi-Markov Mode Markov Decision Process
(HS3MDP)5 is characterized by a triplet ⟨M, C, H⟩ such that:
M, C as in HM-MDPs,
H : M × M → Pr(N) a duration function.
New duration h′ after a decision step in mode m:
if h > 0: m′ = m, h′ = h − 1
if h = 0: m′ ∼ C(m, ·), h′ = k − 1 where k ∼ H(m, m′, ·)
5
E. Hadoux, A. Beynier, and P. Weng. “Solving Hidden-Semi-Markov-Mode
Markov Decision Problems”. In: Scalable Uncertainty Management. Springer,
2014, pp. 176–189. 18
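The update rule above translates almost literally into code; the sketch below implements only that rule, with hypothetical C and H tables for two modes.

```python
import random

def sample(dist):
    return random.choices(list(dist), weights=list(dist.values()))[0]

# Hypothetical mode-transition and duration functions for two modes.
C = {"m1": {"m1": 0.3, "m2": 0.7}, "m2": {"m1": 0.6, "m2": 0.4}}
H = {("m1", "m1"): {2: 1.0}, ("m1", "m2"): {1: 0.5, 3: 0.5},
     ("m2", "m1"): {1: 1.0}, ("m2", "m2"): {4: 1.0}}

def mode_duration_update(m, h):
    """HS3MDP update after one decision step:
    if h > 0, stay in mode m and decrement the remaining duration;
    if h = 0, draw the next mode from C(m, .) and a new duration from H(m, m', .)."""
    if h > 0:
        return m, h - 1
    m_next = sample(C[m])
    k = sample(H[(m, m_next)])
    return m_next, k - 1

m, h = "m1", 0
for _ in range(6):
    m, h = mode_duration_update(m, h)
    print(m, h)
```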
some precisions on hs3mdp
Equivalence
An HS3MDP is equivalent to a (potentially infinite) HM-MDP.
Conversion
• An HS3MDP is a subclass of MOMDP (S → Sv, M, H → Sh),
• An HS3MDP can be rewritten as a POMDP (as MOMDP is a
subclass of POMDP).
Solving
Therefore, MO/POMDP algorithms can be used with HS3MDPs.
However, finding an optimal policy is PSPACE-complete → scalability
problem ⇒ approximate solutions
19
partially observable monte-carlo planning
Partially Observable Monte-Carlo Planning algorithm
(POMCP)6: one of the most efficient algorithms for large-sized
POMDPs.
POMCP characteristics
• Uses a set of particles to approximate the belief state,
• One particle = one state → ∞ particles = belief state,
• Requires a simulator of the problem → relaxation of the
known-model constraint.
6
David Silver and Joel Veness. “Monte-Carlo planning in large POMDPs”. In:
Proceedings of the 24th Conference on Neural Information Processing
Systems (NIPS). 2010, pp. 2164–2172.
20
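A simplified sketch of the particle representation: each particle is one candidate state, and after an action the particles are pushed through the simulator and kept only when the simulated observation matches the one actually received. The simulator below is a hypothetical two-state toy.

```python
import random

def particle_filter_update(particles, action, real_obs, simulator, n_target):
    """Unweighted particle filter (simplified): resample states from the current
    particle set, push them through the simulator and keep those whose simulated
    observation matches the observation actually received."""
    new_particles, tries = [], 0
    while len(new_particles) < n_target and tries < 100 * n_target:
        tries += 1
        state = random.choice(particles)            # sample a state from the belief
        next_state, obs, _reward = simulator(state, action)
        if obs == real_obs:                         # rejection step
            new_particles.append(next_state)
    return new_particles                            # may fall short: particle deprivation

def toy_simulator(state, action):
    """Hypothetical two-state simulator: sticky dynamics, noisy observation."""
    next_state = state if random.random() < 0.8 else 1 - state
    obs = next_state if random.random() < 0.9 else 1 - next_state
    return next_state, obs, 0.0

belief = [0] * 50 + [1] * 50                        # 100 particles, uniform belief
belief = particle_filter_update(belief, "a", 1, toy_simulator, 100)
print(sum(belief) / max(len(belief), 1))            # fraction of particles in state 1
```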
one step of pomcp
Simulation phase:
1. Start with a root history τ
2. Build action-nodes
3. Select next action to
simulate
4. Build observation-node
5. Go to 2. until the end of the simulation is reached
6. Backtrack the result
7. Go back to the root and repeat from 3. until no more simulations
[Tree diagram: root history τ with updated statistics ⟨N′1, V′1, B′1⟩, its action nodes a1 … a|A|, and an observation node oi with statistics ⟨N′2, V′2, B′2⟩ expanded further down.]
21
one step of pomcp
Exploitation phase:
1. Start with the root τ
2. Perform action given by
UCT
3. Go to matching
observation
4. Set new root τ′ and prune
5. Go to simulation phase
[Tree diagram: the subtree rooted at the new root τ′ is kept; the rest of the tree is pruned.]
21
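A sketch of the places where the node statistics ⟨N, V⟩ are used: UCB1/UCT to select the next action to simulate, the running-mean update when backtracking, and the greedy choice plus pruning of the exploitation phase. The node encoding and the exploration constant are assumptions.

```python
import math

def uct_select(node, actions, c=1.0):
    """Step 3 (simulation phase): pick the action maximising
    V(a) + c * sqrt(log N(node) / N(a))  (UCB1/UCT).
    A node is a dict {"N": visits, "V": value, "acts": {a: act_node}};
    an act_node additionally holds {"obs": {o: node}}."""
    for a in actions:
        node["acts"].setdefault(a, {"N": 0, "V": 0.0, "obs": {}})
        if node["acts"][a]["N"] == 0:
            return a                                   # try unvisited actions first
    return max(actions, key=lambda a: node["acts"][a]["V"]
               + c * math.sqrt(math.log(max(node["N"], 1)) / node["acts"][a]["N"]))

def backtrack(path, ret):
    """Step 6: update visit counts and running means along the simulated path."""
    for node in path:
        node["N"] += 1
        node["V"] += (ret - node["V"]) / node["N"]

def exploit_and_prune(root, actions, real_obs):
    """Exploitation phase: play the greedy action (no exploration bonus),
    then keep the subtree matching the received observation as the new root."""
    best = max(actions, key=lambda a: root["acts"].get(a, {"V": 0.0})["V"])
    act_node = root["acts"].setdefault(best, {"N": 0, "V": 0.0, "obs": {}})
    new_root = act_node["obs"].setdefault(real_obs, {"N": 0, "V": 0.0, "acts": {}})
    return best, new_root

root = {"N": 0, "V": 0.0, "acts": {}}
print(uct_select(root, ["a1", "a2"]))                  # both unvisited -> "a1"
```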
more limitations
When the size of the model is too large → particle deprivation
We can add more particles → requires more computing time
22
using the structure of hs3mdp
Fortunately, HS3MDPs are structured POMDPs.
We defined two adaptations of POMCP for HS3MDPs:
1. Adaptation to the structure
2. Exact representation of the belief state
Adaptation to the structure (SA)
In HS3MDP, a state = a visible part and a hidden part.
The former can be removed from the particle representation
as it is directly observed.
→ a particle = a possible hidden part
Exact representation of the belief state (SAER)
Replace the sets of particles by the exact distribution µ:
µ′(m′, h′) = (1/K) [ Tm′(s, a, s′) × µ(m′, h′ + 1)
            + Σm∈M C(m, m′) × Tm(s, a, s′) × µ(m, 0) × H(m, m′, h′ + 1) ]
Complexity: O(|M| × hmax) ≷ O(N) with N the number of simulations in the original POMCP
23
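The exact update above can be written directly as a function over a distribution µ on (mode, remaining duration) pairs, normalised by K. The T, C and H tables below are hypothetical.

```python
def saer_update(mu, s, a, s_next, T, C, H, h_max):
    """Exact belief update over (mode, duration) pairs:
    mu'(m', h') ∝ T_{m'}(s, a, s') * mu(m', h' + 1)
                + sum_m C(m, m') * T_m(s, a, s') * mu(m, 0) * H(m, m', h' + 1)."""
    new_mu = {}
    modes = {m for (m, _) in mu}
    for m_next in modes:
        for h_next in range(h_max):
            stay = T[m_next][s][a].get(s_next, 0.0) * mu.get((m_next, h_next + 1), 0.0)
            switch = sum(C[m][m_next] * T[m][s][a].get(s_next, 0.0)
                         * mu.get((m, 0), 0.0) * H[(m, m_next)].get(h_next + 1, 0.0)
                         for m in modes)
            new_mu[(m_next, h_next)] = stay + switch
    K = sum(new_mu.values())
    return {k: v / K for k, v in new_mu.items()} if K > 0 else new_mu

# Hypothetical two-mode problem with a single state 0 and one action "a".
T = {"m1": {0: {"a": {0: 1.0}}}, "m2": {0: {"a": {0: 1.0}}}}
C = {"m1": {"m1": 0.2, "m2": 0.8}, "m2": {"m1": 0.8, "m2": 0.2}}
H = {("m1", "m1"): {1: 1.0}, ("m1", "m2"): {2: 1.0},
     ("m2", "m1"): {2: 1.0}, ("m2", "m2"): {1: 1.0}}
mu = {("m1", 0): 0.5, ("m2", 0): 0.5}
print(saer_update(mu, 0, "a", 0, T, C, H, h_max=3))
```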
experiments
We tested our method on 4 problems, with 3 taken from the
literature.
We compared the performance of:
• the original POMCP algorithm,
• our adaptations SA and SAER,
• the optimal policy, when it can be computed.
24
results for the traffic light problem
Simulations  Original    SA     SAER   Optimal
1            -3.42      0.0%    0.0%   38.5%
2            -2.86      3.0%    4.0%   26.5%
4            -2.80      8.1%    8.8%   25.0%
8            -2.68      6.0%    9.4%   21.7%
16           -2.60      8.0%    8.0%   19.2%
32           -2.45      5.3%    6.9%   14.3%
· · ·
1024         -2.31      5.1%    7.0%    9.3%
25
randomly generated environments
We can control the number of states, actions and modes and
the transition functions.
Too big to be optimally solved.
26
results for the randomly generated environments
[Plot: rewards (means on 100 instances) as a function of log2 of the number of simulations, for Original, SA and SAER.]
27
conclusion on hs3mdps
We proposed in this work:
• A new model (HS3MDP) able to represent non-stationary
decision-making problems in a more realistic way,
• Adaptations of POMCP to tackle large-size problems,
outperforming the original algorithm.
28
learning the model
We also proposed a method, called RLCD with SCD7, to learn a
subclass of those problems.
This method is able to learn part of the dynamics without
requiring the number of modes to be known a priori.
7
Emmanuel Hadoux, Aurélie Beynier, and Paul Weng. “Sequential
Decision-Making under Non-stationary Environments via Sequential
Change-point Detection”. In: First International Workshop on Learning over
Multiple Contexts (LMCE) @ ECML. 2014.
29
strategic argumentation problems
strategic argumentation problems
Few works address the problem of decision-making in
argumentation.
In abstract argumentation, agents exchange arguments and
use attacks as relations between the arguments.
Formal abstract argumentation framework8 ⟨A, E⟩ such that:
A a set of arguments,
E a set of relations such that (a, b) ∈ E if a ∈ A and
b ∈ A and a attacks b.
8
Phan Minh Dung. “On the Acceptability of Arguments and its Fundamental
Role in Nonmonotonic Reasoning, Logic Programming and n-Person Games”.
In: Artificial Intelligence 77.2 (1995), pp. 321–358.
31
example of abstract framework
[Graph labelling: arguments a, c and d are in; b and e are out.]
Figure 2: Example of abstract argumentation framework with 5
arguments and 5 attacks
32
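The in/out labels shown above are the kind of status computed by Dung-style semantics; here is a small sketch of the grounded labelling (an argument is in when all its attackers are out). The attack relation below is a hypothetical one chosen to reproduce the labels of Figure 2, not necessarily the graph of the figure itself.

```python
def grounded_labelling(arguments, attacks):
    """Iteratively label arguments: 'in' if every attacker is 'out',
    'out' if some attacker is 'in'; remaining arguments stay 'undec'."""
    attackers = {a: {x for (x, y) in attacks if y == a} for a in arguments}
    label = {a: "undec" for a in arguments}
    changed = True
    while changed:
        changed = False
        for a in arguments:
            if label[a] != "undec":
                continue
            if all(label[b] == "out" for b in attackers[a]):
                label[a], changed = "in", True
            elif any(label[b] == "in" for b in attackers[a]):
                label[a], changed = "out", True
    return label

# Hypothetical attack relation over the five arguments (illustration only).
args = {"a", "b", "c", "d", "e"}
attacks = {("a", "b"), ("c", "b"), ("c", "e"), ("d", "e"), ("b", "d")}
print(grounded_labelling(args, attacks))   # a, c, d in; b, e out
```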
decision-making in argumentation
More recently: argumentation framework with probabilistic
strategies9 against stochastic opponents.
Agents play a turn-based game → argumentative dialogue
Uses executable logic to represent the actions of an agent in
the debate.
9
Anthony Hunter. “Probabilistic Strategies in Dialogical Argumentation”. In:
International Conference on Scalable Uncertainty Management (SUM) LNCS
volume 8720. 2014.
33
argumentation framework with probabilistic strategies
Each agent has a private state.
The problem has a public space.
A rule for an agent is defined as Premises ⇒ Pr(Acts) such that:
• Premises: a conjunction of a(x), hi(x), e(x, y),
• Acts: conjunction of ⊞, ⊟ on a(), e() and ⊕, ⊖ on hi(),
Example
h1(b) ∧ a(f) ∧ e(b, f) ⇒ 0.5 : ⊞a(b) ∨ 0.5 : ⊞a(c)
Purpose
Optimize the sequence of arguments of one agent.
34
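A sketch of how such a rule might be represented and executed: the premises are checked against the current private + public state and one act is drawn according to the rule's probabilities. The set-of-strings encoding is an assumption, not the executable logic of the original framework.

```python
import random

# One hypothetical rule: if the agent holds argument b (h1(b)), f is on the
# public board (a(f)) and b attacks f (e(b,f)), then post b or c with equal
# probability (the ⊞ act is written "add" here).
rule = {
    "premises": {"h1(b)", "a(f)", "e(b,f)"},
    "acts": [("add a(b)", 0.5), ("add a(c)", 0.5)],
}

def fire(rule, state):
    """Return a sampled act if all premises hold in `state`, else None."""
    if not rule["premises"] <= state:
        return None
    acts, probs = zip(*rule["acts"])
    return random.choices(acts, weights=probs)[0]

state = {"h1(b)", "h1(c)", "a(f)", "e(b,f)"}
print(fire(rule, state))   # "add a(b)" or "add a(c)", each with probability 0.5
```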
argumentation problem with probabilistic strategies
Argumentation Problems with Probabilistic Strategies (APS)10
⟨A, E, Si, P, G, gi, Ri⟩ such that:
A, E a set of arguments and attacks,
Si the set of private states of agent i,
P = 2A × 2E the public space,
G the set of all possible goals,
gi the goal of agent i → Dung,
Ri a set of rules for agent i
10
Emmanuel Hadoux et al. “Optimization of Probabilistic Argumentation with
Markov Decision Models”. In: IJCAI, Buenos Aires, Argentina. 2015.
35
example: arguments
Debate between two agents: Is e-sport a sport?
a E-sport is a sport
f E-sport is not a
physical activity
g E-sport is not
referenced by IOC
[Figure 3: attack graph over the 8 arguments a, b, c, d, e, f, g and h]
36
probabilistic finite state machine: graph
APS → Probabilistic Finite State Machine from an initial state
(e.g., {h1(a), h1(b)}, {}, {h2(c), h2(d)})
[Figure 4: PFSM of the e-sport example, with states σ1 to σ12 and probabilistic transitions from the initial state σ1.]
37
probabilistic finite state machine
To optimize the sequence of arguments for agent 1, we could
optimize the PFSM but:
1. it depends on the initial state
2. it requires knowledge of the opponent's private state
Using MOMDPs, we can relax assumptions 1 and 2.
38
transformation to a momdp
An APS with two agents, from the point of view of agent 1, can
be transformed to a MOMDP:
• Sv = S1 × P, Sh = S2,
• Ov = Sv and Oh = ∅,
• A = {prem(r) ⇒ m|r ∈ R1 and m ∈ acts(r)}
Example
h1(b) ∧ a(f) ∧ h1(c) ∧ e(b, f) ∧ e(c, f) ⇒ 0.5 : ⊞a(b) ∧ ⊞e(b, f) ∨ 0.5 : ⊞a(c) ∧ ⊞e(c, f)
39
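A small sketch of the construction of A: each probabilistic rule of agent 1 is split into one deterministic MOMDP action per possible outcome m ∈ acts(r), as in the definition above. The rule encoding is an assumption.

```python
# Sketch of A = {prem(r) => m | r in R1, m in acts(r)}: each probabilistic rule
# of agent 1 becomes one deterministic action per possible outcome m.
R1 = [
    {"premises": {"h1(b)", "a(f)", "h1(c)", "e(b,f)", "e(c,f)"},
     "acts": [({"add a(b)", "add e(b,f)"}, 0.5),
              ({"add a(c)", "add e(c,f)"}, 0.5)]},
]

A = [(frozenset(rule["premises"]), frozenset(outcome))
     for rule in R1
     for outcome, _prob in rule["acts"]]

for premises, outcome in A:
    print(sorted(premises), "=>", sorted(outcome))
```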
transformation to a momdp
Model sizes:
APS : 8 arguments, 8 attacks, 6 rules
POMDP : 4 294 967 296 states
MOMDP : 16 777 216 states
We want the full policy → cannot use POMCP.
We need to reduce the size of the instances to use traditional
methods.
Two kinds of size-reducing procedures: with or without
dependencies on the initial state.
40
size-reducing procedures
Dom. Removes dominated arguments
Argument dominance
If an argument is attacked by at least one unattacked argument,
it is dominated.
[Figure 5: attack graph over the arguments a, b, c, d, e, f, g and h]
41
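The dominance criterion is easy to state as code: collect the unattacked arguments and mark everything they attack as dominated. The attack relation below is a hypothetical example.

```python
def dominated_arguments(arguments, attacks):
    """An argument is dominated if at least one of its attackers is unattacked."""
    attacked = {b for (_, b) in attacks}
    unattacked = arguments - attacked
    return {b for (a, b) in attacks if a in unattacked}

# Hypothetical attack relation for illustration.
args = {"a", "b", "c", "d"}
attacks = {("a", "b"), ("b", "c"), ("c", "d")}
print(dominated_arguments(args, attacks))   # {'b'}: attacked by the unattacked a
```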
size-reducing procedures
Irr. Prunes irrelevant arguments
Irr(s0) Removes rules incompatible with the initial state.
Enth. Infers attacks
Optimal sequence of procedures
1. Irr(s0), Irr. until stable
2. Dom., 1. until stable
3. Enth.
Guarantees
On the uniqueness and optimality of the solution
42
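One possible reading of the ordering above, sketched as a fixed-point loop; the four procedures are passed in as hypothetical placeholder functions mapping an APS instance to a reduced instance.

```python
def reduce_aps(aps, irr_s0, irr, dom, enth):
    """Apply the size-reducing procedures in the order given on the slide:
    (1) Irr(s0) and Irr. until stable, (2) Dom. then step 1 until stable,
    (3) Enth.  The four procedures are (hypothetical) functions aps -> aps."""
    def until_stable(aps):
        while True:
            reduced = irr(irr_s0(aps))
            if reduced == aps:
                return aps
            aps = reduced

    aps = until_stable(aps)                # step 1
    while True:                            # step 2: Dom. then step 1, until stable
        reduced = until_stable(dom(aps))
        if reduced == aps:
            break
        aps = reduced
    return enth(aps)                       # step 3
```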
experiments
Solution for the e-sport problem computed with MO-SARSOP11.
           None   Irr.   Enth.   Dom.   Irr(s0).   All
E-sport      —      —      —       —        —      0.56
6 args     1313     22     43       7      2.4      0.9
7 args       —     180    392      16       20      6.7
8 args       —      —      —       —      319       45
9 args       —      —      —       —        —        —
Table 1: Computation time in seconds (— means ∞)
11
S.C.W. Ong et al. “Planning under uncertainty for robotic tasks with mixed
observability”. In: The International Journal of Robotics Research. 2010.
43
mediation problems
Let us consider a debate problem with several agents split in
teams.
We need a mediator to allocate the speaking turns.
In most cases, the mediator is not active12 or is looking for a
consensus13.
We envision a more active mediator with her own agenda →
generalization
12
Elise Bonzon and Nicolas Maudet. “On the outcomes of multiparty
persuasion”. In: AAMAS. 2011, pp. 47–54.
13
Michal Chalamish and Sarit Kraus. “AutoMed: an automated mediator for
multi-issue bilateral negotiations”. In: JAAMAS 24.3 (2012), pp. 536–564.
44
mediation problems in non-stationary environments
We also consider that each agent can be in either of the two
following modes:
constructive arguing towards its own goal,
destructive arguing against the opponent's goal.
But other modes can be defined.
We proposed Dynamic Mediation Problems (DMP)14 for those
problems from the viewpoint of the mediator.
14
Still under review.
45
conversion to an hs3mdp
The argumentative modes can be converted into HS3MDP
modes, allowing us to convert DMPs to HS3MDPs.
We can solve the problem using our adaptations of POMCP.
Purpose
Organize the sequence of speak-turns for the mediator.
46
conclusion
To apply decision-making to argumentation, we proposed:
• A formalization of debates with probabilistic strategies
(APS),
• How to transform an APS into a MOMDP and solve it,
• Size-reducing procedures,
• A formalization of non-stationary mediation problems
(DMP),
• How to transform a DMP into an HS3MDP and solve it.
47
general conclusion
Our contribution is twofold:
• Improvement of existing methods and models for
decision-making in non-stationary environments,
• Exploration of a new domain, combining it with
argumentation.
What could be improved:
• Extensive testing of the scalability,
• More realistic experiments15,16,
• Additional theoretical properties.
15
http://arguman.org
16
https://github.com/Amande-WP5/formalarg
48
perspectives
Some straightforward follow-ups of this work:
• learn the mode transition/duration functions in HS3MDPs,
• develop our adaptations of POMCP for MOMDPs,
• learn the probabilities of the acts in APS and DMPs,
• take into account the goal of the opponents in APS.
49
perspectives
Decision-making and argumentation can benefit each other at
different levels.
• sequence of arguments,
• sequence of agents,
• sequence of topics,
• sequence of recommendations,
• sequence of explanations.
50
Thank you very much for your attention
51
More Related Content

PPT
Hierarchical RL (DAI).ppt
PDF
Эриберто Кваджавитль "Адаптивное обучение с подкреплением для интерактивных ...
PDF
EL MODELO DE NEGOCIO DE YOUTUBE
PPTX
technical seminar2.pptx.on markov decision process
PPT
Cs221 lecture8-fall11
PPT
RL intro
PPTX
unit-4 Markov Decision process presentation.pptx
PDF
Applications of Markov Decision Processes (MDPs) in the Internet of Things (I...
Hierarchical RL (DAI).ppt
Эриберто Кваджавитль "Адаптивное обучение с подкреплением для интерактивных ...
EL MODELO DE NEGOCIO DE YOUTUBE
technical seminar2.pptx.on markov decision process
Cs221 lecture8-fall11
RL intro
unit-4 Markov Decision process presentation.pptx
Applications of Markov Decision Processes (MDPs) in the Internet of Things (I...

Similar to Markovian sequential decision-making in non-stationary environments: application to argumentation problems (20)

PDF
Deep reinforcement learning from scratch
PDF
Markov decision process
PPT
POMDP Seminar Backup3
PPTX
Making Complex Decisions(Artificial Intelligence)
PDF
MarkovDecisionProcess&POMDP-MDP_PPTX.pdf
PPTX
Unit 4 - 4.1 Markov Decision Process.pptx
PDF
PPTX
Reinforcement Learning: An Introduction.pptx
PPTX
What is Reinforcement Algorithms and how worked.pptx
PDF
Reinforcement Learning - DQN
PPT
Lecture notes
PDF
Reinforcement Learning for Financial Markets
PPTX
How to formulate reinforcement learning in illustrative ways
PDF
Optimization of probabilistic argumentation with Markov processes
PPTX
Reinforcement Learning
PDF
Planning in Markov Stochastic Task Domains
PDF
slides.pdfArtificial Intelligence: Towards Adaptive Autonomous Systems (resea...
PPTX
REINFORCEMENT_LEARNING POWER POINT PRESENTATION.pptx
PPT
reinforcement-learning.ppt
PPT
reinforcement-learning.prsentation for c
Deep reinforcement learning from scratch
Markov decision process
POMDP Seminar Backup3
Making Complex Decisions(Artificial Intelligence)
MarkovDecisionProcess&POMDP-MDP_PPTX.pdf
Unit 4 - 4.1 Markov Decision Process.pptx
Reinforcement Learning: An Introduction.pptx
What is Reinforcement Algorithms and how worked.pptx
Reinforcement Learning - DQN
Lecture notes
Reinforcement Learning for Financial Markets
How to formulate reinforcement learning in illustrative ways
Optimization of probabilistic argumentation with Markov processes
Reinforcement Learning
Planning in Markov Stochastic Task Domains
slides.pdfArtificial Intelligence: Towards Adaptive Autonomous Systems (resea...
REINFORCEMENT_LEARNING POWER POINT PRESENTATION.pptx
reinforcement-learning.ppt
reinforcement-learning.prsentation for c
Ad

Recently uploaded (20)

PPT
protein biochemistry.ppt for university classes
PPT
The World of Physical Science, • Labs: Safety Simulation, Measurement Practice
DOCX
Viruses (History, structure and composition, classification, Bacteriophage Re...
PPTX
7. General Toxicologyfor clinical phrmacy.pptx
PPTX
neck nodes and dissection types and lymph nodes levels
PDF
AlphaEarth Foundations and the Satellite Embedding dataset
PPTX
BIOMOLECULES PPT........................
PDF
bbec55_b34400a7914c42429908233dbd381773.pdf
PDF
ELS_Q1_Module-11_Formation-of-Rock-Layers_v2.pdf
PPTX
INTRODUCTION TO EVS | Concept of sustainability
PDF
IFIT3 RNA-binding activity primores influenza A viruz infection and translati...
PPTX
ognitive-behavioral therapy, mindfulness-based approaches, coping skills trai...
PPTX
Microbiology with diagram medical studies .pptx
PPTX
microscope-Lecturecjchchchchcuvuvhc.pptx
PPTX
Classification Systems_TAXONOMY_SCIENCE8.pptx
PDF
HPLC-PPT.docx high performance liquid chromatography
PDF
Sciences of Europe No 170 (2025)
PPTX
DRUG THERAPY FOR SHOCK gjjjgfhhhhh.pptx.
PPTX
G5Q1W8 PPT SCIENCE.pptx 2025-2026 GRADE 5
PPTX
Derivatives of integument scales, beaks, horns,.pptx
protein biochemistry.ppt for university classes
The World of Physical Science, • Labs: Safety Simulation, Measurement Practice
Viruses (History, structure and composition, classification, Bacteriophage Re...
7. General Toxicologyfor clinical phrmacy.pptx
neck nodes and dissection types and lymph nodes levels
AlphaEarth Foundations and the Satellite Embedding dataset
BIOMOLECULES PPT........................
bbec55_b34400a7914c42429908233dbd381773.pdf
ELS_Q1_Module-11_Formation-of-Rock-Layers_v2.pdf
INTRODUCTION TO EVS | Concept of sustainability
IFIT3 RNA-binding activity primores influenza A viruz infection and translati...
ognitive-behavioral therapy, mindfulness-based approaches, coping skills trai...
Microbiology with diagram medical studies .pptx
microscope-Lecturecjchchchchcuvuvhc.pptx
Classification Systems_TAXONOMY_SCIENCE8.pptx
HPLC-PPT.docx high performance liquid chromatography
Sciences of Europe No 170 (2025)
DRUG THERAPY FOR SHOCK gjjjgfhhhhh.pptx.
G5Q1W8 PPT SCIENCE.pptx 2025-2026 GRADE 5
Derivatives of integument scales, beaks, horns,.pptx
Ad

Markovian sequential decision-making in non-stationary environments: application to argumentation problems

  • 1. markovian sequential decision-making in non-stationary environments application to argumentative debates Emmanuel Hadoux director: Nicolas Maudet supervisors: Aurélie Beynier and Paul Weng November, 26th 2015 LIP6 / UPMC - ED 130
  • 2. sequential decision-making problem? Example • What do I want to eat? • Which color should I wear? • Which way to go to work? 2
  • 3. sequential decision-making problem? Example • What do I want to eat? (one shot) • Which color should I wear? (one shot) • Which way to go to work? (sequential) 2
  • 4. sequential decision-making problem? Example • What do I want to eat? (one shot) • Which color should I wear? (one shot) • Which way to go to work? (sequential) 2
  • 5. sequential decision-making problem under uncertainty? A more precise definition An agent (real or virtual) makes decisions in an environment. 3
  • 6. sequential decision-making problem under uncertainty? A more precise definition An agent (real or virtual) makes decisions in an environment. The state evolves with the actions performed by the agent. 3
  • 7. sequential decision-making problem under uncertainty? A more precise definition An agent (real or virtual) makes decisions in an environment. The state evolves with the actions performed by the agent. The transitions from one state to another can be: 3
  • 8. sequential decision-making problem under uncertainty? A more precise definition An agent (real or virtual) makes decisions in an environment. The state evolves with the actions performed by the agent. The transitions from one state to another can be: 1. deterministic (known in advance) → closing a door 3
  • 9. sequential decision-making problem under uncertainty? A more precise definition An agent (real or virtual) makes decisions in an environment. The state evolves with the actions performed by the agent. The transitions from one state to another can be: 1. deterministic (known in advance) → closing a door 2. stochastic (with probabilities) → the door may be locked 3
  • 10. sequential decision-making problem under uncertainty? A more precise definition An agent (real or virtual) makes decisions in an environment. The state evolves with the actions performed by the agent. The transitions from one state to another can be: 1. deterministic (known in advance) → closing a door 2. stochastic (with probabilities) → the door may be locked 3. etc. 3
  • 11. sequential decision-making problem under uncertainty? A more precise definition An agent (real or virtual) makes decisions in an environment. The state evolves with the actions performed by the agent. The transitions from one state to another can be: 1. deterministic (known in advance) → closing a door 2. stochastic (with probabilities) → the door may be locked 3. etc. Do the probabilities evolve with the time? no the environment is stationary yes the environment is non-stationary 3
  • 12. the whole context We are interested in: 1. Solving sequential decision-making problem 4
  • 13. the whole context We are interested in: 1. Solving sequential decision-making problem 2. under uncertainty (with stochastic dynamics) 4
  • 14. the whole context We are interested in: 1. Solving sequential decision-making problem 2. under uncertainty (with stochastic dynamics) 3. in non-stationary environments 4
  • 15. the whole context We are interested in: 1. Solving sequential decision-making problem 2. under uncertainty (with stochastic dynamics) 3. in non-stationary environments Many problems fall into this category (MAS, exogenous events, etc.). 4
  • 16. the whole context We are interested in: 1. Solving sequential decision-making problem 2. under uncertainty (with stochastic dynamics) 3. in non-stationary environments Many problems fall into this category (MAS, exogenous events, etc.). The non-stationarity makes the problem very hard to solve. 4
  • 17. table of contents Decision-making in non-stationary environments 1. Markov Decision Models 2. Non-stationary environments Application to argumentation problems 1. Strategic debate 2. Mediation problems 5
  • 19. markov decision process Markov Decision Process (MDP)1 ⟨S, A, T, R⟩ such that: 1 Martin L. Puterman. Markov Decision Processes: Discrete dynamic stochastic programming. John Wiley Chichester, 1994. 7
  • 20. markov decision process Markov Decision Process (MDP)1 ⟨S, A, T, R⟩ such that: S a finite set of observable states, 1 Martin L. Puterman. Markov Decision Processes: Discrete dynamic stochastic programming. John Wiley Chichester, 1994. 7
  • 21. markov decision process Markov Decision Process (MDP)1 ⟨S, A, T, R⟩ such that: S a finite set of observable states, A a finite set of actions, 1 Martin L. Puterman. Markov Decision Processes: Discrete dynamic stochastic programming. John Wiley Chichester, 1994. 7
  • 22. markov decision process Markov Decision Process (MDP)1 ⟨S, A, T, R⟩ such that: S a finite set of observable states, A a finite set of actions, T : S × A → Pr(S) a transition function over the states, 1 Martin L. Puterman. Markov Decision Processes: Discrete dynamic stochastic programming. John Wiley Chichester, 1994. 7
  • 23. markov decision process Markov Decision Process (MDP)1 ⟨S, A, T, R⟩ such that: S a finite set of observable states, A a finite set of actions, T : S × A → Pr(S) a transition function over the states, R : S × A → R a reward function. 1 Martin L. Puterman. Markov Decision Processes: Discrete dynamic stochastic programming. John Wiley Chichester, 1994. 7
  • 24. markov decision process Markov Decision Process (MDP)1 ⟨ S, A, T, R⟩ such that: closed open (open, 0.8) (close, 1) (open, 0.2) (close, 1) (open, 1) 1 Martin L. Puterman. Markov Decision Processes: Discrete dynamic stochastic programming. John Wiley Chichester, 1994. 7
  • 25. partially observable markov decision process Partially Observable Markov Decision Process (POMDP)2 ⟨S, A, T, R, O, Q⟩ such that: 2 Martin L. Puterman. Markov Decision Processes: Discrete dynamic stochastic programming. John Wiley Chichester, 1994. 8
  • 26. partially observable markov decision process Partially Observable Markov Decision Process (POMDP)2 ⟨S, A, T, R, O, Q⟩ such that: S, A, T, R as in MDPs, with S non-observable, 2 Martin L. Puterman. Markov Decision Processes: Discrete dynamic stochastic programming. John Wiley Chichester, 1994. 8
  • 27. partially observable markov decision process Partially Observable Markov Decision Process (POMDP)2 ⟨S, A, T, R, O, Q⟩ such that: S, A, T, R as in MDPs, with S non-observable, O a finite set of observations, 2 Martin L. Puterman. Markov Decision Processes: Discrete dynamic stochastic programming. John Wiley Chichester, 1994. 8
  • 28. partially observable markov decision process Partially Observable Markov Decision Process (POMDP)2 ⟨S, A, T, R, O, Q⟩ such that: S, A, T, R as in MDPs, with S non-observable, O a finite set of observations, Q : S → Pr(O) an observation function. 2 Martin L. Puterman. Markov Decision Processes: Discrete dynamic stochastic programming. John Wiley Chichester, 1994. 8
  • 29. partially observable markov decision process Partially Observable Markov Decision Process (POMDP)2 ⟨S, A, T, R, O, Q⟩ such that: S, A, T, R as in MDPs, with S non-observable, O a finite set of observations, Q : S → Pr(O) an observation function. As the state is not observable → belief state, a distribution of probabilities on all possible current states. 2 Martin L. Puterman. Markov Decision Processes: Discrete dynamic stochastic programming. John Wiley Chichester, 1994. 8
  • 30. partially observable markov decision process Partially Observable Markov Decision Process (POMDP)2 ⟨S, A, T, R, O, Q ⟩ such that: closed open locked (cl, 1) (op, 1) (cl, 1) (open, 0.8) (close, 1) (open, 0.2) (close, 1) (open, 1) (open, 1) (unlock, 1) 2 Martin L. Puterman. Markov Decision Processes: Discrete dynamic stochastic programming. John Wiley Chichester, 1994. 8
  • 31. mixed observability markov decision process Mixed Observability Markov Decision Process (MOMDP)3 ⟨Sv, Sh, A, T, R, Ov, Oh, Q⟩ such that: 3 S.C.W. Ong et al. “Planning under uncertainty for robotic tasks with mixed observability”. In: The International Journal of Robotics Research. 2010. 9
  • 32. mixed observability markov decision process Mixed Observability Markov Decision Process (MOMDP)3 ⟨Sv, Sh, A, T, R, Ov, Oh, Q⟩ such that: Sv, Sh the visible and hidden parts of the state, 3 S.C.W. Ong et al. “Planning under uncertainty for robotic tasks with mixed observability”. In: The International Journal of Robotics Research. 2010. 9
  • 33. mixed observability markov decision process Mixed Observability Markov Decision Process (MOMDP)3 ⟨Sv, Sh, A, T, R, Ov, Oh, Q⟩ such that: Sv, Sh the visible and hidden parts of the state, Ov, Oh the observations on the visible part and the hidden part of the state, 3 S.C.W. Ong et al. “Planning under uncertainty for robotic tasks with mixed observability”. In: The International Journal of Robotics Research. 2010. 9
  • 34. mixed observability markov decision process Mixed Observability Markov Decision Process (MOMDP)3 ⟨Sv, Sh, A, T, R, Ov, Oh, Q⟩ such that: Sv, Sh the visible and hidden parts of the state, Ov, Oh the observations on the visible part and the hidden part of the state, A, T, R, Q as before. 3 S.C.W. Ong et al. “Planning under uncertainty for robotic tasks with mixed observability”. In: The International Journal of Robotics Research. 2010. 9
  • 35. mixed observability markov decision process Mixed Observability Markov Decision Process (MOMDP)3 ⟨Sv, Sh, A, T, R, Ov, Oh, Q⟩ such that: Sv, Sh the visible and hidden parts of the state, Ov, Oh the observations on the visible part and the hidden part of the state, A, T, R, Q as before. Note that ⟨Sv × Sh = S, A, T, R, Ov × Oh = O, Q⟩ is a POMDP. 3 S.C.W. Ong et al. “Planning under uncertainty for robotic tasks with mixed observability”. In: The International Journal of Robotics Research. 2010. 9
  • 36. mixed observability markov decision process Let us consider, building on the previous example, that a key may or may not be present in the door lock: 10
  • 37. mixed observability markov decision process Let us consider, building on the previous example, that a key may or may not be present in the door lock: Sv {key, no key}, Sh {open, closed, locked}, Ov {k, n-k}, Oh {op, cl}. 10

  • 38. markov decision models All those models have a common limitation: mandatory stationarity. 11
  • 39. markov decision models All those models have a common limitation: mandatory stationarity. It is a limitation in many cases but we cannot take into account all types of non-stationarity. 11
  • 40. markov decision models All those models have a common limitation: mandatory stationarity. It is a limitation in many cases but we cannot take into account all types of non-stationarity. One assumption The non-stationarity is limited to a set of stationary modes, or contexts. 11
  • 42. hidden-mode markov decision process To address this subclass of problems, we can use Hidden-Mode Markov Decision Processes (HM-MDPs)4. 4 S.P.-M. Choi, N.L. Zhang, and D.-Y. Yeung. “Solving Hidden-Mode Markov Decision Problems”. In: Proceedings of the 8th International Workshop on Artificial Intelligence and Statistics (AISTATS). 2001, pp. 19–26. 13
  • 43. hidden-mode markov decision process To address this subclass of problems, we can use Hidden-Mode Markov Decision Processes (HM-MDPs)4. ⟨M, C⟩ such that: 4 S.P.-M. Choi, N.L. Zhang, and D.-Y. Yeung. “Solving Hidden-Mode Markov Decision Problems”. In: Proceedings of the 8th International Workshop on Artificial Intelligence and Statistics (AISTATS). 2001, pp. 19–26. 13
  • 44. hidden-mode markov decision process To address this subclass of problems, we can use Hidden-Mode Markov Decision Processes (HM-MDPs)4. ⟨M, C⟩ such that: M a set of modes → mi = ⟨S, A, Ti, Ri⟩, ∀mi ∈ M, 4 S.P.-M. Choi, N.L. Zhang, and D.-Y. Yeung. “Solving Hidden-Mode Markov Decision Problems”. In: Proceedings of the 8th International Workshop on Artificial Intelligence and Statistics (AISTATS). 2001, pp. 19–26. 13
  • 45. hidden-mode markov decision process To address this subclass of problems, we can use Hidden-Mode Markov Decision Processes (HM-MDPs)4. ⟨M, C⟩ such that: M a set of modes → mi = ⟨S, A, Ti, Ri⟩, ∀mi ∈ M, C : M → Pr(M) a transition function over modes. 4 S.P.-M. Choi, N.L. Zhang, and D.-Y. Yeung. “Solving Hidden-Mode Markov Decision Problems”. In: Proceedings of the 8th International Workshop on Artificial Intelligence and Statistics (AISTATS). 2001, pp. 19–26. 13
  • 46. hidden-mode markov decision process To address this subclass of problems, we can use Hidden-Mode Markov Decision Processes (HM-MDPs)4. ⟨M, C⟩ such that: M a set of modes → mi = ⟨S, A, Ti, Ri⟩, ∀mi ∈ M, C : M → Pr(M) a transition function over modes. S and A are common to all modes mi ∈ M. 4 S.P.-M. Choi, N.L. Zhang, and D.-Y. Yeung. “Solving Hidden-Mode Markov Decision Problems”. In: Proceedings of the 8th International Workshop on Artificial Intelligence and Statistics (AISTATS). 2001, pp. 19–26. 13
  • 47. hidden-mode markov decision process To address this subclass of problems, we can use Hidden-Mode Markov Decision Processes (HM-MDPs)4. ⟨M, C⟩ such that: M a set of modes → mi = ⟨S, A, Ti, Ri⟩, ∀mi ∈ M, C : M → Pr(M) a transition function over modes. S and A are common to all modes mi ∈ M. S is observable, M is not. 4 S.P.-M. Choi, N.L. Zhang, and D.-Y. Yeung. “Solving Hidden-Mode Markov Decision Problems”. In: Proceedings of the 8th International Workshop on Artificial Intelligence and Statistics (AISTATS). 2001, pp. 19–26. 13
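A minimal sketch of one HM-MDP decision step under this definition: the reward and next state come from the current mode's ⟨Ti, Ri⟩, and the hidden mode may switch through C at every step. The dictionary encoding and the two-mode toy instance are assumptions for illustration.

```python
import random

def sample(dist):
    """Draw a key from a {value: probability} dictionary."""
    values, probs = zip(*dist.items())
    return random.choices(values, probs)[0]

def hmmdp_step(state, action, mode, modes, C):
    """One HM-MDP step: reward and next state from the current mode's
    dynamics, then a possible (hidden) mode switch through C."""
    T, R = modes[mode]["T"], modes[mode]["R"]
    reward = R[state][action]
    next_state = sample(T[state][action])
    next_mode = sample(C[mode])          # the environment may switch modes
    return next_state, reward, next_mode

# Tiny illustrative instance: two modes that differ only in their dynamics.
modes = {
    "m1": {"T": {"s": {"a": {"s": 0.9, "t": 0.1}}, "t": {"a": {"t": 1.0}}},
           "R": {"s": {"a": 0.0}, "t": {"a": 1.0}}},
    "m2": {"T": {"s": {"a": {"s": 0.1, "t": 0.9}}, "t": {"a": {"t": 1.0}}},
           "R": {"s": {"a": 0.0}, "t": {"a": 1.0}}},
}
C = {"m1": {"m1": 0.8, "m2": 0.2}, "m2": {"m2": 0.8, "m1": 0.2}}
print(hmmdp_step("s", "a", "m1", modes, C))
```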
  • 48. an example as an hm-mdp 2 modes 8 states 2 actions Figure 1: Traffic light problem (drawing by T. Huraux) 14
  • 49. an example as an hm-mdp (diagram: state transitions Tm1(s, a, s′) within mode m1) S = {light side} × {car left?} × {car right?}, A = {left light, right light}, T: car arrivals and departures depending on the light, R: cost if cars are waiting on any side 15
  • 50. an example as an hm-mdp (diagram: transitions Tm1 and Tm2 within modes m1 and m2, and mode switches C(m1, m2), C(m2, m1), C(m1, m1), C(m2, m2)) S = {light side} × {car left?} × {car right?}, A = {left light, right light}, T: car arrivals and departures depending on the light, R: cost if cars are waiting on any side, M: majority flow of cars on the left or the right, C: a transition function over modes 15
  • 51. another limitation Each time a decision is made, the environment may switch modes. 16
  • 52. another limitation Each time a decision is made, the environment may switch modes. In the previous example: each time the system chooses which light to turn on, the busy side may change. 16
  • 54. hidden-semi-markov mode markov decision process To remove this limitation, a Hidden-Semi-Markov-Mode Markov Decision Process (HS3MDP)5 is characterized by a triplet ⟨M, C, H⟩ such that: 5 E. Hadoux, A. Beynier, and P. Weng. “Solving Hidden-Semi-Markov-Mode Markov Decision Problems”. In: Scalable Uncertainty Management. Springer, 2014, pp. 176–189. 18
  • 55. hidden-semi-markov mode markov decision process A Hidden-Semi-Markov-Mode Markov Decision Process (HS3MDP)5 is characterized by a triplet ⟨M, C, H⟩ such that: M, C as in HM-MDPs, 5 E. Hadoux, A. Beynier, and P. Weng. “Solving Hidden-Semi-Markov-Mode Markov Decision Problems”. In: Scalable Uncertainty Management. Springer, 2014, pp. 176–189. 18
  • 56. hidden-semi-markov mode markov decision process A Hidden-Semi-Markov-Mode Markov Decision Process (HS3MDP)5 is characterized by a triplet ⟨M, C, H⟩ such that: M, C as in HM-MDPs, H : M × M → Pr(N) a duration function. 5 E. Hadoux, A. Beynier, and P. Weng. “Solving Hidden-Semi-Markov-Mode Markov Decision Problems”. In: Scalable Uncertainty Management. Springer, 2014, pp. 176–189. 18
  • 58. hidden-semi-markov mode markov decision process A Hidden-Semi-Markov-Mode Markov Decision Process (HS3MDP)5 is characterized by a triplet ⟨M, C, H⟩ such that: M, C as in HM-MDPs, H : M × M → Pr(N) a duration function. New duration h after a decision step in mode m: if h > 0, then m′ = m and h′ = h − 1; if h = 0, then m′ ∼ C(m, ·) and h′ = k − 1 where k ∼ H(m, m′, ·). 5 E. Hadoux, A. Beynier, and P. Weng. “Solving Hidden-Semi-Markov-Mode Markov Decision Problems”. In: Scalable Uncertainty Management. Springer, 2014, pp. 176–189. 18
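A minimal sketch of the mode/duration dynamics just stated: the mode persists while h > 0, and only when h reaches 0 are a new mode drawn from C and a new duration from H. The toy C and H below are illustrative.

```python
import random

def sample(dist):
    """Draw a key from a {value: probability} dictionary."""
    values, probs = zip(*dist.items())
    return random.choices(values, probs)[0]

def advance_mode(mode, h, C, H):
    """HS3MDP mode/duration dynamics after one decision step:
    the mode persists while h > 0; when h = 0, draw m' ~ C(m, .)
    and a fresh duration k ~ H(m, m', .), then set h' = k - 1."""
    if h > 0:
        return mode, h - 1
    new_mode = sample(C[mode])
    k = sample(H[mode][new_mode])
    return new_mode, k - 1

# Illustrative dynamics: a mode lasts between 2 and 4 decision steps.
C = {"m1": {"m2": 1.0}, "m2": {"m1": 1.0}}
H = {"m1": {"m2": {2: 0.5, 4: 0.5}}, "m2": {"m1": {3: 1.0}}}

mode, h = "m1", 0
for _ in range(6):
    mode, h = advance_mode(mode, h, C, H)
    print(mode, h)
```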
  • 59. some precisions on hs3mdp Equivalence An HS3MDP is equivalent to a (potentially infinite) HM-MDP. 19
  • 60. some precisions on hs3mdp Equivalence An HS3MDP is equivalent to a (potentially infinite) HM-MDP. Conversion • An HS3MDP is a subclass of MOMDP (S → Sv, M, H → Sh), • An HS3MDP can be rewritten as a POMDP (as MOMDP is a subclass of POMDP). 19
  • 61. some precisions on hs3mdp Equivalence An HS3MDP is equivalent to a (potentially infinite) HM-MDP. Conversion • An HS3MDP is a subclass of MOMDP (S → Sv, M, H → Sh), • An HS3MDP can be rewritten as a POMDP (as MOMDP is a subclass of POMDP). Solving Therefore, MO/POMDP algorithms can be used with HS3MDPs. But finding an optimal policy is PSPACE-complete → scalability problem ⇒ approximate solution 19
  • 62. partially observable monte-carlo planning Partially Observable Monte-Carlo Planning algorithm (POMCP)6: one of the most efficient algorithms for large-sized POMDPs. POMCP characteristics 6 David Silver and Joel Veness. “Monte-Carlo planning in large POMDPs”. In: Proceedings of the 24th Conference on Neural Information Processing Systems (NIPS). 2010, pp. 2164–2172. 20
  • 63. partially observable monte-carlo planning Partially Observable Monte-Carlo Planning algorithm (POMCP)6: one of the most efficient algorithms for large-sized POMDPs. POMCP characteristics • Uses a set of particles to approximate the belief state, 6 David Silver and Joel Veness. “Monte-Carlo planning in large POMDPs”. In: Proceedings of the 24th Conference on Neural Information Processing Systems (NIPS). 2010, pp. 2164–2172. 20
  • 64. partially observable monte-carlo planning Partially Observable Monte-Carlo Planning algorithm (POMCP)6: one of the most efficient algorithms for large-sized POMDPs. POMCP characteristics • Uses a set of particles to approximate the belief state, • One particle = one state → ∞ particles = belief state, 6 David Silver and Joel Veness. “Monte-Carlo planning in large POMDPs”. In: Proceedings of the 24th Conference on Neural Information Processing Systems (NIPS). 2010, pp. 2164–2172. 20
  • 65. partially observable monte-carlo planning Partially Observable Monte-Carlo Planning algorithm (POMCP)6: one of the most efficient algorithms for large-sized POMDPs. POMCP characteristics • Uses a set of particles to approximate the belief state, • One particle = one state → ∞ particles = belief state, • Requires a simulator of the problem → relaxation of the known-model constraint. 6 David Silver and Joel Veness. “Monte-Carlo planning in large POMDPs”. In: Proceedings of the 24th Conference on Neural Information Processing Systems (NIPS). 2010, pp. 2164–2172. 20
  • 66.–73. one step of pomcp Simulation phase: 1. Start with a root history τ; 2. Build action-nodes; 3. Select the next action to simulate; 4. Build the observation-node; 5. Go back to 2. until reaching the end; 6. Backtrack the result; 7. Go back to 3. at the root until no more simulations. (Search-tree diagram: each node stores ⟨N, V, B⟩ — visit count, value estimate and belief particles — with action edges a1 … a|A| and observation edges oi.) 21
  • 74.–79. one step of pomcp Exploitation phase: 1. Start with the root τ; 2. Perform the action given by UCT; 3. Go to the matching observation node; 4. Set the new root τ′ and prune the rest of the tree; 5. Go back to the simulation phase. (Search-tree diagram: the subtree under the reached observation node becomes the new root τ′.) 21
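POMCP selects actions with UCT (UCB1 applied to the search tree), both inside simulations and at the root during exploitation. A minimal sketch of that selection rule, assuming each action-node stores a visit count and a value estimate; the exploration constant c and the node statistics are illustrative.

```python
import math

def uct_action(children, total_visits, c=1.0):
    """UCB1 action selection over the action-nodes of the current node.
    children: {action: (N_a, V_a)} with visit counts and value estimates."""
    def score(item):
        action, (n, v) = item
        if n == 0:
            return float("inf")          # always try unvisited actions first
        return v + c * math.sqrt(math.log(total_visits) / n)
    return max(children.items(), key=score)[0]

# Illustrative node statistics: a3 is unvisited, so it is picked first.
children = {"a1": (10, 2.1), "a2": (3, 2.4), "a3": (0, 0.0)}
print(uct_action(children, total_visits=13))
```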
  • 80. more limitations When the size of the model is too large → particle deprivation We can add more particles → requires more computing time 22
  • 81. using the structure of hs3mdp Fortunately, HS3MDPs are structured POMDPs. We defined two adaptations of POMCP for HS3MDPs: 23
  • 82. using the structure of hs3mdp Fortunately, HS3MDPs are structured POMDPs. We defined two adaptations of POMCP for HS3MDPs: 1. Adaptation to the structure 23
  • 83. using the structure of hs3mdp Fortunately, HS3MDPs are structured POMDPs. We defined two adaptations of POMCP for HS3MDPs: 1. Adaptation to the structure 2. Exact representation of the belief state 23
  • 84. using the structure of hs3mdp Fortunately, HS3MDPs are structured POMDPs. We defined two adaptations of POMCP for HS3MDPs: 1. Adaptation to the structure 2. Exact representation of the belief state Adaptation to the structure (SA) In HS3MDP, a state = a visible part and a hidden part. The former can be removed from the particle representation as it is directly observed. → a particle = a possible hidden part 23
  • 85. using the structure of hs3mdp Fortunately, HS3MDPs are structured POMDPs. We defined two adaptations of POMCP for HS3MDPs: 1. Adaptation to the structure 2. Exact representation of the belief state Exact representation of the belief state (SAER) Replace the sets of particles by the exact distribution µ: µ′(m′, h′) = (1/K) [ Tm′(s, a, s′) · µ(m′, h′ + 1) + Σm∈M C(m, m′) · Tm(s, a, s′) · µ(m, 0) · H(m, m′, h′ + 1) ], where K is a normalization constant. Complexity: O(|M| × hmax) vs. O(N), with N the number of simulations in the original POMCP. 23
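A minimal sketch of the SAER update above, assuming the belief µ is stored as a dictionary over (mode, remaining-duration) pairs and K is the normalization constant; the data layout is an assumption for illustration, not the thesis implementation.

```python
def saer_update(mu, s, a, s_next, modes, C, H, h_max):
    """Exact belief update over (mode, duration) pairs for an HS3MDP (SAER).
    mu: {(mode, h): probability}; modes[m]["T"][s][a][s'] = T_m(s, a, s')."""
    new_mu = {}
    for m_next in C:
        T_next = modes[m_next]["T"][s][a].get(s_next, 0.0)
        for h_next in range(h_max):
            # Case 1: the mode persisted, its remaining duration decreased by one.
            stay = T_next * mu.get((m_next, h_next + 1), 0.0)
            # Case 2: the duration hit 0, a new mode and duration were drawn.
            switch = sum(
                C.get(m, {}).get(m_next, 0.0)
                * modes[m]["T"][s][a].get(s_next, 0.0)
                * mu.get((m, 0), 0.0)
                * H.get(m, {}).get(m_next, {}).get(h_next + 1, 0.0)
                for m in C
            )
            new_mu[(m_next, h_next)] = stay + switch
    K = sum(new_mu.values())  # normalization constant
    return {k: v / K for k, v in new_mu.items()} if K > 0 else new_mu
```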
  • 86. experiments We tested our method on 4 problems, 3 of which are taken from the literature. We compared the performance of: • the original POMCP algorithm, • our adaptations SA and SAER, • the optimal policy, when it can be computed. 24
  • 87. results for the traffic light problem
  Simulations | Original | SA   | SAER | Optimal
  1           | -3.42    | 0.0% | 0.0% | 38.5%
  2           | -2.86    | 3.0% | 4.0% | 26.5%
  4           | -2.80    | 8.1% | 8.8% | 25.0%
  8           | -2.68    | 6.0% | 9.4% | 21.7%
  16          | -2.60    | 8.0% | 8.0% | 19.2%
  32          | -2.45    | 5.3% | 6.9% | 14.3%
  …           |          |      |      |
  1024        | -2.31    | 5.1% | 7.0% | 9.3%
  25
  • 88. randomly generated environments We can control the number of states, actions and modes, as well as the transition functions. These instances are too big to be solved optimally. 26
  • 89. results for the randomly generated environments Figure: rewards (means over 100 instances) as a function of log2 #simulations, for Original, SA and SAER. 27
  • 90. conclusion on hs3mdps We proposed in this work: 28
  • 91. conclusion on hs3mdps We proposed in this work: • A new model (HS3MDP) able to represent non-stationary decision-making problems in a more realistic way, 28
  • 92. conclusion on hs3mdps We proposed in this work: • A new model (HS3MDP) able to represent non-stationary decision-making problems in a more realistic way, • Adaptations of POMCP that tackle large-size problems and outperform the original algorithm. 28
  • 93. learning the model We also proposed RLCD with SCD7, a method to learn the model for a subclass of those problems. 7 Emmanuel Hadoux, Aurélie Beynier, and Paul Weng. “Sequential Decision-Making under Non-stationary Environments via Sequential Change-point Detection”. In: First International Workshop on Learning over Multiple Contexts (LMCE) @ ECML. 2014. 29
  • 94. learning the model We also proposed RLCD with SCD7, a method to learn the model for a subclass of those problems. This method is able to learn part of the dynamics without requiring the number of modes to be known a priori. 7 Emmanuel Hadoux, Aurélie Beynier, and Paul Weng. “Sequential Decision-Making under Non-stationary Environments via Sequential Change-point Detection”. In: First International Workshop on Learning over Multiple Contexts (LMCE) @ ECML. 2014. 29
  • 96. strategic argumentation problems Few works address the problem of decision-making in argumentation. 8 Phan Minh Dung. “On the Acceptability of Arguments and its Fundamental Role in Nonmonotonic Reasoning, Logic Programming and n-Person Games”. In: Artificial Intelligence 77.2 (1995), pp. 321–358. 31
  • 97. strategic argumentation problems Few works address the problem of decision-making in argumentation. In abstract argumentation, agents exchange arguments and use attacks as relations between the arguments. 8 Phan Minh Dung. “On the Acceptability of Arguments and its Fundamental Role in Nonmonotonic Reasoning, Logic Programming and n-Person Games”. In: Artificial Intelligence 77.2 (1995), pp. 321–358. 31
  • 98. strategic argumentation problems Few works address the problem of decision-making in argumentation. In abstract argumentation, agents exchange arguments and use attacks as relations between the arguments. Formal abstract argumentation framework8 ⟨A, E⟩ such that: 8 Phan Minh Dung. “On the Acceptability of Arguments and its Fundamental Role in Nonmonotonic Reasoning, Logic Programming and n-Person Games”. In: Artificial Intelligence 77.2 (1995), pp. 321–358. 31
  • 99. strategic argumentation problems Few works address the problem of decision-making in argumentation. In abstract argumentation, agents exchange arguments and use attacks as relations between the arguments. Formal abstract argumentation framework8 ⟨A, E⟩ such that: A a set of arguments, 8 Phan Minh Dung. “On the Acceptability of Arguments and its Fundamental Role in Nonmonotonic Reasoning, Logic Programming and n-Person Games”. In: Artificial Intelligence 77.2 (1995), pp. 321–358. 31
  • 100. strategic argumentation problems Few works address the problem of decision-making in argumentation. In abstract argumentation, agents exchange arguments and use attacks as relations between the arguments. Formal abstract argumentation framework8 ⟨A, E⟩ such that: A a set of arguments, E a set of relations such that (a, b) ∈ E if a ∈ A and b ∈ A and a attacks b. 8 Phan Minh Dung. “On the Acceptability of Arguments and its Fundamental Role in Nonmonotonic Reasoning, Logic Programming and n-Person Games”. In: Artificial Intelligence 77.2 (1995), pp. 321–358. 31
  • 101. example of abstract framework a b c d e Figure 2: Example of abstract argumentation framework with 5 arguments and 5 attacks 32
  • 102.–106. example of abstract framework Acceptability labelling of the framework of Figure 2, built step by step: a in, b out, c in, e out, d in. Figure 2: Example of abstract argumentation framework with 5 arguments and 5 attacks 32
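A minimal sketch of computing such an in/out labelling (here the grounded labelling: unattacked arguments are in, arguments attacked by an in argument are out, and so on). The attack relation below is an assumption chosen to reproduce the labels of Figure 2, since the exact edges are not recoverable from the slides.

```python
def grounded_labelling(arguments, attacks):
    """Grounded labelling: iteratively label 'in' every argument whose
    attackers are all 'out', and 'out' every argument with an 'in' attacker."""
    attackers = {a: {x for (x, y) in attacks if y == a} for a in arguments}
    label = {}
    changed = True
    while changed:
        changed = False
        for a in arguments:
            if a in label:
                continue
            if all(label.get(b) == "out" for b in attackers[a]):
                label[a] = "in"
                changed = True
            elif any(label.get(b) == "in" for b in attackers[a]):
                label[a] = "out"
                changed = True
    # Arguments never decided stay undecided in the grounded labelling.
    return {a: label.get(a, "undec") for a in arguments}

# Illustrative attack relation with 5 arguments and 5 attacks (assumed edges).
arguments = {"a", "b", "c", "d", "e"}
attacks = {("a", "b"), ("b", "c"), ("c", "e"), ("e", "d"), ("b", "e")}
print(grounded_labelling(arguments, attacks))
# -> a, c, d are in; b, e are out
```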
  • 107. decision-making in argumentation More recently: argumentation framework with probabilistic strategies9 against stochastic opponents. 9 Anthony Hunter. “Probabilistic Strategies in Dialogical Argumentation”. In: International Conference on Scalable Uncertainty Management (SUM) LNCS volume 8720. 2014. 33
  • 108. decision-making in argumentation More recently: argumentation framework with probabilistic strategies9 against stochastic opponents. Agents play a turn-based game → argumentative dialogue 9 Anthony Hunter. “Probabilistic Strategies in Dialogical Argumentation”. In: International Conference on Scalable Uncertainty Management (SUM) LNCS volume 8720. 2014. 33
  • 109. decision-making in argumentation More recently: argumentation framework with probabilistic strategies9 against stochastic opponents. Agents play a turn-based game → argumentative dialogue Uses executable logic to represent the actions of an agent in the debate. 9 Anthony Hunter. “Probabilistic Strategies in Dialogical Argumentation”. In: International Conference on Scalable Uncertainty Management (SUM) LNCS volume 8720. 2014. 33
  • 110. argumentation framework with probabilistic strategies Each agent has a private state. 34
  • 111. argumentation framework with probabilistic strategies Each agent has a private state. The problem has a public space. 34
  • 112. argumentation framework with probabilistic strategies Each agent has a private state. The problem has a public space. A rule for an agent is defined as Premises ⇒ Pr(Acts) such that: • Premises: a conjunction of a(x), hi(x), e(x, y), 34
  • 113. argumentation framework with probabilistic strategies Each agent has a private state. The problem has a public space. A rule for an agent is defined as Premises ⇒ Pr(Acts) such that: • Premises: a conjunction of a(x), hi(x), e(x, y), • Acts: conjunction of ⊞, ⊟ on a(), e() and ⊕, ⊖ on hi(), 34
  • 114. argumentation framework with probabilistic strategies Each agent has a private state. The problem has a public space. A rule for an agent is defined as Premises ⇒ Pr(Acts) such that: • Premises: a conjunction of a(x), hi(x), e(x, y), • Acts: conjunction of ⊞, ⊟ on a(), e() and ⊕, ⊖ on hi(), Example h1(b) ∧ a(f) ∧ e(b, f) ⇒ 0.5 : ⊞a(b) ∨ 0.5 : ⊞a(c) 34
  • 115. argumentation framework with probabilistic strategies Each agent has a private state. The problem has a public space. A rule for an agent is defined as Premises ⇒ Pr(Acts) such that: • Premises: a conjunction of a(x), hi(x), e(x, y), • Acts: conjunction of ⊞, ⊟ on a(), e() and ⊕, ⊖ on hi(), Example h1(b) ∧ a(f) ∧ e(b, f) ⇒ 0.5 : ⊞a(b) ∨ 0.5 : ⊞a(c) Purpose Optimize the sequence of arguments of one agent. 34
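A minimal sketch of how such a rule could be represented and fired: if all premises hold in the current (private plus public) state, one act is drawn according to its probability. The Python encoding (sets of atoms, ⊞ written as "+") is an illustration, not the executable logic of the cited framework.

```python
import random

# Rule "Premises => Pr(Acts)" as illustrative data: premises are atoms that
# must all hold, acts are (probability, set of updates) pairs.
rule = {
    "premises": {"h1(b)", "a(f)", "e(b,f)"},
    "acts": [(0.5, {"+a(b)"}), (0.5, {"+a(c)"})],   # ⊞ written as "+"
}

def fire(rule, state):
    """If all premises hold in `state`, draw one act according to Pr(Acts)."""
    if not rule["premises"] <= state:
        return None
    acts, weights = zip(*((a, p) for p, a in rule["acts"]))
    return random.choices(acts, weights)[0]

state = {"h1(b)", "a(f)", "e(b,f)"}
print(fire(rule, state))   # one of {"+a(b)"} or {"+a(c)"}
```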
  • 116. argumentation problem with probabilistic strategies Argumentation Problems with Probabilistic Strategies (APS)10 ⟨A, E, Si, P, G, gi, Ri⟩ such that: A, E a set of arguments and attacks, 10 Emmanuel Hadoux et al. “Optimization of Probabilistic Argumentation with Markov Decision Models”. In: IJCAI, Buenos Aires, Argentina. 2015. 35
  • 117. argumentation problem with probabilistic strategies Argumentation Problems with Probabilistic Strategies (APS)10 ⟨A, E, Si, P, G, gi, Ri⟩ such that: A, E a set of arguments and attacks, Si the set of private states of agent i, 10 Emmanuel Hadoux et al. “Optimization of Probabilistic Argumentation with Markov Decision Models”. In: IJCAI, Buenos Aires, Argentina. 2015. 35
  • 118. argumentation problem with probabilistic strategies Argumentation Problems with Probabilistic Strategies (APS)10 ⟨A, E, Si, P, G, gi, Ri⟩ such that: A, E a set of arguments and attacks, Si the set of private states of agent i, P = 2A × 2E the public space, 10 Emmanuel Hadoux et al. “Optimization of Probabilistic Argumentation with Markov Decision Models”. In: IJCAI, Buenos Aires, Argentina. 2015. 35
  • 119. argumentation problem with probabilistic strategies Argumentation Problems with Probabilistic Strategies (APS)10 ⟨A, E, Si, P, G, gi, Ri⟩ such that: A, E a set of arguments and attacks, Si the set of private states of agent i, P = 2A × 2E the public space, G the set of all possible goals, 10 Emmanuel Hadoux et al. “Optimization of Probabilistic Argumentation with Markov Decision Models”. In: IJCAI, Buenos Aires, Argentina. 2015. 35
  • 120. argumentation problem with probabilistic strategies Argumentation Problems with Probabilistic Strategies (APS)10 ⟨A, E, Si, P, G, gi, Ri⟩ such that: A, E a set of arguments and attacks, Si the set of private states of agent i, P = 2A × 2E the public space, G the set of all possible goals, gi the goal of agent i → Dung, 10 Emmanuel Hadoux et al. “Optimization of Probabilistic Argumentation with Markov Decision Models”. In: IJCAI, Buenos Aires, Argentina. 2015. 35
  • 121. argumentation problem with probabilistic strategies Argumentation Problems with Probabilistic Strategies (APS)10 ⟨A, E, Si, P, G, gi, Ri⟩ such that: A, E a set of arguments and attacks, Si the set of private states of agent i, P = 2A × 2E the public space, G the set of all possible goals, gi the goal of agent i → Dung, Ri a set of rules for agent i 10 Emmanuel Hadoux et al. “Optimization of Probabilistic Argumentation with Markov Decision Models”. In: IJCAI, Buenos Aires, Argentina. 2015. 35
  • 122. example: arguments Debate between two agents: Is e-sport a sport? 36
  • 123. example: arguments Debate between two agents: Is e-sport a sport? a: E-sport is a sport; f: E-sport is not a physical activity; g: E-sport is not referenced by the IOC. Figure 3: Attacks graph (arguments a–h) 36
  • 124. probabilistic finite state machine: graph APS → Probabilistic Finite State Machine from an initial state (e.g., {h1(a), h1(b)}, {}, {h2(c), h2(d)}). Figure 4: PFSM of the e-sport example (12 states σ1–σ12 with probabilistic transitions) 37
  • 125. probabilistic finite state machine To optimize the sequence of arguments for agent 1, we could optimize the PFSM but: 38
  • 126. probabilistic finite state machine To optimize the sequence of arguments for agent 1, we could optimize the PFSM but: 1. it depends on the initial state 38
  • 127. probabilistic finite state machine To optimize the sequence of arguments for agent 1, we could optimize the PFSM but: 1. it depends on the initial state 2. it requires knowledge of the private state of the opponent 38
  • 128. probabilistic finite state machine To optimize the sequence of arguments for agent 1, we could optimize the PFSM but: 1. it depends on the initial state 2. it requires knowledge of the private state of the opponent Using MOMDPs, we can relax assumptions 1 and 2. 38
  • 129. transformation to a momdp An APS with two agents, from the point of view of agent 1, can be transformed to a MOMDP: • Sv = S1 × P, Sh = S2, 39
  • 130. transformation to a momdp An APS with two agents, from the point of view of agent 1, can be transformed to a MOMDP: • Sv = S1 × P, Sh = S2, • Ov = Sv and Oh = ∅, 39
  • 131. transformation to a momdp An APS with two agents, from the point of view of agent 1, can be transformed to a MOMDP: • Sv = S1 × P, Sh = S2, • Ov = Sv and Oh = ∅, • A = {prem(r) ⇒ m|r ∈ R1 and m ∈ acts(r)} 39
  • 132. transformation to a momdp An APS with two agents, from the point of view of agent 1, can be transformed to a MOMDP: • Sv = S1 × P, Sh = S2, • Ov = Sv and Oh = ∅, • A = {prem(r) ⇒ m | r ∈ R1 and m ∈ acts(r)} Example h1(b) ∧ a(f) ∧ h1(c) ∧ e(b, f) ∧ e(c, f) ⇒ 0.5 : ⊞a(b) ∧ ⊞e(b, f) ∨ 0.5 : ⊞a(c) ∧ ⊞e(c, f) 39
  • 133. transformation to a momdp Model sizes: APS : 8 arguments, 8 attacks, 6 rules POMDP : 4 294 967 296 states MOMDP : 16 777 216 states 40
  • 134. transformation to a momdp Model sizes: APS : 8 arguments, 8 attacks, 6 rules POMDP : 4 294 967 296 states MOMDP : 16 777 216 states We want the explicit policy → cannot use POMCP (which plans online). We need to reduce the size of the instances to use traditional methods. 40
  • 135. transformation to a momdp Model sizes: APS : 8 arguments, 8 attacks, 6 rules POMDP : 4 294 967 296 states MOMDP : 16 777 216 states We want the explicit policy → cannot use POMCP (which plans online). We need to reduce the size of the instances to use traditional methods. Two kinds of size-reducing procedures: with or without dependence on the initial state. 40
  • 136. size-reducing procedures Dom. Removes dominated arguments. Argument dominance: if an argument is attacked by at least one unattacked argument, it is dominated. Figure 5: Attacks graph (arguments a–h) 41
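A minimal sketch of the Dom. procedure under this definition: arguments attacked by an unattacked argument are removed, together with their attacks, until no dominated argument remains. The three-argument graph in the usage line is illustrative.

```python
def remove_dominated(arguments, attacks):
    """Dom.: repeatedly drop every argument attacked by an unattacked argument."""
    arguments, attacks = set(arguments), set(attacks)
    while True:
        attacked = {y for (_, y) in attacks}
        unattacked = arguments - attacked
        dominated = {y for (x, y) in attacks if x in unattacked}
        if not dominated:
            return arguments, attacks
        arguments -= dominated
        attacks = {(x, y) for (x, y) in attacks
                   if x not in dominated and y not in dominated}

# Illustrative graph: g is unattacked and attacks f, so f is dominated.
print(remove_dominated({"a", "f", "g"}, {("f", "a"), ("g", "f")}))
```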
  • 137. size-reducing procedures Irr. Prunes irrelevant arguments 42
  • 138. size-reducing procedures Irr. Prunes irrelevant arguments Irr(s0) Removes rules incompatible with initial state. 42
  • 139. size-reducing procedures Irr. Prunes irrelevant arguments Irr(s0) Removes rules incompatible with initial state. Enth. Infers attacks 42
  • 140. size-reducing procedures Irr. Prunes irrelevant arguments Irr(s0) Removes rules incompatible with initial state. Enth. Infers attacks Optimal sequence of procedures 1. Irr(s0), Irr. until stable 2. Dom., 1. until stable 3. Enth. 42
  • 141. size-reducing procedures Irr. Prunes irrelevant arguments Irr(s0) Removes rules incompatible with the initial state. Enth. Infers attacks Optimal sequence of procedures 1. Irr(s0), Irr. until stable 2. Dom., 1. until stable 3. Enth. Guarantees on the uniqueness and optimality of the solution 42
  • 142. experiments Solution for the e-sport problem computed with MO-SARSOP11.
  Problem | None | Irr. | Enth. | Dom. | Irr(s0) | All
  E-sport | —    | —    | —     | —    | —       | 0.56
  6 args  | 1313 | 22   | 43    | 7    | 2.4     | 0.9
  7 args  | —    | 180  | 392   | 16   | 20      | 6.7
  8 args  | —    | —    | —     | —    | 319     | 45
  9 args  | —    | —    | —     | —    | —       | —
  Table 1: Computation time (in seconds); — means ∞ 11 S.C.W. Ong et al. “Planning under uncertainty for robotic tasks with mixed observability”. In: The International Journal of Robotics Research. 2010. 43
  • 143. mediation problems Let us consider a debate problem with several agents split in teams. 12 Elise Bonzon and Nicolas Maudet. “On the outcomes of multiparty persuasion”. In: AAMAS. 2011, pp. 47–54. 13 Michal Chalamish and Sarit Kraus. “AutoMed: an automated mediator for multi-issue bilateral negotiations”. In: JAAMAS 24.3 (2012), pp. 536–564. 44
  • 144. mediation problems Let us consider a debate problem with several agents split into teams. We need a mediator to assign the speaking turns. 12 Elise Bonzon and Nicolas Maudet. “On the outcomes of multiparty persuasion”. In: AAMAS. 2011, pp. 47–54. 13 Michal Chalamish and Sarit Kraus. “AutoMed: an automated mediator for multi-issue bilateral negotiations”. In: JAAMAS 24.3 (2012), pp. 536–564. 44
  • 145. mediation problems Let us consider a debate problem with several agents split into teams. We need a mediator to assign the speaking turns. In most cases, the mediator is not active12 or is looking for a consensus13. 12 Elise Bonzon and Nicolas Maudet. “On the outcomes of multiparty persuasion”. In: AAMAS. 2011, pp. 47–54. 13 Michal Chalamish and Sarit Kraus. “AutoMed: an automated mediator for multi-issue bilateral negotiations”. In: JAAMAS 24.3 (2012), pp. 536–564. 44
  • 146. mediation problems Let us consider a debate problem with several agents split into teams. We need a mediator to assign the speaking turns. In most cases, the mediator is not active12 or is looking for a consensus13. We envision a more active mediator with her own agenda → generalization 12 Elise Bonzon and Nicolas Maudet. “On the outcomes of multiparty persuasion”. In: AAMAS. 2011, pp. 47–54. 13 Michal Chalamish and Sarit Kraus. “AutoMed: an automated mediator for multi-issue bilateral negotiations”. In: JAAMAS 24.3 (2012), pp. 536–564. 44
  • 147. mediation problems in non-stationary environments We also consider that each agent can be in either of the two following modes: 14 Still under review. 45
  • 148. mediation problems in non-stationary environments We also consider that each agent can be in either of the two following modes: constructive arguing towards its own goal, 14 Still under review. 45
  • 149. mediation problems in non-stationary environments We also consider that each agent can be in either of the two following modes: constructive arguing towards its own goal, destructive arguing against the opponent’s goal. 14 Still under review. 45
  • 150. mediation problems in non-stationary environments We also consider that each agent can be in either of the two following modes: constructive arguing towards its own goal, destructive arguing against the opponent’s goal. But other modes can be defined. 14 Still under review. 45
  • 151. mediation problems in non-stationary environments We also consider that each agent can be in either of the two following modes: constructive arguing towards its own goal, destructive arguing against the opponent’s goal. But other modes can be defined. We proposed Dynamic Mediation Problems (DMP)14 for those problems from the viewpoint of the mediator. 14 Still under review. 45
  • 152. conversion to a hs3mdp The argumentative modes can be converted into HS3MDP modes, allowing us to convert DMPs to HS3MDPs. 46
  • 153. conversion to a hs3mdp The argumentative modes can be converted into HS3MDP modes, allowing us to convert DMPs to HS3MDPs. We can solve the problem using our adaptations of POMCP. 46
  • 154. conversion to a hs3mdp The argumentative modes can be converted into HS3MDP modes, allowing us to convert DMPs to HS3MDPs. We can solve the problem using our adaptations of POMCP. Purpose Organize the sequence of speaking turns for the mediator. 46
  • 155. conclusion To apply decision-making to argumentation, we proposed: • A formalization of debates with probabilistic strategies (APS), 47
  • 156. conclusion To apply decision-making to argumentation, we proposed: • A formalization of debates with probabilistic strategies (APS), • How to transform APS to MOMDP and solve them, 47
  • 157. conclusion To apply decision-making to argumentation, we proposed: • A formalization of debates with probabilistic strategies (APS), • How to transform APS to MOMDP and solve them, • Size-reducing procedures, 47
  • 158. conclusion To apply decision-making to argumentation, we proposed: • A formalization of debates with probabilistic strategies (APS), • How to transform APS to MOMDP and solve them, • Size-reducing procedures, • A formalization of non-stationary mediation problems (DMP), 47
  • 159. conclusion To apply decision-making to argumentation, we proposed: • A formalization of debates with probabilistic strategies (APS), • How to transform APS to MOMDP and solve them, • Size-reducing procedures, • A formalization of non-stationary mediation problems (DMP), • How to transform DMP to HS3MDP and solve them. 47
  • 160. general conclusion Our contribution is twofold: • Improvement of existing methods and models for decision-making in non-stationary environments, • Exploration of a new domain combining it with argumentation. 15 http://arguman.org 16 https://github.com/Amande-WP5/formalarg 48
  • 161. general conclusion Our contribution is twofold: • Improvement of existing methods and models for decision-making in non-stationary environments, • Exploration of a new domain combining it with argumentation. What could be improved: • Extensive testing of the scalability, • More realistic experiments15,16, • Additional theoretical properties. 15 http://arguman.org 16 https://github.com/Amande-WP5/formalarg 48
  • 162. perspectives Some straightforward follow-ups of this work: • learn the mode transition/duration functions in HS3MDPs, • develop our adaptations of POMCP for MOMDPs, 49
  • 163. perspectives Some straightforward follow-ups of this work: • learn the mode transition/duration functions in HS3MDPs, • develop our adaptations of POMCP for MOMDPs, • learn the probabilities of the acts in APS and DMPs, • take into account the goal of the opponents in APS. 49
  • 164. perspectives Decision-making and argumentation can benefit each other at different levels. • sequence of arguments, 50
  • 165. perspectives Decision-making and argumentation can benefit each other at different levels. • sequence of arguments, • sequence of agents, 50
  • 166. perspectives Decision-making and argumentation can benefit each other at different levels. • sequence of arguments, • sequence of agents, • sequence of topics, 50
  • 167. perspectives Decision-making and argumentation can benefit each other at different levels. • sequence of arguments, • sequence of agents, • sequence of topics, • sequence of recommendations, 50
  • 168. perspectives Decision-making and argumentation can benefit each other at different levels. • sequence of arguments, • sequence of agents, • sequence of topics, • sequence of recommendations, • sequence of explanations. 50
  • 169. Thank you very much for your attention 51