Lecture 6-1. normal and extensive form of game theory

Extensive Form Games
Lecture 7

Lecture Overview
1 Perfect-Information Extensive-Form Games
2 Subgame Perfection
3 Backward Induction

Introduction
The normal-form game representation does not incorporate any notion of sequence, or time, of the players' actions.
The extensive form is an alternative representation that makes the temporal structure explicit.
Two variants:
perfect-information extensive-form games
imperfect-information extensive-form games

Definition
A (finite) perfect-information game (in extensive form) is defined by the tuple (N, A, H, Z, χ, ρ, σ, u), where:
Players: N is a set of n players
Actions: A is a (single) set of actions
Choice nodes and labels for these nodes:
Choice nodes: H is a set of non-terminal choice nodes
Action function: χ : H → 2^A assigns to each choice node a set of possible actions
Player function: ρ : H → N assigns to each non-terminal node h a player i ∈ N who chooses an action at h
Terminal nodes: Z is a set of terminal nodes, disjoint from H
Successor function: σ : H × A → H ∪ Z maps a choice node and an action to a new choice node or terminal node such that for all h1, h2 ∈ H and a1, a2 ∈ A, if σ(h1, a1) = σ(h2, a2) then h1 = h2 and a1 = a2
The choice nodes form a tree, so we can identify a node with its history.
Utility function: u = (u_1, …, u_n); u_i : Z → ℝ is a utility function for player i on the terminal nodes Z
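
The tuple in the definition can be mirrored fairly directly in code. The following is a minimal sketch, not part of the lecture: the class names `Terminal` and `Choice`, the helper names, and the nested-dictionary encoding are all assumptions made for illustration. It encodes the sharing game (splitting 2 coins) used in the next example. Because every node is stored inside exactly one parent, the injectivity condition on σ holds by construction.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass
class Terminal:
    """A terminal node z in Z, carrying the utility vector u(z)."""
    utility: Tuple[float, ...]


@dataclass
class Choice:
    """A choice node h in H: rho(h) is `player`, chi(h) is the key set of `children`."""
    player: int
    children: Dict[str, "Node"]  # sigma(h, a) for each action a in chi(h)


Node = Union[Terminal, Choice]


def chi(h: Choice) -> set:
    """Action function: the actions available at choice node h."""
    return set(h.children)


def sigma(h: Choice, a: str) -> Node:
    """Successor function: the node reached by playing a at h."""
    return h.children[a]


def respond(split: Tuple[int, int]) -> Choice:
    """Player 2's node after a proposed split: accept it, or both get nothing."""
    return Choice(player=2, children={"yes": Terminal(split), "no": Terminal((0, 0))})


# Hypothetical encoding of the sharing game with two coins.
sharing_game = Choice(player=1, children={
    "2-0": respond((2, 0)),
    "1-1": respond((1, 1)),
    "0-2": respond((0, 2)),
})


def count_nodes(node: Node) -> Tuple[int, int]:
    """Return (|H|, |Z|) for the tree rooted at `node`."""
    if isinstance(node, Terminal):
        return (0, 1)
    h, z = 1, 0
    for child in node.children.values():
        ch, cz = count_nodes(child)
        h, z = h + ch, z + cz
    return (h, z)
```

For example, `count_nodes(sharing_game)` counts four choice nodes and six terminal nodes, and `sigma(sigma(sharing_game, "1-1"), "yes").utility` is the payoff vector after an accepted even split.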

Example: the sharing game
[Figure 5.1: The Sharing game. Player 1 proposes a split of two coins: 2–0, 1–1, or 0–2. Player 2 then answers yes or no.
Payoffs (player 1, player 2):
2–0: yes (2,0), no (0,0)
1–1: yes (1,1), no (0,0)
0–2: yes (0,2), no (0,0)]

Play as a fun game, dividing 100 dollar coins. (Play each partner
only once.)

Pure Strategies
In the sharing game (splitting 2 coins) how many pure strategies does each player have?
player 1: 3; player 2: 8
(Player 1 picks one of three splits at a single node; player 2 picks yes or no at each of three nodes, giving 2 × 2 × 2 = 8.)
Overall, a pure strategy for a player in a perfect-information game is a complete specification of which deterministic action to take at every node belonging to that player.
Definition (pure strategies)
Let G = (N, A, H, Z, χ, ρ, σ, u) be a perfect-information extensive-form game. Then the pure strategies of player i consist of the cross product
$\prod_{h \in H,\ \rho(h) = i} \chi(h)$
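
The cross product in the definition can be enumerated mechanically. Below is a small sketch under assumed names: the dictionary `sharing_nodes` and the node labels are hypothetical, chosen only to encode ρ and χ for the sharing game. It builds every pure strategy of a player as one action per node owned by that player.

```python
from itertools import product

# Hypothetical encoding: node name -> (rho(h), chi(h)).
sharing_nodes = {
    "root": (1, ["2-0", "1-1", "0-2"]),
    "after 2-0": (2, ["yes", "no"]),
    "after 1-1": (2, ["yes", "no"]),
    "after 0-2": (2, ["yes", "no"]),
}


def pure_strategies(nodes, player):
    """Return all pure strategies of `player` as dicts {node: action}."""
    own = [h for h, (p, _) in nodes.items() if p == player]
    action_sets = [nodes[h][1] for h in own]
    # Cross product of chi(h) over all h with rho(h) == player.
    return [dict(zip(own, choice)) for choice in product(*action_sets)]


print(len(pure_strategies(sharing_nodes, 1)))  # number of pure strategies of player 1
print(len(pure_strategies(sharing_nodes, 2)))  # number of pure strategies of player 2
```

The counts agree with the slide: three strategies for player 1 and eight for player 2.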

Pure Strategies Example
Notice that the definition contains a subtlety. An agent's strategy requires a decision at each choice node, regardless of whether or not it is possible to reach that node given the other choice nodes. In the Sharing game above the situation is straightforward: player 1 has three pure strategies, and player 2 has eight (why?). But now consider the game shown in Figure 5.2.
[Figure 5.2: A perfect-information game in extensive form. Player 1 chooses A or B at the root. After A, player 2 chooses C, giving (3,8), or D, giving (8,3). After B, player 2 chooses E, giving (5,5), or F. After F, player 1 chooses G, giving (2,10), or H, giving (1,0).]
What are the pure strategies for player 2?
In order to define a complete strategy for this game, each of the players must choose an action at each of his two choice nodes. Thus we can enumerate the pure strategies of the players as follows.
S1 = {(A, G), (A, H), (B, G), (B, H)}
S2 = {(C, E), (C, F), (D, E), (D, F)}
It is important to note that we have to include the strategies (A, G) and (A, H), even though once A is chosen the G-versus-H choice is moot: conditional on taking A, the choice between G and H will never have to be made.

Nash Equilibria
Given our new definition of pure strategy, we are able to reuse our
old definitions of:
mixed strategies
best response
Nash equilibrium
Theorem
Every (finite) perfect-information game in extensive form has a pure-strategy Nash equilibrium (PSNE).
This is easy to see, since the players move sequentially.

Induced Normal Form
In fact, the connection to the normal form is even tighter: we can "convert" an extensive-form game into normal form.

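Induced normal form of the game in Figure 5.2 (rows: player 1's pure strategies; columns: player 2's pure strategies; entries: payoff to player 1, payoff to player 2):
        CE       CF       DE       DF
AG     3, 8     3, 8     8, 3     8, 3
AH     3, 8     3, 8     8, 3     8, 3
BG     5, 5     2, 10    5, 5     2, 10
BH     5, 5     1, 0     5, 5     1, 0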

This illustrates the lack of compactness of the normal form: games aren't always this small, and even here we write down 16 payoff pairs instead of 5.

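While we can write any perfect-information extensive-form game as a normal-form game, we can't do the reverse.
E.g., matching pennies cannot be written as a perfect-information extensive-form game (its players move simultaneously, so neither observes the other's choice).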

What are the (three) pure-strategy equilibria?

(A, G), (C, F)
(A, H), (C, F)
(B, H), (C, E)
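
The conversion to normal form and the search for pure-strategy equilibria can be checked by brute force. The sketch below is an illustration only: the nested-tuple encoding `TREE`, the two-letter strategy names, and the function names are assumptions, not notation from the lecture. It relies on the fact that action labels in Figure 5.2 are unique, so a strategy can be written as the string of actions it selects.

```python
from itertools import product

# Figure 5.2: an internal node is (player, {action: subtree}); a leaf is a payoff pair.
TREE = (1, {
    "A": (2, {"C": (3, 8), "D": (8, 3)}),
    "B": (2, {"E": (5, 5), "F": (1, {"G": (2, 10), "H": (1, 0)})}),
})

S1 = ["".join(s) for s in product("AB", "GH")]  # AG, AH, BG, BH
S2 = ["".join(s) for s in product("CD", "EF")]  # CE, CF, DE, DF


def outcome(node, s1, s2):
    """Follow the chosen actions from `node` down to a leaf and return its payoffs."""
    while isinstance(node[1], dict):
        player, children = node
        chosen = s1 if player == 1 else s2
        action = next(a for a in children if a in chosen)
        node = children[action]
    return node


# Induced normal form: one payoff pair per pure-strategy profile.
payoff = {(s1, s2): outcome(TREE, s1, s2) for s1 in S1 for s2 in S2}


def is_pure_nash(s1, s2):
    """True if neither player gains by a unilateral pure-strategy deviation."""
    u1, u2 = payoff[(s1, s2)]
    return (all(payoff[(t1, s2)][0] <= u1 for t1 in S1)
            and all(payoff[(s1, t2)][1] <= u2 for t2 in S2))


pure_nash = [(s1, s2) for s1 in S1 for s2 in S2 if is_pure_nash(s1, s2)]
print(pure_nash)
```

The 16 entries of `payoff` reproduce the table, and `pure_nash` contains exactly the three profiles listed on the slide.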


Subgame Perfection
[Figure 5.2, repeated; excerpt from Multi Agent Systems, draft of September 19, 2006.]
S1 = {(A, G), (A, H), (B, G), (B, H)}
S2 = {(C, E), (C, F), (D, E), (D, F)}
There's something intuitively wrong with the equilibrium (B, H), (C, E).
Why would player 1 ever choose to play H if he got to the second choice node?
After all, G dominates H for him.
He does it to threaten player 2, to prevent him from choosing F, and so gets 5.
However, this seems like a non-credible threat.
If player 1 reached his second decision node, would he really follow through and play H?

Formal Definition
Definition (subgame of G rooted at h)
The subgame of G rooted at h is the restriction of G to the descendants of h.
Definition (subgames of G)
The set of subgames of G is defined by the subgames of G rooted at each of the nodes in G.
s is a subgame perfect equilibrium of G iff for any subgame G′ of G, the restriction of s to G′ is a Nash equilibrium of G′.
Notes:
since G is its own subgame, every SPE is a NE.
this definition rules out "non-credible threats"

Which equilibria are subgame perfect?
[Figure 5.2, repeated.]
Which equilibria from the example are subgame perfect?
(A, G), (C, F): is subgame perfect
(B, H), (C, E): (B, H) is a non-credible threat; not subgame perfect
(A, H), (C, F): (A, H) is also non-credible, even though H is "off-path"
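
These three verdicts can be confirmed by applying the definition literally: a profile is subgame perfect if it is a Nash equilibrium in the subgame rooted at every choice node. The following is a sketch under assumed names (the tuple encoding `TREE` and all function names are illustrative, not from the lecture). Letting a player deviate to any full strategy covers every strategy of the subgame, because only the actions inside the subgame affect its outcome.

```python
from itertools import product

# Figure 5.2: an internal node is (player, {action: subtree}); a leaf is a payoff pair.
TREE = (1, {
    "A": (2, {"C": (3, 8), "D": (8, 3)}),
    "B": (2, {"E": (5, 5), "F": (1, {"G": (2, 10), "H": (1, 0)})}),
})

S1 = ["".join(s) for s in product("AB", "GH")]
S2 = ["".join(s) for s in product("CD", "EF")]


def outcome(node, s1, s2):
    """Play the profile from `node` onward and return the payoff pair reached."""
    while isinstance(node[1], dict):
        player, children = node
        chosen = s1 if player == 1 else s2
        node = children[next(a for a in children if a in chosen)]
    return node


def choice_nodes(node):
    """Yield every choice node under `node`; each one roots a subgame."""
    if isinstance(node[1], dict):
        yield node
        for child in node[1].values():
            yield from choice_nodes(child)


def is_nash_at(node, s1, s2):
    """True if (s1, s2), restricted to the subgame at `node`, is a Nash equilibrium."""
    u1, u2 = outcome(node, s1, s2)
    return (all(outcome(node, t1, s2)[0] <= u1 for t1 in S1)
            and all(outcome(node, s1, t2)[1] <= u2 for t2 in S2))


def is_subgame_perfect(s1, s2):
    return all(is_nash_at(h, s1, s2) for h in choice_nodes(TREE))


for profile in [("AG", "CF"), ("BH", "CE"), ("AH", "CF")]:
    print(profile, is_subgame_perfect(*profile))
```

Both profiles that use H fail in the subgame rooted at player 1's second node, where G yields 2 and H yields 1.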


Computing Subgame Perfect Equilibria
Idea: Identify the equilibria in the bottom-most trees, and adopt these as one moves up the tree.
Good news: not only are we guaranteed to find a subgame-perfect equilibrium (rather than possibly finding a Nash equilibrium that involves non-credible threats) but also this procedure is computationally simple. In particular, it can be implemented as a single depth-first traversal of the game tree, and thus requires time linear in the size of the game representation. Recall in contrast that the best known methods for finding Nash equilibria of general games require time exponential in the size of the normal form; remember as well that the induced normal form of an extensive-form game can be exponentially larger than the original representation.
function BACKWARDINDUCTION(node h) returns u(h)
    if h ∈ Z then
        return u(h)    // h is a terminal node
    best_util ← −∞
    forall a ∈ χ(h) do
        util_at_child ← BACKWARDINDUCTION(σ(h, a))
        if util_at_child_ρ(h) > best_util_ρ(h) then
            best_util ← util_at_child
    return best_util
Figure 5.6: Procedure for finding the value of a sample (subgame-perfect) Nash equilibrium of a perfect-information extensive-form game.
The algorithm BACKWARDINDUCTION is described in Figure 5.6. The variable util_at_child is a vector denoting the utility for each player at the child node; util_at_child_ρ(h) denotes the element of this vector corresponding to the utility for player ρ(h) (the player who gets to move at node h). Similarly, best_util is a vector giving utilities for each player.
Observe that this procedure does not return an equilibrium strategy for each of the n players, but rather describes how to label each node with a vector of n real numbers. This labeling can be seen as an extension of the game's utility function to the non-terminal nodes.
The equilibrium strategies: take the best action at each node.
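
A runnable version of the procedure might look as follows. This is a sketch rather than the book's code: the tuple encoding `TREE`, the use of action histories as node identifiers, and the extra `plan` dictionary that records the best action at each node are assumptions added for illustration. Ties are broken in favor of the first action examined, as in the pseudocode.

```python
# Figure 5.2: an internal node is (player, {action: subtree}); a leaf is a payoff pair.
TREE = (1, {
    "A": (2, {"C": (3, 8), "D": (8, 3)}),
    "B": (2, {"E": (5, 5), "F": (1, {"G": (2, 10), "H": (1, 0)})}),
})


def backward_induction(node, history=(), plan=None):
    """Return (utility vector labeling `node`, plan mapping each history to an action)."""
    if plan is None:
        plan = {}
    if not isinstance(node[1], dict):
        return node, plan  # terminal node: the label is its own payoff vector
    player, children = node
    best_util, best_action = None, None
    for action, child in children.items():
        util_at_child, _ = backward_induction(child, history + (action,), plan)
        # Compare only the component belonging to the player who moves here.
        if best_util is None or util_at_child[player - 1] > best_util[player - 1]:
            best_util, best_action = util_at_child, action
    plan[history] = best_action  # the equilibrium strategy: best action at this node
    return best_util, plan


value, plan = backward_induction(TREE)
print(value)
print(plan)
```

On the game of Figure 5.2 the root is labeled (3, 8), and the recorded actions are A and G for player 1 and C and F for player 2, which is the subgame-perfect profile (A, G), (C, F).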

For zero-sum games, BackwardInduction has another name:
the minimax algorithm.
Here it’s enough to store one number per node.
It’s possible to speed things up by pruning nodes that will
never be reached in play: “alpha-beta pruning”.

function ALPHABETAPRUNING(node h, real α, real β) returns u_1(h)
    if h ∈ Z then
        return u_1(h)    // h is a terminal node
    best_util ← (2ρ(h) − 3) × ∞    // −∞ for player 1; ∞ for player 2
    forall a ∈ χ(h) do
        if ρ(h) = 1 then
            best_util ← max(best_util, ALPHABETAPRUNING(σ(h, a), α, β))
            if best_util ≥ β then
                return best_util
            α ← max(α, best_util)
        else
            best_util ← min(best_util, ALPHABETAPRUNING(σ(h, a), α, β))
            if best_util ≤ α then
                return best_util
            β ← min(β, best_util)
    return best_util
Figure 5.7: The alpha-beta pruning algorithm. It is invoked at the root node h as ALPHABETAPRUNING(h, −∞, ∞).
… previously encountered node that their corresponding player (player 1 for α and player 2 for β) would most prefer to choose instead of h. For example, consider the variable β at some node h. Now consider all the different choices that player 2 could make at ancestors of h that would prevent h from ever being reached, and that would ultimately lead to previously encountered terminal nodes. β is the best value that player 2 could obtain at any of these terminal nodes. Because the players do not have any alternative to starting at the root of the tree, at the beginning of the search α = −∞ and β = ∞.
We can now concentrate on the important difference between BACKWARDINDUCTION and ALPHABETAPRUNING: in the latter procedure, the search can backtrack at a node that is not terminal. Let us think about things from the point of view of player 1, who is considering what action to play at node h. (As we encourage you to check for yourself, a similar argument holds when it is player 2's turn to move at node h.) For player 1, this backtracking occurs on the line that reads "if best_util ≥ β then return best_util." What is going on here? We have just explored some, but not all, of the children of player 1's decision node h; the highest value among these explored nodes is best_util. The value of node h is therefore lower bounded by best_util (it is best_util if h has no children with larger values, and is some larger amount otherwise). Either way, if best_util ≥ β then player 1 knows that player 2 prefers choosing his best alternative (at some ancestor node of h) rather than allowing player 1 to act at node h. Thus node h cannot be on …
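
The early return discussed above can be observed on a small example. The sketch below is an assumption-laden illustration: the zero-sum tree `ZERO_SUM_TREE` is made up for this purpose and does not come from the lecture, leaves are encoded as player 1's payoff, choice nodes as (player, list of children), and the `visited` list exists only to show which leaves are evaluated.

```python
import math

# Hypothetical zero-sum game: player 1 (maximizer) moves at the root,
# player 2 (minimizer) moves at each of the three children.
ZERO_SUM_TREE = (1, [
    (2, [3, 12, 8]),
    (2, [2, 4, 6]),
    (2, [14, 5, 2]),
])


def alpha_beta(node, alpha=-math.inf, beta=math.inf, visited=None):
    """Return player 1's value of `node`, skipping branches that cannot matter."""
    if visited is None:
        visited = []
    if not isinstance(node, tuple):
        visited.append(node)  # terminal node: record that it was evaluated
        return node
    player, children = node
    best_util = -math.inf if player == 1 else math.inf
    for child in children:
        value = alpha_beta(child, alpha, beta, visited)
        if player == 1:
            best_util = max(best_util, value)
            if best_util >= beta:
                return best_util  # player 2 would avoid this node
            alpha = max(alpha, best_util)
        else:
            best_util = min(best_util, value)
            if best_util <= alpha:
                return best_util  # player 1 would avoid this node
            beta = min(beta, best_util)
    return best_util


visited = []
value = alpha_beta(ZERO_SUM_TREE, visited=visited)
print(value, visited)
```

Here the value of the root is 3, and only seven of the nine leaves are evaluated: once player 2's second node is known to be worth at most 2, its remaining children are skipped because player 1 already has an alternative worth 3.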
Uncorrected manuscript of Multiagent Systems, published by Cambridge University Press
Revision 1.1 © Shoham Leyton-Brown, 2009, 2010.
