Scalable Learning of Intrusion Response
through Recursive Decomposition
GameSec 2023, Avignon, France
Conference on Decision and Game Theory for Security
Kim Hammar & Rolf Stadler
kimham@kth.se
Division of Network and Systems Engineering
KTH Royal Institute of Technology
Oct 18, 2023
Use Case: Intrusion Response
- A defender owns an infrastructure
  - Consists of connected components
  - Components run network services
  - Defender defends the infrastructure by monitoring and active defense
  - Has partial observability
- An attacker seeks to intrude on the infrastructure
  - Has a partial view of the infrastructure
  - Wants to compromise specific components
  - Attacks by reconnaissance, exploitation and pivoting
[Figure: example infrastructure. The attacker and the clients reach nodes 1-31 through a gateway; an IPS sends alerts to the defender.]
System Model
- $G = \langle \{\mathrm{gw}\} \cup V, E \rangle$: directed tree representing the virtual infrastructure
- $V$: finite set of virtual components.
- $E$: finite set of component dependencies.
- $Z$: finite set of zones.
[Figure: infrastructure with 64 numbered nodes behind a gateway with an idps that sends alerts to the defender; labels: r&d zone, dmz, admin zone, quarantine zone, app servers, honeynet, workflow; the attacker and the clients connect through the gateway.]
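The system model can be made concrete with a small data structure. The following is a minimal sketch, not the authors' implementation: the class name `Infrastructure`, its methods, and the three-node example are hypothetical, and storing each dependency in $E$ as a parent-to-child edge is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Infrastructure:
    """Directed tree G = <{gw} U V, E> with one zone label per component.

    Hypothetical illustration: edges are stored parent -> children.
    """
    edges: dict[str, list[str]] = field(default_factory=dict)
    zone: dict[str, str] = field(default_factory=dict)

    def add_component(self, parent: str, node: str, zone: str) -> None:
        """Attach `node` below `parent` and record its zone."""
        self.edges.setdefault(parent, []).append(node)
        self.edges.setdefault(node, [])
        self.zone[node] = zone

    def components(self) -> set[str]:
        """V: every node except the gateway root."""
        return set(self.zone)

    def descendants(self, node: str) -> set[str]:
        """All nodes reachable from `node` by following edges."""
        found: set[str] = set()
        stack = list(self.edges.get(node, []))
        while stack:
            current = stack.pop()
            found.add(current)
            stack.extend(self.edges.get(current, []))
        return found


# Hypothetical three-component example rooted at the gateway "gw".
g = Infrastructure()
g.add_component("gw", "1", "dmz")
g.add_component("1", "2", "dmz")
g.add_component("1", "3", "admin")
print(sorted(g.descendants("gw")))
```

A tree keeps the reachability queries used later (paths between nodes, ancestors of a node) simple to compute.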
State Space
- Each $i \in V$ has a state
$$v_{t,i} = (\underbrace{v^{(Z)}_{t,i}}_{D}, \underbrace{v^{(I)}_{t,i}, v^{(R)}_{t,i}}_{A})$$
- System state $s_t = (v_{t,i})_{i \in V} \sim S_t$.
- Markovian time-homogeneous dynamics:
$$s_{t+1} \sim f(\cdot \mid S_t, A_t)$$
  $A_t = (A^{(A)}_t, A^{(D)}_t)$ are the actions.
[Figure: state-transition diagram over the system states $s_1, s_2, s_3, s_4, s_5, \ldots$]
Workflows
- Services are connected into workflows $W = \{w_1, \ldots, w_{|W|}\}$.
[Figure: Dependency graph of an example workflow representing a web application, with nodes gw, fw, idps, lb, http servers, auth server, search engine, db, and cache; gw, fw, idps, lb, and db are acronyms for gateway, firewall, intrusion detection and prevention system, load balancer, and database, respectively.]
Workflows
- Each $w \in W$ is realized as a subtree $G_w = \langle \{\mathrm{gw}\} \cup V_w, E_w \rangle$ of $G$
- $W = \{w_1, \ldots, w_{|W|}\}$ induces a partitioning
$$V = \bigcup_{w_i \in W} V_{w_i} \quad \text{such that} \quad i \neq j \implies V_{w_i} \cap V_{w_j} = \emptyset$$
[Figure: A workflow tree rooted at gw with nodes 1-7 spread over zones a, b, and c.]
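The partitioning condition can be checked mechanically. A minimal sketch, assuming the node sets $V_{w_i}$ are given as Python sets; the function name `is_partition` and the assignment of nodes 1-7 to three workflows are hypothetical.

```python
def is_partition(components, workflows):
    """Return True iff the workflow node sets are pairwise disjoint and cover V."""
    seen = set()
    for nodes in workflows.values():
        if seen & nodes:          # a node appears in two workflows
            return False
        seen |= nodes
    return seen == components     # every node belongs to some workflow


# Hypothetical assignment of seven nodes to three workflows.
V = {1, 2, 3, 4, 5, 6, 7}
workflows = {"w1": {1, 4}, "w2": {2, 5, 7}, "w3": {3, 6}}
print(is_partition(V, workflows))
```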
Clients
- Homogeneous client population
- Clients arrive according to $\mathrm{Po}(\lambda)$; service times are $\mathrm{Exp}(\frac{1}{\mu})$
- Workflow selection: uniform
- Workflow interaction: Markov process
[Figure: clients from the client population arrive with rate $\lambda$, use one of the workflows $w_1, w_2, \ldots, w_{|W|}$ (Markov processes), and depart after service time $\mu$.]
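The client model can be simulated in a few lines. This is a sketch under stated assumptions: $\mathrm{Exp}(\frac{1}{\mu})$ is read as an exponential distribution with rate $1/\mu$ (mean service time $\mu$), arrivals are sampled once per discrete time-step, and the Markov process that governs how a client interacts with its workflow is not modeled. The function names and parameter values are hypothetical.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw from Po(lam) with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return count
        count += 1


def simulate_clients(lam, mu, num_workflows, steps, seed=0):
    """Return the number of active clients after each time-step."""
    rng = random.Random(seed)
    active = []                                        # (departure_time, workflow_index)
    counts = []
    for t in range(steps):
        for _ in range(sample_poisson(rng, lam)):
            service_time = rng.expovariate(1.0 / mu)   # rate 1/mu, mean mu
            workflow = rng.randrange(num_workflows)    # uniform workflow selection
            active.append((t + service_time, workflow))
        active = [(d, w) for d, w in active if d > t]  # drop departed clients
        counts.append(len(active))
    return counts


counts = simulate_clients(lam=5.0, mu=4.0, num_workflows=3, steps=200)
print(sum(counts) / len(counts))
```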
Observations
- IDPSs inspect network traffic and generate alert vectors:
$$o_t \triangleq (o_{t,1}, \ldots, o_{t,|V|}) \in \mathbb{N}_0^{|V|}$$
  $o_{t,i}$ is the number of alerts related to node $i \in V$ at time-step $t$.
- $o_t = (o_{t,1}, \ldots, o_{t,|V|})$ is a realization of the random vector $O_t$ with joint distribution $Z$
[Figure: the infrastructure with 64 numbered nodes; several idps instances send alerts to the defender; the attacker and the clients connect from outside.]
[Figure: Distributions of # alerts weighted by priority $Z_{O_i}(O_i \mid S^{(D)}_i, A^{(A)}_i)$ per node $i \in V$; 64 panels $Z_{O_1}, \ldots, Z_{O_{64}}$; x-axis: $O$ (ticks at 250, 500, 750); y-axis: probability; legend: no intrusion, intrusion.]
Defender
- Defender action: $a^{(D)}_t \in \{0, 1, 2, 3, 4\}^{|V|}$
- 0 means do nothing. 1-4 correspond to defensive actions (see fig)
- A defender strategy is a function $\pi_D \in \Pi_D : H_D \to \Delta(A_D)$, where
$$h^{(D)}_t = (s^{(D)}_1, a^{(D)}_1, o_1, \ldots, a^{(D)}_{t-1}, s^{(D)}_t, o_t) \in H_D$$
- Objective: (i) maintain workflows; and (ii) stop a possible intrusion:
$$J \triangleq \sum_{t=1}^{T} \gamma^{t-1} \Bigg( \underbrace{\eta \sum_{i=1}^{|W|} u_W(w_i, s_t)}_{\text{workflows utility}} - \underbrace{(1 - \eta) \sum_{j=1}^{|V|} c_I(s_{t,j}, a_{t,j})}_{\text{intrusion and defense costs}} \Bigg)$$
[Figure: the four defensive actions. 1) Server migration; 2) Flow migration and blocking; 3) Shut down server; 4) Access control. Labels: dmz, rd zone, admin zone, old path, new path, honeypot, app server, defender, revoke certificates, blacklist IP.]
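The objective $J$ is a discounted sum that trades off workflow utility against intrusion and defense costs. A minimal sketch of evaluating it along one trajectory, assuming the per-step values $u_W(w_i, s_t)$ and $c_I(s_{t,j}, a_{t,j})$ have already been computed; the function name `objective` and the numbers are hypothetical.

```python
def objective(workflow_utils, node_costs, gamma, eta):
    """Evaluate J for one trajectory.

    workflow_utils[t][i] holds u_W(w_i, s_t) and node_costs[t][j] holds
    c_I(s_{t,j}, a_{t,j}); list index t = 0 corresponds to time-step 1,
    so gamma ** t equals gamma^(t-1) in the formula.
    """
    total = 0.0
    for t, (utils, costs) in enumerate(zip(workflow_utils, node_costs)):
        total += gamma ** t * (eta * sum(utils) - (1.0 - eta) * sum(costs))
    return total


# Two time-steps, two workflows, one node (illustrative values only).
value = objective(
    workflow_utils=[[1.0, 1.0], [1.0, 1.0]],
    node_costs=[[0.5], [0.5]],
    gamma=0.5,
    eta=0.5,
)
print(value)
```

With $\eta = 1$ the defender only cares about keeping workflows available; with $\eta = 0$ only the intrusion and defense costs matter.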
Attacker
- Attacker action: $a^{(A)}_t \in \{0, 1, 2, 3\}^{|V|}$
- 0 means do nothing. 1-3 correspond to attacks (see fig)
- An attacker strategy is a function $\pi_A \in \Pi_A : H_A \to \Delta(A_A)$, where $H_A$ is the space of all possible attacker histories
$$h^{(A)}_t = (s^{(A)}_1, a^{(A)}_1, o_1, \ldots, a^{(A)}_{t-1}, s^{(A)}_t, o_t) \in H_A$$
- Objective: (i) disrupt workflows; and (ii) compromise nodes: $-J$
[Figure: the three attack actions. 1) Reconnaissance (TCP SYN, TCP SYN ACK, port open); 2) Brute-force (login attempts); 3) Code execution (malicious request, inject code, execution). Labels: attacker, server, service, automated system, configure.]
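The Intrusion Response Problem
$$\begin{aligned}
\underset{\pi_D \in \Pi_D}{\text{maximize}} \ \underset{\pi_A \in \Pi_A}{\text{minimize}} \quad & \mathbb{E}_{(\pi_D, \pi_A)}[J] && \text{(1a)} \\
\text{subject to} \quad & s^{(D)}_{t+1} \sim f_D(\cdot \mid S^{(D)}_t, A^{(D)}_t) \quad \forall t && \text{(1b)} \\
& s^{(A)}_{t+1} \sim f_A(\cdot \mid S^{(A)}_t, A_t) \quad \forall t && \text{(1c)} \\
& o_{t+1} \sim Z(\cdot \mid S^{(D)}_{t+1}, A^{(A)}_t) \quad \forall t && \text{(1d)} \\
& a^{(A)}_t \sim \pi_A(\cdot \mid H^{(A)}_t), \ a^{(A)}_t \in A_A(s_t) \quad \forall t && \text{(1e)} \\
& a^{(D)}_t \sim \pi_D(\cdot \mid H^{(D)}_t), \ a^{(D)}_t \in A_D \quad \forall t && \text{(1f)}
\end{aligned}$$
$\mathbb{E}_{(\pi_D, \pi_A)}$ denotes the expectation of the random vectors $(S_t, O_t, A_t)_{t \in \{1, \ldots, T\}}$ when following the strategy profile $(\pi_D, \pi_A)$.
(1) can be formulated as a zero-sum Partially Observed Stochastic Game with Public Observations (a PO-POSG):
$$\Gamma = \langle N, (S_i)_{i \in N}, (A_i)_{i \in N}, (f_i)_{i \in N}, u, \gamma, (b^{(i)}_1)_{i \in N}, O, Z \rangle$$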
Existence of a Solution
Theorem
Given the PO-POSG $\Gamma$, the following holds:
(A) $\Gamma$ has a mixed Nash equilibrium and a value function $V^* : \mathcal{B}_D \times \mathcal{B}_A \to \mathbb{R}$ that maps each possible initial pair of belief states $(b^{(D)}_1, b^{(A)}_1)$ to the expected utility of the defender in the equilibrium.
(B) For each strategy pair $(\pi_A, \pi_D) \in \Pi_A \times \Pi_D$, the best response sets $B_D(\pi_A)$ and $B_A(\pi_D)$ are non-empty and correspond to optimal strategies in two Partially Observed Markov Decision Processes (POMDPs): $M^{(D)}$ and $M^{(A)}$. Further, a pair of pure best response strategies $(\tilde{\pi}_D, \tilde{\pi}_A) \in B_D(\pi_A) \times B_A(\pi_D)$ and a pair of value functions $(V^*_{D,\pi_A}, V^*_{A,\pi_D})$ exist.
The Curse of Dimensionality
- While (1) has a solution (i.e., the game $\Gamma$ has a value (Thm 1)), computing it is intractable since the state, action, and observation spaces of the game grow exponentially with $|V|$.
[Figure: Growth of $|S|$, $|O|$, and $|A_i|$ as a function of the number of nodes $|V|$; x-axis: $|V|$ from 1 to 5; y-axis ticks at $10^4$, $10^5$, and $2 \cdot 10^5$.]
We tackle the scalability challenge with decomposition.
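The exponential growth follows directly from the product structure of the spaces. A small sketch: the action-space sizes $5^{|V|}$ and $4^{|V|}$ come from the defender and attacker action sets stated earlier, while the number of local states per node and the cap on alerts per node are hypothetical parameters introduced only for illustration (the alert counts in the model are unbounded).

```python
def space_sizes(num_nodes, states_per_node, max_alerts_per_node):
    """Sizes of the joint spaces when every node contributes one factor.

    states_per_node and max_alerts_per_node are illustrative assumptions;
    the bases 5 and 4 are the per-node defender and attacker action counts.
    """
    return {
        "|S|": states_per_node ** num_nodes,
        "|O|": (max_alerts_per_node + 1) ** num_nodes,
        "|A_D|": 5 ** num_nodes,
        "|A_A|": 4 ** num_nodes,
    }


for n in range(1, 6):
    print(n, space_sizes(n, states_per_node=3, max_alerts_per_node=9))
```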
Intuitively..
[Figure: the infrastructure with numbered nodes, zones (dmz, rd zone, admin zone, quarantine zone), app servers, honeynet, gateway, and idps, annotated with the callouts below; one callout points at a node near the top of the tree and another at a node near the bottom.]
- The optimal action here... does not directly depend on the state or action of a node down here.
- But they are not completely independent either.
- How can we exploit this structure?
Our Approach: System Decomposition
To avoid explicitly enumerating the very large state, observation, and action spaces of $\Gamma$, we exploit three structural properties.
1. Additive structure across workflows.
   - The game decomposes into additive subgames on the workflow-level, which means that the strategy for each subgame can be optimized independently
2. Optimal substructure within a workflow.
   - The subgame for each workflow decomposes into subgames on the node-level that satisfy the optimal substructure property
3. Threshold properties of local defender strategies.
   - The optimal node-level strategies for the defender exhibit threshold structures, which means that they can be estimated efficiently
Additive Structure Across Workflows (Intuition)
- If there is no path between $i$ and $j$ in $G$, then $i$ and $j$ are independent in the following sense:
  - Compromising $i$ has no effect on the state of $j$.
  - Compromising $i$ does not make it harder or easier to compromise $j$.
  - Compromising $i$ does not affect the service provided by $j$.
  - Defending $i$ does not affect the state of $j$.
  - Defending $i$ does not affect the service provided by $j$.
Additive Structure Across Workflows
Definition (Transition independence)
A set of nodes $Q$ is transition independent iff the transition probabilities factorize as
$$f(S_{t+1} \mid S_t, A_t) = \prod_{i \in Q} f(S_{t+1,i} \mid S_{t,i}, A_{t,i})$$
Definition (Utility independence)
A set of nodes $Q$ is utility independent iff there exist functions $u_1, \ldots, u_{|Q|}$ such that the utility function $u$ decomposes as
$$u(S_t, A_t) = f(u_1(S_{t,1}, A_{t,1}), \ldots, u_{|Q|}(S_{t,|Q|}, A_{t,|Q|}))$$
and
$$u_i \leq u_i' \iff f(u_1, \ldots, u_i, \ldots, u_{|Q|}) \leq f(u_1, \ldots, u_i', \ldots, u_{|Q|})$$
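Transition independence means the joint transition probability is a product of per-node factors. The sketch below illustrates the factorization with a hypothetical two-state node kernel (0 = healthy, 1 = compromised, and an attack that succeeds with probability 0.3); the kernel and the function names are assumptions made for illustration and are not the model's actual dynamics.

```python
import itertools
import math


def node_kernel(s_next, s, a):
    """Hypothetical local kernel f(s_next | s, a) for a single node."""
    if s == 1:                        # compromised is absorbing in this toy kernel
        return 1.0 if s_next == 1 else 0.0
    p_compromise = 0.3 if a == 1 else 0.0
    return p_compromise if s_next == 1 else 1.0 - p_compromise


def joint_transition_prob(kernels, s_next, s, a):
    """f(s_next | s, a) as the product of the local factors, one per node."""
    return math.prod(f_i(s_next[i], s[i], a[i]) for i, f_i in enumerate(kernels))


kernels = [node_kernel, node_kernel]
prob = joint_transition_prob(kernels, s_next=(1, 0), s=(0, 0), a=(1, 0))
print(prob)

# The factorized kernel is still a probability distribution over joint next states.
total = sum(
    joint_transition_prob(kernels, s_next, (0, 0), (1, 1))
    for s_next in itertools.product((0, 1), repeat=2)
)
print(total)
```

Storing one small kernel per node instead of one table over the joint state space is what avoids the exponential blow-up in $|V|$.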
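Additive Structure Across Workflows
Theorem (Additive structure across workflows)
(A) All nodes $V$ in the game $\Gamma$ are transition independent.
(B) If there is no path between $i$ and $j$ in the topology graph $G$, then $i$ and $j$ are utility independent.
Corollary
$\Gamma$ decomposes into $|W|$ additive subproblems that can be solved independently and in parallel.
[Figure: for player $k$, the workflow-level strategies $\pi^{(w_1)}_k, \pi^{(w_2)}_k, \ldots, \pi^{(w_{|W|})}_k$ receive the observations $o_{t,w_1}, o_{t,w_2}, \ldots, o_{t,w_{|W|}}$ and output the local actions $a^{(k)}_{w_1}, a^{(k)}_{w_2}, \ldots, a^{(k)}_{w_{|W|}}$, which are combined ($\oplus$) into the action $a^{(k)}_t$.]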
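The corollary suggests a simple computational pattern: apply one local strategy per workflow in parallel and merge the local actions into a joint action. The sketch below uses a thread pool as a stand-in for parallel workers, and `local_strategy` is a hypothetical placeholder rule (act on any node with at least one alert), not a learned strategy.

```python
from concurrent.futures import ThreadPoolExecutor


def local_strategy(observations):
    """Placeholder workflow-level strategy: action 1 on nodes with alerts, else 0."""
    return {node: (1 if alerts > 0 else 0) for node, alerts in observations.items()}


def joint_action(obs_by_workflow, max_workers=4):
    """Run every workflow-level strategy independently and merge the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        local_actions = list(pool.map(local_strategy, obs_by_workflow.values()))
    merged = {}
    for actions in local_actions:
        merged.update(actions)    # node sets are disjoint since workflows partition V
    return merged


# Hypothetical alert counts per node, grouped by workflow.
obs_by_workflow = {"w1": {1: 0, 2: 3}, "w2": {3: 0}}
print(joint_action(obs_by_workflow))
```

Because the workflows partition $V$, merging the local actions never overwrites an entry.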
Additive Structure Across Workflows: Example
a) IT infrastructure: $V = \{1, 2, 3\}$, $E = \{(1, 2)\}$, $W = \{w_1, w_2\}$, $Z = \{1, 2\}$
[Figure a): nodes 1, 2, and 3 behind gw, placed in zone 1 and zone 2 and grouped into the workflows $w_1$ and $w_2$.]
b) Transition dependencies
[Figure b): dependency graph from $(S_t, A_t)$ to $(S_{t+1}, O_{t+1})$; for each node $i \in \{1, 2, 3\}$, the variables $S^{(D)}_{t,i}$, $A^{(D)}_{t,i}$, $S^{(A)}_{t,i}$, $A^{(A)}_{t,i}$, $S^{(D)}_{t+1,i}$, $S^{(A)}_{t+1,i}$, and the observation of node $i$.]
c) Utility dependencies
[Figure c): dependency graph from $(S_t, A_t)$ to $U_t$; for each node $i \in \{1, 2, 3\}$, the variables $S^{(D)}_{t,i}$, $A^{(D)}_{t,i}$, $S^{(A)}_{t,i}$, and $U_{t,i}$.]
Optimal Substructure Within a Workflow
- Nodes in the same workflow are utility dependent.
- $\implies$ Locally-optimal strategies for each node cannot simply be added together to obtain an optimal strategy for the workflow.
- However, the locally-optimal strategies satisfy the optimal substructure property.
- $\implies$ there exists an algorithm for constructing an optimal workflow strategy from locally-optimal strategies for each node.
[Figure: the example IT infrastructure with $V = \{1, 2, 3\}$, $E = \{(1, 2)\}$, $W = \{w_1, w_2\}$, $Z = \{1, 2\}$, and its utility dependencies from $(S_t, A_t)$ to $U_t$ over the variables $S^{(D)}_{t,i}$, $A^{(D)}_{t,i}$, $S^{(A)}_{t,i}$, and $U_{t,i}$ for $i \in \{1, 2, 3\}$.]
Algorithm for Combining Locally-Optimal Node Strategies into Optimal Workflow Strategies
[Figure: a workflow tree whose nodes carry the local defender strategies $\pi^{(1)}_D, \pi^{(2)}_D, \pi^{(3)}_D, \pi^{(4)}_D, \pi^{(5)}_D, \pi^{(6)}_D$.]
$(\pi^{(i)}_D)_{i \in V_w}$: local strategies in the same workflow $w \in W$
- Can redefine the utility function for each node $i$ to take into account the utility impact on its ancestors. E.g., the utility of node 6 needs to include the utility impact on nodes 1, 3, and 5.
- Can prove that this utility transformation makes the nodes utility independent. $\implies$ Optimal substructure.
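The utility transformation requires, for each node, the set of its ancestors in the workflow tree. A minimal sketch: the parent map below is a hypothetical tree chosen so that node 6 has the ancestors 5, 3, and 1, and the additive form of `transformed_utility` with an `impact` table is an assumption made for illustration, since the slides do not spell out the transformation.

```python
def ancestors(parent, node):
    """Return the ancestors of `node`, nearest first, by following parent links."""
    path = []
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def transformed_utility(local_utility, impact, parent, node):
    """Local utility of `node` plus its (assumed additive) impact on each ancestor."""
    return local_utility[node] + sum(
        impact.get((node, ancestor), 0.0) for ancestor in ancestors(parent, node)
    )


# Hypothetical workflow tree: 1 is the root; 6 hangs below 5, which hangs below 3.
parent = {2: 1, 3: 1, 4: 3, 5: 3, 6: 5}
print(ancestors(parent, 6))

local_utility = {6: 1.0}
impact = {(6, 5): 0.5, (6, 3): 0.25, (6, 1): 0.25}   # illustrative values
print(transformed_utility(local_utility, impact, parent, 6))
```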
Threshold Properties of Local Defender Strategies
- The local problem of the defender can be decomposed in the temporal domain as
$$\max_{\pi_D} \sum_{t=1}^{T} J = \max_{\pi_D} \sum_{t=1}^{\tau_1} J_1 + \sum_{t=1}^{\tau_2} J_2 + \ldots \qquad (2)$$
  where $\tau_1, \tau_2, \ldots$ are stopping times.
- $\implies$ (1) selection of defensive actions is simplified; and (2) the optimal stopping times are given by a threshold strategy that can be estimated efficiently.
- A node can be in three attack states $s^{(A)}_t$: Healthy, Discovered, Compromised.
- The defender has a belief state $b^{(D)}_t$
[Figure: the belief space $\mathcal{B}^{(j)}_D$ drawn as a simplex with vertices $(1, 0, 0)$ ($j$ healthy), $(0, 1, 0)$ ($j$ discovered), and $(0, 0, 1)$ ($j$ compromised); a switching curve $\Upsilon$ separates the continuation set $C$ from the stopping set $S$.]
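A threshold strategy acts on the defender's belief over the three attack states. The sketch below shows a standard Bayesian belief update followed by a threshold test. It is a simplification under stated assumptions: the transition matrix and alert likelihoods are made-up numbers for a fixed action pair, and a single scalar threshold `alpha` on the belief mass outside "healthy" stands in for the switching curve $\Upsilon$.

```python
def belief_update(belief, transition, obs_likelihood):
    """One Bayes-filter step over (healthy, discovered, compromised).

    transition[s][s2] is the probability of moving from s to s2 and
    obs_likelihood[s2] is the probability of the received alerts in state s2.
    """
    n = len(belief)
    predicted = [sum(belief[s] * transition[s][s2] for s in range(n)) for s2 in range(n)]
    unnormalized = [obs_likelihood[s2] * predicted[s2] for s2 in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under the current belief")
    return [x / total for x in unnormalized]


def threshold_strategy(belief, alpha):
    """Stop (take a defensive action) once the mass outside 'healthy' reaches alpha."""
    return "stop" if 1.0 - belief[0] >= alpha else "continue"


# Illustrative numbers only.
transition = [[0.8, 0.2, 0.0], [0.0, 0.7, 0.3], [0.0, 0.0, 1.0]]
obs_likelihood = [0.1, 0.5, 0.9]

b1 = belief_update([1.0, 0.0, 0.0], transition, obs_likelihood)
print(b1, threshold_strategy(b1, alpha=0.5))
```

A threshold rule needs only a handful of parameters per node, which is what makes it cheap to estimate compared with a general strategy over the belief space.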
Proof Sketch (Threshold Properties)
- Let $L(e_1, \hat{b})$ denote the line segment that starts at the belief state $e_1 = (1, 0, 0)$ and ends at $\hat{b}$, where $\hat{b}$ is in the sub-simplex that joins $e_2$ and $e_3$.
- All beliefs on $L(e_1, \hat{b})$ are totally ordered according to the Monotone Likelihood Ratio (MLR) order. $\implies$ a threshold belief state $\alpha_{\hat{b}} \in L(e_1, \hat{b})$ exists where the optimal strategy switches from $C$ to $S$.
- Since the entire belief space can be covered by the union of lines $L(e_1, \hat{b})$, the threshold belief states $\alpha_{\hat{b}_1}, \alpha_{\hat{b}_2}, \ldots$ yield a switching curve $\Upsilon$.
[Figure: the belief space $\mathcal{B}^{(j)}_D$ (the 2-dimensional unit simplex) with vertices $e_1 = (1, 0, 0)$, $e_2 = (0, 1, 0)$, and $e_3 = (0, 0, 1)$; the sub-simplex $\mathcal{B}^{(j)}_{D,e_1}$ joining $e_2$ and $e_3$ contains the points $\hat{b}_1, \ldots, \hat{b}_9$; the line $L(e_1, \hat{b}_5)$, the threshold belief state $\alpha_{\hat{b}_9}$, and the switching curve $\Upsilon$ are marked.]
Scalable Learning through Decomposition
[Figure: speedup $S_n$ (y-axis, 2 to 10) against the number of parallel processes $n$ (x-axis, 1 to 10) for $|V| = 10$; curves: linear, measured.]
Speedup of completion time when computing best response strategies for the decomposed game with $|V| = 10$ nodes and different number of parallel processes; the subproblems in the decomposition are split evenly across the processes; let $T_n$ denote the completion time when using $n$ processes, the speedup is then calculated as $S_n = \frac{T_1}{T_n}$; the error bars indicate standard deviations from 3 measurements.
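The speedup formula $S_n = \frac{T_1}{T_n}$ can be applied to a table of completion times as follows. The timings in this sketch are hypothetical placeholders, not the measurements from the figure.

```python
def speedup(completion_times):
    """Map each process count n to S_n = T_1 / T_n.

    completion_times maps n to the completion time T_n and must contain n = 1.
    """
    t1 = completion_times[1]
    return {n: t1 / tn for n, tn in sorted(completion_times.items())}


# Hypothetical completion times in seconds for 1, 2, and 4 processes.
timings = {1: 100.0, 2: 50.0, 4: 40.0}
print(speedup(timings))
```

A value of $S_n$ below $n$ indicates sub-linear scaling, for example due to coordination overhead or uneven subproblem sizes.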
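Decompositional Fictitious Self-Play (DFSP)
[Figure: fictitious-play iterations. From the strategy pair $(\pi_1, \pi_2)$ the players compute best responses $\tilde{\pi}_1 \in B_1(\pi_2)$ and $\tilde{\pi}_2 \in B_2(\pi_1)$; from the updated pair $(\pi_1', \pi_2')$ they compute $\tilde{\pi}_1' \in B_1(\pi_2')$ and $\tilde{\pi}_2' \in B_2(\pi_1')$; and so on, ending at $(\pi_1^*, \pi_2^*)$ with $\pi_1^* \in B_1(\pi_2^*)$ and $\pi_2^* \in B_2(\pi_1^*)$.]
Fictitious play: iterative averaging of best responses.
- Learn best response strategies iteratively through the parallel solving of subgames in the decomposition
- Average best responses to approximate the equilibrium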
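The averaging idea can be illustrated with classical fictitious play on a small zero-sum matrix game. This is plain fictitious play on matching pennies, not DFSP: there is no decomposition, no partial observability, and the best responses are exact. The function name and the example game are chosen here for illustration.

```python
def fictitious_play(payoff, iterations):
    """Classical fictitious play for a zero-sum matrix game.

    payoff[i][j] is the row player's payoff (the column player receives the
    negative). Returns the empirical action frequencies of both players.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    row_counts[0] += 1            # arbitrary initial actions
    col_counts[0] += 1
    for _ in range(iterations):
        # Each player best-responds to the opponent's empirical frequencies.
        row_values = [
            sum(payoff[i][j] * col_counts[j] for j in range(n_cols))
            for i in range(n_rows)
        ]
        col_values = [
            sum(payoff[i][j] * row_counts[i] for i in range(n_rows))
            for j in range(n_cols)
        ]
        best_row = max(range(n_rows), key=row_values.__getitem__)
        best_col = min(range(n_cols), key=col_values.__getitem__)
        row_counts[best_row] += 1
        col_counts[best_col] += 1
    total = iterations + 1
    return [c / total for c in row_counts], [c / total for c in col_counts]


matching_pennies = [[1, -1], [-1, 1]]
row_freq, col_freq = fictitious_play(matching_pennies, iterations=20000)
print(row_freq, col_freq)
```

In matching pennies the unique equilibrium mixes both actions with probability one half, and the empirical frequencies approach it even though each individual best response is pure.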
Learning Equilibrium Strategies
[Figure: two panels with x-axis running time (h) from 0 to 100. Left: approximate exploitability $\hat{\delta}$, with the level $\hat{\delta} = 0.4$ marked. Right: defender utility per episode, from 0.0 to 1.0. Legend: dfsp simulation, dfsp digital twin, upper bound, oi,t  0, random defense.]
Learning curves obtained during training of DFSP to find optimal (equilibrium) strategies in the intrusion response game; red and blue curves relate to DFSP; black, orange and green curves relate to baselines.
Comparison with NFSP
[Figure: approximate exploitability (y-axis, 0.0 to 7.5) against running time (h) (x-axis, 0 to 80); legend: dfsp, nfsp.]
Learning curves obtained during training of DFSP and NFSP to find optimal (equilibrium) strategies in the intrusion response game; the red curve relates to DFSP and the purple curve relates to NFSP; all curves show simulation results.
Conclusions
- We study an intrusion response use case.
- We formulate the use case as a POSG.
- We design a novel decompositional approach to approximate equilibria.
- We show that the decomposition allows scalable approximation of equilibria.
[Figure: overview diagram with the labels Target System, Emulation, Selective Replication, Strategy Implementation $\pi$, Model Creation, System Identification, Strategy Mapping $\pi$, Simulation, and Learning, and a grid of states $s_{1,1}, \ldots, s_{1,n}$, $s_{2,1}, \ldots, s_{2,n}$, $\ldots$]
More Related Content

PDF
A Game Theoretic Analysis of Intrusion Detection in Access Control Systems - ...
PDF
Fast Identification of Heavy Hitters by Cached and Packed Group Testing
PDF
A Design Principle for Federated Learning
PDF
Intrusion Tolerance as a Two-Level Game - GameSec24
PDF
I have a stream - Insights in Reactive Programming - Jan Carsten Lohmuller - ...
PPT
PDF
Intrusion Tolerance as a Two-Level Game (Visit to Melbourne University)
PPT
Lec-35Graph - Graph - Copy in Data Structure
A Game Theoretic Analysis of Intrusion Detection in Access Control Systems - ...
Fast Identification of Heavy Hitters by Cached and Packed Group Testing
A Design Principle for Federated Learning
Intrusion Tolerance as a Two-Level Game - GameSec24
I have a stream - Insights in Reactive Programming - Jan Carsten Lohmuller - ...
Intrusion Tolerance as a Two-Level Game (Visit to Melbourne University)
Lec-35Graph - Graph - Copy in Data Structure

Similar to Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decomposition (20)

PDF
RSA SIGNATURE: BEHIND THE SCENES
PPTX
Shooman_11 Software Reliability (1).pptx
PPTX
ATS Programming
PDF
Faster Interleaved Modular Multiplier Based on Sign Detection
PDF
Solutions for Image Processing for Engineers by Yagle and Ulaby
PDF
Digital Signal Processing Tutorial:Chapt 1 signal and systems
PPTX
Watermarking in Source Code: Applications and Security Challenges
PDF
Lecture 2 Introduction to digital image
PPTX
Introduction to ml and dl
PDF
presentation.pdf
PPT
PDF
Intrusion Tolerance for Networked Systems through Two-Level Feedback Control
PDF
Eos - Efficient Private Delegation of zkSNARK provers
PDF
Slides_Neural Networks for Time Series Prediction
PDF
Flink Forward Berlin 2017: David Rodriguez - The Approximate Filter, Join, an...
PDF
Security of Artificial Intelligence
PDF
presentation
PPTX
Lecture Slide (21).pptx
PDF
presentation
PPT
Secure information aggregation in sensor networks
RSA SIGNATURE: BEHIND THE SCENES
Shooman_11 Software Reliability (1).pptx
ATS Programming
Faster Interleaved Modular Multiplier Based on Sign Detection
Solutions for Image Processing for Engineers by Yagle and Ulaby
Digital Signal Processing Tutorial:Chapt 1 signal and systems
Watermarking in Source Code: Applications and Security Challenges
Lecture 2 Introduction to digital image
Introduction to ml and dl
presentation.pdf
Intrusion Tolerance for Networked Systems through Two-Level Feedback Control
Eos - Efficient Private Delegation of zkSNARK provers
Slides_Neural Networks for Time Series Prediction
Flink Forward Berlin 2017: David Rodriguez - The Approximate Filter, Join, an...
Security of Artificial Intelligence
presentation
Lecture Slide (21).pptx
presentation
Secure information aggregation in sensor networks

More from Kim Hammar (20)

PDF
Approximation in Value Space using Aggregation, with Applications to POMDPs a...
PDF
Adaptive Security Policies via Belief Aggregation and Rollout
PDF
Optimal Security Response to Network Intrusions in IT Systems
PDF
Automated Intrusion Response - CDIS Spring Conference 2024
PDF
Automated Security Response through Online Learning with Adaptive Con jectures
PDF
Självlärande System för Cybersäkerhet. KTH
PDF
Learning Automated Intrusion Response
PDF
Intrusion Tolerance for Networked Systems through Two-level Feedback Control
PDF
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
PDF
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
PDF
Learning Optimal Intrusion Responses via Decomposition
PDF
Digital Twins for Security Automation
PDF
Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...
PDF
Självlärande system för cyberförsvar.
PDF
Intrusion Response through Optimal Stopping
PDF
CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...
PDF
Self-Learning Systems for Cyber Defense
PDF
Self-learning Intrusion Prevention Systems.
PDF
Learning Security Strategies through Game Play and Optimal Stopping
PDF
Intrusion Prevention through Optimal Stopping
Approximation in Value Space using Aggregation, with Applications to POMDPs a...
Adaptive Security Policies via Belief Aggregation and Rollout
Optimal Security Response to Network Intrusions in IT Systems
Automated Intrusion Response - CDIS Spring Conference 2024
Automated Security Response through Online Learning with Adaptive Con jectures
Självlärande System för Cybersäkerhet. KTH
Learning Automated Intrusion Response
Intrusion Tolerance for Networked Systems through Two-level Feedback Control
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Optimal Intrusion Responses via Decomposition
Digital Twins for Security Automation
Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...
Självlärande system för cyberförsvar.
Intrusion Response through Optimal Stopping
CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...
Self-Learning Systems for Cyber Defense
Self-learning Intrusion Prevention Systems.
Learning Security Strategies through Game Play and Optimal Stopping
Intrusion Prevention through Optimal Stopping

Recently uploaded (20)

PDF
Unlocking AI with Model Context Protocol (MCP)
PDF
Machine learning based COVID-19 study performance prediction
PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
PDF
Dropbox Q2 2025 Financial Results & Investor Presentation
PDF
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
PPTX
Cloud computing and distributed systems.
PDF
Building Integrated photovoltaic BIPV_UPV.pdf
PPTX
20250228 LYD VKU AI Blended-Learning.pptx
PPTX
Understanding_Digital_Forensics_Presentation.pptx
PDF
Reach Out and Touch Someone: Haptics and Empathic Computing
PPTX
Spectroscopy.pptx food analysis technology
PDF
MIND Revenue Release Quarter 2 2025 Press Release
PPTX
ACSFv1EN-58255 AWS Academy Cloud Security Foundations.pptx
PDF
Electronic commerce courselecture one. Pdf
PDF
Chapter 3 Spatial Domain Image Processing.pdf
PDF
Per capita expenditure prediction using model stacking based on satellite ima...
PDF
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
PDF
Review of recent advances in non-invasive hemoglobin estimation
PDF
Mobile App Security Testing_ A Comprehensive Guide.pdf
PPTX
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
Unlocking AI with Model Context Protocol (MCP)
Machine learning based COVID-19 study performance prediction
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
Dropbox Q2 2025 Financial Results & Investor Presentation
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
Cloud computing and distributed systems.
Building Integrated photovoltaic BIPV_UPV.pdf
20250228 LYD VKU AI Blended-Learning.pptx
Understanding_Digital_Forensics_Presentation.pptx
Reach Out and Touch Someone: Haptics and Empathic Computing
Spectroscopy.pptx food analysis technology
MIND Revenue Release Quarter 2 2025 Press Release
ACSFv1EN-58255 AWS Academy Cloud Security Foundations.pptx
Electronic commerce courselecture one. Pdf
Chapter 3 Spatial Domain Image Processing.pdf
Per capita expenditure prediction using model stacking based on satellite ima...
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
Review of recent advances in non-invasive hemoglobin estimation
Mobile App Security Testing_ A Comprehensive Guide.pdf
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy

Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decomposition

  • 1. 1/33 Scalable Learning of Intrusion Response through Recursive Decomposition GameSec 2023, Avignon, France Conference on Decision and Game Theory for Security Kim Hammar & Rolf Stadler kimham@kth.se Division of Network and Systems Engineering KTH Royal Institute of Technology Oct 18, 2023
  • 2. 2/33 Use Case: Intrusion Response I A defender owns an infrastructure I Consists of connected components I Components run network services I Defender defends the infrastructure by monitoring and active defense I Has partial observability I An attacker seeks to intrude on the infrastructure I Has a partial view of the infrastructure I Wants to compromise specific components I Attacks by reconnaissance, exploitation and pivoting Attacker Clients . . . Defender 1 IPS 1 alerts Gateway 7 8 9 10 11 6 5 4 3 2 12 13 14 15 16 17 18 19 21 23 20 22 24 25 26 27 28 29 30 31
  • 3. 3/33 System Model I G = h{gw} ∪ V, Ei: directed tree representing the virtual infrastructure I V: finite set of virtual components. I E: finite set of component dependencies. I Z: finite set of zones. r&d zone App servers Honeynet dmz admin zone workflow Gateway idps quarantine zone alerts Defender . . . Attacker Clients 2 1 3 12 4 5 6 7 8 9 10 11 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
  • 4. 4/33 State Space
      • Each $i \in V$ has a state $v_{t,i} = (\underbrace{v^{(Z)}_{t,i}}_{D}, \underbrace{v^{(I)}_{t,i}, v^{(R)}_{t,i}}_{A})$.
      • System state $s_t = (v_{t,i})_{i \in V} \sim S_t$.
      • Markovian time-homogeneous dynamics: $s_{t+1} \sim f(\cdot \mid S_t, A_t)$, where $A_t = (A^{(A)}_t, A^{(D)}_t)$ are the actions.
      • [Figure: state transition diagram over states $s_1, s_2, s_3, \ldots$]
  • 8. 5/33 Workflows
      • Services are connected into workflows $W = \{w_1, \ldots, w_{|W|}\}$.
      • [Figure: dependency graph of an example workflow representing a web application, with nodes gw, fw, idps, lb, http servers, auth server, search engine, db, and cache; gw, fw, idps, lb, and db are acronyms for gateway, firewall, intrusion detection and prevention system, load balancer, and database, respectively.]
  • 9. 6/33 Workflows
      • Services are connected into workflows $W = \{w_1, \ldots, w_{|W|}\}$.
      • Each $w \in W$ is realized as a subtree $G_w = \langle \{gw\} \cup V_w, E_w \rangle$ of $G$.
      • $W = \{w_1, \ldots, w_{|W|}\}$ induces a partitioning $V = \bigcup_{w_i \in W} V_{w_i}$ such that $i \neq j \implies V_{w_i} \cap V_{w_j} = \emptyset$.
      • [Figure: a workflow tree rooted at gw with nodes 1 to 7 spread over zones a, b, and c.]
  • 11. 7/33 Clients
      • Homogeneous client population.
      • Clients arrive according to $\mathrm{Po}(\lambda)$; service times are $\mathrm{Exp}(\frac{1}{\mu})$.
      • Workflow selection: uniform.
      • Workflow interaction: Markov process.
      • [Figure: a client population with arrival rate $\lambda$ and service time $\mu$; arriving clients are routed to the workflows $w_1, w_2, \ldots, w_{|W|}$ (Markov processes) and then depart.]
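The client model on this slide can be sketched as a small simulation. This is a minimal sketch under stated assumptions: time is discretized into unit steps, $\mathrm{Exp}(\frac{1}{\mu})$ is read as an exponential distribution with rate $1/\mu$ (mean service time $\mu$), and the function name `simulate_clients` and its parameters are hypothetical, not taken from the deck.

```python
import random


def simulate_clients(arrival_rate, mean_service_time, num_workflows, horizon, seed=0):
    """Sample client arrivals, service times, and workflow choices.

    Assumptions (not from the slides): unit-length time steps, and
    Exp(1/mu) interpreted as rate 1/mu, i.e. mean service time mu.
    """
    rng = random.Random(seed)
    clients = []  # entries: (arrival_step, service_time, workflow_index)
    for step in range(horizon):
        # The number of exponential inter-arrival gaps that fit into one
        # unit of time is Poisson distributed with mean `arrival_rate`.
        elapsed = rng.expovariate(arrival_rate)
        while elapsed < 1.0:
            service_time = rng.expovariate(1.0 / mean_service_time)
            workflow = rng.randrange(num_workflows)  # uniform workflow selection
            clients.append((step, service_time, workflow))
            elapsed += rng.expovariate(arrival_rate)
    return clients


if __name__ == "__main__":
    sample = simulate_clients(arrival_rate=5.0, mean_service_time=2.0,
                              num_workflows=3, horizon=10)
    print(len(sample), "clients arrived in 10 steps")
```

The Markov process that governs how a client interacts with its workflow is left out of the sketch because the slide does not specify its transition structure.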
  • 12. 8/33 Observations
      • IDPSs inspect network traffic and generate alert vectors: $o_t \triangleq (o_{t,1}, \ldots, o_{t,|V|}) \in \mathbb{N}_0^{|V|}$, where $o_{t,i}$ is the number of alerts related to node $i \in V$ at time-step $t$.
      • $o_t = (o_{t,1}, \ldots, o_{t,|V|})$ is a realization of the random vector $O_t$ with joint distribution $Z$.
      • [Figure: the infrastructure with 64 numbered nodes; several IDPSs monitor the traffic from the attacker and the clients and send alerts to the defender.]
  • 15. 10/33 Defender
      • Defender action: $a^{(D)}_t \in \{0, 1, 2, 3, 4\}^{|V|}$.
      • 0 means do nothing; 1 to 4 correspond to defensive actions (see figure).
      • A defender strategy is a function $\pi_D \in \Pi_D : H_D \to \Delta(A_D)$, where $h^{(D)}_t = (s^{(D)}_1, a^{(D)}_1, o_1, \ldots, a^{(D)}_{t-1}, s^{(D)}_t, o_t) \in H_D$.
      • Objective: (i) maintain workflows; and (ii) stop a possible intrusion:
        $$J \triangleq \sum_{t=1}^{T} \gamma^{t-1} \Bigg( \underbrace{\eta \sum_{i=1}^{|W|} u_W(w_i, s_t)}_{\text{workflows utility}} - \underbrace{(1 - \eta) \sum_{j=1}^{|V|} c_I(s_{t,j}, a_{t,j})}_{\text{intrusion and defense costs}} \Bigg)$$
      • [Figure: the four defensive actions: 1) server migration (between the DMZ, R&D zone, and admin zone); 2) flow migration and blocking (old path to an app server, new path to a honeypot); 3) shut down server; 4) access control (revoke certificates, blacklist IP).]
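To make the objective $J$ concrete, the following minimal sketch evaluates it for one recorded trajectory. It assumes that the per-step workflow utilities $u_W(w_i, s_t)$ and per-node costs $c_I(s_{t,j}, a_{t,j})$ have already been computed and are passed in as plain lists; the function name and the numbers in the usage example are hypothetical.

```python
def discounted_objective(workflow_utilities, intrusion_costs, gamma, eta):
    """Evaluate J for one trajectory.

    workflow_utilities[t][i] holds u_W(w_i, s_t) and intrusion_costs[t][j]
    holds c_I(s_{t,j}, a_{t,j}); list index 0 corresponds to t = 1, so the
    discount gamma ** index equals gamma^(t-1).
    """
    total = 0.0
    for index, (utilities, costs) in enumerate(zip(workflow_utilities, intrusion_costs)):
        step_value = eta * sum(utilities) - (1.0 - eta) * sum(costs)
        total += (gamma ** index) * step_value
    return total


if __name__ == "__main__":
    # Hypothetical two-step trajectory with two workflows and three nodes.
    utilities = [[1.0, 1.0], [1.0, 0.0]]
    costs = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
    print(discounted_objective(utilities, costs, gamma=0.5, eta=0.5))
```

The weight $\eta$ trades off workflow availability against intrusion and defense costs; with $\eta = 1$ the sketch ignores costs entirely.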
  • 18. 11/33 Attacker
      • Attacker action: $a^{(A)}_t \in \{0, 1, 2, 3\}^{|V|}$.
      • 0 means do nothing; 1 to 3 correspond to attacks (see figure).
      • An attacker strategy is a function $\pi_A \in \Pi_A : H_A \to \Delta(A_A)$, where $H_A$ is the space of all possible attacker histories $h^{(A)}_t = (s^{(A)}_1, a^{(A)}_1, o_1, \ldots, a^{(A)}_{t-1}, s^{(A)}_t, o_t) \in H_A$.
      • Objective: (i) disrupt workflows; and (ii) compromise nodes: $-J$.
      • [Figure: the three attack actions: 1) reconnaissance (TCP SYN, TCP SYN ACK, port open); 2) brute-force (automated login attempts against a server); 3) code execution (malicious request to a service, code injection and execution on the server).]
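  • 21. 12/33 The Intrusion Response Problem
        $$\begin{aligned}
        \underset{\pi_D \in \Pi_D}{\text{maximize}}\ \underset{\pi_A \in \Pi_A}{\text{minimize}} \quad & \mathbb{E}_{(\pi_D, \pi_A)}[J] && \text{(1a)} \\
        \text{subject to} \quad & s^{(D)}_{t+1} \sim f_D(\cdot \mid S^{(D)}_t, A^{(D)}_t) \quad \forall t && \text{(1b)} \\
        & s^{(A)}_{t+1} \sim f_A(\cdot \mid S^{(A)}_t, A_t) \quad \forall t && \text{(1c)} \\
        & o_{t+1} \sim Z(\cdot \mid S^{(D)}_{t+1}, A^{(A)}_t) \quad \forall t && \text{(1d)} \\
        & a^{(A)}_t \sim \pi_A(\cdot \mid H^{(A)}_t), \quad a^{(A)}_t \in A_A(s_t) \quad \forall t && \text{(1e)} \\
        & a^{(D)}_t \sim \pi_D(\cdot \mid H^{(D)}_t), \quad a^{(D)}_t \in A_D \quad \forall t && \text{(1f)}
        \end{aligned}$$
      • $\mathbb{E}_{(\pi_D, \pi_A)}$ denotes the expectation of the random vectors $(S_t, O_t, A_t)_{t \in \{1, \ldots, T\}}$ when following the strategy profile $(\pi_D, \pi_A)$.
      • (1) can be formulated as a zero-sum Partially Observed Stochastic Game with Public Observations (a PO-POSG): $\Gamma = \langle N, (S_i)_{i \in N}, (A_i)_{i \in N}, (f_i)_{i \in N}, u, \gamma, (b^{(i)}_1)_{i \in N}, O, Z \rangle$.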
  • 22. 13/33 Existence of a Solution
      • Theorem. Given the PO-POSG $\Gamma$ (2), the following holds:
          • (A) $\Gamma$ has a mixed Nash equilibrium and a value function $V^* : B_D \times B_A \to \mathbb{R}$ that maps each possible initial pair of belief states $(b^{(D)}_1, b^{(A)}_1)$ to the expected utility of the defender in the equilibrium.
          • (B) For each strategy pair $(\pi_A, \pi_D) \in \Pi_A \times \Pi_D$, the best response sets $B_D(\pi_A)$ and $B_A(\pi_D)$ are non-empty and correspond to optimal strategies in two Partially Observed Markov Decision Processes (POMDPs): $M^{(D)}$ and $M^{(A)}$. Further, a pair of pure best response strategies $(\tilde{\pi}_D, \tilde{\pi}_A) \in B_D(\pi_A) \times B_A(\pi_D)$ and a pair of value functions $(V^*_{D,\pi_A}, V^*_{A,\pi_D})$ exist.
  • 23. 14/33 The Curse of Dimensionality
      • While $\Gamma$ has a value, computing it is intractable. The state, action, and observation spaces of the game grow exponentially with $|V|$.
      • [Figure: growth of $|S|$, $|O|$, and $|A_i|$ as a function of the number of nodes $|V|$, for $|V| = 1, \ldots, 5$.]
  • 24. 14/33 The Curse of Dimensionality
      • While (1) has a solution (i.e., the game $\Gamma$ has a value (Thm 1)), computing it is intractable since the state, action, and observation spaces of the game grow exponentially with $|V|$.
      • [Figure: growth of $|S|$, $|O|$, and $|A_i|$ as a function of the number of nodes $|V|$, for $|V| = 1, \ldots, 5$.]
      • We tackle the scalability challenge with decomposition.
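The exponential growth can be checked directly for the action spaces, whose per-node sizes are given on the defender and attacker slides ($\{0, \ldots, 4\}$ and $\{0, \ldots, 3\}$ per node). The sketch below compares the joint action spaces with the number of local actions that would be enumerated if every node were handled as its own subproblem. The function names are hypothetical, and the linear count is an illustration of the decomposition idea, not a result reported in the deck.

```python
DEFENDER_ACTIONS_PER_NODE = 5  # {0, 1, 2, 3, 4}
ATTACKER_ACTIONS_PER_NODE = 4  # {0, 1, 2, 3}


def joint_action_space_sizes(num_nodes):
    """Sizes of the joint defender and attacker action spaces."""
    return (DEFENDER_ACTIONS_PER_NODE ** num_nodes,
            ATTACKER_ACTIONS_PER_NODE ** num_nodes)


def node_level_action_counts(num_nodes):
    """Total local actions when each node is treated as a separate subproblem."""
    return (DEFENDER_ACTIONS_PER_NODE * num_nodes,
            ATTACKER_ACTIONS_PER_NODE * num_nodes)


if __name__ == "__main__":
    for n in (1, 5, 10, 64):
        print(n, joint_action_space_sizes(n), node_level_action_counts(n))
```

State and observation spaces are omitted because the slides do not give their per-node sizes.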
  • 26. 15/33 Intuitively...
      • [Figure: the infrastructure with its numbered nodes, zones (R&D zone, DMZ, admin zone, quarantine zone), app servers, honeynet, gateway, and IDPS; one annotation marks a node near the top and another marks a node near the bottom.]
      • The optimal action for the node marked at the top does not directly depend on the state or action of the node marked at the bottom.
      • But they are not completely independent either. How can we exploit this structure?
  • 27. 16/33 Our Approach: System Decomposition
      • To avoid explicitly enumerating the very large state, observation, and action spaces of $\Gamma$, we exploit three structural properties.
          • 1. Additive structure across workflows. The game decomposes into additive subgames on the workflow level, which means that the strategy for each subgame can be optimized independently.
          • 2. Optimal substructure within a workflow. The subgame for each workflow decomposes into subgames on the node level that satisfy the optimal substructure property.
          • 3. Threshold properties of local defender strategies. The optimal node-level strategies for the defender exhibit threshold structures, which means that they can be estimated efficiently.
  • 29. 17/33 Additive Structure Across Workflows (Intuition)
      • If there is no path between $i$ and $j$ in $G$, then $i$ and $j$ are independent in the following sense:
          • Compromising $i$ has no effect on the state of $j$.
          • Compromising $i$ does not make it harder or easier to compromise $j$.
          • Compromising $i$ does not affect the service provided by $j$.
          • Defending $i$ does not affect the state of $j$.
          • Defending $i$ does not affect the service provided by $j$.
  • 30. 18/33 Additive Structure Across Workflows
      • Definition (Transition independence). A set of nodes $Q$ are transition independent iff the transition probabilities factorize as
        $$f(S_{t+1} \mid S_t, A_t) = \prod_{i \in Q} f(S_{t+1,i} \mid S_{t,i}, A_{t,i})$$
      • Definition (Utility independence). A set of nodes $Q$ are utility independent iff there exist functions $u_1, \ldots, u_{|Q|}$ such that the utility function $u$ decomposes as
        $$u(S_t, A_t) = f(u_1(S_{t,1}, A_{t,1}), \ldots, u_{|Q|}(S_{t,|Q|}, A_{t,|Q|}))$$
        and
        $$u_i \leq u'_i \iff f(u_1, \ldots, u_i, \ldots, u_{|Q|}) \leq f(u_1, \ldots, u'_i, \ldots, u_{|Q|})$$
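Transition independence means that the joint transition probability is a product of per-node terms, so the joint transition function never has to be tabulated. A minimal sketch, assuming each node's local transition function is available as a Python callable that returns a dictionary from next local state to probability; the names `joint_transition_probability` and `toy_kernel` and the toy probabilities are hypothetical.

```python
def joint_transition_probability(local_kernels, state, action, next_state):
    """Product of per-node transition probabilities.

    local_kernels[i](s_i, a_i) returns a dict {next_local_state: probability}.
    """
    probability = 1.0
    for kernel, s, a, s_next in zip(local_kernels, state, action, next_state):
        probability *= kernel(s, a).get(s_next, 0.0)
    return probability


def toy_kernel(local_state, local_action):
    """Hypothetical local dynamics: action 1 resets the node to state 0."""
    if local_action == 1:
        return {0: 1.0}
    if local_state == 1:
        return {1: 1.0}
    return {0: 0.9, 1: 0.1}


if __name__ == "__main__":
    kernels = [toy_kernel, toy_kernel]
    print(joint_transition_probability(kernels, state=(0, 0), action=(0, 1),
                                       next_state=(1, 0)))
```

With $|Q|$ nodes, evaluating one joint transition costs $|Q|$ local lookups, whereas an explicit joint table would need one entry per joint state, action, and next-state combination.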
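  • 31. 19/33 Additive Structure Across Workflows
      • Theorem (Additive structure across workflows).
          • (A) All nodes $V$ in the game $\Gamma$ are transition independent.
          • (B) If there is no path between $i$ and $j$ in the topology graph $G$, then $i$ and $j$ are utility independent.
      • Corollary. $\Gamma$ decomposes into $|W|$ additive subproblems that can be solved independently and in parallel.
      • [Figure: the workflow-level strategies $\pi^{(w_1)}_k, \pi^{(w_2)}_k, \ldots, \pi^{(w_{|W|})}_k$ map the observations $o_{t,w_1}, o_{t,w_2}, \ldots, o_{t,w_{|W|}}$ to the actions $a^{(k)}_{w_1}, a^{(k)}_{w_2}, \ldots, a^{(k)}_{w_{|W|}}$, which are combined ($\oplus$) into $a^{(k)}_t$.]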
  • 32. 20/33 Additive Structure Across Workflows: Example
      • Setup: $V = \{1, 2, 3\}$, $E = \{(1, 2)\}$, $W = \{w_1, w_2\}$, $Z = \{1, 2\}$.
      • [Figure, panel a) IT infrastructure: nodes 1, 2, and 3 behind gw, spread over zones 1 and 2 and workflows $w_1$ and $w_2$.]
      • [Figure, panel b) Transition dependencies: for each node $i$, the next states $S^{(D)}_{t+1,i}$ and $S^{(A)}_{t+1,i}$ and the observation depend only on that node's own $S^{(D)}_{t,i}$, $S^{(A)}_{t,i}$, $A^{(D)}_{t,i}$, and $A^{(A)}_{t,i}$.]
      • [Figure, panel c) Utility dependencies: edges from $S^{(D)}_{t,i}$, $S^{(A)}_{t,i}$, and $A^{(D)}_{t,i}$ to the utilities $U_{t,1}$, $U_{t,2}$, $U_{t,3}$.]
  • 34. 22/33 Optimal Substructure Within a Workflow
      • Nodes in the same workflow are utility dependent.
      • $\implies$ Locally-optimal strategies for each node cannot simply be added together to obtain an optimal strategy for the workflow.
      • However, the locally-optimal strategies satisfy the optimal substructure property.
      • $\implies$ There exists an algorithm for constructing an optimal workflow strategy from locally-optimal strategies for each node.
      • [Figure: the example IT infrastructure ($V = \{1, 2, 3\}$, $E = \{(1, 2)\}$, $W = \{w_1, w_2\}$, $Z = \{1, 2\}$) and its utility dependencies between the per-node variables $S^{(D)}_{t,i}$, $S^{(A)}_{t,i}$, $A^{(D)}_{t,i}$ and the utilities $U_{t,1}$, $U_{t,2}$, $U_{t,3}$.]
  • 36. 23/33 Algorithm for Combining Locally-Optimal Node Strategies into Optimal Workflow Strategies
      • [Figure: a tree of local defender strategies $\pi^{(1)}_D, \ldots, \pi^{(6)}_D$.]
      • $(\pi^{(i)}_D)_{i \in V_w}$: local strategies in the same workflow $w \in W$.
  • 38. 24/33 Algorithm for Combining Locally-Optimal Node Strategies into Optimal Workflow Strategies
      • [Figure: a tree of local defender strategies $\pi^{(1)}_D, \ldots, \pi^{(6)}_D$.]
      • We can redefine the utility function for each node $i$ to take into account the utility impact on its ancestors. For example, the utility of node 6 needs to include the utility impact on nodes 1, 3, and 5.
  • 39. 24/33 Algorithm for Combining Locally-Optimal Node Strategies into Optimal Workflow Strategies
      • [Figure: a tree of local defender strategies $\pi^{(1)}_D, \ldots, \pi^{(6)}_D$.]
      • We can prove that this utility transformation makes the nodes utility independent. $\implies$ Optimal substructure.
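The utility transformation can be illustrated on a small tree. The sketch below is built on assumptions: the parent map is a hypothetical tree chosen only so that node 6 has ancestors 5, 3, and 1 as in the example on the slides (the tree in the figure is not recoverable from the text), and the ancestor impact is modeled as a plain sum of per-node utilities, which is a simplification and not necessarily the transformation used by the authors.

```python
# Hypothetical workflow tree: child -> parent. Node 1 is the root.
PARENT = {2: 1, 3: 1, 4: 3, 5: 3, 6: 5}


def ancestors(parent, node):
    """Return the ancestors of `node`, nearest first."""
    chain = []
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain


def transformed_utility(parent, local_utility, node):
    """Local utility of `node` plus the utilities of all its ancestors.

    Summation is an illustrative assumption about how the ancestor
    impact enters the redefined utility.
    """
    return local_utility[node] + sum(local_utility[a] for a in ancestors(parent, node))


if __name__ == "__main__":
    utilities = {n: 1.0 for n in range(1, 7)}
    print(ancestors(PARENT, 6))
    print(transformed_utility(PARENT, utilities, 6))
```

Because every node's redefined utility already accounts for the nodes above it, each node-level problem can be optimized using only quantities along its own path to the root.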
  • 41. 26/33 Threshold Properties of Local Defender Strategies
      • The local problem of the defender can be decomposed in the temporal domain as
        $$\max_{\pi_D} \sum_{t=1}^{T} J = \max_{\pi_D} \sum_{t=1}^{\tau_1} J_1 + \sum_{t=1}^{\tau_2} J_2 + \ldots \qquad (2)$$
        where $\tau_1, \tau_2, \ldots$ are stopping times.
      • $\implies$ (1) selection of defensive actions is simplified; and (2) the optimal stopping times are given by a threshold strategy that can be estimated efficiently.
      • [Figure: the belief space $B^{(j)}_D$ as a simplex with corners $(1, 0, 0)$ ($j$ healthy), $(0, 1, 0)$ ($j$ discovered), and $(0, 0, 1)$ ($j$ compromised); a switching curve $\Upsilon$ separates the continuation set $C$ from the stopping set $S$.]
  • 42. 27/33 Threshold Properties of Local Defender Strategies
      • [Figure: the belief space $B^{(j)}_D$ as a simplex with corners $(1, 0, 0)$ ($j$ healthy), $(0, 1, 0)$ ($j$ discovered), and $(0, 0, 1)$ ($j$ compromised); a switching curve $\Upsilon$ separates the continuation set $C$ from the stopping set $S$.]
      • A node can be in three attack states $s^{(A)}_t$: Healthy, Discovered, Compromised.
      • The defender has a belief state $b^{(D)}_t$.
  • 43. 28/33 Proof Sketch (Threshold Properties)
      • Let $L(e_1, \hat{b})$ denote the line segment that starts at the belief state $e_1 = (1, 0, 0)$ and ends at $\hat{b}$, where $\hat{b}$ is in the sub-simplex that joins $e_2$ and $e_3$.
      • All beliefs on $L(e_1, \hat{b})$ are totally ordered according to the Monotone Likelihood Ratio (MLR) order. $\implies$ A threshold belief state $\alpha_{\hat{b}} \in L(e_1, \hat{b})$ exists where the optimal strategy switches from $C$ to $S$.
      • Since the entire belief space can be covered by the union of lines $L(e_1, \hat{b})$, the threshold belief states $\alpha_{\hat{b}_1}, \alpha_{\hat{b}_2}, \ldots$ yield a switching curve $\Upsilon$.
      • [Figure: the belief space $B^{(j)}_D$ (the 2-dimensional unit simplex) with corners $e_1 = (1, 0, 0)$, $e_2 = (0, 1, 0)$, $e_3 = (0, 0, 1)$; points $\hat{b}_1, \ldots, \hat{b}_9$ on the sub-simplex $B^{(j)}_{D,e_1}$ joining $e_2$ and $e_3$; the line $L(e_1, \hat{b}_5)$; the switching curve $\Upsilon$; and the threshold belief state $\alpha_{\hat{b}_9}$.]
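The line construction in the proof sketch suggests a simple way to evaluate a threshold strategy. Any belief $b$ other than $e_1$ lies on exactly one segment $L(e_1, \hat{b})$ and can be written as $b = (1 - \alpha) e_1 + \alpha \hat{b}$ with $\alpha = 1 - b_1$. The sketch below applies a stopping rule along that segment. The threshold function is a made-up placeholder for the switching curve, and reading "switches from $C$ to $S$" as "stop once $\alpha$ reaches the threshold" is an assumption about orientation.

```python
def decompose_belief(belief):
    """Write belief = (1 - alpha) * e1 + alpha * b_hat.

    Returns (alpha, b_hat); b_hat is None when the belief equals e1.
    """
    alpha = 1.0 - belief[0]
    if alpha <= 0.0:
        return 0.0, None
    b_hat = (0.0, belief[1] / alpha, belief[2] / alpha)
    return alpha, b_hat


def threshold_action(belief, threshold_fn):
    """Return "stop" if the belief is at or beyond the threshold on its line."""
    alpha, b_hat = decompose_belief(belief)
    if b_hat is None:
        return "continue"  # the node is believed healthy with certainty
    return "stop" if alpha >= threshold_fn(b_hat) else "continue"


def example_threshold(b_hat):
    """Hypothetical switching curve: stop earlier when compromise is more likely."""
    return 0.8 - 0.4 * b_hat[2]


if __name__ == "__main__":
    for b in [(1.0, 0.0, 0.0), (0.5, 0.25, 0.25), (0.2, 0.4, 0.4)]:
        print(b, threshold_action(b, example_threshold))
```

A strategy of this form is described by one threshold per direction $\hat{b}$, which is what makes it cheap to parameterize and estimate compared with a general mapping from beliefs to actions.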
  • 44. 29/33 Scalable Learning through Decomposition
      • [Figure: speedup $S_n$ versus the number of parallel processes $n$ (1 to 10) for $|V| = 10$; curves: linear and measured.]
      • Speedup of completion time when computing best response strategies for the decomposed game with $|V| = 10$ nodes and different numbers of parallel processes. The subproblems in the decomposition are split evenly across the processes. Let $T_n$ denote the completion time when using $n$ processes; the speedup is then calculated as $S_n = T_1 / T_n$. The error bars indicate standard deviations from 3 measurements.
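The speedup metric $S_n = T_1 / T_n$ can be computed from measured completion times as in the sketch below. The timings in the usage example are invented for illustration and are not the measurements behind the figure.

```python
def speedup(completion_times):
    """Map each process count n to S_n = T_1 / T_n.

    completion_times is a dict {n: T_n} and must contain an entry for n = 1.
    """
    baseline = completion_times[1]
    return {n: baseline / t_n for n, t_n in sorted(completion_times.items())}


if __name__ == "__main__":
    # Hypothetical completion times in seconds.
    timings = {1: 100.0, 2: 50.0, 4: 40.0}
    for n, s_n in speedup(timings).items():
        print(f"n={n}: S_n={s_n:.2f} (linear speedup would be {n})")
```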
  • 45. 30/33 Decompositional Fictitious Play (DFSP)
      • [Figure: fictitious play as a sequence of best responses: $\tilde{\pi}_2 \in B_2(\pi_1)$ and $\tilde{\pi}_1 \in B_1(\pi_2)$, then $\tilde{\pi}'_2 \in B_2(\pi'_1)$ and $\tilde{\pi}'_1 \in B_1(\pi'_2)$, and so on until $\pi^*_2 \in B_2(\pi^*_1)$ and $\pi^*_1 \in B_1(\pi^*_2)$.]
      • Fictitious play: iterative averaging of best responses.
      • Learn best response strategies iteratively through the parallel solving of subgames in the decomposition.
      • Average best responses to approximate the equilibrium.
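The averaging idea behind fictitious play can be seen on a tiny zero-sum matrix game. The sketch below is classic fictitious play on matching pennies, not DFSP: there is no decomposition, no partial observability, and best responses are exact argmax and argmin operations instead of learned strategies. All names are hypothetical. The exploitability function measures how much the two players could gain by deviating from the averaged strategies, which is zero at an equilibrium.

```python
def fictitious_play(payoff, iterations):
    """Run fictitious play; the row player maximizes, the column player minimizes.

    Returns the empirical (averaged) strategies of both players.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    for _ in range(iterations):
        # Best responses against the opponent's empirical action counts.
        row_values = [sum(payoff[i][j] * col_counts[j] for j in range(n_cols))
                      for i in range(n_rows)]
        col_values = [sum(payoff[i][j] * row_counts[i] for i in range(n_rows))
                      for j in range(n_cols)]
        best_row = max(range(n_rows), key=row_values.__getitem__)
        best_col = min(range(n_cols), key=col_values.__getitem__)
        row_counts[best_row] += 1
        col_counts[best_col] += 1
    return ([c / iterations for c in row_counts],
            [c / iterations for c in col_counts])


def exploitability(payoff, row_strategy, col_strategy):
    """Best-response value of the maximizer minus that of the minimizer."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    best_row = max(sum(payoff[i][j] * col_strategy[j] for j in range(n_cols))
                   for i in range(n_rows))
    best_col = min(sum(payoff[i][j] * row_strategy[i] for i in range(n_rows))
                   for j in range(n_cols))
    return best_row - best_col


if __name__ == "__main__":
    matching_pennies = [[1.0, -1.0], [-1.0, 1.0]]
    x, y = fictitious_play(matching_pennies, iterations=2000)
    print(x, y, exploitability(matching_pennies, x, y))
```

In the deck's setting the exact best responses would be replaced by best response strategies learned for the subgames of the decomposition, with the averaging step unchanged in spirit.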
  • 46. 31/33 Learning Equilibrium Strategies
      • [Figure, left panel: approximate exploitability $\hat{\delta}$ versus running time (h), 0 to 100 hours, with the level $\hat{\delta} = 0.4$ marked.]
      • [Figure, right panel: defender utility per episode versus running time (h), 0 to 100 hours; legend: dfsp simulation, dfsp digital twin, upper bound, $o_{i,t} > 0$, random defense.]
      • Learning curves obtained during training of DFSP to find optimal (equilibrium) strategies in the intrusion response game; red and blue curves relate to DFSP; black, orange, and green curves relate to baselines.
  • 47. 32/33 Comparison with NFSP
      • [Figure: approximate exploitability versus running time (h), 0 to 80 hours, for DFSP and NFSP.]
      • Learning curves obtained during training of DFSP and NFSP to find optimal (equilibrium) strategies in the intrusion response game; the red curve relates to DFSP and the purple curve relates to NFSP; all curves show simulation results.
  • 48. 33/33 Conclusions
      • We study an intrusion response use case.
      • We formulate the use case as a POSG.
      • We design a novel decompositional approach to approximate equilibria.
      • We show that the decomposition allows scalable approximation of equilibria.
      • [Figure: the overall framework, with the stages Target System, Selective Replication, Emulation, System Identification, Model Creation, Simulation, Learning, Strategy Mapping $\pi$, and Strategy Implementation $\pi$.]