Clinical data based optimal STI strategies for
HIV: a reinforcement learning approach
Damien Ernst
Department of Electrical Engineering and Computer Science
University of Liège
Montefiore - March 9, 2006
Presentation based on the paper: “Clinical data based optimal STI
strategies for HIV: a reinforcement learning approach”. D. Ernst, G.B.
Stan, J. Gonçalves and L. Wehenkel.
Damien Ernst Clinical data .... (1/22)
HIV
Human Immunodeficiency Virus (HIV) is a retrovirus at the
source of the Acquired Immune Deficiency Syndrome (AIDS)
HIV particles target cells of the immune system (mostly CD4+
lymphocytes and macrophages)
Entry of HIV particles into immune cells leads to massive
production of new viral particles, death of the infected cells
and, ultimately, devastation of the immune system
Current anti-HIV drugs
Two main categories:
1. Reverse Transcriptase Inhibitors (RTI)
2. Protease Inhibitors (PI)
Figure: Taken from http://www.cellsalive.com/hiv0.htm
Treatments for infected patients
Highly Active Anti-Retroviral Therapy (HAART): combination
of two or more drugs. Usually one or more RTIs in
combination with a PI.
Two main concerns about the long-term use of antiretroviral
drugs: undesirable side effects (leading to poor compliance)
and mutation of the virus (need to change drugs or even
inability to find appropriate pharmaceutical treatments).
Need for efficient drug-scheduling strategies.
Ideally, a drug-scheduling strategy should bring the
system to a state where the immune system has control over
the virus (with a low amount of drugs and low systemic effects).
Structured Treatment Interruption (STI)
STI: to cycle the patient on and off drug therapy
STI strategies are often well received by patients since they offer
them periods of relief from treatment
In some remarkable cases, STI strategies have enabled the
patients to maintain immune control over the virus in the
absence of treatment
Goal of this research: to compute optimal STI strategies
STI: A glimpse at today’s practice
If the CD4+ cell count falls below a certain threshold, put the patient
on drugs. Otherwise, take the patient off drugs. This practice has met some
problems:
Figure: Taken from
http://www.cpcra.org/docs/pubs/2006/croi2006-smart.pdf
More advanced techniques (not clinically tested)
Some authors have proposed designing STI treatments by
exploiting mathematical models of the HIV infection.
The models take the form of a set of Ordinary Differential
Equations (ODEs).
STI strategies are then deduced by using methods from
control theory.
But modelling the HIV dynamics is a difficult task. Indeed, one
has
to select the right parametric system of ODEs
to fit the parameters so that the model quantitatively reflects biological
observations
An interesting alternative
Infer directly from clinical data good STI strategies, without
modelling the HIV infection dynamics.
Clinical data: time evolution of the patient’s state (CD4+ T cell
count, systemic costs of the drugs, etc.) recorded at
discrete time instants, together with the sequence of drugs administered.
Clinical data can be seen as trajectories of the immune system
responding to treatment.
Inferring policies from trajectories
The problem of inferring an appropriate control policy from
trajectories has been studied in control theory and computer
science.
One way to approach it: state an optimality criterion and
search for strategies optimizing this criterion.
Classical approach: infer a model, then derive an optimal strategy
from the model and the optimality criterion.
Reinforcement learning approach: compute optimal strategies
directly from the trajectories, without identifying a model.
[Figure: flow diagram, read from top to bottom as follows.]
A pool of HIV infected patients.
The patients follow some (possibly suboptimal) STI protocols and are monitored at regular intervals.
The monitoring of each patient generates a trajectory for the optimal STI problem, which typically contains the following information:
state of the patient at time t0
drugs taken by the patient between t0 and t1 = t0 + n days
state of the patient at time t1
drugs taken by the patient between t1 and t2 = t1 + n days
state of the patient at time t2
drugs taken by the patient between t2 and t3 = t2 + n days
The trajectories are processed by using reinforcement learning techniques.
Processing of the trajectories gives some (near) optimal STI strategies, often under the form of a mapping between the state of the patient at a given time and the drugs he has to take till the next time his state is monitored.
Figure: Determination of optimal STI strategies from clinical data by
using reinforcement learning algorithms: the overall principle.
Learning from a sample of trajectories: the RL approach
Problem formulation
Discrete-time dynamics:
$x_{t+1} = f(x_t, u_t)$, $t = 0, 1, \ldots$
where $x_t \in X$ and $u_t \in U$.
Cost function: $c(x, u) : X \times U \to \mathbb{R}$, with $c(x, u)$ bounded by $B_c$.
Discounted infinite horizon cost associated with a stationary policy
$\mu : X \to U$: $J^{\mu}(x) = \lim_{N \to \infty} \sum_{t=0}^{N-1} \gamma^t c(x_t, \mu(x_t))$, with $x_0 = x$.
Optimal stationary policy $\mu^*$: policy that minimizes $J^{\mu}$ for all $x$.
Objective: find an optimal policy $\mu^*$.
We do not know: the discrete-time dynamics.
We know instead: a set of trajectories $(x_0, u_0, x_1, \ldots, u_{T-1}, x_T)$.
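As a rough illustration of the optimality criterion on this slide, the Python sketch below estimates the discounted cost of a stationary policy by truncating the infinite sum after a finite number of steps. The toy dynamics `toy_f`, the cost `toy_c` and the policy `always_on` are hypothetical and unrelated to HIV; they are only there to make the example runnable. In the data-driven setting of this talk f is unknown, so such a direct evaluation is only possible on a simulator.

```python
def discounted_cost(f, c, mu, x0, gamma, horizon):
    """Truncated estimate of J^mu(x0) = sum_{t=0}^{horizon-1} gamma^t c(x_t, mu(x_t))."""
    total, x = 0.0, x0
    for t in range(horizon):
        u = mu(x)                      # action prescribed by the stationary policy
        total += (gamma ** t) * c(x, u)
        x = f(x, u)                    # x_{t+1} = f(x_t, u_t)
    return total


# Hypothetical toy problem (not from the slides):
# the scalar state is halved when u = 1 and left unchanged when u = 0.
def toy_f(x, u):
    return 0.5 * x if u == 1 else x


def toy_c(x, u):
    return x + 0.1 * u                 # penalise the state and, mildly, the action


def always_on(x):
    return 1


if __name__ == "__main__":
    print(discounted_cost(toy_f, toy_c, always_on, x0=1.0, gamma=0.5, horizon=200))
```

Because the cost is bounded by B_c, stopping the sum after N steps changes the result by at most γ^N B_c / (1 − γ), which is why a finite horizon is an acceptable approximation here.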
Some dynamic programming results
Sequence of functions $Q_N : X \times U \to \mathbb{R}$
$Q_N(x, u) = c(x, u) + \gamma \min_{u' \in U} Q_{N-1}(f(x, u), u'), \quad \forall N > 1$
with $Q_1(x, u) \equiv c(x, u)$, converges to the Q-function, unique
solution of the Bellman equation:
$Q(x, u) = c(x, u) + \gamma \min_{u' \in U} Q(f(x, u), u')$.
Necessary and sufficient optimality condition:
$\mu^*(x) \in \arg\min_{u \in U} Q(x, u)$
Stationary policy $\mu^*_N$:
$\mu^*_N(x) \in \arg\min_{u \in U} Q_N(x, u)$.
Bound on the suboptimality of $\mu^*_N$:
$J^{\mu^*_N} - J^{\mu^*} \le \frac{2 \gamma^N B_c}{(1 - \gamma)^2}$.
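The recursion on this slide can be sketched in a few lines of Python when X and U are small finite sets and f is known. The five-state chain below (`chain_f`, `chain_c`) is a hypothetical toy problem chosen only so that the result is easy to check by hand; it is not the HIV problem.

```python
def q_iteration(states, actions, f, c, gamma, n_iter):
    """Return Q_N (N = n_iter) as a dict {(x, u): value}, starting from Q_1 = c."""
    q = {(x, u): c(x, u) for x in states for u in actions}
    for _ in range(n_iter - 1):
        # Q_N(x, u) = c(x, u) + gamma * min_{u'} Q_{N-1}(f(x, u), u')
        q = {
            (x, u): c(x, u) + gamma * min(q[(f(x, u), v)] for v in actions)
            for x in states
            for u in actions
        }
    return q


def greedy_policy(q, states, actions):
    """mu_N(x) in argmin_u Q_N(x, u)."""
    return {x: min(actions, key=lambda u: q[(x, u)]) for x in states}


# Hypothetical toy chain: states 0..4, moves of -1 or +1, cost equal to the state.
STATES = range(5)
ACTIONS = (-1, 1)


def chain_f(x, u):
    return min(4, max(0, x + u))


def chain_c(x, u):
    return float(x)


if __name__ == "__main__":
    q50 = q_iteration(STATES, ACTIONS, chain_f, chain_c, gamma=0.9, n_iter=50)
    print(greedy_policy(q50, STATES, ACTIONS))
```

On this chain the greedy policy moves towards state 0 from every state, which is what one expects when the cost is the state itself.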
Fitted Q iteration
Trajectories $(x_0, u_0, x_1, \ldots, u_{T-1}, x_T)$ transformed into a set of
one-step system transitions $F = \{(x_t^l, u_t^l, x_{t+1}^l)\}_{l=1}^{\#F}$.
Fitted Q iteration computes from $F$ the functions $\hat{Q}_1, \hat{Q}_2, \ldots, \hat{Q}_N$,
approximations of $Q_1, Q_2, \ldots, Q_N$.
Computation done iteratively by solving a sequence of standard
supervised learning (SL) problems. Training sample for the kth
($k \ge 2$) problem is
$\left\{ \left( (x_t^l, u_t^l),\; c(x_t^l, u_t^l) + \gamma \min_{u \in U} \hat{Q}_{k-1}(x_{t+1}^l, u) \right) \right\}_{l=1}^{\#F}$
with $\hat{Q}_1(x, u) \equiv c(x, u)$. From the kth training sample, the supervised
learning algorithm outputs $\hat{Q}_k$.
$\hat{\mu}^*_N(x) \in \arg\min_{u \in U} \hat{Q}_N(x, u)$ is taken as approximation of $\mu^*(x)$.
In our simulations, the SL method used is an ensemble of regression
trees method named Extra-Trees.
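The loop described on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: to stay dependency-free, a one-nearest-neighbour regressor (`fit_nearest_neighbour`, a hypothetical helper) stands in for Extra-Trees, the state is a scalar, and the toy transitions are generated from a five-state chain rather than from patients.

```python
def to_transitions(trajectory):
    """Split a flat trajectory [x0, u0, x1, ..., u_{T-1}, x_T] into (x, u, x_next) tuples."""
    return [
        (trajectory[i], trajectory[i + 1], trajectory[i + 2])
        for i in range(0, len(trajectory) - 2, 2)
    ]


def fit_nearest_neighbour(samples):
    """Stand-in supervised learner (NOT Extra-Trees): 1-nearest-neighbour in x, per action.

    `samples` is a list of ((x, u), target) pairs with scalar x.
    Every queried action must appear at least once in the samples.
    """
    def predict(x, u):
        same_action = [(xi, y) for (xi, ui), y in samples if ui == u]
        _, y = min(same_action, key=lambda s: abs(s[0] - x))
        return y
    return predict


def fitted_q_iteration(transitions, actions, c, gamma, n_iter, fit=fit_nearest_neighbour):
    """Return an approximation Q_hat_N(x, u), with Q_hat_1 = c."""
    q_hat = c
    for _ in range(n_iter - 1):
        prev = q_hat
        # Training sample of the k-th supervised learning problem.
        samples = [
            ((x, u), c(x, u) + gamma * min(prev(x_next, v) for v in actions))
            for x, u, x_next in transitions
        ]
        q_hat = fit(samples)
    return q_hat


# Hypothetical toy data: every (state, action) pair of a 5-state chain observed once.
ACTIONS = (-1, 1)


def toy_cost(x, u):
    return float(x)


toy_transitions = [(x, u, min(4, max(0, x + u))) for x in range(5) for u in ACTIONS]
q_hat = fitted_q_iteration(toy_transitions, ACTIONS, toy_cost, gamma=0.9, n_iter=30)
policy = {x: min(ACTIONS, key=lambda u: q_hat(x, u)) for x in range(5)}
```

Any regressor exposing a fit-then-predict interface could be passed as `fit`; the choice mainly affects how well the approximations generalise to states that were never observed.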
Illustration
We present results obtained by applying the RL-based
approach to artificially generated data.
The example is directly inspired by
B.M. Adams, H.T. Banks, Hee-Dae Kwon and H.T. Tran.
(2004). “Dynamic multidrug therapies for HIV: Optimal and
STI Control Approaches”. Mathematical Biosciences and
Engineering, 1, 223-241.
Illustration: Kinds of STI strategies targeted
Bi-therapy treatments combining a fixed RTI and a fixed PI.
Revise drug administration every five days based on clinical
measurements.
Four possible on-off combinations for the next five days: RTI and
PI on, only RTI on, only PI on, RTI and PI off
We seek STI strategies that minimize $J^{\mu}$.
Instantaneous cost at time t:
$c(x_t, u_t) = 0.1 V_t + 20000\, \epsilon_{1t}^2 + 2000\, \epsilon_{2t}^2 - 1000 E_t$
$\epsilon_{1t} = 0.7$ (resp. $\epsilon_{1t} = 0$) if the RTI is cycled on (resp. off) at time t
$\epsilon_{2t} = 0.3$ (resp. $\epsilon_{2t} = 0$) if the PI is cycled on (resp. off) at time t
$V$: number of free HIV particles
$E$: number of cytotoxic T-lymphocytes
Decay factor $\gamma$: chosen equal to 0.98.
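For concreteness, the instantaneous cost and the four on-off combinations from this slide can be written as a small Python sketch. The function and constant names are illustrative choices, not taken from the paper; the numerical coefficients are those shown on the slide.

```python
EPS_RTI_ON = 0.7   # epsilon_1 when the RTI is cycled on
EPS_PI_ON = 0.3    # epsilon_2 when the PI is cycled on
GAMMA = 0.98       # decay factor

# The four on-off combinations, as (rti_on, pi_on) pairs.
ACTIONS = [(True, True), (True, False), (False, True), (False, False)]


def instantaneous_cost(V, E, rti_on, pi_on):
    """c(x_t, u_t) = 0.1 V + 20000 eps1^2 + 2000 eps2^2 - 1000 E."""
    eps1 = EPS_RTI_ON if rti_on else 0.0
    eps2 = EPS_PI_ON if pi_on else 0.0
    return 0.1 * V + 20000 * eps1 ** 2 + 2000 * eps2 ** 2 - 1000 * E


if __name__ == "__main__":
    for rti_on, pi_on in ACTIONS:
        print(rti_on, pi_on, instantaneous_cost(V=0.0, E=0.0, rti_on=rti_on, pi_on=pi_on))
```

The cost rewards a strong immune response (large E) and penalises both the viral load and the use of drugs, so a policy that keeps the patient permanently on both drugs pays the drug penalty at every decision step.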
Illustration: A mathematical model as substitute for
real-life patients
$\dot{T}_1 = \lambda_1 - d_1 T_1 - (1 - \epsilon_1) k_1 V T_1$
$\dot{T}_2 = \lambda_2 - d_2 T_2 - (1 - f \epsilon_1) k_2 V T_2$
$\dot{T}_1^* = (1 - \epsilon_1) k_1 V T_1 - \delta T_1^* - m_1 E T_1^*$
$\dot{T}_2^* = (1 - f \epsilon_1) k_2 V T_2 - \delta T_2^* - m_2 E T_2^*$
$\dot{V} = (1 - \epsilon_2) N_T \delta (T_1^* + T_2^*) - c V - [(1 - \epsilon_1) \rho_1 k_1 T_1 + (1 - f \epsilon_1) \rho_2 k_2 T_2] V$
$\dot{E} = \lambda_E + \frac{b_E (T_1^* + T_2^*)}{(T_1^* + T_2^*) + K_b} E - \frac{d_E (T_1^* + T_2^*)}{(T_1^* + T_2^*) + K_d} E - \delta_E E$
$T_1$ ($T_1^*$) = number of non-infected (infected) CD4+
lymphocytes
$T_2$ ($T_2^*$) = number of non-infected (infected) macrophages
$V$ = number of free HIV particles
$E$ = number of cytotoxic T-lymphocytes.
$\epsilon_1$ and $\epsilon_2$ = control actions corresponding to the RTI and the PI.
Period during which the RTI (resp. the PI) is administered to the
patient: $\epsilon_1$ (resp. $\epsilon_2$) is set equal to 0.7 (resp. 0.3).
RTI (resp. the PI) not administered: $\epsilon_1 = 0$ (resp. $\epsilon_2 = 0$).
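The model on this slide can be simulated with a few lines of Python. The sketch below encodes the six equations, with the control inputs called `eps1` (RTI) and `eps2` (PI), and advances them with a fixed-step fourth-order Runge-Kutta scheme. Two things are assumptions rather than content of the slides: the parameter values in `PARAMS`, which are quoted from memory of Adams et al. (2004) and should be checked against that paper, and the integrator and its step size, which are simply a convenient dependency-free choice.

```python
# ASSUMED parameter values (recalled from Adams et al., 2004; not given on the slides).
PARAMS = dict(
    lambda1=1e4, d1=0.01, k1=8e-7,
    lambda2=31.98, d2=0.01, k2=1e-4, f=0.34,
    delta=0.7, m1=1e-5, m2=1e-5,
    NT=100.0, c=13.0, rho1=1.0, rho2=1.0,
    lambdaE=1.0, bE=0.3, Kb=100.0, dE=0.25, Kd=500.0, deltaE=0.1,
)

UNINFECTED = (1e6, 3198.0, 0.0, 0.0, 0.0, 10.0)  # equilibrium (T1, T2, T1*, T2*, V, E)


def hiv_rhs(state, eps1, eps2, p=PARAMS):
    """Time derivatives of (T1, T2, T1*, T2*, V, E) for drug efficacies eps1, eps2."""
    T1, T2, T1s, T2s, V, E = state
    inf1 = (1 - eps1) * p["k1"] * V * T1            # infection rate of CD4+ cells
    inf2 = (1 - p["f"] * eps1) * p["k2"] * V * T2   # infection rate of macrophages
    Ts = T1s + T2s
    dT1 = p["lambda1"] - p["d1"] * T1 - inf1
    dT2 = p["lambda2"] - p["d2"] * T2 - inf2
    dT1s = inf1 - p["delta"] * T1s - p["m1"] * E * T1s
    dT2s = inf2 - p["delta"] * T2s - p["m2"] * E * T2s
    dV = ((1 - eps2) * p["NT"] * p["delta"] * Ts - p["c"] * V
          - (p["rho1"] * inf1 + p["rho2"] * inf2))
    dE = (p["lambdaE"] + p["bE"] * Ts / (Ts + p["Kb"]) * E
          - p["dE"] * Ts / (Ts + p["Kd"]) * E - p["deltaE"] * E)
    return (dT1, dT2, dT1s, dT2s, dV, dE)


def rk4_step(state, eps1, eps2, dt, p=PARAMS):
    """One classical Runge-Kutta step of length dt (in days)."""
    def shifted(base, slope, h):
        return tuple(b + h * s for b, s in zip(base, slope))
    s1 = hiv_rhs(state, eps1, eps2, p)
    s2 = hiv_rhs(shifted(state, s1, dt / 2), eps1, eps2, p)
    s3 = hiv_rhs(shifted(state, s2, dt / 2), eps1, eps2, p)
    s4 = hiv_rhs(shifted(state, s3, dt), eps1, eps2, p)
    return tuple(x + dt / 6 * (a + 2 * b + 2 * g + d)
                 for x, a, b, g, d in zip(state, s1, s2, s3, s4))


def simulate(state, eps1, eps2, days=5.0, dt=0.01, p=PARAMS):
    """Hold the drug levels constant for `days` days and return the final state."""
    for _ in range(int(round(days / dt))):
        state = rk4_step(state, eps1, eps2, dt, p)
    return state
```

With these assumed parameters the uninfected state (10^6, 3198, 0, 0, 0, 10) is a fixed point of the untreated model, which gives a quick sanity check; a five-day call such as `simulate(x, 0.7, 0.3)` then plays the role of one decision step with both drugs on.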
Illustration: Some insight into this model
In the absence of treatment, three physical equilibrium points:
1. uninfected state:
$(T_1, T_2, T_1^*, T_2^*, V, E) = (10^6, 3198, 0, 0, 0, 10)$
2. “healthy” locally stable equilibrium point
$(T_1, T_2, T_1^*, T_2^*, V, E) = (967839, 621, 76, 6, 415, 353108)$
(small viral load, high CD4+ T-lymphocyte count, high
HIV-specific cytotoxic T-cell count)
3. “non-healthy” locally stable equilibrium point
$(T_1, T_2, T_1^*, T_2^*, V, E) = (163573, 5, 11945, 46, 63919, 24)$
(T-cells depleted, viral load very high).
Illustration: Protocol for artificially generating the clinical
data
Monitoring of patients: every five days for 1000 days.
Medication: can be revised every five days based on the
information generated by the monitoring.
Iterative generation of the clinical data (ten iterations):
First iteration. Thirty patients in the “non-healthy” steady state.
Physiological data $(T_1, T_2, T_1^*, T_2^*, V, E)$ recorded and a
new type of medication randomly selected in $U$ every five
days. Monitoring of each patient generates a trajectory
$(x_0, u_0, x_1, \ldots, x_{199}, u_{199}, x_{200})$.
Second iteration. Only difference with the first iteration:
medication determined by the following STI strategy: in 85%
of the cases, use the strategy $\hat{\mu}^*_{400}$ computed by fitted Q iteration
on the previously generated trajectories; in the remaining 15%,
medication randomly selected in $U$.
Third to tenth iterations: same as the second iteration.
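The protocol above amounts to an exploration scheme: purely random medication in the first iteration, then a mix of the learned strategy (85%) and random choices (15%). A minimal Python sketch of one patient's trajectory generation is given below. The names `choose_medication` and `generate_trajectory`, and the `step` argument standing for "simulate the patient for five days", are hypothetical; the toy `step` used in the demo is not the HIV model.

```python
import random


def choose_medication(state, policy, actions, rng, explore_prob=0.15):
    """Pick the next five-day medication.

    With no policy yet (first iteration) or with probability `explore_prob`,
    draw uniformly from `actions`; otherwise follow the learned policy.
    """
    if policy is None or rng.random() < explore_prob:
        return rng.choice(actions)
    return policy(state)


def generate_trajectory(x0, step, policy, actions, rng, n_decisions=200):
    """Return the flat trajectory [x0, u0, x1, ..., u_{n-1}, x_n]."""
    trajectory = [x0]
    x = x0
    for _ in range(n_decisions):
        u = choose_medication(x, policy, actions, rng)
        x = step(x, u)             # patient state five days later
        trajectory += [u, x]
    return trajectory


if __name__ == "__main__":
    rng = random.Random(0)
    actions = [(True, True), (True, False), (False, True), (False, False)]
    # Hypothetical stand-in dynamics: count the number of drug-on periods so far.
    toy_step = lambda x, u: x + int(u[0]) + int(u[1])
    traj = generate_trajectory(0, toy_step, policy=None, actions=actions, rng=rng)
    print(len(traj))  # 200 decisions give 201 states and 200 actions
```

Keeping a share of random decisions in later iterations means the data set continues to cover state-action pairs that the current strategy would never try on its own.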
Illustration: Simulation results
[Figure: six panels plotting log10(T1), log10(T2), log10(T1*), log10(T2*), log10(V) and log10(E) against time in days.]
Figure: Solid curves (−) = patient following the STI
strategy; dashed curves (− −) = no interruption in the treatment;
dotted curves (− ·) = no treatment
[Figure: two panels showing the on/off status of the reverse transcriptase inhibitor and of the protease inhibitor against time in days.]
Figure: STI treatment for a patient treated from an early stage of infection.
Clinical data generated by 300 patients.
[Figure: infinite time horizon cost plotted against the number of patients (30, 60, 90, 120, 180, 240, 300).]
Figure: Influence of the number of patients on the infinite time horizon
cost corresponding to the computed STI strategies.
From numerically simulated data to real-life patients
We expect to face four main difficulties:
The HIV/immune system dynamics may differ from one
patient to another.
Difficulty of properly stating the optimal control problem
Partial observability
Corrupted measurements
Conclusions
Reinforcement learning algorithms seem to be promising tools
for extracting good STI strategies from clinical data.
A lot of work is, however, still needed!
But 40 million people are living with HIV/AIDS. Isn’t that a
good reason to keep working hard?
Figure: Taken from UNAIDS. AIDS epidemic update: December 2005.
“UNAIDS/05.19E”
