A Reinforcement Learning Approach for Hybrid Flexible Flowline Scheduling Problems
A Reinforcement Learning Approach to Solving
Hybrid Flexible Flowline Scheduling Problems
Bert Van Vreckem, Dmitriy Borodin, Wim De Bruyn, Ann Nowé
Authors
• Bert Van Vreckem, HoGent Business and Information
Management
bert.vanvreckem@hogent.be
• Dmitriy Borodin, OMPartners
dborodin@ompartners.com
• Wim De Bruyn, HoGent Business and Information
Management
wim.debruyn@hogent.be
• Ann Nowé, Artificial Intelligence Lab, Vrije Universiteit Brussel
ann.nowe@vub.ac.be
Contents
1 Hybrid Flexible Flowline Scheduling Problems
2 A Machine Learning Approach
3 Learning Permutations with Precedence Constraints
4 Experiments & results
5 Conclusion
Hybrid Flexible Flowline Scheduling Problems
Powerful model for complex real-life production scheduling problems.
In α/β/γ notation (Urlings, 2010):

    HFFL_m, ((RM^{(i)})_{i=1}^{(m)}) / M_j, rm, prec, S_{iljk}, A_{iljk}, lag / C_max

Flowline scheduling problems: jobs are processed in consecutive stages.
[Figure: jobs flow through Stage 1 → Stage 2 → Stage 3 → Stage 4]
Hybrid Flexible Flowline Scheduling Problems
Hybrid case: unrelated parallel machines
[Figure: four stages with unrelated parallel machines – Stage 1: M11–M13, Stage 2: M21–M22, Stage 3: M31–M34, Stage 4: M41–M42]
Hybrid Flexible Flowline Scheduling Problems
Flexible case: stages may be skipped
[Figure: the same flowline with Stage 3 skipped – only M11–M13, M21–M22 and M41–M42 are visited]
Hybrid Flexible Flowline Scheduling Problems
Other constraints: Machine eligibility
[Figure: only the eligible machines M11, M13, M21, M22, M31, M33 and M42 remain available]
Hybrid Flexible Flowline Scheduling Problems
Other constraints: Time lag between stages
[Figure: time lags between consecutive stages, Stage 1 – Stage 4]
Hybrid Flexible Flowline Scheduling Problems
Other constraints: Sequence-dependent setup times
[Figure: Gantt chart over time units 1–12 comparing the job orders J1→J2 and J2→J1 on machines M1 and M2]
Hybrid Flexible Flowline Scheduling Problems
Other constraints: Precedence relations between jobs
[Figure: the same Gantt chart, now with a precedence relation between jobs J1 and J2]
Hybrid Flexible Flowline Scheduling Problems
Precedence relations between jobs make the problem much harder: the MILP/CPLEX approach no longer works for larger instances (Urlings, 2010).
A Machine Learning Approach
Scheduling a Hybrid Flexible Flowline Scheduling Problem is done in two stages:
• Job permutations → Learning Automata
• Machine assignment → Earliest Preparation Next Stage (EPNS) (Urlings, 2010)
Reinforcement learning
At every discrete time step t:
• the agent perceives the environment state s(t)
• the agent chooses an action a(t) ∈ A = {a1, . . . , an} according to some policy
• the environment places the agent in a new state s(t + 1) and gives a reinforcement r(t)
• goal: learn a policy that maximizes the long-term cumulative reward Σ_t r(t)
[Figure: agent–environment loop – the agent receives state s and reward r, and sends action a to the environment]
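This interaction loop fits in a few lines of code. Below is a minimal sketch in Python; env and agent are abstract stand-ins, and their method names (reset, step, act, learn) are assumptions for illustration, not part of the slides.

def run_episode(env, agent, horizon):
    """Generic RL interaction loop: perceive s(t), choose a(t), receive r(t)
    and s(t+1), and accumulate the reward sum_t r(t)."""
    s = env.reset()
    total_reward = 0.0
    for t in range(horizon):
        a = agent.act(s)           # choose a(t) according to the current policy
        s, r = env.step(a)         # environment returns s(t+1) and r(t)
        agent.learn(s, a, r)       # adapt the policy to the reinforcement
        total_reward += r
    return total_reward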
Learning Automata (LA)
Reinforcement learning agents that choose an action according to a probability
distribution p(t) = (p1(t), . . . , pn(t)), with pi(t) = Prob[a(t) = ai] and
s.t. Σ_{i=1}^{n} pi(t) = 1:

    pi(0) = 1/n                                                              (1)

    pi(t + 1) = pi(t) + αrew r(t) (1 − pi(t)) − αpen (1 − r(t)) pi(t)        (2)
                if ai is the action taken at instant t

    pj(t + 1) = pj(t) − αrew r(t) pj(t) + αpen (1 − r(t)) (1/(n − 1) − pj(t))  (3)
                for all aj ≠ ai
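As a concrete reading of equations (1)–(3), here is a minimal sketch of such an automaton in Python/NumPy. It assumes rewards r(t) ∈ [0, 1]; the class name, the constructor defaults and the final renormalisation guard are additions of ours, not part of the slides.

import numpy as np

class LearningAutomaton:
    """Linear reward-penalty learning automaton (equations 1-3)."""

    def __init__(self, n_actions, a_rew=0.1, a_pen=0.5, rng=None):
        self.n = n_actions
        self.a_rew = a_rew
        self.a_pen = a_pen
        self.p = np.full(n_actions, 1.0 / n_actions)    # eq. (1): uniform start
        self.rng = rng or np.random.default_rng()

    def choose(self):
        """Sample an action index according to the current distribution p(t)."""
        return self.rng.choice(self.n, p=self.p)

    def update(self, action, r):
        """Apply eqs. (2)-(3) for a reward r in [0, 1] after taking `action`."""
        for j in range(self.n):
            if j == action:                              # eq. (2): chosen action
                self.p[j] += self.a_rew * r * (1 - self.p[j]) \
                             - self.a_pen * (1 - r) * self.p[j]
            else:                                        # eq. (3): other actions
                self.p[j] += -self.a_rew * r * self.p[j] \
                             + self.a_pen * (1 - r) * (1.0 / (self.n - 1) - self.p[j])
        self.p /= self.p.sum()                           # guard against round-off drift

Setting αpen = 0 turns this update into the Linear Reward-Inaction (L_R−I) scheme that PBSS uses below.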
Learning Automaton update
[Figure: bar charts of the action probabilities pi for four actions. E.g. action 3 was chosen; one panel shows the updated probabilities for r(t) = 1, another for r(t) = 0.]
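The same scenario can be replayed with the LearningAutomaton sketch above. The starting distribution and the learning rates below are illustrative choices, since the slide's exact bar heights are not given in the text.

import numpy as np

la = LearningAutomaton(4, a_rew=0.5, a_pen=0.5, rng=np.random.default_rng(0))
la.p = np.array([0.25, 0.25, 0.25, 0.25])   # illustrative starting distribution

la.update(2, r=1.0)      # "action 3" (0-based index 2) was chosen and rewarded
print(la.p)              # [0.125  0.125  0.625  0.125]  – p3 grows, the others shrink

la.p = np.array([0.25, 0.25, 0.25, 0.25])
la.update(2, r=0.0)      # the same action, now penalised
print(la.p)              # [0.2917 0.2917 0.125  0.2917] – p3 shrinks, the others grow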
Probabilistic Basic Simple Strategy (PBSS)
(Wauters, 2012)
• A LA is assigned to every position of the permutation
• The LAs play a dispersion game to choose unique actions, resulting in a permutation
• The quality of the solution is evaluated
• The probabilities are updated with the Linear Reward-Inaction LA update rule (αpen = 0):
  • better result than the best one so far: r(t) = 1
  • if not: r(t) = 0
• Repeat until convergence
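A minimal sketch of this loop, reusing the LearningAutomaton class from above. The user supplies evaluate(perm) → makespan; the retry-based dispersion game and the fixed iteration budget are simplifications of ours, not the authors' exact procedure.

import numpy as np

def pbss(evaluate, n_jobs, a_rew=0.1, iterations=2000, rng=None):
    """PBSS sketch: one LA per permutation position, L_R-I updates,
    reward only when the best makespan so far is improved."""
    rng = rng or np.random.default_rng()
    las = [LearningAutomaton(n_jobs, a_rew=a_rew, a_pen=0.0, rng=rng)
           for _ in range(n_jobs)]
    best_perm, best_ms = None, float("inf")

    for _ in range(iterations):
        # Dispersion game: resample until every position picked a different job.
        while True:
            perm = [la.choose() for la in las]
            if len(set(perm)) == n_jobs:
                break

        ms = evaluate(perm)                       # quality of the solution
        r = 1.0 if ms < best_ms else 0.0          # improvement => r(t) = 1
        if ms < best_ms:
            best_perm, best_ms = list(perm), ms
        for la, job in zip(las, perm):
            la.update(job, r)

    return best_perm, best_ms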
Probabilistic Basic Simple Strategy (PBSS)
• PBSS gives great results on several optimization problems that involve learning permutations,
• but it does not work well when precedence constraints are involved:
• PBSS only learns from positive experience (i.e. improving on previous solutions)
• it does not learn to avoid invalid permutations
Extending PBSS for precedence constraints
Updating the probabilities:
• If the job permutation is invalid, perform an update with r(t) = 0 and αpen > 0 for all agents that are involved in the violation of precedence constraints.
• If the job permutation is valid, perform an L_R−I update in all agents, with r(t) depending on the resulting makespan ms and the best makespan so far ms_best:
  • improved: r(t) = 1
  • equally good: r(t) = 1/2
  • worse: r(t) = ms_best / (2 · ms)
  • no valid schedule found: r(t) = 0
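A sketch of this update step, again building on the earlier LearningAutomaton and PBSS sketches. The function and argument names (for instance violating_positions, and signalling "no valid schedule" with ms = None) are our own conventions for illustration, not the paper's.

def penalty_update(la, action, a_pen=0.5):
    """Penalty part of eqs. (2)-(3) with r(t) = 0: the chosen action loses
    probability, the other actions gain."""
    for j in range(la.n):
        if j == action:
            la.p[j] -= a_pen * la.p[j]
        else:
            la.p[j] += a_pen * (1.0 / (la.n - 1) - la.p[j])
    la.p /= la.p.sum()

def extended_pbss_update(las, perm, ms, best_ms, violating_positions, a_pen=0.5):
    """One update step of the extended PBSS reward scheme."""
    if violating_positions:                      # invalid permutation:
        for pos in violating_positions:          # penalise only the agents involved
            penalty_update(las[pos], perm[pos], a_pen)
        return best_ms

    if ms is None:                               # no valid schedule found
        r = 0.0
    elif ms < best_ms:                           # improved
        r = 1.0
    elif ms == best_ms:                          # equally good
        r = 0.5
    else:                                        # worse
        r = best_ms / (2.0 * ms)

    for la, job in zip(las, perm):               # L_R-I update in all agents
        la.update(job, r)
    return best_ms if ms is None else min(ms, best_ms)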
Experiments
• HFFSP benchmark problems from (Ruiz et al., 2008), available at http://soa.iti.es/problem-instances
• problem sets with 5, 7, 9, 11, 13, 15 jobs; 96 instances in each set
• + other constraints that make the problems harder (precedence relations!)
• αrew = 0.1; αpen = 0.5 (no tuning)
• run until convergence, or for at most 300 seconds
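The stopping rule can be expressed as a small driver loop. A sketch follows; the convergence test (every automaton has concentrated nearly all probability on one action) is a common choice for LAs but is our assumption, since the slides do not spell out their criterion.

import time

def converged(las, threshold=0.99):
    """Assumed convergence test: every LA has become almost deterministic."""
    return all(la.p.max() >= threshold for la in las)

def run_with_budget(las, step, budget_s=300.0):
    """Run PBSS iterations until convergence or until the 300-second budget is spent."""
    start = time.monotonic()
    while not converged(las) and time.monotonic() - start < budget_s:
        step()          # one iteration: dispersion game, evaluation, update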
Results
Instance set      5        7        9        11       13       15      overall
mean RD (%)       0.0697   2.0131   1.1568   1.6565   3.7294   7.9189   2.7484
best RD (%)      -35.70   -24.71   -26.92   -21.10   -43.34   -10.46   -43.34
# improved        11       12       18       12        9        6       68
# equal           62       40       19       18        8        7      154
# worse           23       44       59       66       79       82      354
Results and Discussion
Contributions:
• Extension of PBSS for learning permutations with precedence constraints
• A simple model + RL approach can yield good-quality results for challenging HFFSP instances
Discussion & future work:
• Precedence relations do make the problem harder
• Parameter tuning
• Convergence
• Larger instances (50, 100 jobs)
• Explore possibilities for improvement in machine assignment
Thank you!
Questions?
bert.vanvreckem@hogent.be
http://www.slideshare.net/bertvanvreckem/