Development of a Reinforcement Learning-Based Optimization Model for Customer Order Scheduling with Missing Operations (Final Presentation)
[Figure: model overview. Labels: Stage 1: Fixed-size Training; Stage 2: Mixed-size Fine-tuning; Transformer Encoder; Transformer Decoder; Initial Solution; IS; Final Solution; <Train>; <Test>]
Parameter             Value
Batch size            32
Embedding dimension   128
Number of heads       8
Optimizer             AdamW
Learning rate         4e-4
Decay rate            1e-6
Number of epochs      2,200
Parameter                                                Value
Number of jobs, $N$                                      100, 150, 200, 300
Number of machines, $M$                                  5
Processing time of job $i$ on machine $k$, $p_{ik}$      $U[1, 100]$
Due date of job $i$, $d_i$                               $U[P(1 - TF - RDD/2),\ P(1 - TF + RDD/2)]$
Due date tardiness factor, $TF$                          0.35, 0.65
Due date range factor, $RDD$                             0.35

where $P = \frac{1}{M} \sum_{i=1}^{N} \sum_{k=1}^{M} p_{ik} a_{ik}$.

※ A larger $TF$ means tighter due dates, which makes tardiness more likely.
※ $a_{ik}$ is a binary matrix indicating whether each operation is missing.
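The instance-generation rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' generator: the function name `generate_instance`, the missing-operation probability `missing_prob`, the integer processing times, and the encoding `a[i][k] = 1` for a present operation (so that missing operations contribute nothing to $P$) are all assumptions that the slides do not specify.

```python
import random


def generate_instance(n_jobs, n_machines=5, tf=0.35, rdd=0.35,
                      missing_prob=0.2, seed=0):
    """Sample one instance following the parameter table (hypothetical helper)."""
    rng = random.Random(seed)

    # Processing times p[i][k] ~ U[1, 100]; integer values are an assumption.
    p = [[rng.randint(1, 100) for _ in range(n_machines)]
         for _ in range(n_jobs)]

    # a[i][k] = 1 if job i has an operation on machine k, 0 if it is missing.
    # Both this encoding and missing_prob are assumptions for illustration.
    a = [[0 if rng.random() < missing_prob else 1 for _ in range(n_machines)]
         for _ in range(n_jobs)]
    for row in a:
        if sum(row) == 0:  # assumption: every job keeps at least one operation
            row[rng.randrange(n_machines)] = 1

    # P = (1/M) * sum_i sum_k p_ik * a_ik
    big_p = sum(p[i][k] * a[i][k]
                for i in range(n_jobs)
                for k in range(n_machines)) / n_machines

    # d_i ~ U[P(1 - TF - RDD/2), P(1 - TF + RDD/2)]
    low = big_p * (1 - tf - rdd / 2)
    high = big_p * (1 - tf + rdd / 2)
    d = [rng.uniform(low, high) for _ in range(n_jobs)]
    return p, a, d, big_p


p, a, d, big_p = generate_instance(100, tf=0.65)
print(f"P = {big_p:.1f}, due dates in [{min(d):.1f}, {max(d):.1f}]")
```

With $TF = 0.65$ and $RDD = 0.35$ the due dates fall in $[0.175P, 0.525P]$, whereas $TF = 0.35$ gives $[0.475P, 0.825P]$, which is why the larger $TF$ yields the tighter setting.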
※ GPU: NVIDIA GeForce RTX 3080 Ti (12 GB)
※ CPU: Intel(R) Core(TM) i9-11900KF (3.50 GHz)
※ Training time: approx. 32 h
Solver
  MILP    IBM ILOG CPLEX Optimizer
  CP      IBM ILOG CP Optimizer
Heuristic
  EDD     Earliest Due Date
  FP      Framinan and Perez heuristic
  NEH     Nawaz-Enscore-Ham
  OMDD    Order-scheduling Modified Due Date
Metaheuristic
  JPO20   Job Position Oscillation δ=2
  SR2     Size-Reduction with Q=2
  DE      Differential Evolution
  BRKGA   Biased Random Key Genetic Algorithm
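As a concrete reference point for the simplest baseline in this list, the sketch below evaluates a job permutation and builds the EDD sequence. It assumes the usual customer order scheduling model, in which each machine processes its operations in the same job order and a job is complete once its last present operation finishes; the slides do not spell this model out, so treat it as an assumption. The names `total_tardiness` and `edd_sequence`, the encoding `a[i][k] = 1` for a present operation, and the toy data are hypothetical.

```python
def total_tardiness(sequence, p, a, d):
    """Total tardiness of a job permutation (assumed order-scheduling model)."""
    n_machines = len(p[0])
    machine_time = [0] * n_machines  # running load on each dedicated machine
    total = 0
    for i in sequence:
        for k in range(n_machines):
            # A missing operation (a[i][k] == 0) adds no load to machine k.
            machine_time[k] += p[i][k] * a[i][k]
        # Job i completes when its last present operation finishes.
        completion = max(
            (machine_time[k] for k in range(n_machines) if a[i][k]),
            default=0,
        )
        total += max(0, completion - d[i])
    return total


def edd_sequence(d):
    """Earliest Due Date: order jobs by non-decreasing due date."""
    return sorted(range(len(d)), key=lambda i: d[i])


# Toy instance: 3 jobs, 2 machines (illustrative values only).
p = [[3, 2], [1, 4], [2, 2]]
a = [[1, 1], [1, 0], [0, 1]]
d = [3, 1, 1]

print(total_tardiness([0, 1, 2], p, a, d))        # index order
print(total_tardiness(edd_sequence(d), p, a, d))  # EDD order
```

On this toy instance the EDD order gives a lower total tardiness than the index order. The same evaluator could score a sequence produced by any of the other listed methods.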
$RPD = \frac{T_A - T_{GA}}{T_{GA}} \times 100$
※ $T_A$: total tardiness obtained by the compared method
※ $T_{GA}$: total tardiness obtained by BRKGA
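The RPD formula is a one-line computation; a small sketch follows. The function name `rpd` is hypothetical, and the inputs are the mean objective values reported for the M5 J100 setting with TF = 0.35. Whether the reported RPD is computed from these means or averaged per instance is not stated in the slides, so small rounding differences are possible.

```python
def rpd(t_a, t_ga):
    """Relative percentage deviation of total tardiness t_a from the BRKGA baseline t_ga."""
    return (t_a - t_ga) / t_ga * 100.0


# Mean objective values for M5 J100, TF = 0.35.
brkga = 5675.1
print(round(rpd(5670.2, brkga), 2))  # MILP
print(round(rpd(5665.9, brkga), 2))  # Ours
```

A negative RPD means the method found a lower total tardiness than BRKGA.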
Loose due date (TF = 0.35)

                           M5 J100                      M5 J150                      M5 J200                      M5 J300
Method                  Obj      RPD  Time(s)        Obj      RPD  Time(s)        Obj      RPD  Time(s)        Obj      RPD  Time(s)
MILP                 5670.2   -0.09%    250.0    12116.7    2.47%    375.0    19088.1    7.00%    500.0   156190.4  342.93%    750.0
CP                   5860.1    3.26%    250.0    12284.2    3.89%    375.0    18575.6    4.13%    500.0    37906.3    7.50%    750.0
EDD                 11067.7   95.02%      < 1    22237.5   88.06%      < 1    35920.4  101.36%      < 1    67306.9   90.87%      < 1
FP                   7529.9   32.68%      < 1    15211.8   28.64%      < 1    23412.2   31.24%      < 1    47060.7   33.46%      < 1
NEH                  7824.7   37.88%      < 1    16833.6   42.36%      < 1    25784.6   44.54%      < 1    52310.0   48.34%      < 1
OMDD                 6316.3   11.30%      < 1    13248.9   12.04%      < 1    19926.4   11.70%      < 1    39368.9   11.64%      < 1
JPO20                5868.3    3.40%    250.0    12398.3    4.85%    375.0    19198.5    7.62%    500.0    38658.1    9.63%    750.0
SR2                  5962.7    5.07%    250.0    12272.7    3.79%    375.0    18468.1    3.52%    500.0    35822.1    1.59%    750.0
DE                   6374.9   12.33%    250.0    14491.9   22.56%    375.0    22837.4   28.02%    500.0    49692.1   40.92%    750.0
BRKGA (Base)         5675.1    0.00%    250.0    11824.7    0.00%    375.0    17839.3    0.00%    500.0    35263.1    0.00%    750.0
Ours                 5665.9   -0.16%      < 1    11803.7   -0.18%      < 1    17832.2   -0.04%      < 1    35085.6   -0.50%      < 1
Ours (IS 10)         5656.9   -0.32%      < 1    11796.0   -0.24%      < 1    17795.5   -0.25%      < 1    35018.0   -0.70%      < 1
Ours (IS 100)        5650.1   -0.44%      1.4    11798.2   -0.22%      2.2    17769.7   -0.39%      2.9    34989.3   -0.78%      5.1
Ours (IS 1000)       5648.3   -0.47%     13.7    11777.5   -0.40%     20.8    17766.2   -0.41%     28.5    34986.1   -0.79%     51.8
[Figure: Loose due date. Boxplot of the top five algorithms on the 300-job instances]
Tight due date (TF = 0.65)

                           M5 J100                      M5 J150                      M5 J200                      M5 J300
Method                  Obj      RPD  Time(s)        Obj      RPD  Time(s)        Obj      RPD  Time(s)        Obj      RPD  Time(s)
MILP                25088.3    0.67%    250.0    57344.6    0.96%    375.0    94896.9    1.94%    500.0   472110.6  135.56%    750.0
CP                  25676.3    3.02%    250.0    59389.1    4.56%    375.0    98444.5    5.76%    500.0   217089.5    8.32%    750.0
EDD                 47729.7   91.51%      < 1   106694.3   87.85%      < 1   178511.1   91.77%      < 1   384199.4   91.70%      < 1
FP                  32110.5   28.84%      < 1    73624.9   29.63%      < 1   122213.5   31.29%      < 1   258419.7   28.94%      < 1
NEH                 31363.6   25.85%      < 1    71269.3   25.48%      < 1   118292.7   27.08%      < 1   255311.3   27.39%      < 1
OMDD                27704.4   11.16%      < 1    62937.4   10.81%      < 1   104278.7   12.02%      < 1   225251.3   12.39%      < 1
JPO20               26344.6    5.71%    250.0    59540.5    4.83%    375.0   102757.2   10.39%    500.0   225251.3   12.39%    750.0
SR2                 26966.9    8.20%    250.0    60784.6    7.02%    375.0   102088.6    9.67%    500.0   224866.9   12.20%    750.0
DE                  28854.4   15.78%    250.0    69460.8   22.29%    375.0   116577.8   25.23%    500.0   258781.0   29.12%    750.0
BRKGA (Base)        24922.4    0.00%    250.0    56798.3    0.00%    375.0    93087.3    0.00%    500.0   200418.2    0.00%    750.0
Ours                24905.7   -0.07%      < 1    56375.9   -0.74%      < 1    92602.7   -0.52%      < 1   199208.0   -0.60%      < 1
Ours (IS 10)        24903.1   -0.08%      < 1    56328.4   -0.83%      < 1    92534.6   -0.59%      < 1   199060.8   -0.68%      < 1
Ours (IS 100)       24877.8   -0.18%      1.4    56302.4   -0.87%      2.1    92492.4   -0.64%      2.7   198984.4   -0.72%      5.1
Ours (IS 1000)      24871.5   -0.20%     12.9    56290.3   -0.89%     22.4    92477.6   -0.65%     35.7   198977.0   -0.72%     80.6
[Figure: Tight due date. Boxplot of the top five algorithms on the 300-job instances]
[1] L. R. de Abreu et al., "A novel BRKGA for the customer order scheduling with missing operations to minimize total tardiness," Swarm and Evolutionary Computation, Vol. 75, 101149, 2022.
[2] F. Luo et al., "Neural combinatorial optimization with heavy decoder: Toward large scale generalization," in Advances in Neural Information Processing Systems, Vol. 36, pp. 8845–8864, 2023.
[3] Y.-D. Kwon et al., "POMO: Policy optimization with multiple optima for reinforcement learning," in Advances in Neural Information Processing Systems, Vol. 33, pp. 21188–21198, 2020.
[4] A. Vaswani et al., "Attention is all you need," in Advances in Neural Information Processing Systems, Vol. 30, 2017.