Development of a Reinforcement Learning-Based Optimization Model for Customer Order Scheduling with Missing Operations: Final Presentation

[Figure: Model overview. <Train> Stage 1: Fixed-size Training; Stage 2: Mixed-size Fine-tuning. <Test> Transformer Encoder, Transformer Decoder, Initial Solution, IS, Final Solution.]

Parameter             Value
Batch size            32
Embedding dimension   128
Number of heads       8
Optimizer             AdamW
Learning rate         4e-4
Decay rate            1e-6
Number of epochs      2,200
Parameter                                              Value
Number of jobs, $N$                                    100, 150, 200, 300
Number of machines, $M$                                5
Processing time of job $i$ on machine $k$, $p_{ik}$    $U[1, 100]$
Due date for job $i$, $d_i$                            $U[P(1 - TF - RDD/2),\ P(1 - TF + RDD/2)]$
Due date tardiness factor, $TF$                        0.35, 0.65
Due date range factor, $RDD$                           0.35

$$P = \frac{\sum_{i=1}^{N} \sum_{k=1}^{M} p_{ik} a_{ik}}{M}$$
※ A larger $TF$ means tighter due dates, which makes tardiness more likely.
※ $a_{ik}$ is a binary matrix indicating whether an operation is missing.
※ GPU: NVIDIA GeForce RTX 3080 Ti (12GB)
※ CPU: Intel(R) Core(TM) i9-11900KF (3.50GHz)
※ Training time: approx. 32 h
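
The instance parameters above can be turned into a small generator. The following is a minimal sketch, not the authors' code. It assumes integer processing times, continuous due dates, and the convention that $a_{ik} = 1$ when job $i$ has an operation on machine $k$. The `missing_prob` argument and the rule that every job keeps at least one operation are hypothetical, since the slides do not state how missing operations are sampled.

```python
import random

def generate_instance(n_jobs, n_machines=5, tf=0.35, rdd=0.35,
                      missing_prob=0.2, seed=0):
    """Sample one instance from the parameter table.

    missing_prob is a hypothetical parameter: the slides do not say
    how often an operation is missing.
    """
    rng = random.Random(seed)
    # p[i][k] ~ U[1, 100]; integer values are an assumption
    p = [[rng.randint(1, 100) for _ in range(n_machines)]
         for _ in range(n_jobs)]
    # a[i][k] = 1 if job i has an operation on machine k (assumed convention)
    a = []
    for _ in range(n_jobs):
        row = [0 if rng.random() < missing_prob else 1
               for _ in range(n_machines)]
        if sum(row) == 0:
            # Assumption: every job keeps at least one operation
            row[rng.randrange(n_machines)] = 1
        a.append(row)
    # P = (sum_i sum_k p_ik * a_ik) / M
    big_p = sum(p[i][k] * a[i][k]
                for i in range(n_jobs)
                for k in range(n_machines)) / n_machines
    # d_i ~ U[P(1 - TF - RDD/2), P(1 - TF + RDD/2)]
    lo = big_p * (1 - tf - rdd / 2)
    hi = big_p * (1 - tf + rdd / 2)
    d = [rng.uniform(lo, hi) for _ in range(n_jobs)]
    return p, a, d, big_p
```

For example, `generate_instance(100, tf=0.65)` samples a tight-due-date instance with 100 jobs on 5 machines. Raising `tf` shifts the whole due-date window toward zero, which is why larger values produce more tardiness.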

Category        Method   Description
Solver          MILP     IBM ILOG CPLEX Optimizer
                CP       IBM ILOG CP Optimizer
Heuristic       EDD      Earliest Due Date
                FP       Framinan and Perez heuristic
                NEH      Nawaz-Enscore-Ham
                OMDD     Order-scheduling Modified Due Date
Metaheuristic   JPO20    Job Position Oscillation δ=2
                SR2      Size-Reduction with Q=2
                DE       Differential Evolution
                BRKGA    Biased Random Key Genetic Algorithm
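
To make the simplest baseline in this list concrete, the sketch below builds an EDD sequence and evaluates its total tardiness. The extracted slides do not spell out the objective evaluation, so the model here is an assumption based on the usual customer order scheduling setting: every machine processes the orders in the same sequence, an order skips machine $k$ when $a_{ik} = 0$, and an order is complete when its last operation finishes. The function names are illustrative only.

```python
def total_tardiness(sequence, p, a, d):
    """Total tardiness of an order sequence under the assumed model."""
    n_machines = len(p[0])
    clock = [0] * n_machines      # current finish time on each machine
    total = 0
    for i in sequence:
        completion = 0
        for k in range(n_machines):
            if a[i][k]:           # operation present on machine k
                clock[k] += p[i][k]
                completion = max(completion, clock[k])
        total += max(0, completion - d[i])
    return total

def edd_sequence(d):
    """Earliest Due Date rule: sort orders by non-decreasing due date."""
    return sorted(range(len(d)), key=lambda i: d[i])

# Tiny hand-made example: 3 orders, 2 machines, order 1 skips machine 1
p = [[3, 2], [4, 0], [1, 5]]
a = [[1, 1], [1, 0], [1, 1]]
d = [4, 6, 5]
seq = edd_sequence(d)
print(seq, total_tardiness(seq, p, a, d))  # [0, 2, 1] 4
```

EDD only looks at due dates and ignores processing times, which is consistent with it being the weakest method in the comparison.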
$$RPD = \frac{T_A - T_{GA}}{T_{GA}} \times 100$$
※ $T_A$: total tardiness obtained by the compared method
※ $T_{GA}$: total tardiness obtained by BRKGA
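
As a quick check of the RPD definition, the sketch below recomputes one entry from the reported mean objective values (CP against BRKGA for TF=0.35, 100 jobs). The function name is illustrative. The slides do not say whether RPD is averaged per instance or computed from mean objectives; computing it from the means reproduces the reported figure for this entry.

```python
def rpd(t_a, t_ga):
    """Relative percentage deviation of t_a from the BRKGA baseline t_ga."""
    return (t_a - t_ga) / t_ga * 100

# Mean objective values for TF=0.35, M5 J100: CP = 5860.1, BRKGA = 5675.1
print(f"{rpd(5860.1, 5675.1):.2f}%")  # 3.26%
```

A negative RPD means the compared method achieved lower total tardiness than BRKGA.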

Loose due date (TF=0.35)

                 M5 J100                     M5 J150                     M5 J200                     M5 J300
Method           Obj       RPD      Time(s)  Obj       RPD      Time(s)  Obj       RPD      Time(s)  Obj       RPD      Time(s)
MILP             5670.2    -0.09%   250.0    12116.7   2.47%    375.0    19088.1   7.00%    500.0    156190.4  342.93%  750.0
CP               5860.1    3.26%    250.0    12284.2   3.89%    375.0    18575.6   4.13%    500.0    37906.3   7.50%    750.0
EDD              11067.7   95.02%   < 1      22237.5   88.06%   < 1      35920.4   101.36%  < 1      67306.9   90.87%   < 1
FP               7529.9    32.68%   < 1      15211.8   28.64%   < 1      23412.2   31.24%   < 1      47060.7   33.46%   < 1
NEH              7824.7    37.88%   < 1      16833.6   42.36%   < 1      25784.6   44.54%   < 1      52310.0   48.34%   < 1
OMDD             6316.3    11.30%   < 1      13248.9   12.04%   < 1      19926.4   11.70%   < 1      39368.9   11.64%   < 1
JPO20            5868.3    3.40%    250.0    12398.3   4.85%    375.0    19198.5   7.62%    500.0    38658.1   9.63%    750.0
SR2              5962.7    5.07%    250.0    12272.7   3.79%    375.0    18468.1   3.52%    500.0    35822.1   1.59%    750.0
DE               6374.9    12.33%   250.0    14491.9   22.56%   375.0    22837.4   28.02%   500.0    49692.1   40.92%   750.0
BRKGA (Base)     5675.1    0.00%    250.0    11824.7   0.00%    375.0    17839.3   0.00%    500.0    35263.1   0.00%    750.0
Ours             5665.9    -0.16%   < 1      11803.7   -0.18%   < 1      17832.2   -0.04%   < 1      35085.6   -0.50%   < 1
Ours (IS 10)     5656.9    -0.32%   < 1      11796.0   -0.24%   < 1      17795.5   -0.25%   < 1      35018.0   -0.70%   < 1
Ours (IS 100)    5650.1    -0.44%   1.4      11798.2   -0.22%   2.2      17769.7   -0.39%   2.9      34989.3   -0.78%   5.1
Ours (IS 1000)   5648.3    -0.47%   13.7     11777.5   -0.40%   20.8     17766.2   -0.41%   28.5     34986.1   -0.79%   51.8

[Figure: Loose due date. Boxplot of the top 5 algorithms on the 300-job instances.]

Tight due date (TF=0.65)

                 M5 J100                     M5 J150                     M5 J200                     M5 J300
Method           Obj       RPD      Time(s)  Obj       RPD      Time(s)  Obj       RPD      Time(s)  Obj       RPD      Time(s)
MILP             25088.3   0.67%    250.0    57344.6   0.96%    375.0    94896.9   1.94%    500.0    472110.6  135.56%  750.0
CP               25676.3   3.02%    250.0    59389.1   4.56%    375.0    98444.5   5.76%    500.0    217089.5  8.32%    750.0
EDD              47729.7   91.51%   < 1      106694.3  87.85%   < 1      178511.1  91.77%   < 1      384199.4  91.70%   < 1
FP               32110.5   28.84%   < 1      73624.9   29.63%   < 1      122213.5  31.29%   < 1      258419.7  28.94%   < 1
NEH              31363.6   25.85%   < 1      71269.3   25.48%   < 1      118292.7  27.08%   < 1      255311.3  27.39%   < 1
OMDD             27704.4   11.16%   < 1      62937.4   10.81%   < 1      104278.7  12.02%   < 1      225251.3  12.39%   < 1
JPO20            26344.6   5.71%    250.0    59540.5   4.83%    375.0    102757.2  10.39%   500.0    225251.3  12.39%   750.0
SR2              26966.9   8.20%    250.0    60784.6   7.02%    375.0    102088.6  9.67%    500.0    224866.9  12.20%   750.0
DE               28854.4   15.78%   250.0    69460.8   22.29%   375.0    116577.8  25.23%   500.0    258781.0  29.12%   750.0
BRKGA (Base)     24922.4   0.00%    250.0    56798.3   0.00%    375.0    93087.3   0.00%    500.0    200418.2  0.00%    750.0
Ours             24905.7   -0.07%   < 1      56375.9   -0.74%   < 1      92602.7   -0.52%   < 1      199208.0  -0.60%   < 1
Ours (IS 10)     24903.1   -0.08%   < 1      56328.4   -0.83%   < 1      92534.6   -0.59%   < 1      199060.8  -0.68%   < 1
Ours (IS 100)    24877.8   -0.18%   1.4      56302.4   -0.87%   2.1      92492.4   -0.64%   2.7      198984.4  -0.72%   5.1
Ours (IS 1000)   24871.5   -0.20%   12.9     56290.3   -0.89%   22.4     92477.6   -0.65%   35.7     198977.0  -0.72%   80.6

[Figure: Tight due date. Boxplot of the top 5 algorithms on the 300-job instances.]

[1] L. R. de Abreu, M. J. B. Dias, P. M. O. Palma, and J. J. M. Ferreira, "A novel BRKGA for the customer order scheduling with
missing operations to minimize total tardiness," Swarm and Evolutionary Computation, Vol.75, pp.101149, 2022.
[2] F. Luo, S. Li, M. Wang, Y. Qin, and Z. Tang, "Neural combinatorial optimization with heavy decoder: Toward large scale
generalization," in Advances in Neural Information Processing Systems, Vol.36, pp.8845–8864, 2023.
[3] Y.-D. Kwon, S. Kim, and J. Park, "POMO: Policy optimization with multiple optima for reinforcement learning," in Advances
in Neural Information Processing Systems, Vol.33, pp.21188–21198, 2020.
[4] A. Vaswani et al., "Attention is all you need," in Advances in Neural Information Processing Systems, Vol.30, 2017.