機械学習と自動微分 (Machine Learning and Automatic Differentiation)
takigawa.ichigaku.8s@kyoto-u.ac.jp
Lecture 15 — January 26, 2024
Slido
https://app.sli.do/event/d13KtHT3VXxFbmLvxPSQjn
PDF
https://itakigawa.github.io/data/autograd.pdf
3
§
§
§
§
§ ( )
§ ( )
§ https://itakigawa.github.io/
4
1.
2.
3. ( )
5
§ 画像生成AIが爆速で進化した2023年をまとめて振り返る ("A look back at 2023, the year image-generation AI evolved at breakneck speed")
https://ascii.jp/elem/000/004/174/4174570/
2023 AI
6
§
AI
https://www.itmedia.co.jp/news/articles/2310/05/news179.html
7
AI
§
https://www.itmedia.co.jp/news/articles/2310/05/news179.html
8
Adobe Photoshop
https://www.adobe.com/jp/products/photoshop/generative-fill.html
9
§ ChatGPT: released November 2022, reached 100 million users in roughly 2 months
§ AI
§ GAFAM IT 2023
OpenAI ChatGPT
10
1.
-
-
-
2.
-
-
-
3.
-
-
- DIY
OpenAI ChatGPT
4.
-
-
5. GPTs
- DALL-E
-
-
6.
-
-
§
§ ChatGPT (ChatGPT )
13
§ Microsoft OpenAI ChatGPT DALL-E
AI
§ Riding the AI boom, Microsoft's market capitalization reached about 3 trillion dollars (≈440 trillion yen)
around Apple's scale — larger than the GDP of 🇬🇧🇫🇷🇮🇹 (the UK, France, or Italy)
§ Bing ( Edge) ChatGPT
ChatGPT Web
§ GitHub Copilot( )
§ ( )
§
§ Windows 11 Microsoft 365 Copilot/Copilot Pro
Microsoft Copilot ( )
14
§ AI Windows 11/ Microsoft 365 Copilot
§ AI assistance built into the MS Office apps (Word, Excel, PowerPoint, Outlook, Teams)
§
§ Word
§ Powerpoint
§ Powerpoint
§ Excel
§ Web
§ Windows
MS
15
§
16
AI
§
§ ( 2023 12 )
§ ChatGPT AI 2
§ http://hdl.handle.net/2433/286548
§
( ), ( )
§ 1
(Preferred Networks)
§ 2
( )
§
( )
1.
18
..
.
A B
( )
• /
•
•
19
コンピュータ
プログラム
入力 出力
my_function
x1
x2
y
20
( )
J aime la
musique I love music
(Software 2.0 )
21
=
(g) (cm)
(g)
(cm)
(cm)
(g)
or
22
=
or
(g)
(cm)
(cm)
(g)
23
=
Random Forest
Gaussian Process
Logistic Regression
P( =red)
0 1
or
24
Random Forest Neural Network SVR Kernel Ridge
p1 p2 p3 p4
…
( )
( )
( )
⾒
25
§ : 1 : 1
= ( )
26
Decision
Tree
Random
Forest GBDT
Nearest
Neighbor
Logistic
Regression
SVM
Gaussian
Process
Neural
Network
(
)
2.
28
vs ( )
q1 q2 q3 q4
…
Random
Forest
GBDT
Nearest
Neighbor
SVM
Gaussian
Process
Neural
Network
29
vs ( )
p1 p2 p3 p4
…
( )
q1 q2 q3 q4
…
30
§ Learning = optimization: pick a loss $\ell(f(x), y)$ measuring the gap between the prediction $f(x)$ and the target $y$
§ e.g. squared error (MSE) for regression
§ e.g. Cross Entropy for classification
§ Training = minimizing the total loss over the parameters $\theta$ (optimization):
$\mathrm{Minimize}_\theta \; L(\theta), \qquad L(\theta) = \sum_{i=1}^{n} \ell(f(x_i; \theta),\, y_i) + \Omega(\theta)$
31
データ (Data): $(x_1, y_1), (x_2, y_2), \cdots, (x_n, y_n)$, where each $y_i$ is 0 or 1
モデル (Model): $f(x) = \sigma(ax+b) = \dfrac{1}{1+e^{-(ax+b)}} = P(y=1 \mid x)$

$L(a, b) = -\log \prod_{i:\, y_i = 1} P(y = 1 \mid x_i) \prod_{i:\, y_i = 0} P(y = 0 \mid x_i)$

Cross Entropy = $(-1)\times$ the log-likelihood:
$= -\sum_{i=1}^{n} \left[\, y_i \log f(x_i) + (1 - y_i) \log(1 - f(x_i)) \,\right]$
32
1-input, 1-output model (logistic regression) with 3 data points $(x_1, y_1), (x_2, y_2), (x_3, y_3)$
Model (Linear → Sigmoid): $f(x) = \sigma(ax+b) = \dfrac{1}{1+e^{-(ax+b)}}$, where $\sigma(x) = \dfrac{1}{1+e^{-x}}$
Minimize $L(a, b)$:
$L(a,b) = -\sum_{i=1}^{3}\left[\, y_i \log \frac{1}{1+e^{-(a x_i + b)}} + (1 - y_i) \log\left(1 - \frac{1}{1+e^{-(a x_i + b)}}\right)\right]$
33
1-input, 1-output, 3 data points (1 hidden unit)
$z = \sigma(a_1 x + b_1) = \dfrac{1}{1+e^{-(a_1 x + b_1)}}, \qquad y = \sigma(a_2 z + b_2) = \dfrac{1}{1+e^{-(a_2 z + b_2)}}$
Minimize $L(a_1, a_2, b_1, b_2)$:
$L(a_1, a_2, b_1, b_2) = -\sum_{i=1}^{3}\left[\, y_i \log f(x_i) + (1 - y_i)\log(1 - f(x_i)) \,\right], \qquad f(x) = \sigma\!\big(a_2\, \sigma(a_1 x + b_1) + b_2\big)$
34
1-input, 1-output, 3 data points (2 hidden units)
$u_1 = a_1 x + b_1$ (Linear), $z_1 = \mathrm{Sigmoid}(u_1)$; $u_2 = a_2 x + b_2$ (Linear), $z_2 = \mathrm{Sigmoid}(u_2)$
$v = a_3 z_1 + b_3 z_2 + c_3$ (Linear), $y = \mathrm{Sigmoid}(v)$
Minimize $L(a_1, a_2, a_3, b_1, b_2, b_3, c_3)$
35
§ Modern models have huge numbers of parameters
Image recognition (CNN):
§ ResNet50: ~26 million
§ AlexNet: ~61 million
§ ResNeXt101: ~84 million
§ VGG19: ~143 million
Language models (Transformer):
§ LLaMA: 65 billion
§ Chinchilla: 70 billion
§ GPT-3: 175 billion
§ Gopher: 280 billion
§ PaLM: 540 billion
36
Minimize $L(\theta_1, \theta_2, \cdots)$ over the parameters $\theta_1, \theta_2, \cdots$:
1. Initialize $\theta_1, \theta_2, \cdots$
2. Compute the gradient of $L(\theta_1, \theta_2, \cdots)$ and update $\theta_1, \theta_2, \cdots$ slightly in the direction that decreases $L$
3. Repeat
(gradient-based optimization)
37
Partial derivative of $f(x, y)$ at $(a, b)$ with respect to $x$: fix $y = b$ and differentiate along $x$ at $x = a$:
$f_x(a, b) = \lim_{h \to 0} \frac{f(a+h,\, b) - f(a,\, b)}{h}$
$f_y(a, b)$ is defined analogously by fixing $x = a$ and varying $y$.
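The limit definition above also gives a numerical check: approximate the partial derivative with a small step h. A minimal sketch (the example function f is my own illustration, not from the slides):

```python
def partial_x(f, a, b, h=1e-6):
    # f_x(a, b) ≈ (f(a+h, b) - f(a, b)) / h  (forward difference)
    return (f(a + h, b) - f(a, b)) / h

# illustrative function: f(x, y) = x^2 y + y, so ∂f/∂x = 2xy
f = lambda x, y: x**2 * y + y
print(partial_x(f, 2.0, 3.0))  # ≈ 12
```

Shrinking h improves the approximation until floating-point cancellation takes over; automatic differentiation (the topic of this lecture) avoids that trade-off entirely.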
39
=
( )
40
§
41
Gradient descent = follow the negative gradient downhill
• starting from some $\boldsymbol{x}$, repeatedly move $\boldsymbol{x}$ slightly against the gradient of $f(\boldsymbol{x})$; the learning rate $\alpha > 0$ sets the step size:
$(x_1, x_2, \cdots, x_d)^\top \leftarrow (x_1, x_2, \cdots, x_d)^\top - \alpha \,(\partial f/\partial x_1,\; \partial f/\partial x_2,\; \cdots,\; \partial f/\partial x_d)^\top$
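The update rule can be sketched in a few lines (the example objective and its gradient are my own illustration; here the gradient is supplied analytically):

```python
import numpy as np

def gradient_descent(grad_f, x0, alpha=0.1, steps=100):
    # repeat: x <- x - alpha * grad f(x)
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = x - alpha * grad_f(x)
    return x

# illustrative objective: f(x1, x2) = (x1 - 1)^2 + (x2 + 2)^2
grad_f = lambda x: np.array([2 * (x[0] - 1), 2 * (x[1] + 2)])
print(gradient_descent(grad_f, [5.0, 5.0]))  # approaches (1, -2)
```

The only problem-specific ingredient is grad_f; automatic differentiation exists precisely to produce it mechanically.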
42
( )
43
§ If the learning rate is too large, the iterates overshoot and can diverge
§ If the lr is too small, progress is very slow
( )
( )
44
§
§
§ Momentum
§ AdaGrad
§ RMSProp
§ Adam
https://towardsdatascience.com/a-visual-
explanation-of-gradient-descent-methods-
momentum-adagrad-rmsprop-adam-f898b102325c
45
§
45
$x \to$ Linear$(a_1, b_1) \to u_1 \to$ Sigmoid $\to z_1$; $x \to$ Linear$(a_2, b_2) \to u_2 \to$ Sigmoid $\to z_2$; $(z_1, z_2) \to$ Linear$(a_3, b_3, c_3) \to v \to$ Sigmoid $\to y$
To apply gradient descent we need all the partial derivatives $\partial y/\partial a_1,\ \partial y/\partial b_1,\ \partial y/\partial c_1, \cdots$ — how do we compute them?
3. ( )
47
§ Chain rule
§ Single chain: if $u = f(x)$ and $y = g(u)$ (so $x \xrightarrow{f} u \xrightarrow{g} y$), then
$\frac{dy}{dx} = \frac{dy}{du}\,\frac{du}{dx}$
§ Multiple paths: if $x_1 = f_1(t),\ x_2 = f_2(t), \cdots$ and they feed into $z = g(x_1, x_2, \cdots)$, then
$\frac{\partial z}{\partial t} = \frac{\partial z}{\partial x_1}\frac{\partial x_1}{\partial t} + \frac{\partial z}{\partial x_2}\frac{\partial x_2}{\partial t} + \cdots$
(graphs: $x \xrightarrow{f} u \xrightarrow{g} y$; and $t \to x_1, x_2, \cdots \xrightarrow{g} z$)
48
§ Example: $y = e^{-2x}/x$. Put $a = e^{-x}$ and $b = 1/x$, so $y = a^2 b$.
Direct differentiation:
$\frac{\partial y}{\partial x} = \left(\frac{e^{-2x}}{x}\right)' = -\frac{e^{-2x}(2x+1)}{x^2}$
Chain rule (two paths, via $a$ and $b$):
$\frac{\partial y}{\partial x} = \frac{\partial y}{\partial a}\frac{\partial a}{\partial x} + \frac{\partial y}{\partial b}\frac{\partial b}{\partial x} = 2ab\,(-e^{-x}) + a^2\left(-\frac{1}{x^2}\right) = 2e^{-x}\cdot\frac{1}{x}\cdot(-e^{-x}) + (e^{-x})^2\left(-\frac{1}{x^2}\right) = -\frac{e^{-2x}(2x+1)}{x^2}$
(both routes give the same answer)
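As a sanity check, the chain-rule result for $y = e^{-2x}/x$ can be compared against a finite difference (a small sketch; the evaluation point is my own choice):

```python
import math

def y(x):
    return math.exp(-2 * x) / x

def dydx(x):
    # chain-rule result: dy/dx = -e^{-2x} (2x + 1) / x^2
    return -math.exp(-2 * x) * (2 * x + 1) / x**2

x0, h = 1.5, 1e-6
numeric = (y(x0 + h) - y(x0 - h)) / (2 * h)  # central difference
print(dydx(x0), numeric)  # the two values should agree closely
```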
49
Goal: compute $\partial y/\partial x$ for the 3-layer, 1-input network
$x \to$ Linear$(a_1, b_1) \to u_1 \to$ Sigmoid $\to z_1$; $x \to$ Linear$(a_2, b_2) \to u_2 \to$ Sigmoid $\to z_2$; $(z_1, z_2) \to$ Linear$(a_3, b_3, c_3) \to v \to$ Sigmoid $\to y$
50
For the 3-layer, 1-input network ($x \to u_1, u_2 \to z_1, z_2 \to v \to y$), the chain rule over its two paths gives
$\frac{dy}{dx} = \frac{du_1}{dx}\frac{dz_1}{du_1}\frac{dv}{dz_1}\frac{dy}{dv} + \frac{du_2}{dx}\frac{dz_2}{du_2}\frac{dv}{dz_2}\frac{dy}{dv}$
( )
51
Elementary pieces (two kinds: unary $f_i$, binary $g_i$) with known derivatives:
$f_1\!:\ v = 3\sqrt{u},\ \frac{dv}{du} = \frac{3}{2\sqrt{u}}$;  $f_2\!:\ v = \log u,\ \frac{dv}{du} = \frac{1}{u}$;  $f_3\!:\ v = 1/u,\ \frac{dv}{du} = -\frac{1}{u^2}$
$g_1\!:\ v = ab,\ \frac{\partial v}{\partial a} = b,\ \frac{\partial v}{\partial b} = a$;  $g_2\!:\ v = a^2 + b,\ \frac{\partial v}{\partial a} = 2a,\ \frac{\partial v}{\partial b} = 1$
Composite function built from them:
$y = g_1\big(g_2(f_3(x), f_2(x)),\, f_1(x)\big) = 3\sqrt{x}\left(\log x + \frac{1}{x^2}\right)$
52
Name the intermediates: $z_2 = f_1(x) = 3\sqrt{x}$, $z_3 = f_2(x) = \log x$, $z_4 = f_3(x) = 1/x$, $z_1 = g_2(z_4, z_3) = z_4^2 + z_3$, $y = g_1(z_1, z_2) = z_1 z_2$.
The graph $x \to (z_2, z_3, z_4)$, $(z_4, z_3) \to z_1$, $(z_1, z_2) \to y$ has three paths from $x$ to $y$, so
$\frac{dy}{dx} = \frac{dz_4}{dx}\frac{dz_1}{dz_4}\frac{dy}{dz_1} + \frac{dz_3}{dx}\frac{dz_1}{dz_3}\frac{dy}{dz_1} + \frac{dz_2}{dx}\frac{dy}{dz_2} = 3\sqrt{x}\left(\frac{1}{x} - \frac{2}{x^3}\right) + \frac{3\left(\log x + \frac{1}{x^2}\right)}{2\sqrt{x}}$
$y = g_1\big(g_2(f_3(x), f_2(x)),\, f_1(x)\big) = 3\sqrt{x}\left(\log x + \frac{1}{x^2}\right)$
53
Decompose the computation (step 1 of the key idea):
$z_2 = 3\sqrt{x}$, $z_3 = \log x$, $z_4 = 1/x$, $z_1 = z_4^2 + z_3$, $y = z_1 z_2$, so that $y = 3\sqrt{x}\left(\log x + \frac{1}{x^2}\right)$
Two things to compute at x = 1.2:
• the value of y (a forward sweep through the decomposition)
• dy/dx at x = 1.2 (a backward sweep)
54
§ By hand (for comparison):
$\frac{dy}{dx} = 3\sqrt{x}\left(\frac{1}{x} - \frac{2}{x^3}\right) + \frac{3\left(\log x + \frac{1}{x^2}\right)}{2\sqrt{x}}$
§ At $x = 1.2$: $\frac{dy}{dx} \approx 0.14$
55
$y = z_1 z_2$, $z_1 = z_4^2 + z_3$, $z_2 = 3\sqrt{x}$, $z_3 = \log x$, $z_4 = 1/x$
Forward: from x.data = 1.2, each node computes and stores its own data:
z₂ = 3√x = 3.29, z₃ = log x = 0.18, z₄ = 1/x = 0.83
56
Forward (continued): z₁ = z₄² + z₃ = 0.88
57
Forward (continued): y = z₁z₂ = 2.88 — every node now holds its data
58
$y = z_1 z_2$, $z_1 = z_4^2 + z_3$, $z_2 = 3\sqrt{x}$, $z_3 = \log x$, $z_4 = 1/x$
Backward: start at the output with $\partial y/\partial y = 1$, so y.grad = 1.00
59
Backward: $\partial y/\partial z_1 = z_2 = 3.29$ and $\partial y/\partial z_2 = z_1 = 0.88$, so z1.grad = 3.29 and z2.grad = 0.88
60
Backward: local derivatives $\partial z_1/\partial z_3 = 1$ and $\partial z_1/\partial z_4 = 2z_4 = 1.67$; by the chain rule $\frac{\partial y}{\partial z_3} = \frac{\partial y}{\partial z_1}\frac{\partial z_1}{\partial z_3}$ and $\frac{\partial y}{\partial z_4} = \frac{\partial y}{\partial z_1}\frac{\partial z_1}{\partial z_4}$
61
Backward: z3.grad = 3.29 × 1 = 3.29 and z4.grad = 3.29 × 1.67 = 5.48
62
Backward: $\partial z_2/\partial x = \frac{3}{2\sqrt{x}} = 1.37$; in full,
$\frac{\partial y}{\partial x} = \frac{\partial y}{\partial z_2}\frac{\partial z_2}{\partial x} + \frac{\partial y}{\partial z_1}\frac{\partial z_1}{\partial z_3}\frac{\partial z_3}{\partial x} + \frac{\partial y}{\partial z_1}\frac{\partial z_1}{\partial z_4}\frac{\partial z_4}{\partial x}$
63
Backward: $\partial z_3/\partial x = 1/x = 0.83$
64
Backward: $\partial z_4/\partial x = -1/x^2 = -0.69$
65
Backward: summing the three paths (rounded values shown), x.grad = 0.88 × 1.37 + 3.29 × 0.83 + 5.48 × (−0.69) ≈ 0.14
66
After one Forward pass and one Backward pass from y, every node holds both its value (data) and $\partial y/\partial(\text{node})$ (grad):
x: 1.2 / 0.14, z₂: 3.29 / 0.88, z₃: 0.18 / 3.29, z₄: 0.83 / 5.48, z₁: 0.88 / 3.29, y: 2.88 / 1.00
67
Each primitive op carries its own local Forward and Backward rule. For $z_2 = 3\sqrt{x}$:
Forward: z2.data ← 3 * sqrt(x.data)   (x.data = 1.2 gives z2.data = 3.29)
Backward: since $\frac{\partial z_2}{\partial x} = \frac{3}{2\sqrt{x}}$ and $\frac{\partial y}{\partial x} = \frac{\partial y}{\partial z_2} \times \frac{\partial z_2}{\partial x}$,
x.grad ← 3/(2*sqrt(x.data)) * z2.grad
68
With $\partial y/\partial z_2$ = z2.grad = 0.88 and x.data = 1.20, this rule contributes 3/(2√1.2) × 0.88 ≈ 1.20 to x.grad
69
$y = z_1 z_2$, $z_1 = z_4^2 + z_3$, $z_2 = 3\sqrt{x}$, $z_3 = \log x$, $z_4 = 1/x$
1. Forward pass: build the graph and store each node's data
(x = 1.2 → z₂ = 3.29, z₃ = 0.18, z₄ = 0.83, z₁ = 0.88, y = 2.88)
70
1. Forward pass (continued): every edge's local derivative can be read off the graph for the later Backward pass:
$\frac{\partial z_2}{\partial x} = \frac{3}{2\sqrt{x}}$, $\frac{\partial z_3}{\partial x} = \frac{1}{x}$, $\frac{\partial z_4}{\partial x} = -\frac{1}{x^2}$, $\frac{\partial z_1}{\partial z_3} = 1$, $\frac{\partial z_1}{\partial z_4} = 2z_4$, $\frac{\partial y}{\partial z_1} = z_2$, $\frac{\partial y}{\partial z_2} = z_1$
71
1. Forward pass (continued): equivalently, each edge records a Backward update rule (grads accumulate):
z1.grad ← z2.data * y.grad
z2.grad ← z1.data * y.grad
z3.grad ← 1.0 * z1.grad
z4.grad ← (2*z4.data) * z1.grad
x.grad ← 3/(2*sqrt(x.data)) * z2.grad
x.grad ← 1/x.data * z3.grad
x.grad ← (-1/x.data**2) * z4.grad
72
$y = z_1 z_2$, $z_1 = z_4^2 + z_3$, $z_2 = 3\sqrt{x}$, $z_3 = \log x$, $z_4 = 1/x$
2. Backward: initialize every node's grad to 0.0
73
3. Backward: set the output grad to 1 (y.grad = 1.0, since $\partial y/\partial y = 1$)
74
4. Backward from y: z2.grad ← z1.data * y.grad = 0.88; z1.grad ← z2.data * y.grad = 3.29
75
5. Backward from z₁: z3.grad ← 1.0 * z1.grad = 3.29; z4.grad ← (2*z4.data) * z1.grad = 5.48
76
6. Backward from z₂: x.grad ← x.grad + 3/(2*sqrt(x.data)) * z2.grad = 0.0 + 1.37 × 0.88 ≈ 1.20
77
6. Backward from z₃: x.grad ← x.grad + 1/x.data * z3.grad = 1.20 + 0.83 × 3.29 ≈ 3.94
78
6. Backward from z₄: x.grad ← x.grad + (-1/x.data**2) * z4.grad = 3.94 − 0.69 × 5.48 ≈ 0.14
79
Every node now holds grad = $\partial y/\partial(\text{node})$; in particular x.grad = $\partial y/\partial x$ = 0.14
80
$z_2 = 3\sqrt{x}$, $z_3 = \log x$, $z_4 = 1/x$, $z_1 = z_4^2 + z_3$, $y = z_1 z_2$
This Forward/Backward bookkeeping is exactly what PyTorch's autograd automates
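Before looking at PyTorch, the whole trace above can be reproduced by a tiny micrograd-style engine. This is a simplified toy of my own (not PyTorch's implementation): each Value stores data, an accumulating grad, its parents, and a local backward rule; backward() walks the graph in reverse topological order.

```python
import math

class Value:
    # a scalar node: stores data, grad, parents, and a local backward rule
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None

    def backward(self):
        # topological order, then propagate grads from the output (grad = 1)
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

def sqrt3(x):            # z = 3*sqrt(x), dz/dx = 3/(2*sqrt(x))
    out = Value(3 * math.sqrt(x.data), (x,))
    def bw(): x.grad += 3 / (2 * math.sqrt(x.data)) * out.grad
    out._backward = bw
    return out

def log(x):              # z = log(x), dz/dx = 1/x
    out = Value(math.log(x.data), (x,))
    def bw(): x.grad += 1 / x.data * out.grad
    out._backward = bw
    return out

def inv(x):              # z = 1/x, dz/dx = -1/x^2
    out = Value(1 / x.data, (x,))
    def bw(): x.grad += -1 / x.data**2 * out.grad
    out._backward = bw
    return out

def sq_add(a, b):        # z = a^2 + b
    out = Value(a.data**2 + b.data, (a, b))
    def bw():
        a.grad += 2 * a.data * out.grad
        b.grad += out.grad
    out._backward = bw
    return out

def mul(a, b):           # z = a*b
    out = Value(a.data * b.data, (a, b))
    def bw():
        a.grad += b.data * out.grad
        b.grad += a.data * out.grad
    out._backward = bw
    return out

# the example from the slides: y = 3*sqrt(x) * (log x + 1/x^2) at x = 1.2
x = Value(1.2)
z2, z3, z4 = sqrt3(x), log(x), inv(x)
z1 = sq_add(z4, z3)
y = mul(z1, z2)
y.backward()
print(round(y.data, 2), round(x.grad, 2))  # 2.88 0.14
```

The three `x.grad +=` contributions accumulate exactly like the three chain-rule paths in the walkthrough.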
81
Checking the data and grad of every node:
import torch
torch.set_printoptions(2)

x = torch.tensor(1.2, requires_grad=True)
z2 = 3 * torch.sqrt(x)
z3 = torch.log(x)
z4 = 1 / x
z1 = z4**2 + z3
y = z1 * z2

# keep grad on the intermediate (non-leaf) tensors so we can inspect them
y.retain_grad()
z1.retain_grad()
z2.retain_grad()
z3.retain_grad()
z4.retain_grad()

y.backward()
print(x.data, z2.data, z3.data, z4.data, z1.data, y.data)
print(x.grad, z2.grad, z3.grad, z4.grad, z1.grad, y.grad)

Output:
tensor(1.20) tensor(3.29) tensor(0.18) tensor(0.83) tensor(0.88) tensor(2.88)
tensor(0.14) tensor(0.88) tensor(3.29) tensor(5.48) tensor(3.29) tensor(1.)

The same computation and all its gradients, in a few lines of PyTorch
82
Exercise: minimize $y = x^2 + 2x + 3 = (x+1)^2 + 2$ by gradient descent, starting from $x = 2.0$
• compute $y$ (Forward)
• compute $dy/dx$ (Backward)
• update $x$ using x.grad
• repeat
The minimum is at $x = -1$, $y = 2$
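A hedged sketch of this exercise in PyTorch (the learning rate and step count are my own choices, not from the slides):

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
lr = 0.1
for _ in range(100):
    y = x**2 + 2*x + 3        # Forward
    y.backward()              # Backward: fills x.grad with dy/dx
    with torch.no_grad():
        x -= lr * x.grad      # gradient-descent update
    x.grad.zero_()            # reset grad for the next iteration
print(x.item(), (x**2 + 2*x + 3).item())  # approaches x = -1, y = 2
```

Forgetting x.grad.zero_() is a classic bug: grads accumulate across backward() calls by design.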
83
Exercise: $y = x^3 + 3.2 x^2 + 1.3 x - 2$
Run 20 gradient-descent steps from an initial point in $(-1.5, 1.5)$
84
Repeat the same loop — (Forward) compute $y$, (Backward) compute the gradient, update $x$ via x.grad — from 3 different initial points
85
MSE = mean squared error (平均二乗誤差)
SGD = stochastic gradient descent (確率的勾配降下法)
The full training loop in PyTorch
86
( )
•
• (lr )
•
•
• GPU
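A minimal end-to-end sketch of such a training loop with MSE loss and the SGD optimizer (the toy data, model, and hyperparameters here are illustrative assumptions, not from the slides):

```python
import torch

# toy data: y = 2x + 1 plus a little noise
torch.manual_seed(0)
X = torch.linspace(-1, 1, 32).unsqueeze(1)
Y = 2 * X + 1 + 0.05 * torch.randn_like(X)

model = torch.nn.Linear(1, 1)                      # parameters a, b in y = ax + b
loss_fn = torch.nn.MSELoss()                       # MSE loss
opt = torch.optim.SGD(model.parameters(), lr=0.1)  # SGD optimizer

for epoch in range(200):
    opt.zero_grad()                  # reset grads
    loss = loss_fn(model(X), Y)      # Forward
    loss.backward()                  # Backward
    opt.step()                       # parameter update
print(model.weight.item(), model.bias.item())  # close to 2 and 1
```

Minibatching, lr scheduling, and moving tensors/model to a GPU would slot into this same skeleton.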
87
• https://github.com/karpathy/micrograd
• a tiny autograd engine: about 94 lines of Python
• the core engine:
https://github.com/karpathy/micrograd/blob/master/micrograd/engine.py
• by Andrej Karpathy:
a founding member of OpenAI,
later director of AI at Tesla
(rejoined OpenAI in 2023)
• his step-by-step video walkthrough:
https://youtu.be/VMj-3S1tku0?si=91ZWzaA4ECidua4g
micrograd
88
§ e.g. $y = 3\sqrt{x}$ can be built from $z = \sqrt{x}$ and $y = 3z$: any computation composed of ops with known derivatives can be differentiated automatically
§ The ops PyTorch already knows: https://pytorch.org/docs/stable/torch.html#math-operations
§ Ops not in this list need their own forward/backward rules
§ Once defined, they compose with everything else
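For an op that is not built in, PyTorch lets you supply the forward/backward pair yourself via torch.autograd.Function. A sketch for $\sqrt{x}$ (redundant here since torch.sqrt exists, but it shows the mechanism; the derivative is $1/(2\sqrt{x})$):

```python
import torch

class MySqrt(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        y = x.sqrt()
        ctx.save_for_backward(y)   # keep y for the backward rule
        return y

    @staticmethod
    def backward(ctx, grad_out):
        (y,) = ctx.saved_tensors
        return grad_out / (2 * y)  # d sqrt(x)/dx = 1/(2*sqrt(x))

x = torch.tensor(1.2, requires_grad=True)
y = 3 * MySqrt.apply(x)            # y = 3*sqrt(x)
y.backward()
print(x.grad)                      # 3/(2*sqrt(1.2)) ≈ 1.37
```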
89
( )
J aime la
musique I love music
(Software 2.0 )
Q & A
91
1.
2.
3. ( )
Slido
https://app.sli.do/event/d13KtHT3VXxFbmLvxPSQjn