What You Need in Order to Try Deep Learning with Chainer

Retrieva, Inc.
Jiro Nishitoba

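About me
• Jiro Nishitoba
• ID: jnishi
• Background
• Completed the master's program in the Department of Computer Science, Graduate School of Information Science and Technology, The University of Tokyo
• 2006: joined Preferred Infrastructure as a founding member
• Prototype development
• Professional services and support services
• Research and development
• 2016: founded Retrieva
• Engaged in research and development as a director and researcher
• Mainly responsible for speech recognition and natural language processing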

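Working with Deep Learning (DL)
• Around March 2015: learned that DL looked usable for speech recognition
• From June 2015: started developing a speech recognition engine with Chainer
• Optimizer: NesterovAG
• Activation function: ClippedReLU
• Loss function: Connectionist Temporal Classification
Torch7: released by Baidu in January 2016
TensorFlow: added in February 2016
Chainer: added in October 2015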
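As a small illustration of one of the functions listed on this slide, ClippedReLU can be sketched in plain Python as below. The function name `clipped_relu` and the cap value `z=20.0` are assumptions chosen for illustration; they are not taken from the slides.

```python
def clipped_relu(x, z=20.0):
    """Clipped ReLU: 0 for negative inputs, identity up to z, capped at z above."""
    # z is an illustrative cap value (an assumption, not from the slides).
    return min(max(0.0, x), z)


for value in (-3.0, 5.0, 42.0):
    print(value, "->", clipped_relu(value))
```

In a framework the same operation would be applied element-wise to a whole array rather than to one number at a time.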

Let's try out deep learning methods!
[Paper excerpt shown as an image on the slide, with the left edge cropped. The legible part discusses training on utterances of varying length and gives the CTC loss:]

\[
L(x, y; \theta) = -\log \sum_{\ell \in \mathrm{Align}(x, y)} \prod_{t}^{T} p_{\mathrm{ctc}}(\ell_t \mid x; \theta) \tag{9}
\]

"… is the set of all possible alignments of the characters of the transcription y to … under the CTC operator. In equation 9, the inner term is a product over time-steps … which shrinks with the length of the sequence since p_ctc(ℓ_t | x; θ) < 1."
OK, time to implement it!
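To make equation 9 concrete, the sketch below evaluates it by brute force on a toy input: it enumerates every possible alignment, keeps those that collapse to the target transcription, and sums the product of per-step probabilities. The helper names, the `_` blank symbol and the uniform toy probabilities are all assumptions for illustration. The enumeration cost grows exponentially with the number of time steps, so practical implementations use dynamic programming instead.

```python
import itertools
import math

BLANK = "_"  # assumed symbol for the CTC blank label


def collapse(alignment):
    """Apply the CTC collapsing rule: merge repeated labels, then drop blanks."""
    merged = [label for label, _ in itertools.groupby(alignment)]
    return "".join(label for label in merged if label != BLANK)


def ctc_loss_bruteforce(probs, labels, target):
    """Negative log of the summed probability of all alignments of `target`.

    probs[t][i] is the probability assigned to labels[i] at time step t.
    """
    total = 0.0
    for path in itertools.product(range(len(labels)), repeat=len(probs)):
        alignment = [labels[i] for i in path]
        if collapse(alignment) == target:
            # Inner term of equation 9: product over time steps.
            total += math.prod(probs[t][i] for t, i in enumerate(path))
    return -math.log(total)


labels = [BLANK, "a"]
probs = [[0.5, 0.5], [0.5, 0.5]]  # two time steps, uniform toy distribution
loss = ctc_loss_bruteforce(probs, labels, "a")
# The alignments "a_", "_a" and "aa" each have probability 0.25.
print(loss)
```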

What to look at
• Baidu's Deep Speech 2
• Google's speech recognition
• Microsoft's image recognition

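When implementing a deep learning system
• To understand the processing properly, it is important to understand the equations
• When actually writing the processing, you mostly look at a graph that illustrates the structure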

The basic unit of a neural network
[Figure: a unit with n inputs x1, x2, …, xn, weights w1, w2, …, wn, and one output]
u = w1x1 + w2x2 + … + wnxn
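The weighted sum computed by a single unit can be written directly in Python. This is a minimal sketch; the function name `unit_output` and the sample numbers are made up for illustration.

```python
def unit_output(weights, inputs):
    """Weighted sum u = w1*x1 + w2*x2 + ... + wn*xn for a single unit."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    return sum(w * x for w, x in zip(weights, inputs))


u = unit_output([0.5, -1.0, 2.0], [2.0, 3.0, 1.0])
print(u)  # 0.5*2.0 - 1.0*3.0 + 2.0*1.0 = 0.0
```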

Neural network basics
[Figure: n inputs x1, x2, …, xn feeding m units, giving m outputs]
Line up many units that all take the same input.

Neural network (fully connected)
[Figure: n inputs x1, x2, …, xn fully connected to m outputs]
[Diagram: Input → Linear]
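A fully connected layer is the single-unit weighted sum repeated for m units that share the same input, which amounts to a matrix-vector product. The sketch below assumes a bias-free layer for brevity (Chainer's `L.Linear` also adds a bias by default); the function name and weights are illustrative.

```python
def linear(weight_rows, x):
    """Fully connected layer without bias: one weighted sum per unit.

    weight_rows has m rows (one per unit) of n weights each; x has n values.
    """
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight_rows]


W = [
    [1.0, 0.0, 0.0],  # unit 1 copies x1
    [0.0, 1.0, 1.0],  # unit 2 adds x2 and x3
]
outputs = linear(W, [2.0, 3.0, 4.0])
print(outputs)  # [2.0, 7.0]
```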

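Activation functions
[Figure: inputs x1, x2, …, xn feeding a unit with output u]
The output is sometimes passed through a function that scales or limits it.
Examples of activation functions
• ReLU: outputs 0 when the input is negative
• sigmoid: maps values into the range 0 to 1 while preserving their order
• tanh: maps values into the range -1 to 1 while preserving their order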
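The three activation functions listed above can be sketched with the standard library alone. The sample values are arbitrary and only meant to show the output ranges.

```python
import math


def relu(u):
    """Return 0 for negative inputs and the input itself otherwise."""
    return max(0.0, u)


def sigmoid(u):
    """Squash the input into (0, 1) while preserving order."""
    return 1.0 / (1.0 + math.exp(-u))


values = [-2.0, 0.0, 3.0]
for name, fn in (("relu", relu), ("sigmoid", sigmoid), ("tanh", math.tanh)):
    print(name, [round(fn(v), 3) for v in values])
```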

Activation functions can be drawn the same way
[Diagram: Input → Linear → ReLU]

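Representing it as a network
• A neural network can be represented as a network built from components such as the following
• Input
• Linear
• Activation function
• Loss function
• Convolution layer
• Regularization function
• etc.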
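One way to picture "a network of components" is as an ordered list of functions, each consuming the previous one's output. The sketch below is a simplification under that assumption (a plain chain with no branching); the helper names `make_linear`, `relu_layer` and `run_network` are hypothetical.

```python
def make_linear(weight_rows):
    """Return a bias-free Linear component for the given weight matrix."""
    def linear(x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in weight_rows]
    return linear


def relu_layer(x):
    """Apply ReLU to every element."""
    return [max(0.0, v) for v in x]


def run_network(components, x):
    """Feed the input through each component in order."""
    for component in components:
        x = component(x)
    return x


# Input -> Linear -> ReLU, with made-up weights.
network = [make_linear([[1.0, -1.0], [-1.0, 1.0]]), relu_layer]
result = run_network(network, [3.0, 1.0])
print(result)  # [2.0, 0.0]
```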

How to read a network
• Input
• Three Convolution layers stacked
• Seven RNN layers
• BatchNormalization used as regularization
• One Linear layer
• CTC used as the loss function

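What is needed to do deep learning
Executed by the framework:
• Forward computation
• Backpropagation
• Matrix computation
• Differentiation
What you think about when implementing with a framework:
• Functions used in the processing
• Input/output relationships
• Input size
• Output size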
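To give a feel for what a framework takes care of, the sketch below does the forward pass and the backward pass by hand for a single linear unit with a squared-error loss, then checks the hand-derived gradient against finite differences. The loss choice, function names and numbers are assumptions made for illustration; a framework performs the equivalent differentiation automatically for every function in the network.

```python
def forward(w, x, target):
    """Forward pass: one linear unit followed by a squared-error loss."""
    u = sum(wi * xi for wi, xi in zip(w, x))
    return 0.5 * (u - target) ** 2


def backward(w, x, target):
    """Backward pass by hand: dL/dw_i = (u - target) * x_i."""
    u = sum(wi * xi for wi, xi in zip(w, x))
    return [(u - target) * xi for xi in x]


def numerical_grad(w, x, target, eps=1e-6):
    """Central finite differences, used only to check the hand-derived gradient."""
    grads = []
    for i in range(len(w)):
        w_plus = w[:i] + [w[i] + eps] + w[i + 1:]
        w_minus = w[:i] + [w[i] - eps] + w[i + 1:]
        diff = forward(w_plus, x, target) - forward(w_minus, x, target)
        grads.append(diff / (2 * eps))
    return grads


w, x, target = [0.5, -1.0], [2.0, 3.0], 1.0
analytic = backward(w, x, target)
numeric = numerical_grad(w, x, target)
print(analytic, numeric)
```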

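Chainer example code
import chainer.functions as F
import chainer.links as L
from chainer import Chain


class MLP(Chain):
    def __init__(self, n_units=100, n_out=10):
        super(MLP, self).__init__(
            l1=L.Linear(None, n_units),  # input size is inferred from the data
            l2=L.Linear(None, n_units),
            l3=L.Linear(None, n_out),
        )

    def __call__(self, x):
        h1 = F.relu(self.l1(x))
        h2 = F.relu(self.l2(h1))
        y = self.l3(h2)
        return y
[Diagram: MNIST image (784 = 28x28) → Linear → 100 → Linear → 100 → Linear → 10 (classification of digits 0 to 9)]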
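The sizes in the diagram (784, 100, 100, 10) are the part you decide yourself. As a rough check of what they imply, the sketch below counts the parameters of each Linear layer, assuming every layer has one weight per input-output pair plus one bias per output unit. The helper name `linear_param_count` is hypothetical.

```python
def linear_param_count(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit (assumed)."""
    return n_in * n_out + n_out


layer_sizes = [784, 100, 100, 10]  # input, two hidden layers, output
counts = [
    linear_param_count(n_in, n_out)
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]
print(counts, sum(counts))  # [78500, 10100, 1010] 89610
```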

layer1
[Same code and diagram as in the example code slide, with the first Linear (784 → 100) labeled l1.]

layer2
[Same code and diagram as in the example code slide, with the second Linear (100 → 100) labeled l2.]

layer3
[Same code and diagram as in the example code slide, with the third Linear (100 → 10) labeled l3.]

Forward computation
[Same code and diagram as in the example code slide, now labeled with the variables of __call__: x (MNIST image) → l1 → h1 → l2 → h2 → l3 → y (classification of digits 0 to 9).]

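Summary
• When doing deep learning, the network structure is what matters
• Once the structure is decided, the framework carries out the rest of the processing
• For Chainer, the MNIST example train_mnist.py is a simple one to start from
• This is not limited to Chainer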

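Finally
• "All non-trivial abstractions, to some degree, are leaky."
• "The only way to deal competently with the Law of Leaky Abstractions is to learn how the abstractions work and what they are abstracting."
• from Joel on Software
• In the case of deep learning
• The function you want to use does not exist
• GPU memory gets used up
• Training does not converge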
