Paper LT session slides: Attention Augmented Convolutional Networks

[Image Processing & Machine Learning] Paper LT Session! #2
Attention Augmented Convolutional Networks
(https://arxiv.org/abs/1904.09925)
2019/05/09
@fam_taro

Agenda
1. Introduction
2. Related work
3. Attention Augmented Convolution
4. Personal impressions

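1. Introduction
- Adds self-attention to a Conv layer
  - Attention Augmented Convolution (AAC below)
  - SE Block and similar methods output only per-channel weights, whereas AAC outputs a feature map itself
  - It can also replace a Conv layer entirely
- Using the proposed method improves the accuracy of ResNet and similar models
  - The improvement was also confirmed for detection (RetinaNet on COCO)
  - Accuracy improves more than with Squeeze-and-Excitation (SE)
  - and with fewer parameters than SE
  - Could it replace the SE-ResNet family?
- Memory consumption is large (reasonable given what it outputs)
  - Proportional to (height x width)^2
  - Does doubling the image height and width require 16 times the memory?
  - Only the blocks where AAC is applied are affected, so total memory probably does not simply grow 16-fold...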
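The 16x figure above can be checked with simple arithmetic. This is a minimal sketch, assuming the dominant cost is one attention-weight matrix of shape (HW, HW) per head stored as float32; the function name and the example sizes are illustrative and do not come from the paper.

```python
def attention_map_bytes(height, width, num_heads=1, bytes_per_value=4):
    """Bytes needed to hold the (HW x HW) attention weights for all heads.

    Assumption: float32 values and one full attention matrix per head.
    """
    positions = height * width
    return num_heads * positions * positions * bytes_per_value


small = attention_map_bytes(32, 32)
large = attention_map_bytes(64, 64)  # height and width both doubled
ratio = large // small
print(small, large, ratio)
```

Doubling both sides quadruples HW, so the (HW)^2 term grows 16-fold. Layers without attention are unaffected by this term, which is why the whole network need not grow by the same factor.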

2. Related work
- Attention mechanisms in networks
- Characteristics of attention
  - Can capture long-range relationships
- First became popular in NLP
  - Machine translation with RNNs
- In CV
  - Applied very widely
  - Examples
    - Attention built into CV models
    - CV models combined with RNN-style models
  - Representative examples
    - Squeeze-and-Excitation (SE Block)
    - Gather-Excite (GE Block)

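2. Related work
(Figure annotation: weights applied to each channel)
- Squeeze-and-Excitation (SE Block, 2017)
  - https://arxiv.org/abs/1709.01507
  - A method for weighting channels
  - Widely applicable (can be built into most blocks)
  - Pretrained models named SE~~~ basically contain this block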
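To make "weighting channels" concrete, here is a rough NumPy sketch of an SE-style block. It assumes the usual squeeze (global average pooling) and excitation (two fully connected layers with ReLU and sigmoid) structure; biases are omitted, and the weight shapes, the reduction ratio of 4, and the random inputs are placeholders for illustration.

```python
import numpy as np


def se_block(x, w1, w2):
    """x: (H, W, C) feature map; w1: (C, C // r); w2: (C // r, C). Biases omitted."""
    squeezed = x.mean(axis=(0, 1))                   # squeeze: global average pool -> (C,)
    hidden = np.maximum(squeezed @ w1, 0.0)          # excitation, first layer + ReLU
    weights = 1.0 / (1.0 + np.exp(-(hidden @ w2)))   # sigmoid -> one weight per channel
    return x * weights, weights                      # weights broadcast over H and W


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 16))
w1 = rng.standard_normal((16, 4)) * 0.1
w2 = rng.standard_normal((4, 16)) * 0.1
y, weights = se_block(x, w1, w2)
print(y.shape, weights.shape)
```

The block only rescales existing channels: the output has the same shape as the input, and the only thing it computes is one scalar per channel.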

- Compared with prior work (SE Block etc.)
  - No need to pre-train without the AA Block first
    - Methods such as non-local neural networks need to be trained on ImageNet etc. before the block is added
  - "The use of multi-head attention allows the model to attend jointly to both spatial and feature subspaces"
  - Uses multi-head attention
    - → Spatial and feature subspaces can now be handled together? (I'm not sure I have the meaning right...)

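- Background needed to follow the method (what is attention?)
  - > The basis of attention is a query and a memory (key, value).
  - > Attention means using the query to selectively pull the needed information out of the memory.
  - Quoted from the site below
Quoted from "作って理解する Transformer / Attention - Qiita"
https://qiita.com/halhorn/items/c91497522be27bde17ce
(Figure annotation: for now, just note that three terms appear: query, key, and value.)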
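The query / memory (key, value) description can be sketched in a few lines of NumPy. This assumes scaled dot-product attention as used in the Transformer; the shapes and random inputs are made up for illustration.

```python
import numpy as np


def attention(query, key, value):
    """query: (n_q, d_k); key: (n_m, d_k); value: (n_m, d_v)."""
    scores = query @ key.T / np.sqrt(key.shape[-1])       # similarity of each query to each key
    scores = scores - scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over memory slots
    return weights @ value, weights                       # weighted sum of values


rng = np.random.default_rng(0)
query = rng.standard_normal((2, 4))   # 2 queries
key = rng.standard_normal((5, 4))     # memory with 5 slots
value = rng.standard_normal((5, 3))
selected, weights = attention(query, key, value)
print(selected.shape, weights.sum(axis=-1))
```

Each row of `weights` sums to 1, so every query receives a convex combination of the values: this is the "selectively pull information from memory" part.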

- Background needed to follow the method (continued; same quote and source as the previous slide)
(Figure annotation: in models that use an LSTM or similar, this memory refers to things like the memory cell. Here, the input itself is used as the memory.)

- Breaking down the math
  - Variable definitions
    - H: height of the input
    - W: width of the input
    - F_in: number of input channels
    - N_h: number of heads in multi-head attention (MHA)
    - d_v: depth of values in MHA
    - d_k: depth of queries and keys in MHA

- Operations applied when an image is input
  1. Flatten over H and W
  2. From the flattened input X, compute the following output for each attention head
(Equation annotations: transform to query features; transform to key features; transform to value features)
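For reference, as far as I can tell from the paper, the output of a single head h takes the form below, written with the variables defined earlier. Here d_k^h = d_k / N_h and d_v^h = d_v / N_h are the per-head depths, and W_q, W_k, W_v are the learned projections that produce the query, key, and value features.

```latex
O_h = \mathrm{Softmax}\!\left( \frac{(X W_q)(X W_k)^{\top}}{\sqrt{d_k^h}} \right) (X W_v),
\qquad
X \in \mathbb{R}^{HW \times F_{in}},\;
W_q, W_k \in \mathbb{R}^{F_{in} \times d_k^h},\;
W_v \in \mathbb{R}^{F_{in} \times d_v^h}
```

The argument of the softmax is an HW x HW matrix, which is where the quadratic memory cost comes from.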

  3. Concatenate the outputs of all heads
  4. Multiply the concatenated result by a weight matrix to get the MHA output
    - MHA(X) has shape (H, W, d_v)
    - Note that memory consumption is proportional to (HW)^2
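Steps 1 to 4 can be sketched in NumPy as follows. This is a simplified illustration and not the paper's implementation: the positional terms used in the paper are left out, the heads are obtained by splitting one projection per role, and all shapes and random weights are placeholders.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def mha_2d(x, w_q, w_k, w_v, w_o, num_heads):
    """x: (H, W, F_in); w_q, w_k: (F_in, d_k); w_v: (F_in, d_v); w_o: (d_v, d_v)."""
    h, w, f_in = x.shape
    flat = x.reshape(h * w, f_in)                        # step 1: flatten H and W
    q_heads = np.split(flat @ w_q, num_heads, axis=-1)   # each (HW, d_k / N_h)
    k_heads = np.split(flat @ w_k, num_heads, axis=-1)
    v_heads = np.split(flat @ w_v, num_heads, axis=-1)   # each (HW, d_v / N_h)
    outputs = []
    for q, k, v in zip(q_heads, k_heads, v_heads):
        logits = q @ k.T / np.sqrt(q.shape[-1])          # (HW, HW): the quadratic term
        outputs.append(softmax(logits) @ v)              # step 2: per-head output
    concat = np.concatenate(outputs, axis=-1)            # step 3: (HW, d_v)
    return (concat @ w_o).reshape(h, w, -1)              # step 4: project, back to (H, W, d_v)


rng = np.random.default_rng(0)
H, W, F_in, N_h, d_k, d_v = 6, 5, 8, 2, 4, 6
x = rng.standard_normal((H, W, F_in))
out = mha_2d(
    x,
    rng.standard_normal((F_in, d_k)),
    rng.standard_normal((F_in, d_k)),
    rng.standard_normal((F_in, d_v)),
    rng.standard_normal((d_v, d_v)),
    N_h,
)
print(out.shape)  # (6, 5, 6)
```

The `logits` array is the only object whose size depends on (HW)^2; every other array is linear in HW.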

  5. Concatenate with the output of a Convolution layer
    - See the code for detailed usage examples
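The concatenation in step 5 might look like the sketch below. To keep it short it makes several simplifying assumptions: a 1x1 convolution (a per-pixel matrix multiply) stands in for the real convolution, attention uses a single head with no positional terms, and the convolution is given F_out - d_v filters so that the concatenated result has F_out channels. All function names and sizes are hypothetical.

```python
import numpy as np


def conv1x1(x, kernel):
    """Stand-in convolution: a 1x1 conv is a per-pixel matmul. kernel: (F_in, C_out)."""
    return x @ kernel


def single_head_attention(x, w_q, w_k, w_v):
    """Self-attention over all H*W positions of x: (H, W, F_in) -> (H, W, d_v)."""
    h, w, f_in = x.shape
    flat = x.reshape(h * w, f_in)
    q, k, v = flat @ w_q, flat @ w_k, flat @ w_v
    logits = q @ k.T / np.sqrt(q.shape[-1])
    logits -= logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)
    return (weights @ v).reshape(h, w, -1)


def aa_conv(x, kernel, w_q, w_k, w_v):
    conv_out = conv1x1(x, kernel)                          # (H, W, F_out - d_v)
    attn_out = single_head_attention(x, w_q, w_k, w_v)     # (H, W, d_v)
    return np.concatenate([conv_out, attn_out], axis=-1)   # (H, W, F_out)


rng = np.random.default_rng(0)
H, W, F_in, F_out, d_k, d_v = 6, 5, 8, 16, 4, 6
x = rng.standard_normal((H, W, F_in))
y = aa_conv(
    x,
    rng.standard_normal((F_in, F_out - d_v)),
    rng.standard_normal((F_in, d_k)),
    rng.standard_normal((F_in, d_k)),
    rng.standard_normal((F_in, d_v)),
)
print(y.shape)
```

Unlike an SE-style block, which rescales existing channels, the attention branch here contributes d_v new feature-map channels of its own.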

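4. Personal impressions
- It's hard...
  - SE Block is mainly about channel weighting, so it was simple and easy to understand
- It's nice that a TensorFlow implementation example is provided
- PyTorch and other implementations have already been published, so it looks easy to try
  - https://github.com/leaderj1001/Attention-Augmented-Conv2d
- It might replace the SE-ResNet family entirely?
  - That said, GPU memory consumption is large, so it may be hard to use...
  - If only someone would release a pre-trained model...
  - Without a pre-trained model, swapping it into an existing model seems painful
  - Perhaps reusing part of the weights of an existing pre-trained model would work?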

Bonus
- Comparison when applied to Wide-ResNet
  - As the figure shows, top-1 accuracy is better with AA
  - However, there is no large difference in top-5 accuracy

References
- Attention Augmented Convolutional Networks
  - https://arxiv.org/abs/1904.09925
- Attention Augmented Convolutional Networks - 医療系AIエンジニアの技術メモ
  - https://ai-engineer-memo.hatenablog.com/entry/2019/05/06/182901
  - Notes by a senior colleague at my company; more detailed than these slides
- Squeeze-and-Excitation Networks - 医療系AIエンジニアの技術メモ
  - https://ai-engineer-memo.hatenablog.com/entry/2019/02/04/005845
- [DL輪読会]Attention Is All You Need
  - https://www.slideshare.net/DeepLearningJP2016/dlattention-is-all-you-need
- https://github.com/leaderj1001/Attention-Augmented-Conv2d
  - PyTorch implementation (that was fast)
- 論文解説 Attention Is All You Need (Transformer) - ディープラーニングブログ
  - http://deeplearning.hatenablog.com/entry/transformer
- 作って理解する Transformer / Attention - Qiita
  - https://qiita.com/halhorn/items/c91497522be27bde17ce
