Azure Machine Learning NLP: Latest Trends (2020/07/02)

https://interpret.ml/
A comprehensive framework for interpreting and explaining machine learning models
Decision trees
Rule lists
Linear regression / logistic regression
GAM2
….
SHAP
LIME
Partial Dependence
Sensitivity Analysis

Interpretability for Text Data
https://github.com/interpretml/interpret-text

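An open-source library that provides interpretability and explainability for text classification models
• Actively adopts state-of-the-art techniques for text interpretation and explanation
• Provides an easy-to-use unified API across the algorithms
• Helps users gain insights through an interactive dashboard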

Classical and state-of-the-art methods
• Classical Text Explainer (glass-box)
• Unified Information Explainer (post-hoc and model agnostic)
• Introspective Rationale Explainer (plug-in during training, model agnostic)

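Classical Text Explainer
• Classical machine learning pipeline
  • Preprocessing, encoding, training, and hyperparameter tuning are already implemented
• Supported models
  • scikit-learn linear models (coef_)
  • Tree-based ensemble models (feature_importances_)
• Explanations are derived from the coefficients or feature importances of the models above
Default configuration: 1-gram bag-of-words + scikit-learn CountVectorizer + logistic regression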
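The default configuration above can be approximated directly in scikit-learn. The following is a minimal sketch, not the interpret-text implementation: the toy corpus and labels are made up for illustration, and it only shows how the coefficients of a linear model over a 1-gram bag-of-words can be read as token-level importances.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy corpus (hypothetical data, for illustration only).
texts = [
    "great movie loved it",
    "wonderful great acting",
    "loved the wonderful story",
    "terrible movie hated it",
    "awful terrible acting",
    "hated the awful story",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = positive, 0 = negative

# 1-gram bag-of-words encoding.
vectorizer = CountVectorizer(ngram_range=(1, 1))
X = vectorizer.fit_transform(texts)

# Glass-box model: one learned coefficient per token.
model = LogisticRegression()
model.fit(X, labels)

# Map column index -> token, then pair each token with its coefficient.
index_to_token = {idx: tok for tok, idx in vectorizer.vocabulary_.items()}
weights = model.coef_[0]
ranked = sorted(
    ((index_to_token[i], float(w)) for i, w in enumerate(weights)),
    key=lambda pair: pair[1],
)

print("pushes toward negative:", [tok for tok, _ in ranked[:3]])
print("pushes toward positive:", [tok for tok, _ in ranked[-3:]])
```

For tree-based ensembles, the same idea would read `feature_importances_` instead of `coef_`; note that those importances are unsigned, so they indicate relevance but not direction.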

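Unified Information Explainer
• A post-hoc approach based on mutual information
• Provides unified and coherent explanations across the hidden layers of a DNN model
• Currently supports BERT
• Support for LSTM and RNN is planned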
Towards A Deep and Unified Understanding of Deep Neural Models in NLP, Guan et al. [ICML 2019]
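As background for the mutual-information idea, the sketch below computes mutual information for a small discrete joint distribution. It does not reproduce the estimator from Guan et al., which operates on the hidden states of a neural network; it only illustrates the quantity itself, and the two example tables are made up.

```python
import math


def mutual_information(joint):
    """Mutual information I(X; Y) in bits from a joint probability table.

    joint[i][j] is P(X = i, Y = j); the entries are assumed to sum to 1.
    """
    px = [sum(row) for row in joint]        # marginal P(X)
    py = [sum(col) for col in zip(*joint)]  # marginal P(Y)
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:  # zero-probability cells contribute nothing
                mi += pxy * math.log2(pxy / (px[i] * py[j]))
    return mi


# X tells us nothing about Y.
independent = [[0.25, 0.25], [0.25, 0.25]]
# X determines Y completely.
dependent = [[0.5, 0.0], [0.0, 0.5]]

print(mutual_information(independent))  # 0.0
print(mutual_information(dependent))    # 1.0
```

Read in the explainability setting, a high value between an input word and a hidden representation would mean the layer still retains much of the information about that word.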

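Introspective Rationale Explainer
• Built into the model training process
• Uses an introspective generator as a preprocessing step
• Splits the input text into rationales and anti-rationales
• Trains to maximize accuracy using only the rationales
• Because the model sees only the rationales generated from the input text, it can show what influenced the prediction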
Rethinking Cooperative Rationalization: Introspective Extraction and Complement Control, Yu et al. [EMNLP 2019]
[Figure labels: introspective generator, min-max game, predictor, complement predictor]
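The split into rationales and anti-rationales can be pictured as applying a binary mask over the input tokens. The sketch below is only an illustration of that step: `split_rationale` is a hypothetical helper, and the mask is hard-coded here, whereas in the paper it is produced by a trained generator.

```python
def split_rationale(tokens, mask):
    """Split tokens into (rationale, anti_rationale) using a binary mask.

    mask[i] == 1 keeps tokens[i] in the rationale; 0 sends it to the
    anti-rationale (the complement of the rationale).
    """
    if len(tokens) != len(mask):
        raise ValueError("tokens and mask must have the same length")
    rationale = [t for t, m in zip(tokens, mask) if m == 1]
    anti_rationale = [t for t, m in zip(tokens, mask) if m == 0]
    return rationale, anti_rationale


tokens = "the food was delicious but the room was noisy".split()
mask = [0, 1, 0, 1, 0, 0, 0, 0, 0]  # hypothetical generator output

rationale, anti_rationale = split_rationale(tokens, mask)
print(rationale)       # ['food', 'delicious']
print(anti_rationale)  # ['the', 'was', 'but', 'the', 'room', 'was', 'noisy']
```

Because the predictor would be fed only `rationale`, the selected tokens themselves serve as the explanation of the prediction.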

| | Classical Text Explainer | Unified Information Explainer | Introspective Rationale Explainer |
|---|---|---|---|
| Input model support | Scikit-learn linear models and tree-based models | PyTorch | PyTorch |
| Explain BERT | No | Yes | Yes |
| Explain RNN | No | No | Yes |
| NLP pipeline support | Handles text pre-processing, encoding, training, hyperparameter tuning | Uses BERT tokenizer; however, user needs to supply trained/fine-tuned BERT model, and samples of trained data | Generator and predictor modules handle the required text pre-processing |

Demo
• Website
  • interpret.ml
• interpret-text interactive dashboard
  • Samples: https://github.com/interpretml/interpret-text/tree/master/notebooks

[Figure: automated machine learning workflow. User inputs (dataset, settings and constraints); feature engineering; algorithm selection; hyperparameter tuning; model leaderboard listing rank, model, and score (e.g. 1: 95%, 2: 76%, 3: 53%, …)]
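To build a high-accuracy model for the given data, automated machine learning automatically and efficiently searches over feature engineering, algorithm selection, and hyperparameter selection.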
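The search-and-rank idea can be sketched in a few lines. This is a deliberate simplification: it enumerates every combination exhaustively, which is not how Azure automated ML searches, and `build_leaderboard`, `toy_evaluate`, the algorithm names, and the hyperparameter names are all hypothetical stand-ins for a real training and validation step.

```python
from itertools import product


def build_leaderboard(algorithms, param_grid, evaluate):
    """Score every (algorithm, hyperparameters) pair and rank by score."""
    keys = sorted(param_grid)
    results = []
    for algo in algorithms:
        for values in product(*(param_grid[k] for k in keys)):
            params = dict(zip(keys, values))
            results.append((evaluate(algo, params), algo, params))
    # Highest score first, as on a leaderboard.
    results.sort(key=lambda r: r[0], reverse=True)
    return results


def toy_evaluate(algo, params):
    # Synthetic score standing in for cross-validated accuracy.
    base = {"model_a": 0.70, "model_b": 0.80}[algo]
    return base + 0.05 * params["complexity"] - 0.1 * params["learning_rate"]


leaderboard = build_leaderboard(
    algorithms=["model_a", "model_b"],
    param_grid={"complexity": [1, 2], "learning_rate": [0.1, 0.5]},
    evaluate=toy_evaluate,
)

for rank, (score, algo, params) in enumerate(leaderboard[:3], start=1):
    print(rank, algo, params, round(score, 2))
```

A real system would replace `toy_evaluate` with model training plus validation, and would typically prioritize promising candidates rather than try all of them.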

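Feature engineering
• Automatic imputation of missing values
• Custom specification of feature transformations
• Automatic feature transformation
• Automated preprocessing of time-series data
  • Lag, Rolling Windows etc
• BERT Embedding support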
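For the time-series items in the list above, lag and rolling-window features can be illustrated with plain Python. This is a sketch of the general technique, not of what Azure automated ML generates internally; the function names and the sample series are hypothetical, and the rolling mean here deliberately uses only past values so that the current target does not leak into its own feature.

```python
def lag_feature(series, lag):
    """Value observed `lag` steps earlier; None where no history exists."""
    if lag < 1:
        raise ValueError("lag must be >= 1")
    return [series[i - lag] if i >= lag else None for i in range(len(series))]


def rolling_mean(series, window):
    """Mean of the `window` values preceding each position (current value excluded)."""
    if window < 1:
        raise ValueError("window must be >= 1")
    out = []
    for i in range(len(series)):
        if i < window:
            out.append(None)  # not enough history yet
        else:
            past = series[i - window:i]
            out.append(sum(past) / window)
    return out


sales = [10, 12, 14, 16, 18]  # hypothetical daily values

print(lag_feature(sales, 1))   # [None, 10, 12, 14, 16]
print(rolling_mean(sales, 2))  # [None, None, 11.0, 13.0, 15.0]
```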

Reference: How BERT is integrated into Azure automated machine learning
https://techcommunity.microsoft.com/t5/azure-ai/how-bert-is-integrated-into-azure-automated-machine-learning/ba-p/1194657
Multilingual support for BERT
• Previously, only English was supported
• BERT embeddings that support Japanese are now available as features
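As a rough sketch of how this might be switched on with the Azure ML Python SDK (v1), the settings below enable deep-learning featurization and declare the dataset language. Treat the details as assumptions to check against the SDK documentation: the label column name, metric, and the `train_ds` and `gpu_cluster` objects in the commented lines are hypothetical, and BERT featurization is understood to require GPU compute.

```python
# Settings that would be passed to AutoMLConfig (values are illustrative).
automl_settings = {
    "task": "classification",
    "primary_metric": "accuracy",
    "label_column_name": "label",  # hypothetical column name
    "enable_dnn": True,            # allow DNN-based (BERT) text featurization
}

# ISO 639-3 code for Japanese, used to declare the dataset language.
dataset_language = "jpn"

# With the azureml SDK installed and a workspace available, the objects
# would be built roughly as follows (not executed in this sketch):
#
# from azureml.automl.core.featurization import FeaturizationConfig
# from azureml.train.automl import AutoMLConfig
#
# featurization = FeaturizationConfig(dataset_language=dataset_language)
# automl_config = AutoMLConfig(
#     training_data=train_ds,      # hypothetical tabular dataset
#     compute_target=gpu_cluster,  # hypothetical GPU compute target
#     featurization=featurization,
#     **automl_settings,
# )

print(automl_settings, dataset_language)
```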

Demo
AutoML with Japanese data
• Multi-class classification of livedoor news
Note: the code is publicly available.
- Text classification with the AutoML BERT model
https://medium.com/@konabuta/automl-の-bert-モデルによるテキスト分類-5758d4456975

https://dllab.connpass.com/event/178714/
