Distributed Representations of Words and Phrases and their Compositionality
Nagaoka University of Technology, Natural Language Processing Laboratory
Kanji Takahashi
Mikolov, T., Sutskever, I., Chen, K., Corrado, G., & Dean, J. (2013).
Distributed Representations of Words and Phrases and their
Compositionality. Advances in Neural Information Processing
Systems 26 (NIPS 2013)
Figures taken from the book 「word2vecによる自然言語処理」 (Natural Language Processing with word2vec)
Paper introduction, April 13, 2016
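Overview
•The word2vec paper by Mikolov et al.
•Faster training and higher accuracy than the previous model
•Also learns representations for phrases
➢ e.g. “Air” and “Canada” → the phrase “Air Canada”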
Introduction
•Vector representations of words have been studied since 1986
•Mikolov et al. (2013) proposed the Skip-gram model
•vec(“Madrid”) − vec(“Spain”) + vec(“France”) ≈ vec(“Paris”)
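The analogy can be checked mechanically. Form vec(“Madrid”) − vec(“Spain”) + vec(“France”), then pick the vocabulary word with the highest cosine similarity to the result. This sketch excludes the three query words, a common convention in analogy evaluation. It assumes hand-made 4-dimensional toy vectors, not trained word2vec embeddings. The helper names `cosine` and `analogy` are hypothetical.

```python
import numpy as np

# Hypothetical toy vectors; dimensions loosely mean
# [is-capital, Spain, France, Germany]. Real word2vec vectors are learned.
toy_vectors = {
    "Madrid":  np.array([1.0, 1.0, 0.0, 0.0]),
    "Spain":   np.array([0.0, 1.0, 0.0, 0.0]),
    "Paris":   np.array([1.0, 0.0, 1.0, 0.0]),
    "France":  np.array([0.0, 0.0, 1.0, 0.0]),
    "Berlin":  np.array([1.0, 0.0, 0.0, 1.0]),
    "Germany": np.array([0.0, 0.0, 0.0, 1.0]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def analogy(vectors, a, b, c):
    """Word closest to vec(a) - vec(b) + vec(c), ignoring the three query words."""
    target = vectors[a] - vectors[b] + vectors[c]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy(toy_vectors, "Madrid", "Spain", "France"))  # Paris
```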
Skip-gram model (Mikolov et al., 2013)
•Predicts the words in the context of the input word
•This paper extends the model
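Skip-gram model
•Training word sequence w1, w2, w3, …, wT; context size c
•The full softmax is impractical: its cost is proportional to the vocabulary size W, often 10⁵–10⁷ words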
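In the paper, the Skip-gram objective maximizes the average log probability (1/T) Σ_{t=1..T} Σ_{−c≤j≤c, j≠0} log p(w_{t+j} | w_t). In the basic formulation, p(w_O | w_I) is a softmax over all W output vectors. The minimal NumPy sketch below shows two things: the (center, context) pairs that the sum ranges over, and why one full-softmax evaluation touches every row of the output matrix. It assumes a fixed window c; the function names are hypothetical.

```python
import numpy as np

def skipgram_pairs(tokens, c):
    """All (w_t, w_{t+j}) pairs with -c <= j <= c, j != 0, inside the sequence."""
    pairs = []
    for t, center in enumerate(tokens):
        for j in range(-c, c + 1):
            if j != 0 and 0 <= t + j < len(tokens):
                pairs.append((center, tokens[t + j]))
    return pairs

def full_softmax_log_prob(v_in, out_vectors, o):
    """log p(w_O | w_I) = v'_O . v_I - log sum_w exp(v'_w . v_I).

    out_vectors has one row per vocabulary word (W rows), so each call costs O(W).
    """
    scores = out_vectors @ v_in          # W dot products
    scores = scores - scores.max()       # shift for numerical stability
    return float(scores[o] - np.log(np.exp(scores).sum()))

print(skipgram_pairs(["the", "cat", "sat"], c=1))
# [('the', 'cat'), ('cat', 'the'), ('cat', 'sat'), ('sat', 'cat')]
```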
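Hierarchical softmax
•Arranges the vocabulary in a binary tree so most output words need not be evaluated
•For a vocabulary of N words, the cost per prediction drops from O(N) to O(log N)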
Uses a Huffman tree (frequent words get short codes)
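Below is a minimal sketch of hierarchical softmax over a Huffman tree. A word's probability is a product of sigmoids along its root-to-leaf path: σ(v'_n · v_I) for one branch and 1 − σ(v'_n · v_I) = σ(−v'_n · v_I) for the other. The cost is therefore the path length rather than W, and frequent words sit close to the root. The toy word counts, random node vectors, and helper names `huffman_paths` and `hs_prob` are assumptions for illustration.

```python
import heapq
import itertools
import math

import numpy as np

def huffman_paths(counts):
    """Map each word to its root-to-leaf path [(inner_node_id, bit), ...]."""
    tie = itertools.count()              # tie-breaker so heapq never compares dicts
    heap = [(n, next(tie), {w: []}) for w, n in counts.items()]
    heapq.heapify(heap)
    node_ids = itertools.count()         # inner nodes get ids 0 .. N-2
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)    # two least frequent subtrees
        n2, _, right = heapq.heappop(heap)
        node = next(node_ids)
        merged = {w: [(node, 0)] + p for w, p in left.items()}
        merged.update({w: [(node, 1)] + p for w, p in right.items()})
        heapq.heappush(heap, (n1 + n2, next(tie), merged))
    return heap[0][2]

def hs_prob(path, v_in, node_vectors):
    """p(w | w_I) as a product of binary decisions along the word's path."""
    p = 1.0
    for node, bit in path:
        s = 1.0 / (1.0 + math.exp(-float(node_vectors[node] @ v_in)))
        p *= s if bit == 0 else 1.0 - s
    return p

counts = {"the": 50, "of": 30, "cat": 10, "sat": 5, "mat": 5}
paths = huffman_paths(counts)
rng = np.random.default_rng(0)
node_vectors = rng.normal(size=(len(counts) - 1, 8))   # one vector per inner node
v_in = rng.normal(size=8)
print({w: len(p) for w, p in paths.items()})
print(sum(hs_prob(paths[w], v_in, node_vectors) for w in counts))  # ~1.0
```

The probabilities over all leaves sum to 1 without computing a normalizer. At every inner node, the two branches receive s and 1 − s.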
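Negative sampling
•Uses about 5 randomly drawn fake (noise) words per training example
•A word is chosen as a negative (incorrect) example with probability proportional to its unigram frequency raised to the 3/4 power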
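The paper's NEG objective replaces log p(w_O | w_I) with log σ(v'_O · v_I) + Σ_{i=1..k} E_{w_i∼P_n(w)}[log σ(−v'_{w_i} · v_I)]. The noise distribution is P_n(w) ∝ U(w)^{3/4}, where U is the unigram distribution. This minimal NumPy sketch assumes toy counts, k = 5, and hypothetical function names. The expectation is approximated by the k drawn samples.

```python
import numpy as np

def noise_distribution(counts, power=0.75):
    """P_n(w) proportional to count(w) ** 0.75 (the unigram distribution to the 3/4)."""
    p = np.asarray(counts, dtype=float) ** power
    return p / p.sum()

def log_sigmoid(x):
    """Numerically stable log(sigma(x)) = -log(1 + exp(-x))."""
    return -np.logaddexp(0.0, -x)

def neg_loss(v_in, v_out_pos, v_out_negs):
    """Negated NEG objective for one (input, output) pair and k negative rows."""
    positive = log_sigmoid(v_out_pos @ v_in)
    negatives = log_sigmoid(-(v_out_negs @ v_in)).sum()
    return float(-(positive + negatives))

counts = [1000, 100, 10, 1]                    # toy corpus counts, most to least frequent
pn = noise_distribution(counts)
unigram = np.asarray(counts) / sum(counts)
print(pn.round(4), unigram.round(4))           # the 3/4 power boosts rare words
rng = np.random.default_rng(0)
negative_ids = rng.choice(len(counts), size=5, p=pn)   # k = 5 noise words
```

Raising the counts to the 3/4 power flattens the distribution. Rare words are sampled more often than their raw frequency would suggest, and very frequent words less often.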
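Subsampling of frequent words
•Very frequent words such as “in”, “the”, and “a” are subsampled
•Each occurrence of wi is discarded with probability P(wi) = 1 − √(t / f(wi))
•f(wi) is the relative frequency of word wi
•t (threshold) is typically around 10⁻⁵
•The more frequent a word, the more often it is discarded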
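The discard rule P(wi) = 1 − √(t / f(wi)) can be sketched in a few lines of Python. For words rarer than t the formula goes negative, so this sketch clips it to 0; the clipping is an assumption made explicit here. The toy corpus and the function names are hypothetical.

```python
import math
import random
from collections import Counter

def discard_probability(freq, t=1e-5):
    """P(w_i) = 1 - sqrt(t / f(w_i)); clipped to 0 when f(w_i) <= t."""
    return max(0.0, 1.0 - math.sqrt(t / freq))

def subsample(tokens, t=1e-5, seed=0):
    """Drop each token occurrence independently with its discard probability."""
    rng = random.Random(seed)
    counts = Counter(tokens)
    total = len(tokens)
    return [w for w in tokens
            if rng.random() >= discard_probability(counts[w] / total, t)]

print(discard_probability(1e-3))   # frequent word: discarded about 90% of the time
print(discard_probability(1e-6))   # rarer than t: 0.0, always kept
```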
Experimental results
•Analogy task
➢ Is vec(“Berlin”) − vec(“Germany”) + vec(“France”) closest to vec(“Paris”)?
•NEG-15 (negative sampling with 15 negative samples) performed best
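Learning phrases
•A phrase's meaning is not simply the combination of its words' meanings
•Bigram score: score(wi, wj) = (count(wi wj) − δ) / (count(wi) × count(wj))
•δ is a discounting coefficient that prevents phrases made of very infrequent words
•Scores are computed from unigram and bigram counts; bigrams scoring above a threshold become single tokens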
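The paper scores each bigram as score(wi, wj) = (count(wi wj) − δ) / (count(wi) × count(wj)) and merges bigrams above a threshold into single tokens. It repeats 2–4 passes with a decreasing threshold so that longer phrases can form. Below is a minimal one-pass sketch. The δ and threshold values, the toy corpus, the underscore joining, and the function names are assumptions for illustration.

```python
from collections import Counter

def phrase_scores(tokens, delta):
    """score(a, b) = (count(a b) - delta) / (count(a) * count(b)) for each bigram."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return {(a, b): (n - delta) / (unigrams[a] * unigrams[b])
            for (a, b), n in bigrams.items()}

def merge_phrases(tokens, scores, threshold):
    """Greedily join adjacent pairs scoring above threshold into one token."""
    out, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if len(pair) == 2 and scores.get(pair, float("-inf")) > threshold:
            out.append("_".join(pair))     # e.g. "air_canada"
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

tokens = "air canada flew to air canada the air show".split()
scores = phrase_scores(tokens, delta=1)
print(merge_phrases(tokens, scores, threshold=0.1))
# ['air_canada', 'flew', 'to', 'air_canada', 'the', 'air', 'show']
```

With δ = 1, every bigram seen only once scores 0. Only the repeated pair “air canada” (score 1/6) crosses the threshold.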
Phrase analogy task and results
Example analogy questions
Results
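Checking additive compositionality
•Composing meaning with simple vector addition
•Addition behaves like an AND
➢ Word sequences that appear in similar contexts get similar vectors, so a sum of two vectors lands near words that co-occur with both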
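The paper's explanation of this AND-like behavior can be written out for the softmax parameterization. This is a sketch of the intuition, not a formal result. The score of a context word w is linear in the input vector, so adding two input vectors multiplies their unnormalized context distributions:

```latex
p(w \mid w_a) \propto \exp\!\left({v'_w}^{\top} v_{w_a}\right)

\exp\!\left({v'_w}^{\top}(v_{w_a} + v_{w_b})\right)
  = \exp\!\left({v'_w}^{\top} v_{w_a}\right)\exp\!\left({v'_w}^{\top} v_{w_b}\right)
  \propto p(w \mid w_a)\, p(w \mid w_b)
```

A context word therefore scores highly for the sum only if it is likely under both words. In the paper's example, “Volga River” often appears together with “Russian” and “river”, so vec(“Russian”) + vec(“river”) lands near vec(“Volga River”).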
Comparison with other distributed representations
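Trained on about 30 billion words
Summary
•Vector representations of words and phrases with the Skip-gram model
•Faster, more accurate training by skipping computation (hierarchical softmax, negative sampling, subsampling)
•Meaning can be expressed with simple vector arithmetic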