Efficient estimation of word representations in vector space (2013)

Efficient Estimation of Word Representations in Vector Space
• Tomas Mikolov
• Kai Chen
• Greg Corrado
• Jeffrey Dean

“(...) the meaning of a word is its use in the language.”
Ludwig Wittgenstein, Philosophical Investigations (1953)

Vector Space Model
◎ Traditional Vector Space Model (Information Retrieval):
○ documents and queries represented in a vector space
○ where the dimensions are the words
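
As a rough illustration of the traditional model, the sketch below turns documents and a query into term-count vectors whose dimensions are the vocabulary words. The three stand-in documents, the `to_vector` helper, and the use of raw counts (rather than a weighting such as TF-IDF) are assumptions made for this example.

```python
from collections import Counter

# Stand-in "documents"; any short texts would do.
docs = ["I like deep learning", "I like NLP", "I enjoy flying"]

# One dimension per distinct word, in sorted order.
vocab = sorted({word for doc in docs for word in doc.split()})

def to_vector(text):
    """Map a document or query to its term-count vector over vocab."""
    counts = Counter(text.split())
    return [counts[word] for word in vocab]

doc_vectors = [to_vector(doc) for doc in docs]
query_vector = to_vector("like NLP")
```

Documents and queries live in the same space, so they can be compared directly, for example by counting shared non-zero dimensions.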

Co-occurrence matrix
◎ Consider a window-based co-occurrence matrix (window size 1)
◎ Example corpus:
○ I like deep learning.
○ I like NLP.
○ I enjoy flying.
◎ Total vocabulary size (|V|) = 8, counting the period as a token
◎ Vector(“I”) = [0, 2, 1, 0, 0, 0, 0, 0]
◎ Vector(“like”) = [2, 0, 0, 1, 0, 1, 0, 0]
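
The two vectors above can be reproduced with a short sketch. It assumes a symmetric window of one token on each side, treats the period as a token, and fixes the vocabulary order as [I, like, enjoy, deep, learning, NLP, flying, .]; that order is one choice consistent with the vectors on the slide, not something the slide states.

```python
# Corpus from the slide, pre-tokenized with the period as its own token.
corpus = ["I like deep learning .", "I like NLP .", "I enjoy flying ."]

# Assumed vocabulary order (consistent with the vectors shown above).
vocab = ["I", "like", "enjoy", "deep", "learning", "NLP", "flying", "."]
index = {word: i for i, word in enumerate(vocab)}

def cooccurrence_matrix(sentences, window=1):
    """Count how often each word appears within `window` tokens of another."""
    counts = [[0] * len(vocab) for _ in vocab]
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            stop = min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    counts[index[word]][index[tokens[j]]] += 1
    return counts

matrix = cooccurrence_matrix(corpus)
vector_i = matrix[index["I"]]
vector_like = matrix[index["like"]]
```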

What
◎ A two-layer neural network that generates word embeddings from a text corpus.
◎ Word embeddings: a mapping of words into a vector space
◎ such that similar words are mapped to nearby points
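
"Nearby" is commonly measured with cosine similarity. The sketch below uses hand-made two-dimensional vectors purely for illustration; they are not trained embeddings, and the `cosine` helper is a name introduced here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up vectors: "like" and "enjoy" point in similar directions.
toy_vectors = {
    "like": [0.9, 0.1],
    "enjoy": [0.8, 0.2],
    "flying": [0.1, 0.9],
}

similar = cosine(toy_vectors["like"], toy_vectors["enjoy"])
dissimilar = cosine(toy_vectors["like"], toy_vectors["flying"])
```

With these toy values, `similar` comes out larger than `dissimilar`, which is the behaviour a good embedding is meant to show for related words.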

What
◎ For example, sentence = “Word Embeddings are Word converted into numbers”
◎ A dictionary may be the list of all unique words in the sentence.
◎ So, a dictionary may look like:
[‘Word’, ‘Embeddings’, ‘are’, ‘converted’, ‘into’, ‘numbers’]
◎ The one-hot vector representation of “numbers” according to the above dictionary is [0,0,0,0,0,1], and that of “converted” is [0,0,0,1,0,0].
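
The dictionary and the two vectors can be reproduced in a few lines. This is a minimal sketch: the `one_hot` helper is a name chosen here, and whitespace tokenization is assumed.

```python
sentence = "Word Embeddings are Word converted into numbers"

# Unique words in first-seen order (dict keys preserve insertion order).
dictionary = list(dict.fromkeys(sentence.split()))

def one_hot(word):
    """Return a vector with a single 1 at the word's dictionary position."""
    vector = [0] * len(dictionary)
    vector[dictionary.index(word)] = 1
    return vector

numbers_vector = one_hot("numbers")
converted_vector = one_hot("converted")
```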

Why
◎ Preserves relationships between words
◎ Handles the addition of new words to the vocabulary
◎ Gives better results in many deep learning applications

Goal
[Figure: two diagrams. Context words → Word2Vec → Target word; Word → Word2Vec → Context words]

CBOW
◎ Predict the target word from the context
◎ Order of words in the history does not influence the projection
◎ Faster and more appropriate for larger corpora
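
A minimal sketch of the CBOW setup: each training example pairs the surrounding words with the word in the middle, and the context is projected by averaging its vectors, so reordering the context changes nothing. The helper names, the window size, and the toy vectors are assumptions for illustration only.

```python
def cbow_pairs(tokens, window=2):
    """Build (context words, target word) training examples."""
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        pairs.append((context, target))
    return pairs

def project(context, vectors):
    """Average the context word vectors (order-independent)."""
    dim = len(vectors[context[0]])
    return [sum(vectors[w][d] for w in context) / len(context) for d in range(dim)]

# Made-up vectors, not trained values.
toy_vectors = {"I": [1.0, 0.0], "like": [0.0, 2.0], "deep": [3.0, 1.0]}

pairs = cbow_pairs(["I", "like", "deep"], window=1)
forward = project(["I", "deep"], toy_vectors)
reversed_order = project(["deep", "I"], toy_vectors)
```

Here `forward` and `reversed_order` are identical, which is the sense in which word order does not influence the projection.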

Skip Gram
◎ Predict the context words from the target
◎ Maximize classification of a word based on another word in the same sentence
◎ Better word vectors for infrequent words, but slower to train
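
Skip-gram reverses the direction: every word is paired with each of its neighbours, giving one (target, context) example per pair. A minimal sketch, assuming a symmetric window and a helper name chosen here:

```python
def skipgram_pairs(tokens, window=2):
    """Build (target word, context word) training examples."""
    pairs = []
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

pairs = skipgram_pairs(["I", "like", "NLP"], window=1)
```

Because one sentence position yields several examples instead of one, skip-gram produces more training updates than CBOW for the same text, which is one plausible reason it trains more slowly.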

Skip-gram network architecture
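
A rough sketch of a forward pass through such a network, under simplifying assumptions: a one-hot input selects one row of the input weight matrix, and a full softmax over the vocabulary scores every candidate context word. The original paper uses hierarchical softmax rather than a full softmax; the weights and names below are made up for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def skipgram_forward(word_index, w_in, w_out):
    """Probability of each vocabulary word appearing in the context."""
    hidden = w_in[word_index]  # a one-hot input just picks this row
    scores = [sum(h * w for h, w in zip(hidden, out_vec)) for out_vec in w_out]
    return softmax(scores)

# Toy weights: 3 vocabulary words, 2 hidden units (made-up values).
w_in = [[0.1, 0.4], [0.3, -0.2], [-0.5, 0.2]]
w_out = [[0.2, 0.1], [-0.3, 0.5], [0.4, 0.4]]

probabilities = skipgram_forward(0, w_in, w_out)
```

After training, the rows of `w_in` are what is normally kept as the word vectors.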

Advantages
◎ It scales
○ Train on billion-word corpora
○ In limited time
○ Possibility of parallel training
◎ Pre-trained word embeddings trained by one party can be reused by others
○ For entirely different tasks
◎ Incremental training
○ Train on one piece of data, save results, continue training later on
◎ There is a Python module for it:
○ Gensim word2vec
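
A hedged sketch of training, saving, and resuming with Gensim. It assumes Gensim 4.x, where the dimensionality parameter is `vector_size`; the function name, file name, and tiny hyperparameters are illustrative choices, not recommendations.

```python
import os
import tempfile

def train_then_resume(first_batch, second_batch, vector_size=50):
    """Train on one batch, save, reload, and continue on a second batch."""
    # Imported here so the sketch can be loaded without Gensim installed.
    from gensim.models import Word2Vec

    # sg=1 selects skip-gram; sg=0 (the default) selects CBOW.
    model = Word2Vec(sentences=first_batch, vector_size=vector_size,
                     window=2, min_count=1, sg=1, workers=1, seed=1)

    # Save the full model so that training can be resumed later.
    path = os.path.join(tempfile.mkdtemp(), "w2v.model")
    model.save(path)

    # Later: reload, add any new words to the vocabulary, keep training.
    model = Word2Vec.load(path)
    model.build_vocab(second_batch, update=True)
    model.train(second_batch, total_examples=len(second_batch),
                epochs=model.epochs)
    return model
```

Each batch is a list of tokenized sentences, for example `[["I", "like", "NLP"]]`. Vectors are then read from `model.wv`, for instance `model.wv["NLP"]`.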

Disadvantages
◎ Inability to handle unknown or OOV words.
◎ No shared representations at sub-word levels.
◎ Scaling to new languages requires new embedding matrices.
◎ Cannot be used to initialize state-of-the-art architectures.
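
The first two points can be seen with a toy word-level lookup table. The vectors and the `lookup` helper are made up for illustration: a word that never appeared in training has no entry, even when a closely related form does.

```python
# Toy word-level table; values are invented, not trained.
embeddings = {
    "run": [0.2, 0.7],
    "walk": [0.3, 0.6],
}

def lookup(word):
    """Return the word's vector, or None if it is out of vocabulary."""
    return embeddings.get(word)

known = lookup("run")
unknown = lookup("running")  # shares a stem with "run", but has no vector
```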