4. How to represent words?
❖ Word Vector
■ One-hot vector
❖ CountVectorizer
■ Bag of Words
■ Term Frequency-Inverse Document Frequency (TF-IDF)
❖ SVD (Singular Value Decomposition)
■ Word-Document Matrix
■ Window-based Co-occurrence Matrix
■ Applying SVD to the co-occurrence matrix
❖ Iterative Method (Word2Vec)
5. Word Vectors
Let us consider the two sentences:
1. "You can scale your business."
2. "You can grow your business."
Vocabulary: You, Can, Scale, Grow, Your, Business
One-hot vectors:
You: [1,0,0,0,0,0]
Can: [0,1,0,0,0,0]
Scale: [0,0,1,0,0,0]
Grow: [0,0,0,1,0,0]
Your: [0,0,0,0,1,0]
Business: [0,0,0,0,0,1]
None of these vectors shows any similarity to the others.
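A minimal Python sketch of how these one-hot vectors could be built; the one_hot helper below is hypothetical, not from the slides:

vocabulary = ["you", "can", "scale", "grow", "your", "business"]

def one_hot(word, vocab):
    """Return a one-hot list: 1 at the word's index, 0 elsewhere."""
    vec = [0] * len(vocab)
    vec[vocab.index(word.lower())] = 1
    return vec

print(one_hot("Scale", vocabulary))  # [0, 0, 1, 0, 0, 0]
print(one_hot("Grow", vocabulary))   # [0, 0, 0, 1, 0, 0]

# The dot product of any two different one-hot vectors is 0,
# which is why this representation captures no similarity between words.
dot = sum(a * b for a, b in zip(one_hot("Scale", vocabulary),
                                one_hot("Grow", vocabulary)))
print(dot)  # 0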
6. CountVectorizer
Let us consider the two sentences:
1. "The cat jumped over the moon."
2. "The cow jumped over the moon."
Vocabulary: The, Cat, jumped, over, moon, cow
BoW representation:
Sentence 1: [2, 1, 1, 1, 1, 0]
Sentence 2: [2, 0, 1, 1, 1, 1]
TF-IDF vectorization:
Sentence 1: [0.3, 0.6, 0.3, 0.3, 0.3, 0]
Sentence 2: [0.3, 0, 0.3, 0.3, 0.3, 0.6]
None of these words has any similarity with the others; the Bag of Words approach disregards the importance of specific words and treats all words equally.
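A small scikit-learn sketch that builds both representations for these two sentences; sklearn orders its vocabulary alphabetically and applies its own smoothing and normalisation, so its TF-IDF numbers will not match the rounded values above exactly:

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

sentences = ["The cat jumped over the moon.",
             "The cow jumped over the moon."]

bow = CountVectorizer()
counts = bow.fit_transform(sentences)
print(bow.get_feature_names_out())   # vocabulary learned from the corpus (alphabetical)
print(counts.toarray())              # raw word counts (Bag of Words)

tfidf = TfidfVectorizer()
weights = tfidf.fit_transform(sentences)
print(weights.toarray().round(2))    # TF-IDF weights: the rarer words (cat, cow) score higher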
8. What is Word2Vec?
● A two-layer neural network that generates word embeddings from a text corpus.
● Word Embeddings: a mapping of words into a vector space.
[Figure: example embedding vectors (columns of real numbers) for "Man" and "Woman"]
King - Man + Woman = Queen
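A hedged sketch of this analogy using pretrained vectors through gensim's downloader; the 'word2vec-google-news-300' dataset is assumed to be available and is large, so treat this as illustrative rather than something to run casually:

import gensim.downloader as api

vectors = api.load("word2vec-google-news-300")   # pretrained Word2Vec KeyedVectors

# king - man + woman: add "king" and "woman", subtract "man"
result = vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)   # typically [('queen', ~0.71)]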
9. Why Word2Vec?
● Preserves relationships between words.
● Handles the addition of new words to the vocabulary.
● Gives better results in many deep learning applications.
10. Working of Word2Vec
● The Word2Vec objective function causes words that occur in similar contexts to have similar embeddings.
Example:
● The words "kid" and "child" will have similar word vectors because they appear in similar contexts, as the sketch below illustrates.
The kid said he would grow up to be superman.
The child said he would grow up to be superman.
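A minimal gensim sketch of this effect, using a toy corpus built from the two sentences above; on such a tiny corpus the similarity value only illustrates the mechanism, not real-world quality:

from gensim.models import Word2Vec

corpus = [
    "the kid said he would grow up to be superman".split(),
    "the child said he would grow up to be superman".split(),
]

# kid and child share an identical context, so their vectors are pulled together
model = Word2Vec(sentences=corpus, vector_size=50, window=3, min_count=1, epochs=200)
print(model.wv.similarity("kid", "child"))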
12. CBOW (Continuous Bag of Words)
● Predict the target word from its context words.
Context: "The quick brown fox ___ over the lazy dog" → Target: "jumps"
13. Skip-gram
● Predict the context words from the target word (see the pair-generation sketch below).
Target: "jumps" → Context: "The quick brown fox ___ over the lazy dog"
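One way to see the difference is in the training pairs each architecture generates; a small sketch, assuming a window size of 2:

sentence = "the quick brown fox jumps over the lazy dog".split()
window = 2

cbow_pairs, skipgram_pairs = [], []
for i, target in enumerate(sentence):
    context = [sentence[j]
               for j in range(max(0, i - window), min(len(sentence), i + window + 1))
               if j != i]
    cbow_pairs.append((context, target))                  # CBOW: context words -> target
    skipgram_pairs.extend((target, c) for c in context)   # Skip-gram: target -> each context word

print(cbow_pairs[4])       # (['brown', 'fox', 'over', 'the'], 'jumps')
print(skipgram_pairs[:3])  # [('the', 'quick'), ('the', 'brown'), ('quick', 'the')]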
14. CBOW - Working
Sentence: "Hope can set you free."
● One-hot vector: one bit set to '1' and all others '0'.
● Vector length = number of words in the vocabulary (here V = 5).
● Input: the one-hot vectors of the context words "Hope" ([1,0,0,0,0]) and "set" ([0,0,1,0,0]), each of size 5 x 1.
● A shared input weight matrix W (3 x 5) projects the context vectors onto a hidden layer with 3 nodes.
● An output weight matrix W' (5 x 3) maps the hidden layer to the predicted one-hot vector (5 x 1) of the target word "can".
● The prediction is compared with the actual target [0,1,0,0,0] and the weights W and W' are updated (sketched in numpy below).
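A numpy sketch of this forward pass, with V = 5 and 3 hidden nodes; the weight values are random stand-ins, only the shapes and the data flow follow the slide:

import numpy as np

vocab = ["hope", "can", "set", "you", "free"]
V, N = len(vocab), 3                      # vocabulary size 5, hidden layer size 3

rng = np.random.default_rng(0)
W = rng.normal(size=(N, V))               # input weights, 3 x 5
W_out = rng.normal(size=(V, N))           # output weights, 5 x 3

def one_hot(word):
    v = np.zeros(V)
    v[vocab.index(word)] = 1.0
    return v

# CBOW averages the projections of the context words "hope" and "set"
h = (W @ one_hot("hope") + W @ one_hot("set")) / 2   # hidden layer, 3 values

scores = W_out @ h                                    # 5 scores, one per vocabulary word
probs = np.exp(scores) / np.exp(scores).sum()         # softmax over the vocabulary
print(dict(zip(vocab, probs.round(3))))               # predicted distribution; compare with the
                                                      # target "can", then update W and W_out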
15. Skip-gram - Working
Sentence: "Hope can set you free."
● Input: the one-hot vector of the target word "can" ([0,1,0,0,0]), of size 5 x 1.
● The input weight matrix W (3 x 5) projects it onto a hidden layer with 3 nodes.
● The output weight matrix W' (5 x 3) produces a predicted vector (5 x 1) for each context word, here "Hope" and "set".
● Each prediction is compared with its actual target ([1,0,0,0,0] for "Hope", [0,0,1,0,0] for "set") and the weights are updated.
16. Word Embeddings
● One-hot vectors of the words (each V = 5 x 1): Hope [1,0,0,0,0], can [0,1,0,0,0], set [0,0,1,0,0], you [0,0,0,1,0], free [0,0,0,0,1].
● After training, the input weight matrix W (3 x 5) holds the word embeddings.
● Multiplying W by the one-hot vector of a word selects the corresponding column of W:

  [w00 w01 w02 w03 w04]   [1]   [w00]
  [w10 w11 w12 w13 w14] x [0] = [w10]
  [w20 w21 w22 w23 w24]   [0]   [w20]
                          [0]
                          [0]

● Word vector for "Hope" = W (3 x 5) x V (5 x 1) = a 3 x 1 vector.
● This is the reason for predicting words instead of generating embeddings directly: the embeddings are extracted from the trained weight matrix.
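A tiny numpy sketch of why the trained W matrix is the embedding table (the values are placeholders):

import numpy as np

W = np.arange(15).reshape(3, 5)      # stand-in for the trained 3 x 5 weight matrix
hope = np.array([1, 0, 0, 0, 0])     # one-hot vector of "hope"

print(W @ hope)                      # [ 0  5 10]  -> first column of W
print(W[:, 0])                       # the same vector, read directly as a column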
17. How to deal with variable-length reviews?
● Vector Averaging: simply average the word vectors of the words in the given review (see the sketch below).
● Clustering: a way of exploiting the similarity of words within a cluster.
○ K-means clustering
○ Bag of centroids
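A minimal sketch of vector averaging, assuming a trained gensim Word2Vec model named model; words missing from its vocabulary are skipped:

import numpy as np

def review_vector(review, model):
    """Average the word vectors of a review into one fixed-size vector."""
    words = [w for w in review.lower().split() if w in model.wv]
    if not words:
        return np.zeros(model.vector_size)
    return np.mean([model.wv[w] for w in words], axis=0)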
Thank you