4. How to represent words?
❖ Word Vector
■ One-hot vector
❖ CountVectorizer
■ Bag of Words
■ Term Frequency-Inverse Document Frequency (TF-IDF)
❖ SVD (Singular Value Decomposition)
■ Word-Document Matrix
■ Window-based Co-occurrence Matrix
■ Applying SVD to the co-occurrence matrix
❖ Iterative Method (Word2Vec)
5. Word Vectors
Let us consider the two sentences:
1. "You can scale your business."
2. "You can grow your business."
Vocabulary: You, Can, Scale, Grow, Your, Business
One-hot vectors:
You: [1,0,0,0,0,0]
Can: [0,1,0,0,0,0]
Scale: [0,0,1,0,0,0]
Grow: [0,0,0,1,0,0]
Your: [0,0,0,0,1,0]
Business: [0,0,0,0,0,1]
None of these vectors shows any similarity to the others.
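A minimal Python sketch of how these one-hot vectors could be built; the one_hot helper below is hypothetical, not from the slides:

vocabulary = ["you", "can", "scale", "grow", "your", "business"]

def one_hot(word, vocab):
    """Return a one-hot list: 1 at the word's index, 0 elsewhere."""
    vec = [0] * len(vocab)
    vec[vocab.index(word.lower())] = 1
    return vec

print(one_hot("Scale", vocabulary))  # [0, 0, 1, 0, 0, 0]
print(one_hot("Grow", vocabulary))   # [0, 0, 0, 1, 0, 0]

# The dot product of any two different one-hot vectors is 0,
# which is why this representation captures no similarity between words.
dot = sum(a * b for a, b in zip(one_hot("Scale", vocabulary),
                                one_hot("Grow", vocabulary)))
print(dot)  # 0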
6. CountVectorizer
Let us consider the two sentences:
1. "The cat jumped over the moon."
2. "The cow jumped over the moon."
Vocabulary: The, Cat, jumped, over, moon, cow
BoW representation:
Sentence 1: [2, 1, 1, 1, 1, 0]
Sentence 2: [2, 0, 1, 1, 1, 1]
TF-IDF vectorization:
Sentence 1: [0.3, 0.6, 0.3, 0.3, 0.3, 0]
Sentence 2: [0.3, 0, 0.3, 0.3, 0.3, 0.6]
None of these words has any similarity with the others; the Bag of Words approach disregards the importance of specific words and treats all words equally.
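A small scikit-learn sketch that builds both representations for these two sentences; sklearn orders its vocabulary alphabetically and applies its own smoothing and normalisation, so its TF-IDF numbers will not match the rounded values above exactly:

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

sentences = ["The cat jumped over the moon.",
             "The cow jumped over the moon."]

bow = CountVectorizer()
counts = bow.fit_transform(sentences)
print(bow.get_feature_names_out())   # vocabulary learned from the corpus (alphabetical)
print(counts.toarray())              # raw word counts (Bag of Words)

tfidf = TfidfVectorizer()
weights = tfidf.fit_transform(sentences)
print(weights.toarray().round(2))    # TF-IDF weights: the rarer words (cat, cow) score higher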
8. What is Word2Vec?
● A two-layer neural network that generates word embeddings from a text corpus.
● Word Embeddings: a mapping of words into a vector space.
[Figure: example embedding vectors (columns of real numbers) for "Man" and "Woman"]
King - Man + Woman = Queen
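A hedged sketch of this analogy using pretrained vectors through gensim's downloader; the 'word2vec-google-news-300' dataset is assumed to be available and is large, so treat this as illustrative rather than something to run casually:

import gensim.downloader as api

vectors = api.load("word2vec-google-news-300")   # pretrained Word2Vec KeyedVectors

# king - man + woman: add "king" and "woman", subtract "man"
result = vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)   # typically [('queen', ~0.71)]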
9. Why Word2Vec?
● Preserves relationships between words.
● Handles the addition of new words to the vocabulary.
● Gives better results in many deep learning applications.
10. Working of Word2Vec
● The Word2Vec objective function causes words that occur in similar contexts to have similar embeddings.
Example:
● The words "kid" and "child" will have similar word vectors because they appear in similar contexts, as the sketch below illustrates.
The kid said he would grow up to be superman.
The child said he would grow up to be superman.
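A minimal gensim sketch of this effect, using a toy corpus built from the two sentences above; on such a tiny corpus the similarity value only illustrates the mechanism, not real-world quality:

from gensim.models import Word2Vec

corpus = [
    "the kid said he would grow up to be superman".split(),
    "the child said he would grow up to be superman".split(),
]

# kid and child share an identical context, so their vectors are pulled together
model = Word2Vec(sentences=corpus, vector_size=50, window=3, min_count=1, epochs=200)
print(model.wv.similarity("kid", "child"))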
12. CBOW (Continuous Bag of Words)
● Predict the target word from its context words.
Context: "The quick brown fox ___ over the lazy dog" → Target: "jumps"
13. Skip-gram
● Predict the context words from the target word (see the pair-generation sketch below).
Target: "jumps" → Context: "The quick brown fox ___ over the lazy dog"
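One way to see the difference is in the training pairs each architecture generates; a small sketch, assuming a window size of 2:

sentence = "the quick brown fox jumps over the lazy dog".split()
window = 2

cbow_pairs, skipgram_pairs = [], []
for i, target in enumerate(sentence):
    context = [sentence[j]
               for j in range(max(0, i - window), min(len(sentence), i + window + 1))
               if j != i]
    cbow_pairs.append((context, target))                  # CBOW: context words -> target
    skipgram_pairs.extend((target, c) for c in context)   # Skip-gram: target -> each context word

print(cbow_pairs[4])       # (['brown', 'fox', 'over', 'the'], 'jumps')
print(skipgram_pairs[:3])  # [('the', 'quick'), ('the', 'brown'), ('quick', 'the')]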
14. CBOW - Working
Sentence: "Hope can set you free."
● One-hot vector: one bit set to '1' and all others '0'.
● Vector length = number of words in the vocabulary (here V = 5).
● Input: the one-hot vectors of the context words "Hope" ([1,0,0,0,0]) and "set" ([0,0,1,0,0]), each of size 5 x 1.
● A shared input weight matrix W (3 x 5) projects the context vectors onto a hidden layer with 3 nodes.
● An output weight matrix W' (5 x 3) maps the hidden layer to the predicted one-hot vector (5 x 1) of the target word "can".
● The prediction is compared with the actual target [0,1,0,0,0] and the weights W and W' are updated (sketched in numpy below).
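A numpy sketch of this forward pass, with V = 5 and 3 hidden nodes; the weight values are random stand-ins, only the shapes and the data flow follow the slide:

import numpy as np

vocab = ["hope", "can", "set", "you", "free"]
V, N = len(vocab), 3                      # vocabulary size 5, hidden layer size 3

rng = np.random.default_rng(0)
W = rng.normal(size=(N, V))               # input weights, 3 x 5
W_out = rng.normal(size=(V, N))           # output weights, 5 x 3

def one_hot(word):
    v = np.zeros(V)
    v[vocab.index(word)] = 1.0
    return v

# CBOW averages the projections of the context words "hope" and "set"
h = (W @ one_hot("hope") + W @ one_hot("set")) / 2   # hidden layer, 3 values

scores = W_out @ h                                    # 5 scores, one per vocabulary word
probs = np.exp(scores) / np.exp(scores).sum()         # softmax over the vocabulary
print(dict(zip(vocab, probs.round(3))))               # predicted distribution; compare with the
                                                      # target "can", then update W and W_out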
15. Skip-gram - Working
Sentence: "Hope can set you free."
● Input: the one-hot vector of the target word "can" ([0,1,0,0,0]), of size 5 x 1.
● The input weight matrix W (3 x 5) projects it onto a hidden layer with 3 nodes.
● The output weight matrix W' (5 x 3) produces a predicted vector (5 x 1) for each context word, here "Hope" and "set".
● Each prediction is compared with its actual target ([1,0,0,0,0] for "Hope", [0,0,1,0,0] for "set") and the weights are updated.
16. Word Embeddings
● One-hot vectors of the words (each V = 5 x 1): Hope [1,0,0,0,0], can [0,1,0,0,0], set [0,0,1,0,0], you [0,0,0,1,0], free [0,0,0,0,1].
● After training, the input weight matrix W (3 x 5) holds the word embeddings.
● Multiplying W by the one-hot vector of a word selects the corresponding column of W:

  [w00 w01 w02 w03 w04]   [1]   [w00]
  [w10 w11 w12 w13 w14] x [0] = [w10]
  [w20 w21 w22 w23 w24]   [0]   [w20]
                          [0]
                          [0]

● Word vector for "Hope" = W (3 x 5) x V (5 x 1) = a 3 x 1 vector.
● This is the reason for predicting words instead of generating embeddings directly: the embeddings are extracted from the trained weight matrix.
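A tiny numpy sketch of why the trained W matrix is the embedding table (the values are placeholders):

import numpy as np

W = np.arange(15).reshape(3, 5)      # stand-in for the trained 3 x 5 weight matrix
hope = np.array([1, 0, 0, 0, 0])     # one-hot vector of "hope"

print(W @ hope)                      # [ 0  5 10]  -> first column of W
print(W[:, 0])                       # the same vector, read directly as a column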
17. How to deal with variable-length reviews?
● Vector Averaging: simply average the word vectors of the words in the given review (see the sketch below).
● Clustering: a way of exploiting the similarity of words within a cluster.
○ K-means clustering
○ Bag of centroids
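A minimal sketch of vector averaging, assuming a trained gensim Word2Vec model named model; words missing from its vocabulary are skipped:

import numpy as np

def review_vector(review, model):
    """Average the word vectors of a review into one fixed-size vector."""
    words = [w for w in review.lower().split() if w in model.wv]
    if not words:
        return np.zeros(model.vector_size)
    return np.mean([model.wv[w] for w in words], axis=0)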
Thank you