Efficient Estimation of Word Representations in Vector Space
• Tomas Mikolov
• Kai Chen
• Greg Corrado
• Jeffrey Dean
“(...) the meaning of a word is its use in the language.”
Ludwig Wittgenstein, Philosophical Investigations (1953)
2
Vector Space Model
◎ Traditional Vector Space Model (Information Retrieval):
○ Documents and queries are represented as vectors in a common space
○ Each dimension corresponds to a word (term) in the vocabulary
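A minimal sketch of the traditional model, assuming raw term counts as the coordinates and cosine similarity for ranking (real IR systems usually add weighting such as tf-idf). The documents and query are made up for illustration.

```python
import math
from collections import Counter


def vectorize(text, vocab):
    """Term-count vector: one dimension per vocabulary word."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]


def cosine(u, v):
    """Cosine of the angle between u and v (0.0 if either vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


# Hypothetical documents and query.
docs = ["deep learning for nlp", "flying to paris", "nlp with deep neural networks"]
query = "deep nlp"

# The dimensions of the space are the distinct words seen in docs and query.
vocab = sorted({w for d in docs + [query] for w in d.lower().split()})
doc_vecs = [vectorize(d, vocab) for d in docs]
q_vec = vectorize(query, vocab)

# Rank documents by similarity to the query.
ranking = sorted(range(len(docs)), key=lambda i: cosine(q_vec, doc_vecs[i]), reverse=True)
print([docs[i] for i in ranking])
```

The shorter document that shares both query words ranks first, because the cosine normalizes away document length.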
3
Co-occurrence matrix
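◎ Window-based co-occurrence matrix: each entry counts how often two words appear within a fixed window of each other (window size 1 here)
◎ Example corpus:
○ I like deep learning.
○ I like NLP.
○ I enjoy flying.
◎ Vocabulary size |V| = 8 (the seven words plus "."), ordered as [I, like, enjoy, deep, learning, NLP, flying, .]
◎ Vector(“I”) = [0, 2, 1, 0, 0, 0, 0, 0]
◎ Vector(“like”) = [2, 0, 0, 1, 0, 1, 0, 0]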
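The two vectors on the slide can be reproduced by counting, for every token, the tokens immediately to its left and right (window size 1), treating "." as a token and using the vocabulary order that matches the slide's vectors. A short sketch; the function name is illustrative:

```python
# Toy corpus from the slide; "." is kept as its own token, giving |V| = 8.
corpus = [
    ["I", "like", "deep", "learning", "."],
    ["I", "like", "NLP", "."],
    ["I", "enjoy", "flying", "."],
]
# Vocabulary order inferred from the slide's vectors.
vocab = ["I", "like", "enjoy", "deep", "learning", "NLP", "flying", "."]


def cooccurrence_matrix(sentences, vocab, window=1):
    """Count how often each word appears within `window` tokens of another word."""
    index = {w: i for i, w in enumerate(vocab)}
    m = [[0] * len(vocab) for _ in vocab]
    for sent in sentences:
        for i, word in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    m[index[word]][index[sent[j]]] += 1
    return m


M = cooccurrence_matrix(corpus, vocab, window=1)
print(M[vocab.index("I")])     # [0, 2, 1, 0, 0, 0, 0, 0]
print(M[vocab.index("like")])  # [2, 0, 0, 1, 0, 1, 0, 0]
```

Each row is a word's vector; the matrix is symmetric because the window looks both left and right.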
4
What
◎ A two-layer neural network that learns word embeddings from a text corpus
◎ Word embeddings: a mapping of words to points in a vector space
◎ Similar words are mapped to nearby points
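"Nearby" is usually measured with cosine similarity. The sketch below finds a word's nearest neighbour among a few hand-made 3-d vectors; the vectors are hypothetical, not trained word2vec output.

```python
import math

# Hypothetical 3-d embeddings, hand-picked for illustration (not trained vectors).
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.85, 0.9, 0.05],
    "car": [0.1, 0.2, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest(word, embeddings):
    """Return the other word whose vector has the highest cosine similarity to `word`."""
    query = embeddings[word]
    candidates = (w for w in embeddings if w != word)
    return max(candidates, key=lambda w: cosine(query, embeddings[w]))


print(nearest("cat", embeddings))  # dog
```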
5
What
◎ For example, take the sentence “Word Embeddings are Word converted into numbers”
◎ A dictionary can be the list of all unique words in the sentence:
[‘Word’, ‘Embeddings’, ‘are’, ‘converted’, ‘into’, ‘numbers’]
◎ With this dictionary, the one-hot vector for “numbers” is [0,0,0,0,0,1] and for “converted” is [0,0,0,1,0,0]
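The dictionary and vectors in this example can be built mechanically: collect the unique words in order of first appearance, then give each word a vector of zeros with a single 1 at its position. A minimal sketch; the helper name is illustrative:

```python
def one_hot(word, dictionary):
    """Vector of len(dictionary) zeros with a single 1 at the word's position."""
    vec = [0] * len(dictionary)
    vec[dictionary.index(word)] = 1
    return vec


sentence = "Word Embeddings are Word converted into numbers"
# Unique words in order of first appearance (dict keys preserve insertion order).
dictionary = list(dict.fromkeys(sentence.split()))
print(dictionary)                        # ['Word', 'Embeddings', 'are', 'converted', 'into', 'numbers']
print(one_hot("numbers", dictionary))    # [0, 0, 0, 0, 0, 1]
print(one_hot("converted", dictionary))  # [0, 0, 0, 1, 0, 0]
```

Every one-hot vector is as long as the vocabulary and all pairs of distinct words are orthogonal, so these vectors carry no notion of similarity; that is the gap dense embeddings fill.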
6
Why
◎ Preserves relationships between words (similar words get similar vectors)
◎ Can accommodate new words added to the vocabulary
◎ Gives better results in many deep learning applications
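One way such relationships show up is in vector offsets: the word whose vector is closest (by cosine similarity) to vec(king) - vec(man) + vec(woman) should be queen. The paper evaluates its vectors on analogy questions in this way, discarding the input words during the search. The 2-d vectors below are hypothetical and chosen so that the offset works out exactly.

```python
import math

# Hypothetical 2-d vectors: the man->woman offset equals the king->queen offset.
vectors = {
    "man": [1.0, 0.0],
    "woman": [1.0, 1.0],
    "king": [3.0, 0.0],
    "queen": [3.0, 1.0],
    "apple": [0.0, 3.0],
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def analogy(a, b, c, vectors):
    """Solve a : b :: c : ? via the word closest to vec(b) - vec(a) + vec(c)."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]  # exclude the query words
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


print(analogy("man", "woman", "king", vectors))  # queen
```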
7
Goal
8
[Figure: two Word2Vec diagrams. CBOW: context words → Word2Vec → target word. Skip-gram: word → Word2Vec → context words.]
CBOW
◎ Predicts the target word from its context words
◎ Bag-of-words: the context vectors are averaged, so the order of the context words does not influence the projection
◎ Faster to train and more appropriate for larger corpora
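A minimal forward-pass sketch of CBOW, assuming a full softmax output layer (the paper uses hierarchical softmax instead) and small random, untrained weights. The vocabulary, embedding size N = 4 and the seed are illustrative choices, not values from the paper. Because the context vectors are averaged, reordering the context words leaves the output distribution unchanged, up to floating-point rounding.

```python
import math
import random

rng = random.Random(0)
vocab = ["I", "like", "deep", "learning", "NLP"]  # hypothetical toy vocabulary
V, N = len(vocab), 4                              # vocabulary size, embedding size
idx = {w: i for i, w in enumerate(vocab)}
# Untrained weights: W_in holds input (embedding) vectors, W_out one output vector per word.
W_in = [[rng.uniform(-0.5, 0.5) for _ in range(N)] for _ in range(V)]
W_out = [[rng.uniform(-0.5, 0.5) for _ in range(N)] for _ in range(V)]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cbow_forward(context):
    """P(target word | context words): average the context vectors, then softmax."""
    h = [sum(W_in[idx[w]][k] for w in context) / len(context) for k in range(N)]
    scores = [sum(W_out[j][k] * h[k] for k in range(N)) for j in range(V)]
    return softmax(scores)


p1 = cbow_forward(["I", "deep", "learning"])   # context of "like" in "I like deep learning"
p2 = cbow_forward(["learning", "I", "deep"])   # same words, different order
print(all(math.isclose(a, b) for a, b in zip(p1, p2)))  # True
```

Training would adjust W_in and W_out so that the true target ("like" here) receives high probability; after training, the rows of W_in are the word vectors.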
9
Skip Gram
◎ Predicts the context words from the target word
◎ Maximizes classification of a word based on another word in the same sentence
◎ Better vectors for infrequent words, but slower to train
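The skip-gram goal can be written as maximizing the average log-probability of the context words within c positions of each word w_t in a training sequence w_1, ..., w_T. The version below uses a full softmax over input vectors v and output vectors v' (the form written out in the authors' follow-up work); the paper summarized here instead uses hierarchical softmax over a Huffman tree, which makes the per-word cost roughly logarithmic in |V| rather than linear.

```latex
\frac{1}{T}\sum_{t=1}^{T}\ \sum_{\substack{-c \le j \le c \\ j \ne 0}} \log p(w_{t+j} \mid w_t),
\qquad
p(w_O \mid w_I) = \frac{\exp\!\left({v'_{w_O}}^{\top} v_{w_I}\right)}{\sum_{w=1}^{|V|} \exp\!\left({v'_{w}}^{\top} v_{w_I}\right)}
```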
10
A sliding window example
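A sliding window turns running text into skip-gram training pairs: each word in turn becomes the center, and the words within the window around it become its context. The optional reduced window follows the paper's description of sampling a window size R between 1 and C for each word, which gives nearby words more weight; the function name and parameters are hypothetical.

```python
import random


def skipgram_pairs(tokens, window=2, rng=None):
    """Slide a window over `tokens` and emit (center, context) training pairs.

    If `rng` is given, each center word uses a random reduced window R in
    [1, window], so distant words are sampled less often on average.
    """
    pairs = []
    for i, center in enumerate(tokens):
        r = rng.randint(1, window) if rng is not None else window
        for j in range(max(0, i - r), min(len(tokens), i + r + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


tokens = "I like deep learning".split()
print(skipgram_pairs(tokens, window=1))
# [('I', 'like'), ('like', 'I'), ('like', 'deep'), ('deep', 'like'), ('deep', 'learning'), ('learning', 'deep')]
```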
11
One-hot encoding
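Feeding a one-hot vector into the first layer multiplies it by the input weight matrix W (size V x N), which simply selects one row of W; that row is the word's embedding. A small sketch with a hypothetical 4-word vocabulary and integer weights:

```python
def one_hot(i, size):
    """Length-`size` vector with a 1 at position i."""
    vec = [0] * size
    vec[i] = 1
    return vec


def vec_mat(x, W):
    """Row vector times matrix: x^T W for x of length V and W of shape V x N."""
    return [sum(x[i] * W[i][k] for i in range(len(W))) for k in range(len(W[0]))]


W = [[1, 2], [3, 4], [5, 6], [7, 8]]  # hypothetical 4-word vocabulary, 2-d embeddings
x = one_hot(2, len(W))
print(vec_mat(x, W))  # [5, 6], i.e. row 2 of W
```

This is why implementations store embeddings as a lookup table instead of performing the full multiplication.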
12
Skip-gram network architecture
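A minimal sketch of the skip-gram network's forward pass, assuming a full softmax output layer (the paper itself uses hierarchical softmax) and random, untrained weights; the vocabulary, sizes and seed are illustrative. The one-hot input selects a row of W_in as the hidden layer, the output layer scores every vocabulary word, and the loss for one (center, context) pair is the negative log-probability of the true context word.

```python
import math
import random

rng = random.Random(0)
vocab = ["I", "like", "deep", "learning", "NLP"]  # hypothetical toy vocabulary
V, N = len(vocab), 3                              # vocabulary size, embedding size
idx = {w: i for i, w in enumerate(vocab)}
# Untrained weights: W_in holds input (embedding) vectors, W_out holds output vectors v'_w.
W_in = [[rng.uniform(-0.5, 0.5) for _ in range(N)] for _ in range(V)]
W_out = [[rng.uniform(-0.5, 0.5) for _ in range(N)] for _ in range(V)]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def skipgram_forward(center):
    """P(context word | center word) for every word in the vocabulary."""
    h = W_in[idx[center]]  # one-hot input x W_in = one row of W_in
    scores = [sum(W_out[j][k] * h[k] for k in range(N)) for j in range(V)]
    return softmax(scores)


def pair_loss(center, context):
    """Negative log-likelihood of one (center, context) training pair."""
    return -math.log(skipgram_forward(center)[idx[context]])


probs = skipgram_forward("deep")
print([round(p, 3) for p in probs])
print(pair_loss("deep", "learning"))
```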
13
Advantages
◎ It scales
○ Trains on billion-word corpora
○ In limited time
○ Training can be parallelized
◎ Pre-trained word embeddings can be reused by others
○ Even for entirely different tasks
◎ Incremental training
○ Train on one piece of data, save the results, and continue training later
◎ Available as a Python module:
○ Gensim word2vec
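Reusing pre-trained embeddings usually means exchanging them as files. The sketch below writes and reads the plain-text format produced by the original word2vec tool: a header line with the vocabulary size and dimension, then one word and its vector per line. The helper names are hypothetical; in practice Gensim's `KeyedVectors.load_word2vec_format(path, binary=False)` and `save_word2vec_format` handle this format, including its binary variant.

```python
import os
import tempfile


def save_word2vec_text(path, vectors):
    """Write a 'count dim' header line, then one 'word v1 v2 ...' line per word."""
    dim = len(next(iter(vectors.values())))
    with open(path, "w", encoding="utf-8") as f:
        f.write(f"{len(vectors)} {dim}\n")
        for word, vec in vectors.items():
            f.write(word + " " + " ".join(repr(x) for x in vec) + "\n")


def load_word2vec_text(path):
    """Read the same text format back into a {word: [floats]} dict."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        count, dim = map(int, f.readline().split())
        for _ in range(count):
            word, *values = f.readline().split()
            if len(values) != dim:
                raise ValueError(f"expected {dim} values for {word!r}, got {len(values)}")
            vectors[word] = [float(x) for x in values]
    return vectors


# Hypothetical 3-d vectors standing in for a trained model's output.
vectors = {"deep": [0.1, -0.2, 0.3], "learning": [0.05, 0.4, -0.1]}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vectors.txt")
    save_word2vec_text(path, vectors)
    restored = load_word2vec_text(path)
print(restored == vectors)  # True
```

For incremental training, a Gensim `Word2Vec` model can be saved and reloaded, and new sentences can be added with `build_vocab(new_sentences, update=True)` before calling `train(...)`; check the Gensim documentation for the exact arguments in your version.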
14
Disadvantages
◎ Inability to handle unknown or out-of-vocabulary (OOV) words. ...
◎ No shared representations at sub-word levels. ...
◎ Scaling to new languages requires new embedding matrices. ...
◎ Cannot be used to initialize state-of-the-art architectures.
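The OOV limitation follows from word2vec storing one vector per whole word: a lookup for a word not seen in training has nothing to return. Sub-word models such as fastText (a later, different technique, named here only for contrast) instead represent a word through character n-grams, so related forms share pieces. A minimal sketch with hypothetical vectors and helper names:

```python
# Hypothetical whole-word embeddings, as a trained word2vec model would store them.
embeddings = {"learning": [0.2, -0.1], "deep": [0.5, 0.3]}


def lookup(word):
    """Whole-word lookup: an out-of-vocabulary word has no vector."""
    if word not in embeddings:
        raise KeyError(f"out-of-vocabulary word: {word!r}")
    return embeddings[word]


def char_ngrams(word, n_min=3, n_max=6):
    """fastText-style character n-grams, with '<' and '>' marking word boundaries."""
    token = f"<{word}>"
    return [token[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(token) - n + 1)]


try:
    lookup("learnings")
except KeyError:
    print("no vector for 'learnings'")

# "learnings" is unseen, but it shares many n-grams with "learning".
shared = set(char_ngrams("learning")) & set(char_ngrams("learnings"))
print(len(shared) > 0)             # True
print(char_ngrams("where", 3, 3))  # ['<wh', 'whe', 'her', 'ere', 're>']
```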
15

Efficient estimation of word representations in vector space (2013)
