Neural Word Embedding and Language Modelling
A Survey Paper Presentation by
Riddhi Jain
Graduate Student, Computer Engineering, SJSU
(014600716)
About
Paper: A Survey on Neural Word Embeddings
Authors: Erhan Sezerer and Selma Tekir
Article link: https://arxiv.org/pdf/2110.01804.pdf
Published: 2021
What is VSM?
The vector space model is an algebraic model that represents objects (such as text)
as vectors. This makes it easy to determine the similarity between words or the
relevance between a search query and a document. Cosine similarity is often used
to measure the similarity between vectors.
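As a minimal illustration (not from the survey; the vocabulary and counts are made up), cosine similarity between two term-frequency vectors can be computed with NumPy:

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between two vectors; 1.0 means identical direction."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy term-frequency vectors over the vocabulary ["neural", "word", "embedding"]
doc = np.array([2.0, 3.0, 1.0])
query = np.array([1.0, 1.0, 0.0])
print(cosine_similarity(doc, query))  # higher values = more relevant
```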
Neural Word Embeddings
In neural language modeling, a neural network is trained to predict the next word
given a set of neighboring words in the sequence.
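A minimal PyTorch sketch of this setup, assuming a Bengio-style feedforward predictor (my illustration, not the survey's code): the embedding layer's learned weights are what serve as the word vectors.

```python
import torch
import torch.nn as nn

class TinyNeuralLM(nn.Module):
    """Sketch of a feedforward neural LM: context words -> next-word distribution."""
    def __init__(self, vocab_size: int, dim: int = 32, context: int = 3):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)  # rows become the word embeddings
        self.ff = nn.Sequential(
            nn.Linear(context * dim, 64), nn.Tanh(),
            nn.Linear(64, vocab_size),            # scores over the vocabulary
        )

    def forward(self, context_ids: torch.Tensor) -> torch.Tensor:
        # context_ids: (batch, context) -> (batch, vocab_size) logits
        e = self.emb(context_ids).flatten(start_dim=1)
        return self.ff(e)

model = TinyNeuralLM(vocab_size=100)
logits = model(torch.randint(0, 100, (8, 3)))  # batch of 8 three-word contexts
loss = nn.functional.cross_entropy(logits, torch.randint(0, 100, (8,)))
loss.backward()  # iterating this over a large corpus trains self.emb.weight
```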
Word Embeddings with Improved Language Models
● Early Word Embeddings
● Embeddings Target Specific Semantic Relations
● Sense Embeddings
● Morpheme Embeddings
Early Word Embeddings
Word2vec is the first neural word embedding model that efficiently computes
representations by leveraging the context of target words.
Two word2vec variants (a usage sketch follows the list):
● CBOW (Continuous Bag of Words)
○ Example - “nature is pleased with simplicity”: the middle word “pleased” is
predicted from the left and right context
● Skip-gram: the most probable context words are predicted for a given input word
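As a hedged usage sketch (assuming the gensim library, with gensim 4.x parameter names), both variants are trained by toggling the `sg` flag:

```python
from gensim.models import Word2Vec

sentences = [["nature", "is", "pleased", "with", "simplicity"],
             ["simplicity", "is", "the", "ultimate", "sophistication"]]

# sg=0 -> CBOW: predict the middle word from its neighbors.
cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)

# sg=1 -> Skip-gram: predict neighboring words from the middle word.
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

print(cbow.wv["simplicity"][:5])           # a learned 50-d vector (first 5 dims)
print(skipgram.wv.most_similar("nature"))  # nearest neighbors by cosine similarity
```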
Embeddings Target Specific Semantic Relations
Example - “She took a sip of hot coffee” and “He is taking a sip of cold water”:
the antonyms “hot” and “cold” appear in similar contexts, so context-based
embeddings deem them similar.
Representative algorithms (a toy sketch follows the list):
- SGNS
- GloVe
- ATTRACT and REPEL
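The following NumPy toy is my simplification of the ATTRACT/REPEL intuition, not the published algorithm: synonym vectors are nudged together and antonym vectors pushed apart by small update steps.

```python
import numpy as np

rng = np.random.default_rng(0)
vec = {w: rng.normal(size=8) for w in ["hot", "warm", "cold"]}  # random placeholders

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

for _ in range(50):
    # ATTRACT: pull the synonym pair (hot, warm) toward each other.
    vec["hot"]  += 0.05 * (vec["warm"] - vec["hot"])
    vec["warm"] += 0.05 * (vec["hot"] - vec["warm"])
    # REPEL: push the antonym pair (hot, cold) away from each other.
    vec["hot"]  -= 0.02 * (vec["cold"] - vec["hot"])
    vec["cold"] -= 0.02 * (vec["hot"] - vec["cold"])

print(cos(vec["hot"], vec["warm"]))  # synonyms grow more similar
print(cos(vec["hot"], vec["cold"]))  # antonyms grow less similar
```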
Sense Embeddings
● Early word embeddings unite all the senses of a word into one
representation.
● In reality, a word gets its meaning in use and can mean different things in
varying contexts.
● Decomposing a word’s occurrences into same-sense groups can be done without
supervision, but when the issue becomes labeling those sense groups, the task
becomes a supervised one (see the clustering sketch below).
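A hedged sketch of one common unsupervised recipe (not necessarily the surveyed methods; all vectors here are random placeholders): represent each occurrence of an ambiguous word by the mean of its context vectors, then cluster the occurrences into sense groups.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Hypothetical pretrained vectors for context words of the ambiguous word "bank".
ctx_vec = {"river": rng.normal(0, 1, 16), "water": rng.normal(0, 1, 16),
           "money": rng.normal(5, 1, 16), "loan":  rng.normal(5, 1, 16)}

# Each occurrence of the ambiguous word = mean of its context word vectors.
occurrences = [["river", "water"], ["money", "loan"],
               ["water", "river"], ["loan", "money"]]
X = np.array([np.mean([ctx_vec[w] for w in occ], axis=0) for occ in occurrences])

# Unsupervised sense induction: cluster occurrences into same-sense groups.
# Attaching dictionary labels to these clusters is what makes the task supervised.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # e.g., [0, 1, 0, 1]: two induced senses of "bank"
```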
Morpheme Embeddings
These methods target morphological information to obtain sub-word
representations, addressing the rare/unknown-word problem of earlier word
embedding methods and yielding better representations of words for
morphologically rich languages.
Two approaches (a subword sketch follows the list):
- Training Morphological Embeddings from Scratch
- Adjusting the Existing Embeddings
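A minimal sketch of the fastText-style subword idea (the n-gram scheme is fastText's; the vectors here are random placeholders rather than trained ones): a word's vector is the sum of its character n-gram vectors, so even unseen words such as "unpleasant" get representations.

```python
import numpy as np

def char_ngrams(word: str, n_min: int = 3, n_max: int = 5) -> list[str]:
    """Character n-grams of the word wrapped in boundary markers, fastText-style."""
    w = f"<{word}>"
    return [w[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

rng = np.random.default_rng(2)
ngram_vec: dict[str, np.ndarray] = {}

def word_vector(word: str, dim: int = 16) -> np.ndarray:
    """Sum of subword vectors; works even for words unseen during training."""
    vecs = []
    for g in char_ngrams(word):
        if g not in ngram_vec:
            ngram_vec[g] = rng.normal(size=dim)  # placeholder for trained vectors
        vecs.append(ngram_vec[g])
    return np.sum(vecs, axis=0)

# "unpleasant" shares n-grams with "pleasant", so their vectors overlap.
print(char_ngrams("pleasant")[:4])   # ['<pl', 'ple', 'lea', 'eas']
print(word_vector("unpleasant").shape)
```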
Datasets
● Similarity Tasks
● Analogy Task
Google Analogy Task with 8,869 semantic and 10,675 syntactic questions
(an evaluation sketch follows the list)
● Synonym Selection Tasks
● Downstream Tasks
GLUE benchmark dataset
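A hedged sketch of the standard 3CosAdd analogy evaluation (toy vectors constructed for illustration; real evaluations use trained embeddings): king - man + woman should land closest to queen.

```python
import numpy as np

def solve_analogy(a: str, b: str, c: str, vocab: dict) -> str:
    """3CosAdd: return the word whose vector is most similar to b - a + c."""
    target = vocab[b] - vocab[a] + vocab[c]
    target /= np.linalg.norm(target)
    best, best_sim = None, -1.0
    for w, v in vocab.items():
        if w in (a, b, c):
            continue  # exclude the question words, as in the Google analogy task
        sim = float(target @ v / np.linalg.norm(v))
        if sim > best_sim:
            best, best_sim = w, sim
    return best

# Toy vectors built so the gender offset is consistent (illustration only).
vocab = {"man": np.array([1.0, 0.0]), "woman": np.array([1.0, 1.0]),
         "king": np.array([3.0, 0.0]), "queen": np.array([3.0, 1.0])}
print(solve_analogy("man", "king", "woman", vocab))  # -> "queen"
```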
Conclusion
Human-level language understanding is one of the oldest challenges in computer
science. Pre-trained language models’ knowledge has been transferred to
fine-tuned task-specific models. Multi-modal language models are based on human
language acquisition, where learning starts with concrete concepts through
images early on.
Editor's Notes
  • #2: Hello
  • #4: Before neural representation learning, representations of words or documents were computed using the vector space model. In VSM [120], frequencies of words in documents form a term-document matrix. Although these count-based representations have proved helpful in addressing semantics, they are bag-of-words approaches and cannot capture syntactic and semantic features at the same time, which is required for performing well in NLP tasks.
  • #5: In neural language modeling, a neural network is constructed to predict the next word given the set of neighboring words in the sequence. As this prediction is iterated over a large corpus, the learned weights in the hidden layers come to serve as neural embeddings for words.
  • #7: Word2vec is the first neural word embedding model that efficiently computes representations by leveraging the context of target words. It can be considered the initiator of early word embeddings. Word2vec has two variants: the continuous bag-of-words model (CBOW) and the Skip-gram model. In CBOW, the middle word is predicted given the context, i.e., a set of neighboring words to the left and right. When the input sentence "nature is pleased with simplicity" is processed, the system predicts the middle word "pleased" given the left and right context. In Skip-gram, the system predicts the most probable context words for a given input word. In terms of a language model, while CBOW predicts an individual word’s probability, Skip-gram outputs the probabilities of a set of words, defined by a given context size.
  • #8: Although the initial word embedding models successfully identified semantic and syntactic similarities of words, they still need to be improved to address specific semantic relations among words. Consider the sentences "She took a sip of hot coffee" and "He is taking a sip of cold water." The antonyms "cold" and "hot" are deemed to be similar since their contexts are similar. Therefore, it becomes an issue to differentiate the synonyms "warm" and "hot" from the antonyms "cold" and "hot," considering they have similar contexts in most occurrences.
  • #9: Early word embeddings unite all the senses of a word into one representation. In reality, a word gets its meaning in use and can mean different things in varying contexts. Sense embeddings are used to achieve word sense discrimination: the decomposition of a word’s occurrences into same-sense groups.
  • #10: The quest for morphological representations is a result of two important limitations of earlier word embedding models. The first is that words are not the smallest units of meaning in language. Even if a model does not see the word "unpleasant" in training, it should be able to deduce that it is the negative form of "pleasant." Word embedding methods that do not take morphological information into account cannot produce any result in such a situation. The second limitation is the data scarcity problem of morphologically rich and agglutinative languages. Unlike English, morphologically rich languages have many more noun and/or verb forms inflected by gender, case, or number, which may not exist in the training corpora. The same is true for agglutinative languages, in which words can have many forms according to the suffix(es) they take. Therefore, models that take morphemes/lexemes into account are needed.