Transfer Learning in NLP
Navneet Kumar Chaudhary
Data Scientist
Aasaanjobs.com
Recent State-of-the-Art Models
SOTA NLP Models
Image sourced from https://jalammar.github.io/illustrated-bert/
Transfer Learning in CV and how we use embeddings
What is NLTK?
❖ NLTK, the Natural Language Toolkit, is a suite of
libraries and programs for a variety of academic text
processing tasks.
❖ It has built-in functionality for stop-word removal,
tokenization, stemming, and lemmatization (a quick
sketch follows below).
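A minimal sketch of those built-in utilities; the sample sentence is invented for illustration, and the download calls fetch the required corpora on first run:

```python
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

# One-time downloads of the tokenizer models and corpora NLTK needs.
nltk.download("punkt")
nltk.download("stopwords")
nltk.download("wordnet")

sentence = "The cats were running and meeting near the old banks"
tokens = nltk.word_tokenize(sentence)                         # tokenization
stop_words = set(stopwords.words("english"))
content = [t for t in tokens if t.lower() not in stop_words]  # stop-word removal

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
print([stemmer.stem(t) for t in content])                     # stemming
print([lemmatizer.lemmatize(t, pos="v") for t in content])    # lemmatization (as verbs)
```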
Stemming vs Lemmatization
Lemmatisation is closely related to stemming. The difference is that a stemmer operates on
a single word without knowledge of the context, and therefore cannot discriminate between
words which have different meanings depending on part of speech. However, stemmers are
typically easier to implement and run faster, and the reduced accuracy may not matter for
some applications.
For instance:
1. The word "better" has "good" as its lemma. This link is missed by stemming, as it requires a
dictionary look-up.

2. The word "walk" is the base form of the word "walking", and hence it is matched by both
stemming and lemmatisation.

3. The word "meeting" can be either the base form of a noun or a form of a verb ("to meet")
depending on the context, e.g., "in our last meeting" or "We are meeting again tomorrow".
Unlike stemming, lemmatisation can in principle select the appropriate lemma depending
on the context.
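All three cases can be reproduced with NLTK; a minimal sketch (note that the WordNet lemmatizer needs the part of speech as a hint):

```python
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

# 1. "better" -> "good" requires a dictionary (WordNet) look-up;
#    the stemmer misses the link.
print(stemmer.stem("better"))                    # 'better'
print(lemmatizer.lemmatize("better", pos="a"))   # 'good'

# 2. "walking" -> "walk" is matched by both approaches.
print(stemmer.stem("walking"))                   # 'walk'
print(lemmatizer.lemmatize("walking", pos="v"))  # 'walk'

# 3. "meeting" lemmatizes differently as a noun vs. a verb.
print(lemmatizer.lemmatize("meeting", pos="n"))  # 'meeting'
print(lemmatizer.lemmatize("meeting", pos="v"))  # 'meet'
```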
Word Embeddings Recap
❖ For words to be processed by machine learning models,
they need some form of numeric representation that
models can use in their calculations.
❖ Word2Vec showed that we can use a vector (a list of
numbers) to properly represent words in a way that
captures semantic or meaning-related relationships.
❖ Queen ≈ King − Man + Woman
❖ The relationship between countries and their respective
capitals is captured in the same way (see the sketch below).
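Both analogies can be tried with gensim's pre-trained vectors; a sketch (the GoogleNews model, about 1.6 GB, is downloaded on first use, and the exact neighbours depend on the model):

```python
import gensim.downloader as api

# Pre-trained word2vec vectors, fetched on first use.
model = api.load("word2vec-google-news-300")

# Queen ≈ King - Man + Woman
print(model.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# -> [('queen', ...)]

# Country -> capital: France : Paris :: Japan : ?
print(model.most_similar(positive=["Paris", "Japan"], negative=["France"], topn=1))
# -> [('Tokyo', ...)]
```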
Limitations/Issues of Word Embeddings
❖ Out-of-vocabulary/unknown words: the vocabulary size
must be fixed in advance, so when a word is unknown, a
vector cannot be constructed deterministically.
❖ A single shared representation per word cannot capture
polysemy: the meaning of a word depends on the context
in which it is used (see the sketch below).
❖ Our model won’t be robust for new languages, and thus
we cannot use it for incremental learning.
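Both issues are easy to demonstrate with the gensim model from the previous sketch:

```python
# 1. Out-of-vocabulary: the vocabulary was fixed at training time,
#    so no vector exists for an unseen word.
try:
    model["definitelynotaword123"]
except KeyError:
    print("OOV word: no vector can be constructed")

# 2. One vector per word type: "bank" has exactly the same embedding
#    whether the sentence is about money or a river, so the two
#    senses cannot be distinguished.
vec_bank = model["bank"]
```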
ELMo: Context Matters
Context-Aware Embeddings with ELMo
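As a rough illustration of context-aware embeddings, here is a sketch using the old allennlp 0.x ElmoEmbedder API (newer AllenNLP versions expose ELMo differently; this was not part of the original talk):

```python
import numpy as np
from allennlp.commands.elmo import ElmoEmbedder

elmo = ElmoEmbedder()  # downloads the default pre-trained weights

# Unlike word2vec, "bank" gets a different vector in each context.
money = elmo.embed_sentence(["I", "deposited", "cash", "at", "the", "bank"])
river = elmo.embed_sentence(["We", "sat", "on", "the", "river", "bank"])

# Shape (3 layers, num_tokens, 1024); compare the top-layer "bank" vectors.
a, b = money[2][-1], river[2][-1]
print(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))  # well below 1.0
```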
The ULMFiT Approach to Pre-training
The idea for converting this to Transfer Learning
BERT Pre-training Process
Step 1: Finding Context-Aware Embeddings
Step 2: Finding Context-Aware Embeddings
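The deck illustrates these steps with diagrams; as a rough sketch of how context-aware embeddings are pulled from a pre-trained BERT today, using the Hugging Face transformers library (not part of the original talk):

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("We are meeting again tomorrow", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One context-aware vector per (sub-)token: shape (1, seq_len, 768).
print(outputs.last_hidden_state.shape)
```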
Why is ULMFiT Universal?
❖ Dataset independent: you start with a WikiText language
model and fine-tune it on your dataset (see the sketch
after this list).
❖ Works across all documents and datasets of varying
lengths.
❖ The architecture is consistent, just as we use ResNets for
many CV tasks.
❖ Can work on very small datasets as well, as we already
have a good LM to start with.
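A sketch of the recipe with the fastai v1 API (the library the ULMFiT authors ship); the CSV path, column layout, and hyperparameters are placeholders:

```python
from fastai.text import (AWD_LSTM, TextClasDataBunch, TextLMDataBunch,
                         language_model_learner, text_classifier_learner)

# 1. Fine-tune a language model pre-trained on WikiText-103.
data_lm = TextLMDataBunch.from_csv("data/", "texts.csv")
learn_lm = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)
learn_lm.fit_one_cycle(1, 1e-2)
learn_lm.unfreeze()
learn_lm.fit_one_cycle(1, 1e-3)
learn_lm.save_encoder("ft_enc")  # keep the fine-tuned encoder

# 2. Reuse the encoder in a classifier for the target task.
data_clas = TextClasDataBunch.from_csv("data/", "texts.csv", vocab=data_lm.vocab)
learn_clf = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)
learn_clf.load_encoder("ft_enc")
learn_clf.fit_one_cycle(1, 1e-2)
learn_clf.freeze_to(-2)                        # gradual unfreezing
learn_clf.fit_one_cycle(1, slice(1e-3, 1e-2))
```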
Classifier fine-tuning for Task Specific Weights
❖ Two additional linear blocks are added. Each block uses
batch normalization and a lower value of dropout.
❖ ReLU is used as the activation function between the
linear blocks.
❖ Softmax is used to provide the probability distribution
over the target classes.
❖ Classifiers only take the embeddings provided by the LM
and are always trained from scratch (a sketch follows
below).
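A sketch of such a head in PyTorch; the layer sizes follow the published ULMFiT classifier (the concat-pooled LM output of 3 × 400 = 1200 units and a hidden size of 50), while the dropout values here are illustrative:

```python
import torch.nn as nn

head = nn.Sequential(
    nn.BatchNorm1d(1200),
    nn.Dropout(0.2),
    nn.Linear(1200, 50),
    nn.ReLU(),            # activation between the linear blocks
    nn.BatchNorm1d(50),
    nn.Dropout(0.1),      # lower dropout in the final block
    nn.Linear(50, 2),     # e.g. two target classes
    nn.Softmax(dim=-1),   # probability distribution over classes
)
```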
Results from ULMFiT
Validation Error Rate: ULMFiT vs. Training from Scratch
Acknowledgements
❖ "Images speak louder than words” and they were
sourced from other blogposts and Google results.
❖ A lot of them are taken from this great blogpost by Jay
Alammar https://jalammar.github.io/illustrated-bert/
❖ The results image is taken from the ULMFiT paper.
–Navneet Kumar Chaudhary
Thanks a Lot!!!
