Transfer Learning in NLP
Navneet Kumar Chaudhary
Data Scientist
Aasaanjobs.com
Recent State-of-the-Art Models
SOTA NLP Models
Image sourced from https://jalammar.github.io/illustrated-bert/
Transfer Learning in CV and how we use embeddings
What is NLTK?
❖ NLTK, the Natural Language Toolkit, is a suite of
libraries and programs for a variety of academic text
processing tasks.
❖ It has built-in functionality for stop-word removal,
tokenization, stemming, and lemmatization (a quick
sketch follows below).
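A minimal sketch of those built-in utilities; the sample sentence is invented for illustration, and the download calls fetch the required corpora on first run:

```python
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

# One-time downloads of the tokenizer models and corpora NLTK needs.
nltk.download("punkt")
nltk.download("stopwords")
nltk.download("wordnet")

sentence = "The cats were running and meeting near the old banks"
tokens = nltk.word_tokenize(sentence)                         # tokenization
stop_words = set(stopwords.words("english"))
content = [t for t in tokens if t.lower() not in stop_words]  # stop-word removal

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
print([stemmer.stem(t) for t in content])                     # stemming
print([lemmatizer.lemmatize(t, pos="v") for t in content])    # lemmatization (as verbs)
```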
Stemming vs Lemmatization
Lemmatisation is closely related to stemming. The difference is that a stemmer operates on
a single word without knowledge of the context, and therefore cannot discriminate between
words which have different meanings depending on part of speech. However, stemmers are
typically easier to implement and run faster, and the reduced accuracy may not matter for
some applications.
For instance:
1. The word "better" has "good" as its lemma. This link is missed by stemming, as it requires a
dictionary look-up.

2. The word "walk" is the base form of the word "walking", and hence it is matched by both
stemming and lemmatisation.

3. The word "meeting" can be either the base form of a noun or a form of a verb ("to meet")
depending on the context, e.g., "in our last meeting" or "We are meeting again tomorrow".
Unlike stemming, lemmatisation can in principle select the appropriate lemma depending
on the context.
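All three cases can be reproduced with NLTK; a minimal sketch (note that the WordNet lemmatizer needs the part of speech as a hint):

```python
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

# 1. "better" -> "good" requires a dictionary (WordNet) look-up;
#    the stemmer misses the link.
print(stemmer.stem("better"))                    # 'better'
print(lemmatizer.lemmatize("better", pos="a"))   # 'good'

# 2. "walking" -> "walk" is matched by both approaches.
print(stemmer.stem("walking"))                   # 'walk'
print(lemmatizer.lemmatize("walking", pos="v"))  # 'walk'

# 3. "meeting" lemmatizes differently as a noun vs. a verb.
print(lemmatizer.lemmatize("meeting", pos="n"))  # 'meeting'
print(lemmatizer.lemmatize("meeting", pos="v"))  # 'meet'
```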
Word Embeddings Recap
❖ For words to be processed by machine learning models,
they need some form of numeric representation that
models can use in their calculations.
❖ Word2Vec showed that we can use a vector (a list of
numbers) to properly represent words in a way that
captures semantic or meaning-related relationships.
❖ Queen ≈ King − Man + Woman
❖ The relationship between countries and their respective
capitals is captured in the same way (see the sketch below).
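Both analogies can be tried with gensim's pre-trained vectors; a sketch (the GoogleNews model, about 1.6 GB, is downloaded on first use, and the exact neighbours depend on the model):

```python
import gensim.downloader as api

# Pre-trained word2vec vectors, fetched on first use.
model = api.load("word2vec-google-news-300")

# Queen ≈ King - Man + Woman
print(model.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# -> [('queen', ...)]

# Country -> capital: France : Paris :: Japan : ?
print(model.most_similar(positive=["Paris", "Japan"], negative=["France"], topn=1))
# -> [('Tokyo', ...)]
```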
Limitations/Issues of Word Embeddings
❖ Out-of-vocabulary/unknown words: the vocabulary size
must be fixed in advance, so when a word is unknown, a
vector cannot be constructed deterministically.
❖ A single shared representation per word cannot capture
polysemy: the meaning of a word depends on the context
in which it is used (see the sketch below).
❖ Our model won’t be robust for new languages, and thus
we cannot use it for incremental learning.
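Both issues are easy to demonstrate with the gensim model from the previous sketch:

```python
# 1. Out-of-vocabulary: the vocabulary was fixed at training time,
#    so no vector exists for an unseen word.
try:
    model["definitelynotaword123"]
except KeyError:
    print("OOV word: no vector can be constructed")

# 2. One vector per word type: "bank" has exactly the same embedding
#    whether the sentence is about money or a river, so the two
#    senses cannot be distinguished.
vec_bank = model["bank"]
```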
ELMo: Context Matters
Context-Aware Embeddings with ELMo
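As a rough illustration of context-aware embeddings, here is a sketch using the old allennlp 0.x ElmoEmbedder API (newer AllenNLP versions expose ELMo differently; this was not part of the original talk):

```python
import numpy as np
from allennlp.commands.elmo import ElmoEmbedder

elmo = ElmoEmbedder()  # downloads the default pre-trained weights

# Unlike word2vec, "bank" gets a different vector in each context.
money = elmo.embed_sentence(["I", "deposited", "cash", "at", "the", "bank"])
river = elmo.embed_sentence(["We", "sat", "on", "the", "river", "bank"])

# Shape (3 layers, num_tokens, 1024); compare the top-layer "bank" vectors.
a, b = money[2][-1], river[2][-1]
print(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))  # well below 1.0
```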
The ULMFiT Approach to Pre-training
The idea for converting this to Transfer Learning
BERT Pre-training Process
Step 1: Finding Context-Aware Embeddings
Step 2: Finding Context-Aware Embeddings
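The deck illustrates these steps with diagrams; as a rough sketch of how context-aware embeddings are pulled from a pre-trained BERT today, using the Hugging Face transformers library (not part of the original talk):

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("We are meeting again tomorrow", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One context-aware vector per (sub-)token: shape (1, seq_len, 768).
print(outputs.last_hidden_state.shape)
```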
Why is ULMFiT Universal?
❖ Dataset independent: you start with a WikiText language
model and fine-tune it on your dataset (see the sketch
after this list).
❖ Works across all documents and datasets of varying
lengths.
❖ The architecture is consistent, just as we use ResNets for
many CV tasks.
❖ Can work on very small datasets as well, as we already
have a good LM to start with.
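A sketch of the recipe with the fastai v1 API (the library the ULMFiT authors ship); the CSV path, column layout, and hyperparameters are placeholders:

```python
from fastai.text import (AWD_LSTM, TextClasDataBunch, TextLMDataBunch,
                         language_model_learner, text_classifier_learner)

# 1. Fine-tune a language model pre-trained on WikiText-103.
data_lm = TextLMDataBunch.from_csv("data/", "texts.csv")
learn_lm = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)
learn_lm.fit_one_cycle(1, 1e-2)
learn_lm.unfreeze()
learn_lm.fit_one_cycle(1, 1e-3)
learn_lm.save_encoder("ft_enc")  # keep the fine-tuned encoder

# 2. Reuse the encoder in a classifier for the target task.
data_clas = TextClasDataBunch.from_csv("data/", "texts.csv", vocab=data_lm.vocab)
learn_clf = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)
learn_clf.load_encoder("ft_enc")
learn_clf.fit_one_cycle(1, 1e-2)
learn_clf.freeze_to(-2)                        # gradual unfreezing
learn_clf.fit_one_cycle(1, slice(1e-3, 1e-2))
```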
Classifier fine-tuning for Task Specific Weights
❖ Two additional linear blocks are added. Each block uses
batch normalization and a lower value of dropout.
❖ ReLU is used as the activation function between the
linear blocks.
❖ Softmax is used to provide the probability distribution
over the target classes.
❖ Classifiers only take the embeddings provided by the LM
and are always trained from scratch (a sketch follows
below).
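A sketch of such a head in PyTorch; the layer sizes follow the published ULMFiT classifier (the concat-pooled LM output of 3 × 400 = 1200 units and a hidden size of 50), while the dropout values here are illustrative:

```python
import torch.nn as nn

head = nn.Sequential(
    nn.BatchNorm1d(1200),
    nn.Dropout(0.2),
    nn.Linear(1200, 50),
    nn.ReLU(),            # activation between the linear blocks
    nn.BatchNorm1d(50),
    nn.Dropout(0.1),      # lower dropout in the final block
    nn.Linear(50, 2),     # e.g. two target classes
    nn.Softmax(dim=-1),   # probability distribution over classes
)
```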
Results from ULMFiT
Validation Error Rate: ULMFiT vs. Training from Scratch
Acknowledgements
❖ "Images speak louder than words” and they were
sourced from other blogposts and Google results.
❖ A lot of them are taken from this great blogpost by Jay
Alammar https://jalammar.github.io/illustrated-bert/
❖ The results image is taken from the ULMFiT paper.
–Navneet Kumar Chaudhary
Thanks a Lot!!!
