State-of-the-Art
Text Classification using
Deep Contextual Word Representations
Under the guidance of
Dr. Wencen Wu
By
Ausaf Ahmed (013744315)
Overview
• Natural language refers to the way we humans communicate with each other.
• Natural Language Processing has numerous real-life applications:
  automatic summarization, translation, named entity recognition,
  relationship extraction, sentiment analysis, speech recognition,
  and topic segmentation.
• Deep learning can make sense of data using multiple layers of abstraction.
Introduction
Neural Language Modeling: The ML Way
• There are two main techniques for understanding natural language:
▪ Syntactic Analysis (Syntax): analyzing whether natural language conforms to
the rules of a formal grammar.
▪ Semantic Analysis: understanding the meaning and interpretation of
words, signs, and sentence structure.
Pre-Processing Data
• It is necessary to extract the relevant attributes from the dataset.
• Typical steps for cleaning the data (a minimal sketch follows the list):
▪ Tokenization
▪ Remove Punctuation
▪ Remove Stop words
▪ Stemming
▪ Lemmatizing
▪ Regex
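For illustration, the cleaning steps above can be sketched with NLTK. The function below is illustrative rather than the project's actual pipeline (in practice one usually picks either stemming or lemmatizing, not both):

```python
# Minimal text-cleaning sketch using NLTK; names and ordering are illustrative.
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download("punkt")
nltk.download("stopwords")
nltk.download("wordnet")

stop_words = set(stopwords.words("english"))
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

def clean_text(text):
    text = re.sub(r"[^a-zA-Z\s]", " ", text)              # regex: keep letters only
    tokens = nltk.word_tokenize(text.lower())             # tokenization
    tokens = [t for t in tokens if t not in stop_words]   # remove stop words
    tokens = [stemmer.stem(t) for t in tokens]            # stemming
    tokens = [lemmatizer.lemmatize(t) for t in tokens]    # lemmatizing
    return " ".join(tokens)

print(clean_text("The cats are sitting on the mats!"))
```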
Modeling Challenges
• We were wrestling here with the following challenges:
▪ Using as much relevant evidence as possible.
▪ Pooling evidence between words.
▪ Modeling polysemy, the coexistence of many possible meanings for a word
or phrase.
Representing Words
We are wrestling here with the following challenges:
▪ Using as much relevant evidence as possible
▪ Pooling evidence between words
▪ Modeling polysemy, the coexistence of many possible meanings for a word/phrase
• Word Embeddings: represented data with a one-hot or two-hot vector, TF-IDF
scaling, or a co-occurrence matrix, e.g.,
– dog = (0,0,0,0,1,0,0,0,0,....)
– cat = (0,0,0,0,0,0,0,1,0,....)
– eat = (0,1,0,0,0,0,0,0,0,....)
• That’s a large vector!
• Remedies (see the sketch after this list):
– Limit the vocabulary to, say, the 20,000 most frequent words; map the rest to OTHER
– Place words into sqrt(n) classes, apply dimensionality reduction, and more
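A toy sketch of these sparse representations using scikit-learn; the vocabulary cap mirrors the "most frequent words" remedy, and all names and data here are made up:

```python
# Sketch of sparse one-hot-style and TF-IDF representations on toy data.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["the dog eats food", "the cat eats fish"]

# Binary count vectors: each word in the capped vocabulary becomes one dimension.
count_vec = CountVectorizer(binary=True, max_features=20000)
one_hot = count_vec.fit_transform(docs)
print(count_vec.vocabulary_)      # word -> dimension index
print(one_hot.toarray())          # sparse 0/1 vectors per document

# TF-IDF scaling of the same sparse space.
tfidf_vec = TfidfVectorizer(max_features=20000)
print(tfidf_vec.fit_transform(docs).toarray())
```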
Representing Words
We are wrestling here with the following challenges:
▪ Using as much relevant evidence as possible
▪ Pooling evidence between words
▪ Modeling polysemy, the coexistence of many possible meanings for a word/phrase
Beauty of Word Embeddings:
They capture relationships between words, be it meaning, morphology, context,
or some other kind of association.
Representing Words
We are wrestling here with the following challenges:
▪ Using as much relevant evidence as possible
▪ Pooling evidence between words
▪ Modeling polysemy, the coexistence of many possible meanings for a word/phrase
ELMo: Deep Contextualized Word Representations
What is ELMo?
Deep contextualized word representations
Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner,
Christopher Clark, Kenton Lee, Luke Zettlemoyer.
Best Paper at NAACL 2018
ELMo (Embeddings from Language MOdels)
• Deep contextual word representations that model:
▪ Complex characteristics of word use
▪ How these uses vary across linguistic contexts (polysemy)
I must make a deposit at the bank.
Let’s have lunch beside a river bank.
• The word vectors are learned functions of the internal states of a deep bi-
directional language model (biLM).
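As a rough sketch of how such a biLM can be queried in practice, the publicly released pretrained ELMo module on TensorFlow Hub can be used in TF 1.x style (matching the TensorFlow 1.8 setup described later); the exact module version is an assumption and is not named on the slides:

```python
# Sketch: obtain contextual ELMo vectors from the pretrained biLM on TensorFlow Hub.
import tensorflow as tf
import tensorflow_hub as hub

elmo = hub.Module("https://tfhub.dev/google/elmo/2", trainable=False)

sentences = ["I must make a deposit at the bank.",
             "Let's have lunch beside a river bank."]

# "elmo" returns the weighted sum of the biLM layers: one 1024-d vector per token.
embeddings = elmo(sentences, signature="default", as_dict=True)["elmo"]

with tf.Session() as sess:
    sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
    vectors = sess.run(embeddings)
    print(vectors.shape)  # (2, max_tokens, 1024); the two "bank" vectors differ by context
```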
Salient Features
• ELMo representations are:
• Contextual
• Deep
• Character-based
How does ELMo work?
2-layer bidirectional LSTM backbone
• In the original slide diagram, the red box represents the forward recurrent unit, and the blue box represents the backward recurrent unit.
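A minimal structural sketch of such a backbone in Keras is shown below; the dimensions are illustrative, and note that the real ELMo biLM trains the forward and backward language models as separate LMs rather than as a single Bidirectional wrapper:

```python
# Toy sketch of a 2-layer bidirectional LSTM backbone (dimensions are illustrative).
from tensorflow.keras.layers import Input, Bidirectional, LSTM
from tensorflow.keras.models import Model

token_vectors = Input(shape=(None, 512))  # sequence of per-token input vectors

# Layer 1: forward and backward recurrent units over the token sequence.
h1 = Bidirectional(LSTM(512, return_sequences=True))(token_vectors)
# Layer 2: a second bidirectional pass over the layer-1 states.
h2 = Bidirectional(LSTM(512, return_sequences=True))(h1)

model = Model(token_vectors, h2)
model.summary()
```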
Add Residual Connection
• A residual connection is
added between the LSTM
layers.
• The input to the first
layer is added to its
output before being
passed on as the input to
the second layer.
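A toy Keras sketch of this skip connection, with dimensions chosen so the input and layer-1 output shapes match (not the paper's exact configuration):

```python
# Sketch of a residual connection between two bidirectional LSTM layers.
from tensorflow.keras.layers import Input, LSTM, Bidirectional, Add
from tensorflow.keras.models import Model

x = Input(shape=(None, 512))
h1 = Bidirectional(LSTM(256, return_sequences=True))(x)  # concat output: dim 512
h1_residual = Add()([x, h1])                              # input to layer 1 added to its output
h2 = Bidirectional(LSTM(256, return_sequences=True))(h1_residual)

model = Model(x, h2)
```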
Transformation
Transformations are applied to each token before it is provided as input
to the first LSTM layer.
• Convert each token to an appropriate
representation using character
embeddings.
• Max pooling is a sample-based
discretization process.
• Highway networks use learned gating
mechanisms to regulate information
flow, inspired by Long Short-Term
Memory (LSTM) recurrent neural
networks.
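A simplified Keras sketch of this per-token transformation; the real ELMo character CNN uses multiple filter widths and two highway layers, so the sizes and layer counts here are illustrative assumptions:

```python
# Sketch: character embeddings -> convolution -> max pooling -> highway layer, per token.
import tensorflow as tf
from tensorflow.keras import layers

max_chars, char_vocab = 50, 262

chars = layers.Input(shape=(max_chars,), dtype="int32")      # character ids of one token
char_emb = layers.Embedding(char_vocab, 16)(chars)            # character embeddings
conv = layers.Conv1D(filters=128, kernel_size=3, activation="relu")(char_emb)
pooled = layers.GlobalMaxPooling1D()(conv)                    # max pooling over character positions

# Highway layer: a learned sigmoid gate mixes transformed and untransformed features.
transform = layers.Dense(128, activation="relu")(pooled)
gate = layers.Dense(128, activation="sigmoid")(pooled)
carry = layers.Lambda(lambda g: 1.0 - g)(gate)
highway = layers.Add()([layers.Multiply()([gate, transform]),
                        layers.Multiply()([carry, pooled])])

token_representation = tf.keras.Model(chars, highway)
```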
Combining Representations
Combining the bidirectional
hidden representations and
word representation for
"happy" to get an ELMo-
specific representation.
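Concretely, the combination described in the ELMo paper is a softmax-normalized, task-specific weighted sum of the layer representations, scaled by a learned gamma; the NumPy sketch below uses dummy weights purely for illustration:

```python
# Weighted combination of biLM layers into one ELMo vector for a single token.
import numpy as np

def softmax(w):
    e = np.exp(w - np.max(w))
    return e / e.sum()

# h[j] = layer j's representation of the token: j=0 is the character-based word
# representation, j=1 and j=2 are the two biLSTM hidden states (forward || backward).
h = np.random.randn(3, 1024)
w = np.array([0.2, 0.5, 0.3])   # learned per-layer scalars (dummy values here)
gamma = 1.0                      # learned task-specific scale

elmo_vector = gamma * np.sum(softmax(w)[:, None] * h, axis=0)
print(elmo_vector.shape)  # (1024,)
```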
Hands-On Implementation
NLP Task Specific Model
• Built models using ELMo on the two tasks below:
• Sentiment Analysis
• Email Spam Classification
• Used TensorFlow v1.8 and Keras 2.0 API.
• CUDA and cuDNN to provide GPU acceleration on an Nvidia GeForce GTX 1070.
• Custom implementation of the confusion matrix for every epoch.
• Calculated precision, recall, and F1-score in addition to accuracy so the model
could also be evaluated on imbalanced data.
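A minimal sketch of this setup is shown below, following common TF 1.x + tensorflow_hub patterns rather than the project's actual code: an ELMo sentence embedding feeds a small dense head, and a Keras callback computes the confusion matrix, precision, recall, and F1-score after every epoch. All layer sizes and names are illustrative.

```python
# Sketch: ELMo-based binary text classifier with per-epoch confusion-matrix metrics.
import tensorflow as tf
import tensorflow_hub as hub
from tensorflow.keras import layers, Model, callbacks, backend as K
from sklearn.metrics import confusion_matrix

elmo = hub.Module("https://tfhub.dev/google/elmo/2", trainable=True)

def elmo_embed(x):
    # "default" pools the contextual token vectors into one 1024-d vector per text.
    return elmo(tf.squeeze(tf.cast(x, tf.string), axis=1),
                signature="default", as_dict=True)["default"]

text_in = layers.Input(shape=(1,), dtype="string")
x = layers.Lambda(elmo_embed, output_shape=(1024,))(text_in)
x = layers.Dense(256, activation="relu")(x)
out = layers.Dense(1, activation="sigmoid")(x)  # spam/ham or positive/negative

model = Model(text_in, out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# TF 1.x: the hub module's variables and lookup tables must be initialized once.
sess = tf.Session()
K.set_session(sess)
sess.run([tf.global_variables_initializer(), tf.tables_initializer()])

class EpochMetrics(callbacks.Callback):
    """Confusion matrix, precision, recall, and F1 on validation data after every epoch."""
    def __init__(self, x_val, y_val):
        super().__init__()
        self.x_val, self.y_val = x_val, y_val
    def on_epoch_end(self, epoch, logs=None):
        preds = (self.model.predict(self.x_val) > 0.5).astype(int).ravel()
        tn, fp, fn, tp = confusion_matrix(self.y_val, preds).ravel()
        precision = tp / (tp + fp + 1e-9)
        recall = tp / (tp + fn + 1e-9)
        f1 = 2 * precision * recall / (precision + recall + 1e-9)
        print(f"epoch {epoch}: precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

The callback would then be passed to training via model.fit(..., callbacks=[EpochMetrics(x_val, y_val)]).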
Result and Comparison

[Bar chart omitted: F1-Score and Accuracy for the Sentiment Analysis and Email Spam Classification models.]

Task                              Previous SOTA   ELMo Result
Sentiment Analysis (F1-Score)     0.53            0.547
Email Classification (Accuracy)   0.954           0.99
Email Classification Metrics: IMPRESSIVE!
Conclusion
Final Thoughts
• The experimental results really speak to the power of the ELMo concept.
• ELMo representations were integrated into existing NLP tasks: Sentiment Analysis
and Email Spam Classification.
• In both cases, the ELMo models achieved state-of-the-art performance!
• ELMo follows an interesting vein of deep learning research related to transfer
learning.
• The ELMo paper is important because it takes the first steps toward
demonstrating that language-model transfer learning may be the
ImageNet equivalent for natural language processing.
Thank you