State-of-the-Art Named Entity
Recognition Framework
Anirudh Ganesh
Jayavardhan P Reddy
Speech and Language Processing
(CSE 5525) Spring 2018
Final Presentation
Prof. Eric Fosler-Lussier
Introduction
● The main goal of this tutorial is to develop a state-of-the-art Named Entity
Recognizer.
● We also want the student to be able to understand and implement a deep
learning approach to a given problem.
● The tutorial is also structured to give the student hands-on experience in
replicating a research publication, given the data and the model used.
Importance
● This matters because deep learning is gaining widespread traction across most
modern machine learning applications, especially NLP.
● Replicating the results of deep learning publications is critical for
accelerating research.
● Replicability of deep-learning-based publications and studies is the crucial
point we wanted to tackle, because we feel there is a huge shortage of such work.
Resources used
● PyTorch
● NumPy
● Jupyter Notebook
● Python 3.5
● CoNLL 2003 dataset for NER
● Paper: End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF by
Xuezhe Ma, Eduard Hovy; https://arxiv.org/abs/1603.01354
● F1 score: 91.21
Main parts of the Architecture:
● Data Preparation
● Convolutional Neural Network (CNN) Encoder for Character Level
representation
● Bi-directional Long Short Term Memory (LSTM) for Word-Level Encoding
● Conditional Random Fields (CRF) for output decoding
Data Preparation
The paper* uses the English data from the CoNLL 2003 shared task.
● Tag Update:
The authors use the BIOES tagging scheme (Begin, Inside, Outside, End, Single)
rather than the BIO scheme used by the dataset, so we first need to convert the
tags from BIO to BIOES.
● Mappings:
Create mappings from words to ids, tags to ids, and characters to ids (a sketch
of both steps follows below).
* “End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF”, Xuezhe Ma and Eduard H. Hovy, CoRR (2016) abs/1603.01354
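A minimal sketch of both preparation steps. The helper names (bio_to_bioes,
build_mapping) are our own illustrations, not code from the paper's released
implementation, and they assume the input tags are valid BIO strings such as
'B-PER' or 'I-LOC'.

def bio_to_bioes(tags):
    """Convert one sentence's BIO tags to BIOES (assumes valid BIO input)."""
    new_tags = []
    for i, tag in enumerate(tags):
        if tag == 'O':
            new_tags.append(tag)
            continue
        prefix, label = tag.split('-', 1)
        next_tag = tags[i + 1] if i + 1 < len(tags) else 'O'
        entity_continues = (next_tag == 'I-' + label)
        if prefix == 'B':
            new_tags.append(('B-' if entity_continues else 'S-') + label)
        else:  # prefix == 'I'
            new_tags.append(('I-' if entity_continues else 'E-') + label)
    return new_tags

def build_mapping(items):
    """Assign an integer id to each distinct word / tag / character."""
    mapping = {}
    for item in items:
        if item not in mapping:
            mapping[item] = len(mapping)
    return mapping

# Example: bio_to_bioes(['B-PER', 'I-PER', 'O', 'B-LOC'])
#          -> ['B-PER', 'E-PER', 'O', 'S-LOC']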
Word Embedding
● Using Pre-Trained Embeddings:
The paper uses 100-dimensional GloVe (Global Vectors) embeddings trained on the
Wikipedia 2014 + Gigaword 5 corpus of 6 billion tokens.
● Word embedding Mapping:
Map each word in the vocabulary to its pre-trained embedding (see the sketch below).
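A minimal sketch of building the embedding matrix, assuming the GloVe text file
(e.g. glove.6B.100d.txt) has been downloaded separately and word_to_id is the
word mapping built in the previous step.

import numpy as np
import torch

def load_glove(glove_path, word_to_id, dim=100):
    """Return a (vocab_size x dim) tensor; words missing from GloVe keep a small random vector."""
    matrix = np.random.uniform(-0.25, 0.25, (len(word_to_id), dim)).astype('float32')
    with open(glove_path, encoding='utf-8') as f:
        for line in f:
            parts = line.rstrip().split(' ')
            word, values = parts[0], parts[1:]
            if word in word_to_id and len(values) == dim:
                matrix[word_to_id[word]] = np.asarray(values, dtype='float32')
    return torch.from_numpy(matrix)

# The matrix can then initialize an embedding layer:
# word_embedding = torch.nn.Embedding.from_pretrained(load_glove(path, word_to_id), freeze=False)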
Model Details (CNN Encoder for Character Level representation)
● A convolution layer over the character embeddings captures local patterns
across characters (e.g. prefixes and suffixes)
● Max-pooling extracts the most salient features from the convolution output
● This gives us a dense character-level vector representation of each word
● This representation is concatenated with the pre-trained GloVe embedding of
the word, retrieved by a simple lookup (a minimal sketch follows the figure caption)
Figure Illustrating the Character Embedding CNN layer.
(Adapted from the paper)
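A minimal PyTorch sketch of such a character-level encoder; the layer sizes
below are illustrative rather than the paper's exact hyperparameters.

import torch
import torch.nn as nn

class CharCNN(nn.Module):
    """Embed characters, convolve over them, and max-pool to one vector per word."""
    def __init__(self, num_chars, char_emb_dim=30, num_filters=30, kernel_size=3):
        super().__init__()
        self.char_embedding = nn.Embedding(num_chars, char_emb_dim, padding_idx=0)
        self.conv = nn.Conv1d(char_emb_dim, num_filters, kernel_size, padding=1)

    def forward(self, char_ids):
        # char_ids: (num_words, max_word_length) of character ids
        x = self.char_embedding(char_ids)   # (num_words, max_word_length, char_emb_dim)
        x = x.transpose(1, 2)               # Conv1d expects (N, channels, length)
        x = torch.relu(self.conv(x))        # (num_words, num_filters, max_word_length)
        x, _ = x.max(dim=2)                 # max-pool over the character dimension
        return x                            # (num_words, num_filters)

Each word's CharCNN vector is then concatenated with its GloVe embedding before
being passed to the Bi-LSTM.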
Model Details (Bi-LSTM for Word-Level Encoding)
● The word representations generated by the previous layer are fed into a
bi-directional LSTM
● The forward layer reads the sequence of word vectors and, at each position,
produces a new vector based on what it has seen so far in the forward direction
● This vector can be thought of as a summary of all the words seen so far
● The backward layer does the same in the opposite direction, and the two
outputs are concatenated at each position (see the sketch below)
Figure Illustrating the Sequence labelling LSTM layer.
(Adapted from the paper)
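A minimal sketch of the word-level encoder, assuming a 130-dimensional input
(100-d GloVe + 30-d character vector); the hidden size and tag count here are
illustrative.

import torch.nn as nn

class WordBiLSTM(nn.Module):
    """Bi-directional LSTM over word representations, projected to per-tag emission scores."""
    def __init__(self, input_dim=130, hidden_dim=200, num_tags=17):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.hidden2tag = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, word_reprs):
        # word_reprs: (batch, seq_len, input_dim)
        lstm_out, _ = self.lstm(word_reprs)   # (batch, seq_len, 2 * hidden_dim)
        return self.hidden2tag(lstm_out)      # emission scores for the CRF: (batch, seq_len, num_tags)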
Model Details (CRF Layer)
● Even though the bi-LSTM captures contextual information, each tagging
decision should also take the neighboring decisions into account
● This is because NER is heavily influenced by neighboring tagging decisions
(for example, I-PER cannot follow B-LOC)
● This is why we apply a CRF rather than a traditional per-token softmax
● Given a sequence of words and a sequence of score vectors, a linear-chain CRF
defines a global score over entire tag sequences, yielding sentence-level
likelihoods for the optimal tags (a minimal sketch follows the figure caption)
Figure Illustrating Conditional Random Fields (CRF) for
sequence tagging.
(Adapted from the paper)
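A minimal single-sentence CRF sketch (no batching or masking), showing the two
quantities needed for training: the score of the gold tag sequence and the
log-partition over all tag sequences.

import torch
import torch.nn as nn

class LinearChainCRF(nn.Module):
    def __init__(self, num_tags):
        super().__init__()
        # transitions[i, j] = score of moving from tag i to tag j
        self.transitions = nn.Parameter(0.01 * torch.randn(num_tags, num_tags))

    def sentence_score(self, emissions, tags):
        # emissions: (seq_len, num_tags); tags: (seq_len,) gold tag ids
        score = emissions[0, tags[0]]
        for i in range(1, emissions.size(0)):
            score = score + self.transitions[tags[i - 1], tags[i]] + emissions[i, tags[i]]
        return score

    def log_partition(self, emissions):
        # Forward algorithm in log space: log-sum-exp over all possible tag sequences.
        alpha = emissions[0]
        for i in range(1, emissions.size(0)):
            alpha = torch.logsumexp(alpha.unsqueeze(1) + self.transitions, dim=0) + emissions[i]
        return torch.logsumexp(alpha, dim=0)

    def neg_log_likelihood(self, emissions, tags):
        # Minimizing this maximizes the probability of the gold tag sequence.
        return self.log_partition(emissions) - self.sentence_score(emissions, tags)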
Computing Tags
Recall that the CRF computes a conditional probability. Let y be a tag sequence
and x an input sequence of words. Then we train by maximizing this conditional
likelihood.
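In standard linear-chain CRF notation (a generic reconstruction, with emission
scores coming from the Bi-LSTM and transition scores learned by the CRF), the
probability being maximized is

p(\mathbf{y} \mid \mathbf{x}) =
  \frac{\exp\big(\sum_{i=1}^{n} \psi_{\mathrm{emit}}(y_i, \mathbf{x}, i) + \psi_{\mathrm{trans}}(y_{i-1}, y_i)\big)}
       {\sum_{\mathbf{y}'} \exp\big(\sum_{i=1}^{n} \psi_{\mathrm{emit}}(y'_i, \mathbf{x}, i) + \psi_{\mathrm{trans}}(y'_{i-1}, y'_i)\big)}

and training maximizes log p(y | x) over the gold tag sequences in the training set.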
Viterbi decoding applies dynamic programming to choose the highest-scoring tag
sequence (a minimal sketch follows).
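A minimal Viterbi sketch, assuming emissions and transitions shaped as in the
CRF sketch above.

import torch

def viterbi_decode(emissions, transitions):
    """Return the highest-scoring tag sequence for one sentence via dynamic programming."""
    seq_len, num_tags = emissions.shape
    score = emissions[0]                 # best score of any path ending in each tag at position 0
    backpointers = []
    for i in range(1, seq_len):
        # total[k, j] = best path ending in tag k at i-1, then transitioning k -> j
        total = score.unsqueeze(1) + transitions + emissions[i].unsqueeze(0)
        score, best_prev = total.max(dim=0)
        backpointers.append(best_prev)
    best_tag = int(score.argmax())
    best_path = [best_tag]
    for best_prev in reversed(backpointers):
        best_tag = int(best_prev[best_tag])
        best_path.append(best_tag)
    best_path.reverse()
    return best_path                      # list of tag ids, length seq_len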
Closing Comments
Sample Output from the given model
Experience of presenting to friends
● Initial few drafts
○ Too much time was spent preprocessing the data
○ The PyTorch API was unfamiliar
● Changes
○ Detailed comments explaining each step and the intuition behind it.
○ Detailed comments for the PyTorch functions
● Final Draft:
○ Takes a little longer than the intended time if the student solving it
doesn’t have sufficient background in deep learning and PyTorch.
Thank You!