Deep Learning and Modern Natural Language Processing
Zachary Brown, Lead Data Scientist, S&P Global
Outline
• Neural Methods for Natural Language Processing
• Shapes of Natural Language Processing Tasks
• Perceptron Text Classification
• Local vs. Global Text Representations
• Contextual Representations and Sequence Modeling
Neural Methods for Natural Language Processing
• Natural Language Processing (NLP) has moved largely to neural methods in recent years

Traditional NLP
• Builds on years of research into language representation
• Theoretical foundations can lead to model rigidity
• Tasks often rely on manually generated and curated dictionaries and thesauruses
• Built upon local word representations

Neural NLP
• Few to no assumptions need to be made
• Active area of research, with most of the tooling open source
• Able to learn global and contextualized word representations
• Purpose-built model architectures
Shapes of Natural Language Processing Tasks
• A general task in natural language processing often takes the same form: input text is vectorized and passed through a model to produce a target output
• For binary classification (e.g. relevance), our target is a single number, often interpreted as a probability
• For multi-class classification (e.g. type of text), our target is a set of probabilities, one for each of the output classes
• For sequential classification (e.g. language modeling, NER, POS tagging), the target is a probability for each class for each element in the input
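To make these output shapes concrete, here is a minimal sketch in PyTorch (the deck itself names no framework, and the batch size, sequence length, and class counts below are made up for illustration):

```python
import torch

batch_size, seq_len, n_classes, n_tags = 8, 12, 5, 9

# Binary classification: one probability per example
binary_out = torch.sigmoid(torch.randn(batch_size, 1))                          # shape (8, 1)

# Multi-class classification: one probability per class, per example
multiclass_out = torch.softmax(torch.randn(batch_size, n_classes), dim=-1)      # shape (8, 5)

# Sequential classification (LM / NER / POS): one distribution per input element
sequence_out = torch.softmax(torch.randn(batch_size, seq_len, n_tags), dim=-1)  # shape (8, 12, 9)

print(binary_out.shape, multiclass_out.shape, sequence_out.shape)
```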
• In a traditional machine learning pipeline, the vectorization (feature engineering) step is often very time consuming, commonly 80-90% of the effort
• A relatively small proportion of the time, roughly 10-20%, is spent on the actual modeling
• Neural networks allow us to develop purpose-built architectures that learn the appropriate vectorization for a task, so the effort goes into building and training the model end to end
Perceptron Text Classification
• To introduce the shape of information as it flows through a neural network, we'll first look at a network that only handles classification
• For the vectorization, we'll assume that we've converted our text into a vector using a count-based method like tf-idf
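As a concrete illustration, a count-based vectorization like this can be produced with scikit-learn's TfidfVectorizer; the toy corpus below is invented for the example and is not from the talk:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "the dog chased the ball",
    "I read a good book on vacation",
    "horror movies are not comedies",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)          # sparse matrix, shape (n_docs, vocab_size)

print(X.shape)
print(vectorizer.get_feature_names_out()[:5])  # first few vocabulary terms
```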
• A perceptron is one of the simplest neural network architectures, and is a good fit for this task
• [Diagram: input → hidden (linear) → activation → output]
• The hidden layer represents the weights that will be optimized by the deep learning framework
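A minimal sketch of such a perceptron in PyTorch, assuming PyTorch as the deep learning framework and an illustrative vocabulary size: one linear "hidden" layer holds the weights to be optimized, followed by a sigmoid activation that produces the output probability.

```python
import torch
import torch.nn as nn

class PerceptronClassifier(nn.Module):
    def __init__(self, vocab_size: int):
        super().__init__()
        self.hidden = nn.Linear(vocab_size, 1)    # the weights that get optimized

    def forward(self, x):
        return torch.sigmoid(self.hidden(x))       # activation -> output probability

model = PerceptronClassifier(vocab_size=10_000)
tfidf_batch = torch.rand(4, 10_000)                # stand-in for a batch of tf-idf vectors
probs = model(tfidf_batch)                         # shape (4, 1)
```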
• If we want to change our task to multiclass classification, we can simply change the size of our hidden layer (plus minor modifications, such as the output activation and the loss)
• The result of this is that we now have a matrix of weights to optimize
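The multiclass variant only changes the size of that linear layer, so its weights become a matrix, and swaps the sigmoid for a softmax; again a sketch with made-up dimensions.

```python
import torch
import torch.nn as nn

n_classes = 5
multiclass_head = nn.Linear(10_000, n_classes)    # weight matrix of shape (5, 10_000)

logits = multiclass_head(torch.rand(4, 10_000))
probs = torch.softmax(logits, dim=-1)             # one probability per output class
# In training you would typically pass the raw logits to nn.CrossEntropyLoss.
```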
Local vs. Global Text Representations
• Let's look back to the problem of creating a vector representation for our text, like the tf-idf vector above
• Further, let's only consider the task of how we'd represent single words or tokens, such as "dog", as vectors
• Traditional approaches to word representations treat each word as a unique entity
• Modern approaches move to dense vectors of a fixed dimensionality
• There are a variety of frameworks available that allow for computing these vectors in an unsupervised way
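word2vec, as implemented in the gensim library, is one widely used option; a minimal sketch, assuming gensim 4.x is installed and using a toy corpus in place of real training data.

```python
from gensim.models import Word2Vec

# Toy corpus: in practice this would be a large collection of tokenized sentences
sentences = [
    ["the", "dog", "chased", "the", "ball"],
    ["the", "cat", "chased", "the", "dog"],
    ["i", "read", "a", "good", "book"],
]

model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, epochs=20)

vec = model.wv["dog"]                        # dense 50-dimensional vector for "dog"
print(model.wv.most_similar("dog", topn=3))  # nearest neighbours in the learned space
```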
Contextual Representations and Sequence Modeling
• Global word representations are a fantastic starting point for many problems in NLP, but consider the following sentences:
  "I'm going to book our vacation then relax and read a good book"
  "I don't really hate horror movies, but I hate comedies"
• Context matters: the same word can play a very different role depending on the words around it ("book" as a verb and then a noun, and a "hate" that is softened by "don't really")
• For modeling tasks where word ordering and context matter, sequential models are often used
• Recurrent neural networks (RNNs) are a type of neural network architecture that naturally handles modeling sequential data
• This type of network generates a new output vector for each input in a sequence, and also feeds that same information forward to the next step
• By feeding the information forward, each subsequent output vector has contextual information encoded from the preceding words
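A minimal PyTorch sketch of this behaviour using nn.RNN over a batch of already-embedded tokens (all dimensions are illustrative): the module returns one output vector per input position, plus the state it carries forward after the last token.

```python
import torch
import torch.nn as nn

batch_size, seq_len, emb_dim, hidden_dim = 2, 7, 32, 64

rnn = nn.RNN(input_size=emb_dim, hidden_size=hidden_dim, batch_first=True)

embedded = torch.randn(batch_size, seq_len, emb_dim)  # stand-in for embedded tokens
outputs, h_n = rnn(embedded)

print(outputs.shape)  # (2, 7, 64): one contextual vector per token
print(h_n.shape)      # (1, 2, 64): the state fed forward, after the last token
```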
• This type of architecture can be used to build language models, where the task is to predict the next word in the sequence
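In a language model the per-token output vector is mapped to a distribution over the vocabulary and scored against the following word; a sketch of that head, reusing the illustrative dimensions above and assuming the usual shift-by-one target setup.

```python
import torch
import torch.nn as nn

vocab_size, hidden_dim = 10_000, 64
lm_head = nn.Linear(hidden_dim, vocab_size)

# `outputs` stands in for the per-token vectors from the recurrent layer above
outputs = torch.randn(2, 7, hidden_dim)
logits = lm_head(outputs)                          # (2, 7, 10_000)

# Each position predicts the next token, so targets are the token ids shifted left
targets = torch.randint(0, vocab_size, (2, 7))
loss = nn.CrossEntropyLoss()(logits[:, :-1].reshape(-1, vocab_size),
                             targets[:, 1:].reshape(-1))
```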
• It can also be used for problems like named entity recognition, where each word in the sequence receives a label (e.g. a tag sequence like animal, o, o, o, animal)
• By taking the final vector in the sequence, you can perform tasks like sentiment classification (e.g. predicting "positive" or "negative" for the whole input)
• For all of these different types of tasks, a network similar to the perceptron can be placed at the end to carry out the final classification of each word, or the classification of the whole sequence; see the sketch below
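Putting the pieces together, a sketch (vocabulary size, dimensions, and label counts are all made up) of one recurrent encoder with a perceptron-like linear head applied in both modes: to every token's output vector for tagging tasks like NER, and to the final vector alone for whole-sequence tasks like sentiment.

```python
import torch
import torch.nn as nn

class RecurrentClassifier(nn.Module):
    def __init__(self, vocab_size=10_000, emb_dim=32, hidden_dim=64,
                 n_tags=5, n_sentiments=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.RNN(emb_dim, hidden_dim, batch_first=True)
        self.token_head = nn.Linear(hidden_dim, n_tags)           # e.g. NER tags
        self.sequence_head = nn.Linear(hidden_dim, n_sentiments)  # e.g. sentiment

    def forward(self, token_ids):
        outputs, _ = self.rnn(self.embed(token_ids))
        token_logits = self.token_head(outputs)                # one prediction per word
        sequence_logits = self.sequence_head(outputs[:, -1])   # final vector only
        return token_logits, sequence_logits

model = RecurrentClassifier()
token_ids = torch.randint(0, 10_000, (2, 7))
token_logits, sequence_logits = model(token_ids)
print(token_logits.shape, sequence_logits.shape)  # (2, 7, 5) and (2, 2)
```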
• In a similar manner, these individual elements can be combined in a variety of ways to tackle very complex tasks
Thank you.
