Neural-based
Context Representation Learning for
Dialog Act Classification
Daniel Ortega, Ngoc Thang Vu
(Institute for Natural Language Processing (IMS) University of
Stuttgart)
Proceedings of the SIGDIAL 2017 Conference
Slides by Park JeeHyun
25 OCT 2018
Introduction
• The study of spoken dialogs between two or more speakers
can be approached by analyzing the dialog acts (DAs).
• Dialog Acts
• the intention of the speaker at every utterance during a
conversation.
• e.g.
Introduction. Automatic_DA_classification
• Automatic DA classification
• This classification task has been approached using traditional
statistical methods such as
• hidden Markov models (HMMs) (Stolcke et al., 2000),
• conditional random fields (CRF) (Zimmermann, 2009) and
• support vector machines (SVMs) (Henderson et al., 2012).
• Recently, deep learning (DL) techniques have brought state-of-the-art
models in DA classification, such as
• convolutional neural networks (CNNs) (Kalchbrenner and Blunsom, 2013;
Lee and Dernoncourt, 2016),
• recurrent neural networks (RNNs) (Lee and Dernoncourt, 2016; Ji et al., 2016)
and
• long short-term memory (LSTM) models (Shen and Lee, 2016).
Introduction. Automatic_DA_with_context_info
• In many cases, utterances are too short to be classified
on their own;
• for example, the utterance ‘Right’ can be
either an Agreement or a Backchannel,
and in such cases the context plays a key role in disambiguation.
• Therefore, using context information from the previous utterances
in a dialog flow is a crucial step for improving DA classification.
Introduction. Attention_Mechanisms
• Attention mechanisms (AMs) introduced by Bahdanau et al.
(2014) have contributed to significant improvements in
many natural language processing tasks.
Model
• CNN-based Dialog Utterance Representation
• Internal Attention Mechanism
• Neural-based Context Modeling
Model. CNN-based_Dialog_Utterance_Representation
• A CNN is used for the representation of each utterance.
• CNNs perform a discrete convolution on an input matrix with
a set of different filters.
• For the DA classification task, the input matrix represents
a dialog utterance and its context, i.e., the n previous
utterances:
• each column of the matrix stores the word embedding of the
corresponding word.
• 2D filters f (with width |f|) spanning all embedding
dimensions d.
Model. CNN-based_Dialog_Utterance_Representation
• 2D filters f (with width |f|) spanning all embedding
dimensions d.
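The convolution described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: filter values would normally be learned, and the filter widths and embedding dimension here are arbitrary toy choices. Each 2D filter spans all d embedding dimensions and slides along the word positions, followed by max-over-time pooling.

```python
import numpy as np

def conv_utterance(E, filters):
    """Convolve 2D filters over a d x T embedding matrix E.

    E:       d x T matrix, one word embedding per column.
    filters: list of d x |f| filter matrices spanning all d dims.
    Returns one max-pooled feature per filter.
    """
    d, T = E.shape
    features = []
    for F in filters:
        w = F.shape[1]  # filter width |f|
        # valid convolution along the word-position axis
        scores = [np.sum(E[:, t:t + w] * F) for t in range(T - w + 1)]
        # max-over-time pooling -> a single scalar per filter
        features.append(max(scores))
    return np.array(features)

# toy example: 4-dim embeddings, 5 words, two filters of widths 2 and 3
rng = np.random.default_rng(0)
E = rng.standard_normal((4, 5))
filters = [rng.standard_normal((4, 2)), rng.standard_normal((4, 3))]
u = conv_utterance(E, filters)  # utterance vector, one value per filter
```

With k filters, the utterance representation is a k-dimensional vector, independent of utterance length thanks to the pooling step.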
Model. Internal_Attention_Mechanism
• Attention mechanisms can be applied in different
sequences of input vectors, e.g. representations of
consecutive dialog utterances.
(Figure: attention equations, with separate formulations for input attention
and output attention; the attention strength is computed over the input
vector at time step t−i.)
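The attention step on this slide can be sketched as a softmax-normalized weighted sum. This is a hedged illustration of the general mechanism, not the paper's exact scoring function: the scoring vector `v` stands in for whatever learned parameters produce the attention strengths.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def attend(X, v):
    """Attention over a sequence of input vectors X (n x d).

    alpha_i plays the role of the attention strength for the
    input vector at step i; the output is the weighted context vector.
    v is a hypothetical learned scoring vector.
    """
    scores = X @ v            # one score per input vector
    alpha = softmax(scores)   # attention strengths, sum to 1
    return alpha, alpha @ X   # context = sum_i alpha_i * x_i

rng = np.random.default_rng(1)
X = rng.standard_normal((3, 4))  # e.g. 3 consecutive utterance vectors
v = rng.standard_normal(4)
alpha, c = attend(X, v)
```

Applied at the input, the weights rescale the vectors before further processing; applied at the output, they pool the processed states into a single context vector.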
Model. Neural-based_Context_Modeling
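The context-modeling step combines the per-utterance vectors with an RNN (per the conclusions, RNN architectures are combined with attention at different context levels). A minimal vanilla-RNN sketch, with hypothetical weights Wx, Wh and bias b, might look like:

```python
import numpy as np

def rnn_context(U, Wx, Wh, b):
    """Run a vanilla RNN over a sequence of utterance vectors U (n x d).

    The final hidden state summarizes the dialog context.
    Wx (h x d), Wh (h x h), and b (h,) are hypothetical learned weights.
    """
    h = np.zeros(Wh.shape[0])
    for u in U:  # oldest utterance first
        h = np.tanh(Wx @ u + Wh @ h + b)
    return h  # context representation

rng = np.random.default_rng(2)
U = rng.standard_normal((4, 6))         # 4 utterance vectors of dim 6
Wx = rng.standard_normal((5, 6)) * 0.1  # small init for a stable toy run
Wh = rng.standard_normal((5, 5)) * 0.1
b = np.zeros(5)
ctx = rnn_context(U, Wx, Wh, b)
```

The resulting context vector can then be concatenated with the current utterance's representation before the final classification layer.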
Experimental Setup
• Data
• MRDA:
ICSI Meeting Recorder Dialog Act Corpus (Janin et al., 2003; Shriberg et al.,
2004; Dhillon et al., 2004), a dialog corpus of multiparty meetings. The
5-tag-set used in this work was introduced by Ang et al. (2005).
• SwDA:
Switchboard Dialog Act Corpus (Godfrey et al., 1992; Jurafsky et al., 1997), a
dialog corpus of 2-speaker conversations.
In both datasets the classes are highly imbalanced:
the majority class covers 59.1% of MRDA
and 33.7% of SwDA.
Experimental Setup
• Hyperparameters and Training
Experimental Results
• Baseline Models
• a one-layer CNN for sentence classification based on Kim (2014)
• Baseline I: The input is a single utterance at a time, without any contextual
information
• Baseline II: The input is the concatenation of the current utterance and
previous utterances.
Q) Which model did they use as the baseline?
Static or non-static?
Or multi-channel?
Experimental Results. Results
Q) How about using attention on both ends?
(RNN-Input-Output-Attention???)
Experimental Results. Impact_of_Context_Length
Comparison with Other Works
Lee and Dernoncourt (CNN-FF & LSTM-FF) is
the most recent work in DA classification
that published train/validation splits and
claimed state-of-the-art results on that setup.
Therefore, an accurate comparison of our results
can only be made with this work.
Conclusions
• We explored different neural-based context representation learning
methods for dialog act classification which combine RNN
architectures with attention mechanisms at different context levels.
• Our results on two benchmark datasets reveal that using an
RNN architecture is important for learning the context representation.
• Moreover, attention mechanisms contribute to the overall
improvements;
however, where the AM should be applied depends on
the nature of the dataset.