Text-based Speaker Identification on Multiparty Dialogues Using Multi-document Convolutional Neural Networks

Kaixin Ma, Catherine Xiao, Jinho D. Choi
Department of Mathematics and Computer Science, Emory University
• The identities of speakers in multiparty dialogue are withheld.
• Each utterance in the dialogue is classified by speaker.
• This work attempts to identify the six main characters in the first eight
seasons of the TV show Friends.
• Minor characters in the show are identified collectively as Other.
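The label scheme described above can be sketched as a seven-way mapping. This is only an illustration: the helper name `speaker_to_label`, the class ordering, and the assumption that transcripts use first names are not taken from the poster.

```python
# Hypothetical label mapping: six main characters plus a collective "Other" class.
# The ordering of the classes is an arbitrary choice made for this sketch.
MAIN_CHARACTERS = ("Monica", "Rachel", "Ross", "Chandler", "Joey", "Phoebe")
LABELS = MAIN_CHARACTERS + ("Other",)
LABEL_TO_ID = {name: i for i, name in enumerate(LABELS)}


def speaker_to_label(speaker: str) -> int:
    """Map a speaker name to a class id; any non-main speaker becomes "Other"."""
    # Assumption: speakers are recorded by first name, possibly with stray
    # whitespace or inconsistent capitalization.
    name = speaker.strip().title()
    return LABEL_TO_ID.get(name, LABEL_TO_ID["Other"])
```

For example, `speaker_to_label("Gunther")` falls through to the `Other` class because the name is not among the six main characters.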
Objective
• The corpus consists of 194 episodes, 2579 scenes and 49755 utterances.
[Figure: Corpus Structure: Seasons, Episodes, Scenes, Utterances (Utterance Text + Speaker + Statement)]
[Figure: Speaker Distribution]
• Each utterance may contain one or multiple sentences.
• Each consecutive utterance must have a different speaker.
• The frequency of interactions between pairs of speakers varies.
• The corpus contains a large number of misspellings and colloquialisms.
• Some utterances are too short and too general.
• Another dataset is created by utterance concatenation.
• Utterances from the same speaker within the scene are concatenated.
Example: U1 U2 U3 U4 U5 → U1+U3+U5, U2, U4
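The concatenation step can be sketched as follows. This is a minimal illustration, assuming a scene is available as a list of `(speaker, text)` pairs in dialogue order; the function name `concatenate_by_speaker` and the choice of a single space as separator are hypothetical, not details from the poster.

```python
def concatenate_by_speaker(scene):
    """Merge all utterances of each speaker within one scene.

    scene: list of (speaker, text) pairs in dialogue order.
    Returns a list of (speaker, concatenated_text) pairs, ordered by each
    speaker's first appearance in the scene.
    """
    merged = {}  # dicts keep insertion order in Python 3.7+
    for speaker, text in scene:
        merged.setdefault(speaker, []).append(text)
    # Assumption: utterances are joined with a single space.
    return [(speaker, " ".join(texts)) for speaker, texts in merged.items()]


# Mirrors the U1..U5 example: speaker A says U1, U3, and U5.
scene = [("A", "U1"), ("B", "U2"), ("A", "U3"), ("C", "U4"), ("A", "U5")]
merged_scene = concatenate_by_speaker(scene)
```

With this input, speaker A ends up with one longer document built from U1, U3, and U5, while B and C keep their single utterances.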
Corpus Description
• Each utterance is predicted independently.
Baseline CNN Structure
• The model takes one scene as a batch of input.
• The original sequence of dialogue is preserved.
• The tensor is sliced and padded to represent the previous/next utterance.
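The slicing and padding described above might look roughly like this. The sketch uses plain Python lists in place of tensors and assumes each utterance has already been reduced to a fixed-length feature vector; the function names, the zero-vector padding, and the `(previous, current, next)` triple layout are assumptions for illustration, not the poster's actual implementation.

```python
def shift_with_padding(scene, offset):
    """Shift a scene's utterance vectors by `offset` positions.

    scene: list of equal-length feature vectors, one per utterance, in
    dialogue order. Position i of the result holds scene[i + offset], or a
    zero vector when that index falls outside the scene.
    """
    if not scene:
        return []
    n = len(scene)
    width = len(scene[0])
    return [
        list(scene[i + offset]) if 0 <= i + offset < n else [0.0] * width
        for i in range(n)
    ]


def with_context(scene):
    """Pair every utterance with its previous and next utterance vectors."""
    previous = shift_with_padding(scene, -1)  # first utterance gets padding
    following = shift_with_padding(scene, 1)  # last utterance gets padding
    return list(zip(previous, scene, following))


scene_vectors = [[1.0], [2.0], [3.0]]
triples = with_context(scene_vectors)
```

Because the whole scene is processed as one batch, the shifted copies stay aligned with the original order, so each utterance sees only neighbors from its own scene.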
Multi-document CNN Structure
• The multi-document CNN model’s identification accuracy is 6% higher
than that of the basic CNN.
• The model can better capture different speech patterns on longer documents.
• When prediction labels are restricted, accuracy boosts of 10% and
12% are achieved on the two datasets, respectively.
• Speakers identified with higher accuracy are also confused by the model
more often than others.
• Frequency of interactions between speaker pairs correlates with the rate of
confusion.
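One way to examine the last point is to count, for each unordered speaker pair, how often the two interact and how often the model confuses them. The sketch below assumes an "interaction" means two adjacent utterances in a scene and a "confusion" means a gold label predicted as the other speaker; both definitions and the function names are assumptions, since the poster does not spell them out.

```python
from collections import Counter


def interaction_counts(scenes):
    """Count adjacent-utterance interactions per unordered speaker pair.

    scenes: list of scenes, each a list of speaker names in dialogue order.
    """
    counts = Counter()
    for speakers in scenes:
        for a, b in zip(speakers, speakers[1:]):
            if a != b:
                counts[frozenset((a, b))] += 1
    return counts


def confusion_counts(gold, predicted):
    """Count misclassifications per unordered (gold, predicted) speaker pair."""
    counts = Counter()
    for g, p in zip(gold, predicted):
        if g != p:
            counts[frozenset((g, p))] += 1
    return counts
```

The two counters share the same keys, so they can be compared pair by pair, for instance by ranking pairs in each and checking whether the rankings agree.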
Results
• We present a neural network-based approach to speaker identification in
multiparty dialogue that relies on textual transcription data.
• Contextual information is essential to the performance of text-based
speaker identification.
• Because our model can identify speakers in the absence of audio data,
we expect interest from the intelligence and surveillance communities.
• We plan to incorporate text-based features in a larger audio-based system
of speaker identification to enhance its security.
Conclusion
• We gratefully acknowledge the Department of Mathematics and Computer
Science at Emory University for supporting this work. Any content presented
here is solely the responsibility of the authors and does not necessarily
represent the official view of the organization.
Acknowledgement
Approaches
