Centroid-based Text Summarization through
Compositionality of Word Embeddings
Gaetano Rossiello, Pierpaolo Basile, Giovanni Semeraro
gaetano.rossiello@uniba.it
Department of Computer Science
University of Bari - Aldo Moro, Italy
MultiLing 2017 Workshop at EACL 2017
Summarization and summary evaluation across source types and genres
3 April 2017 - Valencia, Spain
Introduction
Extractive Text Summarization
The generated summary is a selection of relevant sentences from the
source text in a copy-paste fashion.
A good extractive summarization method must satisfy:
Coverage: the selected sentences should cover a sufficient
amount of topics from the original source text
Diversity: avoid the redundancy of information in the summary
Gaetano Rossiello, Pierpaolo Basile, Giovanni Semeraro Centroid-based Text Summarization using Word Embeddings
Extractive Text Summarization
An extractive text summarization method should define:
Representation model: a paradigm to represent the sentences
Scoring method: a technique for assigning a score to each sentence
Ranking module: a method to properly select the relevant sentences

Bag-of-Words
Several summarization methods proposed in the literature use the
bag-of-words as representation model for the sentence scoring and
selection modules.
Limit of the Bag-of-Words
The textual similarity is a crucial aspect for many extractive text
summarization methods.
For example, consider two semantically related sentences:
S1 Syd leaves Pink Floyd
S2 Barrett abandons the band
abandons band Barrett Floyd leaves Pink Syd the
S1 0 0 0 1 1 1 1 0
S2 1 1 1 0 0 0 0 1
S1 ⊥ S2 =⇒ cosine(S1, S2) = 0
In the BOW model their sparse vector representations are
orthogonal, since the sentences have no words in common.
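The orthogonality above can be checked directly. This is a minimal sketch of the binary bag-of-words model applied to the two example sentences; since they share no words, the dot product, and hence the cosine, is zero:

```python
from math import sqrt

def bow_vector(sentence, vocabulary):
    """Binary bag-of-words vector over a fixed vocabulary."""
    words = set(sentence.lower().split())
    return [1 if w in words else 0 for w in vocabulary]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

s1 = "Syd leaves Pink Floyd"
s2 = "Barrett abandons the band"
vocab = sorted(set(s1.lower().split()) | set(s2.lower().split()))

v1, v2 = bow_vector(s1, vocab), bow_vector(s2, vocab)
print(cosine(v1, v2))  # 0.0 -- orthogonal, despite the semantic relatedness
```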
The Idea
Humans draw on broad background knowledge when writing a summary.
What Representation Learning
How Distributional Semantics
Why Transfer Learning
Neural Language Model - Word2Vec
A word embedding is a continuous vector representation that
captures syntactic and semantic information about a word.
vec(“Gilmour”) = [0.1, 0.3, 0.2, ...]
vec(“Barrett”) = [0.3, 0.1, 0.6, ...]
...
Figure: Continuous bag-of-words and Skip-gram [Mikolov et al., 2013]
vec(Barrett) − vec(singer) + vec(guitarist) ≈ vec(Gilmour)
Centroid-based Text Summarization: Overview
Centroid-based Extractive Text Summarization [Radev et al., 2004]
The centroid represents a pseudo-document which condenses
the meaningful information of a document (tfidf (w) > t)
The main idea is to project both the centroid and each sentence
of a document into the same vector space
The sentences closer to the centroid are selected
Sentence Scoring using Word Embeddings
Word Embeddings Lookup Table
Given a corpus of documents [D1, D2, . . . ] and its vocabulary V
with size N = |V|, we define a matrix E ∈ R^(N×k), the so-called
lookup table, where the i-th row is the word embedding of size k
(k ≪ N) of the i-th word in V. (The model can be trained on the
collection of documents to be summarized or on a larger corpus.)
Given a document D to be summarized:
Preprocessing: split into sentences, remove stopwords, no stemming
Centroid Embedding: C = Σ_{w ∈ D, tfidf(w) > t} E[idx(w)]
Sentence Embedding: Sj = Σ_{w ∈ Sj} E[idx(w)]
Sentence Score: sim(C, Sj) = (C^T · Sj) / (||C|| · ||Sj||)
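The three steps above can be sketched in a few lines. This is a toy illustration, not the authors' implementation: the lookup table E, the index map, and the tf-idf values are made-up stand-ins, and the topic threshold t=0.3 is arbitrary.

```python
import numpy as np

def centroid_embedding(doc_words, tfidf, E, idx, t=0.3):
    """Sum the embeddings of words whose tf-idf exceeds the topic threshold t."""
    return sum(E[idx[w]] for w in doc_words if tfidf.get(w, 0.0) > t)

def sentence_embedding(sentence_words, E, idx):
    """Compositional sentence vector: sum of its word embeddings."""
    return sum(E[idx[w]] for w in sentence_words if w in idx)

def score(C, S):
    """Cosine similarity between centroid and sentence embeddings."""
    return float(np.dot(C, S) / (np.linalg.norm(C) * np.linalg.norm(S)))

# Toy example: 3-word vocabulary, 2-dimensional embeddings.
vocab = ["nintendo", "game", "horse"]
idx = {w: i for i, w in enumerate(vocab)}
E = np.array([[1.0, 0.2], [0.9, 0.1], [0.0, 1.0]])
tfidf = {"nintendo": 0.8, "game": 0.6, "horse": 0.1}

C = centroid_embedding(vocab, tfidf, E, idx)        # "horse" filtered out by t
s = sentence_embedding(["nintendo", "game"], E, idx)
print(round(score(C, s), 3))
```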
Sentence Selection
Input: S, Scores, st, limit
Output: Summary
1: S ← sortDesc(S,Scores)
2: k ← 1
3: for i ← 1 to length(S) do
4: length ← length(Summary)
5: if length > limit then return Summary
6: SV ← sumVectors(S[i])
7: include ← True
8: for j ← 1 to k do
9: SV 2 ← sumVectors(Summary[j])
10: sim ← similarity(SV ,SV 2)
11: if sim > st then
12: include ← False
13: if include then
14: Summary[k] ← S[i]
15: k ← k + 1
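The selection loop above can be made runnable as follows. This is a sketch that assumes precomputed sentence vectors and word-count lengths; the names mirror the pseudocode (st = similarity threshold, limit = summary length budget), but the helpers are illustrative, not the released implementation.

```python
import numpy as np

def similarity(u, v):
    """Cosine similarity between two sentence vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def select_sentences(sentences, vectors, scores, st=0.95, limit=100):
    """Greedy selection: take high-scoring sentences, skip near-duplicates."""
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    summary, summary_vecs, length = [], [], 0
    for i in order:
        if length >= limit:
            break
        # Diversity check: reject a sentence too similar to one already chosen.
        if any(similarity(vectors[i], v) > st for v in summary_vecs):
            continue
        summary.append(sentences[i])
        summary_vecs.append(vectors[i])
        length += len(sentences[i].split())
    return summary
```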
Text Summarization using Word Embeddings: Example
Figure: Embeddings visualization using t-SNE [van der Maaten et al., 2008]
Text Summarization using Word Embeddings: Example
arcade  | donkey | kong   | game        | nintendo | coleco        | centroid embedding
arcades | goat   | hong   | gameplay    | mario    | intellivision | nes
pac-man | pig    | macao  | multiplayer | wii      | atari         | gamecube
console | monkey | fung   | videogame   | console  | nes           | konami
famicom | horse  | taiwan | rpg         | nes      | msx           | wii
sega    | cow    | wong   | gamespot    | gamecube | 3do           | famicom
Table: Centroid words of the Donkey Kong (video game) article having the
tf-idf values greater than a topic threshold
Sent ID | Sentence | Score
136 | The original arcade version of the game appears in the Nintendo 64 game Donkey Kong 64. | 0.9533
131 | The game was ported to Nintendo’s Family Computer (Famicom) console in 1983 as one of the system’s three launch titles; the same version was a launch title for the Famicom’s North American version, the Nintendo Entertainment System (NES). | 0.9375
186 | In 2004, Nintendo released Mario vs. Donkey Kong, a sequel to the Game Boy title. | 0.9366
192 | In 2007, Donkey Kong Barrel Blast was released for the Nintendo Wii. | 0.9362
135 | The NES version was re-released as an unlockable game in Animal Crossing for the GameCube and as an item for purchase on the Wii’s Virtual Console. | 0.9308
Table: The most relevant sentences
Experiments
Research Question
Can word embeddings improve the effectiveness of the centroid-based
text summarization method?
Python implementation on GitHub:
github.com/gaetangate/text-summarizer/
DUC-2004 Multi-document Summarization task 2
Tuning Grid search on DUC-2003 dataset
Word2Vec CBOW and Skip-gram trained on DUC-03/04;
pre-trained model on Google News
MSS 2015 Multilingual Single Document task 2015
Tuning Grid search on MSS 2015 training set
Word2Vec Skip-gram on Wikipedia (en, it, de, es, fr)
MSS 2017 Multilingual Single Document task 2017
Tuning Grid search on MSS 2017 training set
Word2Vec Skip-gram on Wikipedia (en, it, de, es, fr)
DUC-2004 task 2: Multi-document Summarization
ROUGE-1 ROUGE-2 tt st size
LEAD 32.42 6.42
SumBasic 37.27 8.58
Peer65 38.22 9.18
NMF 31.60 6.31
LexRank 37.58 8.78
RNN 38.78 9.86
C BOW 37.76 8.08 0.1 0.6
C GNEWS 37.91 8.45 0.2 0.9 300
C CBOW 38.68 8.93 0.3 0.93 200
C SKIP 38.81 9.97 0.3 0.94 400
Table: ROUGE scores (%) on DUC-2004 dataset. tt and st are the topic
and similarity thresholds respectively. size is the dimension of embeddings
DUC-2004 task 2: Multi-document Summarization
Although the different methods achieve similar ROUGE scores, they
do not necessarily generate similar summaries.
GNEWS CBOW SKIP BOW
GNEWS 1 0.109 0.171 0.075
CBOW 1 0.460 0.072
SKIP 1 0.105
BOW 1
Table: Sentence overlap using the Jaccard coefficient
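The overlap measure in the table above is the Jaccard coefficient between the sets of sentences selected by two systems. A minimal sketch, using hypothetical sentence IDs rather than the actual DUC-2004 selections:

```python
def jaccard(summary_a, summary_b):
    """Jaccard coefficient: |intersection| / |union| of two sentence sets."""
    a, b = set(summary_a), set(summary_b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Toy example: two systems agree on 2 of the 6 distinct sentences selected.
print(jaccard({1, 2, 3, 4}, {3, 4, 5, 6}))  # 0.333...
```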
MSS 2015: Multilingual Single Document Summarization
English Italian German Spanish French
R1 R2 R1 R2 R1 R2 R1 R2 R1 R2
LEAD 44.33 11.68 30.46 4.38 29.13 3.21 43.02 9.17 42.73 8.07
WORST 37.17 9.93 39.68 10.01 33.02 4.88 45.20 13.04 46.68 12.96
BEST 50.38 15.10 43.87 12.50 40.58 8.80 53.23 17.86 51.39 15.38
C BOW 49.06 13.43 33.44 4.82 35.28 4.93 48.38 12.88 46.13 10.45
C W2V 50.43‡ 13.34† 35.12 6.81 35.38† 5.39† 49.25† 12.99 47.82† 12.15
ORACLE 61.91 22.42 53.31 17.51 54.34 13.32 62.55 22.36 58.68 17.18
Table: ROUGE-1, -2 scores (%) on MultiLing MSS 2015 dataset for five
different languages
Improvements for MSS 2017
SWAP system in MSS 2017 task:
Always retain the first sentence in the summary
Subtract from each word embedding the centroid vector of the
whole embedding space
Combination of four scores:
sc1 word2vec centroid similarity
sc2 bag-of-words centroid similarity
sc3 normalized sentence length
sc4 normalized sentence position
score(Sj ) = λ1 ∗ sc1 + λ2 ∗ sc2 + λ3 ∗ sc3 + λ4 ∗ sc4
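The combined score is a weighted linear combination of the four components. A sketch, assuming each component score is already normalized to [0, 1]; the lambda weights below are illustrative placeholders, not the values tuned by the authors:

```python
def combined_score(sc1, sc2, sc3, sc4, lambdas=(0.4, 0.3, 0.15, 0.15)):
    """score(Sj) = λ1·sc1 + λ2·sc2 + λ3·sc3 + λ4·sc4
    sc1: word2vec centroid similarity, sc2: BOW centroid similarity,
    sc3: normalized sentence length, sc4: normalized sentence position."""
    l1, l2, l3, l4 = lambdas
    return l1 * sc1 + l2 * sc2 + l3 * sc3 + l4 * sc4
```

With weights summing to 1, a sentence that maximizes every component gets the maximum combined score of 1.0.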
MSS 2017: Multilingual Single Document Summarization
English Italian German Spanish French
R1 MM R1 MM R1 MM R1 MM R1 MM
CIST 45.06 16.83 30.07 20.22 32.32 16.52 45.31 16.94 41.67 17.66
TeamMD 43.08 16.35 30.22 20.79 32.91 15.76 44.95 16.54 42.81 17.18
SWAP 45.62 17.05 32.66 18.45 35.15 18.27 46.67 20.58 43.68 20.06
ORACLE 55.52 20.98 41.25 26.89 41.58 23.75 52.20 23.16 51.41 25.44
Table: ROUGE-1, MeMog scores (%) on MultiLing MSS 2017 dataset for
five different languages
Future Work
Try word embeddings with other summarization methods
Sentence embeddings using Deep Learning
Doc2Vec
Autoencoders
Convolutional Neural Networks
Recurrent Neural Networks (LSTM, GRU)
Attention and Memory Networks
Joint Learning of Distributional and Relational Semantics
WordNet
DBpedia
Wikidata
...
Infuse prior knowledge in sentence embeddings
Compositionality of Concepts
Transfer Learning
Moral of the Story
Math > Magic