Query-Focused Extractive


Text Summarization


for Multi-Topic Document
Shinichiro Mizuno(2030414)


Master's Thesis Defense Program


Japan Advanced Institute of Science and Technology
1
Contents
1. Introduction


2. Related Work


3. Dataset


4. Proposed Methods


5. Baselines


6. Experiments


7. Conclusions
1. Introduction
3
Document summarization is an effective tool for quickly going through a huge amount of
information. However, each reader has different interests.


If a summary is generated from a perspective different from the one you expect,
you cannot find the information you are looking for.
Business Strategy


(By author A)
• Finance


• Marketing


• Team Building


…
I am looking for information
about Finance. But I don't
have time to look through
all the relevant books.
I would like to compare the
Marketing approach
between the authors.
Business Strategy


(By author B)
• Leadership


• Finance


• Marketing


…
Business Strategy


(By author C)
• Marketing


• Finance


• Accounting


…
1-1. Background
1. Introduction
How do we implement the requirement?
4
1-2. Implementation Approach(1)
1. Introduction
Summary


Extractor
Summary
Query
Summary
Query
Summary
Query
One implementation approach is to apply query-focused text summarization
methods, in which we treat the summary perspective as a query and extract the text
related to the query.
Query-Focused Extractive Text Summarization
5
1-2. Implementation Approach(1) - Problem
1. Introduction
Summary
Summary
One of the best-known existing query-focused summarization datasets is DUC 2005-2007.
However, this dataset has only one query per document, which differs from what we
want to implement.
DUC 2005-2007
Document
…
Summary
Summary
Document
…
…
… …
Query
Query
6
1-3. Implementation Approach(2)
1. Introduction
Span Selector
Answer
Question
Answer
Question
Answer
Question
Another implementation approach is to apply QA task methods, in which we view the
extraction perspective as a question, and extract the “Answer Span” related to the question
from the target document.
Question Answering (Reading Comprehension)
7
1-3. Implementation Approach(2) - Problem
1. Introduction
Answer
SQuAD 1.1/2.0 and TriviaQA are well-known QA task datasets. Those datasets have only a
single span to be selected per document. However, we expect multiple spans to be
selected for a multi-topic document.
SQuAD 1.1/2.0
S E
Question
Answer
S E
Question
Article
Answer
TriviaQA
S E
Question
Answer
S E
Question
Evidence
8
1-4. Objectives
1. Introduction
The objectives of this study are to verify our two proposals:


1. To build a new dataset with a set of multi-topic documents


2. To establish a method for extracting topic-by-topic text from multi-topic documents.
Problem Proposal
(1)
(2)
No dataset exists for extracting
topic-by-topic text from multi-
topic documents.
No reasonable method has
been established for extracting
topic-by-topic text from multi-
topic documents.
To build a new dataset with a set
of multi-topic documents and
summary text per topic.
To establish a method for
extracting topic-by-topic text from
multi-topic documents.
2. Related Work
10
2-1. Related Work (1) - BERT-Base
2. Related Work
Zhu et al. proposed a query-focused extractive summarization model based on BERT. In their
architecture, the query and sentences are concatenated and passed to a BERT encoding layer,
and an output layer then derives a score for each sentence indicating whether it belongs to the summary.
[Figure: BERT-Base input representation. Special tokens ([L1], [L2]) indicate hierarchical
structure in queries; each sentence is surrounded by a [CLS] token before and a [SEP] token
after; the input representation of each token is the sum of token, segment, and position
embeddings. The BERT encoding layer produces sentence representations, from which the
output layer derives the sentence scores r(s1), r(s2), r(s3).]
Architecture of BERT-Base 1
[1] H. Zhu, L. Dong, F. Wei, B. Qin, T. Liu. Transforming Wikipedia into Augmented Data for Query-Focused Summarization. arXiv preprint
arXiv:1911.03324 (2019)
[2] Ming Zhong, Da Yin, Tao Yu, Ahmad Zaidi, Mutethia Mutuma, Rahul Jha, Ahmed Hassan Awadallah, Asli Celikyilmaz, Yang Liu, Xipeng Qiu, and Dragomir
Radev.: QMSum: A New Benchmark for Query-based Multi-domain Meeting Summarization. In Proceedings of the 2021 Conference of the North American Chapter
of the Association for Computational Linguistics: Human Language Technologies, pp. 5905–5921. Association for Computational Linguistics, Online (2021)
11
2-2. Related Work (2) - QMSum
2. Related Work
Zhong et al. created QMSum, a dataset for generating perspective-based summaries from
meeting minutes. QMSum is a dataset with multiple queries and summaries for a single
document. However, the queries are not set consistently throughout the dataset.
Examples of query-based meeting summarization task 2
3. Dataset
13
3-1. Dataset Requirement
3. Dataset
Summary
The requirements for the dataset are that each document have multiple queries, that an
extractive summary be provided for each query, and that the queries be consistent
throughout the dataset.
Dataset Requirement
Document X
…
… …
Query A
Summary
Query B
Summary
Query C
Summary
Document Y
Query A
Summary
Query B
Summary
Query C
42 Sustainability Initiatives


66 Business Foundations Supporting


Corporate Value


98 Financial / Data Section
2 Management Message


10 A Philosophy Inherited from Our Founder


12 The ANA Group Value Creation Process


22 Business Strategy


14
3-2. Data Source
3. Dataset
We take advantage of integrated reports as the source of our dataset.


An integrated report is a report issued by a company for investors on an annual basis that
integrates financial information with non-financial information, such as environmental
and social initiatives.
Sample Contents of Integrated Reports 3
Annual Report 2021
Fiscal 2020 (Year ended March 2021)
12 ANA Group Strengths


14 The Value Creation Process


16 Timeline for Simultaneous Creation of


Social Value and Economic Value


18 What Must Change, What Must Never Change


Message from the Independent Outside Directors
24 Overview of Business Structure Reform and


Fiscal 2021 Plan


32 Overview by Business


38 Special Feature: Establishing a New Platform


Business
44 ANA Group ESG Management


46 ESG Management Promotion Cycle for


Simultaneous Creation of


Social Value and Economic Value


48 Dialogue with Stakeholders on ESG


50 Material Issues
68 Safety


72 Human Resources


76 The Power of People in the ANA Group


78 Risk Management


80 Compliance


82 Responsible Dialogue with Stakeholders


84 Corporate Governance


[3] Annual Report 2021, ANA HOLDINGS INC. https://www.ana.co.jp/group/en/investors/irdata/annual/
15
3-3. Integrated Report
3. Dataset
Some of the integrated reports carry labels indicating the relevance between their initiatives
and the 17 SDG goals.


These integrated reports are not only suitable as multi-topic documents, but can also be
seen as a corpus with the 17 SDG labels already annotated by corporate IR departments.
SDGs 5
Sample Pages of Integrated Reports 4
[4] Annual Report 2021, ANA HOLDINGS INC. https://
www.ana.co.jp/group/en/investors/irdata/annual/
[5] United Nations, https://www.un.org/development/desa/
disabilities/about-us/sustainable-development-goals-sdgs-and-
disability.html
16
3-4. Data Collection Pipeline
3. Dataset
1. Identified the companies that publish integrated reports. (251 companies)


2. Downloaded files for the past five years from the websites of the companies. (754 files).


3. Selected integrated reports that had been labeled with SDGs Goal No. (250 files)
Data Collection Pipeline
PDF
Download
Company’s Website


(251 companies)
Integrated Reports


Before Selection


(754 files)
PDF
PDF
Select
PDF
PDF
Integrated Reports


After Selection


(250 files)
List
List of Companies


Publishing


Integrated Reports
Identify
17
3-5. Dataset Creation Pipeline
3. Dataset
1. Extracted source text and summary text manually from the selected PDF files.


2. Labelled the summary text with a Goal No. manually by adding the Goal No. to the text file name.


3. Aligned the summary text with the source text to indicate which part of the source text is the
summary text for each Goal No.
Dataset Creation Pipeline
PDF
Source


Text
Goal No.
(Query)
Extract Label
Summary


Text
Summary


Text
Integrated


Report
Goal No.
(Query)
Alignment
Source


Text


Summary


Text
Summary


Text
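The alignment step (step 3) can be sketched as follows. This is a minimal illustration assuming exact sentence matching; `align` is an illustrative name, not the actual tooling used in the thesis.

```python
# Minimal sketch of the alignment step: a source sentence is labelled 1 for a
# goal if it appears in that goal's manually extracted summary text.
# Exact-match alignment is an assumption for illustration.
def align(source_sentences, goal_summaries):
    """goal_summaries maps a Goal No. to the set of its summary sentences.
    Returns one {goal: 0/1} dict per source sentence."""
    return [
        {goal: int(sent in sents) for goal, sents in goal_summaries.items()}
        for sent in source_sentences
    ]
```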
18
3-6. Dataset Instance
3. Dataset
An example of the dataset created is shown below. For each sentence, we assigned "0" or
"1" to indicate whether or not it is related to each Goal No.
Example of Sentences and Labels
Sth Sentence
Goal No.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
1 Maintaining a sense of crisis , but never forgetting hope . 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2
The ANA Group ( ANA HOLDINGS INC. and its consolidated subsidiaries )
strives to create social value and economic value , leveraging the strengths
we have cultivated based on the spirit of our founders .
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
… … …
501
In addition , ANA and ANA Catering Service Co. , Ltd. received the Excellence
in Energy Efficiency Award ( S Class ) certification under the Act on the
Rational Use of Energy of the Ministry of Economy , Trade and Industry
0 0 0 0 0 0 1 0 1 0 0 1 1 1 1 0 1
502
To achieve net zero CO2 non-aircraft emissions by fiscal 2050 , we will work
to reduce energy consumption by fiscal 2030 , focusing on the use of
electricity and vehicle fuel ( gasoline and diesel fuel ) , which accounts for
the majority of our total emissions .
0 0 0 0 0 0 1 0 1 0 0 1 1 1 1 0 1
… … …
551
By using this summarized data going forward , we will strive to provide a
suitable and comfortable work environment .
0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0
552
In addition , with the cooperation of a third - party organization ( Caux Round
Table Japan * 1 ) , we have begun operating a grievance process system in
accordance with global standards .
0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0
… … …
[6] Annual Report 2021, ANA HOLDINGS INC. https://www.ana.co.jp/group/en/investors/irdata/annual/
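The label format above can be sketched in code; the sentences and label vectors here are illustrative, modeled on the table rather than copied from the dataset.

```python
# Each sentence carries a 17-dimensional binary vector, one bit per SDG Goal No.
# (illustrative data modeled on the table above).
sentences = [
    ("Maintaining a sense of crisis , but never forgetting hope .", [0] * 17),
    ("We will work to reduce energy consumption by fiscal 2030 .",
     [0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1]),
]

def summary_for_goal(labelled_sentences, goal_no):
    """Return the sentences labelled 1 for the given Goal No. (1-indexed)."""
    return [text for text, labels in labelled_sentences
            if labels[goal_no - 1] == 1]
```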
6
19
3-7. Statistics of Dataset
3. Dataset
A characteristic of our dataset is the large number of sentences per document compared to
DUC 2005-2007. Our dataset is imbalanced, with a very small number of summary
sentences per query compared to the source documents.
Comparison with DUC 2005-2007
DUC 
2005-2007
Our Dataset
(a) No. of Documents 3,968 250
(b) No. of Sentences in Total 102,820 173,664
(c)
Avg. No. of Sentences
per Document(=(b)/(a))
26 695
(d)
No. of Query per
Document
1 17
(e)
No. of Sentences in
Summary Text
1,961 96,910
(f)
No. of Sentences in
Summary Text per Query
(=(e)/(d))
1,961 5,701
Statistics by Goal No.
No. Sentences Ratio to Source
Source 173,664 1.00
Goal 1 1,493 0.01
Goal 2 1,338 0.01
Goal 3 8,891 0.05
Goal 4 3,932 0.02
Goal 5 6,201 0.04
Goal 6 2,849 0.02
Goal 7 6,938 0.04
Goal 8 10,217 0.06
Goal 9 8,102 0.05
Goal 10 4,522 0.03
Goal 11 6,078 0.03
Goal 12 9,676 0.06
Goal 13 8,761 0.05
Goal 14 2,985 0.02
Goal 15 4,482 0.03
Goal 16 3,815 0.02
Goal 17 6,630 0.04
Average 5,701 0.03
4. Proposed Methods
21
4-1. Proposed Approach (1)
4. Methods
We solve the task as extractive summarization. We leverage a generic extractive
summarization method and extend it to a multi-class model with a One-vs-Rest strategy,
resulting in a query-focused extractive summarization method.
One-vs-Rest Strategy
…
Summary or Not
Goal No.1 or Not Goal No.2 or Not
Goal No.17 or Not
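The One-vs-Rest strategy above can be sketched as 17 independent binary decisions. The toy scoring callables below stand in for the trained models and are purely illustrative.

```python
# Sketch of the One-vs-Rest strategy: 17 independent binary classifiers, one
# per Goal No., each answering "summary for this goal or not".
def predict_labels(sentence, classifiers, threshold=0.5):
    """Threshold each per-goal classifier score into a 0/1 label."""
    return [1 if clf(sentence) >= threshold else 0 for clf in classifiers]

# Toy example: only the Goal 7 (energy) classifier fires, on the word "energy".
classifiers = [
    (lambda s: 0.9 if "energy" in s.lower() else 0.1) if goal == 7
    else (lambda s: 0.1)
    for goal in range(1, 18)
]
```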
22
4-2. Proposed Methods (1) - Multi-BERTSum
4. Methods
BERTSum(Ext) [7] is a generic extractive summarization method proposed by Liu and Lapata
that incorporates a BERT encoder and a Transformer classifier. We apply BERTSum(Ext) to
our strategy, calling it “Multi-BERTSum”. For comparison, we also apply a simple classifier.
Multi-BERTSum Architecture
Sentence
Input
+ Sentence + Sentence
+
…
+ + +
…
+ + +
…
Predict
Encoding Layer (BERT) #1~#17
Sentence Vectors
Classification Layer (Transformer / Simple) #1~#17
y [0, 1]
Sentence Vectors Sentence Vectors
y [0, 1] y [0, 1]
[7] Yang Liu and Mirella Lapata.: Text summarization with pretrained encoders. In Proceedings of the 2019 Conference on Empirical Methods in Natural
Language Processing and the 9th International Joint Conference on Natural Language Process- ing (EMNLP-IJCNLP), pp. 3730–3740. Association for
Computational Linguistics, Hong Kong, China (2019)
23
4-2. Proposed Approach (2)
4. Methods
For comparison, we apply another method in which we solve the task as a QA problem.


Once we split the document into parts, we can solve it as a QA task: select the span that
answers the query.
Document Split for Span Selection
S E
S E
Document
S
E
S E
S E
S
E
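The document-split step can be sketched as a simple fixed-size windowing over sentences; the part size here is illustrative, not the value used in the experiments.

```python
# Sketch of the document-split step: cut a long document into consecutive
# parts so each part fits the encoder. The part size is an assumption.
def split_document(sentences, part_size=3):
    """Split a list of sentences into consecutive parts of at most part_size."""
    return [sentences[i:i + part_size]
            for i in range(0, len(sentences), part_size)]
```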
24
4-3. Proposed Methods (2) - Multi-Span-Selector
4. Methods
To implement the QA approach, our proposed method replaces the classification layer with a
Span Selector, which derives the start position and end position of the answer span. The
BERT encoder and the One-vs-Rest strategy are both applied in this method as well.
Multi-Span Selector Architecture
Sentence
Input
+ Sentence + Sentence
+
…
+ + +
…
Predict
Encoding Layer (BERT) #1~#17
Sentence Vectors
Span Selector (Linear) #1~#17
Sentence Vectors Sentence Vectors
max(y [0, 1]) max(y [0, 1])
Start
Position
End
Position
Span
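The span-selection decision rule can be sketched as follows: pick the start/end pair with the highest combined score, constrained so the end does not precede the start. The exhaustive search below is an illustrative stand-in for the actual decoding.

```python
# Sketch of the Span Selector head: choose (start, end) maximizing
# start_scores[s] + end_scores[e] subject to s <= e.
def select_span(start_scores, end_scores):
    best, best_score = (0, 0), float("-inf")
    for s, ss in enumerate(start_scores):
        for e in range(s, len(end_scores)):
            if ss + end_scores[e] > best_score:
                best_score = ss + end_scores[e]
                best = (s, e)
    return best

print(select_span([0.1, 0.8, 0.2], [0.1, 0.3, 0.9]))  # -> (1, 2)
```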
5. Baselines
26
5-1. Unsupervised Baselines - LEAD & MMR
5. Baselines
One of the unsupervised baselines we apply is the LEAD method.


We explored the optimal number of leading sentences on the validation data.
LEAD method
Sentence
Input
+ Sentence + Sentence
+
…
+ + +
…
Predict
y = 1 y = 1 y = 0
No. of Leading Sentences
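The LEAD baseline reduces to labelling the first k sentences as summary, with k tuned on validation data; a minimal sketch:

```python
# Sketch of the LEAD baseline: the first k sentences are predicted as summary
# (label 1), the rest as non-summary (label 0). k is tuned on validation data.
def lead_predict(num_sentences, k):
    return [1 if i < k else 0 for i in range(num_sentences)]

print(lead_predict(5, 2))  # -> [1, 1, 0, 0, 0]
```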
Another unsupervised baseline is Maximum Marginal Relevance (MMR).


MMR extracts summaries reducing redundancy while maintaining relevance to the query.
MMR is formulated as follows:

MMR = arg max_{Di ∈ R\S} [ λ · Sim1(Di, Q) − (1 − λ) · max_{Dj ∈ S} Sim2(Di, Dj) ]

where Q is the query, R is the ranked list of candidate sentences, S is the set of sentences
already selected, and λ balances query relevance against redundancy.
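The MMR selection loop can be sketched as below. A toy word-overlap similarity stands in for Sim1/Sim2; the actual similarity measure used in the thesis may differ.

```python
# Hedged sketch of Maximum Marginal Relevance (MMR) selection.
def overlap(a, b):
    """Toy Jaccard word-overlap similarity standing in for Sim1/Sim2."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)

def mmr(candidates, query, k, lam=0.9):
    """Greedily pick k sentences, trading query relevance against redundancy."""
    selected, remaining = [], list(candidates)
    while remaining and len(selected) < k:
        def score(d):
            redundancy = max((overlap(d, s) for s in selected), default=0.0)
            return lam * overlap(d, query) - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```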
27
5-2. Sentence BERT
5. Baselines
One of the supervised baselines we apply is Sentence BERT. Sentence BERT is a method
that uses pre-trained BERT to derive sentence embeddings.


We give Sentence BERT pairs of each sentence in the document and each of the 17 queries.
The similarity of each sentence-query pair is used as the criterion for extracting the summary.
Architecture of Sentence BERT (Ours)
Sentence
Input
+ + +
…
Sentence BERT
+ + +
…
Predict
y [0, 1] y [0, 1] y [0, 1]
Query#1~#17
Sentence
Query#1~#17
Sentence
Query#1~#17
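The decision rule of this baseline can be sketched as cosine similarity between embeddings plus a tuned threshold; the embedding vectors below are illustrative stand-ins for Sentence BERT outputs.

```python
# Sketch of the Sentence BERT baseline's decision rule: cosine similarity of a
# sentence embedding against each of the 17 query embeddings, thresholded.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def extract(sentence_vec, query_vecs, threshold=0.5):
    """Return one 0/1 label per query embedding."""
    return [1 if cosine(sentence_vec, q) >= threshold else 0
            for q in query_vecs]
```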
28
5-3. BERT-Base
5. Baselines
As another supervised baseline, we apply the architecture proposed by Zhu et al.
(BERT-Base), in which the query and sentences are passed to a BERT encoding layer, and a
classification layer then derives a score for each sentence indicating whether it belongs to the summary.
Architecture of BERT-Base(Ours)
Input
Encoding Layer (BERT)
Predict
+
+ + +
…
+ + +
…
+ + +
…
Classification Layer (Linear + Sigmoid)
Query#1~#17
Vectors
Sentence Vectors Vectors
Sentence Vectors Vectors
Sentence Vectors
y [0, 1] y [0, 1] y [0, 1]
Vectors
Sentence Vectors
Sentence Vectors
Sentence
6. Experiments
30
Before evaluation, we trained the baseline and proposed models on the training dataset
and explored hyper-parameters for each model on the validation dataset.
LEAD/MMR
Sentence BERT/
BERT-Base
Training
Hyper-
parameters
Explored
N/A
6-1. Training and Hyper-parameters Exploration
6. Experiments
• LEAD


Number of leading
sentences


• MMR


λ and number of
sentences to be
selected
• Sentence-BERT


Threshold on scores to
extract summary


• BERT-Base


Threshold on scores to
extract summary
• Sentence-BERT


Fine-tune Sentence BERT


• BERT-Base


Fine-tune BERT uncased
with output layer
Multi-BERTSum/
Multi-Span-Selector
• Multi-BERTSum


Fine-tune BERT uncased with
classification layer by query


• Multi-Span-Selector


Fine-tune BERT uncased with
span selector by query
• Multi-BERTSum


Threshold on scores to extract
summary and alignment in
integration


• Multi-Span-Selector


Alignment in integration
Baselines Proposed
31
6-2. Experimental Results
6. Experiments
We evaluate the performance of the baselines and the proposed methods with F1 scores.
The highest score among the baselines is 0.302, achieved by BERT-Base; the highest
score among the proposed methods is 0.389, achieved by Multi-BERTSum (Simple).
Baselines Proposed
Unsupervised Supervised Multi-BERTSum Multi-Span
Selector
Lead MMR Sentence BERT BERT-Base Transformer Simple
Goal1 0.015 0.017 0.042 0.078 0.156 0.177 0.142
Goal2 0.008 0.054 0.175 0.289 0.240 0.253 0.075
Goal3 0.058 0.087 0.279 0.237 0.299 0.364 0.271
Goal4 0.036 0.075 0.269 0.286 0.403 0.393 0.434
Goal5 0.065 0.115 0.309 0.360 0.362 0.421 0.370
Goal6 0.012 0.125 0.424 0.375 0.588 0.611 0.519
Goal7 0.094 0.113 0.328 0.375 0.467 0.455 0.433
Goal8 0.120 0.131 0.287 0.329 0.361 0.345 0.349
Goal9 0.079 0.097 0.317 0.365 0.409 0.367 0.309
Goal10 0.050 0.041 0.253 0.256 0.209 0.202 0.235
Goal11 0.044 0.072 0.253 0.277 0.317 0.300 0.244
Goal12 0.087 0.125 0.330 0.360 0.454 0.436 0.410
Goal13 0.148 0.116 0.402 0.427 0.523 0.542 0.470
Goal14 0.053 0.079 0.336 0.338 0.428 0.439 0.379
Goal15 0.045 0.095 0.349 0.375 0.467 0.499 0.395
Goal16 0.037 0.045 0.178 0.048 0.305 0.360 0.316
Goal17 0.082 0.084 0.173 0.182 0.205 0.197 0.164
Total 0.067 0.093 0.298 0.302 0.379 0.389 0.350
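The F1 score used above can be sketched as sentence-level precision/recall over predicted vs. gold 0/1 labels for one goal (an illustration of the metric, not the evaluation script itself):

```python
# Sketch of sentence-level F1 for one goal: gold and pred are 0/1 label lists.
def f1_score(gold, pred):
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

print(f1_score([1, 0, 1, 0], [1, 1, 0, 0]))  # -> 0.5
```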
32
6-3. Analysis
6. Experiments
The proposed method outperforms the best baseline by roughly 30% (0.389 vs. 0.302).


Meanwhile, we identified three findings:
Findings Analysis
(1) The score varies widely from
query to query.
• The number of summaries in training
data differs widely from query to query.


• The annotation is not made in a
standardized manner.
(2)
The performance of Transformer
classifier is not as good as simple
(Linear+Sigmoid) classifier.
The dataset is imbalanced and the few
summary sentences occur in clusters,
so most training examples are labelled
0 throughout and a few are labelled 1
throughout.
(3)
The performance of Multi-Span
Selector is not as good as
Multi-BERTSum.
Our implementation identifies only
one span per part. However, multiple
spans need to be selected in some
cases, a limitation caused by the
simple document-split approach.
7. Conclusions
34
7. Conclusions
We achieved our original objectives: we built a new dataset of multi-topic documents, and
the proposed method outperforms the baseline methods. We identified future work:


1. improving the dataset by increasing the number of documents and improving annotation consistency


2. improving the models to better understand long documents.
Problem Proposal
(1)
(2)
Evaluation
Built a new dataset
of multi-topic
documents
Confirmed the
effectiveness of
application of generic
method to query-
focused method
through One-vs-Rest
strategy
No dataset exists
for extracting
topic-by-topic text
from multi-topic
documents.
No reasonable
method has been
established for
extracting topic-
specific text from
multi-topic
documents.
To build a new
dataset with a set of
multi-topic
documents, topics,
and text per topic.
To establish a
method for
extracting topic-
specific relevant
text from multi-topic
documents.
Future Work
• To increase the
number of documents


• To improve dataset
consistency through
standardization
To understand long
document structure
Appendix
36
Hyper-parameters Exploration Results
Appendix
LEAD | MMR | Sentence BERT | BERT-Base | Multi-BERTSum (Transformer Classifier) | Multi-BERTSum (Simple Classifier) | Multi-Span-Selector
L | λ, L | T | T | T, A | T, A | A
Goal1 258 0.9 50 0.32 0.02 0.01 bottom 0.05 center bottom
Goal2 258 0.9 20 0.17 0.07 0.12 top 0.07 top bottom
Goal3 270 0.9 120 0.24 0.07 0.02 center 0.01 center bottom
Goal4 278 0.9 115 0.25 0.10 0.18 center 0.10 center center
Goal5 660 0.9 30 0.15 0.14 0.07 center 0.02 center center
Goal6 260 0.9 10 0.36 0.10 0.07 center 0.13 bottom center
Goal7 493 0.9 80 0.19 0.19 0.22 center 0.06 center center
Goal8 461 0.9 185 0.11 0.12 0.08 center 0.09 center center
Goal9 278 0.9 295 0.15 0.17 0.01 center 0.02 center center
Goal10 459 0.9 110 0.28 0.15 0.03 center 0.15 bottom center
Goal11 270 0.9 160 0.27 0.22 0.01 center 0.19 center top
Goal12 260 0.9 185 0.14 0.21 0.10 center 0.26 center top
Goal13 493 0.9 200 0.39 0.32 0.05 center 0.01 center bottom
Goal14 496 0.9 20 0.23 0.15 0.02 center 0.07 center center
Goal15 258 0.9 50 0.23 0.14 0.05 center 0.60 center bottom
Goal16 479 0.9 75 0.32 0.35 0.17 center 0.27 bottom top
Goal17 471 0.9 295 0.16 0.01 0.02 center 0.02 center bottom

More Related Content

PDF
Data Mining of Project Management Data: An Analysis of Applied Research Studies.
PDF
IRJET- PDF Extraction using Data Mining Techniques
PDF
QUERY SENSITIVE COMPARATIVE SUMMARIZATION OF SEARCH RESULTS USING CONCEPT BAS...
PDF
Query Sensitive Comparative Summarization of Search Results Using Concept Bas...
DOCX
TaskYou are required to prepare for this Assessment Item by1..docx
DOCX
TaskYou are required to prepare for this Assessment Item by1..docx
PPTX
AI Data Engineering for SMEs - some tricks and tools
PDF
Hands On Database 2nd Edition by Steve Conger Solution Manual
Data Mining of Project Management Data: An Analysis of Applied Research Studies.
IRJET- PDF Extraction using Data Mining Techniques
QUERY SENSITIVE COMPARATIVE SUMMARIZATION OF SEARCH RESULTS USING CONCEPT BAS...
Query Sensitive Comparative Summarization of Search Results Using Concept Bas...
TaskYou are required to prepare for this Assessment Item by1..docx
TaskYou are required to prepare for this Assessment Item by1..docx
AI Data Engineering for SMEs - some tricks and tools
Hands On Database 2nd Edition by Steve Conger Solution Manual

Similar to Query-Focused Extractive Text Summarization for Multi-Topic Document (20)

PDF
Information Retrieval based on Cluster Analysis Approach
PDF
INFORMATION RETRIEVAL BASED ON CLUSTER ANALYSIS APPROACH
PDF
INVESTIGATE,IDENTIFY AND ESTIMATE THE TECHNICAL DEBT: A SYSTEMATIC MAPPING STUDY
PDF
INVESTIGATE,IDENTIFY AND ESTIMATE THE TECHNICAL DEBT: A SYSTEMATIC MAPPING STUDY
PPTX
Introduction to Big data analytics subject
PDF
Poster ECIS 2016
PDF
A DATA EXTRACTION ALGORITHM FROM OPEN SOURCE SOFTWARE PROJECT REPOSITORIES FO...
PDF
A DATA EXTRACTION ALGORITHM FROM OPEN SOURCE SOFTWARE PROJECT REPOSITORIES FO...
DOCX
Learning Resources Week 2 Frankfort-Nachmias, C., & Leon-Guerr.docx
DOCX
Learning Resources Week 2 Frankfort-Nachmias, C., & Leon-Guerr.docx
PPTX
45,68,65,39 (2).pptx
PPT
SA Chapter 4
PDF
Ijetcas14 438
PDF
The RDFIndex-MTSR 2013
PPT
Topic map for Topic Maps case examples
PDF
KDD Cup Research Paper
PDF
gn-160406200425 (1).pdf
PPTX
Data warehouse design
PDF
IRJET- Semantic based Automatic Text Summarization based on Soft Computing
PDF
AN EFFECTIVE RANKING METHOD OF WEBPAGE THROUGH TFIDF AND HYPERLINK CLASSIFIED...
Information Retrieval based on Cluster Analysis Approach
INFORMATION RETRIEVAL BASED ON CLUSTER ANALYSIS APPROACH
INVESTIGATE,IDENTIFY AND ESTIMATE THE TECHNICAL DEBT: A SYSTEMATIC MAPPING STUDY
INVESTIGATE,IDENTIFY AND ESTIMATE THE TECHNICAL DEBT: A SYSTEMATIC MAPPING STUDY
Introduction to Big data analytics subject
Poster ECIS 2016
A DATA EXTRACTION ALGORITHM FROM OPEN SOURCE SOFTWARE PROJECT REPOSITORIES FO...
A DATA EXTRACTION ALGORITHM FROM OPEN SOURCE SOFTWARE PROJECT REPOSITORIES FO...
Learning Resources Week 2 Frankfort-Nachmias, C., & Leon-Guerr.docx
Learning Resources Week 2 Frankfort-Nachmias, C., & Leon-Guerr.docx
45,68,65,39 (2).pptx
SA Chapter 4
Ijetcas14 438
The RDFIndex-MTSR 2013
Topic map for Topic Maps case examples
KDD Cup Research Paper
gn-160406200425 (1).pdf
Data warehouse design
IRJET- Semantic based Automatic Text Summarization based on Soft Computing
AN EFFECTIVE RANKING METHOD OF WEBPAGE THROUGH TFIDF AND HYPERLINK CLASSIFIED...
Ad

Recently uploaded (20)

PDF
Capcut Pro Crack For PC Latest Version {Fully Unlocked 2025}
PPTX
(Ali Hamza) Roll No: (F24-BSCS-1103).pptx
PDF
How to run a consulting project- client discovery
PDF
Data Engineering Interview Questions & Answers Batch Processing (Spark, Hadoo...
PDF
OneRead_20250728_1808.pdfhdhddhshahwhwwjjaaja
PDF
REAL ILLUMINATI AGENT IN KAMPALA UGANDA CALL ON+256765750853/0705037305
PPTX
importance of Data-Visualization-in-Data-Science. for mba studnts
DOCX
Factor Analysis Word Document Presentation
PPTX
retention in jsjsksksksnbsndjddjdnFPD.pptx
PDF
Jean-Georges Perrin - Spark in Action, Second Edition (2020, Manning Publicat...
PPTX
mbdjdhjjodule 5-1 rhfhhfjtjjhafbrhfnfbbfnb
PDF
Introduction to the R Programming Language
PPTX
STERILIZATION AND DISINFECTION-1.ppthhhbx
PDF
Business Analytics and business intelligence.pdf
PPTX
modul_python (1).pptx for professional and student
PPTX
Introduction to Inferential Statistics.pptx
PPTX
Qualitative Qantitative and Mixed Methods.pptx
PPTX
CYBER SECURITY the Next Warefare Tactics
PDF
[EN] Industrial Machine Downtime Prediction
PDF
Votre score augmente si vous choisissez une catégorie et que vous rédigez une...
Capcut Pro Crack For PC Latest Version {Fully Unlocked 2025}
(Ali Hamza) Roll No: (F24-BSCS-1103).pptx
How to run a consulting project- client discovery
Data Engineering Interview Questions & Answers Batch Processing (Spark, Hadoo...
OneRead_20250728_1808.pdfhdhddhshahwhwwjjaaja
REAL ILLUMINATI AGENT IN KAMPALA UGANDA CALL ON+256765750853/0705037305
importance of Data-Visualization-in-Data-Science. for mba studnts
Factor Analysis Word Document Presentation
retention in jsjsksksksnbsndjddjdnFPD.pptx
Jean-Georges Perrin - Spark in Action, Second Edition (2020, Manning Publicat...
mbdjdhjjodule 5-1 rhfhhfjtjjhafbrhfnfbbfnb
Introduction to the R Programming Language
STERILIZATION AND DISINFECTION-1.ppthhhbx
Business Analytics and business intelligence.pdf
modul_python (1).pptx for professional and student
Introduction to Inferential Statistics.pptx
Qualitative Qantitative and Mixed Methods.pptx
CYBER SECURITY the Next Warefare Tactics
[EN] Industrial Machine Downtime Prediction
Votre score augmente si vous choisissez une catégorie et que vous rédigez une...
Ad

Query-Focused Extractive Text Summarization for Multi-Topic Document

  • 1. Query-Focused Extractive Text Summarization for Multi-Topic Document Shinichiro Mizuno(2030414) Master's Thesis Defense Program Japan Advanced Institute of Science and Technology
  • 2. 1 Contents 1. Introduction 2. Related Work 3. Dataset 4. Proposed Methods 5. Baselines 6. Experiments 7. Conclusions
  • 4. 3 Document summarization is an effective tool for quickly going through huge amount of information. However, people have different interests for each individual. If the summary is generated based on a different perspective from the one you expect, you could not find the information that you look for. Business Strategy (By author A) • Finance • Marketing • Team Building … I am looking for information about Finance. But I don't have time to look through all the relevant books. I would like to compare the Marketing approach between the authors. Business Strategy (By author B) • Leadership • Finance • Marketing … Business Strategy (By author C) • Marketing • Finance • Accounting … 1-1. Background 1. Introduction How do we implement the requirement?
  • 5. 4 1-2. Implementation Approach(1) 1. Introduction Summary Extractor Summary Query Summary Query Summary Query One of the implementation approaches is to apply query-focused text summarization methods, in which we take the summary perspective as a query, and extract the text related to the query. Query-Focused Extractive Text Summarization
  • 6. 5 1-2. Implementation Approach(1) - Problem 1. Introduction Summary Summary One of the existing well known query-focused summary datasets is DUC 2005-2007. However, this dataset has only one query per document, which is different from what we want to implement. DUC 2005-2007 Document … Summary Summary Document … … … … Query Query
  • 7. 6 1-3. Implementation Approach(2) 1. Introduction Span Selector Answer Question Answer Question Answer Question Other implementation approach is to apply QA task methods, in which we view the extractive perspective as a question, and extract “Answer Span” related to the question from the target document. Question Answering (Reading Comprehension)
  • 8. 7 1-3. Implementation Approach(2) - Problem 1. Introduction Answer SQuAD 1.1/2.0 and TriviaQA are well-known QA task datasets. Those datasets have only a single span to be selected for a document. However, we expect multiple spans to be selected for multi-topic document. SQuAD 1.1/2.0 S E Question Answer S E Question Article Answer TriviaQA S E Question Answer S E Question Evidence
  • 9. 8 1-4. Objectives 1. Introduction The objectives of this study are to verify our proposals; 1. To build a new dataset with a set of multi-topic documents 2. To establish a method for extracting topic-by-topic text from multi-topic documents. Problem Proposal (1) (2) No dataset exists for extracting topic-by-topic text from multi- topic documents. No reasonable method has been established for extracting topic-by-topic text from multi- topic documents. To build a new dataset with a set of multi-topic documents and summary text per topic. To establish a method for extracting topic-by-topic text from multi-topic documents.
  • 11. 10 2-1. Related Work (1) - BERT-Base 2. Related Work Zhu et al. proposed a query-focused extractive summary model based on BERT. In their architecture, query and sentences are concatenated and passed to BERT encoding layer and then output layer derives the sentence scores to indicate whether it is a summary. one [L2] two [CLS] query query first sent [SEP] sent [SEP] second [CLS] [CLS] EQ EQ EQ ED EQ EQ ED ED ED ED ED ED ED ED Eone E[L2] Etwo E[CLS] Equery Equery Efirst Esent E[SEP] Esent E[SEP] Esecond E[CLS] E[CLS] E3 E4 E6 E2 E5 E7 E8 E9 E12 E13 E11 E10 E14 [L1] EQ E[L1] E1 E15 Input Token Embeddings Segment Embeddings Posi>on Embeddings Query Document sent [SEP] last ED ED ED E16 E17 E18 Elast Esent E[SEP] BERT Encoding Layer Output Layer h1 h3 h3 L L L Sentene Representa>ons Sentene Scores r(s1) r(s2) r(s3) Figure 2: The overview of the proposed BERT-based extractive summarization model. We use special tokens (e.g., [L1], [L2]) to indicate hierarchial structure in queries. We surround each sentence with a [CLS] token before and a [SEP] token after. The input representations of each token are composed of three embeddings. The hidden vectors Architecture of BERT-Base 1 [1] H. Zhu, L. Dong, F. Wei, B. Qin, T. Liu. Transforming Wikipedia into Augmented Data for Query-Focused Summarization. arXiv preprint arXiv:1911.03324 (2019)
  • 12. [2] Ming Zhong, Da Yin, Tao Yu, Ahmad Zaidi, Mutethia Mutuma, Rahul Jha, Ahmed Hassan Awadallah, Asli Celikyilmaz, Yang Liu, Xipeng Qiu, and Dragomir Radev.: QMSum: A New Benchmark for Query-based Multi-domain Meeting Summarization. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 5905–5921. Association for Computational Linguistics, Online (2021) 11 2-2. Related Work (2) - QMSum 2. Related Work Zhong et al. created QMSum as a dataset for generating query-based summaries from meeting minutes. QMSum is a dataset with multiple queries and summaries for a single document. However, the queries are not set consistently throughout the dataset. [Figure 1 of Zhong et al.] Examples of query-based meeting summarization task 2
  • 14. 13 3-1. Dataset Requirement 3. Dataset The requirements for the dataset are that each document be paired with multiple queries, that an extractive summary be provided for each query, and that the queries be consistent throughout the dataset. Dataset Requirement Document X … Query A Summary Query B Summary Query C Summary Document Y … Query A Summary Query B Summary Query C Summary …
  • 15. 14 3-2. Data Source 3. Dataset We take advantage of integrated reports as the source of our dataset. An integrated report is a report issued annually by a company for investors that integrates financial information with non-financial information, such as environmental and social initiatives. Sample Contents of Integrated Reports 3 [Annual Report 2021, Fiscal 2020 (Year ended March 2021): 2 Management Message · 10 A Philosophy Inherited from Our Founder · 12 The ANA Group Value Creation Process · 12 ANA Group Strengths · 14 The Value Creation Process · 16 Timeline for Simultaneous Creation of Social Value and Economic Value · 18 What Must Change, What Must Never Change Message from the Independent Outside Directors · 22 Business Strategy · 24 Overview of Business Structure Reform and Fiscal 2021 Plan · 32 Overview by Business · 38 Special Feature: Establishing a New Platform Business · 42 Sustainability Initiatives · 44 ANA Group ESG Management · 46 ESG Management Promotion Cycle for Simultaneous Creation of Social Value and Economic Value · 48 Dialogue with Stakeholders on ESG · 50 Material Issues · 66 Business Foundations Supporting Corporate Value · 68 Safety · 72 Human Resources · 76 The Power of People in the ANA Group · 78 Risk Management · 80 Compliance · 82 Responsible Dialogue with Stakeholders · 84 Corporate Governance · 98 Financial / Data Section] [3] Annual Report 2021, ANA HOLDINGS INC. https://www.ana.co.jp/group/en/investors/irdata/annual/
  • 16. 15 3-3. Integrated Report 3. Dataset Some of the integrated reports carry labels indicating the relevance between their initiatives and the 17 SDG goals. These integrated reports are not only suitable as multi-topic documents, but can also be seen as a corpus in which labels for the 17 SDG goals have already been annotated by corporate IR departments. SDGs 5 Sample Pages of Integrated Reports 4 [4] Annual Report 2021, ANA HOLDINGS INC. https://www.ana.co.jp/group/en/investors/irdata/annual/ [5] United Nations, https://www.un.org/development/desa/disabilities/about-us/sustainable-development-goals-sdgs-and-disability.html
  • 17. 16 3-4. Data Collection Pipeline 3. Dataset 1. Identified the companies that publish integrated reports. (251 companies) 2. Downloaded files for the past five years from the websites of the companies. (754 files). 3. Selected integrated reports that had been labeled with SDGs Goal No. (250 files) Data Collection Pipeline PDF Download Company’s Website (251 companies) Integrated Reports Before Selection (754 files) PDF PDF Select PDF PDF Integrated Reports After Selection (250 files) List List of Companies Publishing Integrated Reports Identify
  • 18. 17 3-5. Dataset Creation Pipeline 3. Dataset 1. Extracted source text and summary text manually from the selected PDF files. 2. Labelled the summary text with Goal No. manually by adding the Goal No. to the text file name. 3. Aligned the summary text with the source text to indicate which part of the source text is the summary text for each Goal No. Dataset Creation Pipeline PDF Source Text Goal No. (Query) Extract Label Summary Text Summary Text Integrated Report Goal No. (Query) Alignment Source Text Summary Text Summary Text
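The alignment step (step 3) above can be sketched as follows. This is a minimal illustration under a simplifying assumption (exact sentence matching after whitespace/case normalization); the function and variable names (`align_labels`, `normalize`) are illustrative, not the thesis code.

```python
def normalize(sentence):
    """Lowercase and collapse whitespace so minor PDF-extraction
    differences do not break exact matching."""
    return " ".join(sentence.lower().split())

def align_labels(source_sentences, summaries_by_goal, num_goals=17):
    """Return a 0/1 label matrix: rows = source sentences, cols = 17 goals.
    A cell is 1 when the source sentence also appears in the summary
    text extracted for that goal."""
    labels = [[0] * num_goals for _ in source_sentences]
    normalized_source = [normalize(s) for s in source_sentences]
    for goal_no, summary_sentences in summaries_by_goal.items():
        summary_set = {normalize(s) for s in summary_sentences}
        for i, sent in enumerate(normalized_source):
            if sent in summary_set:
                labels[i][goal_no - 1] = 1
    return labels

# Toy example: the second sentence is part of the Goal 7 summary.
source = [
    "Maintaining a sense of crisis, but never forgetting hope.",
    "We will work to reduce energy consumption by fiscal 2030.",
]
summaries = {7: ["We will work to reduce energy consumption by fiscal 2030."]}
labels = align_labels(source, summaries)
assert labels[0] == [0] * 17
assert labels[1][6] == 1  # Goal 7 is column index 6
```

In practice, fuzzy matching would be needed where PDF extraction changes punctuation or hyphenation, but the overall shape of the output matches the sentence/label table on the next slide.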
  • 19. 18 3-6. Dataset Instance 3. Dataset An example of the created dataset is shown below. For each sentence, we assigned "0" or "1" to indicate whether or not it is related to each Goal No. Example of Sentences and Labels (columns: Goal No. 1-17)
1. Maintaining a sense of crisis , but never forgetting hope . [labels: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
2. The ANA Group ( ANA HOLDINGS INC. and its consolidated subsidiaries ) strives to create social value and economic value , leveraging the strengths we have cultivated based on the spirit of our founders . [labels: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
…
501. In addition , ANA and ANA Catering Service Co. , Ltd. received the Excellence in Energy Efficiency Award ( S Class ) certification under the Act on the Rational Use of Energy of the Ministry of Economy , Trade and Industry ( METI ) for the sixth consecutive year since this scheme was established . [labels: 0 0 0 0 0 0 1 0 1 0 0 1 1 1 1 0 1]
502. To achieve net zero CO2 non-aircraft emissions by fiscal 2050 , we will work to reduce energy consumption by fiscal 2030 , focusing on the use of electricity and vehicle fuel ( gasoline and diesel fuel ) , which accounts for the majority of our total emissions . [labels: 0 0 0 0 0 0 1 0 1 0 0 1 1 1 1 0 1]
…
551. By using this summarized data going forward , we will strive to provide a suitable and comfortable work environment . [labels: 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0]
552. In addition , with the cooperation of a third-party organization ( Caux Round Table Japan *1 ) , we have begun operating a grievance process system in accordance with global standards . [labels: 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0]
…
[6] Annual Report 2021, ANA HOLDINGS INC. https://www.ana.co.jp/group/en/investors/irdata/annual/ 6
  • 20. 19 3-7. Statistics of Dataset 3. Dataset A characteristic of our dataset is the large number of sentences per document compared to DUC 2005-2007. Our dataset is also imbalanced, with a very small number of summary sentences compared to the source documents. Comparison with DUC 2005-2007 (DUC 2005-2007 / Our Dataset):
(a) No. of Documents: 3,968 / 250
(b) No. of Sentences in Total: 102,820 / 173,664
(c) Avg. No. of Sentences per Document (=(b)/(a)): 26 / 695
(d) No. of Queries per Document: 1 / 17
(e) No. of Sentences in Summary Text: 1,961 / 96,910
(f) No. of Sentences in Summary Text per Query (=(e)/(d)): 1,961 / 5,701
Statistics by Goal No. (No. of Sentences, Ratio to Source): Source 173,664, 1.00; Goal 1: 1,493, 0.01; Goal 2: 1,338, 0.01; Goal 3: 8,891, 0.05; Goal 4: 3,932, 0.02; Goal 5: 6,201, 0.04; Goal 6: 2,849, 0.02; Goal 7: 6,938, 0.04; Goal 8: 10,217, 0.06; Goal 9: 8,102, 0.05; Goal 10: 4,522, 0.03; Goal 11: 6,078, 0.03; Goal 12: 9,676, 0.06; Goal 13: 8,761, 0.05; Goal 14: 2,985, 0.02; Goal 15: 4,482, 0.03; Goal 16: 3,815, 0.02; Goal 17: 6,630, 0.04; Average: 5,701, 0.03
  • 22. 21 4-1. Proposed Approach (1) 4. Methods We solve the task as an extractive summarization task. We take a generic extractive summarization method and apply it as a multi-class model with the One-vs-Rest strategy, resulting in a query-focused extractive summarization method. One-vs-Rest Strategy … Summary or Not Goal No.1 or Not Goal No.2 or Not Goal No.17 or Not
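The One-vs-Rest decomposition above can be sketched as follows: the 17-goal problem becomes 17 independent binary "summary for goal g or not" tasks over the same sentences. `BinaryModel` here is a toy keyword-based stand-in for any binary sentence classifier (the thesis uses BERT-based models), and all names are illustrative.

```python
class BinaryModel:
    """Toy stand-in for a binary sentence classifier: predicts positive
    when a sentence shares a word with that goal's positive training data."""
    def fit(self, sentences, labels):
        self.keywords = set()
        for sent, y in zip(sentences, labels):
            if y == 1:
                self.keywords.update(sent.lower().split())
        return self

    def predict(self, sentence):
        return int(any(w in self.keywords for w in sentence.lower().split()))

def train_one_vs_rest(sentences, label_matrix, num_goals=17):
    """One-vs-Rest: train one binary model per goal by slicing out
    that goal's column of the 0/1 label matrix."""
    models = []
    for g in range(num_goals):
        binary_labels = [row[g] for row in label_matrix]
        models.append(BinaryModel().fit(sentences, binary_labels))
    return models

sentences = ["Energy efficiency improved.", "A new board member was elected."]
label_matrix = [[0] * 6 + [1] + [0] * 10,  # first sentence labeled Goal 7
                [0] * 17]
models = train_one_vs_rest(sentences, label_matrix)
assert models[6].predict("We invest in energy projects.") == 1
assert models[0].predict("We invest in energy projects.") == 0
```

The design point is that each of the 17 models answers only its own "Goal g or not" question, so a sentence can be selected for several goals at once, which matches the multi-label instance shown in Section 3-6.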
  • 23. 22 4-2. Proposed Methods (1) - Multi-BERTSum 4. Methods BERTSum(Ext)*7 is a generic extractive summarization method proposed by Yang et al. that incorporates a BERT encoder and a Transformer classifier. We apply BERTSum(Ext) to our strategy, calling the result “Multi-BERTSum”. For comparison, we also apply a simple classifier. Multi-BERTSum Architecture Sentence Input + Sentence + Sentence + … + + + … + + + … Predict Encoding Layer (BERT) #1~#17 Sentence Vectors Classification Layer (Transformer / Simple) #1~#17 y [0, 1] Sentence Vectors Sentence Vectors y [0, 1] y [0, 1] [7] Yang Liu and Mirella Lapata.: Text summarization with pretrained encoders. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 3730–3740. Association for Computational Linguistics, Hong Kong, China (2019)
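The "simple" classification layer in the architecture above can be sketched as a per-goal Linear+Sigmoid head over sentence vectors. This is a toy illustration under stated assumptions: the BERT encoder output is simulated with zero vectors, and `SimpleClassifierHead`, the hidden dimension, and the random weights are illustrative, not the thesis implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SimpleClassifierHead:
    """One Linear+Sigmoid head per goal, following the One-vs-Rest strategy:
    each of the 17 output columns scores 'summary for goal g or not'."""
    def __init__(self, hidden_dim, num_goals=17, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(num_goals, hidden_dim)) * 0.1
        self.b = np.zeros(num_goals)

    def scores(self, sentence_vectors):
        """(n_sentences, hidden) -> (n_sentences, 17) summary probabilities."""
        return sigmoid(sentence_vectors @ self.W.T + self.b)

    def predict(self, sentence_vectors, threshold=0.5):
        """Binarize scores with a validation-tuned threshold."""
        return (self.scores(sentence_vectors) >= threshold).astype(int)

head = SimpleClassifierHead(hidden_dim=8)
fake_bert_output = np.zeros((3, 8))   # stand-in for 3 encoded sentences
probs = head.scores(fake_bert_output)
assert probs.shape == (3, 17)
assert np.allclose(probs, 0.5)        # sigmoid(0) = 0.5
```

In the actual method, the head is trained jointly with a fine-tuned BERT encoder per query; the Transformer-classifier variant replaces this linear map with Transformer layers over the sentence vectors.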
  • 24. 23 4-2. Proposed Approach (2) 4. Methods For comparison, we apply another method in which we solve the problem as a QA task. Once we split the document into parts, we can solve it as a QA task by selecting the span that answers the query. Document Split for Span Selection S E S E Document S E S E S E S E
  • 25. 24 4-3. Proposed Methods (2) - Multi-Span-Selector 4. Methods To implement the QA task method, we propose using a Span Selector instead of a classification layer. The Span Selector derives the start position and end position of the span. Both the BERT encoder and the One-vs-Rest strategy are applied in this method. Multi-Span Selector Architecture Sentence Input + Sentence + Sentence + … + + + … Predict Encoding Layer (BERT) #1~#17 Sentence Vectors Span Selector (Linear) #1~#17 Sentence Vectors Sentence Vectors max(y [0, 1]) max(y [0, 1]) Start Position End Position Span
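The Span Selector head above can be sketched as two linear projections that map each token vector to a start logit and an end logit, with the predicted span given by the argmax positions. The token vectors and weights below are toy values standing in for BERT encoder output; `select_span` is an illustrative name, not the thesis code.

```python
import numpy as np

def select_span(token_vectors, w_start, w_end):
    """token_vectors: (n_tokens, hidden) from the encoder.
    Returns the (start, end) token indices of the predicted span."""
    start_logits = token_vectors @ w_start
    end_logits = token_vectors @ w_end
    start = int(np.argmax(start_logits))
    # constrain the end position to lie at or after the start position
    end = start + int(np.argmax(end_logits[start:]))
    return start, end

# Toy encoding: token 1 looks like a span start, token 3 like a span end.
tokens = np.zeros((5, 4))
tokens[1, 0] = 1.0
tokens[3, 1] = 1.0
w_start = np.array([1.0, 0.0, 0.0, 0.0])
w_end = np.array([0.0, 1.0, 0.0, 0.0])
assert select_span(tokens, w_start, w_end) == (1, 3)
```

Note that this head, like the thesis implementation, returns exactly one span per document part per query — a limitation Section 6-2 identifies when multiple spans would be needed.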
  • 27. 26 5-1. Unsupervised Baselines - LEAD & MMR 5. Baselines One of the unsupervised baselines that we apply is the LEAD method. We explored the optimal number of leading sentences on the validation data. LEAD method Sentence Input + Sentence + Sentence + … + + + … Predict y = 1 y = 1 y = 0 No. of Leading Sentences Another unsupervised baseline is Maximum Marginal Relevance (MMR). MMR extracts summaries while reducing redundancy and maintaining relevance to the query. We explore the length of the leading sentences through experiments; based on the experimental results, the number of leading sentences with the highest F1 score is passed to the model. 5.2 MMR The other baseline we apply is maximum marginal relevance (MMR) [5], a model that seeks to reduce redundancy while maintaining query relevance by ranking documents and selecting appropriate sentences for text summarization. MMR is formulated as follows: MMR ≝ arg max_{D_i ∈ R\S} [ λ · Sim1(D_i, Q) − (1 − λ) · max_{D_j ∈ S} Sim2(D_i, D_j) ] (5.1) where Q is the query, R is the ranked list of candidate sentences from the document collection, S is the subset of sentences already selected, and λ trades off relevance against redundancy.
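The MMR selection loop in equation (5.1) can be sketched directly: at each step, pick the unselected sentence maximizing λ·Sim1(Di, Q) − (1−λ)·max over selected Dj of Sim2(Di, Dj). The similarity values below are precomputed toy numbers; `mmr_select` and its arguments are illustrative names, and in the baseline the similarities would be computed between actual sentences and the query.

```python
def mmr_select(query_sim, pairwise_sim, lam=0.9, k=2):
    """query_sim[i] = Sim1(D_i, Q); pairwise_sim[i][j] = Sim2(D_i, D_j).
    Greedily selects k sentence indices by maximum marginal relevance."""
    selected = []
    candidates = set(range(len(query_sim)))
    while candidates and len(selected) < k:
        def mmr_score(i):
            # redundancy term: similarity to the most similar already-selected sentence
            redundancy = max((pairwise_sim[i][j] for j in selected), default=0.0)
            return lam * query_sim[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Sentences 0 and 1 are both query-relevant but near-duplicates of each other,
# so after picking 0, MMR skips 1 in favor of the less redundant sentence 2.
query_sim = [0.9, 0.85, 0.5]
pairwise = [[1.0, 0.95, 0.1],
            [0.95, 1.0, 0.1],
            [0.1, 0.1, 1.0]]
assert mmr_select(query_sim, pairwise, lam=0.5, k=2) == [0, 2]
```

The hyper-parameters explored in Section 6-1 correspond to `lam` (λ) and `k` (the number of sentences to select).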
  • 28. 27 5-2. Sentence BERT 5. Baselines One of the supervised baselines that we apply is Sentence BERT, a method that uses pre-trained BERT to derive sentence embeddings. We give Sentence BERT a pair consisting of each sentence in the document and each of the 17 queries. The similarity of each sentence-query pair is derived and used as the criterion for extracting the summary. Architecture of Sentence BERT (Ours) Sentence Input + + + … Sentence BERT + + + … Predict y [0, 1] y [0, 1] y [0, 1] Query#1~#17 Sentence Query#1~#17 Sentence Query#1~#17
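The scoring step of this baseline can be sketched as cosine similarity between each sentence embedding and each of the 17 query embeddings, with a validation-tuned threshold deciding what counts as a summary. The embeddings below are toy 2-dimensional vectors; in practice they would come from the fine-tuned Sentence BERT model, and `summary_flags` is an illustrative name.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def summary_flags(sentence_embs, query_embs, threshold=0.5):
    """0/1 matrix: sentence i is extracted as summary for query q
    when its cosine similarity to the query exceeds the threshold."""
    return [[int(cosine(s, q) >= threshold) for q in query_embs]
            for s in sentence_embs]

sentences = [[1.0, 0.0], [0.0, 1.0]]
queries = [[0.9, 0.1], [0.1, 0.9]]  # toy stand-ins for 2 of the 17 queries
assert summary_flags(sentences, queries, threshold=0.8) == [[1, 0], [0, 1]]
```

Because each sentence is compared against every query independently, this baseline naturally supports the multi-label setting of the dataset.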
  • 29. 28 5-3. BERT-Base 5. Baselines As another supervised baseline, we apply the architecture proposed by Zhu et al. (BERT-Base), in which the query and sentences are passed to a BERT encoding layer, and a classification layer then derives scores indicating whether each sentence is a summary. Architecture of BERT-Base (Ours) Input Encoding Layer (BERT) Predict + + + + … + + + … + + + … Classification Layer (Linear + Sigmoid) Query#1~#17 Vectors Sentence Vectors Vectors Sentence Vectors Vectors Sentence Vectors y [0, 1] y [0, 1] y [0, 1]
  • 31. 30 Before evaluation, we trained the baseline models and the proposed models on the training dataset and explored hyper-parameters for each model on the validation dataset. LEAD/MMR Sentence BERT/ BERT-Base Training Hyper-parameters Explored N/A 6-1. Training and Hyper-parameters Exploration 6. Experiments • LEAD Number of leading sentences • MMR λ and number of sentences to be selected • Sentence-BERT Threshold on scores to extract summary • BERT-Base Threshold on scores to extract summary • Sentence-BERT Fine-tune Sentence BERT • BERT-Base Fine-tune BERT uncased with output layer Multi-BERTSum/ Multi-Span-Selector • Multi-BERTSum Fine-tune BERT uncased with classification layer by query • Multi-Span-Selector Fine-tune BERT uncased with span selector by query • Multi-BERTSum Threshold on scores to extract summary and alignment in integration • Multi-Span-Selector Alignment in integration Baselines Proposed
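The threshold exploration mentioned above can be sketched as a sweep over candidate thresholds on the validation set, keeping the one with the highest F1. The scores and gold labels are toy values, and `best_threshold` is an illustrative helper, not the thesis code.

```python
def f1(preds, golds):
    """F1 score for binary predictions against 0/1 gold labels."""
    tp = sum(1 for p, g in zip(preds, golds) if p and g)
    fp = sum(1 for p, g in zip(preds, golds) if p and not g)
    fn = sum(1 for p, g in zip(preds, golds) if g and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_threshold(scores, golds, candidates):
    """Pick the candidate threshold with the highest validation F1."""
    return max(candidates, key=lambda t: f1([s >= t for s in scores], golds))

scores = [0.9, 0.8, 0.3, 0.2]   # toy validation scores from a model
golds = [1, 1, 0, 0]
assert best_threshold(scores, golds, [0.1, 0.5, 0.95]) == 0.5
```

The same sweep is repeated per query (per SDG goal), which is why the appendix reports a separate threshold T for each goal.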
  • 32. 31 6-2. Experimental Results 6. Experiments We evaluate the performance of the baselines and the proposed methods with F1 scores. The highest score among the baseline methods is 0.302, achieved by BERT-Base. The highest score among the proposed methods is 0.389, achieved by Multi-BERTSum (Simple). (Columns: Lead / MMR / Sentence BERT / BERT-Base / Multi-BERTSum Transformer / Multi-BERTSum Simple / Multi-Span Selector)
Goal1 0.015 0.017 0.042 0.078 0.156 0.177 0.142
Goal2 0.008 0.054 0.175 0.289 0.240 0.253 0.075
Goal3 0.058 0.087 0.279 0.237 0.299 0.364 0.271
Goal4 0.036 0.075 0.269 0.286 0.403 0.393 0.434
Goal5 0.065 0.115 0.309 0.360 0.362 0.421 0.370
Goal6 0.012 0.125 0.424 0.375 0.588 0.611 0.519
Goal7 0.094 0.113 0.328 0.375 0.467 0.455 0.433
Goal8 0.120 0.131 0.287 0.329 0.361 0.345 0.349
Goal9 0.079 0.097 0.317 0.365 0.409 0.367 0.309
Goal10 0.050 0.041 0.253 0.256 0.209 0.202 0.235
Goal11 0.044 0.072 0.253 0.277 0.317 0.300 0.244
Goal12 0.087 0.125 0.330 0.360 0.454 0.436 0.410
Goal13 0.148 0.116 0.402 0.427 0.523 0.542 0.470
Goal14 0.053 0.079 0.336 0.338 0.428 0.439 0.379
Goal15 0.045 0.095 0.349 0.375 0.467 0.499 0.395
Goal16 0.037 0.045 0.178 0.048 0.305 0.360 0.316
Goal17 0.082 0.084 0.173 0.182 0.205 0.197 0.164
Total 0.067 0.093 0.298 0.302 0.379 0.389 0.350
  • 33. 32 6-2. Analysis 6. Experiments The proposed method outperforms the baseline methods by 30% in performance. Meanwhile, we identified three findings. Findings Analysis (1) The score varies widely from query to query. • The number of summaries in the training data differs widely from query to query. • The annotation is not made in a standardized manner. (2) The performance of the Transformer classifier is not as good as that of the simple (Linear+Sigmoid) classifier. The dataset is imbalanced and the few summary sentences occur in clusters; as a result, the model tends to predict 0 for all sentences in most cases and 1 for all sentences in a few cases. (3) The performance of the Multi-Span Selector is not as good as that of Multi-BERTSum. Our implementation identifies only one span, whereas multiple spans need to be selected in some cases. This is caused by our simple document-split approach.
  • 35. 34 7. Conclusions We achieved our original objectives: we built a new dataset of multi-topic documents, and the proposed method outperforms the baseline methods. We identified some future work: 1. improving the dataset by increasing the number of documents and making the annotation more consistent; 2. improving the models to better understand long documents. Problem Proposal (1) (2) Evaluation Built a new dataset of multi-topic documents Confirmed the effectiveness of applying a generic method to the query-focused setting through the One-vs-Rest strategy No dataset exists for extracting topic-by-topic text from multi-topic documents. No reasonable method has been established for extracting topic-specific text from multi-topic documents. To build a new dataset with a set of multi-topic documents, topics, and text per topic. To establish a method for extracting topic-specific relevant text from multi-topic documents. Future Work • To increase the number of documents • To improve dataset consistency through standardization • To understand long document structure
  • 37. 36 Hyper-parameters Exploration Results Appendix (Columns: LEAD L / MMR λ, L / Sentence BERT T / BERT-Base T / Multi-BERTSum Transformer Classifier T, A / Multi-BERTSum Simple Classifier T, A / Multi-Span-Selector A, where L is the number of sentences, T a score threshold, and A the alignment position in integration)
Goal1 258 0.9 50 0.32 0.02 0.01 bottom 0.05 center bottom
Goal2 258 0.9 20 0.17 0.07 0.12 top 0.07 top bottom
Goal3 270 0.9 120 0.24 0.07 0.02 center 0.01 center bottom
Goal4 278 0.9 115 0.25 0.10 0.18 center 0.10 center center
Goal5 660 0.9 30 0.15 0.14 0.07 center 0.02 center center
Goal6 260 0.9 10 0.36 0.10 0.07 center 0.13 bottom center
Goal7 493 0.9 80 0.19 0.19 0.22 center 0.06 center center
Goal8 461 0.9 185 0.11 0.12 0.08 center 0.09 center center
Goal9 278 0.9 295 0.15 0.17 0.01 center 0.02 center center
Goal10 459 0.9 110 0.28 0.15 0.03 center 0.15 bottom center
Goal11 270 0.9 160 0.27 0.22 0.01 center 0.19 center top
Goal12 260 0.9 185 0.14 0.21 0.10 center 0.26 center top
Goal13 493 0.9 200 0.39 0.32 0.05 center 0.01 center bottom
Goal14 496 0.9 20 0.23 0.15 0.02 center 0.07 center center
Goal15 258 0.9 50 0.23 0.14 0.05 center 0.60 center bottom
Goal16 479 0.9 75 0.32 0.35 0.17 center 0.27 bottom top
Goal17 471 0.9 295 0.16 0.01 0.02 center 0.02 center bottom