RatingBot: A Text Mining Based Rating Approach
Presented by Wong Ming Jie & Zhu Cungen
CONTENT
● Motivation
● Credit Risk
● Measurement of Credit Risk
● Related Work
● Text Mining in Finance
● Machine Learning in Credit Rating
● Research Question
● Data Context
● Methodology
● Discussion and Conclusion
MOTIVATION
● Financial institutions such as banks serve as
intermediaries between
o lenders (seeking short-term investment opportunities) and
o borrowers (seeking long-term financing)
● They facilitate the development of products and services
for both groups of clientele
● Their services and products need to strike a balance
between risk-taking and profit
o Maximise profits
o Without exceeding their risk appetite
MOTIVATION
● Risk is at the core of the financial industry
o Credit;
o Market; and
o Operational risks
● Credit risk is the risk that a counterparty (borrower)
fails to fulfil its obligation, in part or in full, on the
agreed-upon date
● To manage credit risk, banks need to develop rating
models for capital calculation to
o Quantify possible expected and unexpected loss of the
counterparty
o Ascertain their creditworthiness
MOTIVATION
● Measurement of Credit Risk comprises:
o Probability of Default (PD);
o Loss Given Default (LGD); and
o Exposure at Default (EAD)
● PD represents the creditworthiness of the counterparty
and is used for loan approval
● PD is typically estimated by statistical inference over
quantitative/categorical input variables, using financial
information (e.g., historical number of defaults) and
demographics (e.g., country) to determine whether a
counterparty will default
● This form of inference is limited to large homogeneous
populations with substantial numbers of defaults
MOTIVATION
● For small, heterogeneous populations with low default
rates, the data used for statistical inference may not be
generalizable, and PD is difficult to predict
● In most cases, banks therefore move towards using
o Quantitative sources (including profitability)
o Qualitative sources (corporate disclosures, news, analyst calls)
● Characteristics of qualitative sources may contribute to
estimating creditworthiness (improving PD estimation)
o Sentiment of corporate disclosures correlates with future earnings,
return on assets, stock returns and return volatility
o Strong financial performance may relate to creditworthiness in
ways that purely quantitative models could overlook
MOTIVATION
● Qualitative sources such as annual reports provide more
structured, objective, and forward-looking information,
and they are publicly available
● They add predictive power because they cover
difficult-to-quantify factors (e.g., future strategy and
expected performance) which may be relevant to
creditworthiness
● Unlike quantitative sources, qualitative sources require
experts to extract and interpret meaningful information
● Manual interpretation and coding are exposed to subjectivity,
errors, and inefficiencies, which affect PD estimation
RELATED WORK – TEXT MINING
● Text mining has been studied extensively in finance
● Content analysis methods are required to extract relevant
information from the unstructured text of annual reports
for PD estimation
o Bag-of-words
o Document narrative extraction
● Unlike document narrative extraction, bag-of-words is more
flexible: it assumes word (or sentence) order is irrelevant
for document representation
● The bag-of-words method can be implemented with either a
o Term-weighting approach or a
o Machine-learning approach
RELATED WORK – TEXT MINING
● Term-weighting structures a document as a set of terms and
assigns each term a weight reflecting its importance, from
which a sentiment score can be derived
● Predefined sentiment dictionaries can be used to determine
the sentiment score of a document
o Harvard GI word list (designed for general use)
o Loughran and McDonald (designed for financial use and credit
rating applications)
● The Loughran and McDonald word list was extracted from a set
of 10-K SEC filings
● A 10-K filing is an annual report required by the U.S.
Securities and Exchange Commission (SEC) that provides a
comprehensive summary of a company's financial performance
(filed within 60 to 90 days after fiscal year end,
depending on filer size)
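As a concrete illustration of dictionary-based term-weighting, here is a minimal Python sketch. The word lists are tiny stand-ins for the real Loughran and McDonald lists, and the normalised score is one common choice, not necessarily the paper's exact specification:

```python
# Minimal sketch of dictionary-based sentiment scoring for a filing.
# POSITIVE/NEGATIVE are tiny placeholders for the real
# Loughran-McDonald positive/negative word lists.
POSITIVE = {"achieve", "improve", "profitable", "strong"}
NEGATIVE = {"decline", "impairment", "litigation", "weak"}

def sentiment_score(tokens: list[str]) -> float:
    """(#positive - #negative) / #tokens, a score in [-1, 1]."""
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / max(len(tokens), 1)

print(sentiment_score("revenue decline and litigation risk remain weak".split()))
```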
RELATED WORK – TEXT MINING
● The machine-learning approach requires a set of manually
labelled words or sentences as inputs, which are used to
train classification models
● These models are then used to label new words or sentences
in a document
● This method is costlier to implement and requires finance
experts who are native speakers of the documents' language
● A newer approach identifies themes within a corpus of
documents by discovering latent (hidden) topics that
represent each document, using probabilistic Latent
Dirichlet Allocation (LDA) with Gibbs sampling (an MCMC
method) to derive the topics and approximate the
distribution parameters
RELATED WORK – CREDIT RATING
● Most machine learning in credit rating uses binary
classification (default / non-default) to predict the
default of a counterparty
● Most common methods in literature are:
o Neural Networks (NN)
o Support Vector Machine (SVM)
o Linear and Logistic Regression (LR)
o Decision Trees (DT)
o Fuzzy Logic (FL)
o Genetic Programming (GP)
o Discriminant Analysis (DA)
● Most of the machine-learning techniques proposed use only
quantitative information as model variables or inputs
RESEARCH QUESTION
● Research Question: can we combine the two approaches?
o Use term-weighting, topic models, and sentiment analysis to identify
and represent the qualitative information in annual reports in a
structured way -> inputs
o Apply machine-learning classification to these inputs to predict
the credit rating of a company
● Proposition: use a company's annual report as input and
apply text mining and classification approaches to
automatically and objectively derive its credit rating
● A credit rating, as an indication of perceived risk, can be
used by banks and financial institutions to investigate PD
DATA CONTEXT
● Dataset 1:
o Available 10-K filings downloaded from the SEC EDGAR
database, 2009 to 2016
o The period starts in 2009 to avoid the influence of the
2007-2008 financial crisis
o 34,972 10-K filings were joined with 17,622 Standard & Poor's
(S&P) ratings from 2016 (9,197 unique companies)
o The joined dataset was then constructed by matching the SEC
statements with the latest S&P ratings after 2009, so that each
report was issued at most 9 months before its credit rating date
o This resulted in 1,716 data points for 1,351 companies
o After removing 228 data points with a defaulted rating (the
intention is to study credit rating classes), the final sample has
1,488 data points
DATA CONTEXT
● Dataset 2:
o Provided by a major European bank
o Consists of annual reports and internal credit ratings of companies
between 2009 and 2016
o Contains non-standardized general reports, some partly in 10-K
format and some not written in English
o 10,435 annual statements in total
o After removing read-protected and non-English reports, and reports
from defaulted and non-rated companies, 5,508 annual statements
remained
● Dataset 1 was used because it allows replication of the
results, and 10-K filings are a commonly used source in the
literature
● Dataset 2 was used because it represents a real-world
scenario with internal ratings
DATA CONTEXT
● Document representation:
o The Loughran and McDonald dictionary was used for the
sentiment-weighted analysis in the term-weighting approach
o The dictionary was derived from 10-K SEC filings (suitable for the
credit rating context)
o Terms appearing in fewer than 5% of the documents (datasets 1 and
2) were removed based on chi-squared statistics, so that only
the most important terms remained
● A robustness check was also made to ensure that the
annual reports from datasets 1 and 2 are relevant to the
credit rating of a company
● Datasets 1 and 2 were compared on the absolute
frequency of terms and the sentiment-weighted
frequency of terms after applying the dictionary
DATA CONTEXT
Methodology
 Data Context
o Feature: a bank rates counterparty i based on its annual
report a_i, i ∈ {1, 2, …, m}: textual (qualitative) data X
o Label: counterparty i gets credit rating c_i, c_i ∈ {1, 2, …, n}
o What can we do?
 Derive the relationship between the textual report and the rating
 Predict the rating for a new counterparty given its annual report
 But textual data cannot be computed on directly
o Information transformation
 Preprocess the m textual documents {a_i}_{i=1}^{m}
 Represent each report in a new, computable quantitative form
[Diagram: qualitative form → (1) text preprocessing, (2) document representation → quantitative form]
Methodology
 Text Preprocessing
o Step 1: strip pictures, HTML tags, formatting, etc., so that
only raw text is left
o Step 2: transform every letter into lowercase
 For instance, 'Capital' becomes 'capital'
o Step 3: remove numbers, special characters, and punctuation
o Step 4: tokenize sentences into words by splitting on spaces
o Step 5: remove meaningless stop words such as 'or', 'and', and 'the'
o Step 6: stem terms (words) back to their root form
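A minimal Python sketch of the six preprocessing steps, assuming NLTK for stop words and stemming (the slides do not name a library):

```python
# Sketch of the six-step preprocessing pipeline using NLTK
# (an assumed toolchain; the deck does not specify one).
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

nltk.download("stopwords", quiet=True)  # one-time corpus download
STOP = set(stopwords.words("english"))
STEMMER = PorterStemmer()

def preprocess(raw: str) -> list[str]:
    text = re.sub(r"<[^>]+>", " ", raw)            # Step 1: strip HTML tags
    text = text.lower()                            # Step 2: lowercase
    text = re.sub(r"[^a-z\s]", " ", text)          # Step 3: drop numbers/punctuation
    tokens = text.split()                          # Step 4: tokenize on whitespace
    tokens = [t for t in tokens if t not in STOP]  # Step 5: remove stop words
    return [STEMMER.stem(t) for t in tokens]       # Step 6: stem to root form

print(preprocess("<p>Capital risks declined by 12% in 2016.</p>"))
# -> ['capit', 'risk', 'declin']
```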
Methodology
 Text Preprocessing (continued)
o Step 6: stem terms (words) back to their root form
 Stemming: remove 's' from 'risks' or 'ed' from 'declined'
• Groups terms into the same semantic root
• However, it cannot handle irregular forms like 'goes' and 'went'
 Alternative: lemmatization
• Can overcome this limitation of stemming, with higher precision
• Requires complex computation; unwieldy in practice
• The authors therefore still choose a stemming algorithm
o Result of text preprocessing
 A list of stemmed terms
Methodology
 Document representation
o In the list of stemmed terms
 Many terms occur more than once
 Compress them into an interpretable representation
• Term-weighting: weight every unique term
Methodology
 Term-weighting
o Binary frequency: a dummy indicating whether a term occurs
 Too naïve: ignores the true frequency
o Absolute or relative frequency: how many times a term occurs in a_i
 Ignores the distribution of a term over the different a_i
 E.g., if a term has the same distribution across all a_i, it is
probably useless for predicting the rating score
o Term frequency-inverse document frequency (tf-idf)
 Down-weights terms that are frequent across documents and
up-weights rarer, more discriminative terms (smoothing)
o All of these ignore the sentiment of words
 In finance, the sentiment of a report has a significant relationship
with company performance
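For illustration, these weighting schemes could be computed with scikit-learn along the following lines (assumed tooling, not the paper's implementation):

```python
# Sketch of three term-weighting schemes with scikit-learn.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["risk declined strongly", "risk risk litigation", "strong profit growth"]

binary = CountVectorizer(binary=True).fit_transform(docs)  # binary frequency
counts = CountVectorizer().fit_transform(docs)             # absolute frequency
tfidf = TfidfVectorizer().fit_transform(docs)              # tf-idf weighting

print(tfidf.toarray().round(2))  # rarer terms get relatively higher weight
```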
Methodology
 Term-weighting (continued)
o The schemes above ignore the sentiment of words
o Sentiment-weighted frequency addresses this by weighting term
frequencies with the terms' sentiment
Question: what if Sen_{i,l} = v_{i,l} × Sen(term_l)?
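A minimal sketch of that proposed weighting, with made-up sentiment values standing in for a real dictionary:

```python
# Sketch of a sentiment-weighted frequency Sen_{i,l} = v_{i,l} * Sen(term_l):
# the raw frequency v_{i,l} of term l in document i, scaled by the term's
# dictionary sentiment. The sentiment values here are illustrative only.
from collections import Counter

SEN = {"decline": -1.0, "litigation": -1.0, "profit": 1.0, "growth": 1.0}

def sentiment_weighted(tokens: list[str]) -> dict[str, float]:
    freq = Counter(tokens)  # v_{i,l}
    return {term: v * SEN.get(term, 0.0) for term, v in freq.items()}

print(sentiment_weighted(["profit", "growth", "decline", "decline"]))
# -> {'profit': 1.0, 'growth': 1.0, 'decline': -2.0}
```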
Methodology
 Document representation (continued)
o In the list of stemmed terms
 Many terms occur more than once
 Compress them into an interpretable representation
• Term-weighting: weight every unique term
 Reduce the number of terms to avoid overfitting and computational
complexity
• Term selection: remove terms with low frequency
• Term selection: remove terms with low explanatory power via a
chi-squared test
• Term extraction: topic model (LDA)
Methodology
 LDA (generative process)
o Sample a document a_i with probability p(a_i)
o Sample from Dirichlet(α) to generate the topic distribution θ_i for a_i
o Sample from θ_i to get topic z_{i,j}
 Topics are latent (unknown)
o Sample from Dirichlet(β) to generate the word distribution φ_{z_{i,j}}
for topic z_{i,j}
o Sample from φ_{z_{i,j}} to generate the final (observed) word w_{i,j}
Note: α and β are hyperparameters, i.e., parameters of parameters
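A sketch of extracting topic features with scikit-learn; note that its LDA uses online variational inference rather than the Gibbs sampling mentioned earlier, though both approximate the same posterior:

```python
# Sketch of deriving topic features topic_{i,h} = p(topic_h | a_i) with
# scikit-learn's LDA. Documents and hyperparameter values are illustrative.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = ["credit risk declined", "litigation risk rose", "profit growth strong"]
X = CountVectorizer().fit_transform(docs)  # document-term counts

lda = LatentDirichletAllocation(
    n_components=2,        # number of latent topics H
    doc_topic_prior=0.1,   # Dirichlet hyperparameter alpha
    topic_word_prior=0.1,  # Dirichlet hyperparameter beta
    random_state=0,
)
doc_topics = lda.fit_transform(X)  # row i approximates p(topic_h | a_i)
print(doc_topics.round(2))
```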
Methodology
 Document representation (continued)
o Term-weighting, term selection, and term extraction (as above) yield
o Results: ({p(term_l | topic_h)}_{l=1}^{p}, topic_{i,h}), with
topic_{i,h} = p(topic_h | a_i)
Methodology
 Document representation (results)
o Results: ({p(term_l | topic_h)}_{l=1}^{p}, topic_{i,h}), with
topic_{i,h} = p(topic_h | a_i)
 Interpretation: topic_{i,h} measures the weight of topic h for
document a_i, just as w_{i,l} is the weight of term l
o Final data set: ({w_{i,l}}_{l=1}^{p}, c_i)_{i=1}^{m}, where the
weights w_{i,l} include Sen_{i,l} and topic_{i,h}
 i.e., the original pairs (a_i, c_i)_{i=1}^{m} are transformed into
(feature vector, rating) pairs
[Diagram: qualitative form a_i → quantitative form w_{i,l}]
Methodology
 Classification
o Naïve Bayes (NB): benchmark
 Aim: p(c_i = k | a_i) ∝ p(c_i = k) · p(a_i | c_i = k)
 Known: p(c_i = k) = (1/m) Σ_{i=1}^{m} I(c_i = k)
 MLE under conditional independence:
p(a_i | c_i = k) = Π_{l=1}^{p} p(w_{i,l} | c_i = k)
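A minimal sketch of the Naïve Bayes benchmark on term counts; scikit-learn's MultinomialNB matches the class-prior and product-of-term-probabilities structure above (data are illustrative):

```python
# Sketch of the Naive Bayes benchmark: the class prior p(c_i = k) comes
# from class counts, and p(a_i | c_i = k) factorizes over term weights.
import numpy as np
from sklearn.naive_bayes import MultinomialNB

X = np.array([[3, 0, 1], [2, 1, 0], [0, 4, 2], [0, 3, 3]])  # term counts
y = np.array([1, 1, 2, 2])                                  # rating classes

nb = MultinomialNB().fit(X, y)
print(nb.predict([[1, 0, 2]]), nb.predict_proba([[1, 0, 2]]).round(2))
```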
Methodology
 Classification
o Support Vector Machine (SVM): could it serve as the benchmark instead?
 Aim: the maximum-margin hyperplane, i.e., the distance between the
hyperplane and the nearest point from either class is maximized
 Reduces overfitting via L2-norm regularization
 Stable
 Lacks interpretability
Methodology
 Classification
o Neural Networks (NN)
 Three layer types: input, hidden (one or multiple), and output layers
 Before each hidden unit, an aggregation function: Σ_{l=1}^{p} β_l w_{i,l}
 Inside each hidden unit, an activation function, e.g. the step
function g(x) = 1 if x > 0, 0 if x ≤ 0, applied layer by layer
 Label rule: c(a_new) = 1 if g(βᵀ w_new) > 0, else 2
 Training: backpropagation of prediction errors to update the weights
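A sketch of a one-hidden-layer network; scikit-learn's MLPClassifier trains by backpropagation, though with smooth activations rather than the step function above (synthetic data):

```python
# Sketch of a one-hidden-layer neural network trained by backpropagation.
# A step activation has no usable gradient, so sklearn uses smooth ones.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))               # stand-in document features
y = (X[:, 0] + X[:, 1] > 0).astype(int) + 1  # labels in {1, 2}

nn = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
print(nn.fit(X, y).score(X, y))              # training accuracy
```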
Methodology
 Classification
o Decision Tree (DT)
 Aim: p(c_i = k | a_i) ∝ p(c_i = k) · p(a_i | c_i = k)
 Splitting criterion: chi-squared, Gini coefficient, or entropy-based
 Prone to overfitting
 Good interpretability
Methodology
 Classification
o Logistic Regression (LR)
 A popular method
 Requires uncorrelated independent variables (exogeneity)
Methodology
 Classification
o Discriminant Analysis (DA)
 Class-conditional distribution: w_i | c_i = k ~ N(μ_k, Σ)
 Estimate μ_k and Σ from the data
 Rule: c(a_new) = 1 if log[ p(c = 1 | a_new) / p(c = 2 | a_new) ] > 0,
else 2
 The log-odds log[ p(c = 1 | a_new) / p(c = 2 | a_new) ] is a linear
function of the w_{i,l}
 Not applicable to non-linear cases
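A sketch of discriminant analysis with scikit-learn, which fits per-class means with a shared covariance and classifies via the linear log-odds (synthetic data):

```python
# Sketch of discriminant analysis: class-conditional Gaussians with a
# shared covariance, classification by the (linear) log-odds.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(2, 1, (50, 3))])
y = np.array([1] * 50 + [2] * 50)

da = LinearDiscriminantAnalysis().fit(X, y)  # estimates mu_k and shared Sigma
print(da.predict([[0.1, 0.2, -0.1], [2.1, 1.8, 2.2]]))  # -> [1 2]
```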
Methodology
 Classification
o Supervised Topic Models (STM)
 Topic distribution per document: θ_{a_i} | ϑ ~ Dirichlet(ϑ)
 Sample a topic: z_{a_i,l} | θ_{a_i} ~ Multinomial(θ_{a_i})
 Sample a term: term_{a_i,l} | z_{a_i,l}, β ~ Multinomial(β_{z_{a_i,l}})
 Sample the response variable by linear regression:
c_{a_i} | z_{a_i}, δ, σ² ~ N(δᵀ z̄_{a_i}, σ²)
• δ = (δ_1, …, δ_h, …, δ_H)ᵀ, z̄_{a_i} = (1/p) Σ_{l=1}^{p} z_{a_i,l}
• Regress the label on the topics
• Parameters are estimated by Expectation-Maximization (EM)
 Rule: c(a_new) = round( δᵀ E[z̄_{a_new} | ϑ, β, w] )
Model Development and Evaluation
 Dependent Variables
o 19 ratings (classes): c_i
o Rating bands (less granular):
 band_i = 1 if c_i ≤ 7; 2 if 7 < c_i ≤ 10; 3 if 10 < c_i ≤ 13;
4 if c_i > 13
o Binary dependent variable bin_i (comparable to the
default/non-default literature):
 bin_i = investment grade if c_i ≤ 10; speculative grade if c_i > 10
o All three encodings are unbalanced
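The three encodings as a minimal sketch (thresholds taken from the slide):

```python
# Sketch of the dependent-variable encodings derived from the 19-class
# rating c_i, using the thresholds stated on the slide.
def band(c: int) -> int:
    """Four rating bands: 1 if c<=7, 2 if 7<c<=10, 3 if 10<c<=13, 4 if c>13."""
    if c <= 7:
        return 1
    if c <= 10:
        return 2
    if c <= 13:
        return 3
    return 4

def binary(c: int) -> str:
    """Investment grade if c<=10, speculative grade otherwise."""
    return "investment" if c <= 10 else "speculative"

print([(c, band(c), binary(c)) for c in (5, 9, 12, 17)])
# -> [(5, 1, 'investment'), (9, 2, 'investment'),
#     (12, 3, 'speculative'), (17, 4, 'speculative')]
```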
Model Development and Evaluation
 Dependent Variables (continued)
o In addition, intuitively, fewer classes means higher accuracy
 P(prediction located in the big circle) > P(prediction located in
one of its sub-circles)
 Accuracy: binary rating > rating bands > 19 ratings
Model Development and Evaluation
 Test Set Performance
o Train : test = 3 : 1
o For each column, the best-performing classifier is highlighted
o All are higher than random accuracy: 1/19 ≈ 5.3%, 1/4 = 25%, 1/2 = 50%
o SVM is best for dataset 2, while DT and NN are best for dataset 1
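A sketch of the evaluation protocol: a 3:1 split and a few of the deck's classifiers against the random baseline. The data are synthetic placeholders for the document feature vectors, so the accuracies are not the paper's:

```python
# Sketch of the 3:1 train/test evaluation across several classifiers.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(400, 50))  # stand-in term-frequency matrix
y = rng.integers(1, 5, size=400)        # stand-in 4-band ratings

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

for clf in (MultinomialNB(), LinearSVC(), DecisionTreeClassifier()):
    acc = accuracy_score(y_te, clf.fit(X_tr, y_tr).predict(X_te))
    print(type(clf).__name__, round(acc, 3), "(random baseline: 0.25)")
```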
Model Development and Evaluation
 Test Set Performance
o accuracy = (TP + TN) / (TP + TN + FP + FN): overall accuracy
o precision = TP / (TP + FP): the proportion of correctly classified
points within a predicted (generated) class
o recall = TP / (TP + FN): the proportion of correctly classified
points within a true class
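Worked through with made-up confusion-matrix counts:

```python
# Sketch of the three metrics from a binary confusion matrix;
# the counts below are illustrative only.
TP, TN, FP, FN = 40, 30, 10, 20

accuracy = (TP + TN) / (TP + TN + FP + FN)  # overall correctness
precision = TP / (TP + FP)  # correct within the *predicted* positive class
recall = TP / (TP + FN)     # correct within the *true* positive class

print(f"acc={accuracy:.2f} prec={precision:.2f} rec={recall:.2f}")
# -> acc=0.70 prec=0.80 rec=0.67
```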
Model Development and Evaluation
 Test Set Performance (continued)
o acc = (TP + TN) / (TP + TN + FP + FN); prec = TP / (TP + FP);
recall = TP / (TP + FN)
o Thought: a small class tends to have low precision
o Thought: some classifiers sacrifice the precision and recall of
small classes in order to pursue high overall accuracy or high
precision on large classes
o Thought: understand the class distribution before choosing and
developing a classifier
Model Development and Evaluation
 Model Results
[Results tables shown as figures on the original slides]
Model Development and Evaluation
 Model Results (interpretation)
o How to interpret the results?
 Compare accuracy between models with and without sentiment weighting
Model Development and Evaluation
 Model Results
o The effect of considering the sentiment of terms depends on the
classifier, the dataset, and the type of dependent variable
o NN, DT, and SVM work best
o STM works badly, which is surprising given its good interpretability
 Potential reason given by the paper: linear regression should have
been used rather than a generalized linear model (GLM)
 Presenters' thought: a GLM may work better, since it makes fewer
assumptions than linear regression
 Then why the poor result? The generative process of STM assumes the
response variable (class) is produced by, i.e. sampled from, the
topics; the true ratings may not be produced this way
 We only know that sentiment is related to the rating; that does not
mean sentiment produces the rating
Conclusion
 RatingBot: predicting rating scores from annual-report text
 Limitations and future work
o The credibility of the rating
 Consider a time component to capture companies manipulating their
annual reports to obtain higher ratings
o Try other classifiers, such as deep learning
o Include other text sources, such as news and social-media content
Thank you!
Editor's Notes
• #30 (Neural Networks): more reliable and robust; a commonly used classifier