RatingBot: A Text Mining Based Rating Approach
Presented by Wong Ming Jie & Zhu Cungen
CONTENT
● Motivation
● Credit Risk
● Measurement of Credit Risk
● Related Work
● Text Mining in Finance
● Machine Learning in Credit Rating
● Research Question
● Data Context
● Methodology
● Discussion and Conclusion
MOTIVATION
● Financial institutions such as banks serve as
intermediaries between
o lenders (seeking short-term investment opportunities) and
o borrowers (seeking long-term financing)
● They facilitate the development of products and services
for both groups of clientele
● Their services and products need to strike a balance
between risk-taking and profit
o Maximise profits
o Without exceeding their risk appetite
MOTIVATION
● Risk is at the core of the financial industry
o Credit;
o Market; and
o Operational risks
● Credit risk is the risk that a counterparty (borrower)
fails to fulfil its obligation, in part or in full, on the
agreed-upon date
● To manage credit risk, banks need to develop rating
models for capital calculation to
o Quantify possible expected and unexpected loss of the
counterparty
o Ascertain their creditworthiness
MOTIVATION
● Measurement of Credit Risk comprises:
o Probability of Default (PD);
o Loss Given Default (LGD); and
o Exposure at Default (EAD)
● PD represents the creditworthiness of the counterparty
and is used for loan approval
● PD is typically estimated by statistical inference over
quantitative/categorical input variables, using financial
information (e.g., historical number of defaults) and
demographics (e.g., country) to determine whether a
counterparty will default
● This form of inference is limited to large homogeneous
populations with substantial numbers of defaults
MOTIVATION
● For small, heterogeneous populations with low default
rates, the data used for statistical inference may not be
generalizable, and PD is difficult to predict
● In most cases, banks therefore move towards using
o Quantitative sources (including profitability)
o Qualitative sources (corporate disclosures, news, analyst calls)
● Characteristics of qualitative sources may contribute to
estimating creditworthiness (improving PD estimation)
o Sentiment of corporate disclosures correlates with future earnings,
return on assets, stock returns and return volatility
o Strong financial performance may relate to creditworthiness in
ways that purely quantitative models could overlook
MOTIVATION
● Qualitative sources such as annual reports provide more
structured, objective, and forward-looking information,
and they are publicly available
● They add predictive power because they cover
difficult-to-quantify factors (e.g., future strategy and
expected performance) which may be relevant to
creditworthiness
● Unlike quantitative sources, qualitative sources require
experts to extract and interpret meaningful information
● Manual interpretation and coding are exposed to subjectivity,
errors, and inefficiencies, which affect PD estimation
RELATED WORK – TEXT MINING
● Text mining has been studied extensively in finance
● Content analysis methods are required to extract relevant
information from the unstructured text of annual reports
for PD estimation
o Bag-of-words
o Document narrative extraction
● Unlike document narrative extraction, bag-of-words is more
flexible: it assumes word (or sentence) order is irrelevant
for document representation
● The bag-of-words method can be implemented with either a
o Term-weighting approach or a
o Machine-learning approach
RELATED WORK – TEXT MINING
● Term-weighting structures a document as a set of terms and
assigns each term a weight reflecting its importance, from
which a sentiment score can be derived
● Predefined sentiment dictionaries can be used to determine
the sentiment score of a document
o Harvard GI word list (designed for general use)
o Loughran and McDonald (designed for financial use and credit
rating applications)
● The Loughran and McDonald word list was extracted from a set
of 10-K SEC filings
● A 10-K filing is an annual report required by the U.S.
Securities and Exchange Commission (SEC) that provides a
comprehensive summary of a company's financial performance
(filed within 60 to 90 days after fiscal year end,
depending on filer size)
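As a concrete illustration of dictionary-based term-weighting, here is a minimal Python sketch. The word lists are tiny stand-ins for the real Loughran and McDonald lists, and the normalised score is one common choice, not necessarily the paper's exact specification:

```python
# Minimal sketch of dictionary-based sentiment scoring for a filing.
# POSITIVE/NEGATIVE are tiny placeholders for the real
# Loughran-McDonald positive/negative word lists.
POSITIVE = {"achieve", "improve", "profitable", "strong"}
NEGATIVE = {"decline", "impairment", "litigation", "weak"}

def sentiment_score(tokens: list[str]) -> float:
    """(#positive - #negative) / #tokens, a score in [-1, 1]."""
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / max(len(tokens), 1)

print(sentiment_score("revenue decline and litigation risk remain weak".split()))
```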
RELATED WORK – TEXT MINING
● The machine-learning approach requires a set of manually
labelled words or sentences as inputs, which are used to
train classification models
● These models are then used to label new words or sentences
in a document
● This method is costlier to implement and requires finance
experts who are native speakers of the documents' language
● A newer approach identifies themes within a corpus of
documents by discovering latent (hidden) topics that
represent each document, using probabilistic Latent
Dirichlet Allocation (LDA) with Gibbs sampling (an MCMC
method) to derive the topics and approximate the
distribution parameters
RELATED WORK – CREDIT RATING
● Most machine learning in credit rating uses binary
classification (default / non-default) to predict the
default of a counterparty
● Most common methods in literature are:
o Neural Networks (NN)
o Support Vector Machine (SVM)
o Linear and Logistic Regression (LR)
o Decision Trees (DT)
o Fuzzy Logic (FL)
o Genetic Programming (GP)
o Discriminant Analysis (DA)
● Most of the machine-learning techniques proposed use only
quantitative information as model variables or inputs
RESEARCH QUESTION
● Research Question: can we combine the two approaches?
o Use term-weighting, topic models, and sentiment analysis to identify
and represent the qualitative information in annual reports in a
structured way -> inputs
o Apply machine-learning classification to these inputs to predict
the credit rating of a company
● Proposition: use a company's annual report as input and
apply text mining and classification approaches to
automatically and objectively derive its credit rating
● A credit rating, as an indication of perceived risk, can be
used by banks and financial institutions to investigate PD
DATA CONTEXT
● Dataset 1:
o Available 10-K filings downloaded from the SEC EDGAR
database, 2009 to 2016
o The period starts in 2009 to avoid the influence of the
2007-2008 financial crisis
o 34,972 10-K filings were joined with 17,622 Standard & Poor's
(S&P) ratings from 2016 (9,197 unique companies)
o The joined dataset was then constructed by matching the SEC
statements with the latest S&P ratings after 2009, so that each
report was issued at most 9 months before its credit rating date
o This resulted in 1,716 data points for 1,351 companies
o After removing 228 data points with a defaulted rating (the
intention is to study credit rating classes), the final sample has
1,488 data points
DATA CONTEXT
● Dataset 2:
o Provided by a major European bank
o Consists of annual reports and internal credit ratings of companies
between 2009 and 2016
o Contains non-standardized general reports, some partly in 10-K
format and some not written in English
o 10,435 annual statements in total
o After removing read-protected and non-English reports, and reports
from defaulted and non-rated companies, 5,508 annual statements
remained
● Dataset 1 was used because it allows replication of the
results, and 10-K filings are a commonly used source in the
literature
● Dataset 2 was used because it represents a real-world
scenario with internal ratings
DATA CONTEXT
● Document representation:
o The Loughran and McDonald dictionary was used for the
sentiment-weighted analysis in the term-weighting approach
o The dictionary was derived from 10-K SEC filings (suitable for the
credit rating context)
o Terms appearing in fewer than 5% of the documents (datasets 1 and
2) were removed based on chi-squared statistics, so that only
the most important terms remained
● A robustness check was also made to ensure that the
annual reports from datasets 1 and 2 are relevant to the
credit rating of a company
● Datasets 1 and 2 were compared on the absolute
frequency of terms and the sentiment-weighted
frequency of terms after applying the dictionary
DATA CONTEXT
Methodology
 Data Context
o Feature: a bank rates counterparty i based on its annual
report a_i, i ∈ {1, 2, …, m}: textual (qualitative) data X
o Label: counterparty i gets credit rating c_i, c_i ∈ {1, 2, …, n}
o What can we do?
 Derive the relationship between the textual report and the rating
 Predict the rating for a new counterparty given its annual report
 But textual data cannot be computed on directly
o Information transformation
 Preprocess the m textual documents {a_i}_{i=1}^{m}
 Represent each report in a new, computable quantitative form
[Diagram: qualitative form → (1) text preprocessing, (2) document representation → quantitative form]
Methodology
 Text Preprocessing
o Step 1: strip pictures, HTML tags, formatting, etc., so that
only raw text is left
o Step 2: transform every letter into lowercase
 For instance, 'Capital' becomes 'capital'
o Step 3: remove numbers, special characters, and punctuation
o Step 4: tokenize sentences into words by splitting on spaces
o Step 5: remove meaningless stop words such as 'or', 'and', and 'the'
o Step 6: stem terms (words) back to their root form
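A minimal Python sketch of the six preprocessing steps, assuming NLTK for stop words and stemming (the slides do not name a library):

```python
# Sketch of the six-step preprocessing pipeline using NLTK
# (an assumed toolchain; the deck does not specify one).
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

nltk.download("stopwords", quiet=True)  # one-time corpus download
STOP = set(stopwords.words("english"))
STEMMER = PorterStemmer()

def preprocess(raw: str) -> list[str]:
    text = re.sub(r"<[^>]+>", " ", raw)            # Step 1: strip HTML tags
    text = text.lower()                            # Step 2: lowercase
    text = re.sub(r"[^a-z\s]", " ", text)          # Step 3: drop numbers/punctuation
    tokens = text.split()                          # Step 4: tokenize on whitespace
    tokens = [t for t in tokens if t not in STOP]  # Step 5: remove stop words
    return [STEMMER.stem(t) for t in tokens]       # Step 6: stem to root form

print(preprocess("<p>Capital risks declined by 12% in 2016.</p>"))
# -> ['capit', 'risk', 'declin']
```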
Methodology
 Text Preprocessing (continued)
o Step 6: stem terms (words) back to their root form
 Stemming: remove 's' from 'risks' or 'ed' from 'declined'
• Groups terms into the same semantic root
• However, it cannot handle irregular forms like 'goes' and 'went'
 Alternative: lemmatization
• Can overcome this limitation of stemming, with higher precision
• Requires complex computation; unwieldy in practice
• The authors therefore still choose a stemming algorithm
o Result of text preprocessing
 A list of stemmed terms
Methodology
 Document representation
o In the list of stemmed terms
 Many terms occur more than once
 Compress them into an interpretable representation
• Term-weighting: weight every unique term
Methodology
 Term-weighting
o Binary frequency: a dummy indicating whether a term occurs
 Too naïve: ignores the true frequency
o Absolute or relative frequency: how many times a term occurs in a_i
 Ignores the distribution of a term over the different a_i
 E.g., if a term has the same distribution across all a_i, it is
probably useless for predicting the rating score
o Term frequency-inverse document frequency (tf-idf)
 Down-weights terms that are frequent across documents and
up-weights rarer, more discriminative terms (smoothing)
o All of these ignore the sentiment of words
 In finance, the sentiment of a report has a significant relationship
with company performance
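For illustration, these weighting schemes could be computed with scikit-learn along the following lines (assumed tooling, not the paper's implementation):

```python
# Sketch of three term-weighting schemes with scikit-learn.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["risk declined strongly", "risk risk litigation", "strong profit growth"]

binary = CountVectorizer(binary=True).fit_transform(docs)  # binary frequency
counts = CountVectorizer().fit_transform(docs)             # absolute frequency
tfidf = TfidfVectorizer().fit_transform(docs)              # tf-idf weighting

print(tfidf.toarray().round(2))  # rarer terms get relatively higher weight
```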
Methodology
 Term-weighting (continued)
o The schemes above ignore the sentiment of words
o Sentiment-weighted frequency addresses this by weighting term
frequencies with the terms' sentiment
Question: what if Sen_{i,l} = v_{i,l} × Sen(term_l)?
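A minimal sketch of that proposed weighting, with made-up sentiment values standing in for a real dictionary:

```python
# Sketch of a sentiment-weighted frequency Sen_{i,l} = v_{i,l} * Sen(term_l):
# the raw frequency v_{i,l} of term l in document i, scaled by the term's
# dictionary sentiment. The sentiment values here are illustrative only.
from collections import Counter

SEN = {"decline": -1.0, "litigation": -1.0, "profit": 1.0, "growth": 1.0}

def sentiment_weighted(tokens: list[str]) -> dict[str, float]:
    freq = Counter(tokens)  # v_{i,l}
    return {term: v * SEN.get(term, 0.0) for term, v in freq.items()}

print(sentiment_weighted(["profit", "growth", "decline", "decline"]))
# -> {'profit': 1.0, 'growth': 1.0, 'decline': -2.0}
```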
Methodology
 Document representation (continued)
o In the list of stemmed terms
 Many terms occur more than once
 Compress them into an interpretable representation
• Term-weighting: weight every unique term
 Reduce the number of terms to avoid overfitting and computational
complexity
• Term selection: remove terms with low frequency
• Term selection: remove terms with low explanatory power via a
chi-squared test
• Term extraction: topic model (LDA)
Methodology
 LDA (generative process)
o Sample a document a_i with probability p(a_i)
o Sample from Dirichlet(α) to generate the topic distribution θ_i for a_i
o Sample from θ_i to get topic z_{i,j}
 Topics are latent (unknown)
o Sample from Dirichlet(β) to generate the word distribution φ_{z_{i,j}}
for topic z_{i,j}
o Sample from φ_{z_{i,j}} to generate the final (observed) word w_{i,j}
Note: α and β are hyperparameters, i.e., parameters of parameters
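A sketch of extracting topic features with scikit-learn; note that its LDA uses online variational inference rather than the Gibbs sampling mentioned earlier, though both approximate the same posterior:

```python
# Sketch of deriving topic features topic_{i,h} = p(topic_h | a_i) with
# scikit-learn's LDA. Documents and hyperparameter values are illustrative.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = ["credit risk declined", "litigation risk rose", "profit growth strong"]
X = CountVectorizer().fit_transform(docs)  # document-term counts

lda = LatentDirichletAllocation(
    n_components=2,        # number of latent topics H
    doc_topic_prior=0.1,   # Dirichlet hyperparameter alpha
    topic_word_prior=0.1,  # Dirichlet hyperparameter beta
    random_state=0,
)
doc_topics = lda.fit_transform(X)  # row i approximates p(topic_h | a_i)
print(doc_topics.round(2))
```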
Methodology
 Document representation (continued)
o Term-weighting, term selection, and term extraction (as above) yield
o Results: ({p(term_l | topic_h)}_{l=1}^{p}, topic_{i,h}), with
topic_{i,h} = p(topic_h | a_i)
Methodology
 Document representation (results)
o Results: ({p(term_l | topic_h)}_{l=1}^{p}, topic_{i,h}), with
topic_{i,h} = p(topic_h | a_i)
 Interpretation: topic_{i,h} measures the weight of topic h for
document a_i, just as w_{i,l} is the weight of term l
o Final data set: ({w_{i,l}}_{l=1}^{p}, c_i)_{i=1}^{m}, where the
weights w_{i,l} include Sen_{i,l} and topic_{i,h}
 i.e., the original pairs (a_i, c_i)_{i=1}^{m} are transformed into
(feature vector, rating) pairs
[Diagram: qualitative form a_i → quantitative form w_{i,l}]
Methodology
 Classification
o Naïve Bayes (NB): benchmark
 Aim: p(c_i = k | a_i) ∝ p(c_i = k) · p(a_i | c_i = k)
 Known: p(c_i = k) = (1/m) Σ_{i=1}^{m} I(c_i = k)
 MLE under conditional independence:
p(a_i | c_i = k) = Π_{l=1}^{p} p(w_{i,l} | c_i = k)
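A minimal sketch of the Naïve Bayes benchmark on term counts; scikit-learn's MultinomialNB matches the class-prior and product-of-term-probabilities structure above (data are illustrative):

```python
# Sketch of the Naive Bayes benchmark: the class prior p(c_i = k) comes
# from class counts, and p(a_i | c_i = k) factorizes over term weights.
import numpy as np
from sklearn.naive_bayes import MultinomialNB

X = np.array([[3, 0, 1], [2, 1, 0], [0, 4, 2], [0, 3, 3]])  # term counts
y = np.array([1, 1, 2, 2])                                  # rating classes

nb = MultinomialNB().fit(X, y)
print(nb.predict([[1, 0, 2]]), nb.predict_proba([[1, 0, 2]]).round(2))
```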
Methodology
 Classification
o Support Vector Machine (SVM): could it serve as the benchmark instead?
 Aim: the maximum-margin hyperplane, i.e., the distance between the
hyperplane and the nearest point from either class is maximized
 Reduces overfitting via L2-norm regularization
 Stable
 Lacks interpretability
Methodology
 Classification
o Neural Networks (NN)
 Three layer types: input, hidden (one or multiple), and output layers
 Before each hidden unit, an aggregation function: Σ_{l=1}^{p} β_l w_{i,l}
 Inside each hidden unit, an activation function, e.g. the step
function g(x) = 1 if x > 0, 0 if x ≤ 0, applied layer by layer
 Label rule: c(a_new) = 1 if g(βᵀ w_new) > 0, else 2
 Training: backpropagation of prediction errors to update the weights
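A sketch of a one-hidden-layer network; scikit-learn's MLPClassifier trains by backpropagation, though with smooth activations rather than the step function above (synthetic data):

```python
# Sketch of a one-hidden-layer neural network trained by backpropagation.
# A step activation has no usable gradient, so sklearn uses smooth ones.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))               # stand-in document features
y = (X[:, 0] + X[:, 1] > 0).astype(int) + 1  # labels in {1, 2}

nn = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
print(nn.fit(X, y).score(X, y))              # training accuracy
```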
Methodology
 Classification
o Decision Tree (DT)
 Aim: p(c_i = k | a_i) ∝ p(c_i = k) · p(a_i | c_i = k)
 Splitting criterion: chi-squared, Gini coefficient, or entropy-based
 Prone to overfitting
 Good interpretability
Methodology
 Classification
o Logistic Regression (LR)
 A popular method
 Requires uncorrelated independent variables (exogeneity)
Methodology
 Classification
o Discriminant Analysis (DA)
 Class-conditional distribution: w_i | c_i = k ~ N(μ_k, Σ)
 Estimate μ_k and Σ from the data
 Rule: c(a_new) = 1 if log[ p(c = 1 | a_new) / p(c = 2 | a_new) ] > 0,
else 2
 The log-odds log[ p(c = 1 | a_new) / p(c = 2 | a_new) ] is a linear
function of the w_{i,l}
 Not applicable to non-linear cases
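A sketch of discriminant analysis with scikit-learn, which fits per-class means with a shared covariance and classifies via the linear log-odds (synthetic data):

```python
# Sketch of discriminant analysis: class-conditional Gaussians with a
# shared covariance, classification by the (linear) log-odds.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(2, 1, (50, 3))])
y = np.array([1] * 50 + [2] * 50)

da = LinearDiscriminantAnalysis().fit(X, y)  # estimates mu_k and shared Sigma
print(da.predict([[0.1, 0.2, -0.1], [2.1, 1.8, 2.2]]))  # -> [1 2]
```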
Methodology
 Classification
o Supervised Topic Models (STM)
 Topic distribution per document: θ_{a_i} | ϑ ~ Dirichlet(ϑ)
 Sample a topic: z_{a_i,l} | θ_{a_i} ~ Multinomial(θ_{a_i})
 Sample a term: term_{a_i,l} | z_{a_i,l}, β ~ Multinomial(β_{z_{a_i,l}})
 Sample the response variable by linear regression:
c_{a_i} | z_{a_i}, δ, σ² ~ N(δᵀ z̄_{a_i}, σ²)
• δ = (δ_1, …, δ_h, …, δ_H)ᵀ, z̄_{a_i} = (1/p) Σ_{l=1}^{p} z_{a_i,l}
• Regress the label on the topics
• Parameters are estimated by Expectation-Maximization (EM)
 Rule: c(a_new) = round( δᵀ E[z̄_{a_new} | ϑ, β, w] )
Model Development and Evaluation
 Dependent Variables
o 19 ratings (classes): c_i
o Rating bands (less granular):
 band_i = 1 if c_i ≤ 7; 2 if 7 < c_i ≤ 10; 3 if 10 < c_i ≤ 13;
4 if c_i > 13
o Binary dependent variable bin_i (comparable to the
default/non-default literature):
 bin_i = investment grade if c_i ≤ 10; speculative grade if c_i > 10
o All three encodings are unbalanced
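The three encodings as a minimal sketch (thresholds taken from the slide):

```python
# Sketch of the dependent-variable encodings derived from the 19-class
# rating c_i, using the thresholds stated on the slide.
def band(c: int) -> int:
    """Four rating bands: 1 if c<=7, 2 if 7<c<=10, 3 if 10<c<=13, 4 if c>13."""
    if c <= 7:
        return 1
    if c <= 10:
        return 2
    if c <= 13:
        return 3
    return 4

def binary(c: int) -> str:
    """Investment grade if c<=10, speculative grade otherwise."""
    return "investment" if c <= 10 else "speculative"

print([(c, band(c), binary(c)) for c in (5, 9, 12, 17)])
# -> [(5, 1, 'investment'), (9, 2, 'investment'),
#     (12, 3, 'speculative'), (17, 4, 'speculative')]
```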
Model Development and Evaluation
 Dependent Variables (continued)
o In addition, intuitively, fewer classes means higher accuracy
 P(prediction located in the big circle) > P(prediction located in
one of its sub-circles)
 Accuracy: binary rating > rating bands > 19 ratings
Model Development and Evaluation
 Test Set Performance
o Train : test = 3 : 1
o For each column, the best-performing classifier is highlighted
o All are higher than random accuracy: 1/19 ≈ 5.3%, 1/4 = 25%, 1/2 = 50%
o SVM is best for dataset 2, while DT and NN are best for dataset 1
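A sketch of the evaluation protocol: a 3:1 split and a few of the deck's classifiers against the random baseline. The data are synthetic placeholders for the document feature vectors, so the accuracies are not the paper's:

```python
# Sketch of the 3:1 train/test evaluation across several classifiers.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(400, 50))  # stand-in term-frequency matrix
y = rng.integers(1, 5, size=400)        # stand-in 4-band ratings

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

for clf in (MultinomialNB(), LinearSVC(), DecisionTreeClassifier()):
    acc = accuracy_score(y_te, clf.fit(X_tr, y_tr).predict(X_te))
    print(type(clf).__name__, round(acc, 3), "(random baseline: 0.25)")
```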
Model Development and Evaluation
 Test Set Performance
o accuracy = (TP + TN) / (TP + TN + FP + FN): overall accuracy
o precision = TP / (TP + FP): the proportion of correctly classified
points within a predicted (generated) class
o recall = TP / (TP + FN): the proportion of correctly classified
points within a true class
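Worked through with made-up confusion-matrix counts:

```python
# Sketch of the three metrics from a binary confusion matrix;
# the counts below are illustrative only.
TP, TN, FP, FN = 40, 30, 10, 20

accuracy = (TP + TN) / (TP + TN + FP + FN)  # overall correctness
precision = TP / (TP + FP)  # correct within the *predicted* positive class
recall = TP / (TP + FN)     # correct within the *true* positive class

print(f"acc={accuracy:.2f} prec={precision:.2f} rec={recall:.2f}")
# -> acc=0.70 prec=0.80 rec=0.67
```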
Model Development and Evaluation
 Test Set Performance (continued)
o acc = (TP + TN) / (TP + TN + FP + FN); prec = TP / (TP + FP);
recall = TP / (TP + FN)
o Thought: a small class tends to have low precision
o Thought: some classifiers sacrifice the precision and recall of
small classes in order to pursue high overall accuracy or high
precision on large classes
o Thought: understand the class distribution before choosing and
developing a classifier
Model Development and Evaluation
 Model Results
[Results tables shown as figures on the original slides]
Model Development and Evaluation
 Model Results (interpretation)
o How to interpret the results?
 Compare accuracy between models with and without sentiment weighting
Model Development and Evaluation
 Model Results
o The effect of considering the sentiment of terms depends on the
classifier, the dataset, and the type of dependent variable
o NN, DT, and SVM work best
o STM works badly, which is surprising given its good interpretability
 Potential reason given by the paper: linear regression should have
been used rather than a generalized linear model (GLM)
 Presenters' thought: a GLM may work better, since it makes fewer
assumptions than linear regression
 Then why the poor result? The generative process of STM assumes the
response variable (class) is produced by, i.e. sampled from, the
topics; the true ratings may not be produced this way
 We only know that sentiment is related to the rating; that does not
mean sentiment produces the rating
Conclusion
 RatingBot: predicting rating scores from annual-report text
 Limitations and future work
o The credibility of the rating
 Consider a time component to capture companies manipulating their
annual reports to obtain higher ratings
o Try other classifiers, such as deep learning
o Include other text sources, such as news and social-media content
Thank you!
Editor's Notes
• #30 (Neural Networks): more reliable and robust; a commonly used classifier