Towards better software quality assurance by providing intelligent support

Towards Providing Automated Supports to
Developers on Making Logging Decisions
Tse-Hsun (Peter) Chen
peterc@encs.concordia.ca

About myself
BSc; MSc, PhD; Associate professor

Experiences working in the field

Industry collaboration and research outcome adoption

Main research area: improving software quality and testing process
Software system → performance counters, system logs, software tests, bug reports, … → software developers
Applying code analysis, machine learning, and data analytics to provide automated support to developers
In this talk, I will focus on my research on software logging.

Logs are often the only source of information for production systems
System running in production → Operator, Developer

Logs can be used to assist various software development tasks:
• Performance analysis
• Requirement tracking
• Debugging
• Monitoring

What is a Logging Statement & Trade-off of Logging
try {
    …
} catch (Exception e) {
    LOG.warn("Can not parse job id from {} ", path, e);
}
Verbosity level: warn
Static message: "Can not parse job id from {} "
Dynamic variables: path, e
Logging trade-off:
Too much: performance overhead; too many trivial logs
Too little: missing important information
Logs record important runtime information, but with trade-offs.
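Assuming an SLF4J-style logger, the `{}` in the static message is a placeholder that the framework fills with the dynamic variables at runtime. The sketch below is a simplified, hypothetical formatter (`PlaceholderFormat.format` is not a real library call) that only shows how the static message and dynamic variables combine; it ignores escaping. In SLF4J, a trailing exception argument such as `e` with no matching placeholder is logged as the exception with its stack trace rather than substituted, which is why the `LOG.warn` call passes both `path` and `e` for a single `{}`.

```java
public class PlaceholderFormat {
    // Hypothetical helper: replaces each "{}" in the static message with the next
    // dynamic argument, left to right. Placeholders without a matching argument are
    // left as-is; extra arguments are ignored (this sketch has no exception handling).
    static String format(String message, Object... args) {
        StringBuilder out = new StringBuilder();
        int from = 0;
        int argIndex = 0;
        int at;
        while (argIndex < args.length && (at = message.indexOf("{}", from)) >= 0) {
            out.append(message, from, at).append(args[argIndex++]);
            from = at + 2; // skip past the "{}" we just filled
        }
        out.append(message, from, message.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // Static message plus one dynamic variable (a hypothetical path value).
        System.out.println(format("Can not parse job id from {} ", "/jobs/unknown"));
    }
}
```

Running `main` prints the message with the path substituted; the verbosity level (`warn`) would be handled separately by the logger, which decides whether the line is emitted at all.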

Deciding where to log is challenging
“Logging and tracing is (IMO) a fine art, knowing what to log and where takes experience.”

Where do developers log?
Studying where developers log and providing recommendations
Can we leverage existing code to recommend logging locations?

Our process of studying and providing logging suggestions
Source code → logging statements (with surrounding code)

Our Studied Systems
Cassandra, Elasticsearch, Flink, HBase, Kafka, Wicket, Zookeeper
We study where developers log in 7 large-scale open-source systems.

Manual study on
logging code

Manually studying logging code and its location
Manual study to understand their characteristics: we randomly sample 375 out of 14.9K logging statements and their surrounding code.
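The slides do not say how the sample size of 375 was chosen. As an assumption, it is consistent with Cochran's sample-size formula with a finite population correction at a 95% confidence level (z = 1.96), a ±5% margin of error, and maximum variability (p = 0.5). The sketch below treats the population as exactly 14,900 (the slides give only "14.9K") and checks that the arithmetic lands on 375.

```java
public class SampleSize {
    // Cochran's formula: n0 = z^2 * p * (1 - p) / e^2,
    // then finite population correction: n = n0 / (1 + (n0 - 1) / N), rounded up.
    static long sampleSize(long population, double z, double margin, double p) {
        double n0 = z * z * p * (1 - p) / (margin * margin); // about 384.16 for the defaults below
        double n = n0 / (1 + (n0 - 1) / population);
        return (long) Math.ceil(n);
    }

    public static void main(String[] args) {
        // Assumed inputs: N = 14,900, 95% confidence, ±5% margin, p = 0.5.
        System.out.println(sampleSize(14_900, 1.96, 0.05, 0.5));
    }
}
```

For N = 14,900 the corrected value is about 374.5, which rounds up to 375.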

We uncover 6 categories of logging locations and the relationship between logging statements and their surrounding code.

Categories of logging locations
Category 1: Exception information logging in catch blocks
Related information: semantic, syntactic
The logging statements often record messages or execution information related to the preceding try block.
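A small illustration of Category 1, not taken from the studied systems: the catch block logs what the try block was attempting, the dynamic value involved, and the exception. It uses the JDK's `java.util.logging` so it runs without dependencies; `parseJobId` and the "last path segment is a numeric id" convention are hypothetical.

```java
import java.util.Optional;
import java.util.logging.Level;
import java.util.logging.Logger;

public class CatchBlockLogging {
    private static final Logger LOG = Logger.getLogger(CatchBlockLogging.class.getName());

    // Hypothetical helper: treats the last path segment as a numeric job id.
    static Optional<Long> parseJobId(String path) {
        String last = path.substring(path.lastIndexOf('/') + 1);
        try {
            return Optional.of(Long.parseLong(last));
        } catch (NumberFormatException e) {
            // Category 1: the log describes the failed try-block action, the input
            // that caused it, and attaches the exception itself.
            LOG.log(Level.WARNING, "Can not parse job id from " + path, e);
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseJobId("/jobs/42"));      // parsed normally, no log
        System.out.println(parseJobId("/jobs/unknown")); // empty result, WARNING logged
    }
}
```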

Category 2: Execution state logging in branch blocks
Related information: semantic, syntactic
Logging statements often record execution states in different branches.
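A hypothetical illustration of Category 2 using `java.util.logging`: each branch of the `if` records the execution state it reached, here at different verbosity levels. The method name and the slot-reservation scenario are invented for the example.

```java
import java.util.logging.Logger;

public class BranchLogging {
    private static final Logger LOG = Logger.getLogger(BranchLogging.class.getName());

    // Hypothetical resource check: each branch logs the execution state it reached.
    static boolean reserveSlots(int requested, int available) {
        if (requested <= available) {
            LOG.info("Reserved " + requested + " of " + available + " slots");
            return true;
        } else {
            // The other branch records a different state, at a higher verbosity level.
            LOG.warning("Cannot reserve " + requested + " slots; only " + available + " available");
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(reserveSlots(2, 4)); // true, after an INFO log
        System.out.println(reserveSlots(8, 4)); // false, after a WARNING log
    }
}
```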

Category 3: Logging the beginning/end of a method block (method execution)
public void removeJob(JobID jobId) {
    ...
    // the end of the method
    log.info("Removed jobId {} from Zookeeper", jobId);
}
Logging statements often record the beginning or end of method execution, related to the semantics of the method.

Manual study on logging code → Extracting code block features (syntactic, semantic, fusion)

Extracting code block features
Code block with its logging statement:
if (Strings.isEmpty(sessionID)) {
    LOG.error("failed to get session ID");
    handleError();
    return;
}
Features are extracted from the block without the logging statement:
if (Strings.isEmpty(sessionID)) {
    handleError();
    return;
}
Syntactic: IfStatement, MethodInvocation, MethodInvocation, ReturnStatement
Semantic: string, is, empti, session, id, handle, error
Fusion: IfStatement, MethodInvocation, string, is, empty, session, id, MethodInvocation, handle, error, ReturnStatement
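The three feature types can be sketched as follows. This is not the authors' extractor: instead of parsing Java, it takes a hand-written list of hypothetical `Node` values (a node type plus the source text it covers) for the block above. Syntactic features are the node types, semantic features are identifier tokens split on punctuation and camelCase and lowercased, and fusion places each node type right before its own tokens. The slide's tokens (e.g. "string", "empti") suggest an extra stemming step, which this sketch omits, so it yields "strings" and "empty" instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class BlockFeatures {
    // Hypothetical simplified AST node: a node type plus the source text it covers.
    static final class Node {
        final String type;
        final String text;

        Node(String type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    // Semantic tokens: split on non-alphanumerics, then on camelCase boundaries, then lowercase.
    static List<String> tokenize(String code) {
        List<String> tokens = new ArrayList<>();
        for (String word : code.split("[^A-Za-z0-9]+")) {
            if (word.isEmpty()) {
                continue;
            }
            // e.g. "isEmpty" -> "is", "Empty"; "sessionID" -> "session", "ID"
            for (String part : word.split("(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Syntactic features: the sequence of node types.
    static List<String> syntactic(List<Node> nodes) {
        List<String> out = new ArrayList<>();
        for (Node n : nodes) {
            out.add(n.type);
        }
        return out;
    }

    // Semantic features: the identifier tokens of every node, in order.
    static List<String> semantic(List<Node> nodes) {
        List<String> out = new ArrayList<>();
        for (Node n : nodes) {
            out.addAll(tokenize(n.text));
        }
        return out;
    }

    // Fusion features: each node type immediately followed by that node's tokens.
    static List<String> fusion(List<Node> nodes) {
        List<String> out = new ArrayList<>();
        for (Node n : nodes) {
            out.add(n.type);
            out.addAll(tokenize(n.text));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hand-written nodes for the example block, logging statement already removed.
        List<Node> block = Arrays.asList(
                new Node("IfStatement", ""),
                new Node("MethodInvocation", "Strings.isEmpty(sessionID)"),
                new Node("MethodInvocation", "handleError()"),
                new Node("ReturnStatement", ""));
        System.out.println(syntactic(block));
        System.out.println(semantic(block));
        System.out.println(fusion(block));
    }
}
```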

Manual study on logging code → Extracting code block features (syntactic, semantic, fusion) → Deep learning framework

Deep learning framework: source code → code block features → word embedding layer → RNN layer (LSTM cells) → output layer
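A word embedding layer consumes integer token ids rather than strings. The slides do not describe this preprocessing, so the sketch below shows one common choice as an assumption: build a vocabulary from training feature sequences, map unseen tokens to an "unknown" id, and pad or truncate each sequence to a fixed length. The ids `PAD = 0` and `UNK = 1` and the class name are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SequenceEncoder {
    static final int PAD = 0; // fills positions after the end of a short sequence
    static final int UNK = 1; // tokens never seen during fit()

    private final Map<String, Integer> vocab = new HashMap<>();

    // Assign ids 2, 3, ... to tokens in order of first appearance in the training sequences.
    void fit(List<List<String>> sequences) {
        for (List<String> seq : sequences) {
            for (String token : seq) {
                if (!vocab.containsKey(token)) {
                    vocab.put(token, vocab.size() + 2);
                }
            }
        }
    }

    // Map a feature sequence to a fixed-length id array: keep the first maxLen tokens, pad the rest.
    int[] encode(List<String> seq, int maxLen) {
        int[] ids = new int[maxLen]; // Java zero-initializes arrays, i.e. every slot starts as PAD
        for (int i = 0; i < Math.min(seq.size(), maxLen); i++) {
            ids[i] = vocab.getOrDefault(seq.get(i), UNK);
        }
        return ids;
    }

    public static void main(String[] args) {
        SequenceEncoder enc = new SequenceEncoder();
        enc.fit(Arrays.asList(Arrays.asList("IfStatement", "MethodInvocation", "ReturnStatement")));
        // ForStatement was not seen during fit(), so it maps to UNK.
        System.out.println(Arrays.toString(
                enc.encode(Arrays.asList("MethodInvocation", "ForStatement", "ReturnStatement"), 5)));
    }
}
```

With the vocabulary above, the example prints `[3, 1, 4, 0, 0]`; the resulting id arrays would then be looked up in the embedding layer before the LSTM.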

Manual study on logging code → Extracting code block features (syntactic, semantic, fusion) → Deep learning framework → Suggestion results (logged vs. non-logged block)

Evaluation of our logging location suggestion models
Research Questions
RQ1: How effective are different block features when suggesting logging locations?
RQ2: Are the trained models transferable to other systems?

Process and metrics for DL model evaluation
• For each system, we use 60% of the data for training, 20% for validation, and 20% for testing.
• We compute balanced accuracy for evaluation, which measures how well the model suggests both logged and non-logged code blocks.
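The evaluation setup can be sketched as below. The slides do not say whether the 60/20/20 split is random, stratified, or done per file, so this assumes a simple seeded random split over code blocks. Balanced accuracy is the mean of the recall on logged blocks (true positive rate) and on non-logged blocks (true negative rate), so a model cannot score well just by predicting the majority class; the confusion-matrix counts are taken as given.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class Evaluation {
    // Assumed split: shuffle with a fixed seed, then cut into 60% training, 20% validation, 20% testing.
    static <T> List<List<T>> split(List<T> items, long seed) {
        List<T> shuffled = new ArrayList<>(items);
        Collections.shuffle(shuffled, new Random(seed));
        int n = shuffled.size();
        int trainEnd = n * 6 / 10;      // integer arithmetic avoids floating-point rounding
        int validationEnd = n * 8 / 10;
        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, trainEnd)));
        parts.add(new ArrayList<>(shuffled.subList(trainEnd, validationEnd)));
        parts.add(new ArrayList<>(shuffled.subList(validationEnd, n)));
        return parts;
    }

    // Balanced accuracy = (TP / (TP + FN) + TN / (TN + FP)) / 2.
    static double balancedAccuracy(int tp, int fn, int tn, int fp) {
        double tpr = (double) tp / (tp + fn); // recall on logged blocks
        double tnr = (double) tn / (tn + fp); // recall on non-logged blocks
        return (tpr + tnr) / 2.0;
    }

    public static void main(String[] args) {
        List<Integer> blocks = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            blocks.add(i);
        }
        List<List<Integer>> parts = split(blocks, 42L);
        System.out.println(parts.get(0).size() + " / " + parts.get(1).size() + " / " + parts.get(2).size());

        // A model that labels every block "non-logged" on 10 logged + 90 non-logged blocks:
        // plain accuracy would be 90%, but balanced accuracy is only 0.5.
        System.out.println(balancedAccuracy(0, 10, 90, 0));
    }
}
```

The example prints `60 / 20 / 20` and `0.5`, illustrating why balanced accuracy is more informative than plain accuracy when one class dominates.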

Balanced Accuracy of different block features
[Figure: balanced accuracy (%) of syntactic, semantic, and fusion features on try-catch, branching, looping, and method blocks; labeled values: 85.8, 77.4, 69.0, 63.2]
Models trained using syntactic features
achieve the best results.

Studying the overlap among the results using three different features
[Figure: overlap of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) among the syntactic, semantic, and fusion features]
The high overlap in TNs shows that non-logged code has distinct characteristics that are captured by all features. Syntactic features have the fewest FNs. 20.1% of the TPs are missed by the syntactic features but captured by the two other block features. There is only a small overlap in FPs among the features.

Manually Studying FPs and FNs
We further manually study a sample of false positives and false negatives in our suggestion results.
We find that a large portion of the FPs and FNs may be considered TPs and TNs: some misclassifications may actually be correct.
An example of an FP: the object state is saved to a JSON file instead of a log file.
The actual performance of our model may be even better, given the diverse ways in which developers write logging code.

Research Questions
RQ1: How effective are different block features when suggesting logging locations?
We suggest logging locations with reasonable accuracy. Different features capture different logging information in the code.

Training cross-system models
Train a model using syntactic features, then apply the model to other systems.

Results of cross-system suggestion
RQ2: Are the trained models transferable to other systems?
Balanced Accuracy for cross-system suggestions
[Figure: within-system vs. cross-system balanced accuracy (%) per studied system; visible axis labels: Cassandra, Flink, Kafka, Zookeeper; cross/within ratios: 81.7%, 80.0%, 84.6%, 83.9%, 88.4%, 80.1%, 91.7%]
Each percentage is the ratio of cross-system to within-system balanced accuracy.
Although accuracy decreases, cross-system suggestion still achieves reasonable performance compared to within-system suggestion.

Research Questions
RQ2: Are the trained models transferable to other systems?
Different systems may share a similar implicit logging guideline.

Tse-Hsun (Peter) Chen
https://petertsehsun.github.io
