Automated Evolution of Feature Logging Statement Levels Using Git Histories and Degree of Interest

Science of Computer Programming, Volume 214, 1 Feb 2022, 102724
Yiming Tang¹, Allan Spektor², Raffi Khatchadourian²,³, Mehdi Bagherzadeh⁴
¹ Concordia University, Canada
² City University of New York (CUNY) Hunter College, USA
³ City University of New York (CUNY) Graduate Center, USA
⁴ Oakland University, USA
IEEE International Conference on Software Analysis, Evolution & Re-engineering
March 17, 2022, Honolulu, HI, USA (remote)

Logging in Modern Software in the Big Data Era
Logging is pervasive in modern software.
Big data systems deal with high volumes of transactions.
Source code is tangled with scattered logging statements capturing
important event information.
Essential for reporting security and privacy breaches.

Feature Logging Statements
Modern software is also feature-heavy, implementing hundreds of
features.
Logging statements—although more informational—also capture
important aspects of feature implementations.
Useful for validating feature implementations and diagnosing
unintended interactions with other features.

Logging Issues
Source: Stuart Pilbrow / CC BY-SA
(https://creativecommons.org/licenses/by-sa/2.0)
Too much logging causes information overload.
Makes postmortem analysis difficult.
Understanding system behavior in production and diagnosing problems can be challenging.
Also challenging during development as logs pertaining to auxiliary features are tangled with those under current development.

Feature Logging Statement Level Evolution
Logging statements are typically associated with a log level.
Dictates whether the log is emitted at run time.
Example
logger.log(Level.FINER, "Health:" + systemHealthStatus());
Outputs system health iff the run time level of logger ≤ Level.FINER.
As software evolves, logging statement levels correlated with
surrounding feature implementations may also need to be modified.
Ideally, feature log levels would evolve with the system as it is
developed.
Higher log levels (e.g., INFO) are assigned to logs corresponding to
features with more current stakeholder interest.
Lower log levels for those with less interest (e.g., FINEST).
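The level check in the example above can be reproduced with a minimal sketch using java.util.logging (JUL). The class name LevelGateDemo and the helper wouldEmit are hypothetical names introduced only for illustration; the sketch covers the logger's own level check, and handlers attached to a logger apply their own levels separately.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class LevelGateDemo {
    /**
     * Returns true if a message logged at msgLevel passes the level check of a
     * logger whose run-time level is loggerLevel (hypothetical helper).
     */
    static boolean wouldEmit(Level loggerLevel, Level msgLevel) {
        // An anonymous logger keeps the sketch isolated from global logging configuration.
        Logger logger = Logger.getAnonymousLogger();
        logger.setLevel(loggerLevel);
        return logger.isLoggable(msgLevel);
    }

    public static void main(String[] args) {
        // Run-time level FINEST is at or below FINER, so a FINER message passes.
        System.out.println(wouldEmit(Level.FINEST, Level.FINER));
        // Run-time level INFO is above FINER, so a FINER message is filtered out.
        System.out.println(wouldEmit(Level.INFO, Level.FINER));
    }
}
```

Under this reading, lowering a statement's level (e.g., from INFO to FINEST) makes it less likely to appear under a typical production configuration, which is the lever the approach uses.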

Automation Approach Overview
Figure: Logging Level rejuvenation approach overview (details in paper).
Automatically evolve feature logging statement levels.
Mine Git repositories to discover the “interestingness” of code
surrounding feature logging statements.
Adapt Mylyn degree of interest (DOI) model [Kersten and Murphy, 2005].

What is Mylyn?
Standard Eclipse Integrated Development Environment (IDE) plug-in.
Focuses graphical components of the IDE.
Only “interesting” artifacts related to the currently active task are revealed [Kersten and Murphy, 2006].
The more interaction with an artifact (e.g., file), the more prominent it appears in the IDE.
Less recently used artifacts appear less prominently.

Mylyn Adaptation
Programmatically manipulate a DOI model using Git code changes.
Transform source code to “rejuvenate” feature logging statement
levels.
Pull statements related to features whose implementations are worked on
more, and more recently, to the forefront.
Push statements related to features whose implementations are worked on
less, and less recently, to the background.
Goals
Reduce information overload.
Support system evolution.
Automatically bring more relevant features to developers’ attention
and vice-versa.
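The pull/push idea above can be illustrated with a toy interest model. This is only a sketch under stated assumptions, not the tool's implementation: the class DoiSketch, the BUMP and DECAY constants, and the level thresholds are all hypothetical, whereas the actual approach manipulates Mylyn's DOI model. Each commit (processed oldest first) decays every stored interest value and then bumps the methods it changed, so frequently and recently edited code ends up with higher interest.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.logging.Level;

final class DoiSketch {
    // Hypothetical constants chosen for illustration only.
    static final double BUMP = 1.0;   // interest added when a commit touches a method
    static final double DECAY = 0.5;  // multiplicative decay applied once per commit

    private final Map<String, Double> doi = new HashMap<>();

    /** Processes one commit (oldest first): decay all values, then bump changed methods. */
    void processCommit(List<String> changedMethods) {
        doi.replaceAll((method, value) -> value * DECAY);
        for (String m : changedMethods) {
            doi.merge(m, BUMP, Double::sum);
        }
    }

    /** Current interest for a method; unseen methods have zero interest. */
    double interest(String method) {
        return doi.getOrDefault(method, 0.0);
    }

    /** Maps interest to a JUL level; the thresholds are illustrative assumptions. */
    Level levelFor(String method) {
        double v = interest(method);
        if (v >= 1.0) {
            return Level.INFO;
        }
        if (v >= 0.25) {
            return Level.FINE;
        }
        return Level.FINEST;
    }

    public static void main(String[] args) {
        DoiSketch model = new DoiSketch();
        model.processCommit(List.of("A.old()"));
        model.processCommit(List.of("B.hot()"));
        model.processCommit(List.of("B.hot()"));
        model.processCommit(List.of("B.hot()"));
        System.out.println("A.old() -> " + model.levelFor("A.old()"));
        System.out.println("B.hot() -> " + model.levelFor("B.hot()"));
    }
}
```

In this toy history, the method edited once long ago drifts toward the lowest level while the method edited in the three latest commits is pulled to the forefront.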

Implementation
Implemented as an open-source plug-in to the Eclipse IDE.
May also be used with popular build systems via plug-ins.
Supports two popular logging frameworks, SLF4J and JUL.
Integrates with JGit and Mylyn.
Available at https://git.io/fjlTY.

Research Questions
1 How applicable is our tool to, and how does it behave with, real-world open-source software?
2 Does our tool help developers focus on feature implementation bugs?
3 Do developers find the results acceptable? What is the impact of our tool?

Evaluation Overview
18 Java projects, ~3 MLOC, and ~4K logging statements.
Fully-automated analysis running time:
10.66 secs per analyzed logging statement.
0.89 secs per KLOC changed.
Developers do not actively think about how their logging statement
levels evolve with their software.
Successfully analyzes 99.26% of candidate logging statements.
Increases log level distributions by an average of ~20%.
Ideally transforms log levels in bug contexts ~83% of the time.
Preliminary pull request study: pull requests successfully integrated into 2 large
and popular open-source projects (comparable to related work [S. Li
et al., 2018]).
More details in the paper!

Conclusion
Feature logging statements document important values and track
progress of feature implementations.
As interest in features evolves, feature logging levels may also require
modification to combat information overload.
Our approach discovers and rectifies mismatches between feature
interest levels and logging levels.
Results show that the technique is promising in alleviating the
burden of manually evolving logging levels.
Future Work
Expand pull request study.
Issue wide-scale developer surveys.
Enhance feature logging statement classification heuristics with ML.

Appendix Additional Material
For Further Reading I
Apache Software Foundation (2020). Log4j. Log4j 2 Architecture. url: http://logging.apache.org/log4j/2.x/manual/architecture.html#Logger_Hierarchy (visited on 06/12/2020).
Chen, Boyuan and Zhen Ming (Jack) Jiang (2017). “Characterizing and Detecting Anti-Patterns in the Logging Code”. In: International Conference on Software Engineering. ICSE ’17. Buenos Aires, Argentina: IEEE Press, pp. 71–81. isbn: 9781538638682. doi: 10.1109/ICSE.2017.15.
Eclipse Foundation, Inc. (2020). JGit. url: http://eclip.se/gF (visited on 03/02/2020).
Hassani, Mehran et al. (Mar. 2018). “Studying and detecting log-related issues”. In: Empirical Software Engineering. issn: 1573-7616. doi: 10.1007/s10664-018-9603-z. url: https://doi.org/10.1007/s10664-018-9603-z.
He, Pinjia et al. (2018). “Characterizing the Natural Language Descriptions in Software Logging Statements”. In: International Conference on Automated Software Engineering. ASE 2018. Montpellier, France: ACM, pp. 178–189. isbn: 9781450359375. doi: 10.1145/3238147.3238193.
Kabinna, Suhas et al. (Feb. 2018). “Examining the Stability of Logging Statements”. In: Empirical Softw. Engg. 23.1, pp. 290–333. issn: 1382-3256. doi: 10.1007/s10664-017-9518-0.
Kersten, Mik and Gail C. Murphy (2005). “Mylar: a degree-of-interest model for IDEs”. In: International Conference on Aspect-Oriented Software Development. Chicago, Illinois: ACM, pp. 159–168. isbn: 1-59593-042-6. doi: 10.1145/1052898.1052912.
Kersten, Mik and Gail C. Murphy (2006). “Using Task Context to Improve Programmer Productivity”. In: ACM Symposium on the Foundations of Software Engineering. SIGSOFT ’06/FSE-14. Portland, Oregon, USA: ACM, pp. 1–11. isbn: 1-59593-468-5. doi: 10.1145/1181775.1181777.
Li, Heng, Weiyi Shang, and Ahmed E. Hassan (Aug. 2017). “Which Log Level Should Developers Choose for a New Logging Statement?” In: Empirical Softw. Engg. 22.4, pp. 1684–1716. issn: 1382-3256. doi: 10.1007/s10664-016-9456-2.

For Further Reading II
Li, Shanshan et al. (2018). “Logtracker: Learning Log Revision Behaviors Proactively from Software Evolution History”. In: International Conference on Program Comprehension. ICPC ’18. Gothenburg, Sweden: ACM, pp. 178–188. isbn: 978-1-4503-5714-2. doi: 10.1145/3196321.3196328.
Oracle (2018). Logger (Java SE 10 & JDK 10). url: http://docs.oracle.com/javase/10/docs/api/java/util/logging/Logger.html (visited on 02/29/2020).

Related Work
Source: Jonathan Joseph Bondhus / CC BY-SA
(https://creativecommons.org/licenses/by-sa/3.0)
Existing approaches [Chen and Jiang, 2017; Hassani et al., 2018; He et al., 2018; Kabinna et al., 2018; H. Li et al., 2017] are inclined to focus on either new logging statements or log messages.
Logger hierarchies [Apache Software Foundation, 2020; Oracle, 2018] may be helpful but still require manual maintenance.

Rename Refactorings & Copying
Program elements (e.g., methods) changed in Git may no longer exist in current project version.
Must process rename refactorings.
Maintain a data structure that associates rename relationships between program elements, e.g., method signatures.
Use lightweight refactoring approximations.
Use copy detection features of Git at the file level.
New copy “inherits” old DOI values.
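One way to picture the rename data structure is a map from older signatures to newer ones that is followed until it reaches a signature with no further rename. The sketch below is an assumption-laden illustration, not the tool's code: RenameTracker, recordRename, resolve, and inheritOnCopy are hypothetical names, and signatures are modeled as plain strings.

```java
import java.util.HashMap;
import java.util.Map;

final class RenameTracker {
    // Maps an older signature to the signature it was renamed to (hypothetical structure).
    private final Map<String, String> renamedTo = new HashMap<>();

    /** Records that oldSig was renamed to newSig in some commit. */
    void recordRename(String oldSig, String newSig) {
        renamedTo.put(oldSig, newSig);
    }

    /** Follows the rename chain to the most recent known signature. */
    String resolve(String sig) {
        String current = sig;
        int hops = 0;
        // The hop limit guards against accidental cycles in the recorded renames.
        while (renamedTo.containsKey(current) && hops++ < renamedTo.size()) {
            current = renamedTo.get(current);
        }
        return current;
    }

    /** A newly detected copy starts with the DOI value of its source, if one exists. */
    static void inheritOnCopy(Map<String, Double> doi, String source, String copy) {
        Double value = doi.get(source);
        if (value != null) {
            doi.putIfAbsent(copy, value);
        }
    }

    public static void main(String[] args) {
        RenameTracker tracker = new RenameTracker();
        tracker.recordRename("Foo.a()", "Foo.b()");
        tracker.recordRename("Foo.b()", "Bar.b()");
        System.out.println(tracker.resolve("Foo.a()"));
    }
}
```

Resolving through the chain lets a change recorded against an old signature contribute interest to the element that exists in the current project version.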

Classifying Feature Logging Statements
Logging levels are often used to differentiate various logging
“categories” (e.g., severe errors, security breaches).
Need to distinguish between these and feature logs.
Derive a set of heuristics based on first-hand developer interactions.
Also distinguish between less-critical debugging logs (e.g., tracing)
using a keyword-based approach.
Goals
Focus on only manipulating logging statements tied to features to better
align them with developers’ current interests.
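A keyword-based classification along these lines might look like the following sketch. Everything specific here is an assumption made for illustration: the class LogClassifierSketch, the Kind categories, the WARNING threshold for "category" logs, and the keyword list are hypothetical, since the slides do not enumerate the actual heuristics.

```java
import java.util.List;
import java.util.Locale;
import java.util.logging.Level;

final class LogClassifierSketch {
    enum Kind { CATEGORY, TRACING, FEATURE }

    // Hypothetical keyword list for less-critical debugging (tracing) logs.
    static final List<String> TRACE_KEYWORDS = List.of("entering", "exiting", "trace");

    /** Classifies a logging statement by its level and message text. */
    static Kind classify(Level level, String message) {
        // Assumption: WARNING and above mark a logging "category" such as severe errors.
        if (level.intValue() >= Level.WARNING.intValue()) {
            return Kind.CATEGORY;
        }
        String lower = message.toLowerCase(Locale.ROOT);
        for (String keyword : TRACE_KEYWORDS) {
            if (lower.contains(keyword)) {
                return Kind.TRACING;
            }
        }
        // Anything else is treated as a feature log, the only kind whose level is transformed.
        return Kind.FEATURE;
    }

    public static void main(String[] args) {
        System.out.println(classify(Level.SEVERE, "Disk failure"));
        System.out.println(classify(Level.FINER, "Entering parse()"));
        System.out.println(classify(Level.INFO, "Health: OK"));
    }
}
```

Only statements classified as feature logs would then be candidates for level changes, leaving error and tracing logs untouched.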
