A Primer on High-Quality
Identifier Naming
Anthony Peruma
Ph.D. Candidate - Rochester Institute of Technology, USA
Incoming Assistant Professor - University of Hawai‘i at Mānoa, USA
International Conference on Software and Systems Reuse (ICSR), 15-17 June 2022
About Anthony…
Experience/Qualifications
Assistant Professor - University of Hawai‘i at
Mānoa, USA (Starting August 2022)
Ph.D. Candidate - Rochester Institute of
Technology, USA (Expected June 2022)
Masters in Software Engineering - Rochester
Institute of Technology, USA
10+ years of industry experience
Research Interests
Program Comprehension - Identifier Naming
Software Quality - Test Smells
Software Refactoring
Software Maintenance & Evolution
Empirical Software Engineering
https://www.peruma.me
https://twitter.com/ShehanPeruma
Agenda
● Introduction
○ What are identifiers and why are their names important?
● Linguistic Anti-Patterns
○ Introduction to the types of identifier naming violations
● Grammar Patterns
○ Common semantic structures for identifier names
● Tools
○ Some tools that can help developers and researchers with identifier naming
● Conclusion
○ Summary and additional resources
Introduction
Software Maintenance
● Consumes 60% - 80% of
organization resources [1,2]
● Poor maintenance → Low
quality software
● Includes:
○ Fixing bugs
○ Incorporating new or
updating features
○ Improving the internal
quality of the system
[1]
Lientz, B. P., Swanson, E. B., & Tompkins, G. E. (1978). Characteristics of application software maintenance. Communications
of the ACM, 21(6), 466-471.
[2]
R.S. Pressman. Software engineering: a practitioner's approach. McGraw-Hill higher education. McGraw-Hill Education, 2010.
Program Comprehension
● Developers need to understand
the code before applying
changes or debugging
● 58% of developers' time is spent
on comprehension activities [1]
● Poor code readability impacts
time and quality
● Application growth →
○ More classes/files →
■ More lines of code
[1]
Xia, X., Bao, L., Lo, D., Xing, Z., Hassan, A. E., & Li, S. (2017). Measuring program comprehension: A large-scale field study
with professionals. IEEE Transactions on Software Engineering, 44(10), 951-976.
Identifiers
● Lexical tokens that uniquely identify
elements in the source code
○ Classes, Methods, etc.
● Everywhere in source code – significant
part in code comprehension
○ Account for 70% of characters in the
code base [1]
● Must be read to understand behaviour and
before any other coding activity
● Automated techniques use identifier data
[1]
Deissenboeck, F., & Pizka, M. (2006). Concise and consistent naming. Software Quality Journal, 14(3), 261-282.
Lexical tokens that uniquely identify entities
What is a good name?
Crafting names can be challenging – the probability of two developers picking the same name is 7% [1]
A strong/high-quality name must reflect its intended behavior
Name should concisely summarize the role of its correlating entity
Good names are important – hence, many organizations emphasize the use of best practices and
coding standards by development teams
High-quality identifiers improve comprehension time by 19+% [2]
Low quality names can lead to bugs and poor code quality (i.e., more-complex, less-readable and
less-maintainable) [3]
[1]
Feitelson, D., Mizrahi, A., Noy, N., Shabat, A. B., Eliyahu, O., & Sheffer, R. (2020). How developers choose names. IEEE Transactions on Software Engineering.
[2]
Hofmeister, J., Siegmund, J., & Holt, D. V. (2017, February). Shorter identifier names take longer to comprehend. In 2017 IEEE 24th International conference on software analysis, evolution and reengineering (SANER) (pp. 217-227). IEEE.
[3]
Butler, S., Wermelinger, M., Yu, Y., & Sharp, H. (2010, March). Exploring the influence of identifier names on code quality: An empirical study. In 2010 14th European Conference on Software Maintenance and Reengineering (pp. 156-165). IEEE.
Some poor-quality names are easy to spot…
… others are not so straightforward!
Challenges with determining the quality of names
● A readable name is not always a high-quality name
○ context is important!
● Words are diverse and subjective; for example, a single word can have multiple
meanings (homonyms)
○ Example: Bank can mean river bank or financial institution
● In English prose, the context is provided in natural language; this is not the case
with identifier names – context is part of the behaviour of the code
○ Example: the method: “doForward()” can either refer to an HTTP redirect operation
or to move an image on screen
● Challenge: understanding how to map the meaning of natural language phrases to the
behavior of the code
Challenges with renames
A “rename chain” - multiple instances
of developers renaming an identifier
Challenges with renaming models
● Models only provide name recommendations
● They do not provide details as to why the
proposed name is a good replacement
● They do not indicate what parts of the code are
influencing the model’s recommendation
● Without an explanation, developers are likely to
keep making the same naming mistake
Challenges with renaming models
Source code can differ between different
environments; a model built and evaluated in one
environment will perform badly in another
Linguistic Anti-Patterns
Smells
● Smells are specific structures in the code that deviate from fundamental programming
practices
● Smells make code harder to understand and make it more prone to bugs and changes
● Smells are a surface indication that usually corresponds to a deeper problem in the
software system
● Types of smells:
○ Code Smells (e.g., Long Method, Large Class, Dead Code, etc.)
○ Test Smells (e.g., Assertion Roulette, Eager Test, Lazy Test, etc.)
○ Database Smells (e.g., Multi-purpose column, Tables with many columns, etc.)
○ Linguistic Smells
○ ….
Linguistic Anti-Patterns
Represent deviations from well-established lexical naming practices in source code
Act as indicators of poor naming quality
Typically take the form of an identifier name that incorrectly describes the behavior of the entity
that it represents OR an entity that betrays the behavior conveyed linguistically by its
corresponding identifier
Lead to code misinterpretation by developers, increasing cognitive load [1]
First conceptualized by Arnaoudova et al. [2]
Catalog of 15+ anti-patterns
[1]
Fakhoury, S., Ma, Y., Arnaoudova, V., & Adesope, O. (2018, May). The effect of poor source code lexicon and readability on developers' cognitive load. In 2018 IEEE/ACM 26th International Conference on Program Comprehension (ICPC) (pp. 286-28610). IEEE.
[2]
Arnaoudova, V., Di Penta, M., Antoniol, G., & Guéhéneuc, Y. G. (2013, March). A new family of software anti-patterns: Linguistic anti-patterns. In 2013 17th European Conference on Software Maintenance and Reengineering (pp. 187-196). IEEE.
Categories of linguistic anti-patterns
Methods:
1. Do more than they say
2. Say more than they do
3. Do the opposite of what they say
Attributes:
4. The entity contains more than what it says
5. The name says more than the entity contains
6. The name says the opposite of what the entity contains
Catalog of linguistic anti-patterns
“Get” more than accessor
“Is” returns more than a Boolean
“Set” method returns
Expecting but not getting single instance
Not implemented condition
Validation method does not confirm
“Get” method does not return
Not answered question
Transform method does not return
Expecting but not getting a collection
Method name and return type are opposite
Method signature and comment are opposite
Says one but contains many
Name suggests boolean but type is not
Says many but contains one
Attribute name and type are opposite
Attribute signature and comment are opposite
Get more than accessor
A getter that performs actions other than returning the corresponding attribute
Example: method getImageData which always returns a new object
How to resolve:
1. The method name should change so that it is not a getter or
2. the implementation should be corrected to conform to standard get-method behavior
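A minimal Python sketch of this anti-pattern and the two resolutions listed above. The Canvas and ImageData classes and all method names are hypothetical, chosen only to mirror the getImageData example; they are not taken from any real code base.

```python
from dataclasses import dataclass, field


@dataclass
class ImageData:
    pixels: list = field(default_factory=list)


class Canvas:
    def __init__(self):
        self._image_data = ImageData([0, 0, 0])

    # Smell: reads like an accessor but builds a fresh object on every call.
    def get_image_data_smelly(self):
        return ImageData(list(self._image_data.pixels))

    # Resolution 1: rename so the name no longer claims to be a getter.
    def create_image_data_copy(self):
        return ImageData(list(self._image_data.pixels))

    # Resolution 2: keep the name and behave like a plain accessor.
    def get_image_data(self):
        return self._image_data
```

With the smelly version, two consecutive calls return two distinct objects, which a caller who trusts the "get" prefix is unlikely to expect.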
Is returns more than a Boolean
The name of a method is a predicate suggesting a true/false value in return. However, the return type is not Boolean but rather a more
complex type, thus allowing a wider range of values without documenting them
Example: method isValid with return type int
How to resolve:
1. The type should be changed to boolean to reflect the function's behavior as a binary predicate.
2. Consider changing the name such that it does not imply a yes/no question and provides some indication of n-ary return values.
3. Carefully document the meaning of each value that can be returned. Thoroughly test each value.
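As a rough illustration, a check for this anti-pattern can be written over a method's name and declared return type. The sketch below is a simplified rule under stated assumptions: the list of predicate prefixes and the function name are made up for this example, and this is not how IDEAL or LAPD implement their detection.

```python
import re

# Assumed predicate prefixes; a real tool would rely on part-of-speech tags.
PREDICATE_PREFIXES = ("is", "has", "can", "should")
BOOLEAN_TYPES = {"boolean", "Boolean", "bool"}


def first_word(name):
    """Return the leading lowercase word of a camelCase identifier."""
    match = re.match(r"[a-z]+", name)
    return match.group(0) if match else ""


def is_returns_more_than_boolean(name, return_type):
    """Flag predicate-style names whose return type is neither Boolean nor void."""
    return (
        first_word(name) in PREDICATE_PREFIXES
        and return_type not in BOOLEAN_TYPES
        and return_type != "void"
    )
```

The void case is deliberately excluded because the catalog treats it as a separate anti-pattern (Not answered question).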
Set method returns
A set method having a return type different than void without proper documentation of the return type/values
Example: method setBreadth has a non-void return type
How to resolve:
1. The word set, when used in this manner, has a specific definition in the programming domain. Consider using a different term, such as
change.
2. Correct the implementation such that it works like a stereotypical set method (i.e., void return, mutates a class attribute)
3. Carefully document the reasoning behind using set while also returning a value
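The resolutions above can be pictured with a small Python class. Box and its methods are hypothetical and only echo the setBreadth example; the "change" verb follows the suggestion in resolution 1.

```python
class Box:
    def __init__(self):
        self.breadth = 0

    # Smell: named like a setter, yet it hands back the previous value.
    def set_breadth_smelly(self, breadth):
        previous = self.breadth
        self.breadth = breadth
        return previous

    # Resolution 1: a different verb, with the return value documented.
    def change_breadth(self, breadth):
        """Store the new breadth and return the value it replaced."""
        previous, self.breadth = self.breadth, breadth
        return previous

    # Resolution 2: a stereotypical setter mutates the attribute and returns nothing.
    def set_breadth(self, breadth):
        self.breadth = breadth
```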
Expecting but not getting single instance
The name of a method indicates that a single object is returned but the return type is a collection
Example: method getExpansion, which ends with a head-noun that is singular, but returns a List object
How to resolve:
1. Correct the method name so that it is plural-- getExpansions()
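One way to approximate this check is to split the identifier into words, look at the last word (the head-noun), and compare its plurality with the declared type. The sketch below assumes a very naive plural test (a trailing "s") and a hand-picked set of collection type names; both are illustrative assumptions and will misjudge words such as "news" or custom collection classes.

```python
import re

# Assumed collection type names, for illustration only.
COLLECTION_TYPES = {"List", "Set", "Collection", "Map", "Vector", "ArrayList"}


def split_identifier(name):
    """Split a camelCase or under_score identifier into its words."""
    return re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)


def returns_collection(type_name):
    """True for array types and for the assumed collection type names."""
    return type_name.endswith("[]") or type_name.split("<")[0] in COLLECTION_TYPES


def looks_plural(word):
    """Naive plural test: trailing 's', excluding a few common endings."""
    lowered = word.lower()
    return lowered.endswith("s") and not lowered.endswith(("ss", "us", "is"))


def singular_name_with_collection_type(name, type_name):
    """Flag identifiers whose head-noun is singular but whose type is a collection."""
    words = split_identifier(name)
    if not words:
        return False
    return not looks_plural(words[-1]) and returns_collection(type_name)
```

The same function also fits the attribute-level variant of this mismatch, for example an attribute _target of type Vector.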
Not implemented condition
The comments of a method suggest a conditional behavior that is not implemented in the code. When the implementation is default
this should be documented.
Example: method getChildren has a comment which indicates there should be a conditional within its body.
How to resolve:
1. Complete implementation of the method
2. Document (i.e., update the comment) that the method is incomplete and does not implement the behavior indicated in its comment
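To make the idea concrete, the sketch below applies the same reasoning to Python source rather than Java: it flags functions whose docstring uses conditional wording while the body contains no branch. The keyword list and the sample functions are assumptions made for this example, and a keyword match is only a coarse signal.

```python
import ast
import re

# Assumed wording that hints at conditional behavior in a comment.
CONDITION_WORDS = re.compile(r"\b(if|when|unless|otherwise)\b", re.IGNORECASE)


def not_implemented_condition(source):
    """Return names of functions whose docstring implies a condition but whose body has no branch."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            doc = ast.get_docstring(node) or ""
            has_branch = any(
                isinstance(child, (ast.If, ast.IfExp)) for child in ast.walk(node)
            )
            if CONDITION_WORDS.search(doc) and not has_branch:
                flagged.append(node.name)
    return flagged


SAMPLE = '''
def get_children(node):
    """Return the children if the node is expanded, otherwise an empty list."""
    return node.children

def get_parent(node):
    """Return the parent if one exists."""
    if node.parent is None:
        return None
    return node.parent
'''
```

Calling not_implemented_condition(SAMPLE) flags only get_children, whose comment promises a condition that the body never checks.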
Validation method does not confirm
A validation method (e.g., name starting with "validate", "check", "ensure") does not confirm the validation, i.e., the method neither
provides a return value informing whether the validation was successful, nor documents how to determine the outcome
Example: method checkCollision returns void despite indicating that it is designed to perform validation
How to resolve:
1. Change method to return confirmation (i.e., true or false)
2. Consider changing the name to avoid implication of validation behavior (i.e., avoid terms like check and is)
3. If the previous options are not available then thoroughly document method behavior, consider highlighting irregular validation behavior
Get method does not return
The name suggests that the method returns something (e.g., name starts with "get" or "return") but the return type is void. The
documentation should explain where the resulting data is stored and how to obtain it
Example: method getMethodBodies has a void return type but its name indicates that it is a getter method
How to resolve:
1. Change method to return correct entity
2. Consider changing the name to avoid the word get
3. If the previous options are not available then thoroughly document method behavior, consider highlighting irregular getter behavior
Not answered question
The name of a method is in the form of predicate whereas the return type is not Boolean
Example: method isValid with a void return type
How to resolve:
1. Change the method to return a Boolean value that answers the question
2. Consider changing the name so that it no longer reads as a predicate (i.e., avoid the word is)
3. If the previous options are not available then thoroughly document method behavior, consider highlighting irregular predicate behavior
Transform method does not return
The name of a method suggests the transformation of an object but there is no return value and it is not clear from the documentation
where the result is stored.
Example: method javaToNative has a void return type but indicates that it performs a transformation (i.e., type conversion).
How to resolve:
1. Change method to return correct entity
2. If the previous option is not available then thoroughly document method behavior, consider highlighting irregular transformation
behavior
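Four of the anti-patterns described so far (Validation method does not confirm, Get method does not return, Not answered question, Transform method does not return) share one shape: the name promises a result, but the return type is void. A table-driven sketch of that shared check follows; the prefix table, the "xToY" conversion heuristic, and the function names are assumptions for illustration, not the rules used by any published tool.

```python
import re

# Assumed mapping: leading word of the name -> anti-pattern reported on a void return.
VOID_RULES = {
    "validate": "Validation method does not confirm",
    "check": "Validation method does not confirm",
    "ensure": "Validation method does not confirm",
    "get": "Get method does not return",
    "return": "Get method does not return",
    "is": "Not answered question",
    "has": "Not answered question",
}
TRANSFORM_VERBS = {"convert", "transform"}


def words(name):
    """Split a camelCase identifier into lowercase words."""
    return [w.lower() for w in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)]


def void_return_violation(name, return_type):
    """Return the name of the violated anti-pattern, or None if nothing is flagged."""
    if return_type != "void":
        return None
    parts = words(name)
    if not parts:
        return None
    if parts[0] in VOID_RULES:
        return VOID_RULES[parts[0]]
    # A "to" between two other words (xToY) hints at a conversion.
    if parts[0] in TRANSFORM_VERBS or "to" in parts[1:-1]:
        return "Transform method does not return"
    return None
```

Keeping the rules in a table makes project-specific adjustments (for example, adding a local validation verb) a data change rather than a code change.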
Expecting but not getting a collection
The name of a method suggests that a collection should be returned but a single object or nothing is returned
Example: method getStats with a Boolean return type; making it difficult to understand the reason behind the plurality of the method name.
How to resolve:
1. Change the name of the method (and any related identifier names) so that it is singular instead of plural
Method name and return type are opposite
The intent of the method suggested by its name is in contradiction with what it returns
Example: method disable with return type ControlEnableState. The words "disable" and "enable" having opposite meanings.
How to resolve:
1. Change method name so that it aligns better with the return type (i.e., change disable to enable)
2. Change type name to align better with method name (i.e., to ControlDisableState)
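A contradiction of this kind can be approximated by splitting both identifiers into words and looking for antonym pairs across them. The antonym list below is a tiny hand-picked assumption; a practical check would need a proper lexical resource.

```python
import re

# Tiny hand-picked antonym list, for illustration only.
_PAIRS = [("enable", "disable"), ("start", "end"), ("include", "exclude"), ("forward", "back")]
ANTONYMS = set(_PAIRS) | {(b, a) for a, b in _PAIRS}


def words(identifier):
    """Return the set of lowercase words in a camelCase identifier."""
    return {w.lower() for w in re.findall(r"[A-Z]?[a-z]+", identifier)}


def opposite_terms(name, type_name):
    """Return sorted (name word, type word) pairs that are antonyms of each other."""
    name_words, type_words = words(name), words(type_name)
    return sorted((a, b) for a, b in ANTONYMS if a in name_words and b in type_words)
```

The same comparison also applies to an attribute and its type, such as the attribute start of type MAssociationEnd.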
Method signature and comment are opposite
The documentation of a method is in contradiction with its declaration
Example: method isNavigateForwardEnabled is in contradiction with its comment documenting "a back navigation", as "forward" and "back" are
antonyms
How to resolve:
1. Change the comment to specify that this method is for forward navigation
Says one but contains many
The name of an attribute suggests a single instance, while its type suggests that the attribute stores a collection of objects
Example: attribute _target that is of type Vector. It is unclear whether a change affects one or multiple instances in the collection.
How to resolve:
1. Change the identifier name to reflect plurality of its type (i.e., _target -> _targets)
Name suggests boolean but type is not
The name of an attribute suggests that its value is true or false, but its declaring type is not Boolean
Example: attribute isReached that is of type int[] where the declared type and values are not documented.
How to resolve:
1. Change the name of the identifier to be more descriptive with respect to what kind of array it represents.
2. Consider removing the word is and using a different term unless the array represents a sequence of appropriate (i.e., boolean-like)
values
3. If appropriate, consider using a boolean array
4. Carefully document the data represented by the array, including the reasoning for its integer type and whether different integer values
have different meanings
Says many but contains one
The name of an attribute suggests multiple instances, but its type suggests a single one
Example: attribute _stats that is of type Boolean. Documenting such inconsistencies avoids additional comprehension effort to understand the
purpose of the attribute.
How to resolve:
1. Change identifier name to singular instead of plural
Attribute name and type are opposite
The name of an attribute is in contradiction with its type as they contain antonyms
Example: attribute start that is of type MAssociationEnd. The use of antonyms can induce wrong assumptions.
How to resolve:
1. Change identifier name to align with type name (i.e., change start to end)
Attribute signature and comment are opposite
The declaration of an attribute is in contradiction with its documentation
Example: attribute INCLUDE_NAME_DEFAULT whose comment documents an "exclude pattern". Whether the pattern is included or excluded is
thus unclear
How to resolve:
1. Change identifier name to align with comment (i.e., include -> exclude)
2. Change comment to align with attribute name (i.e., exclude -> include)
Grammar Patterns
Challenges with determining the quality of names
One challenge to studying identifiers is the difficulty in understanding how to map the meaning of natural
language phrases to the behavior of the code.
A second challenge lies in the natural language analysis techniques themselves, many of which are not
trained to be applied to software
Part-of-Speech
Part-of-speech is a category to which a word is assigned in accordance with its syntactic
functions
In English, the main parts of speech are noun, pronoun, adjective, determiner, verb,
adverb, preposition, conjunction, and interjection
Can help us reason about the meaning of words and code behavior
Grammar Patterns
Identifier Phrase Structure != Human Language Phrase Structure
A grammar pattern is the sequence of part-of-speech tags assigned to individual words
within an identifier
Grammar patterns allow a more efficient analysis by broadly categorizing words into their
corresponding part-of-speech
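A toy sketch of how a grammar pattern can be derived: split the identifier into words, tag each word, and join the tags. The tag abbreviations (V verb, N noun, NPL plural noun, NM noun modifier, P preposition, DT determiner), the small lexicon, and the fallback rules are all assumptions made for this example; an actual tagger for identifiers is trained on source code and is far more accurate.

```python
import re

# Hypothetical mini-lexicon; unknown words are treated as nouns or noun modifiers.
LEXICON = {
    "get": "V", "set": "V", "is": "V", "check": "V", "convert": "V",
    "to": "P", "from": "P", "with": "P", "of": "P",
    "all": "DT", "any": "DT", "each": "DT",
}


def split_identifier(name):
    """Split a camelCase or under_score identifier into its words."""
    return re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)


def grammar_pattern(name):
    """Return the sequence of part-of-speech tags for the words of an identifier."""
    words = [w.lower() for w in split_identifier(name)]
    tags = []
    for i, word in enumerate(words):
        if word in LEXICON:
            tags.append(LEXICON[word])
            continue
        next_tag = LEXICON.get(words[i + 1]) if i + 1 < len(words) else None
        if i == len(words) - 1 or next_tag == "P":
            # Head-noun: last word, or the word right before a preposition.
            plural = word.endswith("s") and not word.endswith("ss")
            tags.append("NPL" if plural else "N")
        else:
            tags.append("NM")
    return " ".join(tags)
```

For instance, getImageData maps to the pattern V NM N under these rules, which groups it with every other identifier of the form verb, modifier, head-noun regardless of the specific words used.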
Sample grammar patterns
Common identifier naming patterns
Common identifier naming patterns
Noun Phrase - common naming pattern for identifiers that are not function names; A good identifier will include only
enough noun-modifiers to concisely define the concept represented by the head-noun
Plural noun phrase - Identifiers that follow this pattern are usually not function names; these identifiers are more likely to
have a collection data type
Verb Phrase - typically either function identifiers or identifiers with a boolean type; for non-boolean the verb is an action,
otherwise it's a predicate
Prepositional Phrase - used in many types of identifiers; The preposition typically explains how the entities represented by
the accompanying noun or verb phrases are related
Noun phrase with leading determiner - used in many types of identifiers; determiner tells us how much of the population,
which is specified by the noun-phrase, is represented, or acted on, by the identifier
Verb Pattern - typically function names or identifiers with a boolean type; missing a noun phrase; the noun phrase is
implied by the program context or it is present in the function parameters.
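The six patterns above can be recognized mechanically once an identifier has been reduced to a tag sequence. The sketch below assumes space-separated tags with the abbreviations V, N, NPL, NM, P, and DT (an assumption of this example) and encodes each pattern as a regular expression; real identifiers often fall outside these six shapes, hence the "other" fallback.

```python
import re

# Ordered (label, regex) rules over a space-separated tag sequence.
RULES = [
    ("noun phrase with leading determiner", r"DT( NM)*( N| NPL)"),
    ("plural noun phrase", r"(NM )*NPL"),
    ("noun phrase", r"(NM )*N"),
    ("verb phrase", r"V( NM)*( N| NPL)"),
    ("verb pattern", r"V"),
]


def classify(pattern):
    """Map a grammar pattern such as 'V NM N' to one of the common naming patterns."""
    if "P" in pattern.split():
        return "prepositional phrase"
    for label, regex in RULES:
        if re.fullmatch(regex, pattern):
            return label
    return "other"
```

A classifier like this makes it possible to ask questions such as whether a function name is a verb phrase or whether a collection-typed attribute follows the plural noun phrase pattern.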
Tools
Tools to analyze/transform identifiers
Naming Violation Detection
● Detects 19 types of linguistic anti-patterns
● Provides an explanation of the violation
● Analyzes C# & Java source code
● Supports project-specific customizations
● Average precision: 75.27%
● Open-source
● https://github.com/SCANL/ProjectSunshine/blob/master/documentaion/IDEAL/SetupAndUse.md
Ensemble Part-of-Speech Tagger
● Tagger uses machine-learning and the
output from multiple part-of-speech
taggers to annotate natural language text
● The ensemble uses three state-of-the-art
part-of-speech taggers: SWUM, POSSE,
and Stanford
● Accuracy of 86%; Outperforms Stanford by
51%
● Open-source
Peruma, A., Arnaoudova, V., & Newman, C. D. (2021, September). Ideal: An
open-source identifier name appraisal tool. In 2021 IEEE International Conference
on Software Maintenance and Evolution (ICSME) (pp. 599-603). IEEE.
Newman, C. D., Decker, M. J., Alsuhaibani, R., Peruma, A., Mkaouer, M., Mohapatra, S.,
... & Hill, E. (2021). An ensemble approach for annotating source code identifiers with
part-of-speech tags. IEEE Transactions on Software Engineering.
Tools to analyze/transform identifiers
These are just some of the identifier-related tools that are available to the developer and research community
Rename recommendation models
G. Li et al., “A Survey on Renamings of Software Entities”, in ACM Comput. Surv.
Code readability models
S. Scalabrino et al., “A comprehensive model for code readability.” in Journal of Software: Evolution and Process
LAPD: linguistic anti-pattern detector
V. Arnaoudova, et al., "Linguistic antipatterns: What they are and how developers perceive them," in Empirical Software Engineering.
Spiral: splitters for identifiers in source code files
M. Hucka, “Spiral: splitters for identifiers in source code files,” in Journal of Open Source Software.
Nominal: Java library to test compliance of identifier names with naming conventions
S. Butler et al., "Investigating naming convention adherence in Java references," 2015 IEEE International Conference on Software Maintenance and Evolution.
Demo
● The Ensemble Tagger
● IDEAL
Conclusion
Summary
● Naming identifiers is one of the most challenging tasks for developers
● A high-quality name should reflect its intended behavior
● Names are diverse and subjective – this makes it challenging to
automatically determine their quality
● Linguistic anti-patterns – deviations from lexical naming practices
● Grammar patterns – allow a more efficient analysis of names
● Availability of tools to assist developers with crafting and maintaining
names, but they are not a complete one-stop solution
Additional sources
● Identifier Naming Structure Catalogue
○ https://github.com/SCANL/identifier_name_structure_catalogue
Thanks!
Anthony Peruma
https://www.peruma.me
https://www.scanl.org

More Related Content

PDF
A Primer on High-Quality Identifier Naming [ASE 2022]
PDF
Supporting the Maintenance of Identifier Names: A Holistic Approach to High-Q...
PDF
Aq35241246
PDF
Contextualizing Rename Decisions using Refactorings and Commit Messages
PDF
Object oriented software engineering concepts
PPT
Getting Unstuck: Working with Legacy Code and Data
PDF
Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...
PDF
Reflective Plan Examples
A Primer on High-Quality Identifier Naming [ASE 2022]
Supporting the Maintenance of Identifier Names: A Holistic Approach to High-Q...
Aq35241246
Contextualizing Rename Decisions using Refactorings and Commit Messages
Object oriented software engineering concepts
Getting Unstuck: Working with Legacy Code and Data
Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...
Reflective Plan Examples

Similar to A Primer on High-Quality Identifier Naming (20)

PDF
IRJET- Short-Text Semantic Similarity using Glove Word Embedding
PDF
Patterns of Value
PDF
Butler
PDF
Analyzing Text Preprocessing and Feature Selection Methods for Sentiment Anal...
PPTX
Successful Single-Source Content Development
DOCX
SummaryAssessment Type Analytical Report semiotic analysisDu.docx
PPT
Using Qualitative Data Analysis Software By Michelle C. Bligh, Ph.D., Claremo...
PDF
Paper id 28201441
PDF
Natural Language Processing Through Different Classes of Machine Learning
PPT
Design pattern & categories
PDF
PDF
IDEAL: An Open-Source Identifier Name Appraisal Tool
PPT
Requirement Management.ppt
PPTX
Object oriented data model
PPTX
VOC real world enterprise needs
PDF
II BCA JAVA PROGRAMMING NOTES FOR FIVE UNITS.pdf
PDF
IRJET- Rating Prediction based on Textual Review: Machine Learning Approach, ...
DOCX
Java OOPs Concepts.docx
PDF
Implementation of Semantic Analysis Using Domain Ontology
PDF
It Doesn't Do What You Think It Does
IRJET- Short-Text Semantic Similarity using Glove Word Embedding
Patterns of Value
Butler
Analyzing Text Preprocessing and Feature Selection Methods for Sentiment Anal...
Successful Single-Source Content Development
SummaryAssessment Type Analytical Report semiotic analysisDu.docx
Using Qualitative Data Analysis Software By Michelle C. Bligh, Ph.D., Claremo...
Paper id 28201441
Natural Language Processing Through Different Classes of Machine Learning
Design pattern & categories
IDEAL: An Open-Source Identifier Name Appraisal Tool
Requirement Management.ppt
Object oriented data model
VOC real world enterprise needs
II BCA JAVA PROGRAMMING NOTES FOR FIVE UNITS.pdf
IRJET- Rating Prediction based on Textual Review: Machine Learning Approach, ...
Java OOPs Concepts.docx
Implementation of Semantic Analysis Using Domain Ontology
It Doesn't Do What You Think It Does
Ad

More from University of Hawai‘i at Mānoa (20)

PDF
Exploring Code Comprehension in Scientific Programming: Preliminary Insight...
PDF
Exploring Accessibility Trends and Challenges in Mobile App Development: A St...
PDF
The Impact of Generative AI-Powered Code Generation Tools on Software Enginee...
PDF
Mobile App Security Trends and Topics: An Examination of Questions From Stack...
PDF
On the Rationale and Use of Assertion Messages in Test Code: Insights from So...
PDF
A Developer-Centric Study Exploring Mobile Application Security Practices and...
PDF
Building Hawaii’s IT Future Together CIO Council & UH Manoa ICS Collaboration
PDF
Impostor Syndrome in Final Year Computer Science Students: An Eye Tracking an...
PDF
An Exploratory Study on the Occurrence of Self-Admitted Technical Debt in And...
PDF
Performance Comparison of Binary Machine Learning Classifiers in Identifying ...
PDF
Rename Chains: An Exploratory Study on the Occurrence and Characteristics of ...
PDF
Preparing for the Academic Job Market: Experience and Tips from a Recent F...
PDF
Refactoring Debt: Myth or Reality? An Exploratory Study on the Relationship B...
PDF
Test Anti-Patterns: From Definition to Detection
PDF
Refactoring Debt: Myth or Reality? An Exploratory Study on the Relationship B...
PDF
Understanding Digits in Identifier Names: An Exploratory Study
PDF
How Do I Refactor This? An Empirical Study on Refactoring Trends and Topics i...
PDF
Using Grammar Patterns to Interpret Test Method Name Evolution
PDF
On the Distribution of "Simple Stupid Bugs" in Unit Test Files: An Explorator...
PDF
An Exploratory Study on the Refactoring of Unit Test Files in Android Applica...
Exploring Code Comprehension in Scientific Programming: Preliminary Insight...
Exploring Accessibility Trends and Challenges in Mobile App Development: A St...
The Impact of Generative AI-Powered Code Generation Tools on Software Enginee...
Mobile App Security Trends and Topics: An Examination of Questions From Stack...
On the Rationale and Use of Assertion Messages in Test Code: Insights from So...
A Developer-Centric Study Exploring Mobile Application Security Practices and...
Building Hawaii’s IT Future Together CIO Council & UH Manoa ICS Collaboration
Impostor Syndrome in Final Year Computer Science Students: An Eye Tracking an...
An Exploratory Study on the Occurrence of Self-Admitted Technical Debt in And...
Performance Comparison of Binary Machine Learning Classifiers in Identifying ...
Rename Chains: An Exploratory Study on the Occurrence and Characteristics of ...
Preparing for the Academic Job Market: Experience and Tips from a Recent F...
Refactoring Debt: Myth or Reality? An Exploratory Study on the Relationship B...
Test Anti-Patterns: From Definition to Detection
Refactoring Debt: Myth or Reality? An Exploratory Study on the Relationship B...
Understanding Digits in Identifier Names: An Exploratory Study
How Do I Refactor This? An Empirical Study on Refactoring Trends and Topics i...
Using Grammar Patterns to Interpret Test Method Name Evolution
On the Distribution of "Simple Stupid Bugs" in Unit Test Files: An Explorator...
An Exploratory Study on the Refactoring of Unit Test Files in Android Applica...
Ad

Recently uploaded (20)

PDF
Claude Code: Everyone is a 10x Developer - A Comprehensive AI-Powered CLI Tool
PDF
System and Network Administration Chapter 2
PDF
Softaken Excel to vCard Converter Software.pdf
PPT
Introduction Database Management System for Course Database
PPTX
Transform Your Business with a Software ERP System
PDF
Internet Downloader Manager (IDM) Crack 6.42 Build 42 Updates Latest 2025
PPTX
ManageIQ - Sprint 268 Review - Slide Deck
PPTX
VVF-Customer-Presentation2025-Ver1.9.pptx
PDF
SAP S4 Hana Brochure 3 (PTS SYSTEMS AND SOLUTIONS)
PDF
Why TechBuilder is the Future of Pickup and Delivery App Development (1).pdf
PPTX
Introduction to Artificial Intelligence
PDF
Odoo Companies in India – Driving Business Transformation.pdf
PDF
How to Migrate SBCGlobal Email to Yahoo Easily
PDF
Raksha Bandhan Grocery Pricing Trends in India 2025.pdf
PPTX
ai tools demonstartion for schools and inter college
PDF
How to Choose the Right IT Partner for Your Business in Malaysia
PPTX
Online Work Permit System for Fast Permit Processing
PDF
Internet Downloader Manager (IDM) Crack 6.42 Build 41
PPTX
Agentic AI : A Practical Guide. Undersating, Implementing and Scaling Autono...
PDF
top salesforce developer skills in 2025.pdf
Claude Code: Everyone is a 10x Developer - A Comprehensive AI-Powered CLI Tool
System and Network Administration Chapter 2
Softaken Excel to vCard Converter Software.pdf
Introduction Database Management System for Course Database
Transform Your Business with a Software ERP System
Internet Downloader Manager (IDM) Crack 6.42 Build 42 Updates Latest 2025
ManageIQ - Sprint 268 Review - Slide Deck
VVF-Customer-Presentation2025-Ver1.9.pptx
SAP S4 Hana Brochure 3 (PTS SYSTEMS AND SOLUTIONS)
Why TechBuilder is the Future of Pickup and Delivery App Development (1).pdf
Introduction to Artificial Intelligence
Odoo Companies in India – Driving Business Transformation.pdf
How to Migrate SBCGlobal Email to Yahoo Easily
Raksha Bandhan Grocery Pricing Trends in India 2025.pdf
ai tools demonstartion for schools and inter college
How to Choose the Right IT Partner for Your Business in Malaysia
Online Work Permit System for Fast Permit Processing
Internet Downloader Manager (IDM) Crack 6.42 Build 41
Agentic AI : A Practical Guide. Undersating, Implementing and Scaling Autono...
top salesforce developer skills in 2025.pdf

A Primer on High-Quality Identifier Naming

  • 1. A Primer on High-Quality Identifier Naming Anthony Peruma Ph.D. Candidate - Rochester Institute of Technology, USA Incoming Assistant Professor - University of Hawai‘i at Mānoa, USA International Conference on Software and Systems Reuse (ICSR), 15-17 June 2022
  • 2. About Anthony… Experience/Qualifications Assistant Professor - University of Hawai‘i at Mānoa, USA (Starting August 2022) Ph.D. Candidate - Rochester Institute of Technology, USA (Expected June 2022) Masters in Software Engineering - Rochester Institute of Technology, USA 10+ years of industry experience Research Interests Program Comprehension - Identifier Naming Software Quality - Test Smells Software Refactoring Software Maintenance & Evolution Empirical Software Engineering https://www.peruma.me https://twitter.com/ShehanPeruma
  • 3. Agenda ● Introduction ○ What are identifiers and why are their names important? ● Linguistic Anti-Patterns ○ Introduction to the types of identifier naming violations ● Grammar Patterns ○ Common semantic structures for identifier names ● Tools ○ Some tools that can help developers and researchers with identifier naming ● Conclusion ○ Summary and additional resources
  • 5. Software Maintenance ● Consumes 60% - 80% of organization resources [1,2] ● Poor maintenance → Low quality software ● Includes: ○ Fixing bugs ○ Incorporating new or updating features ○ Improving the internal quality of the system [1] Lientz, B. P., Swanson, E. B., & Tompkins, G. E. (1978). Characteristics of application software maintenance. Communications of the ACM, 21(6), 466-471. [2] R.S. Pressman. Software engineering: a practitioner's approach. McGraw-Hill higher education. McGraw-Hill Education, 2010.
  • 6. Program Comprehension ● Developers need to understand the code before applying changes or debugging ● 58% of developers time is spent on comprehension activities [1] ● Poor code readability impacts time and quality ● Application growth → ○ More classes/files → ■ More lines of code [1] Xia, X., Bao, L., Lo, D., Xing, Z., Hassan, A. E., & Li, S. (2017). Measuring program comprehension: A large-scale field study with professionals. IEEE Transactions on Software Engineering, 44(10), 951-976.
  • 7. Identifiers ● Lexical tokens that uniquely identify elements in the source code ○ Classes, Methods, etc. ● Everywhere in source code – significant part in code comprehension ○ Account for 70% of characters in the code base [1] ● Must be read to understand behaviour and before any other coding activity ● Automated techniques user identifier data [1] Deissenboeck, F., & Pizka, M. (2006). Concise and consistent naming. Software Quality Journal, 14(3), 261-282.
  • 8. Lexical tokens that uniquely identify entities
  • 9. What is a good name? Crafting names can be challenging – the probability of two developers picking the same name is 7% [1] A strong/high-quality name must reflect its intended behavior A name should concisely summarize the role of its corresponding entity Good names are important – hence, many organizations emphasize the use of best practices and coding standards by development teams High-quality identifiers improve comprehension time by 19+% [2] Low-quality names can lead to bugs and poor code quality (i.e., more-complex, less-readable and less-maintainable) [3] [1] Feitelson, D., Mizrahi, A., Noy, N., Shabat, A. B., Eliyahu, O., & Sheffer, R. (2020). How developers choose names. IEEE Transactions on Software Engineering. [2] Hofmeister, J., Siegmund, J., & Holt, D. V. (2017, February). Shorter identifier names take longer to comprehend. In 2017 IEEE 24th International conference on software analysis, evolution and reengineering (SANER) (pp. 217-227). IEEE. [3] Butler, S., Wermelinger, M., Yu, Y., & Sharp, H. (2010, March). Exploring the influence of identifier names on code quality: An empirical study. In 2010 14th European Conference on Software Maintenance and Reengineering (pp. 156-165). IEEE.
  • 10. Some poor-quality names are easy to spot…
  • 11. … others are not so straightforward!
  • 12. Challenges with determining the quality of names ● A readable name is not always a high-quality name ○ context is important! ● Words are diverse and subjective; for example, a single word can have multiple meanings (homonyms) ○ Example: Bank can mean river bank or financial institution ● In English prose, the context is provided in natural language; this is not the case with identifier names – context is part of the behaviour of the code ○ Example: the method “doForward()” can refer either to an HTTP redirect operation or to moving an image on screen ● Challenge: understanding how to map the meaning of natural language phrases to the behavior of the code
  • 14. Challenges with renames A “rename chain” - multiple instances of developers renaming an identifier
  • 16. Challenges with renaming models ● Models only provide name recommendations ● They do not provide details as to why the proposed name is a good replacement ● Does not indicate what parts of the code are influencing the model’s recommendation ● Developer will continually make the same naming mistake
  • 17. Challenges with renaming models Source code can differ between different environments; a model built and evaluated in one environment will perform badly in another
  • 20. Smells ● Smells are specific structures in the code that deviate from fundamental programming practices ● Smells make code harder to understand and make it more prone to bugs and changes ● Smells are a surface indication that usually corresponds to a deeper problem in the software system ● Types of smells: ○ Code Smells (e.g., Long Method, Large Class, Dead Code, etc.) ○ Test Smells (e.g., Assertion Roulette, Eager Test, Lazy Test, etc.) ○ Database Smells (e.g., Multi-purpose column, Tables with many columns, etc.) ○ Linguistic Smells ○ ….
  • 21. Linguistic Anti-Patterns Represent deviations from well-established lexical naming practices in source code Act as indicators of poor naming quality Typically take the form of an identifier name that incorrectly describes the behavior of the entity that it represents OR an entity that betrays the behavior conveyed linguistically by its corresponding identifier Leads to code misinterpretation by developers increasing cognitive load [1] First conceptualized by Arnaoudova et al. [2] Catalog of 15+ anti-patterns [1] Fakhoury, S., Ma, Y., Arnaoudova, V., & Adesope, O. (2018, May). The effect of poor source code lexicon and readability on developers' cognitive load. In 2018 IEEE/ACM 26th International Conference on Program Comprehension (ICPC) (pp. 286-28610). IEEE. [2] Arnaoudova, V., Di Penta, M., Antoniol, G., & Guéhéneuc, Y. G. (2013, March). A new family of software anti-patterns: Linguistic anti-patterns. In 2013 17th European Conference on Software Maintenance and Reengineering (pp. 187-196). IEEE.
  • 22. Categories of linguistic anti-patterns Methods: 1. Do more than they say 2. Say more than they do 3. Do the opposite of what they say Attributes: 4. The entity contains more than what it says 5. The name says more than the entity contains 6. The name says the opposite of what the entity contains
  • 23. Catalog of linguistic anti-patterns “Get” more than accessor “Is” returns more than a Boolean “Set” method returns Expecting but not getting single instance Not implemented condition Validation method does not confirm “Get” method does not return Not answered question Transform method does not return Expecting but not getting a collection Method name and return type are opposite Method signature and comment are opposite Says one but contains many Name suggests boolean but type is not Says many but contains one Attribute name and type are opposite Attribute signature and comment are opposite
  • 24. Get more than accessor A getter that performs actions other than returning the corresponding attribute Example: method getImageData which always returns a new object How to resolve: 1. The method name should change so that it is not a getter or 2. the implementation should be corrected to conform to standard get-method behavior
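The anti-pattern and both resolutions can be sketched in Python. The slide's example is the method getImageData; the `Image` class, its fields, and the name `create_image_data_copy` below are hypothetical and exist only for illustration.

```python
class Image:
    """Hypothetical class; names are illustrative, not taken from the slides."""

    def __init__(self, pixels):
        self._image_data = list(pixels)

    # Anti-pattern: named like an accessor, but builds a new object on every call.
    def get_image_data(self):
        return list(self._image_data)

    # Resolution 1: a name that no longer promises plain accessor behavior.
    def create_image_data_copy(self):
        return list(self._image_data)

    # Resolution 2: a conventional accessor that returns the stored attribute itself.
    @property
    def image_data(self):
        return self._image_data


image = Image([1, 2, 3])
print(image.get_image_data() is image.get_image_data())  # False: two distinct copies
print(image.image_data is image.image_data)              # True: same stored object
```

A caller who assumes `get_image_data` is a cheap accessor may mutate the result and expect the image to change, which is the kind of misreading the anti-pattern describes.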
  • 25. Is returns more than a Boolean The name of a method is a predicate suggesting a true/false value in return. However, the return type is not Boolean but a more complex type, which allows a wider range of values without documenting them Example: method isValid with return type int How to resolve: 1. The type should be changed to boolean to reflect the function's behavior as a binary predicate. 2. Consider changing the name such that it does not imply a yes/no question and provides some indication of n-ary return values. 3. Carefully document the meaning of each value that can be returned. Thoroughly test each value.
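A rough detection heuristic for this anti-pattern, written as a Python sketch over (method name, return type) pairs. The prefix list and the set of Boolean type names are assumptions made for this example, and a void return is deliberately left to the separate "Not answered question" check.

```python
import re

PREDICATE_PREFIXES = ("is", "has", "can", "should")  # assumed prefix list
BOOLEAN_TYPES = {"boolean", "Boolean", "bool"}        # assumed type names


def first_word(name):
    """Return the leading lowercase word of a camelCase identifier."""
    match = re.match(r"[a-z]+", name)
    return match.group(0) if match else ""


def is_returns_more_than_boolean(method_name, return_type):
    """Flag predicate-style names whose return type is neither Boolean nor void."""
    if first_word(method_name) not in PREDICATE_PREFIXES:
        return False
    return return_type not in BOOLEAN_TYPES and return_type != "void"


print(is_returns_more_than_boolean("isValid", "int"))      # True
print(is_returns_more_than_boolean("isValid", "boolean"))  # False
```

Matching on the first word rather than on a raw string prefix avoids flagging names such as `issueCount`, whose first word is "issue" and not "is".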
  • 26. Set method returns A set method having a return type different than void without proper documentation of the return type/values Example: method setBreadth has a non-void return type How to resolve: 1. The word set, when used in this manner, has a specific definition in the programming domain. Consider using a different term, such as change. 2. Correct the implementation such that it works like a stereotypical set method (i.e., void return, mutates a class attribute) 3. Carefully document the reasoning behind using set while also returning a value
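The three resolutions above might look as follows in Python. The `Box` class and every method name except the "set breadth" idea from the slide are hypothetical.

```python
class Box:
    """Hypothetical class used only for illustration."""

    def __init__(self):
        self._breadth = 0

    # Anti-pattern: a "set" method that also returns an undocumented value.
    def set_breadth_returning(self, breadth):
        previous = self._breadth
        self._breadth = breadth
        return previous

    # Resolution 2: stereotypical setter that mutates the attribute and returns nothing.
    def set_breadth(self, breadth):
        self._breadth = breadth

    # Resolutions 1 and 3: a different verb, plus documentation of the returned value.
    def change_breadth(self, breadth):
        """Store the new breadth and return the value it replaced."""
        previous, self._breadth = self._breadth, breadth
        return previous


box = Box()
print(box.set_breadth(5))     # None
print(box.change_breadth(7))  # 5
```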
  • 27. Expecting but not getting single instance The name of a method indicates that a single object is returned but the return type is a collection Example: method getExpansion, which ends with a head-noun that is singular, but returns a List object How to resolve: 1. Correct the method name so that it is plural-- getExpansions()
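One way to approximate this check is to compare the number of the last word in the name against the return type. The sketch below is an assumption-laden heuristic: the list of collection types and the suffix-based plural test are made up for this example, and a part-of-speech tagger would likely be more reliable.

```python
import re

COLLECTION_TYPES = ("List", "Set", "Map", "Collection", "Vector")  # assumed list


def split_words(name):
    """Split a camelCase identifier into lowercase words."""
    return [w.lower() for w in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)]


def is_collection(type_name):
    """Treat arrays and a few well-known container types as collections."""
    base = type_name.split("<")[0]
    return type_name.endswith("[]") or base in COLLECTION_TYPES


def looks_plural(word):
    """Crude suffix test; words like 'status' or 'class' are treated as singular."""
    return word.endswith("s") and not word.endswith(("ss", "us", "is"))


def expects_single_but_gets_collection(method_name, return_type):
    """Flag names whose last word is singular while the return type is a collection."""
    words = split_words(method_name)
    if not words:
        return False
    return not looks_plural(words[-1]) and is_collection(return_type)


print(expects_single_but_gets_collection("getExpansion", "List<String>"))   # True
print(expects_single_but_gets_collection("getExpansions", "List<String>"))  # False
```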
  • 28. Not implemented condition The comments of a method suggest a conditional behavior that is not implemented in the code. When the implementation is default this should be documented. Example: method getChildren has a comment which indicates there should be a conditional within its body. How to resolve: 1. Complete implementation of the method 2. Document (i.e., update the comment) that the method is incomplete and does not implement the behavior indicated in its comment
  • 29. Validation method does not confirm A validation method (e.g., name starting with "validate", "check", "ensure") does not confirm the validation, i.e., the method neither provides a return value informing whether the validation was successful, nor documents how to proceed to understand Example: method checkCollision returns void despite indicating that it is designed to perform validation How to resolve: 1. Change method to return confirmation (i.e., true or false) 2. Consider changing the name to avoid implication of validation behavior (i.e., avoid terms like check and is) 3. If the previous options are not available then thoroughly document method behavior, consider highlighting irregular validation behavior
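A small before-and-after sketch of the first resolution, in Python. The `Sprite` class, its fields, and the one-dimensional overlap test are hypothetical; only the "check collision" name comes from the slide.

```python
class Sprite:
    """Hypothetical game object; all names here are illustrative."""

    def __init__(self, x, width):
        self.x = x
        self.width = width
        self.collided = False

    # Anti-pattern: "check" suggests a verdict, yet nothing is returned and
    # the caller has to know that the result lands in self.collided.
    def check_collision_silent(self, other):
        self.collided = self._overlaps(other)

    # Resolution 1: return the confirmation so callers can act on it.
    def check_collision(self, other):
        return self._overlaps(other)

    def _overlaps(self, other):
        # One-dimensional overlap test, kept simple on purpose.
        return self.x < other.x + other.width and other.x < self.x + self.width


a, b, c = Sprite(0, 10), Sprite(5, 10), Sprite(20, 5)
print(a.check_collision(b))  # True
print(a.check_collision(c))  # False
```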
  • 30. Get method does not return The name suggests that the method returns something (e.g., name starts with "get" or "return") but the return type is void. The documentation should explain where the resulting data is stored and how to obtain it Example: method getMethodBodies has a void return type but its name indicates that it is a getter method How to resolve: 1. Change method to return correct entity 2. Consider changing the name to avoid the word get 3. If the previous options are not available then thoroughly document method behavior, consider highlighting irregular getter behavior
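In an annotated Python code base, a comparable check can be sketched with the standard `inspect` module: list the methods whose names start with "get" or "return" and whose declared return type is `None`. The `Parser` class is hypothetical, and the check only sees methods that carry a return annotation.

```python
import inspect


def getters_without_return(cls):
    """List methods named get_* or return_* whose annotated return type is None."""
    flagged = []
    for name, func in inspect.getmembers(cls, inspect.isfunction):
        if not name.startswith(("get_", "return_")):
            continue
        # `-> None` yields None here; a missing annotation yields Signature.empty.
        if inspect.signature(func).return_annotation is None:
            flagged.append(name)
    return flagged


class Parser:
    """Hypothetical class used only to exercise the check."""

    def __init__(self) -> None:
        self.bodies: list = []

    def get_method_bodies(self) -> None:  # anti-pattern: "get" but nothing returned
        self.bodies.append("...")

    def get_name(self) -> str:
        return "parser"


print(getters_without_return(Parser))  # ['get_method_bodies']
```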
  • 31. Not answered question The name of a method is in the form of a predicate whereas the return type is not Boolean Example: method isValid with a void return type How to resolve: 1. Change method to return a Boolean that answers the question 2. Consider changing the name so that it does not imply a yes/no question (i.e., avoid the word is) 3. If the previous options are not available then thoroughly document method behavior, consider highlighting irregular predicate behavior
  • 32. Transform method does not return The name of a method suggests the transformation of an object but there is no return value and it is not clear from the documentation where the result is stored. Example: method javaToNative has a void return type but indicates that it performs a transformation (i.e., type conversion). How to resolve: 1. Change method to return correct entity 2. If the previous option is not available then thoroughly document method behavior, consider highlighting irregular transformation behavior
  • 33. Expecting but not getting a collection The name of a method suggests that a collection should be returned but a single object or nothing is returned Example: method getStats with a Boolean return type; making it difficult to understand the reason behind the plurality of the method name. How to resolve: 1. Change the name of the method (and any related identifier names) so that it is singular instead of plural
  • 34. Method name and return type are opposite The intent of the method suggested by its name is in contradiction with what it returns Example: method disable with return type ControlEnableState. The words "disable" and "enable" have opposite meanings. How to resolve: 1. Change method name so that it aligns better with the return type (i.e., change disable to enable) 2. Change type name to align better with method name (i.e., to ControlDisableState)
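A detector for this kind of contradiction needs some notion of antonyms. The Python sketch below uses a tiny hand-written antonym table, which is purely an assumption for illustration; a real detector would need a much larger lexicon. The same comparison applies to an attribute name and its type.

```python
import re

# Tiny hand-written antonym table, assumed for this example only.
ANTONYMS = {"enable": "disable", "start": "end", "forward": "back", "include": "exclude"}
ANTONYMS.update({v: k for k, v in list(ANTONYMS.items())})  # make it symmetric


def split_words(identifier):
    """Split camelCase, PascalCase, or UPPER_SNAKE identifiers into lowercase words."""
    return [w.lower() for w in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", identifier)]


def opposite_terms(name, type_name):
    """Return (name_word, type_word) pairs that are antonyms of each other."""
    type_words = set(split_words(type_name))
    return [(w, ANTONYMS[w]) for w in split_words(name)
            if w in ANTONYMS and ANTONYMS[w] in type_words]


print(opposite_terms("disable", "ControlEnableState"))  # [('disable', 'enable')]
print(opposite_terms("enable", "ControlEnableState"))   # []
```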
  • 35. Method signature and comment are opposite The documentation of a method is in contradiction with its declaration Example: method isNavigateForwardEnabled is in contradiction with its comment documenting "a back navigation", as "forward" and "back" are antonyms How to resolve: 1. Change the comment to specify that this method is for forward navigation
  • 36. Says one but contains many The name of an attribute suggests a single instance, while its type suggests that the attribute stores a collection of objects Example: attribute _target that is of type Vector. It is unclear whether a change affects one or multiple instances in the collection. How to resolve: 1. Change the identifier name to reflect plurality of its type (i.e., _target -> _targets)
  • 37. Name suggests boolean but type is not The name of an attribute suggests that its value is true or false, but its declaring type is not Boolean Example: attribute isReached that is of type int[] where the declared type and values are not documented. How to resolve: 1. Change the name of the identifier to be more descriptive with respect to what kind of array it represents. 2. Consider removing the word is and using a different term unless the array represents a sequence of appropriate (i.e., boolean-like) values 3. If appropriate, consider using a boolean array 4. Carefully document the data represented by the array, including the reasoning for its integer type and whether different integer values have different meanings
  • 38. Says many but contains one The name of an attribute suggests multiple instances, but its type suggests a single one Example: attribute _stats that is of type Boolean. Documenting such inconsistencies avoids additional comprehension effort to understand the purpose of the attribute. How to resolve: 1. Change identifier name to singular instead of plural
  • 39. Attribute name and type are opposite The name of an attribute is in contradiction with its type as they contain antonyms Example: attribute start that is of type MAssociationEnd. The use of antonyms can induce wrong assumptions. How to resolve: 1. Change identifier name to align with type name (i.e., change start to end)
  • 40. Attribute signature and comment are opposite The declaration of an attribute is in contradiction with its documentation Example: attribute INCLUDE_NAME_DEFAULT whose comment documents an "exclude pattern". Whether the pattern is included or excluded is thus unclear How to resolve: 1. Change identifier name to align with comment (i.e., include -> exclude) 2. Change comment to align with attribute name (i.e., exclude -> include)
  • 42. Challenges with determining the quality of names One challenge to studying identifiers is the difficulty in understanding how to map the meaning of natural language phrases to the behavior of the code. A second challenge lies in the natural language analysis techniques themselves, many of which are not trained to be applied to software
  • 43. Part-of-Speech Part-of-speech is a category to which a word is assigned in accordance with its syntactic functions In English, the main parts of speech are noun, pronoun, adjective, determiner, verb, adverb, preposition, conjunction, and interjection Can help us reason about the meaning of words and code behavior
  • 44. Grammar Patterns Identifier Phrase Structure != Human Language Phrase Structure A grammar pattern is the sequence of part-of-speech tags assigned to individual words within an identifier Grammar patterns allow a more efficient analysis by broadly categorizing words into their corresponding part-of-speech
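To make the idea of a grammar pattern concrete, here is a toy Python sketch that splits an identifier into words and assigns each word a tag. The lexicon, the tag abbreviations (V, N, NM, NPL, P), and the two heuristics are assumptions for this example; it is not how the ensemble tagger presented later works.

```python
import re

# Toy lexicon, assumed for illustration; real taggers are trained models.
# Tags: V = verb, N = noun, NPL = plural noun, NM = noun modifier, P = preposition.
LEXICON = {"get": "V", "set": "V", "is": "V", "convert": "V", "to": "P", "on": "P"}


def split_words(identifier):
    """Split a camelCase identifier into lowercase words."""
    return [w.lower() for w in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", identifier)]


def grammar_pattern(identifier):
    """Tag each word; unknown words default to noun, then two heuristics refine them."""
    words = split_words(identifier)
    tags = [LEXICON.get(w, "N") for w in words]
    for i, tag in enumerate(tags):
        is_last = i == len(tags) - 1
        if tag == "N" and not is_last and tags[i + 1] == "N":
            tags[i] = "NM"   # a noun directly followed by another noun modifies it
        elif tag == "N" and is_last and words[i].endswith("s"):
            tags[i] = "NPL"  # crude plural heuristic
    return " ".join(tags)


print(grammar_pattern("getUserName"))      # V NM N
print(grammar_pattern("employeeNames"))    # NM NPL
print(grammar_pattern("convertToString"))  # V P N
```

Two identifiers with different words but the same pattern (for example, any "V NM N" getter) can then be analyzed as one group, which is the efficiency gain the slide refers to.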
  • 47. Common identifier naming patterns Noun Phrase - common naming pattern for identifiers that are not function names; a good identifier will include only enough noun-modifiers to concisely define the concept represented by the head-noun Plural noun phrase - identifiers that follow this pattern are usually not function names; these identifiers are more likely to have a collection data type Verb Phrase - typically either function identifiers or identifiers with a boolean type; for non-boolean the verb is an action, otherwise it's a predicate Prepositional Phrase - used in many types of identifiers; the preposition typically explains how the entities represented by the accompanying noun or verb phrases are related Noun phrase with leading determiner - used in many types of identifiers; the determiner tells us how much of the population, which is specified by the noun-phrase, is represented, or acted on, by the identifier Verb Pattern - typically function names or identifiers with a boolean type; missing a noun phrase; the noun phrase is implied by the program context or it is present in the function parameters.
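The categories above can be approximated as rules over a tag sequence. The Python sketch below assumes space-separated tags with the abbreviations N (noun), NPL (plural noun), NM (noun modifier), V (verb), P (preposition), and DT (determiner); both the abbreviations and the rule order are choices made for this example rather than a definitive classification.

```python
def classify_pattern(pattern):
    """Map a space-separated tag sequence to one of the phrase categories."""
    tags = pattern.split()
    if not tags:
        return "unknown"
    if "P" in tags:
        return "prepositional phrase"
    if tags[0] == "DT":
        return "noun phrase with leading determiner"
    if tags[0] == "V":
        # A lone verb has no noun phrase; the noun is implied by the context.
        return "verb pattern" if len(tags) == 1 else "verb phrase"
    modifiers_only = all(t == "NM" for t in tags[:-1])
    if modifiers_only and tags[-1] == "NPL":
        return "plural noun phrase"
    if modifiers_only and tags[-1] == "N":
        return "noun phrase"
    return "unknown"


for example in ("NM N", "NM NPL", "V NM N", "V", "V P N", "DT N"):
    print(example, "->", classify_pattern(example))
```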
  • 48. Tools
  • 49. Tools to analyze/transform identifiers Naming Violation Detection ● Detects 19 types of linguistic anti-patterns ● Provides an explanation of the violation ● Analyzes C# & Java source code ● Supports project-specific customizations ● Average precision: 75.27% ● Open-source ● https://github.com/SCANL/ProjectSunshine/blob/master/documentaion/IDEAL/SetupAndUse.md Ensemble Part-of-Speech Tagger ● Tagger uses machine-learning and the output from multiple part-of-speech taggers to annotate natural language text ● The ensemble uses three state-of-the-art part-of-speech taggers: SWUM, POSSE, and Stanford ● Accuracy of 86%; Outperforms Stanford by 51% ● Open-source Peruma, A., Arnaoudova, V., & Newman, C. D. (2021, September). Ideal: An open-source identifier name appraisal tool. In 2021 IEEE International Conference on Software Maintenance and Evolution (ICSME) (pp. 599-603). IEEE. Newman, C. D., Decker, M. J., Alsuhaibani, R., Peruma, A., Mkaouer, M., Mohapatra, S., ... & Hill, E. (2021). An ensemble approach for annotating source code identifiers with part-of-speech tags. IEEE Transactions on Software Engineering.
  • 50. Tools to analyze/transform identifiers These are just some of the identifier related tools that are available for the developer and research community Rename recommendation models G. Li et al., “A Survey on Renamings of Software Entities”, in ACM Comput. Surv. Code readability models S. Scalabrino et al., “A comprehensive model for code readability.” in Journal of Software: Evolution and Process LAPD: linguistic anti-pattern detector V. Arnaoudova, et al., "Linguistic antipatterns: What they are and how developers perceive them," in Empirical Software Engineering. Spiral: splitters for identifiers in source code files M. Hucka, “Spiral: splitters for identifiers in source code files,” in Journal of Open Source Software. Nominal: Java library to test compliance of identifier names with naming conventions S. Butler et al., "Investigating naming convention adherence in Java references," 2015 IEEE International Conference on Software Maintenance and Evolution.
  • 51. Demo ● The Ensemble Tagger ● IDEAL
  • 53. Summary ● Naming identifiers is one of the most challenging tasks for developers ● A high-quality name should reflect its intended behavior ● Names are diverse and subjective – this makes it challenging to automatically determine their quality ● Linguistic anti-patterns – deviations from lexical naming practices ● Grammar patterns – allow a more efficient analysis of names ● Availability of tools to assist developers with crafting and maintaining names, but they are not a complete one-stop solution
  • 54. Additional sources ● Identifier Naming Structure Catalogue ○ https://github.com/SCANL/identifier_name_structure_catalogue