Using Source Code Metrics to Predict Change-Prone Java Interfaces
Daniele Romano and Martin Pinzger
 Williamsburg, ICSM 2011
29 Sept 2011




Delft University of Technology
Challenge the future
Contributions
•  Correlation between source code metrics and the number of changes (#changes) in interfaces:
   •  C&K metrics
   •  complexity and usage metrics
   •  an interface usage cohesion metric
•  Predictive power of source code metrics for interfaces:
   •  prediction models
•  10 open source projects
   •  8 Eclipse projects
   •  Hibernate 2 and Hibernate 3
Motivations
•  Changes in interfaces are undesirable
   •  changes can have a stronger impact
   •  interfaces define contracts
   •  existing object-oriented metrics are not sound for interfaces
•  Related work on metrics as quality predictors
   •  makes no distinction between kinds of classes (e.g., interfaces vs. classes)
Hypotheses

•  H1
   •  Interface Usage Cohesion (IUC) has a stronger correlation with the number of Source Code Changes (#SCC) of interfaces than the C&K metrics
•  H2
   •  IUC can improve the performance of prediction models that classify Java interfaces as change-prone or not change-prone
The Approach

(Figure) Source code repository → metrics computation and changes retrieval, which feed two analyses:
•  Correlation analysis (Spearman rank correlation) between metrics and changes → H1
•  Prediction analysis: the metrics train the models, the changes classify the interfaces → H2
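The correlation analysis uses Spearman's rank correlation. The slide does not say which statistics package the authors used. Below is a minimal, dependency-free sketch: it ranks each variable, giving tied values the average of their positions, then computes Pearson's correlation on those ranks. The metric values and #SCC counts are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant sample")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical values: one metric value and one #SCC count per interface.
iuc_values = [0.9, 0.5, 0.25, 0.8, 0.4]
scc_counts = [2, 10, 31, 3, 12]
print(spearman(iuc_values, scc_counts))  # -1.0: the rankings are exactly reversed
```

Spearman's rho only looks at rank order, so it is insensitive to the skewed distributions typical of change counts. A negative rho, as in this toy data, means that higher metric values go with fewer changes.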
Metrics Computation

(Figure)
•  Source code repository → Evolizer Model Importer → FAMIX model
•  Metrics computation (FAMIX model, Understand) → metric values
Changes Computation

(Figure)
•  Source code repository → Evolizer Version Control Connector → revision info and subsequent file versions
•  Evolizer ChangeDistiller (AST comparison of subsequent file versions) → fine-grained Source Code Changes (SCC)
Why SCC?
•  Filters out irrelevant changes caused by modifications of:
   •  licenses
   •  comments
•  More precise measurement
   •  e.g., a single revision that modifies a single line can contain two changes: #Revision=1, #LineModified=1, #SCC=2
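The toy sketch below shows why one modified line can count as two source code changes. It is not ChangeDistiller, which differences full ASTs. Instead it models a method signature as a small hypothetical record (`Signature`) and compares two versions field by field. The change labels and the example edit are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    """Hypothetical, simplified model of a Java method declaration."""
    return_type: str
    name: str
    params: tuple  # parameter types, in declaration order


def count_scc(old: Signature, new: Signature) -> list:
    """Return illustrative fine-grained change labels between two signatures."""
    changes = []
    if old.return_type != new.return_type:
        changes.append("RETURN_TYPE_CHANGE")
    if old.name != new.name:
        changes.append("METHOD_RENAMING")
    # Naive positional comparison of parameter lists; real AST
    # differencing matches nodes far more carefully.
    for i in range(max(len(old.params), len(new.params))):
        if i >= len(old.params):
            changes.append("PARAMETER_INSERT")
        elif i >= len(new.params):
            changes.append("PARAMETER_DELETE")
        elif old.params[i] != new.params[i]:
            changes.append("PARAMETER_TYPE_CHANGE")
    return changes


# One revision editing one line:
#   int size(String key)   ->   long size(String key, boolean deep)
before = Signature("int", "size", ("String",))
after = Signature("long", "size", ("String", "boolean"))
print(count_scc(before, after))  # ['RETURN_TYPE_CHANGE', 'PARAMETER_INSERT']
```

Comments and license headers are not part of the model, so editing them produces no change here. This mirrors the filtering argument on the slide.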
C&K Correlation for Interfaces (Spearman rank correlation with #SCC)

Project              CBO       NOC       RFC       DIT       LCOM      WMC
Hibernate3           0.535**   0.029     0.592**   0.058     0.103     0.657**
Hibernate2           0.373**   0.065     0.325**   -0.01     0.006     0.522**
ecl.debug.core       0.484**   0.105     0.486**   0.232*    0.337     0.597**
ecl.debug.ui         0.216*    0.033     0.152     0.324**   0.214*    0.131
ecl.jface            0.239*    0.012     0.174**   0.103     0.320**   0.137
ecl.jdt.debug        0.512**   0.256**   0.349**   -0.049    0.238**   0.489**
ecl.team.core        0.367*    0.102     0.497**   0.243     0.400     0.451**
ecl.team.cvs.core    0.688**   -0.013    0.738**   0.618**   0.610**   0.744**
ecl.team.ui          0.301*    -0.003    0.299*    -0.103*   0.395**   0.299*
update.core          0.499**   -0.007    0.381**   0.146     0.482**   0.729**
Median               0.428     0.031     0.365     0.124     0.328     0.505

* = significant at α = 0.05; ** = significant at α = 0.01
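Weighted Methods per Class (WMC)

$$\mathrm{WMC} = \sum_{i=1}^{n} c_i$$

•  c_i: cyclomatic complexity of the i-th method
•  n: number of methods in a class
•  For interfaces, each abstract method counts as c_i = 1, so WMC equals the Number of Methods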
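The minimal sketch below computes WMC as the sum of per-method cyclomatic complexities. It assumes the complexities are already known, e.g. from a metrics tool. For an interface, every method is taken to have c_i = 1, which makes WMC collapse to the method count. The methods and values are hypothetical.

```python
def wmc(complexities):
    """WMC = sum of the cyclomatic complexities c_i of the n methods."""
    return sum(complexities)


def interface_wmc(method_names):
    """Abstract interface methods have no branches, so each c_i is 1."""
    return wmc([1] * len(method_names))


# Hypothetical class with three methods of complexity 1, 4 and 2.
print(wmc([1, 4, 2]))  # 7
# Hypothetical interface with three abstract methods: WMC == NOM == 3.
print(interface_wmc(["open", "read", "close"]))  # 3
```

This is why WMC adds no information beyond NOM for interfaces. It also explains why the WMC and NOM columns in the correlation tables coincide.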
Interface Segregation Principle
•  ISP
   •  defined by Robert C. Martin
   •  copes with fat interfaces
•  Fat interface
   •  an interface that serves different clients
   •  each kind of client uses a different set of methods
   •  the interface should be split into several interfaces, each designed to serve a specific client
Interface Segregation Principle (I)

Different clients do not share any methods

ClusterClients(i): counts the number of clients that do not share any method of the interface i
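The slide defines ClusterClients(i) only in words, so the sketch below adopts one literal reading of it as an assumption. Each client is represented by the set of methods of i that it invokes. The metric counts the clients whose method set shares no method with any other client of i. The paper's exact definition may differ; for instance, it may count clusters of clients rather than individual clients. The client and method names are hypothetical.

```python
def cluster_clients(usage):
    """Count clients that share no invoked method with any other client.

    usage maps a client name to the set of interface methods it invokes.
    This is one literal reading of the slide's definition, not a
    confirmed reimplementation of the authors' metric.
    """
    count = 0
    for client, methods in usage.items():
        # Union of the methods used by every other client.
        others = set()
        for other, other_methods in usage.items():
            if other != client:
                others |= other_methods
        if methods.isdisjoint(others):
            count += 1
    return count


# Hypothetical interface with methods {a, b, c, d} and four clients.
usage = {
    "ReaderClient": {"a"},
    "WriterClient": {"b", "c"},
    "AuditClient": {"c"},
    "AdminClient": {"d"},
}
print(cluster_clients(usage))  # 2 (ReaderClient and AdminClient)
```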
Interface Usage Cohesion




Different clients share a method
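The IUC formula appears only as an image on the slide and cannot be recovered from this text. The sketch below therefore assumes a definition in the spirit of service interface usage cohesion. For an interface i with n clients, IUC(i) = (1/n) Σ_j used_methods(j, i) / num_methods(i), i.e., the mean fraction of i's methods that each client uses. Under this assumption IUC is 1 when every client uses every method. It drops as clients use smaller subsets, as with a fat interface. Treat this as an approximation, not the authors' exact formula.

```python
def iuc(num_methods, usage):
    """Assumed IUC: mean fraction of the interface's methods used per client.

    num_methods: number of methods declared by the interface.
    usage: maps each client to the set of interface methods it invokes.
    """
    if num_methods <= 0:
        raise ValueError("interface must declare at least one method")
    if not usage:
        raise ValueError("IUC is undefined for an interface without clients")
    fractions = [len(methods) / num_methods for methods in usage.values()]
    return sum(fractions) / len(fractions)


# Hypothetical 4-method interfaces.
cohesive = {"c1": {"a", "b", "c", "d"}, "c2": {"a", "b", "c", "d"}}
fat = {"c1": {"a"}, "c2": {"b"}, "c3": {"c", "d"}}
print(iuc(4, cohesive))        # 1.0
print(round(iuc(4, fat), 3))   # 0.333
```

Under this reading, a lower IUC signals a fatter interface. That would fit the negative IUC correlations with #SCC reported for the studied projects.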
Other metrics for interfaces…

•  Number of Methods (NOM)
•  Number of Arguments (NOA)
•  Arguments per Procedure (APP)
•  Number of Clients (Cli)
•  Number of Invocations (Inv)
•  Number of Implementing Classes (Impl)
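Most of these metrics are simple counts over a model of the interface and its usage. The sketch below uses a hypothetical in-memory model (`InterfaceModel`) holding facts assumed to be already extracted, e.g. from a FAMIX-like model. It also uses these operational definitions, which are our assumptions:
•  NOA counts all parameters declared by the interface's methods, and APP = NOA / NOM
•  Cli counts distinct calling classes
•  Inv counts individual call sites

```python
from dataclasses import dataclass, field


@dataclass
class InterfaceModel:
    """Hypothetical extracted facts about one Java interface."""
    methods: dict                                   # method name -> parameter types
    calls: list = field(default_factory=list)       # (client class, method) per call site
    implementors: set = field(default_factory=set)  # implementing class names


def interface_metrics(m: InterfaceModel) -> dict:
    nom = len(m.methods)
    noa = sum(len(params) for params in m.methods.values())
    return {
        "NOM": nom,
        "NOA": noa,
        "APP": noa / nom if nom else 0.0,
        "Cli": len({client for client, _ in m.calls}),
        "Inv": len(m.calls),
        "Impl": len(m.implementors),
    }


model = InterfaceModel(
    methods={"open": ["String"], "read": ["byte[]", "int", "int"], "close": []},
    calls=[("Parser", "open"), ("Parser", "read"), ("Parser", "read"), ("Logger", "close")],
    implementors={"FileSource", "NetSource"},
)
print(interface_metrics(model))
```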
Correlation for Interfaces (Spearman rank correlation with #SCC; Clust = ClusterClients)

Project              Inv       Cli       NOM       Clust     IUC
Hibernate3           0.544**   0.433**   0.657**   0.302**   -0.601**
Hibernate2           0.165     0.104     0.522**   0.016     -0.373**
ecl.debug.core       0.317**   0.327**   0.597**   0.273**   -0.682**
ecl.debug.ui         0.497**   0.498**   0.131     0.418**   -0.508**
ecl.jface            0.205     0.099     0.137     0.106**   -0.363**
ecl.jdt.debug        0.495**   0.471     0.489**   0.474**   -0.605**
ecl.team.core        0.261     0.278     0.451**   0.328*    -0.475**
ecl.team.cvs.core    0.557**   0.608**   0.744**   0.369     -0.819**
ecl.team.ui          0.290     0.270     0.299     0.056     -0.618**
update.core          0.677**   0.656**   0.729**   0.606**   -0.656**
Median               0.317     0.327     0.505     0.328     -0.605

* = significant at α = 0.05; ** = significant at α = 0.01
Prediction Analysis
•  Three machine learning algorithms
    •  Support Vector Machine
    •  Naïve Bayes Network
    •  Neural Nets
•  Interfaces classification: change-prone vs. not change-prone
•  Training using 10-fold cross-validation
    •  CK = {CBO, RFC, LCOM, WMC}
    •  IUC = {CBO, RFC, LCOM, WMC, IUC}
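The slide states that the models were trained with 10-fold cross-validation. The sketch below illustrates the split itself; it is not the authors' tooling, which is not named here. It shuffles interface indices with a fixed seed and deals them round-robin into k folds. Each fold then serves once as the test set while the remaining folds form the training set. Stratification by class, which many tools apply, is omitted for brevity. The project size is hypothetical.

```python
import random


def k_fold_indices(n_items, k=10, seed=42):
    """Yield (train, test) index lists for k-fold cross-validation."""
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and the number of items")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal items round-robin so fold sizes differ by at most one.
    folds = [indices[f::k] for f in range(k)]
    for f in range(k):
        test = folds[f]
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, test


# Hypothetical project with 23 interfaces.
for train, test in k_fold_indices(23, k=10):
    print(len(train), len(test))
```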
Prediction – AUC values

                      NBayes            LibSVM            NN
Project               CK       IUC      CK       IUC      CK       IUC
ecl.team.cvs.core     0.55     0.75     0.692    0.811    0.8      0.8
ecl.debug.core        0.75     0.79     0.806    0.828    0.85     0.875
ecl.debug.ui          0.66     0.72     0.71     0.742    0.748    0.766
Hibernate2            0.745    0.807    0.735    0.708    0.702    0.747
Hibernate3            0.835    0.862    0.64     0.856    0.874    0.843
ecl.jdt.debug         0.79     0.738    0.741    0.82     0.77     0.762
ecl.jface             0.639    0.734    0.607    0.778    0.553    0.542
ecl.team.core         0.708    0.792    0.617    0.608    0.725    0.85
ecl.team.ui           0.88     0.8      0.74     0.884    0.65     0.75
update.core           0.782    0.811    0.794    0.817    0.675    0.744
Median                0.747    0.791    0.722    0.814    0.736    0.764
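AUC is the area under the ROC curve. A compact way to compute it uses the Mann-Whitney formulation. AUC is the probability that a randomly chosen change-prone interface gets a higher classifier score than a randomly chosen not change-prone one, with ties counting one half. The scores and labels below are invented; in the study, the Naïve Bayes, LibSVM and neural-net models would supply the scores.

```python
def auc(scores, labels):
    """AUC via pairwise comparison (Mann-Whitney formulation).

    labels: 1 for change-prone, 0 for not change-prone.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for six interfaces.
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.6]
labels = [1, 1, 0, 1, 0, 0]
print(auc(scores, labels))
```

An AUC of 0.5 corresponds to random guessing and 1.0 to a perfect ranking. The pairwise loop is quadratic, which is fine at the scale of a single project's interfaces.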
Results
•  H1 ACCEPTED
   •  IUC has a stronger correlation with the #SCC of interfaces than the C&K metrics
   •  IUC shows the best correlation
•  H2 PARTIALLY ACCEPTED
   •  IUC can improve the performance of prediction models that classify Java interfaces as change-prone or not change-prone
   •  Despite the improvements, the Wilcoxon test showed a significant difference only for LibSVM
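The sketch below illustrates what the Wilcoxon signed-rank test compares. It runs an exact two-sided version on the LibSVM columns of the AUC table, pairing the CK and IUC values of each project. This is our own illustration. The slide does not say which variant the authors used (exact or normal approximation, one- or two-sided, how zero differences are handled), so these numbers need not match their analysis. AUCs are entered as integer thousandths to avoid floating-point artifacts in the differences.

```python
from itertools import product


def ranks_with_ties(values):
    """1-based ranks of values; ties receive the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test on paired samples.

    Zero differences are dropped. All 2**n sign patterns are enumerated,
    so this is only practical for small n (here n <= 10).
    Returns (W_plus, W_minus, p_value).
    """
    diffs = [b - a for a, b in zip(x, y) if b != a]
    ranks = ranks_with_ties([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    total = w_plus + w_minus
    w = min(w_plus, w_minus)
    extreme = 0
    for signs in product((0, 1), repeat=len(ranks)):
        s = sum(r for r, positive in zip(ranks, signs) if positive)
        if s <= w or s >= total - w:
            extreme += 1
    return w_plus, w_minus, extreme / 2 ** len(ranks)


# LibSVM AUC values from the table, in thousandths (CK vs. IUC per project).
libsvm_ck = [692, 806, 710, 735, 640, 741, 607, 617, 740, 794]
libsvm_iuc = [811, 828, 742, 708, 856, 820, 778, 608, 884, 817]
print(wilcoxon_exact(libsvm_ck, libsvm_iuc))
```

Under this exact two-sided variant, only two of the ten LibSVM pairs favor CK (Hibernate2 and ecl.team.core), giving W- = 5 and p = 20/1024 ≈ 0.02. The same computation on the NBayes columns gives p ≈ 0.105, which agrees with the slide's conclusion.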
Implications
• Researchers
  •  take into account the nature of the measured entities

• Quality Engineers
  •  extend existing metric suites

• Developers and Architects
  •  measure violations of the ISP
Future Work

• Measuring metrics over time

• Further validation

• Are the shared methods the problem?

• Component-based and service-oriented systems
