Metrics and Problem Detection



Tudor Gîrba
www.tudorgirba.com
Software is complex.



   29% Succeeded

      18% Failed



   53% Challenged



 The Standish Group, 2004
How large is your project?


    1’000’000 lines of code
    * 2 seconds per line = 2’000’000 seconds
      / 3600 ≈ 560 hours
         / 8 ≈ 70 days
       / 20 ≈ 3.5 months
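The estimate above can be replayed as a small calculation. This is only a sketch: it assumes the "* 2" step means two seconds of reading per line, and that the divisors stand for 8 working hours per day and 20 working days per month. The function and constant names are made up for illustration.

```python
# Replays the reading-time estimate from the slide.
# Assumptions (not stated explicitly on the slide): 2 seconds per line,
# 8 working hours per day, 20 working days per month.
SECONDS_PER_LINE = 2
HOURS_PER_DAY = 8
DAYS_PER_MONTH = 20


def reading_time(lines_of_code, seconds_per_line=SECONDS_PER_LINE):
    """Return (seconds, hours, days, months) needed to read the code once."""
    seconds = lines_of_code * seconds_per_line
    hours = seconds / 3600
    days = hours / HOURS_PER_DAY
    months = days / DAYS_PER_MONTH
    return seconds, hours, days, months


seconds, hours, days, months = reading_time(1_000_000)
print(f"{seconds} s, {hours:.0f} h, {days:.0f} days, {months:.1f} months")
```

The slide rounds the intermediate results; the unrounded chain gives a little under 3.5 months of doing nothing but reading.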
[Figure: forward engineering, contrasted with actual development]
What is the current state?
What should we do?
Where to start?
How to proceed?
[Figure: forward engineering, reverse engineering, and actual development]
Reverse engineering is analyzing a subject system to:
 identify components and their relationships, and
 create more abstract representations.

                                 Chikofsky & Cross, 1990
A large system contains lots of details.
How to judge its quality?
http://moose.unibe.ch
http://loose.upt.ro/incode
1 Metrics
2 Design Problems
3 Code Duplication
1 Metrics
You cannot control
what you cannot measure.

                           Tom DeMarco
Metrics are functions that assign numbers to
products, processes and resources.
Software metrics are measurements which
relate to software systems, processes or
related documents.
Metrics compress system traits into numbers.
Let’s see some examples...
Examples of size metrics


NOM - number of methods
NOA - number of attributes
LOC - number of lines of code
NOS - number of statements
NOC - number of children



                                      Lorenz, Kidd, 1994
                                Chidamber, Kemerer, 1994
McCabe cyclomatic complexity (CYCLO) counts
the number of linearly independent paths through the code of a
function.

                                                         McCabe, 1976

  it reveals the minimum number of tests to write

  interpretation can’t directly lead to improvement action
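As a rough illustration of CYCLO, the sketch below counts decision points in a Python function and adds one. Which node types count as decision points is an assumption of this sketch (comprehensions and `match` statements are ignored, for instance), and the helper name `cyclo` is made up; real tools differ in these details.

```python
import ast

# Node types treated as decision points (an assumption of this sketch).
DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclo(function_source: str) -> int:
    """Approximate cyclomatic complexity: decision points + 1."""
    tree = ast.parse(function_source)
    decisions = 0
    for node in ast.walk(tree):
        if isinstance(node, DECISION_NODES):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" contributes two additional branches
            decisions += len(node.values) - 1
    return decisions + 1


example = '''
def classify(n):
    if n < 0:
        return "negative"
    for d in range(2, n):
        if n % d == 0 and d > 1:
            return "composite"
    return "other"
'''

print(cyclo(example))  # two ifs, one for, one "and", plus one
```

The number says how branchy the function is, but not what to change, which is the limitation noted above.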
Weighted Method Count (WMC) sums up the
complexity of a class’s methods (measured by the metric
of your choice; usually CYCLO).

                                             Chidamber, Kemerer, 1994




  it is configurable, thus adaptable to our precise needs

  interpretation can’t directly lead to improvement action
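The configurability of WMC can be sketched by passing the per-method weight as a function. Everything here is illustrative: `branch_weight` is a deliberately simplified CYCLO-like weight, `unit_weight` makes WMC collapse into a plain method count, and both names are invented for this example.

```python
import ast


def branch_weight(method: ast.FunctionDef) -> int:
    """Simplified CYCLO-like weight: 1 + number of if/for/while nodes."""
    return 1 + sum(isinstance(n, (ast.If, ast.For, ast.While))
                   for n in ast.walk(method))


def unit_weight(method: ast.FunctionDef) -> int:
    """Every method weighs 1, so WMC equals the number of methods."""
    return 1


def wmc(class_source: str, weight=branch_weight) -> int:
    """Sum the chosen weight over the methods of the first class found."""
    tree = ast.parse(class_source)
    cls = next(n for n in tree.body if isinstance(n, ast.ClassDef))
    methods = [n for n in cls.body if isinstance(n, ast.FunctionDef)]
    return sum(weight(m) for m in methods)


account = '''
class Account:
    def __init__(self):
        self.balance = 0

    def deposit(self, amount):
        if amount > 0:
            self.balance += amount

    def withdraw(self, amount):
        if amount > 0:
            if amount <= self.balance:
                self.balance -= amount
'''

print(wmc(account), wmc(account, weight=unit_weight))
```

Swapping the weight function changes what "complex" means without touching the summation, which is the adaptability the slide refers to.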
Depth of Inheritance Tree (DIT) is the (maximum)
depth level of a class in a class hierarchy.


                                             Chidamber, Kemerer, 1994




  inheritance is measured

  only the potential and not the real impact is quantified
Coupling between objects (CBO) shows the number
of classes from which methods or attributes are used.


                                              Chidamber, Kemerer, 1994




  it takes into account real dependencies not just declared ones

  no differentiation of types and/or intensity of coupling
Tight Class Cohesion (TCC) counts the relative
number of method-pairs that access attributes of the
class in common.

                                                   Bieman, Kang, 1995


         TCC = 2 / 10 = 0.2




  interpretation can lead to improvement action

  ratio values allow comparison between systems
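A minimal sketch of the TCC computation follows. It reads the slide's 2 / 10 as 2 connected method pairs out of the 10 possible pairs among 5 methods, which is an inference from the numbers. The input format (a mapping from method name to the attributes it accesses) and the method names are assumptions made for illustration; this simplified version also ignores visibility and indirect accesses.

```python
from itertools import combinations


def tcc(method_attrs):
    """Share of method pairs that access at least one common attribute.

    method_attrs maps a method name to the set of attributes it accesses.
    """
    pairs = list(combinations(method_attrs, 2))
    if not pairs:
        return 0.0
    connected = sum(1 for a, b in pairs if method_attrs[a] & method_attrs[b])
    return connected / len(pairs)


# Hypothetical class with 5 methods: only (m1, m2) and (m3, m4) share data.
accesses = {
    "m1": {"x"},
    "m2": {"x"},
    "m3": {"y"},
    "m4": {"y"},
    "m5": {"z"},
}

print(tcc(accesses))
```

Because the result is a ratio between 0 and 1, values from classes or systems of different sizes can be compared directly.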
Access To Foreign Data (ATFD) counts how many
attributes from other classes are accessed directly from
a measured class.

                                           Marinescu 2006
...
2 Design Problems
Metrics Assess and Improve Quality!
                                           McCall, 1977
Really?
Problem 1: metrics granularity
metrics capture symptoms, not causes of problems
in isolation, they don’t lead to improvement solutions

Problem 2: implicit mapping
we don’t reason in terms of metrics,
but in terms of design principles
2 big obstacles in using metrics:

     Thresholds make metrics hard to interpret

     Granularity makes metrics hard to use in isolation
Can metrics help me in what I really care for? :)
How do I understand code?
How do I improve code?
I do not want to deal with metrics!
How to get an initial understanding of a system?
Metric   Value
LOC      35175
NOM       3618
NOC        384
CYCLO     5579
NOP         19
CALLS    15128
FANOUT    8590
AHH        0.12
ANDC       0.31

And now what?
We need means to compare.
hierarchies?

               coupling?
The Overview Pyramid provides a metrics
  overview.                         Lanza, Marinescu 2006

                         Inheritance
                         ANDC    0.31
                         AHH     0.12
                 20.21   NOP       19
          9.42   NOC              384
     9.72 NOM                    3618 | NOM                 4.18
0.15 LOC                        35175 | 15128   CALLS       0.56
CYCLO                            5579 |  8590   FANOUT

                 Size                   Communication
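The small numbers on the steps of the pyramid appear to be ratios between neighbouring metrics. The sketch below recomputes them from the raw values; the pairing of metrics is inferred from the arithmetic (for example, 384 / 19 is about 20.21) and is not spelled out on the slide, so treat it as an assumption. The displayed values look truncated rather than rounded to two decimals.

```python
# Raw metric values as listed for the example system.
metrics = {
    "LOC": 35175, "NOM": 3618, "NOC": 384, "CYCLO": 5579,
    "NOP": 19, "CALLS": 15128, "FANOUT": 8590,
}

# (numerator, denominator) pairs; inferred from the numbers, an assumption.
RATIOS = [
    ("NOC", "NOP"),
    ("NOM", "NOC"),
    ("LOC", "NOM"),
    ("CYCLO", "LOC"),
    ("CALLS", "NOM"),
    ("FANOUT", "CALLS"),
]


def pyramid_ratios(values):
    """Return a dict such as {"NOC/NOP": 20.21...} for every ratio pair."""
    return {f"{a}/{b}": values[a] / values[b] for a, b in RATIOS}


for name, value in pyramid_ratios(metrics).items():
    print(f"{name:13s} {value:.4f}")
```

Ratios, unlike absolute counts, do not grow with system size, which is what makes them comparable across systems.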
                   Java                   C++
            LOW    AVG    HIGH   LOW    AVG    HIGH

CYCLO/LOC   0.16   0.20   0.24   0.20   0.25   0.30
LOC/NOM      7      10     13     5     10      16
NOM/NOC      4      7      10     4      9      15
   ...
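One way to use such reference values is to label each ratio by the reference point it lies closest to. The sketch below does this for the three rows shown, using the Java columns. Two things are assumptions here: that the example system should be compared against the Java values, and that "close to" means nearest by absolute distance. The ratio values 0.15, 9.72 and 9.42 are the ones shown for the example system.

```python
# Java reference values (LOW, AVG, HIGH) for the three rows shown.
JAVA_THRESHOLDS = {
    "CYCLO/LOC": (0.16, 0.20, 0.24),
    "LOC/NOM": (7, 10, 13),
    "NOM/NOC": (4, 7, 10),
}


def closest_label(value, low, avg, high):
    """Return 'low', 'average' or 'high', whichever reference is nearest."""
    references = {"low": low, "average": avg, "high": high}
    return min(references, key=lambda label: abs(value - references[label]))


# Ratios of the example system.
system = {"CYCLO/LOC": 0.15, "LOC/NOM": 9.72, "NOM/NOC": 9.42}

for ratio, value in system.items():
    label = closest_label(value, *JAVA_THRESHOLDS[ratio])
    print(f"{ratio:10s} {value:6.2f}  close to {label}")
```

With these inputs the methods come out as not very branchy and of ordinary length, while the classes hold comparatively many methods.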
[Figure: the Overview Pyramid with each ratio marked as close to high, close to average, or close to low. Lanza, Marinescu 2006]
How do I improve code?
Quality is more than 0 bugs.



Breaking design principles, rules and best practices
          deteriorates the code;
          it leads to design problems.
Imagine changing just a small design fragment
and 33% of all classes
would require changes
Design problems are
    expensive
    frequent
    unavoidable

How to detect and eliminate them?
God Classes tend to centralize the intelligence of the
system, to do everything and to use data from small
data-classes.
                                                Riel, 1996
God Classes
    centralize the intelligence of the system,
    do everything and
    use data from small data-classes.
God Classes
    are complex: WMC is high,
    are not cohesive: TCC is low,
    access external data: ATFD is more than a few.

Compose metrics into queries using logical operators
Detection Strategies are metric-based queries to
detect design flaws.                  Lanza, Marinescu 2006

    Rule 1: METRIC 1 > Threshold 1
    AND
    Rule 2: METRIC 2 < Threshold 2
    → Quality problem
A God Class centralizes too much intelligence in
the system.                         Lanza, Marinescu 2006

    Class uses directly more than a few attributes of other classes
        ATFD > FEW
    AND
    Functional complexity of the class is very high
        WMC ≥ VERY HIGH
    AND
    Class cohesion is low
        TCC < ONE THIRD
    → GodClass
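The God Class strategy can be written down as a predicate over a class's metric values. This is a sketch under assumptions: the numeric values chosen for FEW and VERY HIGH are placeholders for illustration and are not given on the slides, the garbled comparison on WMC is read as "greater than or equal", and the `ClassMetrics` record and class names are invented for the example.

```python
from dataclasses import dataclass

# Placeholder threshold values, chosen for illustration only.
FEW = 3
VERY_HIGH = 47
ONE_THIRD = 1 / 3


@dataclass
class ClassMetrics:
    name: str
    atfd: int    # Access To Foreign Data
    wmc: int     # Weighted Method Count
    tcc: float   # Tight Class Cohesion


def is_god_class(m, few=FEW, very_high=VERY_HIGH, one_third=ONE_THIRD):
    """All three rules must hold (logical AND)."""
    uses_foreign_data = m.atfd > few
    is_very_complex = m.wmc >= very_high
    is_not_cohesive = m.tcc < one_third
    return uses_foreign_data and is_very_complex and is_not_cohesive


candidates = [
    ClassMetrics("OrderManager", atfd=12, wmc=80, tcc=0.10),
    ClassMetrics("Point", atfd=0, wmc=4, tcc=0.80),
    ClassMetrics("Parser", atfd=12, wmc=80, tcc=0.50),
]

print([m.name for m in candidates if is_god_class(m)])
```

Keeping the thresholds as parameters makes it easy to calibrate them against reference values for a given language.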
An Envious Method is more interested in data
from a handful of other classes.   Lanza, Marinescu 2006

    Method uses directly more than a few attributes of other classes
        ATFD > FEW
    AND
    Method uses far more attributes of other classes than its own
        LAA < ONE THIRD
    AND
    The used "foreign" attributes belong to very few other classes
        FDP ≤ FEW
    → Feature Envy
Data Classes are dumb data holders.
                                                  Lanza, Marinescu 2006

    Interface of class reveals data rather than offering services
        WOC < ONE THIRD
    AND
    Class reveals many attributes and is not complex:
        (   More than a few public data
                NOAP + NOAM > FEW
            AND
            Complexity of class is not high
                WMC < HIGH )
        OR
        (   Class has many public data
                NOAP + NOAM > MANY
            AND
            Complexity of class is not very high
                WMC < VERY HIGH )
    → Data Class
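The Data Class strategy nests an OR inside an AND, which translates directly into boolean expressions. As before, this is only a sketch: the numeric thresholds are placeholders picked for illustration and do not come from the slides, and the function name is invented.

```python
# Placeholder thresholds for illustration only.
FEW, MANY = 3, 5
HIGH, VERY_HIGH = 31, 47
ONE_THIRD = 1 / 3


def is_data_class(woc, noap, noam, wmc):
    """WOC < ONE THIRD AND (either of the two 'reveals many attributes' rules)."""
    reveals_data = woc < ONE_THIRD
    public_data = noap + noam
    few_data_not_complex = public_data > FEW and wmc < HIGH
    many_data_not_very_complex = public_data > MANY and wmc < VERY_HIGH
    return reveals_data and (few_data_not_complex or many_data_not_very_complex)


print(is_data_class(woc=0.1, noap=2, noam=2, wmc=10))
print(is_data_class(woc=0.1, noap=4, noam=4, wmc=60))
```

The second branch tolerates more complexity when the amount of exposed data is larger, so a class with many public fields is still flagged even if it carries some logic.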
Shotgun Surgery means that a change in an
operation triggers many (small) changes in a lot of different
operations and classes.
                                      Lanza, Marinescu 2006
3 Code Duplication
What is Code Duplication?
And why is it a problem?
Code Duplication Detection




     Lexical Equivalence

     Syntactical Equivalence

     Semantic Equivalence
Visualization of Copied Code Sequences
[Figure: comparison matrix with File A and File B on both axes]
Source Code --Transformation--> Transformed Code --Comparison--> Duplication Data




Author            Level         Transformed Code              Comparison Technique

Johnson 94        Lexical       Substrings                    String-Matching

Ducasse 99        Lexical       Normalized Strings            String-Matching

Baker 95          Syntactical   Parameterized Strings         String-Matching

Mayrand 96        Syntactical   Metrics Tuples                Discrete comparison

Kontogiannis 97   Syntactical   Metrics Tuples                Euclidean distance

Baxter 98         Syntactical   AST                           Tree-Matching
Noise Elimination

Original source                      After noise elimination
…
//assign same fastid as container
fastid = NULL;                       fastid=NULL;
const char* fidptr = get_fastid();   constchar*fidptr=get_fastid();
if(fidptr != NULL) {                 if(fidptr!=NULL)
  int l = strlen(fidptr);            intl=strlen(fidptr)
  fastid = new char[ l + 1 ];        fastid = newchar[l+]
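A lexical normalization of this kind can be sketched in a few lines: drop comments, drop whitespace, and discard lines that end up empty. The sketch only handles `//` comments and does so naively (a `//` inside a string literal would also be cut), and unlike the slide it keeps braces and semicolons. The function names are invented for illustration.

```python
import re


def normalize(line: str) -> str:
    """Remove a trailing // comment and all whitespace from one line."""
    without_comment = re.sub(r"//.*", "", line)
    return re.sub(r"\s+", "", without_comment)


def eliminate_noise(source: str) -> list:
    """Normalize every line and drop the ones that become empty."""
    normalized = (normalize(line) for line in source.splitlines())
    return [line for line in normalized if line]


snippet = """
//assign same fastid as container
fastid = NULL;
const char* fidptr = get_fastid();
if(fidptr != NULL) {
  int l = strlen(fidptr);
  fastid = new char[ l + 1 ];
"""

for line in eliminate_noise(snippet):
    print(line)
```

After this step, two copies of a fragment that differ only in layout or comments produce identical strings, so plain string matching is enough to pair them up.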
Enhanced Simple Detection Approach
[Figures: comparison matrices with lines from source on both axes, for the line sequences
         a   b   c   d    a    b     c   d
         a   b   c   d    a    x     y   d
         a   b   c   a    b    x     y   c
         a   x   b    x   c    x     d   x]
[Figure: comparison matrix of lines from source 1 against lines from source 2, showing an exact chunk, a line bias, and a second exact chunk]
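In a comparison matrix, a run of equal lines shows up as a diagonal. The sketch below finds such diagonals between two lists of (already normalized) lines and reports them as exact chunks. It is a brute-force illustration under assumptions, not the approach of any particular tool: the function name, the tuple format, and the `min_length` parameter are all made up here.

```python
def exact_chunks(lines_a, lines_b, min_length=2):
    """Return (start_a, start_b, length) for each maximal run of equal lines."""
    chunks = []
    for i in range(len(lines_a)):
        for j in range(len(lines_b)):
            if lines_a[i] != lines_b[j]:
                continue
            # Skip cells that merely continue a diagonal started earlier.
            if i > 0 and j > 0 and lines_a[i - 1] == lines_b[j - 1]:
                continue
            length = 0
            while (i + length < len(lines_a)
                   and j + length < len(lines_b)
                   and lines_a[i + length] == lines_b[j + length]):
                length += 1
            if length >= min_length:
                chunks.append((i, j, length))
    return chunks


source_1 = ["a", "b", "c", "d"]
source_2 = ["a", "b", "x", "y", "c", "d"]

print(exact_chunks(source_1, source_2))
```

In this example two exact chunks of length 2 are separated by two inserted lines in the second source; a detector that tolerates such gaps can chain the chunks into one larger duplication.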
Significant Duplication is large and should be
looked at                             Lanza, Marinescu 2006
[Figure: map of disharmonies and the relations between them (uses, is, is partially, has), grouped into Identity Disharmonies, Collaboration Disharmonies and Classification Disharmonies. Disharmonies shown: Shotgun Surgery, Feature Envy, Data Class, God Class, Intensive Coupling, Extensive Coupling, Brain Method, Brain Class, Significant Duplication, Refused Parent Bequest, Tradition Breaker, Futile Hierarchy. Lanza, Marinescu 2006]
Follow a clear and repeatable process



Don’t reason about quality in terms of numbers!
QA is part of the Development Process




         http://loose.upt.ro/incode
Tudor Gîrba
       www.tudorgirba.com




creativecommons.org/licenses/by/3.0/

More Related Content

PDF
05 Problem Detection
PDF
Pragmatic Design Quality Assessment - (Tutorial at ICSE 2008)
PDF
Assessment Through Exploration
PDF
Humane assessment at ICSM 2010
PDF
History Analysis (EVO 2008)
PDF
Holistic software assessment at the University of Zurich
PDF
What history can tell us
PDF
Software understanding in the large (EVO 2008)
05 Problem Detection
Pragmatic Design Quality Assessment - (Tutorial at ICSE 2008)
Assessment Through Exploration
Humane assessment at ICSM 2010
History Analysis (EVO 2008)
Holistic software assessment at the University of Zurich
What history can tell us
Software understanding in the large (EVO 2008)

Viewers also liked (6)

PDF
Dynamic Analysis (EVO 2008)
PPTX
Open Data: Barriers, Risks, and Opportunities
PDF
Software Visualization (EVO 2008)
PDF
Migration and Testing (EVO 2008)
PDF
Restructuring (EVO 2008)
PDF
Beyond software evolution: Software environmentalism
Dynamic Analysis (EVO 2008)
Open Data: Barriers, Risks, and Opportunities
Software Visualization (EVO 2008)
Migration and Testing (EVO 2008)
Restructuring (EVO 2008)
Beyond software evolution: Software environmentalism
Ad

Similar to Problem Detection (EVO 2008) (20)

PDF
Moose Overview
PDF
Helping you reengineering your legacy
PDF
Reverse Engineering (EVO 2008)
PDF
Modeling History to Understand Software Evolution with Hismo 2008-03-12
PDF
Modeling History to Understand Software Evolution With Hismo 2008-02-25
PDF
Assessing software systems
PDF
Software in Pictures 2008-03-12
PDF
Reverse Engineering 2007-11-27
PDF
Reverse Engineering Techniques 2007-11-29
PDF
Humane assessment with Moose at Benevol 2010
PDF
Enhancing agile development through software assessment
PDF
A Moose Slideshow
PDF
A Curious Course on Coroutines and Concurrency
PDF
Moose Tutorial at WCRE 2008
PDF
The humane software assessment (Choose Forum 2009)
PDF
Enhancing benefits from aquatic ecosystems: Nakambe sub-basin Case study
PDF
Software Evolution
PDF
Présentation du projet Moose
PPT
6.09 Develop A Plan And Execute
PDF
South Lincoln County, Workshop Presentation (Feb 29, 2012)
Moose Overview
Helping you reengineering your legacy
Reverse Engineering (EVO 2008)
Modeling History to Understand Software Evolution with Hismo 2008-03-12
Modeling History to Understand Software Evolution With Hismo 2008-02-25
Assessing software systems
Software in Pictures 2008-03-12
Reverse Engineering 2007-11-27
Reverse Engineering Techniques 2007-11-29
Humane assessment with Moose at Benevol 2010
Enhancing agile development through software assessment
A Moose Slideshow
A Curious Course on Coroutines and Concurrency
Moose Tutorial at WCRE 2008
The humane software assessment (Choose Forum 2009)
Enhancing benefits from aquatic ecosystems: Nakambe sub-basin Case study
Software Evolution
Présentation du projet Moose
6.09 Develop A Plan And Execute
South Lincoln County, Workshop Presentation (Feb 29, 2012)
Ad

More from Tudor Girba (20)

PDF
Software craftsmanship meetup (Zurich 2015) on solving real problems without ...
PDF
GT Spotter
PDF
Don't demo facts. Demo stories! (handouts)
PDF
Don't demo facts. Demo stories!
PDF
Humane assessment on cards
PDF
Underneath Scrum: Reflective Thinking
PDF
1800+ TED talks later
PDF
Software assessment by example (lecture at the University of Bern)
PDF
Humane assessment: Taming the elephant from the development room
PDF
Moose: how to solve real problems without reading code
PDF
Software Environmentalism (ECOOP 2014 Keynote)
PPTX
The emergent nature of software systems
PDF
Presenting is storytelling at Uni Zurich - slides (2014-03-05)
PDF
Presenting is storytelling at Uni Zurich - handouts (2014-03-05)
PDF
Underneath Scrum: Reflective Thinking (talk at Scrum Breakfast Bern, 2013)
PDF
Demo-driven innovation teaser
PDF
Software assessment essentials (lecture at the University of Bern 2013)
PDF
Demo-driven innovation (University of Zurich, June 2013)
PDF
Humane assessment with Moose at GOTO Aarhus 2011
PDF
Flexible analysis with Moose at Jazoon 2011
Software craftsmanship meetup (Zurich 2015) on solving real problems without ...
GT Spotter
Don't demo facts. Demo stories! (handouts)
Don't demo facts. Demo stories!
Humane assessment on cards
Underneath Scrum: Reflective Thinking
1800+ TED talks later
Software assessment by example (lecture at the University of Bern)
Humane assessment: Taming the elephant from the development room
Moose: how to solve real problems without reading code
Software Environmentalism (ECOOP 2014 Keynote)
The emergent nature of software systems
Presenting is storytelling at Uni Zurich - slides (2014-03-05)
Presenting is storytelling at Uni Zurich - handouts (2014-03-05)
Underneath Scrum: Reflective Thinking (talk at Scrum Breakfast Bern, 2013)
Demo-driven innovation teaser
Software assessment essentials (lecture at the University of Bern 2013)
Demo-driven innovation (University of Zurich, June 2013)
Humane assessment with Moose at GOTO Aarhus 2011
Flexible analysis with Moose at Jazoon 2011

Problem Detection (EVO 2008)

  • 1. Metrics and Problem Detection Tudor Gîrba www.tudorgirba.com
  • 2. Software is complex. 29% Succeeded 18% Failed 53% Challenged The Standish Group, 2004
  • 3. How large is your project?
  • 4. How large is your project? 1’000’000 lines of code
  • 5. How large is your project? 1’000’000 lines of code * 2 = 2’000’000 seconds
  • 6. How large is your project? 1’000’000 lines of code * 2 = 2’000’000 seconds / 3600 = 560 hours
  • 7. How large is your project? 1’000’000 lines of code * 2 = 2’000’000 seconds / 3600 = 560 hours / 8 = 70 days
  • 8. How large is your project? 1’000’000 lines of code * 2 = 2’000’000 seconds / 3600 = 560 hours / 8 = 70 days / 20 = 3 months
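The estimate on slides 4 to 8 can be reproduced with a few lines of Python. This is a minimal sketch that takes the slide's figures as defaults (2 seconds per line, 8-hour days, 20 working days per month); the function name `reading_time` is hypothetical.

```python
def reading_time(lines_of_code, seconds_per_line=2, hours_per_day=8, days_per_month=20):
    """Return (seconds, hours, days, months) needed to read the code once."""
    seconds = lines_of_code * seconds_per_line
    hours = seconds / 3600
    days = hours / hours_per_day
    months = days / days_per_month
    return seconds, hours, days, months


seconds, hours, days, months = reading_time(1_000_000)
print(f"{seconds:,} s = {hours:.0f} h = {days:.0f} days = {months:.1f} months")
```

Unrounded, the chain gives roughly 556 hours, 69 days and 3.5 months; the slides round each step (560 hours, 70 days, 3 months).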
  • 9. Diagram: forward engineering
  • 10. Diagram: forward engineering, actual development
  • 11. What is the current state? What should we do? Where to start? How to proceed? (Diagram: forward engineering, actual development)
  • 12. Diagram: forward engineering, reverse engineering, actual development
  • 13. Reverse engineering is analyzing a subject system to: identify components and their relationships, and create more abstract representations. Chikofsky & Cross, 1990
  • 14. A large system contains lots of details.
  • 15. A large system contains lots of details. How to judge its quality?
  • 17. 1 Metrics 2 Design Problems 3 Code Duplication
  • 18. Metrics 1
  • 19. You cannot control what you cannot measure. Tom DeMarco
  • 20. Metrics are functions that assign numbers to products, processes and resources.
  • 21. Software metrics are measurements which relate to software systems, processes or related documents.
  • 22. Metrics compress system traits into numbers.
  • 23. Let’s see some examples...
  • 24. Examples of size metrics NOM - number of methods NOA - number of attributes LOC - number of lines of code NOS - number of statements NOC - number of children Lorenz, Kidd, 1994 Chidamber, Kemerer, 1994
  • 25. McCabe cyclomatic complexity (CYCLO) counts the number of independent paths through the code of a function. McCabe, 1976. Pro: it reveals the minimum number of tests to write. Con: its interpretation can’t directly lead to improvement action.
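As an illustration of CYCLO, the sketch below approximates it for Python source as one plus the number of decision points. The set of node types counted is an assumption of this sketch (real tools differ in what they count), and `cyclo` and `SAMPLE` are hypothetical names.

```python
import ast

# Node types treated as decision points in this sketch (an assumption).
_DECISIONS = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclo(source: str) -> int:
    """Approximate McCabe complexity: 1 + number of decision points."""
    count = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _DECISIONS):
            count += 1
        elif isinstance(node, ast.BoolOp):
            # 'a and b and c' adds one extra branch per additional operand.
            count += len(node.values) - 1
    return count


SAMPLE = '''
def classify(n):
    if n < 0:
        return "negative"
    for d in range(2, n):
        if n % d == 0 and d > 1:
            return "composite"
    return "other"
'''

print(cyclo(SAMPLE))  # two ifs, one for, one 'and', plus 1
```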
  • 26. Weighted Method Count (WMC) sums up the complexity of a class’s methods (measured by the metric of your choice; usually CYCLO). Chidamber, Kemerer, 1994. Pro: it is configurable, thus adaptable to our precise needs. Con: its interpretation can’t directly lead to improvement action.
  • 27. Depth of Inheritance Tree (DIT) is the (maximum) depth level of a class in a class hierarchy. Chidamber, Kemerer, 1994. Pro: inheritance is measured. Con: only the potential and not the real impact is quantified.
  • 28. Coupling between objects (CBO) shows the number of classes from which methods or attributes are used. Chidamber, Kemerer, 1994. Pro: it takes into account real dependencies, not just declared ones. Con: no differentiation of types and/or intensity of coupling.
  • 29. Tight Class Cohesion (TCC) counts the relative number of method-pairs that access attributes of the class in common. Bieman, Kang, 1995. Example: TCC = 2 / 10 = 0.2. Pro: interpretation can lead to improvement action. Pro: ratio values allow comparison between systems.
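A small sketch of the TCC computation, assuming we already know which attributes each method accesses. The class and its method and attribute names are hypothetical; they are chosen so that 2 of the 10 method pairs share an attribute, matching the slide's 2 / 10 = 0.2.

```python
from itertools import combinations


def tcc(attribute_use: dict[str, set[str]]) -> float:
    """Fraction of method pairs that access at least one common attribute."""
    pairs = list(combinations(attribute_use, 2))
    if not pairs:
        return 0.0
    connected = sum(1 for a, b in pairs if attribute_use[a] & attribute_use[b])
    return connected / len(pairs)


# Hypothetical class with five methods and the attributes each one touches.
usage = {
    "open": {"handle"},
    "close": {"handle"},
    "name": {"title"},
    "rename": {"title"},
    "size": {"bytes"},
}

print(tcc(usage))
```

Five methods give 10 pairs; only open/close and name/rename share an attribute.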
  • 30. Access To Foreign Data (ATFD) counts how many attributes from other classes are accessed directly from a measured class. Marinescu 2006
  • 31. ...
  • 34. Metrics Assess and Improve Quality!
  • 35. Metrics Assess and Improve Quality! Really?
  • 37. Problem 1: metrics granularity. Metrics capture symptoms, not causes of problems; in isolation, they don’t lead to improvement solutions.
  • 38. Problem 1: metrics granularity. Metrics capture symptoms, not causes of problems; in isolation, they don’t lead to improvement solutions. Problem 2: implicit mapping. We don’t reason in terms of metrics, but in terms of design principles.
  • 39. 2 big obstacles in using metrics: Thresholds make metrics hard to interpret. Granularity makes metrics hard to use in isolation.
  • 40. Can metrics help me in what I really care for? :)
  • 41. Diagram: forward engineering, actual development
  • 42. How do I understand code? (Diagram: forward engineering, actual development)
  • 43. How do I understand code? How do I improve code? (Diagram: forward engineering, actual development)
  • 44. I do not want to deal with metrics! How do I understand code? How do I improve code? (Diagram: forward engineering, actual development)
  • 45. How to get an initial understanding of a system?
  • 46. Metric: Value. LOC: 35175, NOM: 3618, NOC: 384, CYCLO: 5579, NOP: 19, CALLS: 15128, FANOUT: 8590, AHH: 0.12, ANDC: 0.31
  • 47. Metric: Value. LOC: 35175, NOM: 3618, NOC: 384, CYCLO: 5579, NOP: 19, CALLS: 15128, FANOUT: 8590, AHH: 0.12, ANDC: 0.31
  • 48. Metric: Value. LOC: 35175, NOM: 3618, NOC: 384, CYCLO: 5579, NOP: 19, CALLS: 15128, FANOUT: 8590, AHH: 0.12, ANDC: 0.31. And now what?
  • 49. We need means to compare.
  • 50. hierarchies? coupling?
  • 51. The Overview Pyramid provides a metrics overview. Lanza, Marinescu 2006. Inheritance: ANDC 0.31, AHH 0.12. Size: NOP 19, NOC 384, NOM 3618, LOC 35175, CYCLO 5579, with ratios NOC/NOP 20.21, NOM/NOC 9.42, LOC/NOM 9.72, CYCLO/LOC 0.15. Communication: CALLS 15128, FANOUT 8590, with ratios CALLS/NOM 4.18, FANOUT/CALLS 0.56.
  • 52. The Overview Pyramid provides a metrics overview. Lanza, Marinescu 2006. Size: NOP 19, NOC 384, NOM 3618, LOC 35175, CYCLO 5579, with ratios NOC/NOP 20.21, NOM/NOC 9.42, LOC/NOM 9.72, CYCLO/LOC 0.15. Also shown: ANDC 0.31, AHH 0.12, CALLS 15128, FANOUT 8590, CALLS/NOM 4.18, FANOUT/CALLS 0.56.
  • 53. The Overview Pyramid provides a metrics overview. Lanza, Marinescu 2006. Communication: CALLS 15128, FANOUT 8590, with ratios CALLS/NOM 4.18, FANOUT/CALLS 0.56. Also shown: ANDC 0.31, AHH 0.12, NOP 19, NOC 384, NOM 3618, LOC 35175, CYCLO 5579, NOC/NOP 20.21, NOM/NOC 9.42, LOC/NOM 9.72, CYCLO/LOC 0.15.
  • 54. The Overview Pyramid provides a metrics overview. Lanza, Marinescu 2006. Inheritance: ANDC 0.31, AHH 0.12. Also shown: NOP 19, NOC 384, NOM 3618, LOC 35175, CYCLO 5579, CALLS 15128, FANOUT 8590, NOC/NOP 20.21, NOM/NOC 9.42, LOC/NOM 9.72, CYCLO/LOC 0.15, CALLS/NOM 4.18, FANOUT/CALLS 0.56.
  • 55. The Overview Pyramid provides a metrics overview. Lanza, Marinescu 2006. ANDC 0.31, AHH 0.12, NOP 19, NOC 384, NOM 3618, LOC 35175, CYCLO 5579, CALLS 15128, FANOUT 8590, NOC/NOP 20.21, NOM/NOC 9.42, LOC/NOM 9.72, CYCLO/LOC 0.15, CALLS/NOM 4.18, FANOUT/CALLS 0.56.
  • 56. Thresholds (LOW / AVG / HIGH). Java: CYCLO/LOC 0.16 / 0.20 / 0.24; LOC/NOM 7 / 10 / 13; NOM/NOC 4 / 7 / 10. C++: CYCLO/LOC 0.20 / 0.25 / 0.30; LOC/NOM 5 / 10 / 16; NOM/NOC 4 / 9 / 15. ...
  • 57. The Overview Pyramid provides a metrics overview. Lanza, Marinescu 2006. ANDC 0.31, AHH 0.12, NOP 19, NOC 384, NOM 3618, LOC 35175, CYCLO 5579, CALLS 15128, FANOUT 8590, NOC/NOP 20.21, NOM/NOC 9.42, LOC/NOM 9.72, CYCLO/LOC 0.15, CALLS/NOM 4.18, FANOUT/CALLS 0.56. Legend: close to high, close to average, close to low.
  • 58. The Overview Pyramid provides a metrics overview. Lanza, Marinescu 2006 close to high close to average close to low
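The pyramid's ratios follow directly from the raw totals on slide 46. The sketch below recomputes them and labels three of them against the Java thresholds from slide 56. Two assumptions are made here: that the measured system is Java (the slides do not say), and that "close to" means "nearest threshold". All function and variable names are hypothetical.

```python
# Raw totals taken from slide 46.
totals = {"NOP": 19, "NOC": 384, "NOM": 3618, "LOC": 35175,
          "CYCLO": 5579, "CALLS": 15128, "FANOUT": 8590}

# Each pyramid ratio divides a metric by the next coarser one.
RATIOS = [("NOC", "NOP"), ("NOM", "NOC"), ("LOC", "NOM"),
          ("CYCLO", "LOC"), ("CALLS", "NOM"), ("FANOUT", "CALLS")]

# Java (LOW, AVG, HIGH) thresholds from slide 56; only three ratios are listed there.
JAVA_THRESHOLDS = {
    "CYCLO/LOC": (0.16, 0.20, 0.24),
    "LOC/NOM": (7, 10, 13),
    "NOM/NOC": (4, 7, 10),
}


def pyramid_ratios(values):
    return {f"{a}/{b}": values[a] / values[b] for a, b in RATIOS}


def closest_label(value, thresholds):
    """Label a ratio by its nearest threshold (an assumed reading of 'close to')."""
    labels = ("low", "average", "high")
    distances = [abs(value - t) for t in thresholds]
    return labels[distances.index(min(distances))]


for name, value in pyramid_ratios(totals).items():
    label = closest_label(value, JAVA_THRESHOLDS[name]) if name in JAVA_THRESHOLDS else "n/a"
    print(f"{name:13} {value:6.2f}  {label}")
```

Rounded to two decimals, CYCLO/LOC and FANOUT/CALLS come out as 0.16 and 0.57 where the slide shows 0.15 and 0.56, which suggests the slide truncates rather than rounds.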
  • 59. I do not want to deal with metrics! How do I understand code? How do I improve code? (Diagram: forward engineering, actual development)
  • 60. How do I improve code?
  • 61. Quality is more than 0 bugs. Breaking design principles, rules and best practices deteriorates the code; it leads to design problems.
  • 62. Imagine changing just a small design fragment
  • 63. Imagine changing just a small design fragment
  • 64. Imagine changing just a small design fragment and 33% of all classes would require changes
  • 66. Design problems are expensive, frequent, unavoidable. How to detect and eliminate them?
  • 67. God Classes tend to centralize the intelligence of the system, to do everything and to use data from small data-classes. Riel, 1996
  • 68. God Classes tend to centralize the intelligence of the system, to do everything and to use data from small data-classes.
  • 69. God Classes centralize the intelligence of the system, do everything and use data from small data-classes.
  • 70. God Classes are complex, are not cohesive, access external data.
  • 71. God Classes are complex (WMC is high), are not cohesive (TCC is low), access external data (ATFD more than few).
  • 72. God Classes are complex (WMC is high), are not cohesive (TCC is low), access external data (ATFD more than few). Compose metrics into queries using logical operators
  • 73. Detection Strategies are metric-based queries to detect design flaws. Lanza, Marinescu 2006. Rule 1 (METRIC 1 > Threshold 1) AND Rule 2 (METRIC 2 < Threshold 2) → Quality problem
  • 74. A God Class centralizes too much intelligence in the system. Lanza, Marinescu 2006. Class uses directly more than a few attributes of other classes (ATFD > FEW) AND functional complexity of the class is very high (WMC ≥ VERY HIGH) AND class cohesion is low (TCC < ONE THIRD) → GodClass
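The God Class strategy can be read as a boolean query over per-class metrics. The sketch below is one possible encoding, not the authors' tool: the slides name the thresholds (FEW, VERY HIGH, ONE THIRD) without giving numbers, so the values here are illustrative assumptions, as are the class names and their metric values.

```python
from dataclasses import dataclass

# Illustrative threshold values (assumptions; the slides give only the names).
FEW = 5
WMC_VERY_HIGH = 47
ONE_THIRD = 1 / 3


@dataclass
class ClassMetrics:
    name: str
    atfd: int    # Access To Foreign Data
    wmc: int     # Weighted Method Count
    tcc: float   # Tight Class Cohesion


def is_god_class(m: ClassMetrics) -> bool:
    """ATFD > FEW AND WMC >= VERY HIGH AND TCC < ONE THIRD."""
    return m.atfd > FEW and m.wmc >= WMC_VERY_HIGH and m.tcc < ONE_THIRD


# Hypothetical measurements for three classes.
classes = [
    ClassMetrics("OrderManager", atfd=12, wmc=83, tcc=0.08),
    ClassMetrics("Money", atfd=0, wmc=9, tcc=0.75),
    ClassMetrics("ReportBuilder", atfd=7, wmc=20, tcc=0.10),
]

suspects = [c.name for c in classes if is_god_class(c)]
print(suspects)
```

Only the first class satisfies all three rules; `ReportBuilder` uses foreign data and is not cohesive, but its complexity stays below the assumed threshold.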
  • 75. An Envious Method is more interested in data from a handful of classes. Lanza, Marinescu 2006. Method uses directly more than a few attributes of other classes (ATFD > FEW) AND method uses far more attributes of other classes than its own (LAA < ONE THIRD) AND the used "foreign" attributes belong to very few other classes (FDP ≤ FEW) → Feature Envy
  • 76. Data Classes are dumb data holders. Lanza, Marinescu 2006. Interface of class reveals data rather than offering services (WOC < ONE THIRD) AND class reveals many attributes and is not complex → Data Class
  • 77. Data Classes are dumb data holders. Lanza, Marinescu 2006. (More than a few public data: NOAP + NOAM > FEW, AND complexity of class is not high: WMC < HIGH) OR (class has many public data: NOAP + NOAM > MANY, AND complexity of class is not very high: WMC < VERY HIGH) → Class reveals many attributes and is not complex
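The Data Class strategy nests one query inside another: slide 77 defines the sub-rule that slide 76 combines with WOC. A minimal sketch of that composition follows; all numeric thresholds are illustrative assumptions (the slides give only the names FEW, MANY, HIGH, VERY HIGH, ONE THIRD), and the function names are hypothetical.

```python
# Illustrative threshold values (assumptions; not given on the slides).
FEW, MANY = 5, 8
WMC_HIGH, WMC_VERY_HIGH = 31, 47
ONE_THIRD = 1 / 3


def reveals_many_attributes_and_is_not_complex(noap: int, noam: int, wmc: int) -> bool:
    """Slide 77: (NOAP + NOAM > FEW AND WMC < HIGH) OR (NOAP + NOAM > MANY AND WMC < VERY HIGH)."""
    public_data = noap + noam
    return (public_data > FEW and wmc < WMC_HIGH) or (public_data > MANY and wmc < WMC_VERY_HIGH)


def is_data_class(woc: float, noap: int, noam: int, wmc: int) -> bool:
    """Slide 76: WOC < ONE THIRD AND the sub-rule from slide 77."""
    return woc < ONE_THIRD and reveals_many_attributes_and_is_not_complex(noap, noam, wmc)


print(is_data_class(woc=0.2, noap=2, noam=6, wmc=10))   # much public data, simple class
print(is_data_class(woc=0.8, noap=2, noam=6, wmc=10))   # interface mostly offers services
```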
  • 78. Shotgun Surgery means that a change in an operation triggers many (small) changes in a lot of different operations and classes. Lanza, Marinescu 2006
  • 80. What is Code Duplication?
  • 81. What is Code Duplication? And why is it a problem?
  • 82. Code Duplication Detection: Lexical Equivalence, Syntactical Equivalence, Semantic Equivalence
  • 83. Visualization of Copied Code Sequences File A File B File A File B
  • 84. Source Code → Transformation → Transformed Code → Comparison → Duplication Data. Author: Level, Transformed Code, Comparison Technique. Johnson 94: Lexical, Substrings, String-Matching. Ducasse 99: Lexical, Normalized Strings, String-Matching. Baker 95: Syntactical, Parameterized Strings, String-Matching. Mayrand 96: Syntactical, Metrics Tuples, Discrete comparison. Kontogiannis 97: Syntactical, Metrics Tuples, Euclidean distance. Baxter 98: Syntactical, AST, Tree-Matching.
  • 85. Noise Elimination. Original: … //assign same fastid as container; fastid = NULL; const char* fidptr = get_fastid(); if(fidptr != NULL) { int l = strlen(fidptr); fastid = new char[ l + 1 ]; Normalized: … fastid=NULL; constchar*fidptr=get_fastid(); if(fidptr!=NULL) intl=strlen(fidptr) fastid = newchar[l+]
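Noise elimination of this kind amounts to stripping comments and whitespace before lines are compared. The sketch below is a simplified stand-in rather than any of the cited tools: it only handles `//` line comments and would mishandle `//` inside a string literal. The function names are hypothetical; the snippet reuses the lines from the slide.

```python
import re


def normalize(line: str) -> str:
    """Drop a trailing // comment, then remove all whitespace."""
    without_comment = line.split("//", 1)[0]
    return re.sub(r"\s+", "", without_comment)


def normalize_source(text: str) -> list[str]:
    """Normalize every line and skip those that become empty."""
    normalized = (normalize(line) for line in text.splitlines())
    return [line for line in normalized if line]


SNIPPET = """\
//assign same fastid as container
fastid = NULL;
const char* fidptr = get_fastid();
if(fidptr != NULL)
"""

print(normalize_source(SNIPPET))
```

The comment line disappears entirely and the remaining lines lose their spacing, so two copies that differ only in layout or comments compare as equal strings.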
  • 87. Dot plot: lines from source (a b c d) against lines from source (a b c d)
  • 88. Dot plot: lines from source (a b c d) against lines from source (a x y d)
  • 89. Dot plot: lines from source (a b c) against lines from source (a b x y c)
  • 90. Dot plot: lines from source against lines from source (a x b x c x d x)
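The dot plots on slides 87 to 90 can be sketched as a boolean matrix in which cell (i, j) is set when line i of one source equals line j of the other; copied sequences then show up as diagonal runs. This is a minimal illustration with hypothetical function names, using single letters as stand-ins for normalized lines, as the slides do.

```python
def dot_matrix(a, b):
    """matrix[i][j] is True when line i of a equals line j of b."""
    return [[x == y for y in b] for x in a]


def render(a, b):
    """Text rendering of the dot plot: 'x' for a match, '.' otherwise."""
    return ["".join("x" if cell else "." for cell in row) for row in dot_matrix(a, b)]


def exact_chunks(a, b, min_length=2):
    """Return (start_a, start_b, length) for each maximal diagonal run."""
    matrix = dot_matrix(a, b)
    chunks = []
    for i in range(len(a)):
        for j in range(len(b)):
            if not matrix[i][j]:
                continue
            if i > 0 and j > 0 and matrix[i - 1][j - 1]:
                continue  # not the start of a run
            length = 0
            while (i + length < len(a) and j + length < len(b)
                   and matrix[i + length][j + length]):
                length += 1
            if length >= min_length:
                chunks.append((i, j, length))
    return chunks


print("\n".join(render("abcd", "abcd")))
print(exact_chunks("abcd", "abcd"))    # one full diagonal
print(exact_chunks("abc", "abxyc"))    # diagonal broken by inserted lines
```

The `min_length` parameter is an assumption of this sketch: it filters out single matching lines so that only runs of at least two lines are reported.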
  • 91. Dot plot: lines from source 1 against lines from source 2
  • 92. Dot plot: lines from source 1 against lines from source 2
  • 93. Dot plot: lines from source 1 against lines from source 2. Marked: exact chunk
  • 94. Dot plot: lines from source 1 against lines from source 2. Marked: exact chunk, line bias
  • 95. Dot plot: lines from source 1 against lines from source 2. Marked: exact chunk, line bias, exact chunk
  • 96. Significant Duplication is large and should be looked at Lanza, Marinescu 2006
  • 97. 1 Metrics 2 Design Problems 3 Code Duplication
  • 98. Map of design disharmonies and the relations between them (has, uses, is): Shotgun Surgery, Feature Envy, Data Class, God Class, Intensive Coupling, Brain Method, Extensive Coupling, Brain Class, Significant Duplication, Refused Parent Bequest, Tradition Breaker, Futile Hierarchy. Categories: Identity Disharmonies, Collaboration Disharmonies, Classification Disharmonies. Lanza, Marinescu 2006
  • 99. Follow a clear and repeatable process
  • 100. Follow a clear and repeatable process
  • 101. Follow a clear and repeatable process
  • 102. Follow a clear and repeatable process. Don’t reason about quality in terms of numbers!
  • 103. QA is part of the Development Process http://loose.upt.ro/incode
  • 104. Tudor Gîrba www.tudorgirba.com creativecommons.org/licenses/by/3.0/