Proceedings of the 2010 7th IEEE Working Conference on Mining Software Repositories, p.1-10


        Predicting the Severity of a Reported Bug


Ahmed Lamkanfi, Serge Demeyer | Emanuel Giger | Bart Goethals
Ansymo                        | s.e.a.l.      | ADReM
Mining Software Repositories (MSR) 2010 presentation
Severity of a bug is important
✓ Critical factor in deciding how soon it needs to
   be fixed, i.e. when prioritizing bugs
Priority is business
Severity is technical
✓ Severity varies:
  ➡ trivial, minor, normal, major, critical and blocker
  ➡ clear guidelines exist to classify the severity of bug reports

✓ Both a short and a longer description of the problem

✓ Bugs are grouped according to products and components
  ➡ e.g. plug-ins and bookmarks are components of the product Firefox
Can we accurately predict the severity of a reported
     bug by analyzing its textual descriptions?

             Also the following questions:

                 Potential indicators?
            Short versus long description?
       Per component versus cross-component?
Approach
We use text mining to classify bug reports
•   Bayesian classifier: based on the probabilistic occurrence of words
•   training and evaluation period
•   in the first instance, per component

Reporter-assigned severities are mapped onto two classes; "normal" is the
default value and is treated as undecided, so it is left out:

    Non-severe bugs    |   Undecided / default   |   Severe bugs
    (trivial, minor)   |   (normal)              |   (major, critical, blocker)
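
To make the classification step concrete, here is a minimal sketch of such a
pipeline in Python with scikit-learn. The example reports and the vectorizer
settings are illustrative assumptions, not the authors' exact setup:

```python
# Minimal sketch of a Bayesian severity classifier, assuming
# scikit-learn. The training reports below are hypothetical; the study
# trains on real Bugzilla reports, per component, with "normal"
# (the undecided default) excluded.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_reports = [
    ("Crash when opening bookmarks sidebar", "severe"),
    ("Deadlock while indexing workspace", "severe"),
    ("Typo in preferences dialog label", "non-severe"),
    ("Menu item underlined inconsistently", "non-severe"),
]
texts, labels = zip(*train_reports)

# Bag-of-words features feeding a Naive Bayes model: the classifier
# learns the probabilistic occurrence of words per severity class.
classifier = make_pipeline(CountVectorizer(lowercase=True), MultinomialNB())
classifier.fit(texts, labels)

print(classifier.predict(["Browser hangs and must be rebooted"]))
```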
Evaluation of the approach:
✓ precision and recall

Cases drawn from the open-source community
✓ Mozilla, Eclipse and GNOME
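
The precision/recall formula on the original slide did not survive conversion;
the standard definitions, which are what the slide refers to, are:

```latex
% Precision and recall for the "severe" class (and symmetrically for
% "non-severe"): TP = severe bugs predicted severe, FP = non-severe
% bugs predicted severe, FN = severe bugs predicted non-severe.
\text{precision} = \frac{TP}{TP + FP}, \qquad
\text{recall} = \frac{TP}{TP + FN}
```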
Results
How does the basic approach perform?
➡ per component and using short description


                     Non-severe          Severe
Component           precision  recall  precision  recall
Mozilla: Layout       0.701    0.785     0.752    0.653
Mozilla: Bookmarks    0.692    0.703     0.698    0.687
Eclipse: UI           0.707    0.633     0.668    0.738
Eclipse: JDT-UI       0.653    0.714     0.685    0.621
GNOME: Calendar       0.828    0.783     0.794    0.837
GNOME: Contacts       0.767    0.706     0.728    0.785
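
The figures in the table come from comparing the predicted class against the
severity the reporter assigned. A minimal sketch of that computation, with
made-up label vectors, assuming scikit-learn:

```python
# Per-class precision and recall, as reported in the tables.
# The label vectors here are invented for illustration.
from sklearn.metrics import precision_recall_fscore_support

y_true = ["severe", "severe", "non-severe", "non-severe", "severe"]
y_pred = ["severe", "non-severe", "non-severe", "severe", "severe"]

prec, rec, _f1, _support = precision_recall_fscore_support(
    y_true, y_pred, labels=["non-severe", "severe"])
print("non-severe: precision=%.3f recall=%.3f" % (prec[0], rec[0]))
print("severe:     precision=%.3f recall=%.3f" % (prec[1], rec[1]))
```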
What keywords are good indicators of severity?


Component                  Non-severe                              Severe
Mozilla Firefox: General   inconsist, favicon, credit, extra,      fault, machin, reboot, reinstal,
                           consum, licens, underlin, typo,         lockup, seemingli, perman,
                           inspector, titlebar                     instantli, segfault, compil
Eclipse JDT UI             deprec, style, runnabl, system, cce,    hang, freez, deadlock, thread,
                           tvt35, whitespac, node, put, param      slow, anymor, memori, tick,
                                                                   jvm, adapt
GNOME Mailer               mnemon, outbox, typo, pad, follow,      deadlock, sigsegv, relat, caus,
                           titl, high, acceler, decod, reflec      snapshot, segment, core,
                                                                   unexpectedli, build, loop
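
The terms in the table are word stems (hence "freez", "memori"), produced by a
stemmer during preprocessing. One plausible way to recover such indicator terms
from a trained Naive Bayes model is to rank them by their log-probability ratio
between the two classes; the sketch below assumes that ranking, which is not
necessarily the measure the authors used:

```python
# Sketch: rank vocabulary terms by how strongly a fitted MultinomialNB
# associates them with each class. Data is hypothetical.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["crash reboot segfault", "hang freeze deadlock",
         "typo in label", "inconsistent underline titlebar"]
labels = ["severe", "severe", "non-severe", "non-severe"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)
model = MultinomialNB().fit(X, labels)

# Log-probability ratio per term: positive values lean towards the
# second class in model.classes_, negative towards the first.
ratio = model.feature_log_prob_[1] - model.feature_log_prob_[0]
terms = np.array(vectorizer.get_feature_names_out())
order = np.argsort(ratio)
print("indicative of", model.classes_[0], ":", list(terms[order[:3]]))
print("indicative of", model.classes_[1], ":", list(terms[order[-3:]]))
```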
How does the approach perform when using the longer description?

                          Non-severe          Severe
Component                precision  recall  precision  recall
Mozilla: Layout            0.583    0.961     0.890    0.314
Mozilla: Bookmarks         0.536    0.963     0.820    0.166
Mozilla: Firefox general   0.578    0.948     0.856    0.308
Eclipse: UI                0.548    0.976     0.892    0.197
Eclipse: JDT-UI            0.547    0.973     0.881    0.195
Eclipse: JDT-Text          0.570    0.988     0.955    0.257
How does the approach perform when combining bugs from different components?

             Non-severe          Severe
Component   precision  recall  precision  recall
Mozilla       0.704    0.750     0.733    0.685
Eclipse       0.693    0.553     0.628    0.755
GNOME         0.817    0.737     0.760    0.835

Much larger training set necessary
✓ ± 2000 reports instead of ± 500 per severity!
Conclusions
✓ It is possible to predict the severity of a
  reported bug

✓ Short description is the better source for
  predictions

✓ Cross-component approach works, but
  requires more training samples
