Supporting Program Comprehension with Source Code Summarization
Sonia Haiduc*, Jairo Aponte**, Andrian Marcus*

ICSE NIER 2010

Developers read source code

• Before performing maintenance on a
  system, developers need to understand
  its source code

• During comprehension, programmers
  search and browse the code
Skimming vs. reading code
• Skimming (Starke’09): quickly reading the names of
  software artifacts
  + Fast
  – Insufficient information
  – Shallow understanding

• Reading in depth
   – Slow
   – Too much information
   + Deeper understanding
Code summaries

• Automatically generated, short, yet accurate
  descriptions of source code entities

• They give more information than just the
  header or the name of an artifact

• Significantly shorter and faster to read than
  the source code they summarize
What should we summarize?
• Code
   – Packages
   – Classes
   – Methods
   – Method sequences
   – Etc.

• Other artifacts
   – Bug reports (ICSE 2010 - S. Rastkar, G. Murphy, G. Murray)
   – E-mails
   – Etc.
What should we include in code summaries?

• Semantic information
  – What does the source code do?
  – Identifiers and comments that capture the main concepts


• Structural information
  – How does the code work?
  – Class relationships, callers and callees, members of a
    class, etc.
Description: VFS virtual file system read write mkdir directory path save
+ Internal classes: DirectoryEntry
+ Methods: listDirectory, mkdir, constructPath
+ Fields: WRITE_CAP, READ_CAP, lock
+ Sub-classes: FileVFS, FavoritesVFS
+ Other: ...
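The structural parts of a summary like this one (internal classes, methods, fields, sub-classes) could be collected by static analysis. The following is a minimal sketch under loud assumptions: it analyzes Python source with the standard `ast` module rather than the Java code the slides target, and `structural_summary` and the `EXAMPLE` class are hypothetical names made up for illustration. It is not the authors' tool.

```python
import ast


def structural_summary(source: str, class_name: str) -> dict:
    """Collect structural facts about one class from Python source text."""
    tree = ast.parse(source)
    classes = [n for n in ast.walk(tree) if isinstance(n, ast.ClassDef)]
    target = next(c for c in classes if c.name == class_name)

    summary = {"Internal classes": [], "Methods": [], "Fields": [], "Sub-classes": []}
    # Direct members of the class body: nested classes, methods, class-level fields.
    for node in target.body:
        if isinstance(node, ast.ClassDef):
            summary["Internal classes"].append(node.name)
        elif isinstance(node, ast.FunctionDef):
            summary["Methods"].append(node.name)
        elif isinstance(node, ast.Assign):
            summary["Fields"] += [t.id for t in node.targets if isinstance(t, ast.Name)]
    # A sub-class is any class in the same source that names the target as a direct base.
    for cls in classes:
        if any(isinstance(b, ast.Name) and b.id == class_name for b in cls.bases):
            summary["Sub-classes"].append(cls.name)
    return summary


# Hypothetical input, loosely modeled on the names shown on the slide.
EXAMPLE = '''
class VFS:
    READ_CAP = 1
    WRITE_CAP = 2

    class DirectoryEntry:
        pass

    def mkdir(self, path):
        pass

    def list_directory(self, path):
        pass


class FileVFS(VFS):
    pass
'''

if __name__ == "__main__":
    for label, names in structural_summary(EXAMPLE, "VFS").items():
        print(f"{label}: {', '.join(names)}")
```

Only direct members and direct sub-classes within one source string are reported; callers, callees, and cross-file relationships would need a fuller analysis.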
How should we generate code summaries?

• Semantic information: automatic text
  summarization
  – Machine Learning
  – Discourse-based approaches
  – Term-based Text Retrieval techniques


• Structural information: static analysis
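A term-based approach to the semantic part can be sketched as follows: split identifiers into words, weight each term, and keep the top few. The tf-idf weighting, the stop-word list, and the names `split_identifiers`, `top_terms`, and `CORPUS` are all assumptions made for illustration. The slides do not say which text retrieval technique or preprocessing was used.

```python
import math
import re
from collections import Counter

# Assumed stop list: a few Java keywords and type names, for illustration only.
STOP_WORDS = frozenset({"public", "boolean", "void", "return", "new", "string"})


def split_identifiers(text: str) -> list[str]:
    """Split code text into lowercase terms, breaking camelCase and snake_case."""
    terms = []
    for token in re.findall(r"[A-Za-z]+", text):
        # Acronym runs (e.g. "VFS") or capitalized/lowercase words (e.g. "Path").
        for part in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", token):
            if part.lower() not in STOP_WORDS:
                terms.append(part.lower())
    return terms


def top_terms(corpus: dict[str, str], name: str, k: int = 5) -> list[str]:
    """Return the k highest tf-idf terms of corpus[name], ties broken alphabetically."""
    tokenized = {n: split_identifiers(body) for n, body in corpus.items()}
    # Document frequency: in how many methods does each term occur?
    doc_freq = Counter(t for terms in tokenized.values() for t in set(terms))
    term_freq = Counter(tokenized[name])
    scores = {
        t: tf * (math.log(len(corpus) / doc_freq[t]) + 1.0)
        for t, tf in term_freq.items()
    }
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical two-method corpus; each method body is treated as one document.
CORPUS = {
    "mkdir": "public boolean mkdir(String directoryPath) { "
             "File directory = new File(directoryPath); return directory.mkdirs(); }",
    "save": "public void save(String path, Buffer buffer) { writeFile(path, buffer); }",
}

if __name__ == "__main__":
    print(top_terms(CORPUS, "mkdir"))
```

Treating each method as a document means terms that are frequent in one method but rare elsewhere rank highest, which is one plausible reading of "most relevant terms".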
How can we evaluate code summaries?

• How good are the automatic summaries
  when compared to manual ones?

• How useful are the automatic code
  summaries for SE tasks?
Preliminary evaluation

• Compared automatic code summaries
  with developer code summaries

• 6 developers, 12 methods in ATunes

• Used only lexical information – 5 most
  relevant terms
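One simple way to compare an automatic term list with a developer's term list is set overlap. The sketch below uses the Jaccard index, which is an assumption: the slides do not name the measure used in the study, and the term lists shown are hypothetical, not data from it.

```python
def term_overlap(automatic: list[str], manual: list[str]) -> float:
    """Jaccard index between two term lists: |shared terms| / |all distinct terms|."""
    auto_set, manual_set = set(automatic), set(manual)
    if not auto_set or not manual_set:
        return 0.0
    return len(auto_set & manual_set) / len(auto_set | manual_set)


if __name__ == "__main__":
    # Hypothetical 5-term summaries for one method.
    automatic = ["directory", "file", "path", "mkdir", "mkdirs"]
    developer = ["create", "directory", "path", "mkdir", "vfs"]
    print(f"overlap = {term_overlap(automatic, developer):.2f}")
```

Averaging this score over all developers and methods would give a single agreement figure per summarization technique.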
Results
• Automatic source code summaries are good at
  reflecting developers’ summaries

• Text Retrieval techniques work as well on
  source code as on natural language in reflecting
  human summaries

• Developers make use of structural information in
  their code summaries:
  – Method name terms
  – Class name terms
  – Formal parameter type terms
What are we doing now?

• What type and how much structural
  information should be included in code
  summaries?
• How do developers generate summaries?
• Are different summaries needed for
  different tasks?
• How useful are the code summaries for
  SE tasks?
• Etc.
In summary…
• Automatic code summaries:
  – Short yet accurate descriptions of source code
  – Can reduce the effort of program comprehension
  – Embed both semantic and structural information
  – Can be generated for a variety of software entities

• Visit my poster
  (HINT: look for the huge and colorful one)
• www.cs.wayne.edu/~severe and
  www.cs.wayne.edu/~shaiduc
• sonja@wayne.edu
