On the Use of Relevance Feedback in IR-Based Concept Location

Gregory Gay*, Sonia Haiduc**, Andrian Marcus**, Tim Menzies*

* West Virginia University, Morgantown, WV, USA
** Wayne State University, Detroit, MI, USA
Software change
IR-based concept location
[Figure: Query, Source code, Ranked list of results]
Challenge: the query




• Text in the query needs to match the text in the
  source code
• Difficult to formulate good queries
  - unfamiliar source code
  - unknown target
  -> hard to describe something that you do not know
Eclipse bug #13926

Bug description:
 JFace Text Editor Leaves a Black
 Rectangle on Content Assist text insertion.
 Inserting a selected completion proposal
 from the context information popup causes
 a black rectangle to appear on top of the
 display.
Queries
• Q1: jface text editor black rectangle insert text

• Q2: jface text editor rectangle insert context
  information

• Q3: jface text editor content assist

• Q4: jface insert selected completion proposal
  context information
Queries and results
• Q1: jface text editor black rectangle insert text
  – position of modified method: 7496
• Q2: jface text editor rectangle insert context
  information
  – position of modified method: 258
• Q3: jface text editor content assist
  – position of modified method: 119
• Q4: jface insert selected completion proposal
  context information
  – position of modified method: 723

  Whole change request: 54
IR-based CL in unfamiliar software

Developers:
• Rarely begin with a good query: it is hard to choose
  the right words
• Analyze the list of results only briefly before
  reformulating the query
• Even after reformulation, have only a vague idea of what to
  look for -> new queries are not always better
• Can recognize whether the retrieved results are
  relevant to the problem or not
Questions

• Is there a way to make the query formulation
  easier on the developers?

• Is there a way to ensure that the subsequent
  queries lead us in the right direction?

• Can we do this by following the common
  practices of the developers?

• Can we improve IR-based CL using this
  approach?
Relevance feedback

• Uses developer feedback about the relevance of
  returned results to automatically reformulate
  queries
• Queries are reformulated by:
  – Adding terms found in relevant documents
  – Removing terms found in irrelevant documents
• Iterative process
• Common technique in text retrieval
• Also used in software engineering (SE)
JFace Text Editor Leaves a Black Rectangle on Content Assist text
   insertion. Inserting a selected completion proposal from the
   context information popup causes a black rectangle to appear on
   top of the display.

1. createContextInfoPopup() in
   org.eclipse.jface.text.contentassist.ContextInformationPopup
2. configure() in
   org.eclipse.jdt.internal.debug.ui.JDIContentAssistPreference
3. showContextProposals() in
   org.eclipse.jface.text.contentassist.ContextInformationPopup


New query = old query + words in relevant documents - words in irrelevant documents
IRRF tool
• IR Engine: Lucene
  – based on the Vector Space Model (VSM)
  – input: methods, query
  – output: a ranked list of methods ordered by their
    textual similarity to the query

• Relevance feedback: Rocchio algorithm
  – the classic algorithm for RF; used also in SE
  – models a way of incorporating relevance
    feedback information into the VSM
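
To make the two ingredients above concrete, here is a minimal sketch of ranking in a vector space model and of a Rocchio-style query update. It is not the IRRF tool and does not use Lucene: it uses raw term frequencies and cosine similarity in plain Python. The weights `alpha`, `beta`, and `gamma` are commonly quoted textbook values and are assumptions here, not the settings used in the study. The method names and their terms are hypothetical toy data.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn whitespace-separated text into a raw term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Average of a list of term vectors (empty dict if the list is empty)."""
    if not vectors:
        return {}
    total = Counter()
    for v in vectors:
        total.update(v)  # Counter.update adds the weights term by term
    return {t: w / len(vectors) for t, w in total.items()}


def rocchio(query, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio update: move the query toward relevant documents and away
    from irrelevant ones. Terms whose weight is not positive are dropped.
    alpha/beta/gamma are assumed values, not taken from the slides."""
    rel, irr = centroid(relevant), centroid(irrelevant)
    new_query = {}
    for t in set(query) | set(rel) | set(irr):
        w = (alpha * query.get(t, 0.0)
             + beta * rel.get(t, 0.0)
             - gamma * irr.get(t, 0.0))
        if w > 0:
            new_query[t] = w
    return new_query


def rank(query, methods):
    """Return method names ordered by decreasing similarity to the query."""
    return sorted(methods, key=lambda name: cosine(query, methods[name]),
                  reverse=True)


# Hypothetical toy corpus: one "document" per method.
methods = {
    "popup_show": vectorize("context information popup show proposal insert"),
    "draw_border": vectorize("black rectangle border draw paint"),
    "parse_text": vectorize("text parse token"),
}
query = vectorize("black rectangle insert text")
initial = rank(query, methods)

# The developer marks one result as relevant and one as irrelevant.
new_query = rocchio(query,
                    relevant=[methods["popup_show"]],
                    irrelevant=[methods["draw_border"]])
reformulated = rank(new_query, methods)
```

In this toy run the method marked relevant moves from the bottom of `initial` to the top of `reformulated`, and terms that occur only in the irrelevant method (such as `border`) do not enter the new query.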
Evaluation

• Extracted bug descriptions and set of methods
  modified in the bug fixes from bug tracking
  systems
• Consider bug descriptions as initial queries for IR
• Measure #methods investigated until reaching a
  modified method before and after using RF
• Relevance feedback:
  – one developer provides feedback
  – feedback round ends after marking N methods as
    relevant or irrelevant (N = 1, 3, 5)
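
The effort measure above can be sketched for a single ranked list as the 1-based position of the first modified method. The function name `effort` and the sample data are hypothetical. How the count accumulates across feedback rounds is not specified on the slide, so this sketch does not attempt it.

```python
def effort(ranked_methods, modified_methods):
    """Number of methods a developer inspects, top to bottom, until the
    first method modified in the bug fix is reached (None if absent)."""
    for position, name in enumerate(ranked_methods, start=1):
        if name in modified_methods:
            return position
    return None


# Hypothetical ranked list and set of methods changed by the fix.
baseline = effort(["a", "b", "c", "d"], {"c", "d"})
missing = effort(["a", "b"], {"z"})
```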
Stop criteria

• Target method in top N results

• More than 50 methods analyzed

• Position of the target methods in the ranked list
  of results increases (i.e., they rank worse) for 2
  consecutive rounds
  -> query is moving away from the wanted methods
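
The three stop criteria can be expressed as one check, sketched below. The function name, the argument layout, and the order in which the criteria are tested are assumptions. `positions` is taken to hold the 1-based rank of the best target method after each round, with the baseline first.

```python
from typing import List, Optional


def stop_reason(positions: List[int], methods_analyzed: int, n: int,
                max_methods: int = 50) -> Optional[str]:
    """Return why the feedback loop should stop, or None to continue.

    positions        -- rank of the target method after each round so far
    methods_analyzed -- total methods the developer has marked so far
    n                -- methods marked per feedback round (1, 3, or 5)
    """
    if positions and positions[-1] <= n:
        return "target in top N"
    if methods_analyzed > max_methods:
        return "more than 50 methods analyzed"
    # Rank got worse in each of the last two rounds.
    if len(positions) >= 3 and positions[-3] < positions[-2] < positions[-1]:
        return "query moving away from target"
    return None


found = stop_reason([120, 40, 3], methods_analyzed=6, n=3)
budget = stop_reason([120, 90], methods_analyzed=51, n=3)
drifting = stop_reason([100, 150, 200], methods_analyzed=6, n=3)
keep_going = stop_reason([100, 150, 120], methods_analyzed=6, n=3)
```

A single worse round does not stop the loop; only two consecutive worsening rounds do, which matches the third criterion.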
Systems


 System      Vers.    LOC         Methods   Classes
 Eclipse     2.0      2,500,000   74,996    7,500
 jEdit       4.2      300,000     5,366     750
 Adempiere   3.1.0    330,000     28,622    1,900
Results

 System      RF improves IR   RF does not improve IR
 Eclipse     6                1
 jEdit       3                3
 Adempiere   4                1
 All         13               5
Results
• Eclipse:
 Report #    Baseline    IRRF N=1    IRRF N=3    IRRF N=5
 19686       428         453 (5r)    48 (16r)    46 (9r)

• jEdit:
 Report #    Baseline    IRRF N=1    IRRF N=3    IRRF N=5
 1607211     354         103 (5r)    36 (12r)    28 (6r)

• Adempiere:
 Report #    Baseline    IRRF N=1    IRRF N=3    IRRF N=5
 1628050     52          3 (3r)      5 (2r)      7 (2r)
Questions – revisited (1)
• Is there a way to make the query formulation
  easier on the developers?
  – automatic query formulation


• Is there a way to ensure that the subsequent
  queries lead us in the right direction?
  – add terms from relevant documents, remove terms
    from irrelevant documents
  – stop when we move away from the target (results
    worsen for 2 consecutive rounds)
Questions – revisited (2)
• Can we do this by following the common
  practices of the developers?
  – developers still analyze only a few results in
    the result list before reformulation


• Can we improve IR-based CL using
  relevance feedback?
  – in some cases yes
Current and future work
• Studies involving more systems and more
  developers

• Automatically calibrating the parameters
  for a specific system and a specific set of
  change requests

• Study the circumstances when RF does
  not improve IR
