Graph-Based Pattern-Oriented, Context-Sensitive Source Code Completion
Nguyen, T.T. ; Nguyen, H.A. ; Tamrawi, A. ; Nguyen, H.V. ; Al-Kofahi, J. ; Nguyen, T.N.
Presented By: Mohammad Masudur Rahman
Contents
 Code Completion
 Thesis Statement
 Motivating Example
 Terminologies
 Methodology
 Empirical Evaluation & Results
 My Observation & Future Thoughts
Code Completion
 Built-in feature of all modern IDEs
 Speeds up development
 Allows longer identifier names for program comprehension
 Less overhead for developers
 Mostly supports single variables and methods from API packages
 Template-based support: control structures, event handling, and others
Thesis Statement
 Novel graph-based approach to code completion
 Graph-based feature extraction, searching, and ranking of API usage patterns, matched against the editing context of the current code
 Empirical evaluation shows correctness and usefulness: 95% precision, 92% recall, 93% f-score over 24 real-world systems
Motivating Example (Single-line)
Fig 1: Current State of Code Completion (Eclipse 3.6)
Motivating Example (Multi-line)
Fig 2: SWT Usage Example
Motivating Example (Query)
Fig 3: SWT Query Example
Terminologies
 GRAPACC
 API Usage Pattern
 Groum Based Model
 Context-sensitive Weight
GRAPACC
 Graph-Based
 Pattern-Oriented
 Context-Sensitive
 Code Completion
API Usage Pattern
Fig 4: SWT API Usage
Groum Based Model
Fig 5: Groum Conversion
Context-Sensitive Weight
$$w_f(q) = \frac{1}{d + 1}$$
$w_f(q)$ = context-sensitive weight of feature q
q = feature of query Q
d = distance from the focus node to the closest token of q in the Groum model
Methodology
 Query Processing and Feature Extraction
 Pattern Managing, Searching and Ranking
 Pattern Oriented Code Completion
Query Processing and Feature Extraction
 Tokenizing
 Partial Parsing
 Groum Building
 Feature Extracting and Weighting
Tokenizing, Partial Parsing
 Lexical analysis
 Keywords related to control structures are preserved; the remaining tokens are removed but saved
 Eclipse Java parser
 PPA tool returns an AST (Abstract Syntax Tree)
 Unresolved nodes are assigned 'Unknown Type'
Groum Building
 Groum built from the AST
 Unresolved nodes are discarded but still considered as tokens
 Query converted to the following Groum
Fig 6: Groum of Query
Feature Extraction & Weighting
 Groum nodes are mapped to the tokens from the tokenization step
 Features are extracted from the Groum as paths of length L <= 3
 3 factors contribute to feature weight
 Structure-based factor (size)
 Structure-based factor (centrality)
 User-based factor
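The path-based feature extraction described on this slide can be sketched as follows. This is a minimal illustration, assuming the Groum is available as a plain adjacency dictionary and that a feature is any directed path of at most three nodes; the node labels and the helper name `extract_path_features` are hypothetical and do not come from the slides.

```python
from typing import Dict, List, Tuple


def extract_path_features(
    groum: Dict[str, List[str]], max_size: int = 3
) -> List[Tuple[str, ...]]:
    """Enumerate directed paths with 1..max_size nodes (assumed feature definition)."""
    features: List[Tuple[str, ...]] = []

    def walk(path: List[str]) -> None:
        # Every prefix of a path is itself a feature.
        features.append(tuple(path))
        if len(path) == max_size:
            return
        for nxt in groum.get(path[-1], []):
            if nxt not in path:  # do not revisit a node within one path
                walk(path + [nxt])

    for node in groum:
        walk([node])
    return features


# Hypothetical three-node Groum: Display.new -> Shell.new -> Shell.open
groum = {
    "Display.new": ["Shell.new"],
    "Shell.new": ["Shell.open"],
    "Shell.open": [],
}
features = extract_path_features(groum)
```

For this chain the sketch yields three single-node features, two two-node paths, and one three-node path.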
Feature Extraction & Weighting
 $w_s(q)$ = size-based weight of feature q of query Q: $w_s(q) = 1 + size(q)$, with $1 \le size(q) \le 3$
 $w_c(q)$ = centrality-based weight of feature q of query Q: $w_c(q) = n / s$, where n = number of neighbors and s = size
 $w_f(q)$ = user-based weight of feature q of query Q: $w_f(q) = 1 / (d + 1)$, where d = distance between the focus node and the closest token of the feature path in the Groum model
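The three factors on this slide are simple enough to compute directly. The sketch below implements each one as stated; the function names and the example values are hypothetical. How the factors are combined into the total weight w(q) is not given in the slide text, so the sketch stops at the individual factors.

```python
def size_weight(size: int) -> int:
    """Size-based factor: ws(q) = 1 + size(q), with 1 <= size(q) <= 3."""
    if not 1 <= size <= 3:
        raise ValueError("size(q) must be between 1 and 3")
    return 1 + size


def centrality_weight(neighbors: int, size: int) -> float:
    """Centrality-based factor: wc(q) = n / s."""
    return neighbors / size


def focus_weight(distance: int) -> float:
    """User-based factor: wf(q) = 1 / (d + 1)."""
    return 1.0 / (distance + 1)


# Hypothetical feature: a 2-node path with 4 neighbors, 1 step away from the focus node.
ws = size_weight(2)           # 3
wc = centrality_weight(4, 2)  # 2.0
wf = focus_weight(1)          # 0.5
```

A feature that contains the focus node itself has d = 0 and therefore the maximum user-based weight of 1.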
Feature Extraction & Weighting
 $w(q)$ = total weight of feature q of query Q
 $w_s(q)$ = size-based weight of feature q of query Q
 $w_c(q)$ = centrality-based weight of feature q of query Q
 $w_f(q)$ = user-based weight of feature q of query Q
Pattern Managing, Searching and Ranking
 Pr(P) is the popularity of pattern P, i.e., the frequency of pattern P
 The weight of feature p in pattern P is computed using inverse indexing
 $N_{p,P}$ = number of occurrences of feature p in P; $N_P$ = total number of features in P
 $N_p$ = number of patterns containing p; N = total number of patterns in the database
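The slide text lists the four counts but not the formula that combines them. One common way to combine such counts is a tf-idf-style product, and the sketch below assumes that form purely for illustration; it is not confirmed by the slides, and the function and parameter names are hypothetical.

```python
import math


def pattern_feature_weight(n_p_in_P: int, n_P: int, n_p: int, n: int) -> float:
    """Assumed tf-idf-style weight of feature p in pattern P.

    n_p_in_P: occurrences of p in P
    n_P:      total number of features in P
    n_p:      number of patterns containing p
    n:        total number of patterns in the database
    """
    term_frequency = n_p_in_P / n_P
    inverse_frequency = math.log(n / n_p)
    return term_frequency * inverse_frequency


# Hypothetical counts: p occurs twice among 10 features of P,
# and 5 of the 100 stored patterns contain p.
w = pattern_feature_weight(2, 10, 5, 100)
```

Under this assumed form, a feature that appears in every pattern gets weight 0, so it contributes nothing to distinguishing one pattern from another.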
Pattern Managing, Searching and Ranking
 For each feature p, L(p) is the list of patterns from which p can be extracted
 p denotes a pattern feature, q a query feature
 If sim(p, q) exceeds the threshold ∂, then p is added to F, the set of mapped features for q
 For each p ∈ F, the top n ranked patterns from L(p) are added to C, the set of candidate patterns for relevance computation
 For each P in C, compute fit(P, Q)
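The search steps on this slide can be sketched as a lookup over an inverted index. This is a minimal illustration under stated assumptions: features are plain strings, each list L(p) is already sorted with the best-ranked pattern first, and the similarity function is passed in by the caller. The name `find_candidates`, the index contents, and the exact-match similarity are all hypothetical.

```python
from typing import Callable, Dict, List, Set


def find_candidates(
    query_features: List[str],
    inverted_index: Dict[str, List[str]],
    sim: Callable[[str, str], float],
    threshold: float,
    top_n: int,
) -> Set[str]:
    """Collect candidate patterns C for a query, following the slide's steps."""
    candidates: Set[str] = set()
    for q in query_features:
        # F: pattern features whose similarity to q exceeds the threshold
        mapped = [p for p in inverted_index if sim(p, q) > threshold]
        for p in mapped:
            # L(p) is assumed to be pre-sorted, so a slice gives the top n patterns
            candidates.update(inverted_index[p][:top_n])
    return candidates


# Hypothetical inverted index: feature -> ranked list of pattern ids
index = {
    "Shell.open": ["P1", "P2", "P3"],
    "Display.new": ["P2", "P4"],
    "File.read": ["P5"],
}


def exact(p: str, q: str) -> float:
    """Stand-in similarity: 1.0 for identical features, 0.0 otherwise."""
    return 1.0 if p == q else 0.0


C = find_candidates(["Shell.open", "Display.new"], index, exact, 0.5, 2)
```

Each pattern in `C` would then be scored with fit(P, Q); that scoring step is outside this sketch.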
Feature Similarity
Feature similarity is a name-based similarity between two features, given that a feature is a collection of labels of the form X.Y.Z, where
X = package name
Y = class name
Z = method name
Name-based Similarity (nsim)
 wsim(X, X') is the word-based similarity
 X and X' are broken down into two sequences of words, L(X) and L(X')
 Similarity is computed as $L_o / L_m$
 $L_o$ is the length of the LCS (longest common subsequence); $L_m$ is the average length of the two sequences
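The word-based similarity defined on this slide can be sketched as follows. The LCS-over-average-length ratio comes from the slide; the camelCase word splitting is an assumption about how names are broken into words, and the helper names are hypothetical.

```python
import re
from typing import List


def split_words(name: str) -> List[str]:
    """Split a camelCase identifier into lowercase words (assumed tokenization)."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)
    return [part.lower() for part in parts]


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two word sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def wsim(x: str, y: str) -> float:
    """Word-based similarity: Lo / Lm, with Lm the average sequence length."""
    wx, wy = split_words(x), split_words(y)
    if not wx and not wy:
        return 0.0
    return lcs_length(wx, wy) / ((len(wx) + len(wy)) / 2)


score = wsim("getFileName", "setFileName")
```

Here both names split into three words and share the subsequence "file", "name", so the score is 2 / 3.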
Pattern Matching (Relevance)
Pattern Matching
SM(P, Q) = total weight of matched feature pairs
Fit(P, Q) = relevance degree between P and Q
Pr(P) = popularity of pattern P
Pattern Oriented Code Completion
 The matched pattern is selected and the corresponding nodes in the Groum are matched
 The missing nodes are filled in with code
Empirical Evaluation
 Precision
 Recall
 F-score
 java.io, java.util: APIs used as library
 28 real-world open-source systems
 4 for training, 24 for testing
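As a quick consistency check on the figures reported in the thesis statement, the F-score is the harmonic mean of precision and recall. The sketch below uses that standard definition; the function name is hypothetical.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures reported in the slides: 95% precision, 92% recall
f = f_score(0.95, 0.92)  # approximately 0.935
```

The result agrees with the 93% f-score quoted alongside those precision and recall values.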
Empirical Evaluation

My Observation
 Planning to use semantic web technology
 Data and control dependency relationships can be improved using semantic relationships such as conceptual similarity
 Pattern matching is complex and error-prone; a semantic score can be beneficial
Thanks
 Questions?
