RACK: Code Search in the IDE using Crowdsourced Knowledge

Mohammad Masudur Rahman, Chanchal K. Roy, and David Lo+
University of Saskatchewan, Canada
+Singapore Management University, Singapore
International Conference on Software Engineering (ICSE 2017), Buenos Aires, Argentina

CODE SEARCH ENGINE (BLACK DUCK)
• Natural language queries do not work well with code search engines (e.g., Black Duck, Krugle, GitHub search).
• These engines perform simple keyword matching, which is not sufficient.
[Screenshot labels: Query; No results]

CODE SEARCH ENGINE (GITHUB & KRUGLE)
• The query must be carefully prepared, which is not an easy task.
• The query should contain relevant API names.
• The developer needs up-front knowledge of the required APIs.
[Screenshot labels: Keyword matching; NL query to be replaced by relevant APIs?]

WEB SEARCH ENGINE (GOOGLE)
• Web search engines (e.g., Google) return thousands of results.
• They cannot guarantee relevant code examples.
• The result is information overload for developers.
Search query: How to send email in Java?

RACK: Code Search in the IDE using Crowdsourced Knowledge
Research Problem: Code Example Search using Natural Language Queries

MOTIVATING EXAMPLE
Question title
Relevant API Item of our interest

CHALLENGES FOR RACK
RQ1: Do accepted answers refer to API names
frequently?
RQ2: What percentage of standard APIs are
referred to in the accepted answers?
RQ3: Do titles from Stack Overflow questions contain
potential keywords for code search?

ANSWERING RQ1: API CLASSES/ANSWER
• Heavy-tailed distribution, fitted as Poisson
• Mean λ = 2.37, with a 95% CI between 2.33 and 2.41

ANSWERING RQ2: CORE API CLASS
COVERAGE IN STACK OVERFLOW

RQ3: SEARCH QUERY KEYWORDS IN SO
QUESTION TITLES
Fig 6: Coverage of query keywords in Stack Overflow question titles

SUMMARY OF FINDINGS
• At least 2 API classes per accepted answer, on average
• 60% of core API classes are covered by Stack Overflow
• 73% of real-life query terms match terms in Stack Overflow question titles
Stack Overflow is a good source of relevant API names for code example search

WORKING METHODOLOGY OF RACK
• Token-API mapping database construction
• Query reformulation or translation
• Code example search

STEP I: CONSTRUCTION OF TOKEN-API
MAPPING DATABASE
[Figure: workflow diagram with eight numbered steps]
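As a rough illustration of what a token-API mapping could look like, the sketch below counts how often each question-title token co-occurs with each API class. It assumes the API classes of every accepted answer have already been extracted; the stop-word list, the `tokenize` helper, and the `build_token_api_map` function are hypothetical names for this example, not the authors' implementation.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Illustrative stop-word list only; a real pipeline would use a fuller one.
STOP_WORDS = {"how", "to", "in", "a", "an", "the", "do", "i", "with", "using"}


def tokenize(title: str) -> List[str]:
    """Lower-case a question title and keep alphabetic, non-stop-word tokens."""
    return [t for t in re.findall(r"[a-z]+", title.lower()) if t not in STOP_WORDS]


def build_token_api_map(posts: Iterable[Tuple[str, List[str]]]) -> Dict[str, Counter]:
    """Map each title token to a Counter of API classes it co-occurs with.

    Each post is (question title, API classes found in its accepted answer).
    A token-API pair is counted at most once per post.
    """
    mapping: Dict[str, Counter] = defaultdict(Counter)
    for title, api_classes in posts:
        for token in set(tokenize(title)):
            mapping[token].update(set(api_classes))
    return mapping


posts = [
    ("How to send email in Java?", ["MimeMessage", "Transport"]),
    ("Send email with attachment", ["MimeMessage", "File"]),
]
mapping = build_token_api_map(posts)
print(mapping["email"].most_common())
```

Counting each pair once per post keeps a single long answer from dominating the frequencies; that choice is an assumption of this sketch.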

STEP II: NL QUERY REFORMULATION
[Figure: workflow diagram with six numbered steps]

STEP III: CODE SEARCH WITH GITHUB
[Figure: workflow diagram with five numbered steps]
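One way to picture the hand-off from suggested API classes to a GitHub code search is to build the request URL for GitHub's REST code search endpoint. The sketch below only constructs the URL and sends nothing; the `build_search_url` helper, the `language:java` qualifier, and the page size are assumptions for illustration, and the endpoint itself requires authentication and applies its own query restrictions.

```python
from typing import List
from urllib.parse import urlencode

# GitHub REST endpoint for code search (requests must be authenticated).
GITHUB_CODE_SEARCH = "https://api.github.com/search/code"


def build_search_url(api_classes: List[str], language: str = "java", per_page: int = 10) -> str:
    """Build a code search URL whose query terms are the selected API class names."""
    # Hypothetical query shape: API names as keywords plus a language qualifier.
    query = " ".join(api_classes + [f"language:{language}"])
    return GITHUB_CODE_SEARCH + "?" + urlencode({"q": query, "per_page": per_page})


url = build_search_url(["MimeMessage", "Transport"])
print(url)
```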

DATASET & EXPERIMENTAL SETUP
• Java2s
• 150 code search queries
• Validation (Thung et al., ASE 2013)
• Evaluation (4 metrics)

PERFORMANCE OF RACK: ANSWERING RQ4
Performance Metric   Top-3    Top-5    Top-10
Top-K Accuracy       49.33%   62.67%   78.67%
MRR@K                0.17     0.17     0.17
MAP@K                30.39%   33.36%   34.92%
MR@K                 23.71%   33.48%   45.02%
• Provides about 79% Top-10 accuracy
• Mean average precision is about 35% and mean recall is about 45% (Top-10)
• Main strength: exploiting the wide coverage of API packages and libraries from Stack Overflow.
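For readers unfamiliar with the four metrics, the sketch below computes them per query using common textbook definitions. These definitions are an assumption and may differ in detail from the ones used in the evaluation (for example, average precision here is normalized by the number of hits found in the top K). All function names and the sample data are hypothetical.

```python
from typing import List, Set


def top_k_accuracy(results: List[List[str]], gold: List[Set[str]], k: int) -> float:
    """Fraction of queries with at least one gold API class in the top k."""
    hits = sum(1 for ranked, g in zip(results, gold) if any(a in g for a in ranked[:k]))
    return hits / len(results)


def reciprocal_rank(ranked: List[str], gold: Set[str], k: int) -> float:
    """1 / position of the first gold API class within the top k, else 0."""
    for pos, api in enumerate(ranked[:k], start=1):
        if api in gold:
            return 1.0 / pos
    return 0.0


def average_precision(ranked: List[str], gold: Set[str], k: int) -> float:
    """Mean of precision values taken at each position where a gold API appears."""
    hits, total = 0, 0.0
    for pos, api in enumerate(ranked[:k], start=1):
        if api in gold:
            hits += 1
            total += hits / pos
    return total / hits if hits else 0.0


def recall(ranked: List[str], gold: Set[str], k: int) -> float:
    """Share of gold API classes that appear within the top k."""
    return len(set(ranked[:k]) & gold) / len(gold) if gold else 0.0


ranked = ["MimeMessage", "File", "Transport", "Session"]
gold = {"MimeMessage", "Transport", "InternetAddress"}
print(reciprocal_rank(ranked, gold, 3), average_precision(ranked, gold, 3), recall(ranked, gold, 3))
```

Averaging `reciprocal_rank`, `average_precision`, and `recall` over all queries would give MRR@K, MAP@K, and MR@K under these assumed definitions.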

COMPARISON WITH EXISTING TECHNIQUES:
ANSWERING RQ7
Fig: Comparison using box plots

RACK: PROPOSED CODE SEARCH TOOL
1. Alice needs a code example for sending an email.
2. Alice submits an NL query to RACK.
3. RACK returns a ranked list of relevant API classes.
4. Alice selects API classes based on the metrics and her experience.
5. Alice makes a code search.
6. A relevant code example is returned.
7-8. RACK returns the Top-K relevant examples, and Alice checks them.
9. Alice can copy/paste and continue her work in the IDE.

RACK: TOOL DEMO
Click here to check the demo

TAKE-HOME MESSAGES
• Research problem: effective code search using natural language queries.
• Neither existing code search engines nor web search engines are sufficient for this.
• First step: reformulate the NL query into relevant API names, using crowdsourced knowledge from Stack Overflow.
• Second step: code search using the GitHub search API in the IDE.
• Our solution is packaged as an Eclipse plugin and a web service.
• Please give it a try, and let us know!

THANK YOU! QUESTIONS?
RACK page: http://homepage.usask.ca/~masud.rahman/rack
Email us: chanchal.roy@usask.ca or masud.rahman@usask.ca

HEURISTICS: REFORMULATION OF NL QUERY
• Keyword-API Co-occurrence (KAC)
  • Considers co-occurrence between query keywords and API classes in Stack Overflow questions and answers.
  • Co-occurrence may be due to semantic relevance or to chance.
  • Random co-occurrences are discarded using a threshold ($\delta$).
• Candidate API Selection
  $$L[K_i] = \{A_j \mid A_j \in A \wedge rank_{freq}(K_i \rightarrow A_j) \le \delta\}$$
  • $K_i$ denotes a keyword, $A_j$ denotes an API, and $K_i \rightarrow A_j$ denotes the association between them in SO Q & A
• KAC-Score Calculation
  • Normalized API score using co-occurrence frequency
  $$S_{KAC}(A_j, K_i) = 1 - \frac{rank(A_j, sortByFreq(L[K_i]))}{|L[K_i]|}$$
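A minimal sketch of the KAC heuristic described above, assuming the co-occurrence counts for one keyword are already available as a `Counter`. The function names are hypothetical, ranks are assumed to start at 0 so that the most frequent API scores 1.0, and ties are broken alphabetically; none of these details are confirmed by the slides.

```python
from collections import Counter
from typing import Dict, List


def candidate_apis(cooccurrence: Counter, delta: int) -> List[str]:
    """Candidate list for one keyword: its delta most frequently co-occurring APIs."""
    # Sort by descending frequency; break ties alphabetically for determinism.
    ranked = sorted(cooccurrence.items(), key=lambda kv: (-kv[1], kv[0]))
    return [api for api, _ in ranked[:delta]]


def kac_scores(cooccurrence: Counter, delta: int) -> Dict[str, float]:
    """Score each candidate as 1 - rank / (number of candidates), rank starting at 0."""
    candidates = candidate_apis(cooccurrence, delta)
    n = len(candidates)
    return {api: 1 - rank / n for rank, api in enumerate(candidates)}


# Hypothetical co-occurrence counts for the keyword "email".
counts = Counter({"MimeMessage": 40, "Transport": 30, "Session": 20, "File": 2})
scores = kac_scores(counts, delta=3)
print(scores)
```

With a threshold of 3, the rarely co-occurring `File` class is dropped before scoring, which mirrors the idea of discarding chance co-occurrences.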

HEURISTICS: REFORMULATION OF NL QUERY
• Keyword-Keyword Coherence (KKC)
  • Candidate APIs should be coherent with one another.
  • Two APIs are coherent if their co-occurring keywords are semantically relevant.
  • Keyword relevance is based on contextual words in Stack Overflow question titles. Total keyword pairs = ${}^{n}C_{2}$
• Candidate API Selection
  $$L_{coh}[K_i, K_j] = \{L[K_i] \cap L[K_j] \mid \cos(C_i, C_j) > \gamma\}$$
  • $C_i, C_j$ refer to the contextual word lists of $K_i, K_j$
• KKC-Score Calculation
  $$S_{KKC}(A_j, K_i, K_j) = \cos(C_i, C_j) \mid (K_i \rightarrow A_j) \wedge (K_j \rightarrow A_j)$$
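The coherence heuristic described above can be sketched as follows, assuming each keyword already has a candidate API set and a bag of contextual words from question titles. The function names, the sample data, and the choice to sum the cosine values over keyword pairs are assumptions of this sketch rather than details taken from the slides.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, Set


def cosine(c_i: Counter, c_j: Counter) -> float:
    """Cosine similarity between two bags of contextual words."""
    dot = sum(c_i[w] * c_j[w] for w in c_i.keys() & c_j.keys())
    norm_i = math.sqrt(sum(v * v for v in c_i.values()))
    norm_j = math.sqrt(sum(v * v for v in c_j.values()))
    return dot / (norm_i * norm_j) if norm_i and norm_j else 0.0


def kkc_scores(candidates: Dict[str, Set[str]], contexts: Dict[str, Counter],
               gamma: float = 0.0) -> Dict[str, float]:
    """Credit every API shared by two keywords whose contexts are similar enough."""
    scores: Dict[str, float] = {}
    for k_i, k_j in combinations(sorted(candidates), 2):
        sim = cosine(contexts[k_i], contexts[k_j])
        if sim <= gamma:
            continue  # keyword pair is not coherent enough
        for api in candidates[k_i] & candidates[k_j]:
            # Assumption: contributions from several keyword pairs are summed.
            scores[api] = scores.get(api, 0.0) + sim
    return scores


candidates = {
    "send": {"Transport", "MimeMessage"},
    "email": {"MimeMessage", "Session"},
    "java": {"File"},
}
contexts = {
    "send": Counter({"mail": 2, "smtp": 1}),
    "email": Counter({"mail": 1, "smtp": 2}),
    "java": Counter({"compile": 1}),
}
result = kkc_scores(candidates, contexts)
print(result)
```

Only `MimeMessage` receives a score in this example, because it is the one API shared by two keywords with overlapping contextual words.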
