An {Execution-Semantic, Content-and-Context}-Based Code-Clone {Detection, Analysis}
Toshihiro Kamiya
Future University Hakodate
kamiya@fun.ac.jp
Toshihiro Kamiya: An Execution-Semantic and Content-and-Context-Based Code-Clone Detection and Analysis,
Proceedings of the 9th IEEE International Workshop on Software Clones (IWSC'15), pp. 1-7 (2015).
TOC
● Problem/Motivation
● Outline of proposed method
● Example
● Algorithm of clone detection
● Visualization
● Implementation
● Preliminary experiment
The problems / Motivation
● In functional PLs, developers can define their own control structures.
– Analyzing only the pre-defined control statements is no longer sufficient to capture code patterns.
– E.g., if (C) A; else B; ⇔ myIf(C, lambdaA, lambdaB);
→ Inter-procedural analysis is needed.
● Dynamic dispatch makes inter-procedural analysis difficult.
– Esp. in functional + OO + dynamically typed PLs (no explicit type declarations → hard to resolve dispatches statically)
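The myIf example above can be illustrated with a minimal Python sketch. The names my_if, sign_builtin, and sign_custom are hypothetical and chosen only for illustration; they are not from the paper.

```python
def my_if(cond, then_fn, else_fn):
    """A user-defined 'if': picks one of two callables and runs it."""
    return then_fn() if cond else else_fn()


def sign_builtin(x):
    # Built-in control statement: visible to a statement-level analysis.
    if x < 0:
        return "negative"
    else:
        return "non-negative"


def sign_custom(x):
    # Same behavior, but the branching is hidden inside my_if(), so the
    # analysis has to follow the call to see the control structure.
    return my_if(x < 0, lambda: "negative", lambda: "non-negative")
```

Both functions behave identically, yet only the first contains an if statement in its own body.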
Idea
Detect clones from an execution trace!
● In a trace, dispatches and control structures have already been expanded (resolved).
● Detected clones are inter-procedural, type-3 clones.
Outline of proposed method
● Execution trace → Call tree → Contents and context (for each node)
[Figure: call tree of the example program, with nodes main(), os.listdir(), print_extensions_w_for_stmt(), print_extensions_w_map_func(), os.path.splitext(), print, str.join(), get_extensions(), map(), and lambda() at line 8; labels in the figure: contents, context, Clone detection, Clone analysis]
Example code
[Figure: source code of the example program]
● These two functions are a semantic clone (the code also contains a helper function).
– The same functionality: find the extensions of the given files and print them out.
● Shared items and differences
– Distinct loops: for vs. map
– In one function, all shared items are contained in the function itself.
– In the other, the shared items are spread across functions.
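The slide's source code was an image and is not available as text. The following is a hypothetical reconstruction in Python 3, based only on the function names that appear in the call tree and trace; the bodies, parameter names, and line layout are assumptions.

```python
import os


def print_extensions_w_for_stmt(filenames):
    # Variant 1: the loop is a for statement and every shared item
    # (os.path.splitext, print) is called directly in this function.
    for name in filenames:
        ext = os.path.splitext(name)[1]
        print(ext)


def get_extensions(filenames):
    # Helper: the loop is expressed with map() and a lambda.
    return list(map(lambda name: os.path.splitext(name)[1], filenames))


def print_extensions_w_map_func(filenames):
    # Variant 2: the shared items are spread across this function,
    # the helper, and the lambda.
    print("\n".join(get_extensions(filenames)))


def main(directory="."):
    filenames = os.listdir(directory)
    print_extensions_w_for_stmt(filenames)
    print_extensions_w_map_func(filenames)
```

Calling main() on a directory prints the extension of each entry twice, once per variant.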
Detection steps
Input: a call tree (← execution trace ← target program)
1. Extracts contents and context of each node
2. Identifies sets of contents-sharing nodes
3. Removes redundant nodes (filtering with contexts)
Input
…
call __main__//<module> runpy//_run_code 69
:
load_const __main__//<module> 0
load_const __main__//<module> 12
load_const __main__//<module> 21
load_const __main__//<module> 30
load_const __main__//<module> 39
call __main__//main __main__//<module> 63
:
call __main__//print_extensions_w_for_stmt __main__//main 24
: <list>
call posixpath//splitext __main__//print_extensions_w_for_stmt 25
: 'about.txt'
call genericpath//_splitext posixpath//splitext 18
: 'about.txt' '/' None '.'
load_const genericpath//_splitext 0
return genericpath//_splitext 139
: * 'about' '.txt'
return posixpath//splitext 21
: * 'about' '.txt'
call pygoat.hook/Out/write __main__//print_extensions_w_for_stmt 32
: '.txt'
return pygoat.hook/Out/write 15
call pygoat.hook/Out/write __main__//print_extensions_w_for_stmt 33
: '\n'
return pygoat.hook/Out/write 15
call posixpath//splitext __main__//print_extensions_w_for_stmt 25
: 'pygoat.data'
call genericpath//_splitext posixpath//splitext 18
: 'pygoat.data' '/' None '.'
load_const genericpath//_splitext 0
return genericpath//_splitext 139
: * 'pygoat' '.data'
return posixpath//splitext 21
: * 'pygoat' '.data'
call pygoat.hook/Out/write __main__//print_extensions_w_for_stmt 32
: '.data'
return pygoat.hook/Out/write 15
call pygoat.hook/Out/write __main__//print_extensions_w_for_stmt 33
: '\n'
return pygoat.hook/Out/write 15
call posixpath//splitext __main__//print_extensions_w_for_stmt 25
: 'greeting.md'
call genericpath//_splitext posixpath//splitext 18
: 'greeting.md' '/' None '.'
load_const genericpath//_splitext 0
return genericpath//_splitext 139
: * 'greeting' '.md'
return posixpath//splitext 21
: * 'greeting' '.md'
call pygoat.hook/Out/write __main__//print_extensions_w_for_stmt 32
: '.md'
return pygoat.hook/Out/write 15
call pygoat.hook/Out/write __main__//print_extensions_w_for_stmt 33
[Figure: Program → Execution trace → Call tree]
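How a trace becomes a call tree can be sketched as follows. This is a minimal illustration that assumes the trace has already been reduced to ("call", label) and ("return", label) pairs; the Node class, the build_call_tree name, and this simplified event format are assumptions, not the tool's actual data model.

```python
class Node:
    """One invocation in the call tree."""

    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent
        self.children = []


def build_call_tree(events):
    """Build a call tree from ("call", label) / ("return", label) pairs."""
    root = Node("<root>")
    current = root
    for kind, label in events:
        if kind == "call":
            # Descend: the callee becomes a child of the current node.
            child = Node(label, parent=current)
            current.children.append(child)
            current = child
        elif kind == "return" and current.parent is not None:
            # Ascend back to the caller.
            current = current.parent
    return root


events = [
    ("call", "main"),
    ("call", "print_extensions_w_for_stmt"),
    ("call", "splitext"), ("return", "splitext"),
    ("call", "write"), ("return", "write"),
    ("return", "print_extensions_w_for_stmt"),
    ("return", "main"),
]
tree = build_call_tree(events)
```

Other event kinds in the trace, such as load_const, are simply ignored by this sketch.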
Step 1.
1. Extracts contents and context of each node
[Figure: call tree of the example program]
● main()
– Context: (none; root of the call tree)
– Contents: get_extensions(), map(), lambda() at line 8, os.listdir(), os.path.splitext(), print, print_extensions_w_for_stmt(), print_extensions_w_map_func(), str.join()
● print_extensions_w_for_stmt()
– Context: main()
– Contents: os.path.splitext(), print
● print_extensions_w_map_func()
– Context: main()
– Contents: get_extensions(), map(), lambda() at line 8, os.path.splitext(), print, str.join()
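Step 1 can be sketched as below, taking a node's contents to be the labels of all its descendants and its context to be the labels of all its ancestors. The nesting of CALL_TREE is an assumption inferred from the contents listed for each node, and the extract function is a hypothetical helper; the depth limit used by the tool is not modeled.

```python
# Call tree as (label, children) tuples. The nesting is an assumption
# inferred from the per-node contents, not copied from the original figure.
CALL_TREE = ("main()", [
    ("os.listdir()", []),
    ("print_extensions_w_for_stmt()", [
        ("os.path.splitext()", []),
        ("print", []),
    ]),
    ("print_extensions_w_map_func()", [
        ("get_extensions()", [
            ("map()", [
                ("lambda() at line 8", [
                    ("os.path.splitext()", []),
                ]),
            ]),
        ]),
        ("str.join()", []),
        ("print", []),
    ]),
])


def extract(node, context=frozenset(), rows=None):
    """Collect (label, contents, context) for a node and its descendants.

    Returns (rows, contents of this node).
    """
    if rows is None:
        rows = []
    label, children = node
    contents = set()
    for child in children:
        # The child's context is this node's context plus this node.
        _, child_contents = extract(child, context | {label}, rows)
        contents |= {child[0]} | child_contents
    rows.append((label, frozenset(contents), frozenset(context)))
    return rows, contents


rows, _ = extract(CALL_TREE)
for label, contents, context in rows:
    if contents:  # skip leaf nodes, which have no contents
        print(label, sorted(context), sorted(contents))
```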
Step 2.
2. Identifies sets of contents-sharing nodes
● The contents of main(), print_extensions_w_for_stmt(), and print_extensions_w_map_func() all include os.path.splitext() and print.
● These three nodes therefore form a set of contents-sharing nodes.
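Step 2 can be illustrated with a brute-force stand-in for frequent item-set mining. This is not Apriori and not the tool's implementation: candidate item sets are simply the pairwise intersections of node contents. The function name contents_sharing_sets and the min_items threshold are assumptions.

```python
from itertools import combinations


def contents_sharing_sets(contents_of, min_items=2):
    """Map each shared item set to the nodes whose contents include it."""
    candidates = set()
    for a, b in combinations(contents_of, 2):
        shared = contents_of[a] & contents_of[b]
        if len(shared) >= min_items:
            candidates.add(frozenset(shared))
    return {
        items: {node for node, contents in contents_of.items()
                if items <= contents}
        for items in candidates
    }


contents_of = {
    "main()": {
        "get_extensions()", "map()", "lambda() at line 8", "os.listdir()",
        "os.path.splitext()", "print", "print_extensions_w_for_stmt()",
        "print_extensions_w_map_func()", "str.join()",
    },
    "print_extensions_w_for_stmt()": {"os.path.splitext()", "print"},
    "print_extensions_w_map_func()": {
        "get_extensions()", "map()", "lambda() at line 8",
        "os.path.splitext()", "print", "str.join()",
    },
}
sharing = contents_sharing_sets(contents_of)
```

The sketch yields two candidate sets. One of them pairs main() with print_extensions_w_map_func(), which is the kind of candidate the context filter in the next step is meant to discard.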
Step 3.
3. Removes redundant nodes (filtering with contexts)
● Candidate set: main(), print_extensions_w_for_stmt(), print_extensions_w_map_func()
● main() is included in the context of all the other nodes in the set ⇒ redundant
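One reading of the redundancy rule is sketched below: a node is dropped when it appears in the context of every other node in the candidate set. The function name remove_redundant and the dictionary layout are assumptions; the paper may define the filter more precisely.

```python
def remove_redundant(nodes, context_of):
    """Drop nodes that appear in the context of every other node in the set."""
    kept = set()
    for node in nodes:
        others = nodes - {node}
        redundant = bool(others) and all(
            node in context_of[other] for other in others)
        if not redundant:
            kept.add(node)
    return kept


context_of = {
    "main()": set(),
    "print_extensions_w_for_stmt()": {"main()"},
    "print_extensions_w_map_func()": {"main()"},
}
candidate = {
    "main()",
    "print_extensions_w_for_stmt()",
    "print_extensions_w_map_func()",
}
clone_class = remove_redundant(candidate, context_of)
```

For the example, main() is removed and the two print_extensions functions remain as the clone class.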
Detection result
A clone class: { print_extensions_w_map_func(), print_extensions_w_for_stmt() }
Shared items: { os.path.splitext(), print }
● print_extensions_w_for_stmt()
– Context: main()
– Contents: os.path.splitext(), print
● print_extensions_w_map_func()
– Context: main()
– Contents: get_extensions(), map(), lambda() at line 8, os.path.splitext(), print, str.join()
Visualization of the result: dagified (merged) by label (DAG = directed acyclic graph)
[Figure: graph with regions labeled Context and Contents; nodes: main(), print_extensions_w_for_stmt(), print_extensions_w_map_func(), get_extensions(), print, map(), lambda() at line 8, os.path.splitext()]
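Merging by label can be sketched as collecting the distinct (caller, callee) label pairs of a call tree, so that nodes with the same label collapse into one. The dagify function and the small tree below are assumptions for illustration. Note that with recursive programs this naive merge could produce cycles, which the sketch does not handle.

```python
def dagify(node, edges=None):
    """Collect the set of (caller label, callee label) edges of a call tree."""
    if edges is None:
        edges = set()
    label, children = node
    for child in children:
        edges.add((label, child[0]))
        dagify(child, edges)
    return edges


# A small (label, children) tree; the nesting is an illustrative assumption.
tree = ("main()", [
    ("print_extensions_w_for_stmt()", [
        ("os.path.splitext()", []),
        ("print", []),
    ]),
    ("print_extensions_w_map_func()", [
        ("get_extensions()", [
            ("map()", [
                ("lambda() at line 8", [("os.path.splitext()", [])]),
            ]),
        ]),
        ("print", []),
    ]),
])
edges = dagify(tree)
# os.path.splitext() and print now each have two incoming edges.
parents_of_print = {caller for caller, callee in edges if callee == "print"}
```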
Content-and-context analysis for triaging
● Clone class (a), shared items (b), distinct contents (or gap) (c)
● The distinct contents (c) share the same set of (sub-)contents (d) → (c) is another clone class.
● If (c) is merged before (a), (c) will no longer be a gap of (a).
[Figure: clone class with parts (a), (b), (c), and (d) marked; detected from markdown2's code (described later)]
Tool prototype
[Figure: tool pipeline]
● Target program + Inputs / Test cases
→ Execution (Python interpreter; debugging / profiling APIs) → Execution trace extraction → Execution trace
● Detection (Steps 1, 2, 3)
→ String balloon generation → String balloons
→ Frequent item-set mining (Apriori) → Similar sets of contents
→ Redundant context removal → Code clones
● Analysis
→ Visualization, Metrics calculation
● Input: Python source code
● Uses a frequent item-set mining algorithm / implementation
– Apriori (www.borgelt.net/apriori.html)
● Heuristics / optimizations
– Max. depth of contents from a target node (default 5)
– Max. number of content items of a candidate node (default 25)
● Filters out the nodes with large contents, i.e., nodes near the root of the call tree
– Removal of basic, primitive functions
– ...
[Figure: content-and-context clone on call graph]
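Trace extraction through Python's profiling API can be sketched with sys.setprofile. This is a minimal illustration, not the prototype's tracer: the record_calls helper and the (event, name) record format are assumptions, and arguments and return values are not captured.

```python
import sys


def record_calls(func, *args, **kwargs):
    """Run func and return (result, list of (event, function name) pairs)."""
    events = []

    def profiler(frame, event, arg):
        if event in ("call", "return"):
            # Python-level function: the name comes from the code object.
            events.append((event, frame.f_code.co_name))
        elif event in ("c_call", "c_return"):
            # Built-in function: arg is the C function object.
            events.append((event, getattr(arg, "__name__", repr(arg))))

    sys.setprofile(profiler)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.setprofile(None)
    return result, events


def helper():
    return len("abc")


def target():
    return helper()


result, events = record_calls(target)
```

The recorded list also contains the call that switches the profiler off, so a real tracer would filter its own bookkeeping calls.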
Preliminary experiment
● The tool was run for each value of the parameter “Max. number of content items of a candidate node”: 10, 15, …, 30.

Target product | Collection of exe. seq.              | # function calls | # unique labels
markdown2      | Running 144 unit tests               | 227K             | 1128
wxPython       | Invoking a sample program “pySketch” | 483K             | 1058
Results
● Exponential in the number of contents
● Too “peaky” for practical use
Summary
● Code-clone detection from dynamic information (an execution trace)
– Aimed at functional / dynamically typed PLs
● Context-and-content analysis for triage
● Algorithm, implementation, heuristics
● Preliminary experiment
– Targets: markdown2 and wxPython
– Peaky; sensitive to the parameter “Max. number of content items of a candidate node” → Needs refinement
Omitted here; refer to the paper:
● Threats to validity
● Future plan
