Intro Proposal Evaluation Conclusion
Models from Code or Code as a Model?
Antonio García-Domínguez, Dimitris Kolovos
Aston University, University of York
OCL’16
October 2nd, 2016
A. García, D. S. Kolovos Models from Code or Code as a Model? 1 / 13
Using codebases to drive software engineering tasks
Usual sequence of events
We had a business need
We invested in developing a program that covered it
Now we may need to:
Extract architecture or underlying business process?
Find bugs before they hit us?
Improve its design?
Migrate it to a new technology?
Code is the most accurate description: let’s use it
How do we extract knowledge?
Regexps do not scale to complex tasks
We need something that understands the language
The “extractor” is embedded within a process
Existing approaches
Some well-known reverse engineering tools (“extractors”)
Eclipse MoDisco: EMF-based, implements KDM/ASTM
JaMoPP: EMF-based, uses custom metamodel
Moose: FAMIX-based tool
Rascal: extracts partial JDT representation into a model
Issue: extractors are one-off processes
They produce standalone models: the code is no longer needed
However, current tools are not incremental: if the code is
changed, the extraction has to be redone from scratch
Issue: extractors are not query-aware
Some tasks will only access a small part of the codebase
Extracting the rest only adds overhead
Epsilon EMC JDT driver: use IDE indices as models
IDEs already extract representations all the time
Eclipse indexes Java projects in the background
Keeps fast pointers to classes / methods
Extra info available on demand through parsing
JDT indices are under active improvement:
See Stefan Xenos’ talk at EclipseCon NA’16
Cross-pollination from CDT project (C++)
Faster, more thorough indexing coming in future releases
Our proposal: Epsilon EMC JDT driver
Expose code as seen by IDE (JDT) as a model
On-demand loading + direct access to Java classes
Sources available on GitHub (epsilonlabs/emc-jdt)
Making it possible: Epsilon architecture
Core:
Epsilon Object Language (EOL) = JavaScript + OCL
Epsilon Model Connectivity (EMC)
Task-specific languages (extend EOL):
Model Validation (EVL), Code Generation (EGL), Model-to-model Transformation (ETL), ...
Technology-specific drivers (implement EMC):
Eclipse Modeling Framework (EMF), Schema-less XML, Eclipse Java Development Tools (JDT), CSV, ...
All Epsilon languages are based on EOL
EOL accesses models through EMC interfaces
By implementing the EMC interfaces, all Epsilon languages
can use JDT indices as models
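The layering above can be sketched in plain Java. The `ModelDriver` interface and both implementations below are hypothetical stand-ins, not the real EMC API (whose interfaces are considerably larger). The sketch only shows that a query written once against the interface does not care which technology backs it.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, heavily simplified stand-in for a model connectivity layer.
interface ModelDriver {
    // Returns the names of all elements of the given type, e.g. "TypeDeclaration".
    List<String> allNamesOfType(String type);
}

// Toy driver backed by an in-memory map, standing in for a loaded .xmi model.
class InMemoryDriver implements ModelDriver {
    private final Map<String, List<String>> contents;

    InMemoryDriver(Map<String, List<String>> contents) {
        this.contents = contents;
    }

    @Override
    public List<String> allNamesOfType(String type) {
        return contents.getOrDefault(type, List.of());
    }
}

// Toy driver that computes its answer on request, standing in for an IDE index.
class OnDemandDriver implements ModelDriver {
    int lookups = 0; // counts how often the backing "index" was consulted

    @Override
    public List<String> allNamesOfType(String type) {
        lookups++;
        return type.equals("TypeDeclaration") ? List.of("Plot", "Axis") : List.of();
    }
}

class DriverDemo {
    // A "query" written once against the interface, reusable with any driver.
    static int countTypes(ModelDriver model) {
        return model.allNamesOfType("TypeDeclaration").size();
    }

    public static void main(String[] args) {
        ModelDriver extracted = new InMemoryDriver(
            Map.of("TypeDeclaration", List.of("Plot", "Axis", "Legend")));
        System.out.println(countTypes(extracted));
        System.out.println(countTypes(new OnDemandDriver()));
    }
}
```

Adding a new technology then means writing one more implementation of the interface, while every language built on top keeps working unchanged.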
Configuration dialog for an EMC JDT model
allInstances() in EMC JDT
Reflection-based X.allInstances
1. Parse Java sources in the projects on the fly
2. Traverse JDT Document Object Model with ASTVisitor
3. Use Java reflection to fetch instances of X
4. Cache for later executions, if desired
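A minimal sketch of steps 2 to 4, assuming a toy tree in place of the JDT DOM. The node classes and `AllInstancesSketch` are hypothetical and use only the Java standard library; parsing (step 1) is skipped because the tree is built by hand.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy AST standing in for the JDT DOM; these node classes are hypothetical.
abstract class Node {
    final String name;
    final List<Node> children = new ArrayList<>();

    Node(String name, Node... kids) {
        this.name = name;
        for (Node kid : kids) {
            children.add(kid);
        }
    }
}

class UnitNode extends Node {
    UnitNode(String name, Node... kids) { super(name, kids); }
}

class TypeNode extends Node {
    TypeNode(String name, Node... kids) { super(name, kids); }
}

class MethodNode extends Node {
    MethodNode(String name) { super(name); }
}

class AllInstancesSketch {
    private final List<Node> roots;
    private final Map<String, List<Node>> cache = new HashMap<>();
    int traversals = 0; // how many full traversals were actually performed

    AllInstancesSketch(List<Node> roots) {
        this.roots = roots;
    }

    // Returns every node whose class, or one of its superclasses, is called typeName.
    List<Node> allInstances(String typeName) {
        List<Node> cached = cache.get(typeName);
        if (cached != null) {
            return cached; // step 4: reuse the result of an earlier traversal
        }
        traversals++;
        List<Node> found = new ArrayList<>();
        for (Node root : roots) {
            visit(root, typeName, found);
        }
        cache.put(typeName, found);
        return found;
    }

    // Step 2: depth-first walk, playing the role of the visitor.
    private void visit(Node node, String typeName, List<Node> found) {
        if (isKindOf(node, typeName)) {
            found.add(node);
        }
        for (Node child : node.children) {
            visit(child, typeName, found);
        }
    }

    // Step 3: inspect the runtime class reflectively instead of hard-coding type checks.
    private static boolean isKindOf(Object o, String typeName) {
        for (Class<?> c = o.getClass(); c != null; c = c.getSuperclass()) {
            if (c.getSimpleName().equals(typeName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Node unit = new UnitNode("Plot.java",
            new TypeNode("Plot", new MethodNode("draw"), new MethodNode("zoom")));
        AllInstancesSketch model = new AllInstancesSketch(List.of(unit));
        for (Node method : model.allInstances("MethodNode")) {
            System.out.println(method.name);
        }
    }
}
```

Matching on the class name at run time is what lets a single traversal routine serve any node type X, at the cost of visiting the whole tree on the first request.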
Special case: TypeDeclaration.allInstances
Searchable, lazy (no parsing unless looping over it)
Supports two new operations:
c.select(it|it.name = expr) uses JDT to search for the relevant
compilation unit, parses it and returns the matching DOM node
c.search(it|it.name = expr) works the same way, but returns the
raw index entry (a simpler JDT SourceType) instead of a parsed DOM node
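The difference between the two operations can be illustrated with a toy collection, assuming a name-to-file index. `IndexEntry`, `ParsedType` and `LazyTypeCollection` are hypothetical names, and "parsing" is reduced to a counter so the cost of each operation is visible.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a lightweight index entry (type name plus file).
class IndexEntry {
    final String typeName;
    final String file;

    IndexEntry(String typeName, String file) {
        this.typeName = typeName;
        this.file = file;
    }
}

// Hypothetical stand-in for a fully parsed declaration.
class ParsedType {
    final String typeName;
    final String file;

    ParsedType(IndexEntry entry) {
        this.typeName = entry.typeName;
        this.file = entry.file;
    }
}

class LazyTypeCollection {
    private final Map<String, IndexEntry> index = new LinkedHashMap<>();
    int parsedFiles = 0; // number of simulated parses so far

    void addToIndex(String typeName, String file) {
        index.put(typeName, new IndexEntry(typeName, file));
    }

    // "search": answer from the index alone; nothing is parsed.
    Optional<IndexEntry> search(String typeName) {
        return Optional.ofNullable(index.get(typeName));
    }

    // "select": locate the type through the index, then parse only the matching file.
    Optional<ParsedType> select(String typeName) {
        return search(typeName).map(this::parse);
    }

    private ParsedType parse(IndexEntry entry) {
        parsedFiles++;
        return new ParsedType(entry);
    }

    public static void main(String[] args) {
        LazyTypeCollection types = new LazyTypeCollection();
        types.addToIndex("Plot", "Plot.java");
        types.addToIndex("Axis", "Axis.java");
        types.search("Plot");
        types.select("Axis");
        System.out.println("Files parsed: " + types.parsedFiles);
    }
}
```

In this sketch neither operation touches files that do not match, which is the property a query-aware extractor needs.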
Case study: validate code against UML models
Overview
We are maintaining a library, and we need to check its
compliance with a UML model as it changes
Here, “compliance” means “must have all the classes and
methods in the UML model” (code may have more)
Which is faster/more convenient:
extracting a model with MoDisco first, or
checking it directly with the EMC JDT driver?
Experiment setup
Inputs
Source code: JFreeChart 1.0.17, 1.0.18 and 1.0.19
UML model for 1.0.17 extracted by Modelio
Tools
MoDisco 0.13.2 discoverer extracted one .xmi file per version
Epsilon interim (3d4408), emc-jdt interim (5b5ea)
Validation task: implemented in the Epsilon Validation Language
One version for MoDisco, two for EMC JDT (select/search)
Rules:
1. Each UML class has its corresponding Java class
2. Each UML method is implemented in that Java class
3. Each nested class obeys the two above rules
Result: four validation errors found in 1.0.18 and 1.0.19
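The three rules amount to a subset check. The sketch below expresses it in plain Java rather than EVL, assuming both the UML model and the code have already been reduced to maps from class name to method names, with nested classes represented by qualified names such as `Plot.Entry`. The `ComplianceCheck` class and the sample data are hypothetical and unrelated to JFreeChart.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class ComplianceCheck {
    // Reports every class or method that the UML model requires but the code lacks.
    // Extra classes and methods in the code are allowed and are not reported.
    static List<String> validate(Map<String, Set<String>> uml,
                                 Map<String, Set<String>> code) {
        List<String> errors = new ArrayList<>();
        // Sorted copies keep the error order deterministic.
        for (Map.Entry<String, Set<String>> expected : new TreeMap<>(uml).entrySet()) {
            String className = expected.getKey();
            Set<String> actualMethods = code.get(className);
            if (actualMethods == null) {
                errors.add("Missing class: " + className); // rule 1
                continue;
            }
            for (String method : new TreeSet<>(expected.getValue())) {
                if (!actualMethods.contains(method)) {
                    errors.add("Missing method: " + className + "." + method); // rule 2
                }
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> uml = Map.of(
            "Plot", Set.of("draw", "zoom"),
            "Plot.Entry", Set.of("getKey"), // nested class, checked like any other (rule 3)
            "Axis", Set.of("getRange"));
        Map<String, Set<String>> code = Map.of(
            "Plot", Set.of("draw", "render"),
            "Plot.Entry", Set.of("getKey", "getValue"));
        validate(uml, code).forEach(System.out::println);
    }
}
```

Only the lookup `code.get(className)` depends on where the code-side information comes from, which is the part that differs between a pre-extracted model and an on-demand driver.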
Performance results
MoDisco validates all versions in 112.2s, JDT/select in 71.92s, JDT/search in 36.52s
[Figure: bar chart of execution time (s), split into Extract, Load and Validation, for JFreeChart 1.0.17, 1.0.18 and 1.0.19 under MoDisco, JDT w/select and JDT w/search; vertical axis from 0 to 40 s]
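As a quick check on the totals reported above, the relative speedups can be computed directly. The `Speedups` class is only an illustrative helper; the three times are the ones stated on the slide.

```java
class Speedups {
    // Ratio of a baseline time to a candidate time; values above 1 mean the candidate is faster.
    static double speedup(double baselineSeconds, double candidateSeconds) {
        return baselineSeconds / candidateSeconds;
    }

    public static void main(String[] args) {
        double modisco = 112.2;   // total for all three versions, in seconds
        double jdtSelect = 71.92;
        double jdtSearch = 36.52;
        System.out.println("JDT/select vs MoDisco: " + speedup(modisco, jdtSelect));
        System.out.println("JDT/search vs MoDisco: " + speedup(modisco, jdtSearch));
        System.out.println("JDT/search vs JDT/select: " + speedup(jdtSelect, jdtSearch));
    }
}
```

With these totals, JDT/select is roughly 1.56 times faster than MoDisco and JDT/search roughly 3.07 times faster.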
Performance discussion
MoDisco: slower loading for faster validation
When using .xmi files, we load entire model into memory
Pro: with everything in memory, validation is faster
Con: huge models won’t fit in memory
We’ll need a store with on-demand loading (e.g. CDO)
On-demand loading can change performance profile
Amortisation of extraction costs depends on codebase
Frozen codebase (e.g. legacy systems):
Full extraction is quickly amortised
MoDisco is a better choice
Quickly changing codebase (e.g. actively developed systems):
Extracting on demand is usually better (models don’t live long)
EMC JDT is a better choice
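The amortisation argument can be made concrete with a simple cost model, assuming a one-off extraction cost and a constant cost per validation run for each approach. The `Amortisation` class and the numbers in `main` are hypothetical and are not measurements from the experiment.

```java
class Amortisation {
    // Smallest number of runs n for which extracting once and then validating
    // (extractOnce + n * runExtracted) costs no more than validating on demand
    // every time (n * runOnDemand). All arguments are in seconds.
    static long breakEvenRuns(double extractOnce, double runExtracted, double runOnDemand) {
        if (runOnDemand <= runExtracted) {
            throw new IllegalArgumentException(
                "On-demand runs are not slower, so extraction never pays off");
        }
        return (long) Math.ceil(extractOnce / (runOnDemand - runExtracted));
    }

    public static void main(String[] args) {
        // Hypothetical costs: 30 s to extract, 2 s per run on the extracted
        // model, 12 s per run when querying the code on demand.
        System.out.println(breakEvenRuns(30.0, 2.0, 12.0));
    }
}
```

Under this model, full extraction wins only if the code stays unchanged for at least the break-even number of runs, since any change forces a new extraction.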
Conclusion and future lines of work
Summary
Codebases are a valuable input for many SE tasks
Two options to query codebases:
Extract standalone models (MoDisco)
Use code directly as a model (EMC JDT)
EMC JDT is faster for changing codebases
Future work
Further optimisations to improve performance
Evaluate impact of future JDT versions
More filtering fields for searchable collections
More shorthand properties for common scenarios
Port approach to other languages (e.g. C++ through CDT)
End of the presentation
Questions?
@antoniogado
Extra features in EMC JDT
Shorthand properties
Quick access to commonly needed information
EMC PropertyGetter computes value on demand
FieldDeclaration: “name”
BodyDeclaration: “public”, “static”...
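One way such shorthand properties could work is sketched below, assuming declarations carry raw modifier flags. `Declaration` and `ShorthandGetter` are hypothetical, and the flags come from `java.lang.reflect.Modifier` as a standard-library stand-in for the modifiers of a JDT node.

```java
import java.lang.reflect.Modifier;

// Hypothetical declaration node holding raw modifier flags, as a compiler AST might.
class Declaration {
    final String identifier;
    final int modifierFlags;

    Declaration(String identifier, int modifierFlags) {
        this.identifier = identifier;
        this.modifierFlags = modifierFlags;
    }
}

class ShorthandGetter {
    // Resolves a shorthand property name to a value computed at call time,
    // so nothing has to be precomputed or stored on the node itself.
    static Object get(Declaration declaration, String property) {
        switch (property) {
            case "name":
                return declaration.identifier;
            case "public":
                return Modifier.isPublic(declaration.modifierFlags);
            case "static":
                return Modifier.isStatic(declaration.modifierFlags);
            default:
                throw new IllegalArgumentException("Unknown property: " + property);
        }
    }

    public static void main(String[] args) {
        Declaration max = new Declaration("MAX", Modifier.PUBLIC | Modifier.STATIC);
        System.out.println(get(max, "name"));
        System.out.println(get(max, "public"));
        System.out.println(get(max, "static"));
    }
}
```

A query can then read a property such as `public` directly, without decoding the flag bits itself.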