Distributed Model-to-Model
Transformation with ATL on MapReduce
Jordi CABOT
ICREA
Universitat Oberta de Catalunya
Amine BENELALLAM, Abel GOMEZ,
and Massimo TISI
AtlanMod team (Inria, Mines Nantes, Lina)
The 8th ACM SIGPLAN International Conference on Software Language Engineering (co-located with SPLASH), Oct 26 2015, Pittsburgh, USA
Context
Model Transformation
Transformation spec {
S::Square → T1::Triangle
S::Circle → T1::Octagon
....
}
[Figure: the transformation consumes the source models and the transformation spec, and produces the target model.]
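A minimal sketch of the idea on this slide, assuming a rule is simply a mapping from a source type to a target type. The `Element` class, the `RULES` table, and the `transform` function are hypothetical names introduced for illustration; they are not part of ATL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    kind: str   # metaclass name, e.g. "Square"
    ident: int  # identifier of the element in the model


# Hypothetical rule table mirroring the spec: source type -> target type.
RULES = {"Square": "Triangle", "Circle": "Octagon"}


def transform(source):
    """Produce one target element for each source element matched by a rule."""
    return [Element(RULES[e.kind], e.ident) for e in source if e.kind in RULES]


source = [Element("Square", 1), Element("Circle", 2), Element("Square", 3)]
target = transform(source)
```

Each source element is handled independently of the others here, which is the property that later makes it possible to split the work across machines.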
Why Distribute Model
Transformations?
Scalability issues in MTs
● Complex transformations taking hours to run
● Very Large Models (VLMs) that do not fit into the memory of a single machine
Increasing complexity of data & systems
● Frequent increase in scope between releases
● 900+ meta-classes and thousands of properties
● Models grow up to gigabytes in size
Distributing Model Transformation
[Figure: in a distributed environment, the source model (elements 1 to 6) is split across several workers; each worker consumes the transformation spec and part of the source model, and together they produce the target model.]
Why not use a GPL?
Using a General Purpose Language (GPL) for distributed MT:
1. Requires familiarity with concurrency theory
○ not common among MDE application developers
2. Introduces a new class of errors w.r.t. sequential programming
○ e.g. linked to task synchronization and shared data access
3. Requires complex analysis for performance optimization
Meet ATL-MR
Case Study: Analysis of Data-Flow in
Java Programs (TTC13 [1])
[1] T. Horn. The TTC 2013 Flowgraphs Case. arXiv preprint, arXiv:1312.0341, 2013.
Case Study: Analysis of Data-Flow in
Java Programs
int fact (int a) {
int r = 1;
while (a>0) {
r *= a--;
}
return r;
}
[Figure: three panels for fact: (a) Java code, (b) Control-Flow, (c) Data-Flow. Nodes are the method signature, its four statements, and the variables a and r; edges are labeled def, use, and cfNext/dfNext.]
Atlanmod Transformation Language
(ATL)
module ControlFlow2DataFlow;
create OUT : DataFlow from IN : ControlFlow;
rule SimpleStatment {
  from
    s : ControlFlow!SimpleStmt (
      not (s.def->isEmpty() and s.use->isEmpty())
    )
  to
    t : DataFlow!SimpleStmt (
      txt <- s.txt,
      dfNext <- s.computeNextDataFlows()
    )
}
[...]
Slide annotations: module, rule, input pattern, guard, output pattern, binding, ATL helper.
ATL Helper
helper context ControlFlow!FlowInstr def : computeNextDataFlows() : Sequence(ControlFlow!FlowInstr) =
  self.def->collect(d | self.users(d)
      ->reject(fi | if fi = self then not fi.isInALoop else false endif)
      ->select(fi | thisModule.isDefinedBy(fi, Sequence{fi}, self, Sequence{}, self.definers(d)->excluding(self))))
    ->flatten();

helper def : isDefinedBy(start : ControlFlow!FlowInstr, input : Sequence(ControlFlow!FlowInstr), end : ControlFlow!FlowInstr, visited : Sequence(ControlFlow!FlowInstr), forbidden : Sequence(ControlFlow!FlowInstr)) : Boolean =
  if input->exists(i | i = end) then true
  else let newInput : Sequence(ControlFlow!FlowInstr) = input->collect(i | i.cfPrev)->flatten()->reject(i | visited->exists(v | v = i) or forbidden->exists(f | f = i)) in
    if newInput->isEmpty() then false
    else thisModule.isDefinedBy(start, newInput, end, visited->union(newInput)->asSet()->asSequence(), forbidden)
    endif
  endif;
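The two helpers can be read as a backward walk over the control flow: a user of a variable is a data-flow successor of a definer if the definer is reachable backwards without crossing another definer. Below is a rough Python port, run on the fact example. It assumes that `users(d)` and `definers(d)` (not shown on the slides) return the instructions of the method that read or write `d`; the `FlowInstr` class and its field names are hypothetical, and the unused `start` parameter is dropped.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare by identity, like model elements
class FlowInstr:
    txt: str
    defs: list = field(default_factory=list)     # variables written (def)
    uses: list = field(default_factory=list)     # variables read (use)
    cf_prev: list = field(default_factory=list)  # control-flow predecessors
    in_loop: bool = False                        # isInALoop


def is_defined_by(inputs, end, visited, forbidden):
    """Walk backwards from `inputs`; True if `end` is reached without
    crossing a forbidden instruction (another definer of the variable)."""
    while True:
        if end in inputs:
            return True
        new_inputs = [p for i in inputs for p in i.cf_prev
                      if p not in visited and p not in forbidden]
        if not new_inputs:
            return False
        visited = visited + list(dict.fromkeys(new_inputs))
        inputs = new_inputs


def compute_next_data_flows(instr, program):
    result = []
    for d in instr.defs:
        users = [fi for fi in program if d in fi.uses]  # assumed users(d)
        others = [fi for fi in program
                  if d in fi.defs and fi is not instr]  # definers(d) minus self
        for fi in users:
            if fi is instr and not fi.in_loop:
                continue  # outside a loop, a statement cannot feed itself
            if is_defined_by([fi], instr, [], others):
                result.append(fi)
    return result


# The fact example: int fact(int a) { int r = 1; while (a>0) { r *= a--; } return r; }
m = FlowInstr("int fact(int a)", defs=["a"])
s1 = FlowInstr("int r = 1;", defs=["r"], cf_prev=[m])
w = FlowInstr("while (a>0)", uses=["a"], in_loop=True)
s2 = FlowInstr("r *= a--;", defs=["r", "a"], uses=["r", "a"], cf_prev=[w], in_loop=True)
ret = FlowInstr("return r;", uses=["r"], cf_prev=[w])
w.cf_prev = [s1, s2]
program = [m, s1, w, s2, ret]

df_next = {i.txt: [n.txt for n in compute_next_data_flows(i, program)] for i in program}
```

Like the ATL version, the port keeps duplicates after flattening, so the loop body appears twice among its own successors (once for `r`, once for `a`).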
ATL Execution Semantics: Match phase
[Figure: each element of the control-flow model of fact is matched by a rule: rule:Method for the method int fact(int a), and rule:Stmnt for each of the four statements.]
ATL Execution Semantics: Apply phase
[Figure, four animation steps: starting from the matches (rule:Method, rule:Stmnt ×4), the target elements are created one after another: first int fact(int a), then int r = 1;, while (a>0), r *= a--; and return r;.]
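The two phases can be sketched as two passes over the source model: the match phase records a trace link (rule, source, empty target) per matched element, and the apply phase fills in the target elements, resolving references through the trace links. This is a simplified illustration, not the ATL virtual machine; the `SOURCE` dictionary, the `next` reference, and the function names are hypothetical.

```python
# Hypothetical source model: id -> (type, text, ids of referenced elements).
SOURCE = {
    "m": ("Method", "int fact(int a)", ["s1"]),
    "s1": ("Stmt", "int r = 1;", ["w"]),
    "w": ("Stmt", "while (a>0)", ["s2", "ret"]),
    "s2": ("Stmt", "r *= a--;", ["w"]),
    "ret": ("Stmt", "return r;", []),
}

# Rule names as they appear on the slides.
RULES = {"Method": "rule:Method", "Stmt": "rule:Stmnt"}


def match_phase(source):
    """Create one trace link per matched source element; targets stay empty."""
    traces = {}
    for sid, (kind, _txt, _refs) in source.items():
        if kind in RULES:
            traces[sid] = {"rule": RULES[kind], "target": {"txt": None, "next": []}}
    return traces


def apply_phase(source, traces):
    """Initialize target elements; references are resolved via trace links."""
    for sid, trace in traces.items():
        _kind, txt, refs = source[sid]
        trace["target"]["txt"] = txt
        trace["target"]["next"] = [traces[r]["target"] for r in refs if r in traces]
    return [t["target"] for t in traces.values()]


traces = match_phase(SOURCE)
targets = apply_phase(SOURCE, traces)
```

Because all trace links exist before any binding is evaluated, a reference to an element transformed later (such as the loop back-edge) can still be resolved.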
MapReduce
[Figure: a counting example. Log records (Log0 to Log8) are divided into three splits (SPLIT1 to SPLIT3), processed by map1 to map3. In the map phase, each mapper emits <key,1> pairs for the keys +, * and X. Shuffle/sort groups the pairs by key. In the reduce phase, reducers red1 and red2 output <+,4>, <*,3> and <X,2>.]
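The counting example in the figure can be simulated in a few lines. This is an in-memory sketch of the programming model, not Hadoop code; the contents of the splits are assumed so that the totals match the figure.

```python
from collections import defaultdict

# Assumed split contents: one key per log record, three records per split.
SPLITS = [["+", "+", "*"], ["X", "+", "*"], ["X", "*", "+"]]


def map_fn(record):
    """Map: emit a <key, 1> pair for each record."""
    yield (record, 1)


def shuffle(pairs):
    """Shuffle/sort: group the values by key, keys in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_fn(key, values):
    """Reduce: sum the values emitted for one key."""
    return (key, sum(values))


def run(splits):
    pairs = [kv for split in splits for record in split for kv in map_fn(record)]
    return dict(reduce_fn(key, values) for key, values in shuffle(pairs).items())


counts = run(SPLITS)
```

In a real cluster the three splits would be mapped on different machines and the grouped keys distributed over the reducers; the sketch only shows the data flow.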
Why MapReduce for ATL?
● Well-suited for Write Once Read Many (WORM) data
● Two-phased execution model
Also MapReduce:
● Supports different types of inputs (XML, DB, Text)
● Handles machine failures, efficient communication, and performance issues
ATL & MapReduce
Alignment
Semantics Alignment
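[Figure: the ATL match and apply phases aligned with the MapReduce map and reduce phases.]
● Map (local match/apply): read model subset, match and apply rules locally, create trace properties
● Reduce (global resolve): read traces, resolve globally, save model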
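A minimal sketch of this alignment, assuming that each mapper creates target elements for its own split and records the bindings it cannot resolve locally as trace properties, and that a single reduce step then resolves them using all traces. The data layout, the `dfNext` values, and the function names are hypothetical and only illustrate the split of responsibilities.

```python
# Hypothetical splits: source id -> (text, ids of the dfNext source elements).
SPLIT_1 = {"m": ("int fact(int a)", ["w", "s2"]), "s1": ("int r = 1;", ["s2", "ret"])}
SPLIT_2 = {"w": ("while (a>0)", []), "s2": ("r *= a--;", ["s2", "ret", "w"]),
           "ret": ("return r;", [])}


def local_match_apply(split):
    """Map side: create targets, set attributes, defer cross-split references."""
    traces = []
    for sid, (txt, df_next_ids) in split.items():
        target = {"txt": txt, "dfNext": []}
        traces.append({"source": sid, "target": target,
                       "unresolved": {"dfNext": df_next_ids}})
    return traces


def global_resolve(all_traces):
    """Reduce side: replace deferred source ids by the matching target elements."""
    by_source = {t["source"]: t["target"] for t in all_traces}
    for trace in all_traces:
        for prop, ids in trace["unresolved"].items():
            trace["target"][prop] = [by_source[i] for i in ids]
    return by_source


# Each call to local_match_apply could run on a different map worker.
model = global_resolve(local_match_apply(SPLIT_1) + local_match_apply(SPLIT_2))
```

The mappers never need to see each other's splits; only the resolve step needs the complete set of traces.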
Control-Flow to Data-Flow in MapReduce:
Local Match/Apply
[Figure, five animation steps: the control-flow model of fact is divided between two mappers, map1 and map2. Each mapper matches its own elements (rule:Method, rule:Stmnt) and creates the corresponding target elements (int fact(int a), int r = 1;, while (a>0), r *= a--;, return r;). The dfNext references are not yet resolved at this stage.]
Control-Flow to Data-Flow in MapReduce:
Global Resolve
[Figure, three animation steps: reducers red1 and red2 resolve the pending dfNext references between the target elements created by the mappers, yielding the complete data-flow model.]
Extended Tracing Model
ATL-MR in Action
[Figure: ATL-MR execution on Hadoop.]
● Model elements objectUID_1 to objectUID_8 are stored in the Hadoop Distributed File System (HDFS): map1 reads objectUID_1 to objectUID_4, map2 reads objectUID_5 to objectUID_8. Each mapper also loads the transformation data.
● LMA mode [1]: map1 emits <rule1,traceUID1>, <rule2,traceUID2>, <rule2,traceUID3>, <rule1,traceUID4>; map2 emits <rule2,traceUID5>, <rule1,traceUID6>, <rule1,traceUID7>, <rule2,traceUID8>. The mappers save traces and partial models.
● shuffle/sort groups the pairs by rule: rule1 with traceUID1, traceUID4, traceUID6, traceUID7; rule2 with traceUID2, traceUID3, traceUID5, traceUID8.
● GR mode [2]: reducers red1 and red2 load traces and partial models, then save models.
[1] LMA: Local Match/Apply
[2] GR: Global Resolve
[3] ATL-MR: https://github.com/atlanmod/ATL_MR
Evaluation
Experiment I: Speed-up Curve
● 5 models extracted from automatically generated Java files:
○ similar size (~1500 LOC)
○ sequential transformation time ranges from 620 s to 778 s
● Run on an identical set of machines (m1.large) over Amazon Elastic MapReduce (EMR)
○ 10 times for each number of nodes
○ 280 hours of computation
● Almost linear speed-up up to 8 nodes
○ ~3 times faster on 8 nodes
Experiment II: Size/Speed-Up Correlation
● 5 models extracted from automatically generated Java files:
○ increasing size (13,500 to 105,000 LOC)
○ sequential transformation time ranges from 319 s to 17,998 s (~5 h)
● Run on a cluster of 12 instances built on top of OpenVC
○ 8 slaves
○ 4 machines orchestrating Hadoop/HBase
● Almost-linear speed-up for large models
○ Up to 6X faster on 8 nodes
● Speed-up increases with model size
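The reported figures can be put in perspective with the usual definitions of speed-up (sequential time divided by parallel time) and parallel efficiency (speed-up divided by number of nodes). The slides do not state which definitions were used, so these are assumptions, and so is applying the 6X figure to the largest model.

```python
def speedup(t_sequential, t_parallel):
    """Speed-up: how many times faster the parallel run is."""
    return t_sequential / t_parallel


def efficiency(speedup_value, nodes):
    """Parallel efficiency: fraction of the ideal (linear) speed-up."""
    return speedup_value / nodes


def parallel_time(t_sequential, speedup_value):
    """Parallel run time implied by a given speed-up."""
    return t_sequential / speedup_value


eff_exp1 = efficiency(3, 8)          # Experiment I: ~3x on 8 nodes
eff_exp2 = efficiency(6, 8)          # Experiment II: up to 6x on 8 nodes
t_largest = parallel_time(17998, 6)  # assumed: 6x applies to the 17,998 s model
```

Under these assumptions the efficiency rises from 0.375 in Experiment I to 0.75 in Experiment II, and the largest transformation would drop from about five hours to roughly 50 minutes.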
Challenges
Challenges in Distributing Model Transformation
● Fact I: Models might be densely interconnected & unbalanced
● Rule applications might not have the same complexity
○ Unable to guarantee a balanced workload; the MapReduce default scheduler is not enough
● Fact II: Persistence backends are not suited for R/W concurrency
○ Unable to parallelize the reduce phase
NeoEMF: an Extensible Persistence Backend
● Lazy loading and unloading
○ enabling transformation of big models
● Distributed storage and access
○ permitting the parallelization of the reduce phase
● Compliant with MapReduce
● Fail-safe (no data loss)
[Figure: NeoEMF architecture in three layers. Model manager: client code and model-based tools use EMF through the model access API. Persistence manager: NeoEMF, with a caching strategy, provides NeoEMF/Map, NeoEMF/Graph and NeoEMF/HBase behind the persistence API. Persistence backend: MapDB, GraphDB, and HBase with ZooKeeper, reached through the backend API.]
[1] NeoEMF: http://www.neoemf.com
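Lazy loading and unloading can be illustrated with a small cache placed in front of a key-value store: elements are fetched only on first access and the least recently used ones are dropped when memory is full. This is a generic sketch of the technique, not NeoEMF's API; the `LazyModel` class, its `get` method, and the dictionary standing in for the backend are all hypothetical.

```python
from collections import OrderedDict


class LazyModel:
    """Loads elements on demand and unloads the least recently used ones."""

    def __init__(self, backend, capacity):
        self.backend = backend      # stands in for a persistence backend
        self.capacity = capacity    # maximum number of elements kept in memory
        self.cache = OrderedDict()  # uid -> element, oldest access first
        self.loads = 0              # number of backend reads, for illustration

    def get(self, uid):
        if uid in self.cache:
            self.cache.move_to_end(uid)     # mark as recently used
            return self.cache[uid]
        self.loads += 1
        element = self.backend[uid]         # lazy load on first access
        self.cache[uid] = element
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # unload least recently used
        return element


backend = {f"objectUID_{i}": {"txt": f"element {i}"} for i in range(1, 9)}
model = LazyModel(backend, capacity=2)
for uid in ["objectUID_1", "objectUID_2", "objectUID_1", "objectUID_3"]:
    model.get(uid)
```

With a capacity of two, the four accesses above cause three backend reads and leave objectUID_1 and objectUID_3 in memory, so the in-memory footprint stays bounded regardless of model size.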
Future Work
1. Optimization of load balancing
○ efficient distribution of the input model over map workers
2. Parallelization of the Global Resolve phase and the transformation of Very
Large Models
○ integrating ATL-MR with NeoEMF/HBase
Conclusion
● We align Rule-based Model Transformation with the MapReduce execution
model
○ We introduce an execution semantics of ATL on top of MapReduce
○ We experimentally show the good scalability of our solution
● For ATL users: Keep the same syntax and embrace the Cloud
● For MapReduce users: Model Transformation as yet another high-level
language for MapReduce
Check us out on Github
https://github.com/atlanmod/ATL_MR
Questions
