SLE2015: Distributed ATL

Distributed Model-to-Model Transformation with ATL on MapReduce
Jordi CABOT
ICREA
Universitat Oberta de Catalunya
Amine BENELALLAM, Abel GOMEZ,
and Massimo TISI
AtlanMod team (Inria, Mines Nantes, Lina)
The 8th ACM SIGPLAN International Conference on Software Language Engineering (co-located with SPLASH), Oct 26 2015, Pittsburgh, USA

Model Transformation
Transformation spec {
  S::Square → T1::Triangle
  S::Circle → T1::Octagon
  ....
}
[Figure: the transformation consumes the source models and the transformation spec, and produces the target model]

Why Distribute Model Transformations?

Scalability issues in MTs
Complex Transformations
taking hours to run
Very Large Models (VLMs)
not fitting into the memory of
a single machine

● Frequent increase in scope between
releases
● 900+ metaclasses & thousands of
properties
● Models of up to several GB
Increasing complexity of data &
systems

Distributing Model Transformation
[Figure: in a distributed environment, the source model is partitioned across workers; each worker consumes the transformation spec and a part of the source model, and together they produce the target model]

Why not use a GPL?
Using a General Purpose Language (GPL) for distributed MT:
1. Requires familiarity with concurrency theory
○ not common among MDE application developers
2. Introduces a new class of errors w.r.t. sequential programming
○ e.g. linked to task synchronization and shared data access
3. Requires complex analysis for performance optimization

Case Study: Analysis of Data-Flow in
Java Programs (TTC13 [1])
[1] T. Horn. The TTC 2013 Flowgraphs Case. arXiv preprint, arXiv:1312.0341, 2013.

Case Study: Analysis of Data-Flow in
Java Programs
int fact (int a) {
  int r = 1;
  while (a>0) {
    r *= a--;
  }
  return r;
}
[Figure: (a) Java code, (b) Control-Flow, (c) Data-Flow of fact. Statements are linked by cfNext/dfNext edges and linked to the variables a and r by def and use edges]

AtlanMod Transformation Language (ATL)
module ControlFlow2DataFlow;                          -- module
create OUT : DataFlow from IN : ControlFlow;
rule SimpleStatment {                                 -- rule
  from                                                -- input pattern
    s : ControlFlow!SimpleStmt (
      not (s.def->isEmpty() and s.use->isEmpty())     -- guard
    )
  to                                                  -- output pattern
    t : DataFlow!SimpleStmt (
      txt <- s.txt,                                   -- binding
      dfNext <- s.computeNextDataFlows()              -- binding computed by an ATL helper
    )
}
[...]

ATL Helper
-- For each variable d defined by self, collects the users of d that self actually reaches.
helper context ControlFlow!FlowInstr def : computeNextDataFlows() : Sequence(ControlFlow!FlowInstr) =
  self.def->collect(d | self.users(d)
      ->reject(fi | if fi = self then not fi.isInALoop else false endif)
      ->select(fi | thisModule.isDefinedBy(fi, Sequence{fi}, self, Sequence{},
                                           self.definers(d)->excluding(self))))
    ->flatten();

-- Walks cfPrev backwards from input; true if end is reached without crossing a forbidden instruction.
helper def : isDefinedBy(start : ControlFlow!FlowInstr, input : Sequence(ControlFlow!FlowInstr),
    end : ControlFlow!FlowInstr, visited : Sequence(ControlFlow!FlowInstr),
    forbidden : Sequence(ControlFlow!FlowInstr)) : Boolean =
  if input->exists(i | i = end) then true
  else
    let newInput : Sequence(ControlFlow!FlowInstr) =
      input->collect(i | i.cfPrev)->flatten()
           ->reject(i | visited->exists(v | v = i) or forbidden->exists(f | f = i)) in
    if newInput->isEmpty() then false
    else thisModule.isDefinedBy(start, newInput, end,
           visited->union(newInput)->asSet()->asSequence(), forbidden)
    endif
  endif;
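To make the intent of the helper more concrete, here is a minimal Python sketch that derives data-flow successors for the fact example. It is not a port of the ATL code: it swaps the backward isDefinedBy search for a plain forward search over control-flow successors that stops at the next redefinition of the variable. The graph encoding (CF_NEXT, DEF, USE) and the function name df_next are assumptions for illustration, and the def/use sets are one reading of the Java code.

```python
from collections import deque

# Hypothetical encoding of the control-flow graph of `fact`.
CF_NEXT = {
    "int fact(int a)": ["int r = 1;"],
    "int r = 1;": ["while (a>0)"],
    "while (a>0)": ["r *= a--;", "return r;"],
    "r *= a--;": ["while (a>0)"],
    "return r;": [],
}
# Variables defined (written) and used (read) by each instruction.
DEF = {
    "int fact(int a)": {"a"},
    "int r = 1;": {"r"},
    "while (a>0)": set(),
    "r *= a--;": {"r", "a"},
    "return r;": set(),
}
USE = {
    "int fact(int a)": set(),
    "int r = 1;": set(),
    "while (a>0)": {"a"},
    "r *= a--;": {"r", "a"},
    "return r;": {"r"},
}

def df_next(node):
    """Instructions that read a value written by `node` before it is overwritten."""
    result = set()
    for var in DEF[node]:
        seen = set()
        queue = deque(CF_NEXT[node])
        while queue:
            cur = queue.popleft()
            if cur in seen:
                continue
            seen.add(cur)
            if var in USE[cur]:
                result.add(cur)           # cur reads the value written by node
            if var not in DEF[cur]:
                queue.extend(CF_NEXT[cur])  # value survives cur, keep searching
    return result
```

With this encoding, the statement r *= a--; is its own data-flow successor because it sits in a loop, which is the case the reject clause of the ATL helper treats specially.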

ATL Execution Semantics: Match phase
[Figure: every element of the control-flow model of fact is matched by a rule: rule:Method for int fact(int a), and rule:Stmnt for each of the four statements]

ATL Execution Semantics: Apply phase
[Figure, animated over four slides: the target elements appear progressively next to the matched source elements, first int fact(int a) for rule:Method, then int r = 1; while (a>0); r *= a--; and return r; for the rule:Stmnt matches]
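The two phases can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the ATL engine: the SOURCE dictionary, the "next" field and the function names are hypothetical. The match phase allocates one empty target element per matched source element and records it in a trace; the apply phase fills in the features and resolves references to source elements through the traces.

```python
# Hypothetical source model: two statements, the first referring to the second.
SOURCE = {
    "s1": {"txt": "int r = 1;", "next": ["s2"]},
    "s2": {"txt": "return r;", "next": []},
}

def match_phase(source):
    """Create one trace link (source id -> empty target element) per match."""
    traces = {}
    for uid in source:
        traces[uid] = {"txt": None, "dfNext": []}  # allocated, not yet initialized
    return traces

def apply_phase(source, traces):
    """Initialize target features; references are resolved through the traces."""
    for uid, element in source.items():
        target = traces[uid]
        target["txt"] = element["txt"]
        target["dfNext"] = [traces[ref] for ref in element["next"]]
    return list(traces.values())

traces = match_phase(SOURCE)
targets = apply_phase(SOURCE, traces)
```

Splitting the work this way means that, when the apply phase runs, every target element already exists, so a reference can always be resolved regardless of the order in which elements are processed.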

MapReduce
[Figure: MapReduce example counting the kinds of records in a log]
Input: records Log0 to Log8, divided into SPLIT1, SPLIT2 and SPLIT3, one split per mapper
Map phase:
map1 → <+,1> <+,1> <*,1>
map2 → <X,1> <+,1> <*,1>
map3 → <X,1> <*,1> <+,1>
shuffle/sort: <+,1> ×4, <*,1> ×3, <X,1> ×2
Reduce phase (red1, red2): <+,4>, <*,3>, <X,2>
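The counting example can be reproduced with a small in-memory Python sketch. It only mimics the map, shuffle/sort and reduce steps in a single process; the SPLITS contents are an assumed reading of the figure, and the function names are illustrative rather than Hadoop API calls.

```python
from collections import defaultdict

# Assumed content of the three splits (three log records each).
SPLITS = [["+", "+", "*"], ["X", "+", "*"], ["X", "*", "+"]]

def map_phase(split):
    """Emit a <key, 1> pair for every record of the split."""
    return [(record, 1) for record in split]

def shuffle(pairs):
    """Group the emitted values by key, with keys in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_phase(key, values):
    """Sum the values collected for one key."""
    return key, sum(values)

mapped = [pair for split in SPLITS for pair in map_phase(split)]
grouped = shuffle(mapped)
counts = dict(reduce_phase(key, values) for key, values in grouped.items())
```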

Why MapReduce for ATL?
● Well-suited for Write Once Read Many (WORM) data
● Two-phased execution model
Also MapReduce:
● Supports different types of inputs (XML, DB, Text)
● Handles machine failures, efficient communication, and performance issues

Semantics Alignment
Map: read model subset → local match/apply → create trace properties
Reduce: read traces → global resolve → save model
[Figure: the ATL match and apply phases aligned with the MapReduce map and reduce phases]
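A minimal Python sketch of this alignment follows. It is an illustration under assumptions, not the ATL-MR implementation: the SPLITS model, the trace layout and the function names are hypothetical, and a single reduce call stands in for the reducers. Each mapper sees only its own subset, so references to elements held elsewhere are kept as trace properties and resolved once all traces are available.

```python
from collections import defaultdict

# Hypothetical model split across two mappers; "next" holds ids of source elements.
SPLITS = [
    {"s1": {"txt": "int r = 1;", "next": ["s3"]}},
    {"s3": {"txt": "return r;", "next": []}},
]

def map_local_match_apply(split):
    """Map: match local elements, set primitive features, defer references."""
    for uid, element in split.items():
        target = {"txt": element["txt"], "dfNext": []}
        trace = {"source": uid, "target": target, "unresolved": list(element["next"])}
        yield "Stmnt", trace  # keyed by the name of the matching rule

def reduce_global_resolve(all_traces):
    """Reduce: with every trace visible, replace source ids by target elements."""
    by_source = {trace["source"]: trace["target"] for trace in all_traces}
    for trace in all_traces:
        trace["target"]["dfNext"] = [by_source[uid] for uid in trace["unresolved"]]
    return [trace["target"] for trace in all_traces]

# Shuffle: group the traces emitted by all mappers by rule name.
shuffled = defaultdict(list)
for split in SPLITS:
    for rule, trace in map_local_match_apply(split):
        shuffled[rule].append(trace)

model = reduce_global_resolve([t for traces in shuffled.values() for t in traces])
```

In this sketch the reference from s1 to s3 cannot be resolved in the map step because s3 lives in the other split, which is why the resolution is postponed to the reduce step.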

Control-Flow to Data-Flow in MapReduce: Local Match/Apply
[Figure, animated over three slides: the control-flow model of fact is split between two mappers, map1 and map2. Each mapper matches its own elements (rule:Method, rule:Stmnt) and creates the corresponding target elements; dfNext bindings are recorded for later resolution]
Control-Flow to Data-Flow in MapReduce: Global Resolve
[Figure, animated over three slides: two reducers, red1 and red2, read the traces and resolve the dfNext references between the target elements]

ATL-MR [3] in Action
[Figure: ATL-MR running on top of the Hadoop Distributed File System (HDFS)]
1. Load transformation data: map1 receives objectUID_1 to objectUID_4, map2 receives objectUID_5 to objectUID_8
2. LMA mode [1]:
map1 → <rule1,traceUID1> <rule2,traceUID2> <rule2,traceUID3> <rule1,traceUID4>
map2 → <rule2,traceUID5> <rule1,traceUID6> <rule1,traceUID7> <rule2,traceUID8>
save traces and partial models
3. shuffle/sort:
rule1 → traceUID1, traceUID4, traceUID6, traceUID7
rule2 → traceUID2, traceUID3, traceUID5, traceUID8
4. GR mode [2]: red1 and red2 load traces and partial models, then save models
[1] LMA: Local Match/Apply
[2] GR: Global Resolve
[3] ATL-MR: https://github.com/atlanmod/ATL_MR

Experiment I: Speed-up Curve
● 5 models extracted from
automatically generated Java files:
○ similar size (~1500 LOCs)
○ sequential transformation ranges from
620s to 778s
● Run on an identical set of machines
(m1.large) over Amazon Elastic
MapReduce (EMR)
○ 10 times for each number of nodes
○ 280 hours of computation
● Almost linear speed-up up to 8
nodes
○ ~3 times faster on 8 nodes

Experiment II: Size/Speed-Up Correlation
● 5 models extracted from automatically
generated Java files:
○ increasing size (13,500 to 105,000 LOCs)
○ sequential transformation ranges from 319s to
17,998s (~5h)
● Run on a cluster of 12 instances built on top of
OpenVZ
○ 8 slaves
○ 4 machines orchestrating Hadoop/HBase
● Almost-linear speed-up for large models
○ Up to 6X faster on 8 nodes
● Speed-up increases with model size

Challenges in Distributing Model Transformation
Fact I: Models might be densely interconnected & unbalanced
Rule applications might not have the same complexity
→ Unable to guarantee a balanced workload; the default MapReduce scheduler is not enough
Fact II: Persistence backends are not suited for R/W concurrency
→ Unable to parallelize the reduce phase

NeoEMF: an Extensible Persistence Backend
● Lazy loading and unloading
○ enabling transformation of big
models
● Distributed storage and access
○ permitting the parallelization of the
reduce phase
● Compliant with MapReduce
● Fail-safe (no data loss)
[Figure: NeoEMF architecture, top to bottom: Client Code and Model-based Tools; Model Access API; Model Manager (EMF); Persistence API; Persistence Manager (NeoEMF /Map, /Graph, /HBase, with a Caching Strategy); Backend API; Persistence Backend (MapDB, GraphDB, HBase, ZooKeeper)]
[1] NeoEMF: http://www.neoemf.com

Future Work
1. Optimization of load balancing
○ efficient distribution of the input model over map workers
2. Parallelization of the Global Resolve phase and the transformation of Very
Large Models
○ integrating ATL-MR with NeoEMF/HBase

Conclusion
● We align Rule-based Model Transformation with the MapReduce execution
model
○ We introduce an execution semantics of ATL on top of MapReduce
○ We experimentally show the good scalability of our solution
● For ATL users: Keep the same syntax and embrace the Cloud
● For MapReduce users: Model Transformation as yet another high-level
language for MapReduce

Check us out on Github
https://github.com/atlanmod/ATL_MR
