Scalable Machine Learning with Apache SystemML
Berthold Reinwald, Nakul Jindal
IBM
June 21st, 2016

Agenda
• What is Apache SystemML
• How to implement SystemML algorithms → data scientist
• How to run SystemML algorithms → user
• How does SystemML work → SystemML developer

What is Apache SystemML
• In a nutshell
  • a language for data scientists to implement scalable ML algorithms
  • 2 language variants: R-like and Python-like syntax
  • Strong foundation of linear algebra operations and statistical functions
  • Comes with 20+ pre-implemented algorithms
• Cost-based optimizer to compile execution plans
  • Depending on data characteristics (tall/skinny, short/wide; dense/sparse) and cluster characteristics
  • ranging from single node to clusters (MapReduce, Spark); hybrid plans
• APIs & Tools
  • Command line: hadoop jar, spark-submit, standalone Java app
  • JMLC: embed as library
  • Spark MLContext: Scala, Python, and Java
  • Tools
    • REPL (Scala Spark and pyspark)
    • Spark ML pipeline

Big Data Analytics - Characteristics
• Large number of models
• Large number of data points
• Large number of features
• Sparse data
• Large number/size of intermediates
• Large number of pairs
• Custom analytics

SystemML – Declarative ML
• Analytics language for data scientists ("The SQL for analytics")
  • Algorithms expressed in a declarative, high-level language DML with R-like syntax
  • Productivity of data scientists
• Enable
  • Solutions development
  • Tools
• Compiler
  • Cost-based optimizer to generate execution plans and to parallelize
    • based on data characteristics
    • based on cluster and machine characteristics
  • Physical operators for in-memory single node and cluster execution
• Performance & Scalability

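High-Level SystemML Architecture
[Figure: layered stack. DML scripts, written in DML (Declarative Machine Learning Language), pass through the Language, Compiler, and Runtime layers. The runtime targets either an in-memory single node (scale-up) or a Hadoop or Spark cluster (scale-out).]
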
Apache SystemML Incubator Project
• June 2015: SystemML open source announced at Spark Summit
• Sep. 2015: public GitHub repository
• Oct. 2015: 1st open source binary release (0.8.0)
• Nov. 2015: entered Apache incubation
  • http://systemml.apache.org/
  • https://github.com/apache/incubator-systemml
• Jan. 2016: SystemML 0.9.0 (1st Apache release)
• June 2016: SystemML 0.10.0 release

Apache SystemML Incubator
http://systemml.apache.org/
• Get SystemML
• Documentation
  • DML Reference Guide
  • Algorithms Guide
  • Running
• Community
  • JIRA server
  • GitHub

DML Language Reference Guide
https://apache.github.io/incubator-systemml/dml-language-reference.html

Sample Code
A = 1.0                                      # A is a double
X <- matrix("4 3 2 5 7 8", rows=3, cols=2)   # X = matrix of size 3,2; '<-' is assignment
Y = matrix(1, rows=3, cols=2)                # Y = matrix of size 3,2 with all 1s
b <- t(X) %*% Y                              # %*% is matrix multiply, t(X) is transpose
S = "hello world"
i = 0
# V, W, H and max_iteration are assumed to be defined earlier
while (i < max_iteration) {
  H = (H * (t(W) %*% (V/(W%*%H)))) / t(colSums(W))   # * is element-by-element multiply
  W = (W * ((V/(W%*%H)) %*% t(H))) / t(rowSums(H))
  i = i + 1;                                         # i is an integer
}
print (toString(H))                          # toString converts a matrix to a string
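
The while loop in the sample is a pair of multiplicative updates for a non-negative factorization V ≈ W H. As a rough illustration of what each DML line computes, here is a minimal pure-Python sketch. The helper names (matmul, transpose, elementwise, update, divergence) and the small matrices are made up for this example and are not part of SystemML; the divergence function is only an assumed way to monitor progress.

```python
import math

# Small dense-matrix helpers on lists of lists (illustration only).
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def elementwise(A, B, op):
    return [[op(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def update(V, W, H):
    """One pass of the two update lines from the DML loop."""
    # H = (H * (t(W) %*% (V / (W %*% H)))) / t(colSums(W))
    ratio = elementwise(V, matmul(W, H), lambda v, wh: v / wh)
    num = elementwise(H, matmul(transpose(W), ratio), lambda h, n: h * n)
    col_sums_w = [sum(col) for col in zip(*W)]      # one value per row of H
    H = [[x / col_sums_w[k] for x in row] for k, row in enumerate(num)]
    # W = (W * ((V / (W %*% H)) %*% t(H))) / t(rowSums(H)), using the new H
    ratio = elementwise(V, matmul(W, H), lambda v, wh: v / wh)
    num = elementwise(W, matmul(ratio, transpose(H)), lambda w, n: w * n)
    row_sums_h = [sum(row) for row in H]            # one value per column of W
    W = [[x / row_sums_h[k] for k, x in enumerate(row)] for row in num]
    return W, H

def divergence(V, W, H):
    """Generalized KL divergence between V and W H (assumed progress measure)."""
    WH = matmul(W, H)
    return sum(v * math.log(v / wh) - v + wh
               for rv, rwh in zip(V, WH) for v, wh in zip(rv, rwh))

# Made-up positive input and starting factors (rank 2).
V = [[4.0, 3.0, 2.0], [5.0, 7.0, 8.0], [1.0, 2.0, 3.0]]
W = [[1.0, 0.5], [0.5, 1.0], [1.0, 1.0]]
H = [[1.0, 2.0, 1.0], [2.0, 1.0, 2.0]]

before = divergence(V, W, H)
for _ in range(20):                                 # plays the role of max_iteration
    W, H = update(V, W, H)
after = divergence(V, W, H)
print(before, after)
```

The sketch keeps everything in memory on one machine; the point of DML is that the same two update lines can be compiled to single-node or distributed plans without rewriting them.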

Sample Code
source("nn/layers/affine.dml") as affine   # import a file in the "affine" namespace
[W, b] = affine::init(D, M)                # calls the init function, multiple return values
parfor (i in 1:nrow(X)) {                  # i iterates over 1 through num rows in X, in parallel
  for (j in 1:ncol(X)) {                   # j iterates over 1 through num cols in X
    # Computation ...
  }
}
write (M, fileM, format="text")            # M=matrix, fileM=file, also writes to HDFS
X = read (fileX)                           # fileX=file, also reads from HDFS
if (ncol (A) > 1) {
  # Matrix A is being sliced by a given range of columns
  A[,1:(ncol (A) - 1)] = A[,1:(ncol (A) - 1)] - A[,2:ncol (A)];
}

Sample Code
# user-defined function
interpSpline = function(
    double x, matrix[double] X, matrix[double] Y, matrix[double] K) return (double q) {
  i = as.integer(nrow(X) - sum(ppred(X, x, ">=")) + 1)
  # misc computation …
  q = as.scalar(qm)
}
# external function implemented in a Java class
eigen = externalFunction(Matrix[Double] A)
  return(Matrix[Double] eval, Matrix[Double] evec)
  implemented in (classname="org.apache.sysml.udf.lib.EigenWrapper", exectype="mem")

Sample Code (From LinearRegDS.dml*)
A = t(X) %*% X
b = t(X) %*% y
if (intercept_status == 2) {
  A = t(diag (scale_X) %*% A + shift_X %*% A [m_ext, ])
  A = diag (scale_X) %*% A + shift_X %*% A [m_ext, ]
  b = diag (scale_X) %*% b + shift_X %*% b [m_ext, ]
}
A = A + diag (lambda)
print ("Calling the Direct Solver...")
beta_unscaled = solve (A, b)
* https://github.com/apache/incubator-systemml/blob/master/scripts/algorithms/LinearRegDS.dml#L133
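
To make the direct-solve path concrete, the following pure-Python sketch builds the same normal equations (A = t(X) X plus a regularization term on the diagonal, b = t(X) y) and solves them. It is a simplified illustration under stated assumptions: it skips the intercept rescaling branch, uses a single scalar lam where the script uses a lambda vector, and the function names, the Gaussian-elimination solver, and the data are all made up for the example.

```python
def normal_equations(X, y, lam):
    """Return A = t(X) X + lam * I and b = t(X) y for a list-of-rows X."""
    n = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(n)]
    return A, b

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]    # augmented copy
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                  # back substitution
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

# Made-up data generated from y = 1 + 2*x; the first column is the intercept.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
y = [3.0, 5.0, 7.0, 9.0]

A, b = normal_equations(X, y, lam=0.0)
beta = solve(A, b)
print(beta)   # close to [1.0, 2.0]
```

Forming t(X) X costs one pass over the data and yields a small features-by-features system, which is why a direct solve is attractive when the number of features is modest.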

DML Editor Support
• Very rudimentary editor support
• Bit of shameless self-promotion:
  • Atom: hackable text editor
    • Install package - https://atom.io/packages/language-dml
      • From GUI - http://flight-manual.atom.io/using-atom/sections/atom-packages/
      • Or from command line: apm install language-dml
    • Rudimentary snippet-based completion of builtin functions
  • Vim
    • Install package - https://github.com/nakul02/vim-dml
    • Works with Vundle (vim package manager)
• There is an experimental Zeppelin Notebook integration with DML
  • https://issues.apache.org/jira/browse/SYSTEMML-542
  • Available as a docker image to play with - https://hub.docker.com/r/nakul02/incubator-zeppelin/
• Please send feedback when using these, requests for features, bugs
  • I'll work on them when I can

SystemML Algorithms

Category | Description
Descriptive Statistics | Univariate; Bivariate; Stratified Bivariate
Classification | Logistic Regression (multinomial); Multi-Class SVM; Naïve Bayes (multinomial); Decision Trees; Random Forest
Clustering | k-Means
Regression | Linear Regression: system of equations; CG (conjugate gradient descent)
Regression | Generalized Linear Models (GLM). Distributions: Gaussian, Poisson, Gamma, InverseGaussian, Binomial, Bernoulli. Links for all distributions: identity, log, sq. root, inverse, 1/μ². Links for Binomial / Bernoulli: logit, probit, cloglog, cauchit
Regression | Stepwise: Linear; GLM
Dimension Reduction | PCA
Matrix Factorization | ALS: direct solve; CG (conjugate gradient descent)
Survival Models | Kaplan Meier Estimate; Cox Proportional Hazard Regression
Predict | Algorithm-specific scoring
Transformation (native) | Recoding, dummy coding, binning, scaling, missing value imputation

Documentation: https://apache.github.io/incubator-systemml/algorithms-reference.html
Scripts: /usr/SystemML/systemml-0.10.0-incubating/scripts/algorithms/

Running / Invoking SystemML
• Command line
  • Standalone (Java application in single JVM, in bin folder)
  • Spark (spark-submit, in scripts folder)
  • hadoop command line
• APIs (MLContext)
  • Scala, e.g. run from Spark shell
  • Python, e.g. run from PySpark
  • Java
• In-Memory

MLContext API – Example Usage
import org.apache.sysml.api.MLContext

val ml = new MLContext(sc)
val X_train = sc.textFile("amazon0601.txt")
  .filter(!_.startsWith("#"))                  // skip comment lines
  .map(_.split("\t") match { case Array(prod1, prod2) => (prod1.toInt, prod2.toInt, 1.0) })
  .toDF("prod_i", "prod_j", "x_ij")            // toDF needs the SQLContext implicits (in scope by default in spark-shell)
  .filter("prod_i < 5000 AND prod_j < 5000")   // change to a smaller number
  .cache()
MLContext API – Example Usage
val pnmf =
"""
# data & args
X = read($X)
rank = as.integer($rank)
# Computation ....
write(negloglik, $negloglikout)
write(W, $Wout)
write(H, $Hout)
"""
ml.registerInput("X", X_train)
ml.registerOutput("W")
ml.registerOutput("H")
ml.registerOutput("negloglik")
val outputs = ml.executeScript(pnmf, Map("maxiter" -> "100", "rank" -> "10"))
// getScalarDouble is a helper that is not shown on the slide
val negloglik = getScalarDouble(outputs, "negloglik")

Run LinReg CG from Spark Shell (MLContext)

Run SystemML in ML Pipeline

End-to-end on Spark … in Code
// The slide brackets the steps below into an SQL part and an ML part.

// Twitter data
import org.apache.spark.sql._
val ctx = new org.apache.spark.sql.SQLContext(sc)
val tweets = ctx.jsonFile("hdfs:/twitter/decahose")
tweets.registerAsTable("tweetTable")

// Explore data in SQL
ctx.sql("SELECT text FROM tweetTable LIMIT 5").collect.foreach(println)
ctx.sql("SELECT lang, COUNT(*) AS cnt FROM tweetTable " +
  "GROUP BY lang ORDER BY cnt DESC LIMIT 10").collect.foreach(println)

// Data set
val texts = ctx.sql("SELECT text FROM tweetTable").map(_.head.toString)

// Training set
def featurize(str: String): Vector = { ... }
val vectors = texts.map(featurize).toDF.cache()
val mcV = new MatrixCharacteristics(vectors.count, vocabSize, 1000, 1000)
val V = RDDConvertUtilsExt(sc, vectors, mcV, false, "_1")

// Topic modeling
val ml = new com.ibm.bi.dml.api.MLContext(sc)   // package name from before the move to org.apache.sysml
ml.registerInput("V", V, mcV)
ml.registerOutput("W")
ml.registerOutput("H")
val args = Array(numTopics, numGNMFIter)
val out = ml.execute("GNMF.dml", args)

// Get topics
val W = out.getDF("W")
val H = out.getDF("H")
def getWords(r: Row): Array[(String, Double)] = { ... }
val topics = H.rdd.map(getWords)

SystemML Architecture
Language
• R-like syntax
• Linear algebra, statistical functions, control structures, etc.
• User-defined & external functions
• Parsing
  • Statement blocks & statements
  • Program analysis, type inference, dead code elimination
High-Level Operator (HOP) Component
• Dataflow in DAGs of operations on matrices, frames, and scalars
• Choosing from alternative execution plans based on memory and cost estimates: operator ordering & selection; hybrid plans
Low-Level Operator (LOP) Component
• Low-level physical execution plan (LOP DAGs) over key-value pairs
• "Piggybacking" operations into a minimal number of Map-Reduce jobs
Runtime
• Hybrid runtime
  • CP: single machine operations & orchestrate jobs
  • MR: generic Map-Reduce jobs & operations
  • SP: Spark jobs
• Numerically stable operators
• Dense / sparse matrix representation
• Multi-level buffer pool (caching) to evict in-memory objects
• Dynamic recompilation for initial unknowns
[Figure: component stack. APIs: Command Line, JMLC, Spark MLContext. Compiler: Parser/Language, High-Level Operators, Low-Level Operators, cost-based optimizations. Runtime: Control Program, Runtime Program, Recompiler, ParFor Optimizer/Runtime, Buffer Pool, CP / Spark / MR instructions, Generic MR Jobs, MatrixBlock Library (single/multi-threaded), Mem/FS IO, DFS IO.]

SystemML Compilation Chain
[Figure: compilation chain; the slide shows these sample generated instructions:]
CP + b sb _mVar1
SPARK mapmm X.MATRIX.DOUBLE _mVar1.MATRIX.DOUBLE _mVar2.MATRIX.DOUBLE RIGHT false NONE
CP * y _mVar2 _mVar3

Selected Algebraic Simplification Rewrites

Name | Dynamic Pattern
Remove Unnecessary Indexing | X[a:b,c:d] = Y → X = Y iff dims(X)=dims(Y); X = Y[, 1] → X = Y iff ncol(Y)=1
Remove Empty Matrix Multiply | X%*%Y → matrix(0,nrow(X),ncol(Y)) iff nnz(X)=0|nnz(Y)=0
Remove Unnecessary Outer Product | X*(Y%*%matrix(1,...)) → X*Y iff ncol(Y)=1
Simplify Diag Aggregates | sum(diag(X)) → trace(X) iff ncol(X)>1 (X is a square matrix, so diag(X) extracts its diagonal)
Simplify Matrix Mult Diag | diag(X)%*%Y → X*Y iff ncol(X)=1&ncol(Y)=1
Simplify Diag Matrix Mult | diag(X%*%Y) → rowSums(X*t(Y)) iff ncol(Y)>1
Simplify Dot Product Sum | sum(X^2) → t(X)%*%X iff ncol(X)=1

Name | Static Pattern
Remove Unnecessary Operations | t(t(X)), X/1, X*1, X-0 → X; matrix(1,)/X → 1/X; rand(,min=-1,max=1)*7 → rand(,min=-7,max=7)
Binary to Unary | X+X → 2*X; X*X → X^2; X-X*Y → X*(1-Y)
Simplify Diag Aggregates | trace(X%*%Y) → sum(X*t(Y))
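
A few of these rewrites can be checked numerically on a tiny example. The sketch below is plain Python with made-up matrices and helper names (it is not SystemML code); it compares the left and right side of trace(X%*%Y) → sum(X*t(Y)), diag(X%*%Y) → rowSums(X*t(Y)), and sum(X^2) → t(X)%*%X.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]          # 2 x 3
Y = [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]     # 3 x 2

# Left-hand sides: materialize the full product X %*% Y first.
XY = matmul(X, Y)
trace_direct = sum(XY[i][i] for i in range(len(XY)))
diag_direct = [XY[i][i] for i in range(len(XY))]

# Right-hand sides: element-wise product X * t(Y), then aggregate.
elem = [[x * y for x, y in zip(rx, ry)] for rx, ry in zip(X, transpose(Y))]
trace_rewritten = sum(sum(row) for row in elem)
diag_rewritten = [sum(row) for row in elem]

# Dot-product rewrite for a column vector v: sum(v^2) versus t(v) %*% v.
v = [[1.0], [2.0], [3.0]]
sum_sq = sum(x[0] ** 2 for x in v)
dot = matmul(transpose(v), v)[0][0]

print(trace_direct, trace_rewritten)   # 212.0 212.0
print(diag_direct, diag_rewritten)     # [58.0, 154.0] [58.0, 154.0]
print(sum_sq, dot)                     # 14.0 14.0
```

For an n x k matrix X and a k x n matrix Y, the product needs n·n·k multiplications while the element-wise form needs only n·k, which is why the trace and diag rewrites pay off when only the diagonal is consumed.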
A Data Scientist – Linear Regression
Model: $Xw \approx y$
• $X$: explanatory / independent variables
• $y$: predicted / dependent variable
• $w$: model
Optimization problem: $w = \arg\min_w \|Xw - y\|^2 + \lambda \|w\|^2$
Conjugate Gradient Method:
• Start off with the (negative) gradient
• For each step
  1. Move to the optimal point along the chosen direction;
  2. Recompute the gradient;
  3. Project it onto the subspace conjugate* to all prior directions;
  4. Use this as the next direction
(* conjugate = orthogonal given $A$ as the metric, with $A = X^\top X + \lambda I$)
[Figure: CG loop. Initialize and pick the initial direction, then iterate until convergence: step size, update w, next direction, accuracy measures.]
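
The four conjugate gradient steps can be sketched in a few lines. The version below is a minimal pure-Python illustration, assuming a scalar regularizer lam and a small in-memory X; the function name cg_ridge, the tolerance, and the data are made up for this example, and the SystemML script should be consulted for the actual implementation.

```python
def cg_ridge(X, y, lam, max_iter=100, tol=1e-12):
    """Minimize ||Xw - y||^2 + lam * ||w||^2 by conjugate gradient."""
    n = len(X[0])

    def apply_A(p):
        # (t(X) X + lam I) p, computed as t(X) (X p) + lam p without forming t(X) X
        Xp = [sum(xi * pi for xi, pi in zip(row, p)) for row in X]
        return [sum(row[j] * v for row, v in zip(X, Xp)) + lam * p[j] for j in range(n)]

    w = [0.0] * n                                    # initialize
    r = [-sum(row[j] * yi for row, yi in zip(X, y)) for j in range(n)]  # gradient at w = 0
    p = [-ri for ri in r]                            # initial direction: negative gradient
    norm_r2 = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if norm_r2 <= tol:                           # converged
            break
        q = apply_A(p)
        alpha = norm_r2 / sum(pi * qi for pi, qi in zip(p, q))   # 1. optimal step size
        w = [wi + alpha * pi for wi, pi in zip(w, p)]            #    update w
        r = [ri + alpha * qi for ri, qi in zip(r, q)]            # 2. recompute the gradient
        new_norm_r2 = sum(ri * ri for ri in r)
        beta = new_norm_r2 / norm_r2                             # 3. conjugacy correction
        p = [-ri + beta * pi for ri, pi in zip(r, p)]            # 4. next direction
        norm_r2 = new_norm_r2
    return w

# Made-up data generated from y = 1 + 2*x; the first column is the intercept.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
y = [3.0, 5.0, 7.0, 9.0]
w = cg_ridge(X, y, lam=0.0)
print(w)   # close to [1.0, 2.0]
```

Each iteration touches X only through two matrix-vector products, X p and t(X) (X p), so the method never needs the features-by-features matrix A in memory.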
SystemML – Run LinReg CG on Spark
[Figure: execution plans chosen for LinReg CG as X grows; y is always 100M x 1. Setup: 20 GB driver on 16 cores, 6 x 55 GB executors.]

X | Size | Plan
100M x 10 | 8 GB | Multithreaded single node (tMMp)
100M x 100 | 80 GB | Hybrid plan with RDD caching and fused operator: x.persist(); X.mapValues(tMMp).reduce(); executors hold X in the RDD cache
100M x 1,000 | 800 GB | Hybrid plan with RDD out-of-core (spilling) and fused operator: x.persist(); X.mapValues(tMMp).reduce()
100M x 10,000 | 8 TB | Hybrid plan with RDD out-of-core and different operators: x.persist(); 2 MxV mult (Mv, tvM) with broadcast, mapToPair, and reduceByKey

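
The four data sizes on this slide follow from simple arithmetic. The sketch below assumes dense storage at 8 bytes per cell (double precision) and decimal units (1 GB = 10^9 bytes); both are assumptions made for the illustration, and the function name is hypothetical.

```python
def dense_size_bytes(rows, cols, bytes_per_cell=8):
    """Size of a dense matrix, assuming one 8-byte double per cell."""
    return rows * cols * bytes_per_cell

ROWS = 100_000_000                       # 100M observations
sizes_gb = {}
for cols in (10, 100, 1_000, 10_000):
    sizes_gb[cols] = dense_size_bytes(ROWS, cols) / 1e9
    print(f"100M x {cols}: {sizes_gb[cols]:,.0f} GB")

# Total executor memory in the setup on the slide: 6 executors x 55 GB each.
executor_memory_gb = 6 * 55
print(f"executor memory: {executor_memory_gb} GB")
```

Under these assumptions the 80 GB input fits within the 330 GB of total executor memory, while the 800 GB and 8 TB inputs do not, which is consistent with the slide switching from RDD caching to out-of-core plans. Not all executor memory is available for caching, so the comparison is only a rough guide.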
LinReg CG for varying Data
[Figure: execution time in secs (log scale) by data size.]

Plan | 8 GB (100M x 10) | 80 GB (100M x 100) | 800 GB (100M x 1K) | 8 TB (100M x 10K)
CP+Spark | 21 | 92 | 2,065 | 40,395
Spark | 76 | 124 | 2,159 | 40,130
CP+MR | 24 | 277 | 2,613 | 41,006

Annotations on the chart:
• 8 GB: single node MT avoids Spark Ctx & distributed ops (3.6x)
• 80 GB: hybrid plan & RDD caching (3x)
• 800 GB: out of core (1.2x)
• 8 TB: fully utilized

Note
• Driver with 20 GB, 16c
• 6 executors, each 55 GB, 24c
• Convergence in 3-4 iterations
• SystemML as of 10/2015

Takeaways
• Cost-based optimization is important
• Hybrid execution plans benefit especially medium-sized data sets
• Aggregated in-memory data sets are the sweet spot for Spark, esp. for iterative algorithms
• Graceful degradation for out-of-core

Apache SystemML - Summary
• Cost-based compilation of machine learning algorithms generates execution plans
  • for single-node in-memory, cluster, and hybrid execution
  • for varying data characteristics:
    • varying number of observations (1,000s to 10s of billions)
    • varying number of variables (10s to 10s of millions)
    • dense and sparse data
  • for varying cluster characteristics (memory configurations, degree of parallelism)
• Out-of-the-box, scalable machine learning algorithms
  • e.g. descriptive statistics, regression, clustering, and classification
• "Roll-your-own" algorithms
  • Enable programmer productivity (no worry about scalability, numeric stability, and optimizations)
  • Fast turn-around for new algorithms
• Higher-level language shields algorithm development investment from platform progression
  • Yarn for resource negotiation and elasticity
  • Spark for in-memory, iterative processing

Roadmap
• Algorithms
  • kNN, word2vec, non-linear SVM, etc.
  • Deep learning
• Engine
  • Compressed Linear Algebra
  • Code Gen
  • Extensions for Deep Learning
  • GPU backend
• Usability
  • DML notebook
  • Language integration
  • API cleanup

Research Papers
(The bracketed tag after each entry is the topic label shown next to it on the slide.)
• Ahmed Elgohary, Matthias Boehm, Peter J. Haas, Frederick R. Reiss, Berthold Reinwald: Compressed Linear Algebra for Large Scale Machine Learning. Conditional accept at VLDB 2016. [Compression]
• Matthias Boehm, Michael W. Dusenberry, Deron Eriksson, Alexandre V. Evfimievski, Faraz Makari Manshadi, Niketan Pansare, Berthold Reinwald, Frederick R. Reiss, Prithviraj Sen, Arvind C. Surve, Shirish Tatikonda: SystemML: Declarative Machine Learning on Spark. VLDB 2016. [1st paper on Spark]
• Botong Huang, Matthias Boehm, Yuanyuan Tian, Berthold Reinwald, Shirish Tatikonda, Frederick R. Reiss: Resource Elasticity for Large-Scale Machine Learning. SIGMOD Conference 2015: 137-152. [Resource Elasticity]
• Arash Ashari, Shirish Tatikonda, Matthias Boehm, Berthold Reinwald, Keith Campbell, John Keenleyside, P. Sadayappan: On optimizing machine learning workloads via kernel fusion. PPOPP 2015: 173-182. [GPU]
• Sebastian Schelter, Juan Soto, Volker Markl, Douglas Burdick, Berthold Reinwald, Alexandre V. Evfimievski: Efficient sample generation for scalable meta learning. ICDE 2015: 1191-1202. [Sampling]
• Matthias Boehm, Douglas R. Burdick, Alexandre V. Evfimievski, Berthold Reinwald, Frederick R. Reiss, Prithviraj Sen, Shirish Tatikonda, Yuanyuan Tian: SystemML's Optimizer: Plan Generation for Large-Scale Machine Learning Programs. IEEE Data Eng. Bull. 37(3): 52-62 (2014). [Optimizer]
• Matthias Boehm, Shirish Tatikonda, Berthold Reinwald, Prithviraj Sen, Yuanyuan Tian, Douglas Burdick, Shivakumar Vaithyanathan: Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML. PVLDB 7(7): 553-564 (2014). [Task Parallelism]
• Peter D. Kirchner, Matthias Boehm, Berthold Reinwald, Daby M. Sow, Michael Schmidt, Deepak S. Turaga, Alain Biem: Large Scale Discriminative Metric Learning. IPDPS Workshops 2014: 1656-1663. [Custom Algorithm]
• Yuanyuan Tian, Shirish Tatikonda, Berthold Reinwald: Scalable and Numerically Stable Descriptive Statistics in SystemML. ICDE 2012: 1351-1359. [Numeric Stability]
• Amol Ghoting, Rajasekar Krishnamurthy, Edwin P. D. Pednault, Berthold Reinwald, Vikas Sindhwani, Shirish Tatikonda, Yuanyuan Tian, Shivakumar Vaithyanathan: SystemML: Declarative machine learning on MapReduce. ICDE 2011: 231-242.

Thank You

More Related Content

PDF
An Introduction to Higher Order Functions in Spark SQL with Herman van Hovell
PPT
PPT
L4 functions
PPT
Rewriting Java In Scala
PDF
Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...
PDF
Functional Programming Patterns for the Pragmatic Programmer
PDF
Probabilistic data structures. Part 2. Cardinality
PDF
A taste of Functional Programming
An Introduction to Higher Order Functions in Spark SQL with Herman van Hovell
L4 functions
Rewriting Java In Scala
Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...
Functional Programming Patterns for the Pragmatic Programmer
Probabilistic data structures. Part 2. Cardinality
A taste of Functional Programming

What's hot (20)

PDF
Introduction to functional programming using Ocaml
PPTX
Systematic Generation Data and Types in C++
PPTX
multiple linear regression
PPTX
Exploratory data analysis using r
PPTX
simple linear regression
PDF
SupportVectorRegression
PPTX
support vector regression
PDF
Scheme 核心概念(一)
PDF
Hive function-cheat-sheet
PPTX
Thinking Functionally with JavaScript
PPT
Queue implementation
PPTX
logistic regression with python and R
PDF
No more promises lets RxJS 2 Edit
PDF
Java patterns in Scala
PPTX
ScalaDays 2013 Keynote Speech by Martin Odersky
PPTX
Introduction to java 8 stream api
PPTX
New features in jdk8 iti
PPTX
decision tree regression
Introduction to functional programming using Ocaml
Systematic Generation Data and Types in C++
multiple linear regression
Exploratory data analysis using r
simple linear regression
SupportVectorRegression
support vector regression
Scheme 核心概念(一)
Hive function-cheat-sheet
Thinking Functionally with JavaScript
Queue implementation
logistic regression with python and R
No more promises lets RxJS 2 Edit
Java patterns in Scala
ScalaDays 2013 Keynote Speech by Martin Odersky
Introduction to java 8 stream api
New features in jdk8 iti
decision tree regression
Ad

Viewers also liked (18)

PPTX
Our Culture
PDF
Star Wars and Character Merchandising
DOCX
SUJEET MISHRA (1)
PPTX
Drogas y alcoholismo en el mundo juvenil
PPTX
Chemicalcombinationsbalancingchemeqns
PPTX
HOW TECHNOLOGY HAS CHANGED EDUCATION - DEU
PPTX
Білорусь
PPTX
Growing your eBay Sales with Linnworks
DOCX
PPTX
Витухина Юлия Анатольевна
DOCX
Herbario
PDF
week 7 (2)
PPTX
Theoryofsupply
PDF
On - Fideicomiso Ganadero (1)
PDF
Extracto de el gran juego
PDF
Maryland summit jhh 2015 how to live longer and better with lupus
DOCX
Social Media
Our Culture
Star Wars and Character Merchandising
SUJEET MISHRA (1)
Drogas y alcoholismo en el mundo juvenil
Chemicalcombinationsbalancingchemeqns
HOW TECHNOLOGY HAS CHANGED EDUCATION - DEU
Білорусь
Growing your eBay Sales with Linnworks
Витухина Юлия Анатольевна
Herbario
week 7 (2)
Theoryofsupply
On - Fideicomiso Ganadero (1)
Extracto de el gran juego
Maryland summit jhh 2015 how to live longer and better with lupus
Social Media
Ad

Similar to Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal (20)

PDF
DML Syntax and Invocation process
PDF
S1 DML Syntax and Invocation
PDF
SystemML - Datapalooza Denver - 05.17.16 MWD
PPTX
System mldl meetup
PPTX
2018 03 25 system ml ai and openpower meetup
PPTX
System mldl meetup
PDF
Advanced Data Science on Spark-(Reza Zadeh, Stanford)
PDF
MLlib sparkmeetup_8_6_13_final_reduced
PDF
Bringing Algebraic Semantics to Mahout
PDF
Apache SystemML Architecture by Niketan Panesar
PDF
Apache SystemML Architecture by Niketan Panesar
PDF
Alpine Tech Talk: System ML by Berthold Reinwald
PDF
What's new in Apache SystemML - Declarative Machine Learning
PPTX
Big data analytics_beyond_hadoop_public_18_july_2013
PDF
Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...
PDF
Spark Under the Hood - Meetup @ Data Science London
PDF
MLlib: Spark's Machine Learning Library
PPTX
Building Custom
Machine Learning Algorithms
with Apache SystemML
PDF
Building Custom Machine Learning Algorithms With Apache SystemML
PDF
Apache Spark MLlib 2.0 Preview: Data Science and Production
DML Syntax and Invocation process
S1 DML Syntax and Invocation
SystemML - Datapalooza Denver - 05.17.16 MWD
System mldl meetup
2018 03 25 system ml ai and openpower meetup
System mldl meetup
Advanced Data Science on Spark-(Reza Zadeh, Stanford)
MLlib sparkmeetup_8_6_13_final_reduced
Bringing Algebraic Semantics to Mahout
Apache SystemML Architecture by Niketan Panesar
Apache SystemML Architecture by Niketan Panesar
Alpine Tech Talk: System ML by Berthold Reinwald
What's new in Apache SystemML - Declarative Machine Learning
Big data analytics_beyond_hadoop_public_18_july_2013
Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...
Spark Under the Hood - Meetup @ Data Science London
MLlib: Spark's Machine Learning Library
Building Custom
Machine Learning Algorithms
with Apache SystemML
Building Custom Machine Learning Algorithms With Apache SystemML
Apache Spark MLlib 2.0 Preview: Data Science and Production

More from Arvind Surve (17)

PDF
Apache SystemML Optimizer and Runtime techniques by Arvind Surve and Matthias...
PDF
Apache SystemML Optimizer and Runtime techniques by Matthias Boehm
PDF
Clustering and Factorization using Apache SystemML by Prithviraj Sen
PDF
Clustering and Factorization using Apache SystemML by Alexandre V Evfimievski
PDF
Classification using Apache SystemML by Prithviraj Sen
PDF
Regression using Apache SystemML by Alexandre V Evfimievski
PDF
Data preparation, training and validation using SystemML by Faraz Makari Mans...
PDF
Apache SystemML 2016 Summer class primer by Berthold Reinwald
PDF
Apache SystemML Optimizer and Runtime techniques by Arvind Surve and Matthias...
PDF
Apache SystemML Optimizer and Runtime techniques by Matthias Boehm
PDF
Clustering and Factorization using Apache SystemML by Prithviraj Sen
PDF
Clustering and Factorization using Apache SystemML by Alexandre V Evfimievski
PDF
Classification using Apache SystemML by Prithviraj Sen
PDF
Regression using Apache SystemML by Alexandre V Evfimievski
PDF
Data preparation, training and validation using SystemML by Faraz Makari Mans...
PDF
Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal
PDF
Apache SystemML 2016 Summer class primer by Berthold Reinwald
Apache SystemML Optimizer and Runtime techniques by Arvind Surve and Matthias...
Apache SystemML Optimizer and Runtime techniques by Matthias Boehm
Clustering and Factorization using Apache SystemML by Prithviraj Sen
Clustering and Factorization using Apache SystemML by Alexandre V Evfimievski
Classification using Apache SystemML by Prithviraj Sen
Regression using Apache SystemML by Alexandre V Evfimievski
Data preparation, training and validation using SystemML by Faraz Makari Mans...
Apache SystemML 2016 Summer class primer by Berthold Reinwald
Apache SystemML Optimizer and Runtime techniques by Arvind Surve and Matthias...
Apache SystemML Optimizer and Runtime techniques by Matthias Boehm
Clustering and Factorization using Apache SystemML by Prithviraj Sen
Clustering and Factorization using Apache SystemML by Alexandre V Evfimievski
Classification using Apache SystemML by Prithviraj Sen
Regression using Apache SystemML by Alexandre V Evfimievski
Data preparation, training and validation using SystemML by Faraz Makari Mans...
Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Apache SystemML 2016 Summer class primer by Berthold Reinwald

Recently uploaded (20)

PDF
grade 11-chemistry_fetena_net_5883.pdf teacher guide for all student
PDF
2.FourierTransform-ShortQuestionswithAnswers.pdf
PDF
Classroom Observation Tools for Teachers
PDF
Abdominal Access Techniques with Prof. Dr. R K Mishra
PDF
O7-L3 Supply Chain Operations - ICLT Program
PPTX
Introduction_to_Human_Anatomy_and_Physiology_for_B.Pharm.pptx
PPTX
Cell Types and Its function , kingdom of life
PDF
Origin of periodic table-Mendeleev’s Periodic-Modern Periodic table
PDF
BÀI TẬP BỔ TRỢ 4 KỸ NĂNG TIẾNG ANH 9 GLOBAL SUCCESS - CẢ NĂM - BÁM SÁT FORM Đ...
PDF
FourierSeries-QuestionsWithAnswers(Part-A).pdf
PPTX
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
PDF
Microbial disease of the cardiovascular and lymphatic systems
PPTX
Microbial diseases, their pathogenesis and prophylaxis
PPTX
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
PPTX
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
PDF
Pre independence Education in Inndia.pdf
PDF
Supply Chain Operations Speaking Notes -ICLT Program
PDF
Mark Klimek Lecture Notes_240423 revision books _173037.pdf
PPTX
Introduction to Child Health Nursing – Unit I | Child Health Nursing I | B.Sc...
PDF
TR - Agricultural Crops Production NC III.pdf
grade 11-chemistry_fetena_net_5883.pdf teacher guide for all student
2.FourierTransform-ShortQuestionswithAnswers.pdf
Classroom Observation Tools for Teachers
Abdominal Access Techniques with Prof. Dr. R K Mishra
O7-L3 Supply Chain Operations - ICLT Program
Introduction_to_Human_Anatomy_and_Physiology_for_B.Pharm.pptx
Cell Types and Its function , kingdom of life
Origin of periodic table-Mendeleev’s Periodic-Modern Periodic table
BÀI TẬP BỔ TRỢ 4 KỸ NĂNG TIẾNG ANH 9 GLOBAL SUCCESS - CẢ NĂM - BÁM SÁT FORM Đ...
FourierSeries-QuestionsWithAnswers(Part-A).pdf
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
Microbial disease of the cardiovascular and lymphatic systems
Microbial diseases, their pathogenesis and prophylaxis
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
Pre independence Education in Inndia.pdf
Supply Chain Operations Speaking Notes -ICLT Program
Mark Klimek Lecture Notes_240423 revision books _173037.pdf
Introduction to Child Health Nursing – Unit I | Child Health Nursing I | B.Sc...
TR - Agricultural Crops Production NC III.pdf

Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal

  • 2. Agenda • What is Apache SystemML • How to implement SystemML algorithms è data scientist • How to run SystemML algorithms è user • How does SystemML work è SystemML developer 2
  • 3. What is Apache SystemML • In a nutshell • a language for data scientists to implement scalable ML algorithms • 2 language variants: R-like and Python-like syntax • Strong foundation of linear algebra operations and statistical functions • Comes with approx. 20+ algorithms pre-implemented • Cost-based optimizer to compile execution plans • Depending on data characteristics (tall/skinny, short/wide; dense/sparse) and cluster characteristics • ranging from single node to clusters (MapReduce, Spark); hybrid plans • APIs & Tools • Command line: hadoop jar, spark-submit, standalone Java app • JMLC: embed as library • Spark MLContext: Scala, Python, and Java • Tools • REPL (Scala Spark and pyspark) • Spark ML pipeline 3
  • 4. Big Data Analytics - Characteristics • Large number of models • Large number of data points • Large number of features • Sparse data • Large number/size of intermediates • Large number of pairs • Custom analytics 4
  • 5. SystemML – Declarative ML • Analytics language for data scientists (“The SQL for analytics”) • Algorithms expressed in a declarative, high-level language DML with R-like syntax • Productivity of data scientists • Enable • Solutions development • Tools • Compiler • Cost-based optimizer to generate execution plans and to parallelize • based on data characteristics • based on cluster and machine characteristics • Physical operators for in-memory single node and cluster execution • Performance & Scalability 5
  • 6. High-Level SystemML Architecture 6 Hadoop or Spark Cluster (scale-out) In-Memory Single Node (scale-up) Runtime Compiler Language DML Scripts DML (Declarative Machine Learning Language)
  • 7. Apache SystemML Incubator Project • June, 2015: SystemML open source announced at Spark Summit • Sep., 2015: public github • Oct., 2015: 1st open source binary release (0.8.0) • Nov., 2015: Enter Apache incubation • http://systemml.apache.org/ • https://github.com/apache/incubator-systemml • Jan., 2016: SystemML 0.9.0 (1st Apache release) • June, 2016: SystemML 0.10.0 release 7
  • 8. Apache SystemML Incubator http://systemml.apache.org/ • Get SystemML • Documentation • DML Reference Guide • Algorithms Guide • Running • Community • JIRA server • GitHub 8
  • 10. Sample Code A = 1.0 # A is an integer X <- matrix(“4 3 2 5 7 8”, rows=3, cols=2) # X = matrix of size 3,2 '<-' is assignment Y = matrix(1, rows=3, cols=2) # Y = matrix of size 3,2 with all 1s b <- t(X) %*% Y # %*% is matrix multiply, t(X) is transpose S = "hello world" i=0 while(i < max_iteration) { H = (H * (t(W) %*% (V/(W%*%H))))/t(colSums(W)) # * is element by element mult W = (W * ((V/(W%*%H)) %*% t(H)))/t(rowSums(H)) i = i + 1; # i is an integer } print (toString(H)) # toString converts a matrix to a string 10
  • 11. Sample Code source("nn/layers/affine.dml") as affine # import a file in the “affine“ namespace [W, b] = affine::init(D, M) # calls the init function, multiple return parfor (i in 1:nrow(X)) { # i iterates over 1 through num rows in X in parallel for (j in 1:ncol(X)) { # j iterates over 1 through num cols in X # Computation ... } } write (M, fileM, format=“text”) # M=matrix, fileM=file, also writes to HDFS X = read (fileX) # fileX=file, also reads from HDFS if (ncol (A) > 1) { # Matrix A is being sliced by a given range of columns A[,1:(ncol (A) - 1)] = A[,1:(ncol (A) - 1)] - A[,2:ncol (A)]; } 11
  • 12. Sample Code interpSpline = function( double x, matrix[double] X, matrix[double] Y, matrix[double] K) return (double q) { i = as.integer(nrow(X) - sum(ppred(X, x, ">=")) + 1) # misc computation … q = as.scalar(qm) } eigen = externalFunction(Matrix[Double] A) return(Matrix[Double] eval, Matrix[Double] evec) implemented in (classname="org.apache.sysml.udf.lib.EigenWrapper", exectype="mem") 12
  • 13. Sample Code (From LinearRegDS.dml*) A = t(X) %*% X b = t(X) %*% y if (intercept_status == 2) { A = t(diag (scale_X) %*% A + shift_X %*% A [m_ext, ]) A = diag (scale_X) %*% A + shift_X %*% A [m_ext, ] b = diag (scale_X) %*% b + shift_X %*% b [m_ext, ] } A = A + diag (lambda) print ("Calling the Direct Solver...") beta_unscaled = solve (A, b) *https://github.com/apache/incubator-systemml/blob/master/scripts/algorithms/LinearRegDS.dml#L133 13
  • 14. DML Editor Support • Very rudimentary editor support • Bit of shameless self-promotion : • Atom – Hackable Text editor • Install package - https://atom.io/packages/language-dml • From GUI - http://flight-manual.atom.io/using-atom/sections/atom-packages/ • Or from command line – apm install language-dml • Rudimentary snippet based completion of builtin function • Vim • Install package - https://github.com/nakul02/vim-dml • Works with Vundle (vim package manager) • There is an experimental Zeppelin Notebook integration with DML – • https://issues.apache.org/jira/browse/SYSTEMML-542 • Available as a docker image to play with - https://hub.docker.com/r/nakul02/incubator- zeppelin/ • Please send feedback when using these, requests for features, bugs • I’ll work on them when I can 14
  • 15. SystemML Algorithms
    Category: Description
    • Descriptive Statistics: Univariate; Bivariate; Stratified Bivariate
    • Classification: Logistic Regression (multinomial); Multi-Class SVM; Naïve Bayes (multinomial); Decision Trees; Random Forest
    • Clustering: k-Means
    • Regression
      • Linear Regression: system of equations; CG (conjugate gradient descent)
      • Generalized Linear Models (GLM)
        • Distributions: Gaussian, Poisson, Gamma, Inverse Gaussian, Binomial, Bernoulli
        • Links for all distributions: identity, log, sq. root, inverse, 1/μ²
        • Links for Binomial / Bernoulli: logit, probit, cloglog, cauchit
      • Stepwise: Linear; GLM
    • Dimension Reduction: PCA
    • Matrix Factorization: ALS (direct solve; CG (conjugate gradient descent))
    • Survival Models: Kaplan Meier Estimate; Cox Proportional Hazard Regression
    • Predict: Algorithm-specific scoring
    • Transformation (native): Recoding, dummy coding, binning, scaling, missing value imputation
    Documentation: https://apache.github.io/incubator-systemml/algorithms-reference.html
    Scripts: /usr/SystemML/systemml-0.10.0-incubating/scripts/algorithms/
  • 16. Running / Invoking SystemML
    • Command line
      • Standalone (Java application in a single JVM, in the bin folder)
      • Spark (spark-submit, in the scripts folder)
      • hadoop command line
    • APIs (MLContext)
      • Scala, e.g. run from the Spark shell
      • Python, e.g. run from PySpark
      • Java
    • In-Memory
  • 17. MLContext API – Example Usage
      val ml = new MLContext(sc)
      val X_train = sc.textFile("amazon0601.txt")
        .filter(!_.startsWith("#"))
        .map(_.split("\t") match { case Array(prod1, prod2) => (prod1.toInt, prod2.toInt, 1.0) })
        .toDF("prod_i", "prod_j", "x_ij")
        .filter("prod_i < 5000 AND prod_j < 5000") // Change to smaller number
        .cache()
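The preprocessing on this slide turns a tab-separated edge list into (row, column, value) triples: comment lines are dropped, each remaining line is split on the tab character, and only pairs with both ids below 5000 are kept. A plain-Python sketch of the same steps, without Spark, might look like this; `parse_edges` and its `max_id` parameter are illustrative names, not part of any SystemML or Spark API.

```python
def parse_edges(lines, max_id=5000):
    """Convert 'prod_i<TAB>prod_j' lines into (prod_i, prod_j, 1.0) triples.

    Lines starting with '#' are treated as comments and skipped, and pairs
    with either id >= max_id are filtered out, as in the Scala pipeline.
    """
    triples = []
    for line in lines:
        if line.startswith("#"):
            continue
        prod_i, prod_j = line.rstrip("\n").split("\t")
        i, j = int(prod_i), int(prod_j)
        if i < max_id and j < max_id:
            triples.append((i, j, 1.0))
    return triples


if __name__ == "__main__":
    sample = ["# FromNodeId\tToNodeId", "0\t1", "7000\t2", "3\t4\n"]
    print(parse_edges(sample))
```

Each triple corresponds to one non-zero cell of a sparse matrix, which is the shape of input the factorization script on the following slides reads as X.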
  • 18. MLContext API – Example Usage
      val pnmf = """
      # data & args
      X = read($X)
      rank = as.integer($rank)
      # Computation ....
      write(negloglik, $negloglikout)
      write(W, $Wout)
      write(H, $Hout)
      """
  • 19. MLContext API – Example Usage
      val pnmf = """
      # data & args
      X = read($X)
      rank = as.integer($rank)
      # Computation ....
      write(negloglik, $negloglikout)
      write(W, $Wout)
      write(H, $Hout)
      """
      ml.registerInput("X", X_train)
      ml.registerOutput("W")
      ml.registerOutput("H")
      ml.registerOutput("negloglik")
      val outputs = ml.executeScript(pnmf, Map("maxiter" -> "100", "rank" -> "10"))
      val negloglik = getScalarDouble(outputs, "negloglik")
  • 22. End-to-end on Spark … in Code
      import org.apache.spark.sql._
      val ctx = new org.apache.spark.sql.SQLContext(sc)
      val tweets = ctx.jsonFile("hdfs:/twitter/decahose")
      tweets.registerAsTable("tweetTable")
      ctx.sql("SELECT text FROM tweetTable LIMIT 5").collect.foreach(println)
      ctx.sql("SELECT lang, COUNT(*) AS cnt FROM tweetTable GROUP BY lang ORDER BY cnt DESC LIMIT 10").collect.foreach(println)
      val texts = ctx.sql("SELECT text FROM tweetTable").map(_.head.toString)
      def featurize(str: String): Vector = { ... }
      val vectors = texts.map(featurize).toDF.cache()
      val mcV = new MatrixCharacteristics(vectors.count, vocabSize, 1000, 1000)
      val V = RDDConvertUtilsExt(sc, vectors, mcV, false, "_1")
      val ml = new com.ibm.bi.dml.api.MLContext(sc)
      ml.registerInput("V", V, mcV)
      ml.registerOutput("W")
      ml.registerOutput("H")
      val args = Array(numTopics, numGNMFIter)
      val out = ml.execute("GNMF.dml", args)
      val W = out.getDF("W")
      val H = out.getDF("H")
      def getWords(r: Row): Array[(String, Double)] = { ... }
      val topics = H.rdd.map(getWords)
    Slide annotations: Twitter Data; Explore Data in SQL; Data Set; Training Set; Topic Modeling; SQL; ML; Get Topics
  • 23. SystemML Architecture
    Language
      • R-like syntax
      • Linear algebra, statistical functions, control structures, etc.
      • User-defined & external functions
      • Parsing
      • Statement blocks & statements
      • Program analysis, type inference, dead code elimination
    High-Level Operator (HOP) Component
      • Dataflow in DAGs of operations on matrices, frames, and scalars
      • Choosing from alternative execution plans based on memory and cost estimates: operator ordering & selection; hybrid plans
    Low-Level Operator (LOP) Component
      • Low-level physical execution plan (LOP DAGs) over key-value pairs
      • "Piggybacking" operations into a minimal number of Map-Reduce jobs
    Runtime
      • Hybrid runtime
        • CP: single-machine operations & job orchestration
        • MR: generic Map-Reduce jobs & operations
        • SP: Spark jobs
      • Numerically stable operators
      • Dense / sparse matrix representation
      • Multi-level buffer pool (caching) to evict in-memory objects
      • Dynamic recompilation for initial unknowns
    Architecture diagram (component labels): Command Line, JMLC, Spark MLContext (APIs); Parser/Language, High-Level Operators, Low-Level Operators (Compiler, cost-based optimizations); Control Program, Runtime Program, Buffer Pool, ParFor Optimizer/Runtime, Recompiler, CP / MR / Spark instructions, Generic MR Jobs, MatrixBlock Library (single/multi-threaded), Mem/FS IO, DFS IO (Runtime)
  • 24. SystemML Compilation Chain
    Instructions shown on the slide:
      CP + b sb _mVar1
      SPARK mapmm X.MATRIX.DOUBLE _mVar1.MATRIX.DOUBLE _mVar2.MATRIX.DOUBLE RIGHT false NONE
      CP * y _mVar2 _mVar3
  • 25. Selected Algebraic Simplification Rewrites
    Dynamic patterns (Name: Pattern)
      • Remove Unnecessary Indexing: X[a:b,c:d] = Y → X = Y iff dims(X)=dims(Y); X = Y[, 1] → X = Y iff ncol(Y)=1
      • Remove Empty Matrix Multiply: X%*%Y → matrix(0,nrow(X),ncol(Y)) iff nnz(X)=0|nnz(Y)=0
      • Remove Unnecessary Outer Product: X*(Y%*%matrix(1,...)) → X*Y iff ncol(Y)=1
      • Simplify Diag Aggregates: sum(diag(X)) → trace(X) iff ncol(X)=1
      • Simplify Matrix Mult Diag: diag(X)%*%Y → X*Y iff ncol(X)=1&ncol(Y)=1
      • Simplify Diag Matrix Mult: diag(X%*%Y) → rowSums(X*t(Y)) iff ncol(Y)>1
      • Simplify Dot Product Sum: sum(X^2) → t(X)%*%X iff ncol(X)=1
    Static patterns (Name: Pattern)
      • Remove Unnecessary Operations: t(t(X)), X/1, X*1, X-0 → X; matrix(1,)/X → 1/X; rand(,min=-1,max=1)*7 → rand(,min=-7,max=7)
      • Binary to Unary: X+X → 2*X; X*X → X^2; X-X*Y → X*(1-Y)
      • Simplify Diag Aggregates: trace(X%*%Y) → sum(X*t(Y))
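Two of the rewrites listed on this slide avoid materializing a full matrix product: trace(X%*%Y) becomes sum(X*t(Y)), and diag(X%*%Y) becomes rowSums(X*t(Y)). The pure-Python sketch below checks both identities on a small integer example. The helper names are illustrative and the list-of-rows representation is an assumption made for brevity.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(P, Q):
    Qt = transpose(Q)
    return [[sum(p * q for p, q in zip(row, col)) for col in Qt] for row in P]


def trace_of_product_naive(X, Y):
    """trace(X %*% Y): builds the full n x n product, then sums its diagonal."""
    P = matmul(X, Y)
    return sum(P[i][i] for i in range(len(P)))


def diag_of_product_rewritten(X, Y):
    """rowSums(X * t(Y)): one dot product per row, no n x n intermediate."""
    return [sum(x * y for x, y in zip(rx, ry)) for rx, ry in zip(X, transpose(Y))]


def trace_of_product_rewritten(X, Y):
    """sum(X * t(Y)): the sum of the row-wise dot products above."""
    return sum(diag_of_product_rewritten(X, Y))


if __name__ == "__main__":
    X = [[1, 2, 3], [4, 5, 6]]           # 2 x 3
    Y = [[7, 8], [9, 10], [11, 12]]      # 3 x 2
    print(trace_of_product_naive(X, Y), trace_of_product_rewritten(X, Y))
    print(diag_of_product_rewritten(X, Y))
```

For an n x k matrix X and a k x n matrix Y, the naive form computes n² dot products while the rewritten form computes only n, which is the kind of saving these rewrites target.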
  • 26. A Data Scientist – Linear Regression
    Model: $Xw \approx y$, where X holds the explanatory/independent variables, y is the predicted/dependent variable, and w is the model
    Optimization problem: $w = \arg\min_w \|Xw - y\|^2 + \lambda \|w\|^2$
    Conjugate Gradient Method:
      • Start off with the (negative) gradient
      • For each step
        1. Move to the optimal point along the chosen direction;
        2. Recompute the gradient;
        3. Project it onto the subspace conjugate* to all prior directions;
        4. Use this as the next direction
      (* conjugate = orthogonal given A as the metric, with $A = X^T X + \lambda I$)
    Code annotations on the slide: next direction; iterate until convergence; initialize; step size; update w; initial direction; accuracy measures
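The conjugate gradient steps listed on this slide can be sketched in a few lines of plain Python. This is a textbook CG iteration applied to the system (XᵀX + λI) w = Xᵀy, written under the assumption of small dense inputs stored as lists of rows; it is not the SystemML script itself, and names such as `linreg_cg` and `apply_A` are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


def apply_A(X, Xt, lam, p):
    """Compute (X^T X + lam * I) p as two matrix-vector products, without forming X^T X."""
    XtXp = matvec(Xt, matvec(X, p))
    return [a + lam * b for a, b in zip(XtXp, p)]


def linreg_cg(X, y, lam=0.0, max_iter=100, tol=1e-12):
    Xt = [list(col) for col in zip(*X)]
    w = [0.0] * len(Xt)                      # initialize
    r = [-v for v in matvec(Xt, y)]          # gradient at w = 0 is -X^T y
    p = [-v for v in r]                      # initial direction: negative gradient
    norm_r2 = dot(r, r)
    for _ in range(max_iter):                # iterate until convergence
        if norm_r2 <= tol:
            break
        q = apply_A(X, Xt, lam, p)
        alpha = norm_r2 / dot(p, q)          # step size: optimum along p
        w = [wi + alpha * pi for wi, pi in zip(w, p)]   # update w
        r = [ri + alpha * qi for ri, qi in zip(r, q)]   # recompute the gradient
        new_norm_r2 = dot(r, r)
        beta = new_norm_r2 / norm_r2
        p = [-ri + beta * pi for ri, pi in zip(r, p)]   # next conjugate direction
        norm_r2 = new_norm_r2
    return w


if __name__ == "__main__":
    print(linreg_cg([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [1.0, 2.0, 3.0]))
```

Because every iteration only needs products of X and its transpose with a vector, the method never has to build or factorize the feature-by-feature matrix A, which is why it suits inputs with many columns.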
  • 27. SystemML – Run LinReg CG on Spark
    Data: X with 100M rows and 10, 100, 1,000, or 10,000 columns (8 GB, 80 GB, 800 GB, 8 TB); y is 100M x 1
    Cluster: 20 GB driver on 16c; 6 x 55 GB executors
    Execution plans shown on the slide:
      • Multithreaded single node: … tMMp … in the driver
      • Hybrid plan with RDD caching and fused operator: … x.persist(); ... X.mapValues(tMMp).reduce() … (RDD cache: X; tMMv on the executors)
      • Hybrid plan with RDD out-of-core and fused operator: … x.persist(); ... X.mapValues(tMMp).reduce() ... (RDD cache: X; tMMv on the executors; spilling)
      • Hybrid plan with RDD out-of-core and different operators: … x.persist(); ... two MxV multiplications with broadcast, mapToPair, and reduceByKey ... (RDD cache: X; Mv and tvM on the executors; driver cache)
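The data sizes quoted on this slide are consistent with a dense matrix that stores every cell as an 8-byte double and with decimal units (1 GB = 10⁹ bytes). Both are assumptions of this sketch, since the slide does not state the storage format. The arithmetic can be checked as follows; the function name is illustrative.

```python
BYTES_PER_DOUBLE = 8  # assumption: dense storage, one IEEE 754 double per cell


def dense_size_bytes(n_rows, n_cols):
    """Size of a dense n_rows x n_cols matrix of doubles, ignoring any headers."""
    return n_rows * n_cols * BYTES_PER_DOUBLE


if __name__ == "__main__":
    rows = 100_000_000  # 100M observations, as on the slide
    for cols in (10, 100, 1_000, 10_000):
        gigabytes = dense_size_bytes(rows, cols) / 1e9
        print(f"100M x {cols}: {gigabytes:,.0f} GB")
```

Set against the 20 GB driver and the 6 x 55 GB executors listed on the slide, this arithmetic suggests why only the smallest input is a candidate for single-node execution and why the largest ones cannot be held entirely in aggregate executor memory.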
  • 28. LinReg CG for varying Data
    Execution time in secs (the chart uses a log scale):
      Data size       8 GB         80 GB         800 GB       8 TB
      Dimensions      100M x 10    100M x 100    100M x 1K    100M x 10K
      CP+Spark        21           92            2,065        40,395
      Spark           76           124           2,159        40,130
      CP+MR           24           277           2,613        41,006
    Chart annotations: single node MT avoids Spark Ctx & distributed ops (3.6x); hybrid plan & RDD caching (3x); out of core (1.2x); fully utilized
    Note: driver with 20 GB, 16c; 6 executors, each 55 GB, 24c; convergence in 3-4 iterations; SystemML as of 10/2015
    • Cost-based optimization is important
    • Hybrid execution plans especially benefit medium-sized data sets
    • Aggregated in-memory data sets are the sweet spot for Spark, esp. for iterative algorithms
    • Graceful degradation for out-of-core
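The speedup factors annotated on this slide can be related to the timings by simple division. The sketch below computes slowdown ratios between plans from the numbers given here. Which pair of plans each annotation refers to is not stated, so the pairings printed in the usage section are one reading, not something the slide confirms.

```python
# Execution times in seconds, copied from the slide.
TIMES_SECS = {
    "CP+Spark": {"8 GB": 21, "80 GB": 92, "800 GB": 2065, "8 TB": 40395},
    "Spark":    {"8 GB": 76, "80 GB": 124, "800 GB": 2159, "8 TB": 40130},
    "CP+MR":    {"8 GB": 24, "80 GB": 277, "800 GB": 2613, "8 TB": 41006},
}


def slowdown(plan, baseline, size):
    """How many times longer `plan` takes than `baseline` for a given data size."""
    return TIMES_SECS[plan][size] / TIMES_SECS[baseline][size]


if __name__ == "__main__":
    # Assumed pairings for the 3.6x, 3x, and 1.2x annotations.
    print(round(slowdown("Spark", "CP+Spark", "8 GB"), 1))
    print(round(slowdown("CP+MR", "CP+Spark", "80 GB"), 1))
    print(round(slowdown("CP+MR", "Spark", "800 GB"), 1))
```

At 8 TB all three plans land within about 2% of each other, which fits the "fully utilized" annotation.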
  • 29. Apache SystemML - Summary
    • Cost-based compilation of machine learning algorithms generates execution plans
      • for single-node in-memory, cluster, and hybrid execution
      • for varying data characteristics:
        • varying number of observations (1,000s to 10s of billions)
        • varying number of variables (10s to 10s of millions)
        • dense and sparse data
      • for varying cluster characteristics (memory configurations, degree of parallelism)
    • Out-of-the-box, scalable machine learning algorithms
      • e.g. descriptive statistics, regression, clustering, and classification
    • "Roll-your-own" algorithms
      • Enable programmer productivity (no worry about scalability, numeric stability, and optimizations)
      • Fast turn-around for new algorithms
    • Higher-level language shields algorithm development investment from platform progression
      • YARN for resource negotiation and elasticity
      • Spark for in-memory, iterative processing
  • 30. Roadmap
    • Algorithms
      • kNN, word2vec, non-linear SVM, etc.
      • Deep learning
    • Engine
      • Compressed Linear Algebra
      • Code Gen
      • Extensions for Deep Learning
      • GPU backend
    • Usability
      • DML notebook
      • Language integration
      • API cleanup
  • 31. Research Papers
    • Ahmed Elgohary, Matthias Boehm, Peter J. Haas, Frederick R. Reiss, Berthold Reinwald: Compressed Linear Algebra for Large Scale Machine Learning. Conditional Accept at VLDB 2016
    • Matthias Boehm, Michael W. Dusenberry, Deron Eriksson, Alexandre V. Evfimievski, Faraz Makari Manshadi, Niketan Pansare, Berthold Reinwald, Frederick R. Reiss, Prithviraj Sen, Arvind C. Surve, Shirish Tatikonda: SystemML: Declarative Machine Learning on Spark. VLDB 2016
    • Botong Huang, Matthias Boehm, Yuanyuan Tian, Berthold Reinwald, Shirish Tatikonda, Frederick R. Reiss: Resource Elasticity for Large-Scale Machine Learning. SIGMOD Conference 2015:137-152
    • Arash Ashari, Shirish Tatikonda, Matthias Boehm, Berthold Reinwald, Keith Campbell, John Keenleyside, P. Sadayappan: On optimizing machine learning workloads via kernel fusion. PPOPP 2015:173-182
    • Sebastian Schelter, Juan Soto, Volker Markl, Douglas Burdick, Berthold Reinwald, Alexandre V. Evfimievski: Efficient sample generation for scalable meta learning. ICDE 2015:1191-1202
    • Matthias Boehm, Douglas R. Burdick, Alexandre V. Evfimievski, Berthold Reinwald, Frederick R. Reiss, Prithviraj Sen, Shirish Tatikonda, Yuanyuan Tian: SystemML's Optimizer: Plan Generation for Large-Scale Machine Learning Programs. IEEE Data Eng. Bull. 37(3):52-62 (2014)
    • Matthias Boehm, Shirish Tatikonda, Berthold Reinwald, Prithviraj Sen, Yuanyuan Tian, Douglas Burdick, Shivakumar Vaithyanathan: Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML. PVLDB 7(7): 553-564 (2014)
    • Peter D. Kirchner, Matthias Boehm, Berthold Reinwald, Daby M. Sow, Michael Schmidt, Deepak S. Turaga, Alain Biem: Large Scale Discriminative Metric Learning. IPDPS Workshops 2014:1656-1663
    • Yuanyuan Tian, Shirish Tatikonda, Berthold Reinwald: Scalable and Numerically Stable Descriptive Statistics in SystemML. ICDE 2012:1351-1359
    • Amol Ghoting, Rajasekar Krishnamurthy, Edwin P. D. Pednault, Berthold Reinwald, Vikas Sindhwani, Shirish Tatikonda, Yuanyuan Tian, Shivakumar Vaithyanathan: SystemML: Declarative machine learning on MapReduce. ICDE 2011:231-242
    Topic labels on the slide: Custom Algorithm; Optimizer; Resource Elasticity; GPU; Sampling; Numeric Stability; Task Parallelism; 1st paper on Spark; Compression