Research and Technology Center North America | Sauptik Dhar, Mohak Shah | 5/21/2017
© 2017 Robert Bosch LLC and affiliates. All rights reserved.
1
ADMM based Scalable Machine Learning on Apache Spark
Sauptik Dhar, Mohak Shah
Bosch AI Research
Data is transformational
Image: https://www.slideshare.net/mongodb/internet-of-things-and-big-data-vision-and-concrete-use-cases
Big data, Spark and Status-quo
• Challenges
  – Learning → (Convex) Optimization
  – Current solutions (MLlib/ML packages) adopt
    ‒ SGD: convergence depends on step size and problem conditioning
    ‒ L-BFGS: adapting to non-differentiable functions is non-trivial
• ADMM (Alternating Direction Method of Multipliers)
  – Large problem → (simpler) sub-problems
  – Guaranteed convergence and robustness to step-size selection
  – Robust to ill-conditioned problems
Image: https://www.simplilearn.com/apache-spark-guide-for-newbies-article
ADMM advantages
• ADMM for Spark
  – Coverage (ML algorithms)
  – Go beyond MLlib/ML for the Python community
  – Address sub-optimality resulting from internal normalization (MLlib/ML)
ADMML Package
• Generic ADMM-based formulation: coverage (ML algorithms)
• Robust guarantees on convergence and accuracy
• Python APIs give access to users, developers, and designers
[Figure: ADMML package layers (ADMM engine, Loss + Regularizer, ML Algorithms) and the roles that work with them (Developers, Designers, Users), all exposed through Python APIs]
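Generic ML formulation
• Given: training data $(\mathbf{x}_i, y_i)_{i=1}^{N}$ where $\mathbf{x} \in \mathbb{R}^D$ and $y \in \mathbb{R}$ (regression) or $y \in \{-1, +1\}$ (classification)
• Solve: $\min_{\mathbf{w}, b} \; L(f_{\mathbf{w},b}(\mathbf{x}), y) + \lambda R(\mathbf{w})$ where $f_{\mathbf{w},b}(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + b$

Methods and loss functions $L(f_{\mathbf{w},b}(\mathbf{x}), y)$
Classification, $y \in \{-1, +1\}$
• Logistic Regression: $\frac{1}{N}\sum_{i=1}^{N} \log\left(1 + e^{-y_i(\mathbf{w}^T\mathbf{x}_i + b)}\right)$
• LS-SVM: $\frac{1}{2N}\sum_{i=1}^{N} \left(1 - y_i(\mathbf{w}^T\mathbf{x}_i + b)\right)^2$
• Squared-Hinge SVM: $\frac{1}{2N}\sum_{i=1}^{N} \left(\max\left(0,\, 1 - y_i(\mathbf{w}^T\mathbf{x}_i + b)\right)\right)^2$
Regression, $y \in \mathbb{R}$
• Linear Regression: $\frac{1}{2N}\sum_{i=1}^{N} \left(y_i - (\mathbf{w}^T\mathbf{x}_i + b)\right)^2$
• Huber: $\frac{1}{N}\sum_{i=1}^{N} \begin{cases} \frac{1}{2}\left(y_i - (\mathbf{w}^T\mathbf{x}_i + b)\right)^2, & \text{if } |y_i - (\mathbf{w}^T\mathbf{x}_i + b)| \le \mu \\ \mu\,|y_i - (\mathbf{w}^T\mathbf{x}_i + b)| - \frac{1}{2}\mu^2, & \text{else} \end{cases}$
• Pseudo-Huber: $\frac{1}{N}\sum_{i=1}^{N} \sqrt{\mu^2 + \left(y_i - (\mathbf{w}^T\mathbf{x}_i + b)\right)^2}$

Regularizers $R(\mathbf{w})$
• Elastic-net: $\sum_{j=1}^{D} \left(\alpha |w_j| + (1 - \alpha)\frac{w_j^2}{2}\right)$, with $\alpha = 0$ (L2-reg) and $\alpha = 1$ (L1-reg)
• Group: $\sum_{g \in G} \|\mathbf{w}_g\|_2$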
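As a rough illustration of the formulation on this slide, a few of the listed losses and both regularizers can be written directly in NumPy. This is only a sketch: the function names are hypothetical and are not taken from the ADMML package, and the Huber threshold is assumed to be a parameter called `mu`.

```python
import numpy as np

def logistic_loss(w, b, X, y):
    # (1/N) sum log(1 + exp(-y_i (w^T x_i + b))); logaddexp keeps it numerically stable
    return np.mean(np.logaddexp(0.0, -y * (X @ w + b)))

def sq_hinge_loss(w, b, X, y):
    # (1/2N) sum max(0, 1 - y_i (w^T x_i + b))^2
    return 0.5 * np.mean(np.maximum(0.0, 1.0 - y * (X @ w + b)) ** 2)

def least_squares_loss(w, b, X, y):
    # (1/2N) sum (y_i - (w^T x_i + b))^2
    return 0.5 * np.mean((y - (X @ w + b)) ** 2)

def huber_loss(w, b, X, y, mu=1.0):
    # Quadratic for small residuals, linear beyond the (assumed) threshold mu
    r = np.abs(y - (X @ w + b))
    return np.mean(np.where(r <= mu, 0.5 * r ** 2, mu * r - 0.5 * mu ** 2))

def elastic_net(w, alpha):
    # alpha = 0 gives the L2 penalty, alpha = 1 gives the L1 penalty
    return np.sum(alpha * np.abs(w) + (1.0 - alpha) * 0.5 * w ** 2)

def group_reg(w, groups):
    # Sum of Euclidean norms over index groups (groups is a list of index lists)
    return sum(np.linalg.norm(w[g]) for g in groups)

def objective(loss, w, b, X, y, lam, alpha):
    # Generic objective: loss plus lam times the elastic-net regularizer
    return loss(w, b, X, y) + lam * elastic_net(w, alpha)
```

Any of the loss functions can be passed to `objective`, which mirrors the "loss + regularizer" split that the rest of the deck relies on.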
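ADMM algorithm
Solve smaller sub-problems which can be easily distributed.
• Original problem: $\min_{\mathbf{w}} \; L(f_{\mathbf{w}}(\mathbf{x}), y) + \lambda R(\mathbf{w})$
• Split form: $\min_{\mathbf{w}, \mathbf{z}} \; L(f_{\mathbf{w}}(\mathbf{x}), y) + \lambda R(\mathbf{z}) \quad \text{s.t. } \mathbf{w} = \mathbf{z}$
• Augmented Lagrangian: $L(f_{\mathbf{w}}(\mathbf{x}), y) + \lambda R(\mathbf{z}) + \frac{\rho}{2}\|\mathbf{w} - \mathbf{z} + \mathbf{u}\|_2^2 + \text{const}$
ADMM steps at each iteration $k + 1$
• w-update: $\mathbf{w}^{k+1} = \arg\min_{\mathbf{w}} \; L(f_{\mathbf{w}}(\mathbf{x}), y) + \frac{\rho}{2}\|\mathbf{w} - \mathbf{z}^{k} + \mathbf{u}^{k}\|_2^2$
• z-update: $\mathbf{z}^{k+1} = \arg\min_{\mathbf{z}} \; \lambda R(\mathbf{z}) + \frac{\rho}{2}\|\mathbf{w}^{k+1} - \mathbf{z} + \mathbf{u}^{k}\|_2^2$
• u-update: $\mathbf{u}^{k+1} = \mathbf{u}^{k} + (\mathbf{w}^{k+1} - \mathbf{z}^{k+1})$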
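The three steps above can be sketched as a small driver loop that takes the w-update and z-update as callables. This is a minimal sketch, not the ADMML engine: the function name `admm`, the stopping rule on primal and dual residuals (a common choice, assumed here), and the toy problem are all illustrative.

```python
import numpy as np

def admm(w_update, z_update, dim, rho=1.0, max_iter=500, tol=1e-8):
    """Scaled-form ADMM loop: alternate w-, z- and u-updates until residuals are small."""
    w = np.zeros(dim)
    z = np.zeros(dim)
    u = np.zeros(dim)
    for k in range(max_iter):
        w = w_update(z, u, rho)              # loss sub-problem
        z_old = z
        z = z_update(w, u, rho)              # regularizer sub-problem
        u = u + w - z                        # scaled dual update
        primal = np.linalg.norm(w - z)       # how far w and z are from agreeing
        dual = rho * np.linalg.norm(z - z_old)
        if primal < tol and dual < tol:
            break
    return z, k + 1

# Toy problem (an assumption for illustration): min_w 0.5*||w - a||^2 + lam*||w||_1,
# whose exact solution is the soft-thresholding of a at level lam.
a = np.array([3.0, -0.5, 1.5])
lam = 1.0

def toy_w_update(z, u, rho):
    # Closed form of argmin_w 0.5*||w - a||^2 + (rho/2)*||w - z + u||^2
    return (a + rho * (z - u)) / (1.0 + rho)

def toy_z_update(w, u, rho):
    # Soft-thresholding: argmin_z lam*||z||_1 + (rho/2)*||w - z + u||^2
    v = w + u
    return np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)

z_star, n_iter = admm(toy_w_update, toy_z_update, dim=3)
```

Keeping the loss and the regularizer behind separate callables is what lets one loop serve every loss/regularizer pair listed earlier.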
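Example (Linear Regression with Elastic-Net)
• Given $(\mathbf{x}_i, y_i)_{i=1}^{N}$ with $\mathbf{x} \in \mathbb{R}^D$ and $y \in \mathbb{R}$
• Solve
$$\min_{\mathbf{w}} \; \frac{1}{2N}\sum_{i=1}^{N} (y_i - \mathbf{w}^T\mathbf{x}_i)^2 + \lambda \sum_{j=1}^{D}\left(\alpha |w_j| + (1 - \alpha)\frac{w_j^2}{2}\right)$$
• ADMM updates
$$\mathbf{w}^{k+1} = \arg\min_{\mathbf{w}} \; \frac{1}{2N}\sum_{i=1}^{N} (y_i - \mathbf{w}^T\mathbf{x}_i)^2 + \frac{\rho}{2}\|\mathbf{w} - \mathbf{z}^{k} + \mathbf{u}^{k}\|_2^2 = P^{-1}\mathbf{q}$$
with $P = \frac{1}{N}\sum_{i=1}^{N} \mathbf{x}_i\mathbf{x}_i^T + \rho I_D$ and $\mathbf{q} = \frac{1}{N}\sum_{i=1}^{N} y_i\mathbf{x}_i + \rho(\mathbf{z}^{k} - \mathbf{u}^{k})$
$$\mathbf{z}^{k+1} = \arg\min_{\mathbf{z}} \; \lambda \sum_{j=1}^{D}\left(\alpha |z_j| + (1 - \alpha)\frac{z_j^2}{2}\right) + \frac{\rho}{2}\|\mathbf{w}^{k+1} - \mathbf{z} + \mathbf{u}^{k}\|_2^2$$
which gives $z_j^{k+1} = \dfrac{S_{\lambda\alpha/\rho}(w_j^{k+1} + u_j^{k})}{1 + \lambda(1 - \alpha)/\rho}$, where $S_t(v) = \operatorname{sign}(v)\max(|v| - t, 0)$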
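The closed-form updates for this example translate almost line for line into NumPy. The sketch below runs on a single machine with a fixed number of iterations; the function names and default values are assumptions for illustration, and a distributed version would compute the sums over samples across partitions instead.

```python
import numpy as np

def soft_threshold(v, t):
    # S_t(v) = sign(v) * max(|v| - t, 0), applied element-wise
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_elastic_net_ls(X, y, lam, alpha, rho=1.0, n_iter=500):
    """ADMM for (1/2N)||y - Xw||^2 + lam * sum(alpha|w_j| + (1 - alpha) w_j^2 / 2)."""
    N, D = X.shape
    P = X.T @ X / N + rho * np.eye(D)     # constant across iterations
    Xty = X.T @ y / N
    w = np.zeros(D)
    z = np.zeros(D)
    u = np.zeros(D)
    for _ in range(n_iter):
        q = Xty + rho * (z - u)
        w = np.linalg.solve(P, q)         # w-update: P w = q
        z = soft_threshold(w + u, lam * alpha / rho) / (1.0 + lam * (1.0 - alpha) / rho)
        u = u + w - z                     # u-update
    return z
```

With `alpha=0` the result should match ridge regression, and with `alpha=1` and a large enough `lam` every coefficient is driven exactly to zero, which makes both cases convenient sanity checks.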
Example (Logistic Regression with Elastic-Net)
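• Given $(\mathbf{x}_i, y_i)_{i=1}^{N}$ with $\mathbf{x} \in \mathbb{R}^D$ and $y \in \{-1, +1\}$
• Solve
$$\min_{\mathbf{w}} \; \frac{1}{N}\sum_{i=1}^{N} \log\left(1 + e^{-y_i\mathbf{w}^T\mathbf{x}_i}\right) + \lambda \sum_{j=1}^{D}\left(\alpha |w_j| + (1 - \alpha)\frac{w_j^2}{2}\right)$$
• ADMM updates
$$\mathbf{w}^{k+1} = \arg\min_{\mathbf{w}} \; \frac{1}{N}\sum_{i=1}^{N} \log\left(1 + e^{-y_i\mathbf{w}^T\mathbf{x}_i}\right) + \frac{\rho}{2}\|\mathbf{w} - \mathbf{z}^{k} + \mathbf{u}^{k}\|_2^2$$
• Gradient: $-\frac{1}{N}\sum_{i=1}^{N} y_i(1 - p_i)\mathbf{x}_i + \rho(\mathbf{w} - \mathbf{z}^{k} + \mathbf{u}^{k})$, with $p_i = 1 / \left(1 + e^{-y_i\mathbf{w}^T\mathbf{x}_i}\right)$
• Hessian: $\frac{1}{N}\sum_{i=1}^{N} p_i(1 - p_i)\mathbf{x}_i\mathbf{x}_i^T + \rho I$

Algorithm 1: Iterative Algorithm for $\mathbf{w}^{k+1}$
Input: $\mathbf{w}^{k}, \mathbf{z}^{k}, \mathbf{u}^{k}$
Output: $\mathbf{w}^{k+1}$
initialize $\mathbf{v}^{(0)} \leftarrow \mathbf{w}^{k}$, $j = 0$
while not converged do
    $p_i^{(j)} = 1 / \left(1 + e^{-y_i\mathbf{x}_i^T\mathbf{v}^{(j)}}\right)$
    $P^{(j)} = \frac{1}{N}\sum_{i=1}^{N} p_i^{(j)}(1 - p_i^{(j)})\mathbf{x}_i\mathbf{x}_i^T + \rho I$
    $\mathbf{q}^{(j)} = -\frac{1}{N}\sum_{i=1}^{N} y_i(1 - p_i^{(j)})\mathbf{x}_i + \rho(\mathbf{v}^{(j)} - \mathbf{z}^{k} + \mathbf{u}^{k})$
    $\mathbf{v}^{(j+1)} = \mathbf{v}^{(j)} - (P^{(j)})^{-1}\mathbf{q}^{(j)}$
    $j \leftarrow j + 1$
return $\mathbf{w}^{k+1} = \mathbf{v}^{(j)}$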
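A single-machine sketch of this Newton iteration for the w-update is shown below. It is illustrative only: the function name, the step-norm stopping rule, and the use of the identity sigmoid(m) = 0.5 * (1 + tanh(m / 2)) for numerical stability are assumptions, not details taken from the slides.

```python
import numpy as np

def logistic_w_update(X, y, w_k, z_k, u_k, rho, max_newton=50, tol=1e-10):
    """Newton iterations for argmin_w (1/N) sum log(1 + exp(-y_i w^T x_i)) + (rho/2)||w - z_k + u_k||^2."""
    N, D = X.shape
    v = w_k.copy()
    for _ in range(max_newton):
        m = y * (X @ v)                                   # margins y_i * x_i^T v
        p = 0.5 * (1.0 + np.tanh(0.5 * m))                # p_i = 1 / (1 + exp(-m_i)), overflow-safe
        P = (X * (p * (1.0 - p))[:, None]).T @ X / N + rho * np.eye(D)   # Hessian
        q = -(X.T @ (y * (1.0 - p))) / N + rho * (v - z_k + u_k)          # gradient
        step = np.linalg.solve(P, q)
        v = v - step                                      # Newton step
        if np.linalg.norm(step) < tol:
            break
    return v

def logistic_w_gradient(X, y, v, z_k, u_k, rho):
    # Gradient of the w-update objective; should be near zero at the returned point
    p = 0.5 * (1.0 + np.tanh(0.5 * y * (X @ v)))
    return -(X.T @ (y * (1.0 - p))) / X.shape[0] + rho * (v - z_k + u_k)
```

Because the quadratic ADMM term adds `rho * I` to the Hessian, the linear system in each Newton step is well conditioned even when the data alone would not make it so.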
Code Structure (UML diagram)
admml.py
• rho_initialize
• rho_adjustment
• w_update
• z_update
mappers.py
• mapHv
• _mapHv_logistic_binary
• _mapHv_sq_hinge_binary
• _mapHv_pseudo_huber_regression
• _mapHv_huber_regression
utils.py
• combineHv
• log1pexpx
• sigmoid
• append_one
• rdd_to_df
• df_to_rdd
• parse_vector
• scale_data
• uniform_scale
• normalize_scale
mlalgs.py
• _ADMM_ML_Algorithm
• ADMMLogistic
• ADMML2SVM
• ADMMLSSVM
• ADMMLeastSquares
• ADMMPseudoHuber
• ADMMHuber
• functionVals
• predict
examples.py
[Figure: UML diagram of the modules above, annotated with the labels users, designers, developers, and optimization]
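The names `mapHv` and `combineHv` suggest a map step that computes per-partition Hessian and gradient contributions and a combine step that sums them. Their real signatures are not shown in the slides, so the sketch below uses hypothetical functions and plain Python lists of NumPy arrays in place of Spark partitions, for the least-squares w-update.

```python
from functools import reduce
import numpy as np

def map_partition_stats(X_part, y_part):
    # Hypothetical map step: per-partition X^T X, X^T y and sample count
    return X_part.T @ X_part, X_part.T @ y_part, X_part.shape[0]

def combine_stats(a, b):
    # Hypothetical combine step: element-wise sum of two partial results
    return a[0] + b[0], a[1] + b[1], a[2] + b[2]

def distributed_w_update(partitions, z, u, rho):
    """Least-squares w-update assembled from per-partition statistics."""
    H, g, n = reduce(combine_stats, (map_partition_stats(X, y) for X, y in partitions))
    D = H.shape[0]
    P = H / n + rho * np.eye(D)          # (1/N) sum x_i x_i^T + rho * I
    q = g / n + rho * (z - u)            # (1/N) sum y_i x_i + rho * (z - u)
    return np.linalg.solve(P, q)
```

Only D-by-D and D-dimensional summaries leave each partition, so the amount of data moved per iteration does not grow with the number of samples.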
Experiment (Regression)
System Configuration:
– No. of nodes = 6 (Hortonworks 2.7)
– No. of cores (per node) = 12 (@ 3.20 GHz)
– RAM size (per node) = 64 GB
– Hard disk size (per node) = 2 TB
Spark Configuration:
– No. of executors = 15
– Cores per executor = 2
– Executor memory = 10 GB
– Driver memory = 2 GB
Data: $\mathbf{x} \sim U[-1, 1]^{20}$ and $y$ a linear function of $x_1, \dots, x_{20}$ plus noise $\sim \mathcal{N}(0, 1)$ [target equation garbled in extraction; surviving coefficients: 5, 1, 5, 1, ..., 2]
• Small Data: No. of samples = 10,000,000 (~5 GB)
• Big Data: No. of samples = 100,000,000 (~50 GB)
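For reference, the Spark settings listed on this slide map onto standard Spark configuration properties. The sketch below only builds a `spark-submit` command line from them; the script name `examples.py` is borrowed from the code-structure slide as a placeholder, and the deck does not say how the jobs were actually launched.

```python
# Spark properties corresponding to the listed configuration
spark_conf = {
    "spark.executor.instances": "15",   # No. of executors
    "spark.executor.cores": "2",        # Cores per executor
    "spark.executor.memory": "10g",     # Executor memory
    "spark.driver.memory": "2g",        # Driver memory
}

def to_submit_args(conf, script):
    """Turn a dict of Spark properties into a spark-submit argument list."""
    args = ["spark-submit"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args + [script]

command = to_submit_args(spark_conf, "examples.py")
print(" ".join(command))
```

With 15 executors of 2 cores each, this configuration uses 30 of the 72 cores available across the 6 nodes.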
Experiment (Regression)
Table. Time comparisons, ADMM vs. MLlib (in sec)
Small Data (No. of samples = 10,000,000, ~5 GB): MLlib (SGD) 1584.6 (9.6); ML (L-BFGS) 290.8 (4.4)*; ADMM 653.5 (0.75)
Big Data (No. of samples = 100,000,000, ~50 GB): 2597 (8.5); 5811 (7.6)
* convergence in 1 iteration
† convergence in 7 iterations

• ADMM provides competitive computation speeds compared to L-BFGS.
• Initial ρ selection and advanced adaptive strategies can lead to faster convergence. For example, following [4]: convergence in 5 iterations, ~485 sec.
• For un-normalized data, ML / MLlib internal normalization may lead to sub-optimal solutions. For example, ML / MLlib: $f_{\text{opt}} \approx 15.07$ and ADMML: $f_{\text{opt}} \approx 12.34$.
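One widely used adaptive strategy for ρ is residual balancing: increase ρ when the primal residual dominates, decrease it when the dual residual dominates. The sketch below shows that rule as an example of what an adaptive scheme can look like; it is not necessarily the method of [4] or the logic of ADMML's `rho_adjustment`, and the function name and the defaults `mu=10`, `tau=2` are assumptions.

```python
import numpy as np

def adjust_rho(rho, u, primal_res, dual_res, mu=10.0, tau=2.0):
    """Residual balancing: keep primal and dual residuals within a factor mu of each other.

    The scaled dual variable u must be rescaled whenever rho changes,
    because u equals the unscaled dual variable divided by rho.
    """
    if primal_res > mu * dual_res:
        return rho * tau, u / tau      # constraint w = z is lagging: penalize it more
    if dual_res > mu * primal_res:
        return rho / tau, u * tau      # z is still moving a lot: relax the penalty
    return rho, u                      # residuals are balanced: leave rho alone
```

When ρ changes, any quantity cached from the old value, such as a factorization of the matrix in the w-update, has to be recomputed.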
Please cite: Dhar S, Yi C, Ramakrishnan N, Shah M. ADMM based scalable machine learning on Spark. IEEE Big Data 2015.
https://github.com/DL-Benchmarks/ADMML
Contributors: Sauptik Dhar, Naveen Ramakrishnan, Jeff Irion, Jiayi Liu, Unmesh Kurup, Mohak Shah
Current Algorithms
Classification (Elastic/Group - Regularized)
- Logistic Regression
- LS-SVM
- L2-SVM
Regression (Elastic/Group - Regularized)
- Least Squares (i.e. Ridge Regression, Lasso, Elastic-net, Group-Lasso etc.)
- Huber
- Pseudo-Huber
Future Roadmap
- Initial step-size selection.
- Consensus ADMM (coverage for various discontinuous loss functions, such as SVMs, alpha-trimmed functions, etc.)
- Multiclass algorithms (such as multinomial logistic regression, C&S-SVM, etc.)
