Algorithmic Acceleration of Parallel ALS
for Collaborative Filtering:
“Speeding up Distributed Big Data
Recommendation in Spark”
Hans De Sterck1,2, Manda Winlaw2, Mike Hynes2,
Anthony Caterini2
1 Monash University, School of Mathematical Sciences
2 University of Waterloo, Canada, Applied Mathematics
ICPADS 2015, Melbourne, December 2015
a talk on algorithms for parallel big data
analytics ...
1.  distributed computing frameworks for Big Data
analytics – Spark (vs HPC, MPI, Hadoop, ...)
2.  recommendation – the Netflix prize problem
3.  our contribution: an algorithm to speed up ALS
for recommendation
4.  our contribution: efficient parallel speedup of ALS
recommendation in Spark
1. distributed computing frameworks for Big Data
analytics – Spark (vs HPC, MPI, Hadoop, ...)
■ my research background:
– scalable scientific computing algorithms (HPC)
– e.g., parallel algebraic multigrid (AMG) for solving linear systems Ax=b
– e.g., on Blue Gene (100,000s of cores), MPI
distributed computing frameworks for Big Data
analytics – Spark (vs HPC, MPI, Hadoop, ...)
■ more recently: there is a new game of large-scale
distributed computing in town!
– Google PageRank (1998) (already 17 years...)
•  commodity hardware (fault-tolerant ...)
•  compute where the data is (data-locality)
•  scalability is essential! (just like in HPC)
•  beginning of “Big Data”, “Cloud”, “Data Analytics”, ...
– new Big Data analytics applications are now appearing everywhere!
[figure: web crawl]
distributed computing frameworks for Big Data
analytics – Spark (vs HPC, MPI, Hadoop, ...)
■ “Data Analytics” has grown its own “eco-system”,
“culture”, “software stack” (very different from HPC!)
•  MapReduce
•  Hadoop
•  Spark, ...
•  data locality
•  “implicit” communication (restricted (vs MPI), “shuffle”)
•  not fast (vs HPC), but scalable
•  fault-tolerant (replicate data, restart tasks)
(from “Spark: In-Memory Cluster Computing for Iterative and Interactive Applications”)
distributed computing frameworks for Big Data
analytics – Spark (vs HPC, MPI, Hadoop, ...)
■ MapReduce/Hadoop:
– major disadvantage for iterative algorithms: writes everything to disk between iterations! extremely slow (and: not programmer-friendly)
⇒ only very simple algorithms are feasible in MapReduce
■ the Spark “revolution”:
– store state between iterations in memory
– more general operations than Hadoop/MapReduce
distributed computing frameworks for Big Data
analytics – Spark (vs HPC, MPI, Hadoop, ...)
■ the Spark “revolution”:
– store state between iterations in memory
– more general operations than Hadoop/MapReduce
⇒ much faster than Hadoop! (but still much slower than MPI)
•  data locality
•  scalable
•  fault-tolerant
•  “implicit” communication (restricted (vs MPI), “shuffle”)
sea change (vs Hadoop): more advanced iterative algorithms for
Data Analytics/Machine Learning are feasible in Spark
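To make “store state between iterations in memory” concrete, here is a minimal pyspark sketch (an illustration added here, not from the talk; assumes a local Spark installation): the dataset is cached once, and every iteration reuses the in-memory partitions instead of rereading storage between passes, as MapReduce would.

```python
from pyspark import SparkContext

sc = SparkContext("local[*]", "iterative-sketch")
data = sc.parallelize(range(1000000)).cache()   # state kept in memory

x = 0.0
for it in range(10):                 # skeleton of an iterative algorithm
    x = data.map(lambda v: (v % 7) * 1e-6).reduce(lambda a, b: a + b)
print(x)
```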
2. recommendation – the Netflix prize problem
■ sparse ratings matrix R
■ k latent features: user factors U, movie factors M
■ similar to SVD, but only match known ratings
■ minimize f = ||R – UᵀM||²′ (the ′ marks that the sum runs only over the known ratings), and UᵀM gives the predicted ratings (collaborative filtering)
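Written out (a restatement for clarity; the ′ on the norm restricts the sum to the observed entries, and the regularization term used in practice is omitted):

$$
f(U, M) = \sum_{(i,j)\,:\,R_{ij}\ \mathrm{known}} \bigl( R_{ij} - u_i^{T} m_j \bigr)^2 ,
$$

where u_i is column i of U and m_j is column j of M.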
[figure: sparse ratings matrix R (n users × m movies, known entries such as 1, 2, 5) ≈ Uᵀ (n × k) times M (k × m); row i of Uᵀ and column j of M combine to predict rating (i, j)]
recommendation – the Netflix prize problem
minimize f = ||R – UᵀM||²′ : alternating least squares (ALS)
■ minimize ||R – U(0)ᵀM(0)||²′ : freeze U(0), compute M(0) (LS)
■ minimize ||R – U(1)ᵀM(0)||²′ : freeze M(0), compute U(1) (LS)
■ ... : local least squares problems (parallelizable)
[figure: the same R ≈ UᵀM diagram as on the previous slide]
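A minimal single-machine numpy sketch of these alternating sweeps (added for illustration, not from the talk; assumes a small synthetic R with a Boolean mask of known ratings, and adds a small ridge term lam, which the slide does not mention, to keep each local solve well posed):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 50, 40, 5
R = rng.random((n, m))
mask = rng.random((n, m)) < 0.2            # ~20% of ratings observed
lam = 0.1                                  # small ridge term (assumption)

U = rng.random((k, n))                     # user factors, k x n
M = rng.random((k, m))                     # movie factors, k x m

for sweep in range(20):
    for i in range(n):                     # freeze M, solve per user i
        A = M[:, mask[i]]                  # factors of movies i rated
        U[:, i] = np.linalg.solve(A @ A.T + lam * np.eye(k),
                                  A @ R[i, mask[i]])
    for j in range(m):                     # freeze U, solve per movie j
        A = U[:, mask[:, j]]               # factors of users who rated j
        M[:, j] = np.linalg.solve(A @ A.T + lam * np.eye(k),
                                  A @ R[mask[:, j], j])

print("residual on known ratings:",
      np.linalg.norm(((U.T @ M) - R)[mask]))
```

Each user update (and each movie update) is an independent k × k solve, which is exactly what makes the sweeps parallelizable.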
  
recommendation – the Netflix prize problem
minimize f = ||R – UᵀM||²′ : alternating least squares (ALS)
■ ALS can converge very slowly (block nonlinear Gauss-Seidel)
(g = grad f = 0)
3. our contribution: an algorithm to speed up ALS
for recommendation
min f(U,M) = ||R – UᵀM||²′, or g(U,M) = grad f(U,M) = 0
■ nonlinear conjugate gradient (NCG) optimization
algorithm for min f(x):
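The algorithm listing on this slide is an image in the original deck; for reference, here is a minimal sketch of a standard Polak–Ribière NCG loop (generic f and grad callables, with a simple backtracking line search standing in for the listing's line search):

```python
import numpy as np

def ncg(f, grad, x, iters=100):
    """Minimal Polak-Ribiere (PR+) nonlinear CG sketch."""
    g = grad(x)
    p = -g                                 # start with steepest descent
    for _ in range(iters):
        a = 1.0                            # backtracking line search
        while f(x + a * p) > f(x) + 1e-4 * a * (g @ p):
            a *= 0.5                       # assumes p is a descent direction
        x = x + a * p
        g_new = grad(x)
        beta = max(0.0, g_new @ (g_new - g) / (g @ g))  # PR+ restart
        p = -g_new + beta * p
        g = g_new
    return x

# smoke test on a convex quadratic
A = np.diag([1.0, 10.0, 100.0])
print(ncg(lambda x: 0.5 * x @ A @ x, lambda x: A @ x, np.ones(3)))
```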
our contribution: an algorithm to speed up ALS
for recommendation
min f(x) = ||R – UᵀM||²′, or g(x) = grad f(x) = 0
■ our idea: use ALS as a nonlinear preconditioner for NCG
define a preconditioned gradient direction:
(De Sterck and Winlaw, 2015)
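The defining equation on this slide is an image in the original deck; as a sketch of the construction (the general shape only, with the exact choice of β_k given in the cited paper), write P(x_k) for the result of one ALS sweep started from x_k = (U_k, M_k):

$$
\bar{g}_k = x_k - P(x_k), \qquad p_k = -\bar{g}_k + \beta_k\, p_{k-1}, \qquad x_{k+1} = x_k + \alpha_k p_k ,
$$

so one ALS step supplies the preconditioned direction that replaces the raw gradient g_k in the NCG recurrence.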
  
our contribution: an algorithm to speed up ALS
for recommendation
min f(x) = ||R – UᵀM||²′, or g(x) = grad f(x) = 0
■ our idea: use ALS as a nonlinear preconditioner for NCG
(NCG accelerates ALS)
our contribution: an algorithm to speed up ALS
for recommendation
min f(x) = ||R – UᵀM||²′, or g(x) = grad f(x) = 0
■ our idea: use ALS as a nonlinear preconditioner for NCG
our contribution: an algorithm to speed up ALS
for recommendation
min f(x) = ||R – UᵀM||²′, or g(x) = grad f(x) = 0
■ our idea: use ALS as a nonlinear preconditioner for NCG
ALS-NCG is much faster than the widely used ALS!
4. our contribution: efficient parallel speedup of
ALS recommendation in Spark
■ Spark “Resilient Distributed Datasets” (RDDs)
– partitioned collection of (key, value) pairs
– can be cached in memory
– built using data flow operators on other RDDs (map, join, group-by-key, reduce-by-key, ...)
– fault-tolerance: rebuild from lineage
– “implicit” communication (shuffling) (≠ MPI)
[figure: an RDD as a partitioned table of key → (value1, value2, ...) rows, spread over partitions 0–3]
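A minimal pyspark sketch of these RDD building blocks (toy data added for illustration, not the paper's code; a fresh local SparkContext for a self-contained example):

```python
from pyspark import SparkContext

sc = SparkContext("local[*]", "rdd-sketch")

# partitioned collection of (key, value) pairs, cached in memory:
# (user, (movie, rating)) triples spread over 4 partitions
ratings = sc.parallelize(
    [(0, (10, 4.0)), (0, (11, 3.0)), (1, (10, 5.0))], 4
).cache()

# new RDDs are built from old ones with data-flow operators
by_user = ratings.groupByKey()                     # triggers a shuffle
counts = ratings.map(lambda kv: (kv[0], 1)) \
                .reduceByKey(lambda a, b: a + b)

print(counts.collect())   # e.g. [(0, 2), (1, 1)]
# fault tolerance: a lost partition is rebuilt from this lineage graph
```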
  
our contribution: efficient parallel speedup of ALS
recommendation in Spark
■ efficient Spark programming: similar challenges to efficient GPU programming with CUDA!
– of course, they have different design objectives (GPU: close to the metal, as fast as possible; Spark: scalable, fault-tolerant, data locality...)
– but ... similarities in how one gets good performance:
•  Spark, CUDA: it is easy to write code that produces the correct result (but may be very far from achievable speed)
•  Spark, CUDA: it is very hard to write efficient code!
–  implementation choices that are crucial for performance are most often not explicit in the language (see the sketch after this list)
–  programmer needs very extensive “under the hood” knowledge to write efficient code
–  this is a research topic (also for Spark), moving target
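One concrete instance of such an implicit choice (a standard pyspark illustration, reusing sc and the toy ratings RDD from the RDD sketch above): both lines compute per-user rating counts and give the same result, but they shuffle very different volumes of data.

```python
# groupByKey ships every rating record across the network, then counts;
# reduceByKey pre-aggregates inside each partition before the shuffle.
slow = ratings.groupByKey().mapValues(len)
fast = ratings.map(lambda kv: (kv[0], 1)).reduceByKey(lambda a, b: a + b)
```

Nothing in the results hints at the difference; the programmer has to know what each operator does “under the hood”.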
  
our contribution: efficient parallel speedup of ALS
recommendation in Spark
■ existing implementation of ALS in Spark (Chris Johnson, Spotify): minimize f = ||R – UᵀM||²′
– store both R and Rᵀ
– local LS problems: to update user factor i, need all movie factors j that i has rated (shuffle!) (efficient)
[figure: R partitioned by user block 0–3 and Rᵀ by movie block 0–3; to update user factor i in U, the movie factors j1, j2 that i has rated are shuffled from M's partitions to i's partition]
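A hedged pyspark sketch of that shuffle pattern (illustration only, not Spotify's code; reuses sc and the toy ratings RDD from the earlier sketch, and invents a tiny movie_factors RDD): ratings are re-keyed by movie, joined with the movie factors, and regrouped by user, so each user's local least-squares problem arrives with every movie factor it needs.

```python
import numpy as np

k = 5
movie_factors = sc.parallelize([(10, np.ones(k)), (11, np.ones(k))])

# (user, (movie, rating)) -> (movie, (user, rating)); the join is the
# shuffle that attaches movie factor m_j to every rating of movie j
per_user = (ratings.map(lambda kv: (kv[1][0], (kv[0], kv[1][1])))
                   .join(movie_factors)
                   .map(lambda kv: (kv[1][0][0], (kv[1][1], kv[1][0][1])))
                   .groupByKey())                  # regroup by user i

def solve_user(rows, lam=0.1):
    # assemble and solve user i's local k x k least-squares problem
    A = np.array([m for m, _ in rows]).T           # k x (#rated movies)
    r = np.array([rating for _, rating in rows])
    return np.linalg.solve(A @ A.T + lam * np.eye(k), A @ r)

new_user_factors = per_user.mapValues(solve_user)
print(new_user_factors.collect())
```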
our contribution: efficient parallel speedup of ALS
recommendation in Spark
■ our work: efficient parallel implementation of ALS-NCG in Spark: minimize f(x) = ||R – UᵀM||²′
– store our vectors x and g consistent with the ALS RDDs, and employ a similar efficient shuffling scheme for the gradient
– BLAS vector operations
– line search: f(x + αp) is a polynomial of degree 4 in the step size α: compute its coefficients once, in parallel (see the sketch below)
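Because f is quadratic in U for fixed M and quadratic in M for fixed U, f(x + αp) along a search direction p is a degree-4 polynomial in α. A minimal numpy sketch of the resulting exact line search (the paper computes the five coefficients directly and in parallel; fitting them from five samples, as here, is just the simplest way to show the idea):

```python
import numpy as np

def quartic_line_search(f, x, p):
    # f(x + a*p) is a degree-4 polynomial in a, so five samples
    # determine it exactly; then minimize the quartic in closed form
    a_pts = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
    coeffs = np.polyfit(a_pts, [f(x + a * p) for a in a_pts], 4)
    crit = np.roots(np.polyder(coeffs))      # roots of the cubic f'
    crit = crit[np.isreal(crit)].real
    cand = np.concatenate([crit, [0.0]])     # fall back to a = 0
    return cand[np.argmin(np.polyval(coeffs, cand))]
```

The coefficients are computed once, so the exact step size comes essentially for free compared with repeated distributed evaluations of f in a generic line search.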
  
our contribution: efficient parallel speedup of ALS
recommendation in Spark
■ performance: linear granularity scaling for ALS-NCG as for ALS
(no new parallel bottlenecks for the more advanced algorithm)
our contribution: efficient parallel speedup of ALS
recommendation in Spark
■ performance: ALS-NCG much faster than ALS (20M MovieLens
data, 8 nodes/128 cores)
our contribution: efficient parallel speedup of ALS
recommendation in Spark
■ performance: ALS-NCG speeds up ALS on 16 nodes/256 cores
in Spark for 800M ratings by a factor of about 5
(great speedup, in parallel, in Spark, for a large problem on 256 cores)
some general conclusions ...
■ Spark enables advanced algorithms for Big Data analytics
(linear algebra, optimization, machine learning, ...) (lots of
work: investigate algorithms, implementations, scalability, ...
in Spark)
■ Spark offers a suitable environment for compute-intensive
work!
■ slower than MPI/HPC, but data locality, fault-tolerance,
situated within Big Data “eco-system” (HDFS data, familiar
software stack, ...)
■ will HPC and Big Data hardware/software converge? (also
for “exascale” ...), and if so, which aspects of the Spark
(and others ...) or MPI/HPC approaches will prevail?
