Machine Learning on Cell Processor

                    Supervisor: Dr. Eric McCreath
                       Student: Robin Srivastava
Background and Motivation

    [Diagram: machine learning splits into batch learning and online
    learning. A stream of emails (Email-N, …, Email-2, Email-1) arrives
    one at a time and each is classified as either HAM or SPAM.]
Background and Motivation

    [Diagram: the same split into batch learning and online learning,
    with online learning annotated as sequential in nature. The email
    stream (Email-N, …, Email-2, Email-1) is again classified as either
    HAM or SPAM.]
Objective
    Performance evaluation of a parallel online machine
     learning algorithm (Langford et al. [1])
    Target machines
         Cell Processor: one 3 GHz 64-bit IBM PowerPC core with six
          specialized co-processors (SPEs)
         Intel dual-core machine: 2 GHz dual-core processor, 1.86 GB
          of main memory
Stochastic Gradient Descent
        Step 1: Initialize the weight vector w_0 with some arbitrary
         values
        Step 2: Update the weight vector as follows:

                      w_{t+1} = w_t − η ∇E(w_t)

    where ∇E is the gradient of the error function and η is the
       learning rate
        Step 3: Repeat Step 2 for all units of data
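The update rule above can be sketched for logistic regression, the model used later in the deck. This is a minimal NumPy illustration; the toy data, learning rate, and epoch count are illustrative, not from the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_logistic(X, y, eta=0.1, epochs=5):
    """Plain stochastic gradient descent for logistic regression:
    w_{t+1} = w_t - eta * grad E(w_t), one example at a time."""
    w = np.zeros(X.shape[1])                     # Step 1: initial weights
    for _ in range(epochs):
        for x_t, y_t in zip(X, y):               # Step 3: sweep all data
            grad = (sigmoid(x_t @ w) - y_t) * x_t  # gradient of the log-loss
            w = w - eta * grad                   # Step 2: update
    return w

# Toy data: two linearly separable "emails" with two features each.
X = np.array([[1.0, 2.0], [1.0, -2.0]])
y = np.array([1.0, 0.0])
w = sgd_logistic(X, y)
```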
Delayed Stochastic Gradient Descent
        Step 1: Initialize the weight vector w_0 with some arbitrary
         values
        Step 2: Update the weight vector using a gradient that is τ
         steps stale:

                      w_{t+1} = w_t − η ∇E(w_{t−τ})

    where ∇E is the gradient of the error function and η is the
       learning rate
        Step 3: Repeat Step 2 for all units of data
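A minimal sketch of the delayed update, again for logistic regression. A deque of stale weight vectors stands in for the delay that a parallel implementation introduces; the toy data and hyperparameters are illustrative:

```python
import numpy as np
from collections import deque

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def delayed_sgd_logistic(X, y, eta=0.1, tau=2, epochs=5):
    """Delayed SGD: the update at step t uses the gradient evaluated
    at the weight vector from tau steps earlier, i.e. w_{t-tau}."""
    w = np.zeros(X.shape[1])
    stale = deque([w.copy()] * tau, maxlen=tau)   # holds w_{t-tau} .. w_{t-1}
    for _ in range(epochs):
        for x_t, y_t in zip(X, y):
            w_old = stale[0]                      # w_{t-tau}
            grad = (sigmoid(x_t @ w_old) - y_t) * x_t
            stale.append(w.copy())                # record w_t before updating
            w = w - eta * grad
    return w

# Same toy data as before: two separable points.
X = np.array([[1.0, 2.0], [1.0, -2.0]])
y = np.array([1.0, 0.0])
w = delayed_sgd_logistic(X, y)
```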
Implementation Model

    [Diagram: the implementation model, labelled "Complete Dataset".]
Implementation
    Dataset – TREC 2007 Public Corpus
         Number of emails: 75,419
         Each email classified as either ‘ham’ or ‘spam’
    Pre-processing
         Total number of features extracted: 2,218,878
         Pre-processed email format:


<Number of features><space><index>:<count><space>…<index>:<count>
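For illustration, one line of this sparse format could be parsed as follows; the function name and the example line are hypothetical, only the field layout comes from the slide:

```python
def parse_email(line):
    """Parse '<nfeat> <index>:<count> <index>:<count> ...' into
    (nfeat, {index: count})."""
    fields = line.split()
    nfeat = int(fields[0])           # leading field: number of features
    counts = {}
    for pair in fields[1:]:          # remaining fields: index:count pairs
        idx, cnt = pair.split(":")
        counts[int(idx)] = int(cnt)
    return nfeat, counts

# A made-up email with three active features:
nfeat, counts = parse_email("3 12:2 507:1 99001:4")
```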
Memory Requirement
    Algorithm implemented
         Online logistic regression with delayed update
         Requirement per level of parallelization
              Two private copies of the weight vector
              Two shared copies of the weight vector
              Two error gradients
              Required dimension of each = number of features = 2,218,878
              Data type: float (4 bytes on Cell)
              Total = (6 × 2,218,878) × 4 = 53,253,072 bytes ≈ 50.78 MB
              Plus the space occupied by other auxiliary variables
         Alternatively
              Let only the shared copies use the full dimension
              Total size = (2 × 2,218,878) × 4 bytes ≈ 16.9 MB, plus auxiliaries
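The figures above follow from simple arithmetic; a quick check:

```python
# Six full-dimension float vectors (2 private + 2 shared weight copies
# + 2 error gradients), 4 bytes per element.
N_FEATURES = 2_218_878
BYTES_PER_FLOAT = 4

full = 6 * N_FEATURES * BYTES_PER_FLOAT      # all six vectors
reduced = 2 * N_FEATURES * BYTES_PER_FLOAT   # shared copies only

print(full, full / 2**20)        # 53,253,072 bytes, ~50.8 MB
print(reduced, reduced / 2**20)  # 17,751,024 bytes, ~16.9 MB
```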
Limitations on Cell
    Memory limitation of the SPE
         Available local store: 256 KB
         Required: approx. 51 MB
         Workaround:
              Reduced the number of features
              Performed one more level of pre-processing
    SIMD limitation
         The time spent preparing data for SIMD outweighed its
          benefits for this implementation
Results
    The serial implementation of logistic regression on the Intel
     dual-core machine took 36.93 s and 36.45 s for two consecutive
     executions.
    Parallel implementation using the stochastic gradient descent
     process
Results (contd.)
    Performance on Cell

    [Chart: execution times on the Cell, measured in microseconds.]
References
①    John Langford, Alexander J. Smola and Martin Zinkevich. Slow
     Learners are Fast. Journal of Machine Learning Research 1 (2009).
②    Michael Kistler, Michael Perrone and Fabrizio Petrini. Cell
     Multiprocessor Communication Network: Built for Speed.
③    Thomas Chen, Ram Raghavan, Jason Dale and Eiji Iwata. Cell
     Broadband Engine Architecture and its First Implementation.
④    Jonathan Bartlett. Programming High-Performance Applications on
     the Cell/B.E. Processor, Part 6: Smart Buffer Management with
     DMA Transfers.
⑤    Introduction to Statistical Machine Learning, 2010 course,
     Assignment 1.
⑥    Christopher Bishop. Pattern Recognition and Machine Learning.
