Spark Technology Center
Convolutional Neural Networks at Scale in MLlib
Jeremy Nixon
1. Machine Learning Engineer at the Spark Technology Center
2. Contributor to MLlib, dedicated to scalable deep learning.
3. Previously, studied Applied Mathematics to Computer Science and Economics at Harvard
Jeremy Nixon
Future Work
1. Convolutional Neural Networks
a. Convolutional Layer Type
b. Max Pooling Layer Type
2. Flexible Deep Learning API
3. More Modern Optimizers
a. Adam
b. Adadelta + Nesterov Momentum
4. More Modern activations
5. Dropout / L2 Regularization
6. Batch Normalization
7. Tensor Support
8. Recurrent Neural Networks (LSTM)
1. Framing Deep Learning
2. MLlib Deep Learning API
3. Optimization
4. Performance
5. Future Work
Structure
1. Structural Assumptions
2. Automated Feature Engineering
3. Learning Representations
4. Applications
Framing Convolutional Neural Networks
- Network depth creates an extraordinary range of possible models.
- That flexibility makes large datasets valuable: more data reduces the variance of such flexible models.
Structural Assumptions: Combinatorial Flexibility
X = Normalized Data, W1, W2 = Weights
Forward:
1. Multiply data by first layer weights | X*W1
2. Put output through non-linear activation | max(0, X*W1)
3. Multiply output by second layer weights | max(0, X*W1) * W2
4. Return predicted output
Structural Assumption: The Model
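The four forward steps can be sketched in a few lines of plain Python. This is only an illustration of the formula max(0, X*W1) * W2; the helper names (`matmul`, `relu`, `forward`) and the small example matrices are assumptions made for the sketch, not part of MLlib.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def relu(m):
    """Element-wise max(0, v), the non-linear activation from step 2."""
    return [[max(0.0, v) for v in row] for row in m]


def forward(x, w1, w2):
    hidden = relu(matmul(x, w1))   # steps 1 and 2: max(0, X*W1)
    return matmul(hidden, w2)      # steps 3 and 4: max(0, X*W1) * W2


# Hypothetical data: 2 examples with 2 features, 3 hidden units, 1 output.
X = [[1.0, -2.0], [0.5, 3.0]]
W1 = [[1.0, -1.0, 0.5], [0.0, 2.0, -1.0]]
W2 = [[1.0], [1.0], [1.0]]

prediction = forward(X, W1, W2)
print(prediction)  # [[3.5], [6.0]]
```

The negative entries of X*W1 are zeroed by the activation before the second multiplication, which is what keeps the two layers from collapsing into a single linear map.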
- Pixels → Edges → Shapes → Parts → Objects
- Learn features that are optimized for the data
- Makes transfer learning feasible
Structural Assumptions: Hierarchical Abstraction
Structural Assumptions: Location Invariance
- Convolution is a restriction on the features that can be combined.
- Location Invariance leads to strong accuracy in vision, audio, and language.
colah.github.io
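A small one-dimensional sketch may help make the location-invariance idea concrete. It assumes a hand-picked two-tap kernel acting as a rising-edge detector and two toy signals that contain the same pattern at different positions; `conv1d` is a hypothetical helper, not an MLlib function.

```python
def conv1d(signal, kernel):
    """Slide one shared kernel over the signal (no padding, stride 1)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


kernel = [-1.0, 1.0]  # assumed edge detector: fires on a rise from 0 to 1

early = [0.0, 1.0, 1.0, 0.0, 0.0, 0.0]  # pattern near the start
late = [0.0, 0.0, 0.0, 1.0, 1.0, 0.0]   # same pattern, shifted right by 2

resp_early = conv1d(early, kernel)
resp_late = conv1d(late, kernel)

print(resp_early)  # [1.0, 0.0, -1.0, 0.0, 0.0]
print(resp_late)   # [0.0, 0.0, 1.0, 0.0, -1.0]
print(max(resp_early) == max(resp_late))  # True
```

Because the same weights are reused at every position, each output only combines neighbouring inputs, and the response simply moves with the pattern. Taking the maximum over positions then gives the same value wherever the pattern appears.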
Automated Feature Engineering
Learning Representations
Hidden Layer + Nonlinearity
http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/
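As a rough illustration of what a hidden layer plus a nonlinearity buys, the sketch below uses XOR, which no single linear layer can fit. The weights are hand-picked assumptions (not learned, and not taken from the talk), chosen so that the hidden representation becomes linearly separable.

```python
def relu(v):
    return max(0.0, v)


def hidden(x1, x2):
    """Assumed hidden layer: two ReLU units with hand-picked weights."""
    return (relu(x1 + x2), relu(x1 + x2 - 1.0))


def output(h):
    """Linear read-out on top of the learned-style representation."""
    return h[0] - 2.0 * h[1]


inputs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
representations = [hidden(*x) for x in inputs]
predictions = [output(h) for h in representations]

print(representations)  # [(0.0, 0.0), (1.0, 0.0), (1.0, 0.0), (2.0, 1.0)]
print(predictions)      # [0.0, 1.0, 1.0, 0.0]
```

In the original input space the two classes cannot be split by a line; after the hidden layer they can, and a plain linear read-out reproduces XOR.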
1. CNNs - State of the Art
a. Object Recognition
b. Object Localization
c. Image Segmentation
d. Image Restoration
e. Music Recommendation
2. RNNs (LSTM) - State of the Art
a. Speech Recognition
b. Question Answering
c. Machine Translation
d. Text Summarization
e. Named Entity Recognition
f. Natural Language Generation
g. Word Sense Disambiguation
h. Image / Video Captioning
i. Sentiment Analysis
Applications
Flexibility. High level enough to be efficient. Low level enough to be expressive.
MLlib Flexible Deep Learning API
Modularity enables Logistic Regression and Feedforward Networks.
MLlib Flexible Deep Learning API
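One way to picture this modularity is a stack of interchangeable layers, where logistic regression is the special case of one affine layer followed by a sigmoid. The classes below (`Affine`, `Relu`, `Sigmoid`, `Sequential`) and their weights are hypothetical names invented for this sketch; they are not the MLlib API.

```python
import math


class Affine:
    def __init__(self, weights, bias):
        self.weights = weights  # one row of weights per output unit
        self.bias = bias

    def forward(self, x):
        return [sum(w * v for w, v in zip(row, x)) + b
                for row, b in zip(self.weights, self.bias)]


class Relu:
    def forward(self, x):
        return [max(0.0, v) for v in x]


class Sigmoid:
    def forward(self, x):
        return [1.0 / (1.0 + math.exp(-v)) for v in x]


class Sequential:
    """Applies the given layers in order."""

    def __init__(self, *layers):
        self.layers = layers

    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x


# Logistic regression: a single affine layer plus a sigmoid.
logistic_regression = Sequential(Affine([[2.0, -1.0]], [0.0]), Sigmoid())

# Feedforward network: the same pieces with a hidden layer in between.
feedforward = Sequential(
    Affine([[1.0, 1.0], [1.0, -1.0]], [0.0, 0.0]),
    Relu(),
    Affine([[1.0, -1.0]], [0.0]),
    Sigmoid(),
)

lr_out = logistic_regression.forward([1.0, 2.0])
ff_out = feedforward.forward([1.0, 2.0])
print(lr_out)  # [0.5]
```

Both models share every building block; only the list of layers changes, which is the sense in which a modular design covers several model families at once.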
Introducing Convolutional and Max-Pooling Layer types.
MLlib Convolutional Neural Network
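To show what the two layer types compute, here is a minimal sketch of a 2D convolution followed by 2x2 max pooling. The function names, the toy image, and the vertical-edge kernel are assumptions chosen for illustration and say nothing about how the MLlib layers are implemented.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def max_pool2d(fmap, size=2):
    """Keep the largest value in each non-overlapping size x size block."""
    return [[max(fmap[r + i][c + j]
                 for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]


# Toy 5x5 image with a vertical bar in columns 2 and 3.
image = [
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
]
kernel = [[-1, 1], [-1, 1]]  # assumed detector for dark-to-bright vertical edges

feature_map = conv2d(image, kernel)  # 4x4 map, every row is [0, 2, 0, -2]
pooled = max_pool2d(feature_map)     # 2x2 summary

print(pooled)  # [[2, 0], [2, 0]]
```

The convolution marks where the edge is, and pooling shrinks the map while keeping the strongest response in each block.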
Optimization
Parallel implementation of backpropagation:
1. Each worker gets weights from master node.
2. Each worker computes a gradient on its data.
3. Each worker sends gradient to master.
4. Master averages the gradients and updates the weights.
Distributed Optimization
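The four steps can be simulated in a single process. The sketch below assumes a one-weight linear model with squared-error loss, two equally sized partitions standing in for workers, and a fixed learning rate; `worker_gradient` and `master_step` are hypothetical names, and no Spark calls are involved.

```python
def worker_gradient(weights, partition):
    """Step 2: mean squared-error gradient on one worker's partition."""
    grad = [0.0] * len(weights)
    for features, label in partition:
        error = sum(w * x for w, x in zip(weights, features)) - label
        for k, x in enumerate(features):
            grad[k] += 2.0 * error * x / len(partition)
    return grad


def master_step(weights, partitions, learning_rate):
    # Steps 1-3: each worker receives the weights and returns its gradient.
    gradients = [worker_gradient(weights, p) for p in partitions]
    # Step 4: the master averages the gradients and updates the weights.
    averaged = [sum(g[k] for g in gradients) / len(gradients)
                for k in range(len(weights))]
    return [w - learning_rate * g for w, g in zip(weights, averaged)]


# Toy data for y = 3 * x, split across two simulated workers.
partitions = [
    [([1.0], 3.0), ([2.0], 6.0)],
    [([3.0], 9.0), ([4.0], 12.0)],
]

weights = [0.0]
for _ in range(200):
    weights = master_step(weights, partitions, learning_rate=0.05)

print(round(weights[0], 6))  # 3.0
```

With equally sized partitions, the average of the per-worker gradients equals the gradient over the full dataset, so the update matches ordinary batch gradient descent. The weights and gradients exchanged on every iteration are the communication cost that grows with the number of workers.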
● Parallel MLP on Spark with 7 nodes ~= Caffe w/GPU (single node).
● Advantages to parallelism diminish with additional nodes due to communication costs.
● Additional workers are valuable up to ~20 workers.
● See https://github.com/avulanov/ann-benchmark for more details
Performance
GitHub: https://github.com/JeremyNixon/sparkdl
Spark Package: https://spark-packages.org/package/JeremyNixon/sparkdl
Access
1. GPU Acceleration (External)
2. Keras Integration
3. Residual Layers
4. Hardening
5. Regularization
6. Batch Normalization
7. Tensor Support
Future Work
Thank you for your attention!
Questions?
