Persistent RNNs
(stashing recurrent weights on-chip)
Presenter: Gregory Diamos
Silicon Valley AI Lab
Baidu
Jun 20, 2016
Presenter: Gregory Diamos Persistent RNNs
Machine learning has benefited greatly from faster computer systems.
GPUs, in particular, have delivered a step forward.
Imagine the problems that you could solve
with even faster systems.
HPC is an opportunity
[Figure: TitanX GPU vs. fastest supercomputer, a 10,000x gap]
Limits of data-parallelism
Hardware limits
[Plot: wall-clock time to convergence vs. mini-batch size; the small-batch region is labeled "inefficient hardware"]
Hardware becomes less efficient at small batch sizes.
Optimization limits
[Plot: wall-clock time to convergence vs. mini-batch size; the large-batch region is labeled "inefficient optimization"]
Optimization algorithms perform more work at large batch sizes.
Mini-batch limits
[Plot: wall-clock time to convergence vs. mini-batch size; the small-batch region is labeled "inefficient hardware" and the large-batch region "inefficient optimization"]
These effects combine to limit the maximum number of GPUs.
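The trade-off on the last three slides can be sketched with a toy model. Everything here is an assumption made for illustration and none of it comes from the talk: the function name, the two constants, and the functional forms (hardware efficiency that rises with batch size, and a sample count that grows with batch size) are hypothetical, chosen only to reproduce the U-shaped curve.

```python
def relative_time_to_convergence(batch_size, hw_overhead=16.0, critical_batch=1024.0):
    """Toy model of wall-clock time to convergence (arbitrary units).

    Assumptions (hypothetical, not taken from the slides):
      - hardware runs at batch_size / (batch_size + hw_overhead) of peak,
        so small batches waste most of the machine;
      - the optimizer needs (1 + batch_size / critical_batch) times the
        minimum number of training samples, so large batches do extra work.
    """
    hardware_efficiency = batch_size / (batch_size + hw_overhead)
    relative_samples = 1.0 + batch_size / critical_batch
    return relative_samples / hardware_efficiency


if __name__ == "__main__":
    # Sweep powers of two and print the resulting U-shaped curve.
    for exponent in range(3, 13):
        b = 2 ** exponent
        print(f"batch {b:5d}: relative time {relative_time_to_convergence(b):.3f}")
```

Under these assumed constants the curve bottoms out at a moderate batch size. Since data-parallel training typically grows the mini-batch with the number of GPUs, this is one way to read the slide's claim that the two effects together cap the useful GPU count.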
Persistent RNNs
Open source CUDA implementation:
https://github.com/baidu-research/persistent-rnns
Persistent RNN Details
Persistent RNNs
[Diagram: in a GEMM-based RNN, each of four GEMM calls loads its own copy of the weights as it processes data0 to data4; in a persistent RNN, a single copy of the weights is reused across data0 to data4]
RNNs built on GEMM routines reload the weights each timestep.
However, the weights are constant, and this is wasteful.
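A rough way to quantify the waste is to count the bytes of recurrent weights fetched over a sequence. This is a minimal sketch with assumptions of my own: a square hidden-to-hidden matrix, 4-byte weights, and a function name and parameters that are hypothetical rather than part of the released code.

```python
def weight_bytes_loaded(hidden_size, timesteps, bytes_per_weight=4, persistent=False):
    """Bytes of recurrent weights fetched from off-chip memory for one sequence.

    Assumes a square hidden_size x hidden_size recurrent matrix (an
    illustrative simplification). A GEMM-per-timestep RNN reloads the
    matrix every timestep; a persistent RNN loads it once and keeps it on-chip.
    """
    matrix_bytes = hidden_size * hidden_size * bytes_per_weight
    loads = 1 if persistent else timesteps
    return matrix_bytes * loads


if __name__ == "__main__":
    gemm = weight_bytes_loaded(1024, timesteps=100)
    persistent = weight_bytes_loaded(1024, timesteps=100, persistent=True)
    print(f"GEMM-based: {gemm / 2**20:.0f} MiB, persistent: {persistent / 2**20:.0f} MiB")
```

In this model the saving in weight traffic equals the number of timesteps, which is why long sequences benefit most.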
Cache weights in registers
[Diagram: a GPU thread holding weights in its registers, next to the datapath]
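Caching weights in registers only works if the recurrent matrix fits in the register file, so a quick capacity estimate is useful. The sketch below assumes a square recurrent matrix and uses hardware figures that are not stated on the slides: 24 multiprocessors with 256 KiB of registers each, which is my understanding of a Maxwell-generation TitanX. The helper name and the `usable_fraction` parameter are hypothetical.

```python
import math  # math.isqrt requires Python 3.8+


def max_hidden_size(num_sms, register_bytes_per_sm, bytes_per_weight=4, usable_fraction=1.0):
    """Largest H such that an H x H weight matrix fits in the register budget.

    usable_fraction is a hypothetical knob for the share of registers left
    over for weights after the kernel's own working state is accounted for.
    """
    budget_bytes = int(num_sms * register_bytes_per_sm * usable_fraction)
    return math.isqrt(budget_bytes // bytes_per_weight)


if __name__ == "__main__":
    # Assumed TitanX-class figures: 24 SMs x 256 KiB of registers each.
    print(max_hidden_size(24, 256 * 1024))                      # 32-bit weights
    print(max_hidden_size(24, 256 * 1024, bytes_per_weight=2))  # 16-bit weights
```

This is an upper bound only. A real kernel also needs registers for activations, indices, and accumulators, so the practical layer size is smaller.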
A global barrier
[Diagram: data0, GPU, data1, GPU, with a barrier]
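One way to see why a barrier is needed: if each worker keeps only a slice of the weight rows, it produces only a slice of the next hidden state, yet every worker needs the whole state before starting the following timestep. The sketch below imitates that pattern with Python threads and `threading.Barrier`. It is a CPU stand-in under my own assumptions (row partitioning, a tanh recurrence, double-buffered state), not the CUDA kernel from the repository.

```python
import math
import threading


def rnn_sequential(W, h0, steps):
    """Reference recurrence: h <- tanh(W @ h), repeated `steps` times."""
    h = list(h0)
    for _ in range(steps):
        h = [math.tanh(sum(w * x for w, x in zip(row, h))) for row in W]
    return h


def rnn_with_barrier(W, h0, steps, num_workers):
    """Same recurrence, with the rows of W split across persistent workers.

    Each worker keeps its rows for the whole sequence (standing in for
    weights held on-chip) and meets the others at a barrier once per timestep.
    """
    n = len(W)
    buffers = [list(h0), [0.0] * n]  # double buffer: read one, write the other
    barrier = threading.Barrier(num_workers)

    def worker(rank):
        my_rows = range(rank, n, num_workers)
        for t in range(steps):
            src, dst = buffers[t % 2], buffers[(t + 1) % 2]
            for i in my_rows:
                dst[i] = math.tanh(sum(w * x for w, x in zip(W[i], src)))
            # No worker may start timestep t+1 until every slice of h is written.
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return buffers[steps % 2]


if __name__ == "__main__":
    W = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.2], [0.3, 0.1, -0.2]]
    h0 = [1.0, 0.5, -0.5]
    print(rnn_with_barrier(W, h0, steps=5, num_workers=2))
```

With double buffering, one barrier per timestep is enough: a worker can only overwrite a buffer after every other worker has finished reading it in the previous step.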
Experiments
Scaling to 128 GPUs
Exploring deep residual RNNs
Pascal and future
Future GPUs will enable bigger and faster RNN layers.
Three challenges
Close the gap with the fastest supercomputers.
Do not settle for inefficient algorithms.
Push performance to the edge of physical limits.
10 PetaFlops in 300 Watts.
150 ExaFlops in 25 MegaWatts.
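The two targets imply different energy efficiencies, which a line of arithmetic makes explicit. The sketch uses only the figures on the slide; the function and variable names are my own.

```python
def flops_per_watt(flops, watts):
    """Energy efficiency in floating-point operations per second per watt."""
    return flops / watts


if __name__ == "__main__":
    small = flops_per_watt(10e15, 300)    # 10 PetaFlops in 300 Watts
    large = flops_per_watt(150e18, 25e6)  # 150 ExaFlops in 25 MegaWatts
    print(f"300 W target: {small / 1e12:.1f} TFLOPS/W")
    print(f"25 MW target: {large / 1e12:.1f} TFLOPS/W")
```

By this arithmetic the 300 W target works out to roughly 33 TFLOPS per watt and the 25 MW target to 6 TFLOPS per watt.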
Persistent RNNs: Stashing Recurrent Weights On-Chip