Enabling Ubiquitous Visual Intelligence
Through Deep Learning	

Dr. Ren Wu	

Distinguished Scientist, Baidu	

wuren@baidu.com 	

@韧在百度
Dr. Ren Wu	

•  Distinguished Scientist, Baidu	

•  HSA Chief Software Architect, AMD	

•  PI, HP Labs CUDA Research Center	

•  World Computer Xiangqi Champion	

•  AI expert	

•  Heterogeneous Computing expert	

•  Computational scientist
Eighteen Years Ago - 05/11/1997
Deep Blue	

A classic example of application-specific system design: an IBM
supercomputer with 480 custom-made VLSI chess chips, running a
massively parallel search algorithm with a highly optimized implementation.
Computer Chess and Moore’s Law
Deep Learning Works	

“We deepened our investment in
advanced technologies like Deep
Learning, which is already yielding near
term enhancements in customer ROI
and is expected to drive
transformational change over the
longer term.” 	

	

– Robin Li, Baidu CEO
Amount of data	

Performance	

Deep learning	

Old algorithms	

Deep Learning
Deep Learning vs. Human Brain	

pixels	

edges	

object parts	

(combination 	

of edges)	

object models	

Deep Architecture in the Brain
Retina
Area V1
Area V2
Area V4
pixels
Edge detectors
Primitive shape
detectors
Higher level visual
abstractions
Slide credit: Andrew Ng
Voice	

Text	

 Image	

 User
Deep Convolutional Neural Networks	
* Efficient mapping of the training of Convolutional Neural Networks to a CUDA-based cluster	

courtesy of Jonatan Ward, Sergey Andreev, Francisco Heredia, Bogdan Lazar, Zlatka Manevska
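To make the "pixels to edge detectors" step concrete, here is a minimal sketch of the operation a convolutional layer repeats many times: sliding a small kernel over an image. The function name `conv2d_valid`, the Sobel-like kernel, and the toy image are illustrative assumptions, not taken from the talk. A trained network learns its kernels rather than using hand-written ones.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding) and return the response map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


# Hypothetical hand-written vertical-edge kernel (Sobel-like).
VERTICAL_EDGE = [[-1, 0, 1],
                 [-2, 0, 2],
                 [-1, 0, 1]]

# Toy 4x6 image: dark left half (0), bright right half (1).
image = [[0, 0, 0, 1, 1, 1] for _ in range(4)]

response = conv2d_valid(image, VERTICAL_EDGE)
for row in response:
    print(row)
```

The response is non-zero only where the window straddles the dark-to-bright boundary, which is what "edge detector" means in the layer diagram above.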
Big Data	
•  Storage: >2,000 PB
•  Processing: 10-100 PB/day
•  Webpages: 100b-1,000b
•  Index: 100b-1,000b
•  Updates: 1b-10b/day
•  Logs: 100 TB-1 PB/day
Heterogeneous Computing	

1993 world #1: Thinking Machines CM-5/1024, 131 GFLOPS

2013: Samsung Note 3 smartphone (Qualcomm Snapdragon 800), 129 GFLOPS

2000 world #1: ASCI White (IBM RS/6000 SP), 6 MW power, 106 tons, 12.3 TFLOPS

2013: Two Mac Pro workstations (dual AMD GPUs each), 14 TFLOPS
Deep Learning: Two Step Process	

Supercomputers used for
training	

	

And then deploy the trained
models everywhere!	

Datacenters	

 Tablets, smartphones	

 Wearable devices	

 IoTs
Deep Learning: Training 	

Big data + Deep learning + High performance computing =
Intelligence	

	

Big data + Deep learning + Heterogeneous computing =
Success
Image Recognition
human vs. machine	

http://7-themes.com/6977111-cute-little-girl-play-white-dog.html
ImageNet Classification Challenge

•  ImageNet dataset

•  More than 15 million images belonging to about 22,000 categories

•  ILSVRC (ImageNet Large-Scale Visual Recognition Challenge)

•  Classification task: 1.2 million images in 1,000 categories

•  One of the most challenging computer vision benchmarks

•  Increasing attention from both industry and academia

* Olga Russakovsky et al. ECCV 2014
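The results on the following slides are reported as top-5 error: a prediction counts as correct if the true label is among the model's five highest-scoring classes. A minimal sketch of that metric follows. The function name `top5_error` and the toy scores are illustrative assumptions, not the official evaluation code.

```python
def top5_error(scores_per_image, true_labels):
    """Fraction of images whose true label is NOT among the 5 top-scoring classes."""
    errors = 0
    for scores, label in zip(scores_per_image, true_labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
        if label not in ranked[:5]:
            errors += 1
    return errors / len(true_labels)


# Toy example with 7 classes and 2 images (made-up scores).
scores = [
    [0.30, 0.25, 0.20, 0.10, 0.08, 0.05, 0.02],  # true class 4 is ranked 5th: a hit
    [0.30, 0.25, 0.20, 0.10, 0.08, 0.05, 0.02],  # true class 6 is ranked 7th: a miss
]
labels = [4, 6]

error = top5_error(scores, labels)
print(f"top-5 error: {error:.0%}")
```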
ImageNet Classification Challenge
	

* Courtesy of Fei-Fei Li
ImageNet Classification Challenge
ImageNet Classification 2012-2014 	

Team         Year  Place  Error (top-5)  Uses external data
SuperVision  2012  -      16.4%          no
SuperVision  2012  1st    15.3%          ImageNet 22k
Clarifai     2013  -      11.7%          no
Clarifai     2013  1st    11.2%          ImageNet 22k
MSRA         2014  3rd    7.35%          no
VGG          2014  2nd    7.32%          no
GoogLeNet    2014  1st    6.67%          no

Slide credit: Yangqing Jia, Google

Invincible?
Latest Results	

Team Date Top-5 test error
GoogLeNet 2014 6.66%
Deep Image 01/12/2015 5.98%
Deep Image 02/05/2015 5.33%
Microsoft 02/05/2015 4.94%
Google 03/02/2015 4.82%
Deep Image 05/10/2015 4.58%
Insights and Inspirations	

多算胜，少算不胜

孙子《计篇》 (Sun Tzu, The Art of War, "Laying Plans", 544-496 BC)

More calculation wins; less calculation loses.

元元本本，殚见洽闻

班固《西都赋》 (Ban Gu, "Western Capital Rhapsody", 32-92 AD)

The more you see, the more you know.

明足以察秋毫之末

孟子《梁惠王上》 (Mencius, "King Hui of Liang, Part I", 372-289 BC)

The ability to see very fine details.
Project Minwa (百度敏娲)	

•  Minerva + Athena + 女娲 (Nüwa)

•  Athena: Goddess of wisdom, warfare, divine intelligence, architecture, and crafts

•  Minerva: Goddess of wisdom, magic, medicine, arts, commerce, and defense

•  女娲 (Nüwa): molded humans from clay, mended the sky with smelted stones; goddess of marriage and musical instruments

World’s Largest Artificial Neural Networks

•  Pushing the state of the art

•  ~100x bigger than previous ones

•  New kind of intelligence?
Hardware/Software Co-design	
•  Stochastic gradient descent (SGD)	

•  High compute density	

•  Scale up, up to 100 nodes	

•  High bandwidth low latency	

•  36 nodes, 144 GPUs, 6.9TB Host, 1.7TB Device	

•  0.6 PFLOPS 	

•  Highly Optimized software stack	

•  RDMA/GPU Direct 	

•  New data partition and communication
strategies	
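As a rough illustration of why SGD scales across many GPUs, here is a minimal sketch of synchronous data-parallel training: each worker computes a gradient on its own shard of the data, the gradients are averaged (the step that needs high-bandwidth, low-latency communication), and every worker applies the same update. The toy model (fitting `y = w * x`), the function names, and the hyperparameters are illustrative assumptions. This is not Minwa's actual partitioning or communication strategy, which the slide does not detail.

```python
def shard_gradient(w, shard):
    """Gradient of the mean of 0.5 * (w*x - y)^2 over one worker's shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)


def data_parallel_sgd(shards, lr=0.05, steps=200):
    """Synchronous data-parallel gradient descent on a single scalar weight."""
    w = 0.0
    for _ in range(steps):
        # On real hardware each gradient is computed on a different GPU, in parallel.
        grads = [shard_gradient(w, shard) for shard in shards]
        # The all-reduce step: average the gradients across workers.
        avg_grad = sum(grads) / len(grads)
        # Every worker applies the same update, so the replicas stay in sync.
        w -= lr * avg_grad
    return w


# Toy data generated from y = 3x, split across 4 hypothetical workers.
# For determinism, each worker uses its whole shard as its mini-batch.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
shards = [data[i::4] for i in range(4)]

w = data_parallel_sgd(shards)
print(f"learned weight: {w:.4f}")
```

The design point is that compute per step scales with the number of workers while the communication cost per step is one gradient exchange, which is why the interconnect matters as much as the GPUs.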

GPUs	
Infiniband
Minwa
Speedup (wall time for convergence) 	
Validation set accuracy for different numbers of GPUs	
[Figure: validation-set accuracy (0 to 0.9) versus training time in hours (log scale, 0.25 to 256) for 1, 16, and 32 GPUs]
Accuracy 80%
32 GPU: 8.6 hours
1 GPU: 212 hours
Speedup: 24.7x
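The speedup figure follows directly from the two wall-clock times on the slide. The short calculation below reproduces it and also derives the parallel efficiency, a number the slide does not state. The variable names are illustrative.

```python
t_1gpu_hours = 212.0   # time for 1 GPU to reach 80% accuracy (from the slide)
t_32gpu_hours = 8.6    # time for 32 GPUs to reach 80% accuracy (from the slide)
n_gpus = 32

speedup = t_1gpu_hours / t_32gpu_hours
efficiency = speedup / n_gpus  # fraction of ideal linear scaling

print(f"speedup: {speedup:.1f}x")               # prints "speedup: 24.7x"
print(f"parallel efficiency: {efficiency:.0%}")  # prints "parallel efficiency: 77%"
```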
Never have enough training
examples!	

	

Key observations 	

•  Invariant to illuminant of the scene	

•  Invariant to observers 	

Augmentation approaches	

•  Color casting	

•  Optical distortion	

•  Rotation and cropping etc	

Data Augmentation	

“见多识广” (the more you see, the more you know)
And the Color Constancy 	

	

Key observations 	

•  Invariant to illuminant of the scene	

•  Invariant to observers 	

Augmentation approaches	

•  Color casting	

•  Optical distortion	

•  Rotation and cropping etc	

The Color of the Dress	

“Inspired by the color constancy principle.
Essentially, this ‘forces’ our neural network to
develop its own color constancy ability.”
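A color cast can be sketched as a per-channel intensity shift applied to every pixel, so the same object appears under many simulated illuminants. The function name `color_cast`, the offsets, and the 8-bit clamping are illustrative assumptions. The slide does not specify how Deep Image implements color casting.

```python
def color_cast(pixels, dr, dg, db):
    """Shift the R, G, B channels of every pixel by fixed offsets, clamped to 0..255."""
    def clamp(v):
        return max(0, min(255, v))

    return [(clamp(r + dr), clamp(g + dg), clamp(b + db)) for r, g, b in pixels]


# Two made-up RGB pixels; add a slight red cast and remove some green.
pixels = [(200, 100, 50), (250, 10, 0)]
cast = color_cast(pixels, dr=10, dg=-20, db=0)
print(cast)
```

Training on many such shifted copies with the same label is what pushes the network toward predictions that do not depend on the scene illuminant.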
Data Augmentation	

Augmentation     Number of possible changes
Color casting    68,920
Vignetting       1,960
Lens distortion  260
Rotation         20
Flipping         2
Cropping         82,944 (crop size 224x224, input image size 512x512)
Possible variations
The Deep Image system learned from ~2 billion examples, out
of 90 billion possible candidates.
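The cropping count in the table can be reproduced with a small calculation. The slide's figure matches (512 - 224) squared, that is, 288 offsets per axis. Counting both end positions would give 289 per axis, so the 288 convention is an inference from the number on the slide, not something the slide states.

```python
input_size = 512   # input image side, from the slide
crop_size = 224    # crop side, from the slide

# Convention that matches the slide's number: 288 distinct offsets per axis.
offsets_per_axis = input_size - crop_size
crops = offsets_per_axis ** 2

# Alternative convention that includes both end positions on each axis.
crops_inclusive = (input_size - crop_size + 1) ** 2

print(crops)            # prints 82944
print(crops_inclusive)  # prints 83521
```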
Data Augmentation vs. Overfitting
Examples	

Bathtub	
 Isopod	
Indian elephant	
 Ice bear	
Some hard cases addressed by adding our data augmentation.
Multi-scale Training	

•  Same crop size, different
resolution	

•  Fixed-size 224*224	

•  Downsized training images	

•  Reduces computational costs	

•  But not for state-of-the-art	

•  Different models trained by
different image sizes	

256*256	
512*512	
•  High-resolution model works	

•  256x256: top-5 7.96%	

•  512x512: top-5 7.42% 	

•  Multi-scale models are
complementary	

•  Fused model: 6.97%	

“明查秋毫” (seeing the finest details)
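One simple way to combine complementary models is to average their per-class probabilities and pick the highest fused score. The sketch below assumes that averaging scheme. The slide does not say how the 256x256 and 512x512 models were fused, and the function name `fuse` and the probabilities are made up for illustration.

```python
def fuse(prob_lists):
    """Average per-class probabilities from several models."""
    n = len(prob_lists)
    return [sum(ps) / n for ps in zip(*prob_lists)]


# Made-up class probabilities for one image from two hypothetical models.
p_256 = [0.6, 0.3, 0.1]   # low-resolution model favors class 0
p_512 = [0.2, 0.7, 0.1]   # high-resolution model favors class 1

fused = fuse([p_256, p_512])
best = max(range(len(fused)), key=fused.__getitem__)
print(fused, "-> predicted class", best)
```

When the models make different mistakes, the fused scores can be right on images where one model alone is wrong, which is the sense in which the slide calls multi-scale models complementary.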
Multi-scale Training	

Tricycle	
Washer	
Backpack	
Little blue heron
Tricycle
Single Model Performance	

•  One basic configuration has 16 layers	

•  The number of weights in our configuration is 212.7M	

•  About 40% bigger than VGG’s	

Team Top-5 val. error
VGG 8.0%
GoogLeNet 7.89%
BN-Inception 5.82%
MSRA, PReLU-net 5.71%
Deep Image 5.40%
Robustness
Major Differentiators 	

•  Custom-built supercomputer dedicated to DL

•  Simple, scalable algorithm + fully optimized software stack

•  Larger models

•  More aggressive data augmentation

•  Multi-scale training, including high-resolution images

Scalability + Insights, and push for the extreme
Deep Learning: Deployment	

Big data + Deep learning + High performance computing =
Intelligence	

	

Big data + Deep learning + Heterogeneous computing =
Success
Owl of Minwa (百度敏鸮)	

Supercomputers	

 Datacenters	

 Tablets, smartphones	

Models trained by supercomputers	

Trained models will be deployed in many ways	

data centers (cloud), smartphones, and even wearables and IoTs

OpenCL based, lightweight and high performance

DNNs everywhere!

knowledge, wisdom, perspicacity and erudition
DNNs Everywhere	

Supercomputers	

 Datacenters	

 Tablets, smartphones	

 Wearable devices	

IoTs	

1000s GPUs	

 100k-1m servers	

 2b (in China)	

 50b in 2020?	

Supercomputer used for training	

Trained DNNs then deployed to data centers (cloud),
smartphones, and even wearables and IoTs
Offline Mobile DNN App	

•  Image recognition on mobile device	

•  Real time and no connectivity
needed	

•  directly from video stream, what
you point is what you get	

•  Everything is done within the device	

	

•  OpenCL based, highly optimized	

•  Large deep neural network models	

•  Thousands of objects, flowers, dogs,
and bags etc	

•  Unleashed the full potential of the
device hardware	

	

•  Smart phones now, Wearables and
IoTs tomorrow
Cloud Computing: What’s Missing?	

Bandwidth?	

Latency?	

and	

Power consumption?	

* Artem Vasilyev: CNN optimizations for embedded systems and FFT

Moving data around is expensive, very expensive!
Cloud Computing: What’s Missing?	

How about 	

privacy?
What’s Next?	

Dedicated Hardware + Heterogeneous Computing	

* Mark Horowitz
Heterogeneous Computing	

“Human mind and brain is not a single general-purpose processor
but a collection of highly specialized components, each solving a
different, specific problem and yet collectively making up who we
are as human beings and thinkers.” - Prof. Nancy Kanwisher
© Copyright Khronos Group 2015 - Page 50
Vision Processing Power Efficiency
• Wearables will need ‘always-on’ vision
-  With smaller thermal limit / battery than phones!
• GPUs have x10 imaging power efficiency over CPU
-  GPUs architected for efficient pixel handling
• Dedicated Hardware/DSPs can be even more efficient
-  With some loss of generality
• Mobile SOCs have space for more transistors
-  But can’t turn on at same time = Dark Silicon
-  Can integrate more gates ‘for free’ if careful
how and when they are used
[Figure: power efficiency versus computation flexibility: multi-core CPU (x1), GPU compute (x10), dedicated hardware (x100)]
Potential for dedicated sensor/vision silicon to be
integrated into Mobile Processors
But how will they be programmed for
PORTABILITY and POWER EFFICIENCY?
OpenCL Ecosystem
Implementers
Desktop/Mobile/FPGA
Working Group Members
Apps/Tools/Tests/Courseware
Single Source C++ Programming
Portable Kernel Intermediate Language
Core API and Language Specs
Everything Connected (big data era)
Everything Intelligent (AI era)
I²oT: Intelligent Internet of Things
Thank you!
