Batch Normalization
Taka Wang
20210721
Batch
Normalization
Distribution changes
Higher learning rate
Less care about initialization
2015, ICML, Google Researchers
Mini-Batch
Recap
When training a neural network, we feed in observations and compare the
network's actual output to the expected output. We then use gradient
descent to update the model's parameters in the direction that
minimizes the difference between the expected (or ideal) outcome and the actual
outcome. In other words, we are trying to minimize the error in
the model's predictions.
Ultimately, gradient descent is a search over the loss function surface for
the value of each parameter at which the loss function is
minimized. In other words, we are looking for the lowest point on the loss function
surface.
Source: jeremyjordan
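The search described above can be sketched in a few lines. This is a minimal illustration, not anything from the slides: the bowl-shaped loss, its minimum at (3, -1), and the names `loss`, `grad`, and `gradient_descent` are all assumptions made for the example.

```python
import numpy as np

def loss(w):
    # Hypothetical bowl-shaped loss surface with its lowest point at w = (3, -1).
    return (w[0] - 3.0) ** 2 + (w[1] + 1.0) ** 2

def grad(w):
    # Analytic gradient of the loss above.
    return np.array([2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)])

def gradient_descent(w0, learning_rate=0.1, steps=100):
    # Repeatedly step against the gradient, i.e. downhill on the loss surface.
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w

w_final = gradient_descent([0.0, 0.0])
print(w_final, loss(w_final))  # parameters end up very close to (3, -1)
```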
Recap
[Figure: age and height as inputs, NBA? as output; contour plot of the loss function over age and height]
Neuron
Inputs: a, b, c
Weights: w1, w2, w3
Linear combination: x = aw1 + bw2 + cw3
Activation function: output Y = f(x)
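The neuron on this slide can be written out directly. The sketch below assumes a sigmoid for f, which the slide does not specify, and the input and weight values are made up so that the linear combination comes out to exactly 0.

```python
import math

def sigmoid(x):
    # One possible activation function f; the slide leaves f unspecified.
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, activation=sigmoid):
    # Linear combination x = a*w1 + b*w2 + c*w3, followed by the activation f(x).
    x = sum(value * weight for value, weight in zip(inputs, weights))
    return activation(x)

a, b, c = 1.0, 2.0, 3.0          # illustrative inputs
w1, w2, w3 = 0.5, -0.25, 0.0     # illustrative weights: x = 0.5 - 0.5 + 0.0 = 0.0
y = neuron([a, b, c], [w1, w2, w3])
print(y)  # 0.5, because sigmoid(0) = 0.5
```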
Optimization
Source: jeremyjordan
Learning Rate Matters
Source: jeremyjordan
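A small sketch of why the learning rate matters, assuming the one-parameter loss L(w) = w**2 (chosen only for illustration; the function name `descend` and the three rates are hypothetical). Each update multiplies w by (1 - 2 * learning_rate), so a tiny rate crawls, a moderate rate converges quickly, and a rate above 1 overshoots further on every step.

```python
def descend(learning_rate, w0=1.0, steps=20):
    # Minimize the illustrative loss L(w) = w**2, whose gradient is 2 * w.
    w = w0
    for _ in range(steps):
        w = w - learning_rate * 2.0 * w
    return w

too_small = descend(0.01)    # still far from the minimum at 0 after 20 steps
well_chosen = descend(0.4)   # essentially at the minimum
too_large = descend(1.1)     # overshoots each step and diverges

print(too_small, well_chosen, too_large)
```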
Why Feature Scaling Matters
Player  Age  Height  NBA
#1      45   190     No
#2      19   192     Yes
#3      32   195     No
#4      48   200     No
#5      25   193     Yes
#6      31   182     Yes
#7      49   177     No
#8      22   201     Yes
Feature Scaling (Normalization)
[Figure: two contour plots of the loss function, each with axes age and height]
Z-Score Normalization
Min-Max Normalization
Source: Codecademy
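Both normalizations can be tried on the Age and Height columns of the player table. This is a sketch using the standard definitions, z = (x - mean) / std and x' = (x - min) / (max - min); the function names `z_score` and `min_max` are made up for the example.

```python
import numpy as np

# Age and Height columns from the player table.
age = np.array([45, 19, 32, 48, 25, 31, 49, 22], dtype=float)
height = np.array([190, 192, 195, 200, 193, 182, 177, 201], dtype=float)

def z_score(x):
    # Z-score normalization: subtract the mean, divide by the standard deviation.
    return (x - x.mean()) / x.std()

def min_max(x):
    # Min-max normalization: rescale so the smallest value is 0 and the largest is 1.
    return (x - x.min()) / (x.max() - x.min())

for name, column in (("age", age), ("height", height)):
    print(name, "z-score:", np.round(z_score(column), 2))
    print(name, "min-max:", np.round(min_max(column), 2))
```

After either transform the two features share a comparable range, so neither one dominates the shape of the loss surface just because of its units.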
Source: GIGAcalculator
Mean
Variance measures how spread out the data is
Batch, Stochastic & Mini-Batch Gradient Descent
(Batch) Gradient Descent
Stochastic Gradient Descent
Mini-Batch Gradient Descent (batch size = 4)
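The three variants differ only in how many samples feed each parameter update. The sketch below assumes a small noiseless linear-regression problem invented for illustration (the data, `true_w`, and the `train` function are not from the slides) and reuses one training loop with three batch sizes.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 2))        # 64 synthetic samples, 2 features
true_w = np.array([2.0, -3.0])      # weights we hope to recover
y = X @ true_w                      # noiseless targets

def train(batch_size, learning_rate=0.05, epochs=300):
    w = np.zeros(2)
    n = len(X)
    for _ in range(epochs):
        order = rng.permutation(n)  # reshuffle the data every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            error = X[idx] @ w - y[idx]
            # Gradient of the mean squared error over the current batch only.
            grad = 2.0 * X[idx].T @ error / len(idx)
            w -= learning_rate * grad
    return w

w_batch = train(batch_size=len(X))  # (batch) gradient descent: one update per epoch
w_sgd = train(batch_size=1)         # stochastic gradient descent: one sample per update
w_mini = train(batch_size=4)        # mini-batch gradient descent, batch size = 4
print(w_batch, w_sgd, w_mini)
```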
Vanishing Gradient Problem
Source: Andre Ye
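A rough numerical picture of the problem, assuming sigmoid activations and ignoring the weights entirely (a deliberate simplification; `gradient_scale` is a hypothetical helper). The sigmoid derivative is at most 0.25, so the product of per-layer derivatives shrinks geometrically with depth, and it is smaller still when a unit is saturated.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of the sigmoid; its maximum value is 0.25, reached at x = 0.
    s = sigmoid(x)
    return s * (1.0 - s)

def gradient_scale(depth, pre_activation=0.0):
    # Product of per-layer sigmoid derivatives, with all weights assumed to be 1.
    return sigmoid_grad(pre_activation) ** depth

for depth in (1, 5, 10, 20):
    print(depth, gradient_scale(depth))

print(sigmoid_grad(10.0))  # a saturated unit passes back almost no gradient
```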
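Intuition
★ Since feature scaling helps at the input layer, it should help at the hidden layers too.
★ If you turn the knob of a microphone amplifier close to 0, nobody can hear your voice; if you
turn it close to maximum, your voice saturates.
★ Now imagine chaining several such amplifiers together. You have to set each one correctly for the
sound at the end of the chain to be both loud and clear. The amplitude of your voice leaving each
amplifier must be the same as when it entered.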
BN Layer
[Figure: the neuron diagram again: inputs a, b, c with weights w1, w2, w3; linear combination x = aw1 + bw2 + cw3; activation function f(x); output Y]
Before/After Activation Function
[Figure: panels labeled Input and Output]
Source: 莫煩
Back to Paper
μ and σ² are calculated on a per-batch basis, while γ and β are learned parameters used across all batches.
Source: Andre Ye
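The roles of μ, σ², γ, and β can be seen in a minimal training-time forward pass. This is a sketch, not the paper's reference implementation: the function name, the `eps` value, and the synthetic data are assumptions, and the running statistics used at inference time are left out.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    # x has shape (batch_size, features); statistics are taken over the batch axis.
    mu = x.mean(axis=0)                      # per-batch mean, one value per feature
    var = x.var(axis=0)                      # per-batch variance, one value per feature
    x_hat = (x - mu) / np.sqrt(var + eps)    # normalize to roughly zero mean, unit variance
    return gamma * x_hat + beta              # learned scale (gamma) and shift (beta)

rng = np.random.default_rng(0)
x = rng.normal(loc=50.0, scale=10.0, size=(32, 3))  # synthetic batch of 32 samples
gamma = np.ones(3)    # in a real network, gamma and beta are updated by gradient descent
beta = np.zeros(3)

out = batch_norm_forward(x, gamma, beta)
print(out.mean(axis=0), out.std(axis=0))
```

With γ = 1 and β = 0 the output of each feature has mean close to 0 and standard deviation close to 1. Other values of γ and β let the network rescale and shift the normalized activations where that helps.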
Stable Training
2018, NIPS, MIT
Internal Covariate Shift
Smooth Landscape
Source: How Does Batch Normalization Help Optimization?
Pros and Cons
Pros
➔ Tolerates sub-optimal starting points (weight initialization)
➔ Speeds up training (allows a larger learning rate)
➔ Mitigates the vanishing gradient problem
➔ Acts as a regularizer (introduces randomness)
Cons
➔ A small batch size leads to unstable mean and variance estimates (batch size >= 32)
➔ Not well suited to RNNs (recurrent neural networks)
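The small-batch drawback can be checked empirically. The sketch below assumes a synthetic standard-normal population (not data from the slides; `spread_of_batch_means` is a hypothetical helper) and measures how much the batch mean fluctuates from batch to batch at batch sizes 4 and 32.

```python
import numpy as np

rng = np.random.default_rng(0)
population = rng.normal(loc=0.0, scale=1.0, size=100_000)  # synthetic activations

def spread_of_batch_means(batch_size, trials=1000):
    # Draw many batches and measure how much their means differ from one another.
    batches = rng.choice(population, size=(trials, batch_size))
    return batches.mean(axis=1).std()

small = spread_of_batch_means(4)
large = spread_of_batch_means(32)
print(small, large)  # the batch mean is noticeably noisier at batch size 4
```

The noisier the per-batch statistics, the noisier the normalized activations, which is the instability the Cons list refers to.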
Further Reading
- 什麼是 Batch Normalization 批標準化 (What Is Batch Normalization?) (YouTube, 5:08)
- Batch Normalization - Explained (YouTube, 8:48)
- Batch Normalization 介紹 (Introduction to Batch Normalization) (Medium, in Mandarin)
- Normalizing your data (specifically, input and batch normalization)

Introduction to batch normalization