陳穗碧(Mora Chen)
20190421@AI Tech
YOLO v2
Better, Faster, Stronger
Joseph Redmon
YOLO: one-stage object detection
• Object Detection = Feature Extraction + Image Classification + Object Localization
• YOLO series
  - YOLO v1 [2015]
  - YOLO v2, YOLO 9000 [2016/12]
  - YOLO v3 [2018]
  - Spiking-YOLO: Spiking Neural Network for Real-time Object Detection [2019], accuracy 97%
What Is Covered
• YOLOv1 review
• YOLOv2 Improvements Over YOLOv1 (Better)
• YOLOv2 Using Darknet-19 (Faster)
• YOLO9000 by WordTree (Stronger)
• Deep Learning on Event Detection for River Surveillance in Taiwan
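The three key steps of YOLO
1. Resize the input image to 448 × 448.
2. Run a single convolutional neural network.
3. Threshold the detections by the model's confidence and apply non-max suppression to obtain the final detections.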
The activation function is Leaky ReLU.
Output tensor dimension: S × S × (B × 5 + C)
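Box confidence: $C = P(\text{object}) \times \text{IOU}$
Class probability: $p(\text{Class} \mid \text{object})$
Probability that each box contains a specific class:
$p(\text{Class}) \times \text{IOU} = p(\text{Class} \mid \text{object}) \times P(\text{object}) \times \text{IOU}$
bbox center: 0~1; bbox width and height: 0~1
Output tensor: S × S grid cells, B = 2 boxes per cell with 5 values each, C = 20 classes
Model assumption: each grid cell is responsible for detecting one object, no matter how many bounding boxes it predicts.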
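As a rough illustration of the tensor layout and the class-specific confidence above, here is a minimal NumPy sketch. It assumes S = 7 (the grid size of the YOLOv1 paper, not stated on the slide) and assumes the box values are stored before the class probabilities; the random array only stands in for a real network output.

```python
import numpy as np

S, B, C = 7, 2, 20            # S = 7 is an assumption; B and C come from the slide
depth = B * 5 + C             # each box carries (x, y, w, h, confidence)

# Stand-in for a network output; a real model would produce this tensor.
output = np.random.default_rng(0).random((S, S, depth))

# Assumed layout: first B*5 values are boxes, the remaining C are class probabilities.
boxes = output[..., :B * 5].reshape(S, S, B, 5)
box_conf = boxes[..., 4]              # P(object) * IOU, shape (S, S, B)
class_prob = output[..., B * 5:]      # p(Class | object), shape (S, S, C)

# Class-specific confidence for every box, shape (S, S, B, C).
scores = box_conf[..., None] * class_prob[:, :, None, :]
print(depth, scores.shape)
```

Because the class probabilities are shared by all B boxes of a cell, every box in a cell ranks the classes in the same order; only the box confidence differs.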
Loss function
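$\lambda_{\text{noobj}} = 0.5$: when a cell has no object the target is $C = 0$. If the prediction is $C \neq 0$, the model has mistaken background for an object. We do not want the model to dwell on this error, so $\lambda < 1$.
$\mathbb{1}_{ij}^{\text{obj}}$: box j in cell i is the one responsible for predicting the object.
$\lambda_{\text{coord}} = 5$: we want the predicted center to be as close to the ground truth as possible.
$\mathbb{1}_{i}^{\text{obj}}$: cell i contains an object.
Q: Cells that contain an object are the minority, so is there an imbalance problem?
The same absolute error is more likely to occur on a large object than on a small one, so the width and height go through a square root.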
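For reference, the annotations above refer to the YOLOv1 loss, written out here as given in the YOLOv1 paper (the formula itself does not survive in the slide text). The second line is the square-root term for width and height; the fourth line is the no-object term weighted by $\lambda_{\text{noobj}}$.

```latex
\begin{aligned}
L ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}}
       \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
     &+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}}
       \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
     &+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2 \\
     &+ \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2 \\
     &+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}} \sum_{c \in \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
```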
Non-max Suppression
Paper: Improving Object Detection With One Line of Code
Goal: remove redundant bounding boxes and keep the best one.
[Figure: candidate boxes, each scored by $p(\text{Class}) \times \text{IOU}$]
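The goal stated above can be sketched with the classic greedy (hard) NMS. This is not the Soft-NMS variant proposed in the cited paper; the corner-format boxes, the 0.5 threshold, and the sample values are assumptions made for the example.

```python
import numpy as np

def iou(box, boxes):
    """IOU between one box and an array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring box, drop boxes that overlap it too much, repeat."""
    order = np.argsort(scores)[::-1]          # indices sorted by descending score
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]
    return keep

boxes = np.array([[10, 10, 50, 50],
                  [12, 12, 52, 52],       # heavily overlaps the first box
                  [100, 100, 150, 150]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])        # e.g. p(Class) * IOU for one class
kept = nms(boxes, scores)
print(kept)  # [0, 2]
```

In a detector this would normally be run once per class, using the class-specific confidence as the score.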
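Limitations of YOLOv1
• Although each cell in YOLOv1 outputs B bounding boxes, it has only one set of class probabilities, which limits each cell to predicting a single kind of object. If the centers of several larger objects happen to fall in the same cell, the prediction may be wrong. (This restriction was removed in YOLOv2.)
• Non-max suppression: if several objects overlap heavily, NMS may treat them as the same object.
• Very small objects are harder to detect.
• Compared with Fast R-CNN, YOLO makes more bounding-box localization errors.
• Compared with detectors based on region proposals, YOLO has lower recall.
• If the objects in the test image have unusual aspect ratios or arrangements (rare in the training data), YOLO does not generalize well and tends to make wrong predictions.
• The CNN classifier architecture contains several max-pooling layers, which makes some features disappear.
• Although term (2) of the loss function accounts for the same error mattering less for a larger object, the IoU computation cannot apply the same treatment.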
YOLO v2
Better, Faster, Stronger
How YOLOv2 is tuned to become better
YOLO v2
  1. Batch Normalization
  2. High Resolution Classifier
  3. Convolution with Anchor Boxes
  4. Dimension Clusters
  5. Direct Location Prediction
  6. Fine-Grained Features
  7. Multi-Scale Training
YOLOv21.Batch Normalization
14
dropout
• Dropout通常放在激活函數之後
• 卷基層訓練參數本來就少,Dropout應用在卷基層中,降低模型參數效果有限
• 目前網絡中,全連接層被全局平均持化層(Global Average Pooling)所取代,不但能降低模型尺
寸,還可以提升性能,因此,漸漸大家都不採用Dropout
YOLOv21.Batch Normalization
15
What: dropout → batch normalization
How: add batch normalization to all of the convolutional layers
Why: dropout can be removed without overfitting + faster training + avoids vanishing gradients
Value: 2% mAP
[Figure: dropout vs. batch normalization; the derivative of the sigmoid is near zero in its saturated regions]
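To make the "How" concrete, here is a minimal training-mode batch normalization sketch in NumPy, applied to a stand-in convolution output and followed by Leaky ReLU. The (N, H, W, C) layout, the 0.1 negative slope, and the absence of running statistics are simplifying assumptions, not details taken from the slides.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize a (N, H, W, C) batch per channel, then scale and shift."""
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

def leaky_relu(x, slope=0.1):
    """Leaky ReLU; the slope value is an assumption."""
    return np.where(x > 0, x, slope * x)

rng = np.random.default_rng(0)
# Stand-in for a convolution output with a shifted, widely spread distribution.
conv_out = rng.normal(loc=3.0, scale=5.0, size=(8, 13, 13, 4))

normed = batch_norm(conv_out, gamma=np.ones(4), beta=np.zeros(4))
activated = leaky_relu(normed)
print(normed.mean(axis=(0, 1, 2)), normed.std(axis=(0, 1, 2)))
```

After normalization each channel has roughly zero mean and unit variance, which keeps activations away from the saturated regions where gradients vanish.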
YOLOv22.Convolution with Anchor Boxes
16
Anchor Boxesbounding boxes
一個格子預
測一個物體
一個格式預
測多個物體
(grid cell,anchor box)grid cell
Q:選擇anchor box 數量? 5到10個anchor box 越多涵蓋比例種類越多
Q:選擇anchor box比例?
anchor boxes 的尺寸該怎麼挑選? Faster RCNN是人為選定(1:1,2:1,1:2)
 為了處理兩個對象出現在同一個格子的情況
YoloV2 (5 anchors)
正確地調整Anchor Boxes1比例可以大大提高模型檢測某些位置大小和形狀的對象能力
YOLOv23.Convolution with Anchor Boxes
17
bbox Anchor Boxes
 每個 Anchor Boxes裡面有特定Class的機率
𝑝 𝐶𝑙𝑎𝑠𝑠 × 𝐼𝑂𝑈
= 𝑝(𝐶𝑙𝑎𝑠𝑠 |𝑜𝑏𝑗𝑒𝑐𝑡) × 𝑃 𝑜𝑏𝑗𝑒𝑐𝑡 × 𝐼𝑂𝑈
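A small sketch of what "each anchor box has its own class probabilities" means for the output tensor. It assumes a 13 × 13 grid, 5 anchors, 20 classes, a sigmoid on the objectness value and a softmax over classes; the random array stands in for raw network output.

```python
import numpy as np

GRID, NUM_ANCHORS, NUM_CLASSES = 13, 5, 20   # 20 classes is an assumption

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
# Per anchor: 4 box values, 1 objectness value, NUM_CLASSES class logits.
raw = rng.normal(size=(GRID, GRID, NUM_ANCHORS, 5 + NUM_CLASSES))

objectness = sigmoid(raw[..., 4])            # plays the role of P(object) * IOU
class_prob = softmax(raw[..., 5:])           # p(Class | object), one set per anchor
scores = objectness[..., None] * class_prob  # shape (13, 13, 5, 20)
print(scores.shape, NUM_ANCHORS * (5 + NUM_CLASSES))
```

Unlike the single class vector per cell in YOLOv1, two anchors in the same cell can now predict two different classes.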
YOLOv22.Convolution with Anchor Boxes
18
448 × 448 416 × 416
odd number spatial dimension
What : 448 416
How : Shrink the network o operate on 416 input images
Why : Large objects , tend to occupy the center of the image so it’s good to have a single location right
at the center to predict these objects.
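The arithmetic behind the odd-numbered grid can be checked in a few lines. The total downsampling factor of 32 is an assumption taken from the YOLOv2 paper; `grid_size` is a hypothetical helper.

```python
DOWNSAMPLE = 32  # assumed total stride of the network

def grid_size(input_size, stride=DOWNSAMPLE):
    """Side length of the final feature map for a square input."""
    return input_size // stride

for size in (448, 416):
    g = grid_size(size)
    parity = "odd, has a single center cell" if g % 2 else "even, no single center cell"
    print(f"{size} -> {g} x {g} ({parity})")
```

With 448 the center of the image falls on the corner shared by four cells; with 416 it falls inside one cell.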
YOLOv24.Dimensional Clusters
19
Q:網絡自身可以學着不斷調節box的大小,給一個好的Anchor boxes的
真的會比較好嗎?
YOLOv24.Dimensional Clusters Step1: 先決定群組數k
Step2:決定群中心,
Step3:重複做圖3–5
Step4:直到群心不太變動(收斂)Hand picked
K-means
clustering
參考資料來源:
YOLOv24.Dimensional Clusters
Hand picked
K-means
clustering
Q:如何建立群之間的距離?
Q:標準的k-means方法用的是歐氏距離,適用嗎?
導致 larger boxes generate more error than smaller boxes.
what we really want are priors that lead to good IOU scores,
which is independent of the size of the box.
We choose k = 5 as a good tradeoff between model complexity
and high recall.
 (prior information)
學習算法融入應用場景的資訊
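The clustering idea can be sketched as k-means over box (width, height) pairs with the distance d(box, centroid) = 1 − IOU(box, centroid) from the YOLO9000 paper. The random boxes below are synthetic stand-ins for ground-truth boxes, and the function names are hypothetical.

```python
import numpy as np

def iou_wh(boxes, centroids):
    """IOU between (N, 2) boxes and (k, 2) centroids given as (w, h), aligned at one corner."""
    inter = (np.minimum(boxes[:, None, 0], centroids[None, :, 0])
             * np.minimum(boxes[:, None, 1], centroids[None, :, 1]))
    area_b = (boxes[:, 0] * boxes[:, 1])[:, None]
    area_c = (centroids[:, 0] * centroids[:, 1])[None, :]
    return inter / (area_b + area_c - inter)

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """k-means where the distance is 1 - IOU instead of Euclidean distance."""
    rng = np.random.default_rng(seed)
    centroids = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iters):
        assign = np.argmin(1.0 - iou_wh(boxes, centroids), axis=1)
        new = np.array([boxes[assign == j].mean(axis=0) if np.any(assign == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):   # centers barely move: converged
            break
        centroids = new
    return centroids

# Synthetic (w, h) pairs in grid-cell units; real use would load dataset annotations.
wh = np.random.default_rng(1).uniform(0.5, 12.0, size=(500, 2))
anchors = kmeans_anchors(wh, k=5)
avg_iou = iou_wh(wh, anchors).max(axis=1).mean()
print(anchors.round(2), round(float(avg_iou), 3))
```

The average best IOU is one way to compare different values of k: it rises as k grows, while the model becomes more complex.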
YOLOv24.Dimensional Clusters
22
Hand picked
K-means
clustering
This indicates that using k-means to generate our bounding box starts the model off with a better
representation and makes the task easier to learn.
The cluster centroids are significantly different than hand-
picked anchor boxes.
There are fewer short, wide boxes and more tall, thin boxes
YOLOv25.Direct location prediction
23
unconstrained constrained
In region proposal networks the network predicts values 𝑡 and
𝑡 and the (x, y) center coordinates are calculated as:
For example, a prediction of 𝑡 = 1 would shift the box to the
right by the width of the anchor box, a prediction of 𝑡 = −1
would shift it to the left by the same amount. This formulation is
unconstrained so any anchor box can end up at any point in the
image, regardless of what location predicted the box. With
random initialization the model takes a long time to stabilize to
predicting sensible offsets.
predict location coordinates relative to the location of the grid
cell
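The constrained parameterization can be written out as in the YOLO9000 paper. The symbols are not in the slide text: $(c_x, c_y)$ is the offset of the grid cell from the top-left corner of the image, $(p_w, p_h)$ are the width and height of the prior, and $\sigma$ is the logistic function, which bounds the center to stay inside its cell.

```latex
\begin{aligned}
b_x &= \sigma(t_x) + c_x \\
b_y &= \sigma(t_y) + c_y \\
b_w &= p_w \, e^{t_w} \\
b_h &= p_h \, e^{t_h} \\
\Pr(\text{object}) \times \text{IOU}(b, \text{object}) &= \sigma(t_o)
\end{aligned}
```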
YOLOv25.Direct location prediction
24
In Yolo v2 anchors (width, height) - are sizes of objects relative to the final feature
map
ANCHORS = [0.57273, 0.677385, 1.87446, 2.06253, 3.33843, 5.47434,
7.88282, 3.52778, 9.77052, 9.16828]
prior
YOLOv25.Direct location prediction
• The final values are based on not coordinates but grid values.
• YOLO default set:
anchors = 1.3221, 1.73145, 3.19275, 4.00944, 5.05587, 8.09892, 9.47112, 4.84053, 11.2364, 10.0071
this means the height and width of first anchor is slightly over one grid cell [1.3221, 1.73145] and the
last anchor almost covers the whole image [11.2364, 10.0071] considering the image is 13x13 grid.
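A minimal sketch of decoding one prediction with the default anchor set quoted above, reading the list as (width, height) pairs in grid-cell units. The `decode` function, its argument names, and the normalization to the 0~1 range are assumptions for illustration.

```python
import math

# Default anchors quoted above, read as (width, height) pairs in grid-cell units.
ANCHORS = [1.3221, 1.73145, 3.19275, 4.00944, 5.05587, 8.09892,
           9.47112, 4.84053, 11.2364, 10.0071]
PRIORS = list(zip(ANCHORS[0::2], ANCHORS[1::2]))
GRID = 13

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def decode(t_x, t_y, t_w, t_h, col, row, prior, grid=GRID):
    """Turn raw outputs into a box (center x, center y, w, h) normalized to 0~1."""
    p_w, p_h = prior
    b_x = (sigmoid(t_x) + col) / grid   # sigmoid keeps the center inside cell (col, row)
    b_y = (sigmoid(t_y) + row) / grid
    b_w = p_w * math.exp(t_w) / grid    # the prior is scaled, never shifted
    b_h = p_h * math.exp(t_h) / grid
    return b_x, b_y, b_w, b_h

# All-zero outputs place the box at the middle of the cell with the prior's size.
box = decode(0.0, 0.0, 0.0, 0.0, col=6, row=6, prior=PRIORS[0])
print(box)
```

However large the raw offsets get, the predicted center cannot leave the cell that made the prediction, which is what "constrained" refers to.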
What: unconstrained → constrained
How: predict location coordinates relative to the location of the grid cell
Why: using anchor boxes causes model instability, especially during early iterations, mostly from predicting the (x, y) location of the box
Value: 5% mAP
YOLOv26.Fine-Grained Features
27
Add passthrough
layer
• Faster R-CNN and SSD both run their proposal networks at various feature
maps in the network o get a range of resolutions.
• Add a pass-through layer that brings features from an earlier layer at 26*26
resolution.
YOLOv26.Fine-Grained Features
28
The passthrough layer concatenates the higher resolution features with the
low resolution features by stacking adjacent features into different channels
instead of spatial locations, similar to the identity mappings in ResNet.
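Fine-Grained Features: [reorg]
• This step was first used in YOLOv2; its full name is reorganization. The goal is to halve the spatial size of the feature map without discarding information the way max pooling does. Using relative position, each map is split into 4 smaller maps, so all the information in the larger map is preserved. For example, a 100 × 100 × 126 feature map (height × width × channels) becomes 50 × 50 × (126 × 4) = 50 × 50 × 504 after [reorg].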
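The shape arithmetic of the example can be reproduced with a space-to-depth operation in NumPy. This is a sketch of the idea only: the channel ordering chosen here is an assumption and may differ from the ordering used by a particular framework's reorg layer.

```python
import numpy as np

def reorg(x, stride=2):
    """Space-to-depth: (H, W, C) -> (H/stride, W/stride, C * stride * stride)."""
    h, w, c = x.shape
    assert h % stride == 0 and w % stride == 0
    # Split each spatial axis into (block index, position inside the block).
    x = x.reshape(h // stride, stride, w // stride, stride, c)
    # Move the two in-block positions next to the channel axis, then merge them.
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(h // stride, w // stride, stride * stride * c)

feature_map = np.arange(100 * 100 * 126, dtype=np.float32).reshape(100, 100, 126)
out = reorg(feature_map)
print(feature_map.shape, "->", out.shape)  # (100, 100, 126) -> (50, 50, 504)
```

Every input value appears exactly once in the output, which is the difference from max pooling, where three of every four values are discarded.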
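Fine-Grained Features: [concatenates]
[Figure: passthrough → reshape → concatenate]
Q: Why does the passthrough branch first go through a convolution with 64 filters? (To extract the more important features?)
Q: Why does the number of filters keep increasing? (To capture finer feature detail?)
Q: Why are 3 × 3 and 1 × 1 convolutions used alternately?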
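A shape-only sketch of the passthrough-then-concatenate path. The channel counts are assumptions for illustration: a 26 × 26 map reduced to 64 channels by the 1 × 1 convolution mentioned in the question above, and a final 13 × 13 map with 1024 channels. The zero arrays stand in for real feature maps.

```python
import numpy as np

def space_to_depth(x, stride=2):
    """(H, W, C) -> (H/stride, W/stride, C * stride * stride)."""
    h, w, c = x.shape
    x = x.reshape(h // stride, stride, w // stride, stride, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(h // stride, w // stride, stride * stride * c)

# Hypothetical feature maps; only the shapes matter here.
early = np.zeros((26, 26, 64), dtype=np.float32)    # earlier, higher-resolution layer
late = np.zeros((13, 13, 1024), dtype=np.float32)   # final, low-resolution layer

passthrough = space_to_depth(early)                     # (13, 13, 256)
merged = np.concatenate([passthrough, late], axis=-1)   # (13, 13, 1280)
print(passthrough.shape, merged.shape)
```

One possible reading of the first question is visible in the shapes: reducing the early map to 64 channels before the reshape keeps the concatenated tensor at 1280 channels rather than several thousand.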
YOLOv27.Multi-Scale Training
31
228
352
416
480544
What : Fixing the input images size Multi-Scale Training
How : Every 10 batches our network randomly chooses a new image dimension size from the multiples of 32:{320,352,…,608}
Why : Be robust to running on images of different sizes.
[Since our model only uses convolutional and pooling layers it can be resized on the fly]
Value : 1% mAP
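The schedule described in "How" can be sketched as follows. Only the size selection is shown; `size_schedule` is a hypothetical helper and the actual resizing of images and training step are left out.

```python
import random

SIZES = list(range(320, 608 + 1, 32))   # {320, 352, ..., 608}, all multiples of 32
RESIZE_EVERY = 10                        # pick a new size every 10 batches

def size_schedule(num_batches, seed=0):
    """Return the input size used for each batch."""
    rng = random.Random(seed)
    size = None
    schedule = []
    for batch in range(num_batches):
        if batch % RESIZE_EVERY == 0:
            size = rng.choice(SIZES)     # new random input dimension
        schedule.append(size)
    return schedule

schedule = size_schedule(30)
print(SIZES)
print(schedule)
```

Each chosen size divided by 32 gives the output grid, from 10 × 10 for 320 up to 19 × 19 for 608, so the same weights are trained on several grid sizes.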
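How YOLOv2 is tuned to become faster
Darknet-19: 19 convolutional layers and 5 max-pooling layers
Training for classification on ImageNet → training for detection
• YOLOv2 aims for high-accuracy real-time detection, so it tries to reduce localization errors and improve the low recall without enlarging the network or lowering accuracy.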
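Planned activities for 2019
Invited external speakers
4/28 (with Women Techmakers Taichung)
Prof. 胡筱薇: social network analysis and sentiment analysis
杜岳華: review of Graph Neural Network
Meetup
• Paper reading
• Sharing hands-on experience
Courses & competitions
• Google MLCC
• Julia (3/31: Neural ODE & DiffEqFlux)
• Team competitions
Sister communities
• PyTorch Taichung (4/27: LSTM)
• Women Techmakers Taichung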
Q and A
