Video Object Segmentation
Yeong Jun Koh, Korea University
Segmentation
• Divide data into meaningful segments
[Figure panels: Superpixel; Image segmentation; Video segmentation; Video object segmentation]
Video Object Segmentation
• Semi-supervised video object segmentation
• Primary object segmentation
• Multiple object segmentation
Semi-supervised Video Object Segmentation
• Track and segment a target object
• The target is annotated by a user in the first frame
[Figure: first frame with user annotation; resulting segment track]
Primary Object Segmentation
• Segment the primary object in a video automatically
[Figure examples: primary object is the diver; primary object is the tennis player]
Multiple Object Segmentation
• Extract as many segment tracks as possible
Primary Object Segmentation
Primary Object Segmentation
• Primary object segmentation
• Initial region estimation
• Motion boundaries
• Object proposals
• Saliency maps
• Refinement
• Construct models for the primary object and the background, e.g. Gaussian mixture models (GMMs)
• We propose the augmentation and reduction process (ARP)
Primary Object Segmentation in Videos Based on Region Augmentation and Reduction
• Overview
• Input: A set of consecutive video frames
• Output: A set of pixel-wise segments that delineate the primary object
Candidate Region Generation
• Candidate regions
• Ultrametric contour map (UCM)
• Obtain color-based and motion-based UCMs
• Each region in a UCM becomes a superpixel
Candidate Region Generation
• Candidate regions
• Generate candidate regions by merging neighboring superpixels
• Determine the pair, $s_m$ and $s_n$, sharing the weakest boundary
• Merge $s_m$ and $s_n$ into a single superpixel
• Repeat this process until only one superpixel remains
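The merging loop above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `generate_candidates`, `superpixels`, and `boundary_strength` are hypothetical, superpixel ids are assumed to be integers, and the rule that a merged region inherits the stronger of two parallel boundaries is an assumption the slides do not state.

```python
def generate_candidates(superpixels, boundary_strength):
    """Greedily merge the pair sharing the weakest boundary until one region remains.

    superpixels: dict {int id: iterable of pixels}
    boundary_strength: dict {(id_a, id_b): strength} for neighboring superpixels
    Returns every region seen along the way (initial superpixels plus each merge).
    """
    regions = {sid: frozenset(px) for sid, px in superpixels.items()}
    strength = {frozenset(pair): w for pair, w in boundary_strength.items()}
    candidates = list(regions.values())
    next_id = max(regions) + 1
    while len(regions) > 1 and strength:
        weakest = min(strength, key=strength.get)
        m, n = weakest
        merged = regions.pop(m) | regions.pop(n)
        rerouted = {}
        for pair, w in strength.items():
            if pair == weakest:
                continue
            if pair & weakest:
                # Boundary touched s_m or s_n: attach it to the merged region.
                (other,) = pair - weakest
                pair = frozenset((other, next_id))
            # Assumption: keep the stronger value when two boundaries coincide.
            rerouted[pair] = max(w, rerouted.get(pair, w))
        strength = rerouted
        regions[next_id] = merged
        candidates.append(merged)
        next_id += 1
    return candidates
```

With three superpixels in a row and boundary strengths 0.2 and 0.7, the sketch returns five candidates: the three superpixels, the pair joined across the 0.2 boundary, and the whole frame.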
Candidate Region Generation
• Foreground confidence
• Measure the foreground confidence of each candidate region
• Appearance confidence $\phi_i^{(t)}$
• Obtain a saliency map using the technique in [1]
• Average the saliency values within the candidate region
• Edge confidence $\psi_i^{(t)}$
• Combine the color-based edge map and the motion-based edge map
$c_i^{(t)} = \phi_i^{(t)} + \psi_i^{(t)}$
[1] W.-D. Jang, C. Lee, and C.-S. Kim, "Primary object segmentation in videos via alternate convex optimization of foreground and background distributions," CVPR, 2016.
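A small sketch of the confidence score may help here. It assumes the saliency map is a dict from pixel to value and that the edge confidence is already computed and passed in, since the slides do not say how the two edge maps are combined. All function names are hypothetical.

```python
def appearance_confidence(region, saliency):
    """phi: mean saliency over the pixels of one (non-empty) candidate region."""
    return sum(saliency[p] for p in region) / len(region)


def foreground_confidence(region, saliency, edge_conf):
    """c = phi + psi, where psi (edge_conf) is supplied by the caller."""
    return appearance_confidence(region, saliency) + edge_conf


def top_candidates(regions, saliency, edge_confs, k=20):
    """Indices of the k regions with the highest foreground confidence."""
    scored = [
        (foreground_confidence(r, saliency, e), idx)
        for idx, (r, e) in enumerate(zip(regions, edge_confs))
    ]
    # Sort by descending confidence; ties keep the lower index first.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [idx for _, idx in scored[:k]]
```

The default `k=20` mirrors the top-20 selection mentioned on the following slide.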
Candidate Region Generation
• Foreground confidence
• Select the top 20 candidate regions
• Warp the selected candidate regions to neighboring frames
• Rearrange the set of candidate regions $\mathcal{Q}^{(t)} = \{q_1^{(t)}, q_2^{(t)}, \ldots, q_N^{(t)}\}$
• Feature description
• Describe the feature $\mathbf{f}_i^{(t)}$ of each candidate region $q_i^{(t)}$ using the bag-of-visual-words approach
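For readers unfamiliar with bag-of-visual-words features, here is a generic sketch: each local descriptor in a region votes for its nearest visual word, and the normalized vote counts form the region feature. The descriptor type, vocabulary, and the name `bow_histogram` are assumptions for illustration; the slides do not specify which descriptors or vocabulary size are used.

```python
def bow_histogram(descriptors, vocabulary):
    """Normalized histogram of nearest-visual-word assignments.

    descriptors: list of equal-length numeric tuples from one region
    vocabulary:  list of visual words (same dimensionality)
    """
    counts = [0] * len(vocabulary)
    for d in descriptors:
        # Nearest word under squared Euclidean distance.
        nearest = min(
            range(len(vocabulary)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(d, vocabulary[k])),
        )
        counts[nearest] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(vocabulary)
```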
Initial Region Estimation
• Selecting initial primary object regions
• Choose the main region $q_\delta^{(t)}$ among the candidate regions
• Exploit the recurrence property: a primary object appears repeatedly in a video sequence
[Pipeline figure: Input frames; Candidate region generation; Initial region estimation]
Initial Region Estimation
• Selecting initial primary object regions
• Assume that the feature of the main region $q_\delta^{(t)}$ should be similar to the features of the main regions in the other frames
• $\mathbf{p}^{(\tau)}$ denotes the feature of the main region in frame $I^{(\tau)}$
$\delta = \arg\min_i \sum_{\tau=1,\, \tau \neq t}^{T} d_\chi\left(\mathbf{f}_i^{(t)}, \mathbf{p}^{(\tau)}\right)$
Initial Region Estimation
• Selecting initial primary object regions
• Initialization of $\mathbf{p}^{(\tau)}$
• Superpose the features of all candidate regions in $\mathcal{Q}^{(\tau)}$
• Combine the features of the candidate regions, $\mathbf{F}^{(\tau)} = [\mathbf{f}_1^{(\tau)}, \ldots, \mathbf{f}_N^{(\tau)}]$, using the foreground confidence vector $\mathbf{c}^{(\tau)} = [c_1^{(\tau)}, \ldots, c_N^{(\tau)}]^{\top}$
$\mathbf{p}^{(\tau)} = \mathbf{F}^{(\tau)} \mathbf{c}^{(\tau)}$
• Obtain the main region $q_\delta^{(t)}$ by applying $\mathbf{p}^{(\tau)}$ for each frame
• Alternate update of the main regions
• Update $\mathbf{p}^{(t)}$ for each frame by $\mathbf{p}^{(t)} \leftarrow \mathbf{f}_\delta^{(t)}$
• Choose the main region using the updated features
$\delta = \arg\min_i \sum_{\tau=1,\, \tau \neq t}^{T} d_\chi\left(\mathbf{f}_i^{(t)}, \mathbf{p}^{(\tau)}\right)$
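The initialization and alternate update can be sketched as below. This is an illustrative reading, not the authors' code: it assumes $d_\chi$ is the common chi-square histogram distance, that features are plain lists of floats, and that iteration stops when the selection no longer changes or after `max_iters` rounds (a hypothetical safeguard).

```python
def chi_square(a, b):
    """One common form of the chi-square distance between two histograms."""
    return 0.5 * sum((x - y) ** 2 / (x + y) for x, y in zip(a, b) if x + y > 0)


def estimate_main_regions(features, confidences, max_iters=10):
    """features[t][i]: feature of candidate i in frame t.
    confidences[t][i]: its foreground confidence.
    Returns the index delta of the main region for every frame."""
    T = len(features)
    # Initialization: p = F c, a confidence-weighted sum of candidate features.
    p = []
    for F, c in zip(features, confidences):
        dim = len(F[0])
        p.append([sum(ci * f[k] for f, ci in zip(F, c)) for k in range(dim)])
    main = [None] * T
    for _ in range(max_iters):
        new_main = []
        for t in range(T):
            costs = [
                sum(chi_square(f, p[tau]) for tau in range(T) if tau != t)
                for f in features[t]
            ]
            new_main.append(min(range(len(costs)), key=costs.__getitem__))
        if new_main == main:
            break
        main = new_main
        # Alternate update: p for each frame becomes its main region's feature.
        p = [features[t][main[t]] for t in range(T)]
    return main
```

In a three-frame toy example where one candidate per frame shares a similar histogram, the sketch picks that recurring candidate in every frame.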
Primary Object Region Refinement
• Refinement of primary object regions
• Initial regions may exclude parts of primary objects or include noisy regions (background or other objects)
• Attempt to refine the initial regions
• Augment initial regions with missing regions
• Reduce initial regions by removing noisy regions
Primary Object Region Refinement
• Augmented regions
• Augment the initial region $q_\delta^{(t)}$ with a candidate region $q_i^{(t)}$ in $\mathcal{Q}^{(t)}$
$r_i^{(t)} = q_\delta^{(t)} \cup q_i^{(t)}$
• Reduced regions
• Reduce the initial region $q_\delta^{(t)}$ using a candidate region $q_j^{(t)}$ in $\mathcal{Q}^{(t)}$
$r_j^{(t)} = q_\delta^{(t)} \cap q_j^{(t)}$
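If regions are represented as sets of pixels (an assumption made here for illustration), the augmented and reduced regions are plain set unions and intersections. The function name `refinement_sets` is hypothetical.

```python
def refinement_sets(main_region, candidates):
    """Build augmented (union) and reduced (intersection) regions.

    main_region: frozenset of pixels for the initial region
    candidates:  list of frozensets, the candidate regions of the same frame
    """
    augmented = [main_region | q for q in candidates]
    reduced = [main_region & q for q in candidates]
    return augmented, reduced
```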
Primary Object Region Refinement
• Augmentation and reduction process (ARP)
• Determine whether to augment or reduce $q_\delta^{(t)}$ using a cost function
$C(r_i^{(t)}) = C_{\mathrm{data}}(r_i^{(t)}) + \gamma \cdot C_{\mathrm{seg}}(r_i^{(t)})$
• Data cost
• Constrain the refined region $r_i^{(t)}$ to be similar to the initial regions in all frames
$C_{\mathrm{data}}(r_i^{(t)}) = \frac{1}{T} \sum_{\tau=1}^{T} d_\chi\left(\mathbf{f}_{r,i}^{(t)}, \mathbf{f}_\delta^{(\tau)}\right)$
• Segmentation cost
• Make the refined region $r_i^{(t)}$ as dissimilar from its nearby background as possible
$C_{\mathrm{seg}}(r_i^{(t)}) = -d_\chi\left(\mathbf{f}_{r,i}^{(t)}, \mathbf{f}_{b,i}^{(t)}\right)$
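A numeric sketch of the cost may make the trade-off concrete. It assumes $d_\chi$ is the common chi-square histogram distance and that features are lists of floats. The default `gamma=1.0` is a placeholder, not a value taken from the slides.

```python
def chi_square(a, b):
    """One common form of the chi-square distance between two histograms."""
    return 0.5 * sum((x - y) ** 2 / (x + y) for x, y in zip(a, b) if x + y > 0)


def arp_cost(f_region, f_background, main_features, gamma=1.0):
    """C = C_data + gamma * C_seg for one refined-region hypothesis.

    f_region:      feature of the refined region
    f_background:  feature of its nearby background
    main_features: features of the initial (main) regions, one per frame
    """
    # Data cost: average distance to the initial regions of all frames.
    data = sum(chi_square(f_region, f) for f in main_features) / len(main_features)
    # Segmentation cost: negative distance to the background (more distinct is better).
    seg = -chi_square(f_region, f_background)
    return data + gamma * seg
```

A region that matches the initial regions exactly and differs from its background gets a negative cost, so it is preferred over a region that blends into the background.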
Primary Object Region Refinement
• Augmentation and reduction process (ARP)
• Minimize the cost function to find the optimal refined region
$r_*^{(t)} = \arg\min_{r_i^{(t)}} C(r_i^{(t)})$
• Perform ARP iteratively
• Construct the set of augmented and reduced regions again, employing $r_*^{(t)}$ as the initial region
• Find the optimal $r_*^{(t)}$ by minimizing $C(r_i^{(t)})$
• Repeat until $r_*^{(t)}$ is unchanged
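The iteration can be sketched as a simple loop over set-valued regions. The cost is passed in as a callable so the sketch stays independent of any particular feature; `run_arp`, the `max_iters` cap, and keeping the current region when no hypothesis is strictly better are illustrative assumptions.

```python
def run_arp(initial, candidates, cost, max_iters=50):
    """Iteratively augment or reduce a region until it stops changing.

    initial:    iterable of pixels, the initial region
    candidates: list of sets of pixels, the candidate regions of the frame
    cost:       callable mapping a frozenset region to a number (lower is better)
    """
    current = frozenset(initial)
    for _ in range(max_iters):
        options = set()
        for q in candidates:
            options.add(current | q)        # augmented region
            if current & q:
                options.add(current & q)    # reduced region (skip empty results)
        options.discard(current)
        if not options:
            break
        best = min(options, key=cost)
        if cost(best) >= cost(current):
            break  # unchanged: no hypothesis improves on the current region
        current = best
    return current
```

With a toy cost that counts pixels differing from a known target, the loop first removes or adds one wrong pixel and then converges on the target region.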
Primary Object Region Refinement
• Augmentation and reduction process (ARP)
Experimental results
• DAVIS dataset [2]
• 50 video sequences (3,455 annotated frames)
• Performance measures
• Region similarity $\mathcal{J}$: intersection over union
• Contour accuracy $\mathcal{F}$: F-measure, i.e. the harmonic mean of the contour precision and recall rates
[2] F. Perazzi, J. Pont-Tuset, B. McWilliams, L. Van Gool, M. Gross, and A. Sorkine-Hornung, "A benchmark dataset and evaluation methodology for video object segmentation," CVPR, 2016.
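Both measures are short to write down. The sketch below assumes masks are sets of pixel coordinates and takes contour precision and recall as given, since matching contour pixels is a separate step not described on the slide. Function names are hypothetical.

```python
def region_similarity(pred, gt):
    """J: intersection over union of two pixel sets."""
    union = pred | gt
    return len(pred & gt) / len(union) if union else 1.0


def contour_accuracy(precision, recall):
    """F: harmonic mean of contour precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0
```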
Experimental results
• Impacts of ARP
• Compare ARP with the conventional refinement techniques [20, 36]
• Apply the refinement techniques to our initial regions (IR)
[20] A. Papazoglou and V. Ferrari, "Fast object segmentation in unconstrained video," ICCV, 2013.
[36] D. Zhang, O. Javed, and M. Shah, "Video object segmentation through spatially accurate and temporally dense extraction of primary object regions," CVPR, 2013.
Experimental results
• Quantitative comparison
• Semi-supervised: Human annotation at the first frame
• Multiple VOS: Output multiple objects
• POS: Output the primary object
Experimental results
• Qualitative results
Multiple Object Segmentation
Multiple Object Segmentation
• Multiple object segmentation
• Motion segmentation
• Cluster point trajectories in a video
• Video object proposal
• Proposal matching
• Proposal clustering
• Segmentation guided by object detection and tracking
CDTS: Collaborative Detection, Tracking, and Segmentation for Online Multiple Object Segmentation in Videos
• Overview
• Input: A set of consecutive video frames
• Output: Multiple segment tracks
[Pipeline figure labels: Input frames; Joint detection and tracking; Detection and tracking results; Object track generation; ASE segmentation]
Object Track Generation
• Joint detection and tracking
• Detector [3]
• Find object locations without manual annotations
• Some objects may remain undetected
• Tracker [4]
• Boost the recall rate of objects using temporal correlations
• Three cases
• Both detection and tracking boxes
• Only detection box
• Only tracking box
[3] J. Dai, Y. Li, K. He, and J. Sun, "R-FCN: Object detection via region-based fully convolutional networks," NIPS, 2016.
[4] H.-U. Kim, D.-Y. Lee, J.-Y. Sim, and C.-S. Kim, "SOWP: Spatially ordered and weighted patch descriptor for visual tracking," ICCV, 2015.
Object Track Generation
• Joint detection and tracking
• Both detection and tracking boxes
• Match detection and tracking boxes
• The Hungarian algorithm
• Choose the more accurate box for each matching pair
• Link the selected box to the corresponding object track
• Unmatched detection box
• Regard it as a newly appearing object
• Unmatched tracking box
• Link it to the corresponding object track
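The three cases can be illustrated with a small matching sketch. To stay dependency-free it finds the optimal assignment by brute force over permutations, which gives the same result as the Hungarian algorithm on small inputs but is not the Hungarian algorithm itself. Using box IoU as the matching score and the `min_iou` threshold are assumptions; the slides do not state the matching cost.

```python
from itertools import permutations


def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(0, w) * max(0, h)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_boxes(detections, tracks, min_iou=0.5):
    """Return (matches, unmatched_detections, unmatched_tracks) as index lists."""
    swap = len(detections) > len(tracks)
    small, large = (tracks, detections) if swap else (detections, tracks)
    best, best_score = (), -1.0
    # Brute-force stand-in for the Hungarian algorithm (small inputs only).
    for perm in permutations(range(len(large)), len(small)):
        score = sum(iou(small[i], large[j]) for i, j in enumerate(perm))
        if score > best_score:
            best, best_score = perm, score
    pairs = [(j, i) if swap else (i, j) for i, j in enumerate(best)]
    matches = [(d, t) for d, t in pairs if iou(detections[d], tracks[t]) >= min_iou]
    matched_d = {d for d, _ in matches}
    matched_t = {t for _, t in matches}
    new_objects = [d for d in range(len(detections)) if d not in matched_d]
    tracker_only = [t for t in range(len(tracks)) if t not in matched_t]
    return matches, new_objects, tracker_only
```

Indices in `new_objects` would start new object tracks, while indices in `tracker_only` keep extending their existing tracks with the tracking box.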
ASE Segmentation
• Alternate shrinking and expansion (ASE)
• Over-segment each frame into superpixels
• Dichotomize each superpixel within and near the box into either the foreground or the background class
ASE Segmentation
• Over-segmentation
• Obtain superpixels using UCM
• Preliminary classification
• Exploit the overlap ratio between the box and each superpixel
• Refine preliminary foreground regions
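One plausible reading of the preliminary classification is sketched below: a superpixel is labeled foreground when a large enough fraction of its pixels lies inside the box. Both this definition of the overlap ratio and the `min_overlap` threshold are assumptions, as the slide gives neither.

```python
def preliminary_foreground(superpixels, box, min_overlap=0.5):
    """Return ids of superpixels whose overlap ratio with the box is high enough.

    superpixels: dict {id: set of (x, y) pixels}
    box:         (x1, y1, x2, y2), with x2 and y2 exclusive
    """
    x1, y1, x2, y2 = box
    foreground = set()
    for sid, pixels in superpixels.items():
        inside = sum(1 for (x, y) in pixels if x1 <= x < x2 and y1 <= y < y2)
        # Overlap ratio: share of the superpixel's pixels covered by the box.
        if pixels and inside / len(pixels) >= min_overlap:
            foreground.add(sid)
    return foreground
```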
ASE Segmentation
• Intra-frame refinement
• Constrain foreground regions to have intense edge strengths
• Boundary cost
$C_{\mathrm{bnd}}(F_i^{(t)}) = -\sum_{\mathbf{x} \in \partial F_i^{(t)}} U^{(t)}(\mathbf{x})$
• Shrink foreground regions by removing superpixels to minimize the boundary cost in a greedy manner
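A toy version of the greedy shrinking step is shown below. It simplifies the pixel-level boundary $\partial F$ to a superpixel adjacency graph in which each shared boundary carries one edge strength, which is an assumption made for brevity. Names are hypothetical.

```python
def boundary_cost(foreground, edge_strength):
    """Negative sum of edge strengths along the foreground boundary.

    edge_strength: dict {(a, b): strength} for adjacent superpixels a and b.
    A shared boundary lies on the foreground contour when exactly one
    of its two superpixels is foreground.
    """
    return -sum(
        w for (a, b), w in edge_strength.items()
        if (a in foreground) != (b in foreground)
    )


def greedy_shrink(foreground, edge_strength):
    """Remove one superpixel at a time while the boundary cost keeps decreasing."""
    current = set(foreground)
    while len(current) > 1:
        base = boundary_cost(current, edge_strength)
        best = min(current, key=lambda s: boundary_cost(current - {s}, edge_strength))
        if boundary_cost(current - {best}, edge_strength) >= base:
            break  # no single removal lowers the cost
        current.remove(best)
    return current
```

In a chain of four superpixels where the foreground initially ends on a weak edge, the sketch drops the outer superpixel so that the contour lands on the stronger edge.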
ASE Segmentation
• Inter-frame refinement
• Constrain the refined region to be similar to the segmentation results in previous frames
• Cost function
$C_{\mathrm{inter}}(F_i^{(t)}, \mathcal{B}_i^{(t)}) = \alpha \cdot C_{\mathrm{tmp}}(F_i^{(t)}) + C_{\mathrm{seg}}(F_i^{(t)}, \mathcal{B}_i^{(t)}) + C_{\mathrm{bnd}}(F_i^{(t)})$
• Expand foreground regions by augmenting superpixels
• Perform shrinking in a similar way
ASE Segmentation
Experimental Results
• YouTube-Objects dataset
• Contains 126 videos for 10 object classes
• Performance measure
• Intersection over union (IoU)
[34] Y.-H. Tsai, G. Zhong, and M.-H. Yang, "Semantic cosegmentation in videos," ECCV, 2016.
[42] Y. Zhang, X. Chen, J. Li, C. Wang, and C. Xia, "Semantic object segmentation via detection in weakly labeled video," CVPR, 2015.
Experimental results
• Qualitative results
Q&A
• Thank you

