Computer Vision
From traditional approaches to deep neural networks
Stanislav Frolov, München, 27.02.2018
Outline of this talk
What we are going to talk about
● Computer vision
● Human vision
● Traditional approaches and methods
● Artificial neural networks
● Summary
Stanislav Frolov
Big Data Engineer @inovex
● Trained deep neural networks for object detection during my master's thesis
● Still fascinated by and interested in the field
What is computer vision
General
● Teach computers how to see
● Automatic extraction, analysis and understanding of images
● Infer useful information, interpret it and make decisions
● Automate tasks that the human visual system can do
● One of the most exciting fields in AI and ML
What is computer vision
Motivation
● Era of pixels
● The Internet consists mostly of images
● Explosion of visual data
● Far too much data to be labeled by humans
What is computer vision
Drivers
● Two drivers of the computer vision explosion
○ Compute (faster and cheaper)
○ Data (more data often beats better algorithms)
What is computer vision
Interdisciplinary field
● Disciplines: Computer Science, Mathematics, Engineering, Physics, Biology, Psychology
● Related areas: Information Retrieval, Machine Learning, Graphs, Algorithms, Systems Architecture, Robotics, Speech, NLP, Image Processing, Optics, Solid-State Physics, Neuroscience, Cognitive Sciences, Biological vision
Synonyms?
What is computer vision
Related fields - image processing
● Imaging for statistical pattern recognition
● Image transformations such as pixel-by-pixel operations
○ Contrast enhancement
○ Edge extraction
○ Noise reduction
○ Geometrical and spatial operations (e.g. rotations)
What is computer vision
Related fields - computer graphics
● Creates new images from scene descriptions
● Produces image data from 3D models
● The “inverse” of computer vision
● Augmented reality (AR) combines both
What is computer vision
Related fields - machine vision
● Mainly manufacturing applications
● Image-based automatic inspection, process control, robot guidance
● Usually relies on strong assumptions (colour, shape, light, structure, orientation, ...), which is why it works very well
● Output is often pass/fail or good/bad
● Can additionally output numerical measurements and counts
What is computer vision
Related fields - AI
● Create “intelligent” systems
● Study the computational aspects of intelligence
● Make computers do things at which, at the moment, people are better
● Many techniques play an important role (ML, ANNs)
● Currently does a few things better or faster at scale than humans can
● Whether it can do anything “human” remains an open question
What is computer vision
Related fields - summary
● Related fields have a large intersection
● The basic techniques used, developed and studied are very similar
Short trip to human vision
What is human vision
General
● Two-stage process
○ The eyes take in light reflected off objects, and the retina projects the 3D world onto 2D images
○ The brain's visual system interprets the 2D images and “rebuilds” a 3D model
What is human vision
Stereoscopic vision
● A pair of 2D images taken from slightly different viewpoints allows us to infer depth
● The position of nearby objects varies more across the two images than the position of distant objects
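For an idealized, rectified stereo pair, the depth of a point can be recovered from its disparity d (how many pixels it shifts between the two images) as Z = f·B/d, where f is the focal length in pixels and B is the baseline between the two cameras (or eyes). The sketch below is a minimal illustration under that pinhole-camera assumption. The function name and the numbers (700 px focal length, a 6.5 cm baseline, roughly the distance between human eyes) are illustrative and not from the talk.

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth in metres of a point seen by a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


# Illustrative values (assumed): focal length 700 px, baseline 6.5 cm.
near = depth_from_disparity(disparity_px=40.0, focal_px=700.0, baseline_m=0.065)
far = depth_from_disparity(disparity_px=4.0, focal_px=700.0, baseline_m=0.065)
print(f"near: {near:.2f} m, far: {far:.2f} m")
```

Because the disparity sits in the denominator, nearby points (large disparity) are resolved much more precisely than distant ones, which matches the observation above.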
What is human vision
Prior knowledge
● Prior knowledge of relative sizes and depths is often key for understanding and interpretation
What is human vision
Texture pattern
● Texture and texture changes help with depth perception
What is human vision
Biases and illusions in human perception
● Shadows make all the difference in interpretation
● Gradual changes in light are ignored so that we are not misled by shadows
What is human vision
A few more illusions
● Two arrows with differently oriented heads look different but have the same length
What is human vision
Biases and illusions in human perception
● Assumptions and familiarity (distorted room)
● Face recognition bias
● Up-down orientation bias
What is human vision
Summary
● Illusions are fun, but we are still far from solving the complete puzzle of human vision
Back to computer vision
What is computer vision
Typical tasks
● Recognition
● Localization
● Detection
● Segmentation
What is computer vision
Typical tasks
● Part-based detection
○ Deformable parts model
○ Pose estimation and poselets
What is computer vision
Typical tasks
● Image captioning (actions, attributes)
What is computer vision
Typical tasks
● Motion analysis
○ Egomotion (camera)
○ Optical flow (pixels)
What is computer vision
Typical tasks
● Scene understanding and reconstruction
What is computer vision
Typical tasks
● Image restoration
● Colouring black & white photos
Solving this is useful for many applications
What is computer vision
Typical applications
● Assistance systems for cars and people
● Surveillance
● Navigation (obstacle avoidance, road following, path
planning)
● Photo interpretation
● Military (“smart” weapons)
● Manufacturing (inspection, identification)
● Robotics
● Autonomous vehicles (dangerous zones)
What is computer vision
Typical applications
● Recognition and tracking
● Event detection
● Interaction (man-machine interfaces)
● Modeling (medical, manufacturing, training, education)
● Organizing (database index, sorting/clustering)
● Fingerprint and biometrics
● …
Why so difficult?
What is computer vision
Why it is difficult
● Occlusion
● Deformation
● Scale
● Clutter
● Illumination
● Viewpoint
● Object pose
● Tons of classes and variants
● Often an n:1 mapping
● Computationally expensive
● Full understanding of biological vision is missing
System overview
What is computer vision
System overview
● Input: image(s) + labels
● Output: semantic data, labels
● Digital image pixels usually have three channels [R,G,B], each in [0...255], plus a location [x,y]
● Digital images are just vectors (arrays) of numbers
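To make the “images are just numbers” point concrete, here is a minimal sketch in plain Python (no imaging library assumed) of a tiny RGB image stored as rows of (R, G, B) tuples, indexed by location [x, y], and flattened into a single vector as a classifier would consume it. The row-major, channels-last layout is one common convention, not the only one; the helper names are illustrative.

```python
# A tiny 2x3 RGB image: rows of pixels, each pixel an (R, G, B) triple in 0..255.
image = [
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],        # red, green, blue
    [(0, 0, 0), (128, 128, 128), (255, 255, 255)],  # black, grey, white
]
height, width = len(image), len(image[0])


def pixel(img, x, y):
    """(R, G, B) value at location [x, y]: column x, row y."""
    return img[y][x]


def flatten(img):
    """Row-major flattening into a single vector of height * width * 3 numbers."""
    return [channel for row in img for px in row for channel in px]


vector = flatten(image)
print(height, width, len(vector))  # 2 3 18
print(pixel(image, 2, 0))          # (0, 0, 255)
```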
What is computer vision
System overview
1. Image acquisition (camera, sensors)
2. Pre-processing (sampling, noise reduction, augmentation)
3. Feature extraction (lines, edges, regions, points)
4. Detection and segmentation
5. Post-processing (verification, estimation, recognition)
6. Decision making
● Goal: the ability of a machine to step back and interpret the big picture behind those pixels
Some history
What is computer vision
History
1950s
● 2D imaging for statistical pattern recognition
● Theory of optical flow based on the fixed point towards which an observer moves
What is computer vision
Traditional approaches
Image processing
● Histograms
● Filtering
● Stitching
● Thresholding
● ...
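Histograms and thresholding are often combined. As one classic example (Otsu's method, which the talk does not name), a global threshold can be chosen from the grey-level histogram so that the variance between the resulting background and foreground classes is maximal. A minimal sketch in plain Python, assuming an 8-bit greyscale image given as a flat list of integers; the pixel values are made up.

```python
def histogram(pixels, levels=256):
    """Count how often each grey level 0..levels-1 occurs."""
    counts = [0] * levels
    for p in pixels:
        counts[p] += 1
    return counts


def otsu_threshold(pixels, levels=256):
    """Return t maximizing the between-class variance; pixels > t are foreground."""
    hist = histogram(pixels, levels)
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    w_bg = 0    # number of pixels with value <= t
    sum_bg = 0  # sum of their grey values
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        w_fg = total - w_bg
        if w_bg == 0:
            continue
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Unnormalized between-class variance; the scale does not change the argmax.
        between_var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


pixels = [10, 12, 11, 13, 200, 202, 201, 199]  # dark background, bright object
t = otsu_threshold(pixels)
binary = [1 if p > t else 0 for p in pixels]
print(t, binary)  # 13 [0, 0, 0, 0, 1, 1, 1, 1]
```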
What is computer vision
History
1960s
● Desire to extract 3D structure from 2D images for scene understanding
● Began at pioneering AI universities, aiming to mimic the human visual system as a stepping stone towards intelligent robots
● Summer vision project at MIT: attach a camera to a computer and have it “describe what it saw”
What is computer vision
History: summer vision project @MIT 1966
● Given to 10 undergraduate students
● … an attempt to use our summer workers effectively …
● … construction of a significant part of a visual system …
● … task can be segmented into sub-problems …
● … participate in the construction of a system complex enough to be a real landmark in the development of “pattern recognition” …
What is computer vision
History: summer vision project @MIT 1966
● Goal: analyse scenes and identify objects
● Structure of system:
○ Region proposal
○ Property lists for regions
○ Boundary construction
○ Match with properties
○ Segment
● Basic foreground/background segmentation with simple objects (cubes, cylinders, …)
What is computer vision
History: summer vision project @MIT 1966
● Unlike general intelligence, computer vision seemed tractable
● An amusing anecdote, but the project never aimed to “solve” computer vision
● Computer vision today differs from what it was thought to be in 1966
What is computer vision
History
1970s
● Many algorithms that still exist today were formed
● Edges, lines and objects treated as interconnected structures
What is computer vision
Traditional approaches
Edge detection based on
● Brightness
● Gradients
● Geometry
● Illumination
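Gradient-based edge detection looks for places where brightness changes quickly. A minimal sketch using the Sobel operator (one common gradient filter; the talk does not prescribe a specific one) on a greyscale image stored as a list of rows. Borders are skipped for brevity, and the kernels are applied without flipping, which does not affect the edge magnitude.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]  # responds to horizontal changes
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]  # responds to vertical changes


def sobel_magnitude(img):
    """Gradient magnitude of a greyscale image (list of rows); border pixels stay 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for dy in range(3):
                for dx in range(3):
                    v = img[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * v
                    gy += SOBEL_Y[dy][dx] * v
            out[y][x] = math.hypot(gx, gy)  # edge strength at (x, y)
    return out


# Vertical step edge: dark left half, bright right half.
img = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
mag = sobel_magnitude(img)
print(mag[2])  # [0.0, 0.0, 1020.0, 1020.0, 0.0, 0.0]
```

The response is high only on the two columns straddling the brightness step, which is what an edge detector should report.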
What is computer vision
Traditional approaches - part based detector
● Objects are described by the features of their parts and the parts' spatial relationships
● Challenge: how to define the parts and how to combine them
What is computer vision
History
1980s
● More rigorous mathematical analysis and quantitative aspects
● Optical character recognition
● Sliding window approaches
● Use of artificial neural networks
What is computer vision
Traditional approaches - HOG detection (histogram of oriented gradients)
● The concept dates back to the 1980s but only became widely used in 2005
● Create HOG descriptors (object generalizations)
● One feature vector per object
● Train an SVM classifier
● Sliding window at multiple scales
What is computer vision
Traditional approaches - HOG detection (histogram of oriented gradients)
● Computation of HOG descriptors:
1. Compute gradients
2. Compute histograms on cells
3. Normalize histograms
4. Concatenate histograms
● Requires a lot of engineering
● Must build ensembles of feature descriptors
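The four descriptor steps can be sketched in plain Python as follows. This is a simplified illustration, not the reference HOG implementation: it uses hard binning instead of interpolated votes and normalizes each cell on its own, whereas standard HOG normalizes over overlapping blocks of cells. The cell size of 4 pixels and the 9 unsigned orientation bins are illustrative choices.

```python
import math


def hog_descriptor(img, cell=4, bins=9, eps=1e-6):
    """Simplified HOG descriptor of a greyscale image given as a list of rows."""
    h, w = len(img), len(img[0])
    # 1. Gradients with the centred [-1, 0, 1] filter (indices clamped at the border).
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            mag[y][x] = math.hypot(gx, gy)
            ang[y][x] = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
    bin_width = 180.0 / bins
    descriptor = []
    for cy in range(0, h - cell + 1, cell):
        for cx in range(0, w - cell + 1, cell):
            # 2. Orientation histogram of one cell, votes weighted by gradient magnitude.
            hist = [0.0] * bins
            for y in range(cy, cy + cell):
                for x in range(cx, cx + cell):
                    hist[int(ang[y][x] // bin_width) % bins] += mag[y][x]
            # 3. L2 normalization (per cell here; standard HOG uses overlapping blocks).
            norm = math.sqrt(sum(v * v for v in hist) + eps ** 2)
            # 4. Concatenate the cell histograms into one feature vector.
            descriptor.extend(v / norm for v in hist)
    return descriptor


# 8x8 image with a vertical edge: each cell's gradient energy lands in bin 0 (0 degrees).
img = [[0] * 4 + [255] * 4 for _ in range(8)]
desc = hog_descriptor(img)
print(len(desc))  # 36 = 4 cells * 9 bins
```

In a detector, such a vector would be computed for every sliding-window position and scale and passed to the trained SVM.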
What is computer vision
History
1990s
● Significant interaction with computer graphics (rendering, morphing, stitching)
● Approaches using statistical learning
● Eigenfaces (“ghost faces”) through principal component analysis (PCA)
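Eigenfaces treat every flattened face image as a vector and use PCA to find the directions along which faces vary most; each face is then described by its coordinates along these directions. A minimal sketch in plain Python that finds only the first principal component via power iteration on the covariance matrix. Real eigenface systems keep many components and typically compute them with an SVD, and the four-pixel “faces” here are made up for illustration.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equally long vectors (the 'average face')."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def covariance(vectors, mean):
    """Sample covariance matrix of the vectors around their mean."""
    n, d = len(vectors), len(mean)
    centred = [[v[i] - mean[i] for i in range(d)] for v in vectors]
    return [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenvector(matrix, iterations=200):
    """Power iteration: repeated multiplication converges to the dominant eigenvector."""
    d = len(matrix)
    v = [1.0] * d  # assumes the start vector is not orthogonal to the answer
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(vector, mean, component):
    """Coordinate of one face along a principal component."""
    return sum((x - m) * c for x, m, c in zip(vector, mean, component))


faces = [[10, 10, 50, 50], [20, 20, 50, 50], [30, 30, 50, 50], [40, 40, 50, 50]]
mean = mean_vector(faces)
eigenface = top_eigenvector(covariance(faces, mean))
codes = [project(f, mean, eigenface) for f in faces]  # one number per face
```

Here the faces only differ in the brightness of the first two pixels, so the first eigenface points along those two pixels and a single coefficient per face captures all of the variation.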
What is computer vision
Traditional approaches - deformable parts model (DPM)
● Objects are modeled by their parts
● First match the whole object, then refine on the parts
● HOG + part-based + modern features
● Slow but good at difficult objects
● Involves many heuristics
What is computer vision
Features
● Feature points
○ Small areas of pixels with certain properties
● Feature detection
○ Use features for identification
○ A detector activates if the “object” is present
● Examples:
○ Lines, edges, colours, blobs, …
○ Animals, faces, cars, ...
What is computer vision
Traditional approaches - classical recognition
● Init: extract features for objects at different scales, colours, orientations, rotations and occlusion levels
● Inference: extract features from the query image and find the closest match in the database, or train a classifier
● Computationally expensive (hundreds of features per image, millions in the database) and complex due to errors and mismatches
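The inference step can be sketched as brute-force nearest-neighbour matching: every descriptor from the query image is compared with every descriptor in the database, and the labels of the closest matches vote for the result. A minimal sketch in plain Python; the three-dimensional descriptors and labels are made up (real descriptors such as SIFT have 128 dimensions), and the double loop over query and database is exactly what makes this approach expensive at scale.

```python
import math


def euclidean(a, b):
    """Distance between two feature descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match(query_features, database):
    """Closest database entry (label, distance) for each query descriptor.

    `database` is a list of (label, descriptor) pairs. Brute force: every query
    descriptor is compared with every database descriptor.
    """
    matches = []
    for q in query_features:
        label, desc = min(database, key=lambda entry: euclidean(q, entry[1]))
        matches.append((label, euclidean(q, desc)))
    return matches


def vote(matches):
    """Recognize the object as the label that received the most matches."""
    counts = {}
    for label, _ in matches:
        counts[label] = counts.get(label, 0) + 1
    return max(counts, key=counts.get)


database = [
    ("car", [0.9, 0.1, 0.0]), ("car", [0.8, 0.2, 0.1]),
    ("face", [0.1, 0.9, 0.3]), ("face", [0.0, 0.8, 0.4]),
]
query = [[0.85, 0.15, 0.05], [0.1, 0.85, 0.35], [0.9, 0.05, 0.0]]
matches = match(query, database)
print(vote(matches))  # car
```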
What is computer vision
History
Before the new era
● Bags of features
● Handcrafted ensembles
[Diagram: Input → Feature Extraction (Feat. 1, Feat. 2, …, Feat. n) → Final Decision]
The new era of computer vision
Artificial neural networks
Fundamentals - artificial neuron
● Elementary building block
● Inspired by biological neurons
● Mathematical function y = f(w·x + b): the weighted sum of the inputs x plus a bias b, passed through an activation function f
● Learnable weights
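The formula y = f(w·x + b) translates almost directly into code. A minimal sketch in plain Python with a sigmoid as the assumed activation f (ReLU shown as an alternative); the weights and bias are arbitrary values here, whereas in a real network they are learned.

```python
import math


def sigmoid(z: float) -> float:
    """Squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def relu(z: float) -> float:
    """Rectified linear unit, a common alternative activation."""
    return max(0.0, z)


def neuron(x, w, b, f=sigmoid):
    """y = f(w . x + b): weighted sum of the inputs plus a bias, passed through f."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return f(z)


print(neuron(x=[1.0, 2.0], w=[0.5, -0.25], b=0.0))         # 0.5 (weighted sum is 0)
print(neuron(x=[1.0, 2.0], w=[1.0, 1.0], b=-1.0, f=relu))  # 2.0
```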
Artificial neural networks
Fundamentals - artificial neural networks
● Collection of neurons organized in layers
● Universal approximators
● Example shown: a fully-connected network
Artificial neural networks
Fundamentals - training
● Basically an optimization problem
● Find the minimum of a loss function through an iterative process (training)
● Designing the loss function is sometimes tricky
Artificial neural networks
Fundamentals - training
Simple optimizer algorithm:
1. Forward pass with a batch of data
2. Calculate the error between the actual and the wanted output
3. Nudge the weights in the direction that reduces the error, by a step proportional to the error gradient (so the same data would now produce a smaller error)
4. Repeat until convergence
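The four steps map onto a few lines of mini-batch gradient descent. A minimal sketch in plain Python, assuming the simplest possible model (a single linear neuron y = w·x + b without activation) and the mean squared error as loss; the learning rate, epoch count and toy data (points on the line y = 2x + 1) are illustrative.

```python
def train(data, lr=0.05, epochs=500, batch_size=2):
    """Mini-batch gradient descent for the model y_hat = w * x + b with squared error."""
    w, b = 0.0, 0.0  # initial weights
    for _ in range(epochs):
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # 1. Forward pass and 2. error between actual and wanted output.
            errors = [(w * x + b) - y for x, y in batch]
            # Gradient of the mean squared error with respect to w and b.
            grad_w = 2 * sum(e * x for e, (x, _) in zip(errors, batch)) / len(batch)
            grad_b = 2 * sum(errors) / len(batch)
            # 3. Nudge the weights a small step against the gradient.
            w -= lr * grad_w
            b -= lr * grad_b
    # 4. "Until convergence" is simplified here to a fixed number of epochs.
    return w, b


data = [(x, 2 * x + 1) for x in [0.0, 1.0, 2.0, 3.0]]  # points on y = 2x + 1
w, b = train(data)
print(round(w, 3), round(b, 3))  # 2.0 1.0
```

A deep network follows the same loop; only the forward pass and the gradient computation (backpropagation) become more involved.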
Artificial neural networks
Fundamentals - CNN
● Local neighborhood contributes to activation
● Exploit spatial information
● Hierarchical feature extractors
● Fewer parameters
[Figure: input, filters, receptive field, activation]
Artificial neural networks
Fundamentals - CNN
● Filter of size 3x3 applied to an input of size 7x7
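At each position, a convolutional layer computes a weighted sum over the local neighbourhood, reusing the same 3x3 weights everywhere. A minimal sketch in plain Python of a 3x3 filter sliding over a 7x7 input with stride 1 and no padding (an assumption; the slide does not state stride or padding), which yields a 5x5 output. Like most deep learning libraries, it computes cross-correlation, i.e. the kernel is not flipped.

```python
def conv2d(image, kernel):
    """'Valid' 2D cross-correlation: stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(kernel[i][j] * image[y + i][x + j] for i in range(kh) for j in range(kw))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


image = [[1] * 7 for _ in range(7)]         # 7x7 input of ones
kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]  # 3x3 filter; weights chosen for illustration
out = conv2d(image, kernel)
print(len(out), len(out[0]), out[0][0])     # 5 5 9

# Parameter count: the filter has 9 shared weights, whereas a fully-connected
# layer mapping the 49 inputs to the 25 outputs would need 49 * 25 = 1225.
```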
Artificial neural networks
Fundamentals - pooling
● Max-pooling
● Dimension reduction/adaptation
● Existence is more important than location
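Max-pooling keeps only the strongest activation in each window, which is why the exact location of a feature inside the window stops mattering. A minimal sketch in plain Python with 2x2 windows and stride 2 (common defaults, assumed here); the feature-map values are made up.

```python
def max_pool(fmap, size=2, stride=2):
    """Keep the strongest activation in each size x size window."""
    out_h = (len(fmap) - size) // stride + 1
    out_w = (len(fmap[0]) - size) // stride + 1
    return [
        [
            max(fmap[y * stride + i][x * stride + j] for i in range(size) for j in range(size))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


fmap = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 3],
]
print(max_pool(fmap))  # [[4, 2], [2, 7]]

# Moving the 4 to another position inside its window does not change the result.
shifted = [[4, 3, 2, 0], [1, 2, 1, 1], [0, 1, 5, 6], [2, 2, 7, 3]]
print(max_pool(shifted) == max_pool(fmap))  # True
```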
Artificial neural networks
Fundamentals - padding
● Zero-padding
● Controlling dimensions
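Zero-padding adds a border of zeros so that a filter can also be centred on edge pixels, which gives control over the output size. Along one axis, the output size is (n - k + 2p) // s + 1 for input size n, kernel size k, padding p and stride s. A minimal sketch in plain Python of both the padding itself and this size formula; the helper names are illustrative.

```python
def zero_pad(image, p):
    """Surround a 2D list with p rows and columns of zeros."""
    width = len(image[0]) + 2 * p
    top = [[0] * width for _ in range(p)]
    middle = [[0] * p + list(row) + [0] * p for row in image]
    bottom = [[0] * width for _ in range(p)]
    return top + middle + bottom


def output_size(n, k, p=0, s=1):
    """Output size along one axis for input n, kernel k, padding p, stride s."""
    return (n - k + 2 * p) // s + 1


print(zero_pad([[1, 2], [3, 4]], 1))
# [[0, 0, 0, 0], [0, 1, 2, 0], [0, 3, 4, 0], [0, 0, 0, 0]]
print(output_size(7, 3))            # 5: without padding the map shrinks
print(output_size(7, 3, p=1))       # 7: padding of 1 keeps the 7x7 size
print(output_size(7, 3, p=1, s=2))  # 4
```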
Artificial neural networks
Fundamentals - general network architecture
[Diagram: input image → convolutional layers → … → final decision]
Artificial neural networks
Fundamentals - hierarchical feature extractors
[Figure: activations for the first layers (lines, edges, blobs, colours, ...) and for deeper layers (parts of abstract objects, abstract objects)]
Modern history of object recognition
Benchmark
Datasets - PASCAL VOC
● Classification and detection
○ 27k images
○ 20 classes
■ person, bird, cat, cow, dog, horse, sheep, aeroplane, bicycle, boat, bus, car, motorbike, train, bottle, chair, dining table, potted plant, sofa, tv/monitor
Benchmark
Datasets - ImageNet
● ImageNet
○ 14M labeled images
○ 20k object categories
● ILSVRC* challenges use a subset of 1,000 categories, including 90 of the 120 dog breeds in the full ImageNet
*ImageNet Large Scale Visual Recognition Challenge
Artificial neural networks
Roadmap - AlexNet
● ILSVRC 2012 winner by a large margin, reducing the top-5 error from about 25% to about 16%
● Proved the effectiveness of CNNs and kicked off a new era
● 8 layers, 650k neurons, 60M parameters
Artificial neural networks
Roadmap - ZFNet
● ILSVRC 2013 winner with a best top-5 error of 11.6%
● Similar to AlexNet, but with smaller 7x7 kernels in the first layer (instead of 11x11) to keep more information in deeper layers
Artificial neural networks
Roadmap - OverFeat
● ILSVRC 2013 localization winner
● Uses an AlexNet-like network on multi-scale input images with a sliding window approach
● Accumulates bounding boxes for the final detection (instead of non-max suppression)
Artificial neural networks
Roadmap - RCNN (region based CNN)
● ~2k region proposals generated by selective search
● CNN features per proposal, SVM trained for classification
● Multi-stage pipeline
Artificial neural networks
Roadmap - VGGNet
● Not the ILSVRC 2014 classification winner, but famous due to its simplicity and effectiveness
● Replaces large-kernel convolutions by stacks of several small-kernel (3x3) convolutions
Artificial neural networks
Roadmap - InceptionNet (GoogLeNet)
● ILSVRC 2014 winner
● Stacks up “inception” modules
● 22 layers, 5M parameters
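Artificial neural networks
Roadmap - Fast RCNN
● Jointly learns classification and bounding-box regression (region proposals still come from selective search)
● Employs region of interest (RoI) pooling, which lets all proposals reuse the convolutional features computed once for the whole image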
Artificial neural networks
Roadmap - YOLO (you only look once)
● Directly predicts all objects and classes in one shot
● Very fast
● Processes images at ~40 FPS on a Titan X GPU
● First real-time state-of-the-art detector
● Divides the input image into a grid of cells; each cell predicts bounding boxes and class probabilities
Artificial neural networks
Roadmap - ResNet (Microsoft)
● ILSVRC 2015 winner with a 3.6% top-5 error rate (human performance is 5-10%)
● Employs residual blocks, which allow building very deep networks (hundreds of layers)
● Identity shortcut connections add each block's input to its output
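Thanks to the identity mapping, a residual block only has to learn the difference F(x) to its input: y = F(x) + x. A minimal sketch in plain Python; to keep it short, the two convolutions of a real ResNet block are replaced by plain matrix-vector products (no bias, no batch normalization), which is an assumption of this illustration.

```python
def relu(v):
    """Element-wise ReLU."""
    return [max(0.0, x) for x in v]


def linear(weights, v):
    """Matrix-vector product: a weight layer without bias, standing in for a convolution."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_block(x, w1, w2):
    """y = relu(F(x) + x), where F is two weight layers with a ReLU in between."""
    f = linear(w2, relu(linear(w1, x)))
    return relu([fi + xi for fi, xi in zip(f, x)])  # identity shortcut adds the input back


zeros = [[0.0, 0.0], [0.0, 0.0]]
print(residual_block([1.0, 2.0], w1=zeros, w2=zeros))  # [1.0, 2.0]
```

With all weights at zero, the block simply passes non-negative inputs through unchanged, so stacking more blocks need not make the network worse. This is one common intuition for why very deep ResNets remain trainable.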
Artificial neural networks
Roadmap - MultiBox
● Not a recognition network
● A region proposal network
● Popularized prior/anchor boxes (found through clustering) from which offsets are predicted
● A much better strategy than starting the predictions from random coordinates
● Since then, heuristic approaches have gradually been fading out and being replaced
Artificial neural networks
Roadmap - Faster RCNN
● Fast RCNN with the heuristic region proposal replaced by a region proposal network (RPN) inspired by MultiBox
● The RPN shares full-image convolutional features with the detection network (nearly cost-free region proposals)
● The RPN uses an “attention” mechanism to tell the detector where to look
● ~5 FPS on a K40 GPU
● End-to-end training
Artificial neural networks
Roadmap - SSD (single shot multibox detector)
● SSD borrows the anchor-box idea of Faster RCNN’s RPN but directly classifies the objects inside each prior box (similar to YOLO)
● Predicts category scores and box offsets for a fixed set of default bounding boxes
● Improves on YOLO’s fixed grid cells by using default boxes with multiple aspect ratios
● Produces predictions at different scales (from multiple feature maps)
● ~59 FPS
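The fixed set of default boxes can be generated up front. Every cell of a feature map gets boxes of several aspect ratios, and coarser feature maps get larger box scales. A minimal sketch in plain Python, loosely following the default-box scheme described for SSD (scales spaced linearly between s_min and s_max, width s·√a and height s/√a for aspect ratio a). The concrete values (s_min = 0.2, s_max = 0.9, aspect ratios 1, 2 and 1/2, a 4x4 feature map) are illustrative assumptions.

```python
import math


def layer_scales(num_maps, s_min=0.2, s_max=0.9):
    """Box scale per feature map (num_maps >= 2), linear from finest to coarsest map."""
    return [s_min + (s_max - s_min) * k / (num_maps - 1) for k in range(num_maps)]


def default_boxes(feature_size, scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """(cx, cy, w, h) boxes in normalized [0, 1] image coordinates for one square map."""
    boxes = []
    for i in range(feature_size):
        for j in range(feature_size):
            cx = (j + 0.5) / feature_size  # box centre = centre of the feature-map cell
            cy = (i + 0.5) / feature_size
            for a in aspect_ratios:
                boxes.append((cx, cy, scale * math.sqrt(a), scale / math.sqrt(a)))
    return boxes


scales = layer_scales(3)             # roughly [0.2, 0.55, 0.9]
boxes = default_boxes(4, scales[0])
print(len(boxes))                    # 48 = 4 * 4 cells * 3 aspect ratios
print(boxes[0])                      # (0.125, 0.125, 0.2, 0.2)
```

The network then predicts, for every one of these boxes, class scores and small offsets relative to the box, instead of regressing coordinates from scratch.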
Artificial neural networks
TensorFlow object detection API
● TensorFlow: an open-source software library for machine learning applications
● TensorFlow Object Detection API
○ A collection of pretrained models
○ Construct, train and deploy object detection models
Summary
Summary
Human vs machine
● Humans are good at understanding the big picture
● Neural networks are good at details
● But they can be fooled...
Summary
Computer vision is still difficult
● Needs large amounts of data
● Lots of engineering
● Trial and error
● Long training times
● Still lots of hyperparameter tuning
● No general network (generalization remains an open question)
● Little mathematical foundation
Summary
Computer vision is hard
● Despite all of these advances, the dream of having a computer interpret an image at the same level as a human remains unrealized
Thank You
Stanislav Frolov
Big Data Engineer
sfrolov@inovex.de
0173 318 11 35
inovex GmbH
Lindberghstraße 3
80939 München

More Related Content

PPTX
AI Computer vision
PPTX
Computer vision
PPTX
Computer vision
PDF
Digital Image Processing: Digital Image Fundamentals
PPTX
Image proccessing and its application
PPTX
ANIMATION SEQUENCE
PPTX
Image enhancement techniques
PPTX
Computer vision ppt
AI Computer vision
Computer vision
Computer vision
Digital Image Processing: Digital Image Fundamentals
Image proccessing and its application
ANIMATION SEQUENCE
Image enhancement techniques
Computer vision ppt

What's hot (20)

PPT
Digital Image Processing_ ch1 introduction-2003
PPTX
Object tracking
PPTX
Fundamental Steps of Digital Image Processing & Image Components
PPTX
Computer Vision
PPTX
Computer vision
PPTX
Computer Vision - Artificial Intelligence
PDF
fusion of Camera and lidar for autonomous driving II
PDF
Machine learning in image processing
DOCX
EDGE DETECTION
PPTX
Ai lecture 03 computer vision
PPTX
1. digital image processing
PPTX
Image processing ppt
PDF
Seminar(Pattern Recognition)
PPTX
Image Acquisition and Representation
PPT
ImageProcessing10-Segmentation(Thresholding) (1).ppt
PPTX
Edge detection
PPTX
Machine learning seminar ppt
PPTX
Stereo vision
PPTX
COM2304: Introduction to Computer Vision & Image Processing
PDF
Driver Drowsiness Detection report
Digital Image Processing_ ch1 introduction-2003
Object tracking
Fundamental Steps of Digital Image Processing & Image Components
Computer Vision
Computer vision
Computer Vision - Artificial Intelligence
fusion of Camera and lidar for autonomous driving II
Machine learning in image processing
EDGE DETECTION
Ai lecture 03 computer vision
1. digital image processing
Image processing ppt
Seminar(Pattern Recognition)
Image Acquisition and Representation
ImageProcessing10-Segmentation(Thresholding) (1).ppt
Edge detection
Machine learning seminar ppt
Stereo vision
COM2304: Introduction to Computer Vision & Image Processing
Driver Drowsiness Detection report
Ad

Similar to Computer Vision – From traditional approaches to deep neural networks (20)

PDF
Computer Vision in 2024 _ All The Things You Need To Know.pdf
PPT
vision-1.ppt
PPTX
Computer vision
PPTX
Class PPT based on engineering subject cv.pptx
PPTX
Computer Vision Crash Course
PPT
vision.ppt
PPT
vision.ppt
PPT
vision_2.ppt
PPTX
Computer Vision(4).pptx
PPTX
Computer vision introduction
PPTX
Machine Learning
PDF
Computer vision basics
PPTX
Introduction-to-Computer-Vision PPPP.pptx
PPTX
01 CM Introduction of Computer Vision.pptx
PPTX
I have not done hard tests for this, but you should gain about
PPTX
Introduction to Computer Vision - Image formation
PDF
IEEE EED2021 AI use cases in Computer Vision
PPTX
seminar_computer_vision.pptx
PDF
Saksham seminar report
PPTX
Computer vision
Computer Vision in 2024 _ All The Things You Need To Know.pdf
vision-1.ppt
Computer vision
Class PPT based on engineering subject cv.pptx
Computer Vision Crash Course
vision.ppt
vision.ppt
vision_2.ppt
Computer Vision(4).pptx
Computer vision introduction
Machine Learning
Computer vision basics
Introduction-to-Computer-Vision PPPP.pptx
01 CM Introduction of Computer Vision.pptx
I have not done hard tests for this, but you should gain about
Introduction to Computer Vision - Image formation
IEEE EED2021 AI use cases in Computer Vision
seminar_computer_vision.pptx
Saksham seminar report
Computer vision
Ad

More from inovex GmbH (20)

PDF
lldb – Debugger auf Abwegen
PDF
Are you sure about that?! Uncertainty Quantification in AI
PDF
Why natural language is next step in the AI evolution
PDF
WWDC 2019 Recap
PDF
Network Policies
PDF
Interpretable Machine Learning
PDF
Jenkins X – CI/CD in wolkigen Umgebungen
PDF
AI auf Edge-Geraeten
PDF
Prometheus on Kubernetes
PDF
Deep Learning for Recommender Systems
PDF
Azure IoT Edge
PDF
Representation Learning von Zeitreihen
PDF
Talk to me – Chatbots und digitale Assistenten
PDF
Künstlich intelligent?
PDF
Dev + Ops = Go
PDF
Das Android Open Source Project
PDF
Machine Learning Interpretability
PDF
Performance evaluation of GANs in a semisupervised OCR use case
PDF
People & Products – Lessons learned from the daily IT madness
PDF
Infrastructure as (real) Code – Manage your K8s resources with Pulumi
lldb – Debugger auf Abwegen
Are you sure about that?! Uncertainty Quantification in AI
Why natural language is next step in the AI evolution
WWDC 2019 Recap
Network Policies
Interpretable Machine Learning
Jenkins X – CI/CD in wolkigen Umgebungen
AI auf Edge-Geraeten
Prometheus on Kubernetes
Deep Learning for Recommender Systems
Azure IoT Edge
Representation Learning von Zeitreihen
Talk to me – Chatbots und digitale Assistenten
Künstlich intelligent?
Dev + Ops = Go
Das Android Open Source Project
Machine Learning Interpretability
Performance evaluation of GANs in a semisupervised OCR use case
People & Products – Lessons learned from the daily IT madness
Infrastructure as (real) Code – Manage your K8s resources with Pulumi

Recently uploaded (20)

PPTX
CHAPTER 2 - PM Management and IT Context
PDF
How to Migrate SBCGlobal Email to Yahoo Easily
PDF
Why TechBuilder is the Future of Pickup and Delivery App Development (1).pdf
PPTX
L1 - Introduction to python Backend.pptx
PDF
Flood Susceptibility Mapping Using Image-Based 2D-CNN Deep Learnin. Overview ...
PDF
Navsoft: AI-Powered Business Solutions & Custom Software Development
PDF
Design an Analysis of Algorithms I-SECS-1021-03
PPTX
history of c programming in notes for students .pptx
PDF
top salesforce developer skills in 2025.pdf
PDF
Nekopoi APK 2025 free lastest update
PPTX
Oracle E-Business Suite: A Comprehensive Guide for Modern Enterprises
PDF
Audit Checklist Design Aligning with ISO, IATF, and Industry Standards — Omne...
PPTX
ISO 45001 Occupational Health and Safety Management System
PPTX
Odoo POS Development Services by CandidRoot Solutions
PPTX
ai tools demonstartion for schools and inter college
PDF
Softaken Excel to vCard Converter Software.pdf
PDF
T3DD25 TYPO3 Content Blocks - Deep Dive by André Kraus
PDF
Addressing The Cult of Project Management Tools-Why Disconnected Work is Hold...
PPT
Introduction Database Management System for Course Database
PPTX
Online Work Permit System for Fast Permit Processing
CHAPTER 2 - PM Management and IT Context
How to Migrate SBCGlobal Email to Yahoo Easily
Why TechBuilder is the Future of Pickup and Delivery App Development (1).pdf
L1 - Introduction to python Backend.pptx
Flood Susceptibility Mapping Using Image-Based 2D-CNN Deep Learnin. Overview ...
Navsoft: AI-Powered Business Solutions & Custom Software Development
Design an Analysis of Algorithms I-SECS-1021-03
history of c programming in notes for students .pptx
top salesforce developer skills in 2025.pdf
Nekopoi APK 2025 free lastest update
Oracle E-Business Suite: A Comprehensive Guide for Modern Enterprises
Audit Checklist Design Aligning with ISO, IATF, and Industry Standards — Omne...
ISO 45001 Occupational Health and Safety Management System
Odoo POS Development Services by CandidRoot Solutions
ai tools demonstartion for schools and inter college
Softaken Excel to vCard Converter Software.pdf
T3DD25 TYPO3 Content Blocks - Deep Dive by André Kraus
Addressing The Cult of Project Management Tools-Why Disconnected Work is Hold...
Introduction Database Management System for Course Database
Online Work Permit System for Fast Permit Processing

Computer Vision – From traditional approaches to deep neural networks

  • 1. Computer Vision From traditional approaches to deep neural networks Stanislav Frolov München, 27.02.2018
  • 2. ● Computer vision ● Human vision ● Traditional approaches and methods ● Artificial neural networks ● Summary 2 Outline of this talk What we are going to talk about
  • 3. ● trained deep neural networks for object detection during master thesis ● still fascinated and interested 3 Stanislav Frolov Big Data Engineer @inovex
  • 4. ● Teach computers how to see ● Automatic extraction, analysis and understanding of images ● Infer useful information, interpret and make decisions ● Automate tasks that human visual system can do ● One of the most exciting fields in AI and ML 4 What is computer vision General
  • 5. 5 What is computer vision Motivation ● Era of pixels ● Internet consists mostly of images ● Explosion of visual data ● Cannot be labeled by humans
  • 6. 6 What is computer vision Drivers ● Two drivers for computer vision explosion ○ Compute (faster and cheaper) ○ Data (more data > algorithms)
  • 7. 7 What is computer vision Interdisciplinary field Computer Science Mathematics Engineering Physics Biology Psychology Information Retrieval Machine LearningGraphs, Algorithms Systems Architecture Robotics Speech, NLP Image Processing Optics Solid-State Physics Neuroscience Cognitive SciencesBiological vision
  • 9. ● Imaging for statistical pattern recognition ● Image transformations such as pixel-by-pixel operations ○ Contrast enhancement ○ Edge extraction ○ Noise reduction ○ Geometrical and spatial operations (i.e rotations) 9 What is computer vision Related fields - image processing
  • 10. ● Creates new images from scene descriptions ● Produces image data from 3D models ● “Inverse” of computer vision ● AR as a combination of both 10 What is computer vision Related fields - computer graphics
  • 11. ● Mainly manufacturing applications ● Image-based automatic inspection, process control, robot guidance ● Usually employs strong assumptions (colour, shape, light, structure, orientation, ...) -> works very well ● Output often pass/fail or good/bad ● Additionally numerical/measurement data, counts 11 What is computer vision Related fields - machine vision
  • 12. ● Create “intelligent” systems ● Studying computational aspects of intelligence ● Make computers do things at which, at the moment, people are better ● Many techniques play an important role (ML, ANNs) ● Currently does a few things better/faster at scale than humans can ● Ability to do anything “human” is not answered 12 What is computer vision Related fields - AI
  • 13. ● Related fields have a large intersection ● Basic techniques used, developed and studied are very similar 13 What is computer vision Related fields- summary
  • 14. Short trip to human vision 14
  • 15. ● Two stage process ○ Eyes take in light reflected off the objects and retina converts 3D objects into 2D images ○ Brain’s visual system interprets 2D images and “rebuilds” a 3D model 15 What is human vision General
  • 16. ● Pair of 2D images with slightly different view allows to infer depth ● Position of nearby objects will vary more across the two images than the position of more distant objects 16 What is human vision Stereoscopic vision
  • 17. ● Prior knowledge of relative sizes and depths is often key for understanding and interpretation 17 What is human vision Prior knowledge
  • 18. ● Texture and texture change helps solving depth perception 18 What is human vision Texture pattern
  • 19. 19 What is human vision Biases and illusions in human perception ● Shadows make all the difference in interpretation ● Gradual changes in light ignored to not be misled by shadow
  • 20. 20 What is human vision A few more illusions ● Two arrows with different orientations have the same length
  • 21. ● Assumptions and familiarity (distorted room) ● Face recognition bias ● Up-down orientation bias 21 What is human vision Biases and illusions in human perception
  • 22. 22 What is human vision Summary ● Illusions are fun, but the complete puzzle to understand human vision is far from being complete
  • 23. Back to computer vision 23
  • 24. ● Recognition ● Localization ● Detection ● Segmentation 24 What is computer vision Typical tasks
  • 25. ● Part-based detection ○ Deformable parts model ○ Pose estimation and poselets 25 What is computer vision Typical tasks
  • 26. ● Image captioning (actions, attributes) 26 What is computer vision Typical tasks
  • 27. ● Motion analysis ○ Egomotion (camera) ○ Optical flow (pixels) 27 What is computer vision Typical tasks
  • 28. ● Scene understanding and reconstruction 28 What is computer vision Typical tasks
  • 29. ● Image restoration ● Colouring black & white photos 29 What is computer vision Typical tasks
  • 30. Solving this is useful for many applications 30
  • 31. 31 What is computer vision Typical applications ● Assistance systems for cars and people ● Surveillance ● Navigation (obstacle avoidance, road following, path planning) ● Photo interpretation ● Military (“smart” weapons) ● Manufacturing (inspection, identification) ● Robotics ● Autonomous vehicles (dangerous zones)
  • 32. 32 What is computer vision Typical applications ● Recognition and tracking ● Event detection ● Interaction (man-machine interfaces) ● Modeling (medical, manufacturing, training, education) ● Organizing (database index, sorting/clustering) ● Fingerprint and biometrics ● …
  • 34. 34 What is computer vision Why it is difficult ● Occlusion ● Deformation ● Scale ● Clutter ● Illumination ● Viewpoint ● Object pose ● Tons of classes and variants ● Often n:1 mapping ● Computationally expensive ● Full understanding of biological vision is missing
  • 36. ● Input: image(s) + labels ● Output: Semantic data, labels ● Digital image pixels usually have three channels [R,G,B] each [0...255] + Location[x,y] ● Digital images are just vectors 36 What is computer vision System overview
  • 37. 1. Image acquisition (camera, sensors) 2. Pre-processing (sampling, noise reduction, augmentation) 3. Feature extraction (lines, edges, regions, points) 4. Detection and segmentation 5. Post-processing (verification, estimation, recognition) 6. Decision making ● -> Ability of a machine to step back and interpret the big picture of those pixels 37 What is computer vision System overview
  • 39. 1950s ● 2D imaging for statistical pattern recognition ● Theory of optical flow based on a fixed point towards which one moves 39 What is computer vision History
  • 40. Image processing ● Histograms ● Filtering ● Stitching ● Thresholding ● ... 40 What is computer vision Traditional approaches
  • 41. 1960s ● Desire to extract 3D structure from 2D images for scene understanding ● Began at pioneering AI universities to mimic human visual system as stepping stone for intelligent robots ● Summer vision project at MIT: attach camera to computer and having it “describe what it saw” 41 What is computer vision History
  • 42. ● Given to 10 undergraduate students ● … an attempt to use our summer workers effectively … ● … construction of a significant part of a visual system … ● … task can be segmented into sub-problems … ● … participate in the construction of a system complex enough to be a real landmark in the development of “pattern recognition” … 42 What is computer vision History: summer vision project @MIT 1966
  • 43. ● Goal: analyse scenes and identify objects ● Structure of system: ○ Region proposal ○ Property lists for regions ○ Boundary construction ○ Match with properties ○ Segment ● Basic foreground/background segmentation with simple objects (cubes, cylinders, ….) 43 What is computer vision History: summer vision project @MIT 1966
  • 44. ● Unlike general intelligence, computer vision seemed tractable ● Amusing anecdote, but it did never aimed to “solve” computer vision ● Computer vision today differs from what it was thought to be in 1966 44 What is computer vision History: summer vision project @MIT 1966
  • 45. 1970s ● Formed many algorithms that exist today ● Edges, lines and objects as interconnected structures 45 What is computer vision History
  • 46. 46 What is computer vision Traditional approaches Edge detection based on ● Brightness ● Gradients ● Geometry ● Illumination
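Gradient-based edge detection can be sketched with the Sobel operator, one common choice that the slide does not specifically name. The image is assumed to be a list of rows of grey values, and the threshold `t` is a hypothetical tuning parameter.

```python
import math

# Sobel kernels: horizontal and vertical intensity gradients
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def apply_kernel(img, kernel, r, c):
    """Weighted sum of the 3x3 neighbourhood centred on (r, c) (no kernel flip)."""
    return sum(kernel[i][j] * img[r + i - 1][c + j - 1]
               for i in range(3) for j in range(3))


def sobel_magnitude(img):
    """Gradient magnitude at every interior pixel; border pixels stay 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = apply_kernel(img, SOBEL_X, r, c)
            gy = apply_kernel(img, SOBEL_Y, r, c)
            out[r][c] = math.hypot(gx, gy)
    return out


def edges(img, t):
    """Mark pixels whose gradient magnitude exceeds the threshold t."""
    return [[1 if m > t else 0 for m in row] for row in sobel_magnitude(img)]
```

On an image whose left half is dark and right half bright, only the two columns on either side of the boundary are marked as edges.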
  • 47. 47 What is computer vision Traditional approaches - part based detector ● Objects composed of features of parts and their spatial relationship ● Challenge: how to define and combine
  • 48. 1980s ● More rigorous mathematical analysis and quantitative aspects ● Optical character recognition ● Sliding window approaches ● Usage of artificial neural networks 48 What is computer vision History
  • 49. 49 What is computer vision Traditional approaches - HOG detection (histogram of oriented gradients) ● Concept dates from the 1980s but was only widely used from 2005 ● Create HOG descriptors (object generalizations) ● One feature vector per object ● Train an SVM classifier on the descriptors ● Sliding window at multiple scales
  • 50. 50 What is computer vision Traditional approaches - HOG detection (histogram of oriented gradients) ● Computation of HOG descriptors: 1. Compute gradients 2. Compute histograms on cells 3. Normalize histograms 4. Concatenate histograms ● Requires a lot of engineering ● Must build ensembles of feature descriptors
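The four descriptor steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not a reference implementation: 4x4-pixel cells, 9 unsigned orientation bins over 0-180 degrees, L2 normalisation over overlapping 2x2-cell blocks, and no bilinear vote interpolation (which full implementations typically add).

```python
import math


def gradients(img):
    """Step 1: central-difference gradients; border pixels get zero gradient."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = img[r][c + 1] - img[r][c - 1]
            gy = img[r + 1][c] - img[r - 1][c]
            mag[r][c] = math.hypot(gx, gy)
            ang[r][c] = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned
    return mag, ang


def cell_histograms(mag, ang, cell=4, bins=9):
    """Step 2: magnitude-weighted orientation histogram for every cell."""
    width = 180.0 / bins
    rows, cols = len(mag) // cell, len(mag[0]) // cell
    hists = [[[0.0] * bins for _ in range(cols)] for _ in range(rows)]
    for r in range(rows * cell):
        for c in range(cols * cell):
            b = int(ang[r][c] // width) % bins
            hists[r // cell][c // cell][b] += mag[r][c]
    return hists


def hog(img, cell=4, bins=9, eps=1e-6):
    """Steps 3 and 4: normalise 2x2-cell blocks and concatenate them."""
    hists = cell_histograms(*gradients(img), cell, bins)
    rows, cols = len(hists), len(hists[0])
    feature = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            block = (hists[r][c] + hists[r][c + 1]
                     + hists[r + 1][c] + hists[r + 1][c + 1])
            norm = math.sqrt(sum(v * v for v in block) + eps ** 2)
            feature.extend(v / norm for v in block)
    return feature
```

A 16x16 image gives 4x4 cells, 3x3 blocks and therefore a 3 * 3 * 4 * 9 = 324-dimensional feature vector, which is what the SVM would then be trained on.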
  • 51. 1990s ● Significant interaction with computer graphics (rendering, morphing, stitching) ● Approaches using statistical learning ● Eigenface (Ghostfaces) through principal component analysis (PCA) 51 What is computer vision History
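Eigenfaces are the principal components of a set of flattened face images. As a minimal sketch, the code below finds only the leading component, using power iteration (an assumed shortcut; practical systems usually compute several components with an SVD). Each image is assumed to be a flat list of pixel values of equal length.

```python
import math
import random


def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def first_principal_component(rows, iters=200, seed=0):
    """Leading eigenvector of the covariance matrix of `rows` via power iteration."""
    mu = mean_vector(rows)
    centred = [[x - m for x, m in zip(r, mu)] for r in rows]
    rng = random.Random(seed)
    v = [rng.random() for _ in mu]
    for _ in range(iters):
        # C v = X^T (X v) / n, computed without building the covariance matrix C
        proj = [sum(a * b for a, b in zip(r, v)) for r in centred]
        w = [sum(p * r[j] for p, r in zip(proj, centred)) / len(rows)
             for j in range(len(mu))]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break                  # data has no variance along v
        v = [x / norm for x in w]
    return mu, v


def project(image, mu, v):
    """Coordinate of an image along the 'eigenface' v."""
    return sum((x - m) * e for x, m, e in zip(image, mu, v))
```

A face is then described by a handful of such projection coordinates instead of all its pixels.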
  • 52. 52 What is computer vision Traditional approaches - deformable parts model (DPM) ● Objects are composed of their parts ● First match the whole object, then refine on the parts ● HOG + part-based model + modern features ● Slow but good at difficult objects ● Involves many heuristics
  • 53. 53 What is computer vision Features ● Feature points ○ Small area of pixels with certain properties ● Feature detection ○ Use features for identification ○ Activate if “object” present ● Examples: ○ Lines, edges, colours, blobs, … ○ Animals, faces, cars, ...
  • 54. 54 What is computer vision Traditional approaches - classical recognition ● Init: extract features for objects in different scales, colours, orientations, rotations, occlusion levels ● Inference: extract features from query image and find closest match in database or train a classifier ● Computationally expensive (hundreds of features in image, millions in database) and complex due to errors and mismatches
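The inference step ("find closest match in database") can be sketched as brute-force nearest-neighbour matching of feature descriptors. To reject the mismatches mentioned above, this sketch adds Lowe's ratio test, a swapped-in technique not named on the slide; the 0.8 ratio is an assumed default.

```python
import math


def distance(a, b):
    """Euclidean distance between two descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match(query, database, ratio=0.8):
    """Match each query descriptor to its nearest database descriptor.

    A match is kept only if the nearest neighbour is clearly closer than the
    second nearest, which discards ambiguous matches.
    Returns a list of (query_index, database_index) pairs.
    """
    matches = []
    for qi, q in enumerate(query):
        dists = sorted((distance(q, d), di) for di, d in enumerate(database))
        if not dists:
            continue
        if len(dists) == 1 or dists[0][0] < ratio * dists[1][0]:
            matches.append((qi, dists[0][1]))
    return matches
```

The double loop costs one distance per query/database pair, which is why hundreds of query features against millions of stored ones becomes so expensive.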
  • 55. 55 What is computer vision History Before the new era ● Bags of features ● Handcrafted ensembles (figure: Input -> Feature extraction (Feat. 1, Feat. 2, ..., Feat. n) -> Final decision)
  • 56. The new era of computer vision 56
  • 57. ● Elementary building block ● Inspired by biological neurons ● Mathematical function y=f(wx+b) ● Learnable weights 57 Artificial neural networks Fundamentals - artificial neuron
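The neuron's function y = f(wx + b) translates directly into code. This sketch assumes a sigmoid as the default activation f; any other activation can be passed in.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def neuron(x, w, b, f=sigmoid):
    """y = f(w·x + b): weighted sum of the inputs plus bias, then activation."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return f(z)
```

For example, `neuron([1, 2], [0.5, -0.25], 0.0)` gives a weighted sum of 0 and therefore an output of 0.5. Training means adjusting `w` and `b`.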
  • 58. ● Collection of neurons organized in layers ● Universal approximators ● Fully-connected network here 58 Artificial neural networks Fundamentals - artificial neural networks
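A fully-connected network is a chain of layers in which every neuron sees every output of the previous layer. The hand-picked weights below are a hypothetical example: they show that a single hidden layer with ReLU activations can represent XOR, which no single linear neuron can.

```python
def relu(z):
    return max(0.0, z)


def identity(z):
    return z


def dense(x, weights, biases, f):
    """One fully-connected layer: every output neuron sees every input."""
    return [f(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def forward(x, layers):
    """layers: list of (weights, biases, activation) tuples, applied in order."""
    for weights, biases, f in layers:
        x = dense(x, weights, biases, f)
    return x


# Hypothetical weights: h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1), y = h1 - 2*h2
XOR_NET = [
    ([[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0], relu),
    ([[1.0, -2.0]], [0.0], identity),
]
```

Evaluating `forward` on (0, 0), (1, 0), (0, 1) and (1, 1) yields 0, 1, 1 and 0.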
  • 59. 59 Artificial neural networks Fundamentals - training ● Basically an optimization problem ● Find minimum of a loss function by an iterative process (training) ● Designing the loss function is sometimes tricky
  • 60. 60 Artificial neural networks Fundamentals - training Simple optimizer algorithm: 1. Forward pass with a batch of data 2. Calculate the error between actual and wanted output 3. Nudge the weights in proportion to the error, in the direction that reduces it (so the same data would now give a smaller error) 4. Repeat until convergence
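The four-step loop above can be sketched as plain gradient descent on a single linear neuron with a mean-squared-error loss. The learning rate, the epoch count (standing in for "until convergence") and the toy data are assumptions for illustration, and the whole dataset is used as one batch.

```python
def train(data, lr=0.05, epochs=2000):
    """Fit y ≈ w*x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        # 1. forward pass on the batch
        preds = [w * x + b for x, _ in data]
        # 2. error between actual and wanted output
        errs = [p - y for p, (_, y) in zip(preds, data)]
        # gradients of L = mean(err^2) with respect to w and b
        grad_w = 2.0 / n * sum(e * x for e, (x, _) in zip(errs, data))
        grad_b = 2.0 / n * sum(errs)
        # 3. nudge the weights against the gradient
        w -= lr * grad_w
        b -= lr * grad_b
    # 4. a fixed number of epochs stands in for "until convergence"
    return w, b
```

On data generated by y = 2x + 1, the loop recovers w ≈ 2 and b ≈ 1. Deep networks use the same idea, with backpropagation computing the gradients.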
  • 61. 61 Artificial neural networks Fundamentals - CNN ● Local neighborhood contributes to activation ● Exploit spatial information ● Hierarchical feature extractors ● Fewer parameters (figure: input, filters, receptive field, activation)
  • 62. 62 Artificial neural networks Fundamentals - CNN ● Filter of size 3x3 applied to an input of 7x7
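Sliding a 3x3 filter over a 7x7 input with stride 1 and no padding gives a 5x5 output, because the filter fits in 7 - 3 + 1 = 5 positions per axis. A minimal sketch, assuming lists of rows as images:

```python
def conv2d_valid(img, kernel):
    """Slide `kernel` over `img` with stride 1 and no padding.

    Like most deep-learning libraries, this computes cross-correlation
    (the kernel is not flipped).
    """
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(kernel[i][j] * img[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(ow)]
            for r in range(oh)]
```

Each output value depends only on its 3x3 receptive field, and the same nine weights are reused at every position.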
  • 63. 63 Artificial neural networks Fundamentals - pooling ● Max-pooling ● Dimension reduction/adaption ● Existence is more important than location
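Max-pooling keeps only the strongest activation in each window, so it matters whether a feature fired in a region but not exactly where. A sketch assuming the common 2x2 window with stride 2:

```python
def max_pool(fmap, size=2, stride=2):
    """Keep the strongest activation in each size x size window."""
    oh = (len(fmap) - size) // stride + 1
    ow = (len(fmap[0]) - size) // stride + 1
    return [[max(fmap[r * stride + i][c * stride + j]
                 for i in range(size) for j in range(size))
             for c in range(ow)]
            for r in range(oh)]
```

A 4x4 feature map shrinks to 2x2, halving each spatial dimension.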
  • 64. 64 Artificial neural networks Fundamentals - pooling ● Zero-padding ● Controlling dimensions
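Zero-padding controls the output size. For input size n, kernel size k, padding p and stride s, the output size is floor((n + 2p - k) / s) + 1. A minimal sketch:

```python
def zero_pad(img, p):
    """Surround img with p rows/columns of zeros on every side."""
    w = len(img[0]) + 2 * p
    padded = [[0] * w for _ in range(p)]
    padded += [[0] * p + list(row) + [0] * p for row in img]
    padded += [[0] * w for _ in range(p)]
    return padded


def output_size(n, k, p, s):
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1
```

For the 7x7 input and 3x3 filter from the previous slide, padding by 1 keeps the output at 7x7 ("same" size), while no padding shrinks it to 5x5.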
  • 65. 65 Artificial neural networks Fundamentals - general network architecture (figure: Input image -> convolutional layers -> ... -> Final decision)
  • 66. 66 Artificial neural networks Fundamentals - hierarchical feature extractors (figure: Activations for: first layers: lines, edges, blobs, colours, ...; deeper layers: parts of abstract objects, then abstract objects)
  • 67. Modern history of object recognition 67
  • 68. ● Classification and detection ○ ~11.5k images with ~27k annotated objects ○ 20 classes ■ person, bird, cat, cow, dog, horse, sheep, aeroplane, bicycle, boat, bus, car, motorbike, train, bottle, chair, dining table, potted plant, sofa, tv/monitor 68 Benchmark Datasets - PASCAL VOC
  • 69. ● ImageNet ○ ~14M labeled images ○ ~20k object categories ● ILSVRC* challenges run on a subset of ImageNet, usually 1k categories including over 100 dog breeds 69 Benchmark Datasets - ImageNet *ImageNet Large Scale Visual Recognition Challenge
  • 70. ● ILSVRC 2012 winner by a large margin, cutting the top-5 error from ~26% to ~15% ● Proved the effectiveness of CNNs and kicked off a new era ● 8 layers, 650k neurons, 60M parameters 70 Artificial neural networks Roadmap - AlexNet
  • 71. ● ILSVRC 2013 winner with a best top-5 error of 11.6% ● AlexNet architecture, but with a smaller 7x7 kernel (instead of 11x11) in the first layer to keep more information in deeper layers 71 Artificial neural networks Roadmap - ZFNet
  • 72. ● ILSVRC 2013 localization winner ● Uses AlexNet on multi-scale input images with sliding window approach ● Accumulates bounding boxes for final detection (instead of non-max suppression) 72 Artificial neural networks Roadmap - OverFeat
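For contrast with OverFeat's box accumulation, here is a sketch of greedy non-max suppression, the step it replaces. It assumes boxes given as (x1, y1, x2, y2) corner coordinates and an IoU threshold of 0.5, a common but arbitrary choice.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, thresh=0.5):
    """Greedy NMS: keep boxes in score order, dropping any box that
    overlaps an already-kept box by more than `thresh`."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= thresh for k in keep):
            keep.append(i)
    return keep
```

NMS discards the weaker of two heavily overlapping detections. OverFeat instead merges overlapping boxes into a combined prediction.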
  • 73. ● ~2k region proposals generated by selective search ● CNN features computed for each proposal ● SVMs trained for classification ● Multi-stage pipeline 73 Artificial neural networks Roadmap - RCNN (region based CNN)
  • 74. ● ILSVRC 2014 classification runner-up, but famous due to its simplicity and effectiveness ● Replaces large-kernel convolutions by stacks of several small (3x3) convolutions 74 Artificial neural networks Roadmap - VGGNet
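The arithmetic behind stacking small kernels can be checked directly. Three stacked 3x3 convolutions (stride 1) see a 7x7 region, like one 7x7 convolution, but need 27C² weights instead of 49C² when both input and output have C channels. The channel count below is a hypothetical value, and biases are ignored.

```python
def conv_params(k, c_in, c_out):
    """Weights of one k x k convolution layer (biases ignored)."""
    return k * k * c_in * c_out


def stacked_receptive_field(k, n):
    """Receptive field of n stacked k x k convolutions with stride 1."""
    return n * (k - 1) + 1


C = 256  # hypothetical channel count, same for input and output
three_small = 3 * conv_params(3, C, C)   # 27 * C^2
one_large = conv_params(7, C, C)         # 49 * C^2
```

The stack also inserts two extra non-linearities between the convolutions, which is the other benefit usually given for this design.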
  • 75. ● ILSVRC 2014 winner ● Stacks “inception” modules ● 22 layers, ~5M parameters 75 Artificial neural networks Roadmap - InceptionNet (GoogLeNet)
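  • 76. ● Jointly learns classification and bounding-box regression (region proposals still come from selective search) ● Employs a region of interest (RoI) pooling layer, which reuses the convolutional features of the whole image for all proposals 76 Artificial neural networks Roadmap - Fast RCNN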
  • 77. ● Directly predicts all objects and classes in one shot ● Very fast: processes images at ~45 FPS on a Titan X GPU ● First real-time state-of-the-art detector ● Divides the input image into a grid of cells; each cell predicts bounding boxes, their confidence scores and class probabilities 77 Artificial neural networks Roadmap - YOLO (you only look once)
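In YOLO, the grid cell that contains an object's centre is responsible for predicting it. The sketch assumes the original paper's setting of an S = 7 grid with B = 2 boxes per cell and C = 20 classes (the PASCAL VOC classes), giving a 7x7x30 output tensor.

```python
def responsible_cell(cx, cy, img_w, img_h, S=7):
    """Grid cell (row, col) that contains the object centre (cx, cy)."""
    col = min(int(cx / img_w * S), S - 1)   # clamp centres on the right edge
    row = min(int(cy / img_h * S), S - 1)   # clamp centres on the bottom edge
    return row, col


def output_shape(S=7, B=2, C=20):
    """Each cell predicts B boxes (x, y, w, h, confidence) plus C class scores."""
    return (S, S, B * 5 + C)
```

Because every cell makes its predictions in a single forward pass, the whole image is processed in one shot.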
  • 78. ● ILSVRC 2015 winner with a 3.6% top-5 error rate (human performance is 5-10%) ● Employs residual blocks, which allow building very deep networks (hundreds of layers) ● Identity shortcut connections add each block's input to its output 78 Artificial neural networks Roadmap - ResNet (Microsoft)
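A residual block computes y = F(x) + x, where the identity shortcut carries x past the learned layers F. As a minimal sketch, F is a single fully-connected ReLU layer (an assumption; ResNet uses convolutions with batch normalisation). If F outputs zeros, the block is exactly the identity, so adding more blocks does not have to make the network worse.

```python
def relu_layer(weights, biases):
    """Build F(x) = relu(W x + b) as a closure."""
    def f(x):
        return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(weights, biases)]
    return f


def residual_block(x, f):
    """y = F(x) + x: the block only has to learn the residual F."""
    return [a + b for a, b in zip(f(x), x)]
```

Stacking many blocks whose F is zero still passes the input through unchanged.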
  • 79. ● Not a recognition network ● A region proposal network ● Popularized prior/anchor boxes (found through clustering) to predict offsets ● Much better strategy than starting the predictions with random coordinates ● Since then, heuristic approaches have gradually been replaced by learned ones 79 Artificial neural networks Roadmap - MultiBox
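Predicting offsets relative to a prior box can be sketched with the centre/log-size parameterization used by Faster R-CNN. This is an assumed encoding chosen for illustration; MultiBox's own encoding differs in detail. Boxes and anchors are (cx, cy, w, h) tuples.

```python
import math


def encode(box, anchor):
    """Offsets (tx, ty, tw, th) of a box relative to an anchor."""
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return ((x - xa) / wa, (y - ya) / ha, math.log(w / wa), math.log(h / ha))


def decode(offsets, anchor):
    """Inverse of encode: turn predicted offsets back into a box."""
    tx, ty, tw, th = offsets
    xa, ya, wa, ha = anchor
    return (xa + tx * wa, ya + ty * ha, wa * math.exp(tw), ha * math.exp(th))
```

A network that predicts all-zero offsets reproduces the anchor itself, so training starts from a sensible box rather than from random coordinates.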
  • 80. ● Fast RCNN with heuristic region proposal replaced by region proposal network (RPN) inspired by MultiBox ● RPN shares full-image convolutional features with the detection network (cost-free region proposal) ● RPN uses “attention” mechanism to tell where to look ● ~5 FPS on a Tesla K40 GPU ● End-to-end training 80 Artificial neural networks Roadmap - Faster RCNN
  • 81. ● SSD leverages the Faster RCNN’s RPN to directly classify objects inside each prior box (similar to YOLO) ● Predicts category scores and box offsets for a fixed set of default bounding boxes ● Fixes the predefined grid cells used in YOLO by using multiple aspect ratios ● Produces predictions of different scales ● ~59 FPS 81 Artificial neural networks Roadmap - SSD (single shot multibox detector)
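SSD's default boxes can be sketched as follows. For each cell of a feature map, one box is created per aspect ratio a with width s√a and height s/√a, plus one extra square box at scale √(s·s'), where s' is the next feature map's scale. The ratio set and the scales 0.2 and 0.34 are illustrative assumptions, and coordinates are relative to the image size.

```python
import math


def default_boxes(fmap_size, scale, next_scale, ratios=(1.0, 2.0, 0.5)):
    """Default boxes (cx, cy, w, h) in [0, 1] image coordinates for one feature map."""
    boxes = []
    for i in range(fmap_size):
        for j in range(fmap_size):
            cx, cy = (j + 0.5) / fmap_size, (i + 0.5) / fmap_size
            for a in ratios:
                # same area (scale^2) for every aspect ratio
                boxes.append((cx, cy, scale * math.sqrt(a), scale / math.sqrt(a)))
            # extra square box at an intermediate scale
            s = math.sqrt(scale * next_scale)
            boxes.append((cx, cy, s, s))
    return boxes
```

Repeating this on several feature maps with growing scales is what lets SSD predict objects of different sizes.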
  • 82. ● Open-source software library for machine learning applications ● TensorFlow Object Detection API ○ A collection of pretrained models ○ Construct, train and deploy object detection models 82 Artificial neural networks TensorFlow object detection API
  • 84. ● Humans are good at understanding the big picture ● Neural networks are good at details ● But they can be fooled... 84 Summary Human vs machine
  • 85. ● Needs a large amount of data ● Lots of engineering ● Trial and error ● Long training time ● Still lots of hyperparameter tuning ● No general network (generalization not answered) ● Little mathematical foundation 85 Summary Computer vision is still difficult
  • 86. ● Despite all of these advances, the dream of having a computer interpret an image at the same level as a human remains unrealized 86 Summary Computer vision is hard
  • 87. Thank You Stanislav Frolov Big Data Engineer sfrolov@inovex.de 0173 318 11 35 inovex GmbH Lindberghstraße 3 80939 München