Online Security Analytics On
Large Scale Video Surveillance
System
Yu Cao, Xiaoyan Guo
EMC Corporation
Security Analytics On Video
Surveillance
• Search across video systems in all store locations to
identify the customer of a fraudulent card transaction
and his/her other transactions
• Correlate register transactions and surveillance video
to identify employee fraud transactions where there
is no customer present
• If multiple stores in a region are robbed, identify any
faces that were in all of those stores in the weeks
leading up to the events
-- Retail Industry Customer Cases
Challenges In Big & Fast Data Era
• Cloud Integration
• M&O
• Fast Data Ingestion
• Multi-Latency Analytics
• Scalable Data Storage
EMC Video Analytics Data Lake (VADL)
Where Spark Resides @ VADL
[Figure: Spark within the VADL architecture. Labels: Offline Video Analytics & Model Training; Object Detection; Feature Extraction; Classification; Abnormal Detection; Face Recognition; Feature Indexing; Online Video Processing and Detection; Ad-Hoc Video Content Search; Video & Feature Storage; Analytics Model; Streaming; MLlib & GraphX; Core & SQL; Deep Learning on Deep Learning Framework]
Enable Spark to Process Raw
Video Data
• Spark has no built-in video processing capability
• Combine Spark programs (Scala, Java) with a video processing library (C++)
PipedRDD: Invoking External
Programs
• PipedRDD[T]: T => Linux command(T) => String, where T is a line of text
• Spark Pipe
– pipe takes an external command and runs it as a forked process. The elements of the RDD are written to the process's input stream, and the lines the process writes to its output stream form a new RDD
• Java API
– JavaRDD<String> pipe(String command)
– JavaRDD<String> pipe(java.util.List<String> command)
– JavaRDD<String> pipe(java.util.List<String> command, java.util.Map<String,String> env)
– Return an RDD created by piping elements to a forked external process
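The stdin/stdout contract that pipe relies on can be tried locally without a cluster. The following is a minimal sketch, not Spark code: pipeLines is a hypothetical helper that plays the role of pipe for a single partition, and the Unix tr command stands in for a real video-processing executable.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import scala.sys.process._

object PipeSketch {
  // Hypothetical helper (not a Spark API): mimics what pipe does for one partition.
  def pipeLines(lines: Seq[String], command: Seq[String]): List[String] = {
    // Each element becomes one line on the external program's stdin.
    val stdin = new ByteArrayInputStream(
      lines.map(_ + "\n").mkString.getBytes(StandardCharsets.UTF_8))
    // Run the command with that stdin and capture stdout; !! throws on a non-zero exit code.
    val stdout = (Process(command) #< stdin).!!
    // Each non-empty output line becomes one element of the result.
    stdout.split("\n").filter(_.nonEmpty).toList
  }

  def main(args: Array[String]): Unit =
    // `tr` is only a stand-in for a video-processing program.
    pipeLines(Seq("frame-001", "frame-002"), Seq("tr", "a-z", "A-Z")).foreach(println)
}
```

Because both input and output are plain text lines, any binary payload (such as a frame) would have to be passed by reference, for example as a file path, or in a text encoding.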
Video Processing Function
Implementation
• OpenCV
– Popular open source computer vision library
• Home-grown algorithms, e.g. CNN
• Video Processing Functions
– video file => video transcoding => list of frame images
– frame image => background extraction => background
image
– frame image => object detection => list of objects
– object => feature extraction => object features
– ……
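Read as function signatures, the stages above compose directly. The sketch below is illustrative only: the types Frame, VideoObject and Features are hypothetical, and the stage bodies are trivial stubs standing in for OpenCV or CNN-based implementations.

```scala
object VideoStages {
  // Hypothetical types; the slides do not define any data structures.
  final case class Frame(id: Int, pixels: Vector[Int])
  final case class VideoObject(frameId: Int, pixels: Vector[Int])
  final case class Features(frameId: Int, values: Vector[Double])

  // video file => video transcoding => list of frame images (stub: fixed-size chunks)
  def transcode(video: Vector[Int], frameSize: Int): List[Frame] =
    video.grouped(frameSize).zipWithIndex.map { case (px, i) => Frame(i, px) }.toList

  // frame image => object detection => list of objects (stub: non-zero pixels form one object)
  def detectObjects(frame: Frame): List[VideoObject] = {
    val foreground = frame.pixels.filter(_ != 0)
    if (foreground.isEmpty) Nil else List(VideoObject(frame.id, foreground))
  }

  // object => feature extraction => object features (stub: mean and max intensity)
  def extractFeatures(obj: VideoObject): Features =
    Features(obj.frameId,
      Vector(obj.pixels.sum.toDouble / obj.pixels.size, obj.pixels.max.toDouble))

  // Chain the stages in the order listed above.
  def process(video: Vector[Int], frameSize: Int): List[Features] =
    transcode(video, frameSize).flatMap(detectObjects).map(extractFeatures)
}
```

Keeping each stage a function from one value to a list or a single result is what lets the stages be swapped for external programs later.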
Pipeline Video Processing Tasks
• Steps
– Implement all required video processing sub-components
as external programs
– Pipeline these processing units by utilizing PipedRDD in
Spark jobs
• Pseudo-code (Chaining & Pipeline)
sc.fromCameraStream("rtsp://10.67.89.10/road?fps=15")
.pipe("video_transcoding")
.pipe("object_detection")
.pipe("feature_extraction")
.writeToHBase()
Online Video Processing During Ingestion
[Figure: Video Ingestion System]
[Figure: video streams => Object Detection => Feature Extraction => Classification/Recognition => Indexing/Storing. Other labels: Deep Learning Platform, Model, Real-time Detection, Real-time Dashboard]
Video Processing in Spark
Streaming
• Receive Video Stream
– val snapList = stream.queueStream(rddQueue)
– Read the video stream at a fixed time interval and put the data into msgQueue
– rddQueue += sc.makeRDD(msgQueue)
– Then process snapList
• Spark Job (Spark Streaming)
rdd.pipe("video_transcoding")
.pipe("object_detection")
.pipe("feature_extraction")
.writeToHBase()
[Figure labels: Feature & object store; Online Video Analytics App]
• Content-based video object search
– Search for objects similar to a given object instance
– E.g. search historical video records for a suspect, given the suspect's identification photo
• Semantic-based video object search
– Search for objects that match a given semantic description
– E.g. search by keywords: "Red Porsche", "a woman sitting and smoking", etc.
Video Content Search Workflow
• Video Pre-processing and Feature Extraction
• Scalable Storage
• Object-based Indexing
• Similar Object Search Engine
[Figure: video pre-processing (object detection and feature extraction): camera streams => Object Detection => Feature Extraction => Index Building => HBase Ingestion, Index Ingestion. Storage labels: HDFS (Multi-Tier Video Storage, Original Video Data); HBase (Feature & Index, Object Info). Search labels: Web Dashboard, Web Backend, SearchEngine, HBase Client; query image => Feature Extraction => features => similar object search => top-k objects; object information query; similar objects]
Video Object Similarity
• Similarity of Features == Similarity of Video Objects
– Color, Texture, Shape
– SIFT
• 160 features, each a vector of 128 dimensions
– Deep Learning Features …
[Figure: Local Binary Pattern (LBP)]
[Figure: Deep Learning Features in Different Layers]
Feature Dimensionality Reduction
• PCA
– MLlib's PCA (when the feature dimension D is small)
• Scalable PCA
– Distributed PPCA implemented atop Spark (when D is large)
• LSH (Locality-Sensitive Hashing)
– LSH hashes input items so that similar items map to the same "buckets" with high probability
[Figure: Resize => Grayscale => SIFT => PCA]
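The slide does not say which LSH family is used. As one possible instance, the sketch below uses random-hyperplane (sign) hashing, which suits cosine-style similarity; all names in it are hypothetical.

```scala
import scala.util.Random

object LshSketch {
  // Draw `bits` random hyperplanes of dimension `dim`; the seed makes runs repeatable.
  def hyperplanes(bits: Int, dim: Int, seed: Long): Vector[Vector[Double]] = {
    val rng = new Random(seed)
    Vector.fill(bits)(Vector.fill(dim)(rng.nextGaussian()))
  }

  // One bit per hyperplane: which side of the plane the vector falls on.
  def signature(v: Vector[Double], planes: Vector[Vector[Double]]): Vector[Boolean] =
    planes.map(p => p.zip(v).map { case (a, b) => a * b }.sum >= 0.0)

  // Group item ids by signature; items sharing a bucket are candidate neighbours.
  def buckets(items: Map[String, Vector[Double]],
              planes: Vector[Vector[Double]]): Map[Vector[Boolean], Set[String]] =
    items.groupBy { case (_, v) => signature(v, planes) }
      .map { case (sig, group) => sig -> group.keySet }
}
```

Vectors that point in the same direction always share a signature, while vectors separated by a larger angle are more likely to be split by at least one hyperplane. A query then only needs to be compared against the items in its own bucket.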
Spark Top-K Query Pipeline
[Figure: Spark top-k query pipeline]
• workers: each worker holds a partition of the stored features f(i)
• map: compare each f(i) with the query feature qf
• order: each worker produces an ordered array of scores, Array[s1, s2, …, sn]
• top-k: return the top-k most similar features
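The map, order and top-k stages can be imitated on local collections. This is a sketch under assumptions: the nested Seq stands in for the features held by each worker, and squared Euclidean distance is only a stand-in metric, since the slide does not name one.

```scala
object TopKSketch {
  type Scored = (Double, String) // (distance to the query, object id)

  // Squared Euclidean distance, used here only as a stand-in metric.
  def dist(a: Vector[Double], b: Vector[Double]): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // map + order on one worker: score every local feature against qf, keep the k best.
  def localTopK(part: Seq[(String, Vector[Double])], qf: Vector[Double], k: Int): Seq[Scored] =
    part.map { case (id, f) => (dist(f, qf), id) }.sortWith(_._1 < _._1).take(k)

  // top-k: merge the per-worker candidates and keep the k best overall.
  def topK(parts: Seq[Seq[(String, Vector[Double])]], qf: Vector[Double], k: Int): Seq[Scored] =
    parts.flatMap(localTopK(_, qf, k)).sortWith(_._1 < _._1).take(k)
}
```

Each worker forwards at most k candidates, so the final merge handles k times the number of workers rather than the whole feature set.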
Scala Code Example
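import org.apache.spark.SparkContext

// Returns the top (name, score) pairs for the query feature.
def topRankScore(sc: SparkContext, top: Int, queryInput: String, trainPath: String,
    useMethod: (Array[Double], Array[Double]) => Double): Array[(String, Double)] = {
  // Parse the query feature on the driver; a single line needs no RDD round trip.
  val query = queryInput.split(" ").map(_.toDouble)
  // Assumed here: trainPath is the HDFS feature file, one "<name> <v1> ... <vn>" per line.
  val featureFile = sc.textFile(trainPath)
  featureFile.filter(_.length > 0).map { line =>
    val parts = line.split(" ", 2)
    (useMethod(query, parts(1).split(" ").map(_.toDouble)), parts(0))
  }
    // takeOrdered returns the smallest scores, so useMethod must return a
    // distance-like value (smaller = more similar).
    .takeOrdered(top).map(i => (i._2, i._1))
}
topRankScore(sc, topNumber.toInt, imageFeaturesStr, names, cosScore).foreach(println)
Parameters: (sc, top-k, queried feature, HDFS feature file, similarity computing method)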
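The call above passes cosScore, which the slides do not define. Since takeOrdered returns the smallest values first, one reading is that cosScore is a cosine distance. The sketch below follows that assumption. It also includes a parser for the line layout the code appears to expect, a name followed by space-separated values.

```scala
object CosScoreSketch {
  // Cosine distance = 1 - cosine similarity; 0.0 means the vectors point the same way.
  def cosScore(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "feature vectors must have the same dimension")
    val dot = a.zip(b).map { case (x, y) => x * y }.sum
    val norm = math.sqrt(a.map(x => x * x).sum) * math.sqrt(b.map(x => x * x).sum)
    // A zero vector has no direction; treat it as maximally dissimilar.
    if (norm == 0.0) 1.0 else 1.0 - dot / norm
  }

  // Parse one line of the assumed feature-file layout: "<name> <v1> <v2> ... <vn>".
  def parseLine(line: String): (String, Array[Double]) = {
    val parts = line.split(" ", 2)
    (parts(0), parts(1).split(" ").map(_.toDouble))
  }
}
```

If a similarity (larger = more similar) is preferred instead, the ordering has to be reversed, for example by negating the score before it is ordered.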
Deep Learning @ VADL
• Feature extraction for detected video objects
– faces
– humans
• Classification of video objects
– With trained model
• Suspect detection and recognition
[Figure: Training neural networks with many layers]
Deep Learning With Spark
• DeepLearning4J (DL4J)
– Open source
– Variety of NNs & Flexibility
– Cross-platform & Scale
– Java Implementation
• Parallelization (YARN, Spark)
• GPU support
– Also supports multi-GPU per host node
• DeepDist
– Open source
– Deep belief networks
– Asynchronous stochastic gradient descent for data stored on HDFS / Spark
– Python Implementation
THANK YOU.
yu.cao@emc.com
