Image Analysis & Retrieval
CS/EE 5590 Special Topics (Class Ids: 44873,44874)
Fall 2016, M/W 4-5:15pm @ Bloch 0012
Lec 08
Feature Aggregation II:
Fisher Vector, Super Vector and AKULA
Zhu Li
Dept of CSEE, UMKC
Office: FH560E, Email: lizhu@umkc.edu, Ph: x2346.
http://l.web.umkc.edu/lizhu
Outline
 ReCap of Lecture 07
 Image Retrieval System
 BoW
 VLAD
 Dense SIFT
 Fisher Vector Aggregation
 AKULA
 Summary
Precision, Recall, F-measure
Precision = TP/(TP + FP)
Recall = TPR = TP/(TP + FN)
FPR = FP/(FP + TN)
F-measure = 2*(precision*recall)/(precision + recall)
Precision: the probability that a retrieved document is relevant.
Recall: the probability that a relevant document is retrieved in a search.
Why Aggregation ?
 Curse of Dimensionality
Decision Boundary / Indexing
Bag-of-Words: Histogram Coding
Codebook:
 Feature space: R^d; k-means gives k centroids {μ_1, μ_2, …, μ_k}
 BoW hard encoding (a sketch follows below):
 For n feature points {x_1, x_2, …, x_n}, the assignment matrix is k×n, with only one non-zero entry per column
 Aggregated dimension: k
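A minimal MATLAB sketch of the hard encoding, under assumed variables: X is a d x n matrix of local descriptors and mu a d x k matrix of k-means centroids (both hypothetical names, illustration only):

% BoW hard encoding sketch; X: d x n descriptors, mu: d x k centroids (assumed)
k = size(mu, 2); n = size(X, 2);
d2 = bsxfun(@plus, sum(mu.^2, 1)', sum(X.^2, 1)) - 2*(mu'*X); % k x n squared distances
[~, nn] = min(d2, [], 1);                % nearest-centroid index per descriptor
h = accumarray(nn(:), 1, [k 1]) / n;     % normalized k-bin histogram = BoW vector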
Kernel Code Book Soft Encoding
 Kernel affinity: $K(x_j, \mu_k) = e^{-\beta \| x_j - \mu_k \|^2}$
 Assignment matrix: $A_{j,k} = K(x_j, \mu_k) / \sum_{k'} K(x_j, \mu_{k'})$
 Encoding, k-dimensional: $X(k) = \frac{1}{n} \sum_j A_{j,k}$ (a sketch follows below)
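Continuing the hypothetical variables from the BoW sketch above (d2 in particular), a minimal soft-encoding sketch; the bandwidth beta is an assumed value:

beta = 0.5;                                % kernel bandwidth (assumed)
Kaff = exp(-beta * d2);                    % k x n kernel affinities
A = bsxfun(@rdivide, Kaff, sum(Kaff, 1));  % column-normalized assignment matrix
x_enc = mean(A, 2);                        % k-dimensional soft encoding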
VLAD - Vector of Locally Aggregated Descriptors
 Aggregate feature differences from the codebook
 Hard assignment: find the NN of each feature x_j among the centroids {μ_k}
 Compute the aggregated differences
 L2 normalize
 Final feature: k × d
 3
x
v1 v2
v3 v4
v5
1
 4
 2
 5
① assign descriptors
② compute x-  i
③ vi=sum x-  i for cell i
𝑣 𝑘 =
∀𝑗,𝑠.𝑡.𝑁𝑁 𝑥 𝑗 =𝜇 𝑘
𝑥𝑗 − 𝜇 𝑘
𝑣 𝑘 = 𝑣 𝑘/ 𝑣 𝑘 2
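A minimal MATLAB sketch of this aggregation, reusing the hypothetical X, mu, and nearest-neighbor indices nn from the BoW sketch above:

% VLAD sketch; X: d x n, mu: d x k, nn: 1 x n assignments (assumed)
[d, k] = deal(size(X, 1), size(mu, 2));
V = zeros(d, k);
for j = 1:size(X, 2)
    V(:, nn(j)) = V(:, nn(j)) + (X(:, j) - mu(:, nn(j)));  % accumulate residuals
end
V = bsxfun(@rdivide, V, sqrt(sum(V.^2, 1)) + eps);  % per-cell L2 normalization
vlad = V(:);                                        % final k*d feature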
VLAD on SIFT
 Example of aggregating SIFT with VLAD
 K=16 codebook entries
 Each cell shows one SIFT visualization: the codebook centroid in blue and the encoded VLAD difference in red
 Top row: left image; bottom row: right image
Outline
 ReCap of Lecture 07
 Image Retrieval System
 BoW
 VLAD
 Dense SIFT
 Fisher Vector Aggregation
 AKULA
 Summary
One more trick
 Recall that SIFT is a powerful descriptor
 VL_FEAT: vl_dsift
 A dense description of the image: compute a SIFT descriptor (no spatial-scale-space extrema detection) at predetermined grid points
 Supplements HoG as an alternative texture descriptor
VL_FEAT: vl_dsift
 Compute dense SIFT as a texture descriptor for the
image
 [f, dsift] = vl_dsift(single(rgb2gray(im)), 'step', 2);
 There's also a FAST option:
 [f, dsift] = vl_dsift(single(rgb2gray(im)), 'fast', 'step', 2);
 A huge amount of SIFT data will be generated
Fisher Vector
 Fisher Vector and variations:
 Winning in image classification
 Winning in MPEG object re-identification:
o SCFV (Scalable Coded Fisher Vector) in CDVS
Codebook: Gaussian Mixture Model (GMM)
 GMM is a generative model for the data
 Assume each sample is generated from a mixture with parameters {w_k, μ_k, σ_k}:

$x \sim \sum_{k=1}^{K} w_k \, N(\mu_k, \sigma_k)$

$N(x; \mu_k, \Sigma_k) = \frac{1}{(2\pi)^{d/2} |\Sigma_k|^{1/2}} \, e^{-\frac{1}{2} (x - \mu_k)' \Sigma_k^{-1} (x - \mu_k)}$
A bit of Theory: Fisher Kernel
Encode the deviation from the generative model
 Observed feature set {x_1, x_2, …, x_n} in R^d, e.g., d = 128 for SIFT
 How do these observations deviate from the given GMM model with parameters λ = {w_k, μ_k, σ_k}?
o i.e., how should a parameter, e.g., a mean, move to best fit the observations?
[Figure: an observation x_1 and GMM means μ_1 … μ_4]
A bit of Theory: Fisher Kernel
Score function w.r.t. the likelihood function $u_\lambda(X)$
 $G_\lambda^X = \nabla_\lambda \log u_\lambda(X)$: derivative of the log likelihood
 The dimension of the score function is m, the number of generative model parameters (m = 3K for a K-component GMM)
 Given the observed data X, the score function indicates how the likelihood parameters (e.g., the means) should move to better fit the data.
Distance/deviation of two observations X, Y w.r.t. the generative model
 Fisher information matrix (roughly the covariance in the Mahalanobis distance):

$F_\lambda = E_X \left[ G_\lambda^X \, {G_\lambda^X}' \right]$

 Fisher kernel distance, normalized by the Fisher information matrix:

$K_{FK}(X, Y) = {G_\lambda^X}' \, F_\lambda^{-1} \, G_\lambda^Y$
Fisher Vector
 K_FK(X, Y) is a measure of similarity w.r.t. the generative model
 As in the Mahalanobis distance case, we can decompose this kernel with the Cholesky factorization $F_\lambda^{-1} = L_\lambda' L_\lambda$:

$K_{FK}(X, Y) = {G_\lambda^X}' \, F_\lambda^{-1} \, G_\lambda^Y = (L_\lambda G_\lambda^X)' \, (L_\lambda G_\lambda^Y)$

 That gives us a kernel feature mapping of X to the Fisher Vector, $\mathcal{G}_\lambda^X = L_\lambda G_\lambda^X$
 For observed image features {x_t}, it can be computed from the gradient of the log likelihood, as worked out on the next slides
GMM Fisher Vector
Encode the deviation from the generative model
 Observed feature set {x_1, x_2, …, x_n} in R^d, e.g., d = 128 for SIFT
 How do these observations deviate from the given GMM model with parameters θ = {a_k, μ_k, σ_k}?
 GMM log-likelihood gradient
 Let $w_k = \frac{e^{a_k}}{\sum_j e^{a_j}}$; then we get gradient components w.r.t. the weight, mean, and variance (expressions below)
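The gradient formulas on this slide are images lost in extraction; the standard expressions for a diagonal-covariance GMM (e.g., Sánchez et al. 2013), with $q_{tk}$ the posterior of $x_t$ under component $k$, are:

$\frac{\partial \mathcal{L}}{\partial a_k} = \sum_t (q_{tk} - w_k)$ (weight)

$\frac{\partial \mathcal{L}}{\partial \mu_k} = \sum_t q_{tk} \, \frac{x_t - \mu_k}{\sigma_k^2}$ (mean)

$\frac{\partial \mathcal{L}}{\partial \sigma_k} = \sum_t q_{tk} \left[ \frac{(x_t - \mu_k)^2}{\sigma_k^3} - \frac{1}{\sigma_k} \right]$ (variance)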
GMM Fisher Vector VL_FEAT implementation
 GMM codebook
 For a K-component GMM, we only allow 3K parameters {π_k, μ_k, σ_k | k = 1..K}, i.e., iid Gaussian components with

$\Sigma_k = \mathrm{diag}(\sigma_k, \sigma_k, \ldots, \sigma_k) = \sigma_k I$

 Posterior probability of feature point x_i under GMM component k (formula below)
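The posterior on this slide is an image in the scrape; it is the standard GMM responsibility:

$q_{ik} = \frac{\pi_k \, N(x_i; \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, N(x_i; \mu_j, \Sigma_j)}$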
GMM Fisher Vector VL_FEAT implementation
 FV encoding
 Gradient on the mean, for GMM component k and dimension j = 1..D (expressions below)
 In the end, we have a 2K × D aggregation of the deviations w.r.t. the means and variances:

$FV = [u_1, u_2, \ldots, u_K, v_1, v_2, \ldots, v_K]$
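The per-dimension encodings are also images in the scrape; a sketch of the standard VLFeat-style normalization, with $q_{ik}$ the posterior from the previous slide and N feature points:

$u_{jk} = \frac{1}{N \sqrt{\pi_k}} \sum_{i=1}^{N} q_{ik} \, \frac{x_{ji} - \mu_{jk}}{\sigma_{jk}}, \qquad v_{jk} = \frac{1}{N \sqrt{2 \pi_k}} \sum_{i=1}^{N} q_{ik} \left[ \left( \frac{x_{ji} - \mu_{jk}}{\sigma_{jk}} \right)^2 - 1 \right]$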
VL_FEAT GMM/FV API
 Compute GMM model with VL_FEAT
 Prepare data:
numPoints = 1000 ; dimension = 2 ;
data = rand(dimension, numPoints) ;
 Call vl_gmm:
numClusters = 30 ;
[means, covariances, priors] = vl_gmm(data, numClusters) ;
 Visualize:
figure ;
hold on ;
plot(data(1,:), data(2,:), 'r.') ;
for i = 1:numClusters
  vl_plotframe([means(:,i)' covariances(1,i) 0 covariances(2,i)]) ;
end
VL_FEAT API
 FV encoding:
encoding = vl_fisher(dataToBeEncoded, means, covariances, priors) ;
 Bonus points:
 Encode HoG features with Fisher Vector?
 Randomly collect 2~3 images from each class
 Stack all HoG features together into an n × 36 data matrix
 Compute its GMM
 Use this GMM to encode all image HoG features (rather than averaging them); a sketch follows below
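A minimal sketch of that bonus pipeline, assuming VLFeat's vl_hog with the 36-dimensional DalalTriggs cell variant; trainIms and im are hypothetical variables, and the settings are illustrative only:

cellSize = 8 ; numClusters = 32 ;            % assumed settings
feats = [] ;
for t = 1:numel(trainIms)                    % a few images per class (hypothetical list)
  hog = vl_hog(single(rgb2gray(trainIms{t})), cellSize, 'variant', 'dalaltriggs') ;
  feats = [feats ; reshape(hog, [], 36)] ;   % stack HoG cells into an n x 36 matrix
end
[means, covariances, priors] = vl_gmm(feats', numClusters) ;   % GMM over 36-dim cells
% Encode any image's HoG cells against this GMM:
hog = vl_hog(single(rgb2gray(im)), cellSize, 'variant', 'dalaltriggs') ;
fv = vl_fisher(reshape(hog, [], 36)', means, covariances, priors) ;  % 2*36*numClusters dims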
Super Vector Aggregation – Speaker ID
 Fisher Vector: Aggregates Features against a GMM
 Super Vector: Aggregates GMM against GMM
 Ref:
o William M. Campbell, Douglas E. Sturim, Douglas A. Reynolds: Support vector machines using GMM supervectors for speaker verification. IEEE Signal Processing Letters 13(5): 308-311 (2006)
[Figure: speaker-verification illustration, "Yes, We Can!"]
Super Vector from MFCC
 Motivated by Speaker ID work
 Speech is a continuous evolution of the vocal tract
 Need to extract a sequence of spectra or spectral coefficients
 Use a sliding window: 25 ms window, 10 ms shift
[Figure: MFCC pipeline: |X(ω)| → Log → DCT → MFCC]
GMM Model from MFCC
 GMM on MFCC features
• The acoustic vectors (MFCC) of speaker s are modeled by a probability density function parameterized by $\lambda^{(s)} = \{ w_j^{(s)}, \mu_j^{(s)}, \Sigma_j^{(s)} \}_{j=1}^{M}$
• Gaussian mixture model (GMM) for speaker s:

$p(\mathbf{x} \mid \lambda^{(s)}) = \sum_{j=1}^{M} w_j^{(s)} \, p(\mathbf{x} \mid \mu_j^{(s)}, \Sigma_j^{(s)})$
Universal Background Model
 UBM GMM model:
• The acoustic vectors of a general population are modeled by another GMM, called the universal background model (UBM):

$p(\mathbf{x} \mid \lambda^{\mathrm{ubm}}) = \sum_{j=1}^{M} w_j^{\mathrm{ubm}} \, p(\mathbf{x} \mid \mu_j^{\mathrm{ubm}}, \Sigma_j^{\mathrm{ubm}})$

• Parameters of the UBM: $\lambda^{\mathrm{ubm}} = \{ w_j^{\mathrm{ubm}}, \mu_j^{\mathrm{ubm}}, \Sigma_j^{\mathrm{ubm}} \}_{j=1}^{M}$
MAP Adaption
 Given the UBM GMM, how do new observations deviate from it?
 The adapted mean is given by the relevance-MAP update (the slide's formula is an image; a standard form is sketched below)
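A sketch of the standard relevance-MAP mean update from Reynolds-style UBM systems, which is what this slide's missing image conveys; $r$ is the relevance factor and $q_{tk}$ the UBM posterior of frame $x_t$:

$\hat{\mu}_k = \alpha_k \, E_k(x) + (1 - \alpha_k) \, \mu_k^{\mathrm{ubm}}, \qquad \alpha_k = \frac{n_k}{n_k + r}, \quad n_k = \sum_t q_{tk}, \quad E_k(x) = \frac{1}{n_k} \sum_t q_{tk} \, x_t$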
Supervector Distance
 Assume we have a UBM GMM model $\lambda_{UBM} = \{P_k, \mu_k, \Sigma_k\}$, with identical priors and covariances across models
Then for two utterance samples a and b, with GMM models
 $\lambda_a = \{P_k, \mu_k^a, \Sigma_k\}$
 $\lambda_b = \{P_k, \mu_k^b, \Sigma_k\}$
the SV distance is

$K(\lambda_a, \lambda_b) = \sum_k \left( \sqrt{P_k} \, \Sigma_k^{-\frac{1}{2}} \, \mu_k^a \right)^T \left( \sqrt{P_k} \, \Sigma_k^{-\frac{1}{2}} \, \mu_k^b \right)$

It means the means of the two models are normalized by the UBM-covariance-induced Mahalanobis distance metric
This is also a linear kernel function scaled by the UBM covariances (a sketch follows below)
Supervector Performance in NIST Speaker ID
 System 5: Gaussian SV
 DCF (Detection Cost Function)
m31491
AKULA – Adaptive KLUster Aggregation
2013/10/25
Abhishek Nagar, Zhu Li, Gaurav Srivastava and Kyungmo Park
Outline
Motivation
Adaptive Aggregation
Results with TM7
Summary
Motivation
Better aggregation
 Fisher Vector and VLAD type aggregation depend on a global model
 AKULA removes this dependence and directly codes the cluster centroids and SIFT counts
 SCFV/RVD both have situations where clusters are turned off due to no assignment; this is avoided in AKULA
[Pipeline: SIFT detection & selection → k-means → AKULA description]
Motivation
Better subspace choice
 Both SCFV and RVD do a fixed normalization and PCA projection based on heuristics
 What is the best possible subspace in which to do the aggregation?
 Use a boosting scheme to keep adding subspaces and aggregations in an iterative fashion, and tune TPR-FPR to the desired operating points on FPR
CE2: AKULA – Adaptive KLUster Aggregation
AKULA descriptor: cluster centroids + SIFT counts

$A_1 = \{ y^{c_1}_1, y^{c_1}_2, \ldots, y^{c_1}_k ;\; p^{c_1}_1, p^{c_1}_2, \ldots, p^{c_1}_k \}$
$A_2 = \{ y^{c_2}_1, y^{c_2}_2, \ldots, y^{c_2}_k ;\; p^{c_2}_1, p^{c_2}_2, \ldots, p^{c_2}_k \}$

Distance metric:
 Min centroid distance, weighted by SIFT count (a sketch follows below)

$d(A_1, A_2) = \frac{1}{k} \sum_{j=1}^{k} d^1_{min}(j) \, w^1_{min}(j) + \frac{1}{k} \sum_{i=1}^{k} d^2_{min}(i) \, w^2_{min}(i)$

where

$d^1_{min}(j) = \min_i d_{j,i}, \qquad d^2_{min}(i) = \min_j d_{j,i}$
$w^1_{min}(j) = w_{j,i^*}, \; i^* = \arg\min_i d_{j,i}; \qquad w^2_{min}(i) = w_{j^*,i}, \; j^* = \arg\min_j d_{j,i}$
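A minimal MATLAB sketch of this two-way weighted min distance, with Y1, Y2 as k x dim centroid matrices and p1, p2 as k x 1 SIFT counts (hypothetical names); the slide leaves the pairwise weight w_{j,i} implicit, so the product of matched counts below is an assumption:

% Pairwise squared distances between the two centroid sets
D = bsxfun(@plus, sum(Y1.^2, 2), sum(Y2.^2, 2)') - 2*(Y1*Y2');
[d1, i1] = min(D, [], 2);                    % nearest A2 centroid per A1 centroid
[d2, j2] = min(D, [], 1); d2 = d2(:); j2 = j2(:);
w1 = p1 .* p2(i1);                           % assumed weight: product of matched counts
w2 = p1(j2) .* p2;
dist = mean(d1 .* w1) + mean(d2 .* w2);      % d(A1, A2)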
AKULA implementation in TM7
Inner loop aggregation
 Dimension is fixed at 8
 Number of clusters nc = 8, 16, 32, to hit 64, 128, and 256 bytes
 Quantization: scale by ½ and quantize to int8; the SIFT count takes 8 bits; (nc+1)*dim bytes total per aggregation (a sketch follows below)
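A minimal sketch of that quantization step, with Y (nc x dim centroids) and p (nc x 1 counts) as hypothetical k-means outputs:

Yq = int8(round(Y / 2));          % scale by 1/2, quantize centroids to int8
pq = uint8(min(round(p), 255));   % SIFT count kept on 8 bits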
AKULA implementation in TM7
Outer loop subspace optimization by boosting
 Initial set of subspace models {A_k} computed from the MIR FLICKR data set SIFT extractions, by k-means clustering the space into 4096 clusters
 Iterative search over subspaces to generate AKULA aggregations that improve precision-recall performance
 Notice that aggregation is de-coupled in the subspace iteration, to allow more DoF in aggregation and to find subspaces that provide complementary info
The algorithm is still being debugged, hence only 1st-iteration results in TM7
Indexing/hashing is required for AKULA; matching involves nc × dim multiplications and additions at this time. A binarization scheme will be considered once the performance is optimized in non-binary form.
GD Only TPR-FPR: AKULA vs SCFV
Data set 1:
 AKULA (128 bytes, dim = 8, nc = 16) distance is just the 1-way dmin1 .* wt
 Forcing a weighted sum on SCFV (512 bytes) Hamming distances without 2D decision fitting, i.e., counting the Hamming distance between common active clusters and summing up their distances
GD Only TPR-FPR: AKULA vs SCFV
Data sets 2, 3:
 AKULA distance is just the 1-way dmin1 .* wt
 AKULA = 128 bytes, SCFV = 512 bytes
3D object sets: 4, 5
Data sets 4, 5:
[Charts: TPR-FPR for data sets 4 and 5]
AKULA in PM
FPR performance, AKULA rates:

  pm rate   m    AKULA rate (bytes)
  512       8    64
  1K        16   128
  2K        16   128
  1K_4K     16   128
  2K_4K     16   128
  4K        16   128
  8K        32   256
  16K       32   256
TPR@1% FPR
[Charts: TPR(%) at 1% FPR over data sets 1a, 1b, 1c, 2, 3, 4, 5 at bitrates 512 and 1k, TM7 vs AKULA]
TPR@1%FPR:
[Charts: TPR(%) at 1% FPR over data sets 1a, 1b, 1c, 2, 3, 4, 5 at bitrates 2k and 1k-4k, TM7 vs AKULA]
TPR@1%FPR:
[Charts: TPR(%) at 1% FPR over data sets 1a, 1b, 1c, 2, 3, 4, 5 at bitrates 2k-4k and 4k, TM7 vs AKULA]
TPR@1%FPR:
[Charts: TPR(%) at 1% FPR over data sets 1a, 1b, 1c, 2, 3, 4, 5 at bitrates 8k and 16k, TM7 vs AKULA]
AKULA Localization
Quite some improvement: +2.7% in localization accuracy
AKULA Summary
Benefits:
 Allows more DoF in aggregation optimization,
o by an outer loop boosting scheme for subspace projection optimization,
o and an inner loop adaptive clustering without the constraint of a global GMM model
 Simple weighted distance-sum metric, with no need to tune a multi-dimensional decision boundary
 Overall pairwise matching matched up with TM7 SCFV and its 2-dimensional decision boundary
 In GD-only matching, outperforms the TM7 GD
 Good improvements to the localization accuracy
 Light in extraction, but still heavy in pairwise matching; needs a binarization and/or indexing scheme to work for retrieval
Future improvements:
 Supervector AKULA?
Lec 08 Summary
 Fisher Vector
 Aggregates features {x_k} in R^D against a GMM
 Super Vector
 Aggregates a GMM against a global GMM (UBM)
 AKULA
 Direct aggregation (no global model)