SlideShare a Scribd company logo
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 05 | May 2019 www.irjet.net p-ISSN: 2395-0072
© 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 5082
Multiple Feature Fusion for Facial Expression Recognition in
Video: Survey
Shritika Wayker1, Mrs. V. L. Kolhe2
1P.G. Student, Department of Computer Engineering, DYPCOE, Pune, Maharashtra, India
2Assistant Professor, Department of Computer Engineering, DYPCOE, Pune, Maharshtra, India
----------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - The most common way humans can express
their emotions is through facial gestures/motions. Recently
facial expression recognition based on video has been a long
standing issue. The key for an effective facial expression
recognition framework is to explore the various media
modalities and different features to successfully describe the
facial appearance and identify the changes caused due to
different facial gestures.
This survey exploits the existing methods for automatically
segmenting and recognizing human’s facial expression from
video sequences.
Key Words: Facial expression recognition, Facial
gestures, videos, multiple feature fusion, HOG- Histograms
of oriented gradients.
1. INTRODUCTION
Facial expression is a capable nonverbal channel that plays
a key role for individuals to pass on feelings and transmit
messages. Automatic facial expression recognition is an
interesting and challenging problem which has important
applications in many areas like human-computer
interaction [4]. It helps to build more intelligent robots
which have the ability to understand human emotions.
Automatic facial expression recognition (AFEC) can be
broadly connected in numerous fields, for example,
restorative evaluation, lie location and human PC
interaction. Ekman in early 1970s discriminated that there
are six universal emotional expressions across all cultures,
namely disgust, anger, happiness, sadness, surprise and
fear. These expressions could be identified by observing
the face signals [7]. Due to the importance of facial
expression recognition in designing human–computer
interaction systems, various feature extraction and
machine learning algorithms have been developed. Most of
these methods deployed hand-crafted features followed by
a classifier such as local binary pattern with SVM
classification Haar SIFT, and Gabor filters with fisher
linear discriminant and also Local phase quantization
(LPQ).
The recent success of convolutional neural networks
(CNNs) in tasks like image classification [4] has been
extended to the problem of facial expression recognition
[8]. Unlike traditional machine learning and computer
vision approaches where features are defined by hand,
CNN learns to extract the features directly from the
training database using iterative algorithms like gradient
descent. CNN is usually combined with feed-forward
neural network classifier which makes the model end-to-
end trainable on the dataset [5].
Facial expression recognition by aggregating various CNN
and SIFT models that achieves a state of art results on
different datasets like FER-2013 CK+, and etc.
Fig - 1: Examples of CK+ and FER-2013 datasets [5].
2. LITERATURE REVIEW
Junkai Chen[1], proposed “Facial Expression Recognition
in Video with Multiple Feature Fusion”, introduces the
visual modalities (face images) and audio modalities
(speech). A new feature descriptor called Histogram of
Oriented Gradients from Three Orthogonal Planes (HOG-
TOP) is invented to extract dynamic textures from video
sequels to characterize facial appearance changes. A new
effective geometric feature derived from the warp
transformation of facial vantage point is used to capture
facial configuration changes. Moreover, the lead of audio
modalities on recognition is also explored. The multiple
feature fusion to implement the video-based facial
expression detection problem under workshop
environment and in the wild.
Mundher Al-Shabi[5], proposed “Facial Expression
Recognition Using a Hybrid CNN–SIFT Aggregator”,
deriving an effective facial expression recognition
component is important for a successful human-computer
interaction system. Nonetheless, recognizing facial
expression remains a challenging task. This paper
describes a novel approach towards facial expression
detection task. This method is motivated by the success of
Convolutional Neural Networks (CNN) on the face
recognizance problem. Unlike other works, the main focus
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 05 Issue: 12 | Dec 2018 www.irjet.net p-ISSN: 2395-0072
© 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 5083
is on achieving good definiteness while requiring only a
small sample data for training. Scale Invariant Feature
Transform (SIFT) features are used to increase the
accomplishment on small data as SIFT does not require
expanded training data to generate useful features. Both
Dense SIFT and regular SIFT are examined and compared
when merged with CNN features.
Varun Kumar Singhal[3], proposed “Multiple Feature
Fusion for Facial Expression Recognition in video”,
introduces video based facial expression recognition has
been a long standing issue and pulled in developing
consideration as of late. Thekey to a fruitful facial
expression recognition framework is to abuse the
possibilities of varying media modalities and plan vigorous
features to successfully portray the facial appearance and
setup changes caused by facial movements. The
investigation, of both visual modalities (confront pictures)
and sound modalities (discourse) are utilized. Another
feature descriptor called Histogram of Oriented Gradients
from Three Orthogonal Planes (HOG-TOP) is proposed to
extract dynamic surfaces from video successions to
portray facial appearance changes.
Xiaohua Huang[14], proposed “Dynamic Facial Expression
Recognition Using Boosted Component-based
Spatiotemporal Features and Multi-Classifier Fusion”,
introduces the component-based facial expression
recognition method by utilizing the spatiotemporal
features extracted from dynamic image sequences, where
the spatiotemporal features are extracted from facial areas
centered at 38 detected fiducially interest points.
Considering that not all features are important to the facial
expression recognition, the Ada Boost algorithm to select
the most discriminative features for expression
recognition. Moreover, based on median rule, mean rule,
and product rule of the classifier fusion strategy, also
present a framework for multi-classifier fusion to improve
the expression classification accuracy.
CigdemTuran, describes an emotion-based feature fusion
method using the Discriminant-Analysis of Canonical
Correlations (DCC) for facial expression recognition. There
have been many image features or descriptors proposed
for facial expression recognition. For the different
features, they may be more accurate for the recognition of
different expressions. The four effective descriptors for
facial expression representation, namely Local Binary
Pattern (LBP), Local Phase Quantization (LPQ), Weber
Local Descriptor (WLD), and Pyramid of Histogram of
Oriented Gradients (PHOG), are treated. Supervised
Locality Preserving Projection (SLPP) is applied to the
corresponding features for dimensionality reduction and
manifold learning. Analysis shows that descriptors are
also sensitive to the conditions of images, such as race,
lighting, pose, etc. Thus, an adaptive descriptor selection
algorithm is used, which determines the best two features
for each expression class on a given training set. These
two features are dissolve, so as to achieve a higher
recognition rate for each expression.
WeiFeng Liu[15], proposed “Facial Expression Recognition
Based on Fusion of Multiple Gabor Features”, accomplish
subject-independent facial expression recognition task, a
multiple Gabor features situated facial expression
recognition method. Different channels of Gabor filters
have different contributions on the facial expression
recognition and analytical combination of these features
can improve the performance of a facial expression
recognition system. NN based data fusion method is
designed for facial expression recognition. Analysis show
that the facial expression recognition rate can be
modernized by using multiple channel features and neural
network fusion.
3. TECHNIQUES USED
3.1 Histogram of Oriented Gradients from Three
Orthogonal Planes:-
Histograms of oriented gradients (HOG) [9] were first
determined for human discovery. The fundamental
thought of HOG is that local protest appearance and shape
can often be described somewhat well by the assignment
of local force gradients or edge headings. HOG is touchy to
question misshapenness. Outward appearances are caused
by facial muscle developments. For instance, mouth
opening and cocked eyebrows will produce an unexpected
outward appearance. These developments could be
viewed as kinds of misshapenness. HOG can successfully
catch and speak to these deformation. Be that as it may,
the first HOG is deminished to manage a static image [3].In
request to display dynamic surfaces from a video
succession with HOG, stretch out HOG to 3-D to process
the oriented gradients on three orthogonal planes XY, XT,
and YT (TOP),i.e. HOG-TOP.
3.2 Geometric Warp Feature
Facial expressions are caused by facial muscle
developments. These movements result in the relocations
of the facial landmarks. Here assumption is done that each
face picture comprises of numerous sub-regions. These
sub-districts can be framed with triangles with their
vertexes situated at facial landmarks, as appeared. The
displacements of facial landmarks cause the impairment of
the triangles. To use the distortions to represent facial
arrangement changes.[1] Facial demeanor can be
considered as a dynamic process including beginning,
pinnacle and offset. The displacement of the comparing
facial landmarks between onsets(neutral face) and
pinnacle (expressive face).
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 05 Issue: 12 | Dec 2018 www.irjet.net p-ISSN: 2395-0072
© 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 5084
Fig- 2 : Facial landmarks describe the shape of a face [3].
3.3 Convolutional Neural Networks (CNNs):-
The network consists of six convolution layers, three Max-
Pooling layers, followed by two crammed fully connected
layers. Each time Max-Pooling is added, the number of the
next convolution filtrate doubles. The number of
convolution filters is 64, 128, and 256, respectively. The
window size of the filters is 3x3. Max pooling layers with a
stride of size 2x2 is placed after each of two convolutional
layers. Max-Pooling is used to encapsulate the filter area
which is considered as a type of non-linear down-
sampling. Max-Pooling is helpful in providing a form of
translation invariance and it reduces the computation for
the deeper layers. To contain the spatial size of the output
volumes, zero-padding is added around the borders. The
output of the convolution layers is formed and fed to the
dense layer. The dense layer consists of 2048 neurons
linked as a fully connected layer. Each of the Max-Pooling
and dense layers is pursued by a dropout layer to reduce
the risk of network over-fitting by preventing co-
adaptation of the feature extractor. Finally, a softmax layer
with seven outputs is placed at the final stage of the
network.
The threshold value 20 is selected using the FER 2013
validation set. Leaky ReLU is chosen over ordinary ReLU
to solve the dying ReLU problem. Instead of giving zero
when x < 0, leaky ReLU will provide a small negative slope.
Besides, its de-rivatives is not zero which make the
network learns faster than ReLU.
3.4 Multiple Feature Fusion:-
Highlights from various modalities can make distinctive
contributions. Traditional SVM connects diverse features
into a solitary component vector and constructed a
solitary portion for all these distinctive highlights. In any
case, developing a portion for each kind of highlights and
incorporating these bits optimally can upgrade the
discriminative energy of these features.
The consider in demonstrated that utilizing numerous
pieces with different sorts of highlights can enhance the
execution of SVM. A numerous portion SVM is intended to
learn both the decision limits between information from
various classes and the bit mix weights through a solitary
optimization problem.
3.5 Scale Invariant Feature Transform (SIFT):-
SIFT features are used to increase the execution on small
data as SIFT does not require extensive training data to
generate useful features. SIFT key point of objects are first
extracted from a set of reference images in order to avoid
from computing all points in an image[2]. It is thus found
that the search of interest points or lands in facial images
is more important to component-based approach. The
SIFT descriptor is resolute by partitioning the image into
4x4 squares. For each of the 16 squares, a vector length of
8. By merging all the vectors, a vector of size 128 for every
key-point. To use the key-point descriptors in
classification, a vector of fixed-fixed-size is needed [5].
Fig- 3: - SIFT and its key-points [2].
4. EXISTING SYSTEM
4.1 Facial Expression Recognition from Videos:-
Traditionally, video-based recognition problems used per
frame features such as SIFT, dense-SIFT, HOG and recently
deep features extracted with CNNs have been used. The
per-frame features are then used to allocate score to each
individual frame. Summary statistics of such perframe
features are then used for facial expression recognition.
Modification of Inception architecture to capture action
unit activation which can be beneficial for facial
expression recognition. Various techniques to capture the
secular evolution of the per-features. For example, LSTMs
have been successfully employed with various names such
as CNN-RNN, CNN-BRNN etc.
3D convolutional neural networks have also been used for
facial expression recognition. However, performance of a
single 3D-ConvNet was poor than applying LSTMs on per-
frame features [9]. State-of-art result reported in was
obtained by score fusion of multiple models of 3D-
ConvNets and CNN-RNNs. Covariance matrix
representation was used as one of the summary statistics
of per-frame features in[10] Kernel based partial least
squares (PLS) were then used for recognition. Here, the
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 05 Issue: 12 | Dec 2018 www.irjet.net p-ISSN: 2395-0072
© 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 5085
methods in as baseline and use the SPD Riemannian
networks instead of kernel based PLS for recognition and
obtain slight improvement.
5. CONCLUSION
In this work, a survey of several methods for expression
recognition from video. A new feature descriptor called
Histogram of Oriented Gradients (HOG) is used to extract
dynamic textures from video sequences to characterize
facial appearance changes. The different methodologies
like Geometric Warp Feature, CNN, Multiple Feature
Fusion and etc are briefly described. Multiple Feature
Fusion has highest feature for segmenting and recognizing
human facial expression from video sequences.
REFERENCES
[1] Junkai Chen, Zenghai Chen, Zheru Chi, and Hong Fu,
“Facial Expression Recognition in Video with Multiple
Feature Fusion” 2016 IEEE.
[2] Berretti, S. et al.: “A Set of Selected SIFT Features for
3D Facial Expression Recog-nition”. In: 2010 20th
International Conference on Pattern Recognition (ICPR).
pp. 4125–4128 (2010).
[3] Varun Kumar Singhal “Multiple Feature Fusion for
Facial Expression Recognition in video” International
Journal of Electronics Engineering2018.
[4] He, K. et al.: “Deep Residual Learning for Image
Recognition”. ArXiv151203385 Cs. (2015).
[5] Mundher Al-Shabi, Wooi Ping Cheah, Tee Connie
“Facial Expression Recognition Using a Hybrid CNN–SIFT
Aggregator”
[6] Ilke C¸ ugu, Eren S¸ ener, EmreAkbas “MicroExpNet: An
Extremely Small and Fast Model For Expression
Recognition From Frontal Face Images”
arXiv:1711.07011v2 [cs.CV] 13 Aug 2018.
[7] Ekman, P., Friesen, W.V.: “Constants Across Cultures in
the Face and Emotion”. J. Pers. Soc. Psychol. 17, 2, 124–129
(1971).
[8] Kim, B.-K. et al.: “Hierarchical Committee of Deep
Convolutional Neural Net-works for Robust Facial
Expression Recognition”. J. Multimodal User Interfaces. 10,
2, 173–189 (2016).
[9] N. Dalal and B. Triggs,”Histograms of Oriented
Gradients forHuman Detection,” IEEE Conference on
Computer Vision and PatternRecognition, 2005, pp. 886-
893.
[10] M. Liu, R. Wang, S. Li, S. Shan, Z. Huang, and X.
Chen.”Combining multiple kernel methods on Riemannian
manifold for emotion recognition in the wild”. In
Proceedings of the 16th International Conference on
Multimodal Interaction, ICMI ’14, pages 494–501, New
York, NY, USA, 2014. ACM.
[11] C. F. Benitez-Quiroz, R. Srinivasan, and A. M. Martinez.
“Emotionet: An accurate, real-time algorithm for the
automatic annotation of a million facial expressions in the
wild”. In 2016 IEEE Conference on Computer Vision and
Pattern Recognition (CVPR), pages 5562–5570, June 2016.
[12] T. Kanade, J. F. Cohn, and Y. Tian. “Comprehensive
database for facial expression analysis”. In Proceedings
Fourth IEEE International Conference on Automatic Face
and Gesture Recognition (Cat. No. PR00580), pages 46–53,
2000.
[13] Dinesh Acharya, Zhiwu Huang, DandaPaniPaudel, Luc
Van Gool “Covariance Pooling for Facial Expression
Recognition” IEEE 2018.
[14] Xiaohua Huang, Guoying Zhao, MattiPietik¨ainen,
Wenming Zheng “Dynamic Facial Expression Recognition
Using Boosted Component-based Spatiotemporal Features
and Multi-Classifier Fusion”
[15] WeiFeng Liu, ZengFu Wang “Facial Expression
Recognition Based on Fusion of Multiple Gabor Features”
2006 IEEE.

More Related Content

PDF
IRJET- A Survey on Facial Expression Recognition Robust to Partial Occlusion
PDF
IRJET- Prediction of Facial Attribute without Landmark Information
PDF
An Efficient Face Recognition Using Multi-Kernel Based Scale Invariant Featur...
PDF
Facial Expression Recognition System Based on SVM and HOG Techniques
PDF
Facial Expression Recognition Using Local Binary Pattern and Support Vector M...
PDF
IRJET-Facial Expression Recognition using Efficient LBP and CNN
DOC
Facial expression identification by using features of salient facial landmarks
PDF
Face Recognition based on STWT and DTCWT using two dimensional Q-shift Filters
IRJET- A Survey on Facial Expression Recognition Robust to Partial Occlusion
IRJET- Prediction of Facial Attribute without Landmark Information
An Efficient Face Recognition Using Multi-Kernel Based Scale Invariant Featur...
Facial Expression Recognition System Based on SVM and HOG Techniques
Facial Expression Recognition Using Local Binary Pattern and Support Vector M...
IRJET-Facial Expression Recognition using Efficient LBP and CNN
Facial expression identification by using features of salient facial landmarks
Face Recognition based on STWT and DTCWT using two dimensional Q-shift Filters

What's hot (18)

PDF
Facial Expression Recognition Using SVM Classifier
PDF
Different Viewpoints of Recognizing Fleeting Facial Expressions with DWT
PDF
Paper id 29201416
PDF
Multi Local Feature Selection Using Genetic Algorithm For Face Identification
PDF
A04430105
PDF
IRJET- A Review on Face Recognition using Local Binary Pattern Algorithm
PDF
Feature Fusion and Classifier Ensemble Technique for Robust Face Recognition
PDF
A Review on Face Detection under Occlusion by Facial Accessories
PDF
Facial Expression Detection for video sequences using local feature extractio...
PDF
J01116164
PDF
IRJET- A Review on Face Detection and Expression Recognition
PDF
H0334749
PDF
A Novel Approach of Fuzzy Based Semi-Automatic Annotation for Similar Facial ...
PDF
An Accurate Facial Component Detection Using Gabor Filter
PDF
Pose and Illumination in Face Recognition Using Enhanced Gabor LBP & PCA
PDF
Face Recognition for Human Identification using BRISK Feature and Normal Dist...
PDF
Facial expression using 3 d animation
PDF
Facial expression using 3 d animation
Facial Expression Recognition Using SVM Classifier
Different Viewpoints of Recognizing Fleeting Facial Expressions with DWT
Paper id 29201416
Multi Local Feature Selection Using Genetic Algorithm For Face Identification
A04430105
IRJET- A Review on Face Recognition using Local Binary Pattern Algorithm
Feature Fusion and Classifier Ensemble Technique for Robust Face Recognition
A Review on Face Detection under Occlusion by Facial Accessories
Facial Expression Detection for video sequences using local feature extractio...
J01116164
IRJET- A Review on Face Detection and Expression Recognition
H0334749
A Novel Approach of Fuzzy Based Semi-Automatic Annotation for Similar Facial ...
An Accurate Facial Component Detection Using Gabor Filter
Pose and Illumination in Face Recognition Using Enhanced Gabor LBP & PCA
Face Recognition for Human Identification using BRISK Feature and Normal Dist...
Facial expression using 3 d animation
Facial expression using 3 d animation
Ad

Similar to IRJET- Multiple Feature Fusion for Facial Expression Recognition in Video: Survey (20)

PDF
Survey on Facial Expression Analysis and Recognition
PPTX
An Enhanced Independent Component-Based Human Facial Expression Recognition ...
PDF
Review of facial expression recognition system and
PDF
Review of facial expression recognition system and used datasets
DOC
Facial expression identification by using features of salient facial landmarks
PDF
Lc3420022006
PDF
Efficient Face Expression Recognition Methods FER A Literature Review
PDF
Efficient facial expression_recognition_algorithm_based_on_hierarchical_deep_...
PDF
Emotion Recognition from Facial Expression Based on Fiducial Points Detection...
PDF
A Literature Review On Emotion Recognition System Using Various Facial Expres...
PDF
Ijariie1177
PDF
A Study of Method in Facial Emotional Recognitation
PDF
IRJET - A Survey on Human Facial Expression Recognition Techniques
PDF
Feature extraction and classification methods of facial expression: A surey
PDF
IRJET- Facial Expression Recognition
PDF
Facial expression recongnition Techniques, Database and Classifiers
PDF
Facial Expression Identification System
PDF
Facial emotion recognition
PDF
Real time facial expression analysis using pca
PDF
Facial Expression Recognition Using Enhanced Deep 3D Convolutional Neural Net...
Survey on Facial Expression Analysis and Recognition
An Enhanced Independent Component-Based Human Facial Expression Recognition ...
Review of facial expression recognition system and
Review of facial expression recognition system and used datasets
Facial expression identification by using features of salient facial landmarks
Lc3420022006
Efficient Face Expression Recognition Methods FER A Literature Review
Efficient facial expression_recognition_algorithm_based_on_hierarchical_deep_...
Emotion Recognition from Facial Expression Based on Fiducial Points Detection...
A Literature Review On Emotion Recognition System Using Various Facial Expres...
Ijariie1177
A Study of Method in Facial Emotional Recognitation
IRJET - A Survey on Human Facial Expression Recognition Techniques
Feature extraction and classification methods of facial expression: A surey
IRJET- Facial Expression Recognition
Facial expression recongnition Techniques, Database and Classifiers
Facial Expression Identification System
Facial emotion recognition
Real time facial expression analysis using pca
Facial Expression Recognition Using Enhanced Deep 3D Convolutional Neural Net...
Ad

More from IRJET Journal (20)

PDF
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
PDF
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
PDF
Kiona – A Smart Society Automation Project
PDF
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
PDF
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
PDF
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
PDF
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
PDF
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
PDF
BRAIN TUMOUR DETECTION AND CLASSIFICATION
PDF
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
PDF
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
PDF
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
PDF
Breast Cancer Detection using Computer Vision
PDF
Auto-Charging E-Vehicle with its battery Management.
PDF
Analysis of high energy charge particle in the Heliosphere
PDF
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
PDF
Auto-Charging E-Vehicle with its battery Management.
PDF
Analysis of high energy charge particle in the Heliosphere
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
Kiona – A Smart Society Automation Project
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
BRAIN TUMOUR DETECTION AND CLASSIFICATION
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
Breast Cancer Detection using Computer Vision
Auto-Charging E-Vehicle with its battery Management.
Analysis of high energy charge particle in the Heliosphere
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
Auto-Charging E-Vehicle with its battery Management.
Analysis of high energy charge particle in the Heliosphere
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...

Recently uploaded (20)

PPTX
Welding lecture in detail for understanding
DOCX
573137875-Attendance-Management-System-original
PPTX
IOT PPTs Week 10 Lecture Material.pptx of NPTEL Smart Cities contd
PDF
R24 SURVEYING LAB MANUAL for civil enggi
PPTX
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
PPT
Mechanical Engineering MATERIALS Selection
PPTX
Lecture Notes Electrical Wiring System Components
PDF
Automation-in-Manufacturing-Chapter-Introduction.pdf
PDF
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
PDF
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
PPTX
web development for engineering and engineering
PDF
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
PDF
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
PDF
Well-logging-methods_new................
PPTX
Foundation to blockchain - A guide to Blockchain Tech
PPTX
UNIT-1 - COAL BASED THERMAL POWER PLANTS
PPTX
UNIT 4 Total Quality Management .pptx
PDF
composite construction of structures.pdf
PPTX
bas. eng. economics group 4 presentation 1.pptx
PDF
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
Welding lecture in detail for understanding
573137875-Attendance-Management-System-original
IOT PPTs Week 10 Lecture Material.pptx of NPTEL Smart Cities contd
R24 SURVEYING LAB MANUAL for civil enggi
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
Mechanical Engineering MATERIALS Selection
Lecture Notes Electrical Wiring System Components
Automation-in-Manufacturing-Chapter-Introduction.pdf
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
web development for engineering and engineering
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
Well-logging-methods_new................
Foundation to blockchain - A guide to Blockchain Tech
UNIT-1 - COAL BASED THERMAL POWER PLANTS
UNIT 4 Total Quality Management .pptx
composite construction of structures.pdf
bas. eng. economics group 4 presentation 1.pptx
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026

IRJET- Multiple Feature Fusion for Facial Expression Recognition in Video: Survey

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 06 Issue: 05 | May 2019 www.irjet.net p-ISSN: 2395-0072 © 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 5082 Multiple Feature Fusion for Facial Expression Recognition in Video: Survey Shritika Wayker1, Mrs. V. L. Kolhe2 1P.G. Student, Department of Computer Engineering, DYPCOE, Pune, Maharashtra, India 2Assistant Professor, Department of Computer Engineering, DYPCOE, Pune, Maharshtra, India ----------------------------------------------------------------------***--------------------------------------------------------------------- Abstract - The most common way humans can express their emotions is through facial gestures/motions. Recently facial expression recognition based on video has been a long standing issue. The key for an effective facial expression recognition framework is to explore the various media modalities and different features to successfully describe the facial appearance and identify the changes caused due to different facial gestures. This survey exploits the existing methods for automatically segmenting and recognizing human’s facial expression from video sequences. Key Words: Facial expression recognition, Facial gestures, videos, multiple feature fusion, HOG- Histograms of oriented gradients. 1. INTRODUCTION Facial expression is a capable nonverbal channel that plays a key role for individuals to pass on feelings and transmit messages. Automatic facial expression recognition is an interesting and challenging problem which has important applications in many areas like human-computer interaction [4]. It helps to build more intelligent robots which have the ability to understand human emotions. Automatic facial expression recognition (AFEC) can be broadly connected in numerous fields, for example, restorative evaluation, lie location and human PC interaction. Ekman in early 1970s discriminated that there are six universal emotional expressions across all cultures, namely disgust, anger, happiness, sadness, surprise and fear. These expressions could be identified by observing the face signals [7]. Due to the importance of facial expression recognition in designing human–computer interaction systems, various feature extraction and machine learning algorithms have been developed. Most of these methods deployed hand-crafted features followed by a classifier such as local binary pattern with SVM classification Haar SIFT, and Gabor filters with fisher linear discriminant and also Local phase quantization (LPQ). The recent success of convolutional neural networks (CNNs) in tasks like image classification [4] has been extended to the problem of facial expression recognition [8]. Unlike traditional machine learning and computer vision approaches where features are defined by hand, CNN learns to extract the features directly from the training database using iterative algorithms like gradient descent. CNN is usually combined with feed-forward neural network classifier which makes the model end-to- end trainable on the dataset [5]. Facial expression recognition by aggregating various CNN and SIFT models that achieves a state of art results on different datasets like FER-2013 CK+, and etc. Fig - 1: Examples of CK+ and FER-2013 datasets [5]. 2. LITERATURE REVIEW Junkai Chen[1], proposed “Facial Expression Recognition in Video with Multiple Feature Fusion”, introduces the visual modalities (face images) and audio modalities (speech). A new feature descriptor called Histogram of Oriented Gradients from Three Orthogonal Planes (HOG- TOP) is invented to extract dynamic textures from video sequels to characterize facial appearance changes. A new effective geometric feature derived from the warp transformation of facial vantage point is used to capture facial configuration changes. Moreover, the lead of audio modalities on recognition is also explored. The multiple feature fusion to implement the video-based facial expression detection problem under workshop environment and in the wild. Mundher Al-Shabi[5], proposed “Facial Expression Recognition Using a Hybrid CNN–SIFT Aggregator”, deriving an effective facial expression recognition component is important for a successful human-computer interaction system. Nonetheless, recognizing facial expression remains a challenging task. This paper describes a novel approach towards facial expression detection task. This method is motivated by the success of Convolutional Neural Networks (CNN) on the face recognizance problem. Unlike other works, the main focus
  • 2. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 05 Issue: 12 | Dec 2018 www.irjet.net p-ISSN: 2395-0072 © 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 5083 is on achieving good definiteness while requiring only a small sample data for training. Scale Invariant Feature Transform (SIFT) features are used to increase the accomplishment on small data as SIFT does not require expanded training data to generate useful features. Both Dense SIFT and regular SIFT are examined and compared when merged with CNN features. Varun Kumar Singhal[3], proposed “Multiple Feature Fusion for Facial Expression Recognition in video”, introduces video based facial expression recognition has been a long standing issue and pulled in developing consideration as of late. Thekey to a fruitful facial expression recognition framework is to abuse the possibilities of varying media modalities and plan vigorous features to successfully portray the facial appearance and setup changes caused by facial movements. The investigation, of both visual modalities (confront pictures) and sound modalities (discourse) are utilized. Another feature descriptor called Histogram of Oriented Gradients from Three Orthogonal Planes (HOG-TOP) is proposed to extract dynamic surfaces from video successions to portray facial appearance changes. Xiaohua Huang[14], proposed “Dynamic Facial Expression Recognition Using Boosted Component-based Spatiotemporal Features and Multi-Classifier Fusion”, introduces the component-based facial expression recognition method by utilizing the spatiotemporal features extracted from dynamic image sequences, where the spatiotemporal features are extracted from facial areas centered at 38 detected fiducially interest points. Considering that not all features are important to the facial expression recognition, the Ada Boost algorithm to select the most discriminative features for expression recognition. Moreover, based on median rule, mean rule, and product rule of the classifier fusion strategy, also present a framework for multi-classifier fusion to improve the expression classification accuracy. CigdemTuran, describes an emotion-based feature fusion method using the Discriminant-Analysis of Canonical Correlations (DCC) for facial expression recognition. There have been many image features or descriptors proposed for facial expression recognition. For the different features, they may be more accurate for the recognition of different expressions. The four effective descriptors for facial expression representation, namely Local Binary Pattern (LBP), Local Phase Quantization (LPQ), Weber Local Descriptor (WLD), and Pyramid of Histogram of Oriented Gradients (PHOG), are treated. Supervised Locality Preserving Projection (SLPP) is applied to the corresponding features for dimensionality reduction and manifold learning. Analysis shows that descriptors are also sensitive to the conditions of images, such as race, lighting, pose, etc. Thus, an adaptive descriptor selection algorithm is used, which determines the best two features for each expression class on a given training set. These two features are dissolve, so as to achieve a higher recognition rate for each expression. WeiFeng Liu[15], proposed “Facial Expression Recognition Based on Fusion of Multiple Gabor Features”, accomplish subject-independent facial expression recognition task, a multiple Gabor features situated facial expression recognition method. Different channels of Gabor filters have different contributions on the facial expression recognition and analytical combination of these features can improve the performance of a facial expression recognition system. NN based data fusion method is designed for facial expression recognition. Analysis show that the facial expression recognition rate can be modernized by using multiple channel features and neural network fusion. 3. TECHNIQUES USED 3.1 Histogram of Oriented Gradients from Three Orthogonal Planes:- Histograms of oriented gradients (HOG) [9] were first determined for human discovery. The fundamental thought of HOG is that local protest appearance and shape can often be described somewhat well by the assignment of local force gradients or edge headings. HOG is touchy to question misshapenness. Outward appearances are caused by facial muscle developments. For instance, mouth opening and cocked eyebrows will produce an unexpected outward appearance. These developments could be viewed as kinds of misshapenness. HOG can successfully catch and speak to these deformation. Be that as it may, the first HOG is deminished to manage a static image [3].In request to display dynamic surfaces from a video succession with HOG, stretch out HOG to 3-D to process the oriented gradients on three orthogonal planes XY, XT, and YT (TOP),i.e. HOG-TOP. 3.2 Geometric Warp Feature Facial expressions are caused by facial muscle developments. These movements result in the relocations of the facial landmarks. Here assumption is done that each face picture comprises of numerous sub-regions. These sub-districts can be framed with triangles with their vertexes situated at facial landmarks, as appeared. The displacements of facial landmarks cause the impairment of the triangles. To use the distortions to represent facial arrangement changes.[1] Facial demeanor can be considered as a dynamic process including beginning, pinnacle and offset. The displacement of the comparing facial landmarks between onsets(neutral face) and pinnacle (expressive face).
  • 3. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 05 Issue: 12 | Dec 2018 www.irjet.net p-ISSN: 2395-0072 © 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 5084 Fig- 2 : Facial landmarks describe the shape of a face [3]. 3.3 Convolutional Neural Networks (CNNs):- The network consists of six convolution layers, three Max- Pooling layers, followed by two crammed fully connected layers. Each time Max-Pooling is added, the number of the next convolution filtrate doubles. The number of convolution filters is 64, 128, and 256, respectively. The window size of the filters is 3x3. Max pooling layers with a stride of size 2x2 is placed after each of two convolutional layers. Max-Pooling is used to encapsulate the filter area which is considered as a type of non-linear down- sampling. Max-Pooling is helpful in providing a form of translation invariance and it reduces the computation for the deeper layers. To contain the spatial size of the output volumes, zero-padding is added around the borders. The output of the convolution layers is formed and fed to the dense layer. The dense layer consists of 2048 neurons linked as a fully connected layer. Each of the Max-Pooling and dense layers is pursued by a dropout layer to reduce the risk of network over-fitting by preventing co- adaptation of the feature extractor. Finally, a softmax layer with seven outputs is placed at the final stage of the network. The threshold value 20 is selected using the FER 2013 validation set. Leaky ReLU is chosen over ordinary ReLU to solve the dying ReLU problem. Instead of giving zero when x < 0, leaky ReLU will provide a small negative slope. Besides, its de-rivatives is not zero which make the network learns faster than ReLU. 3.4 Multiple Feature Fusion:- Highlights from various modalities can make distinctive contributions. Traditional SVM connects diverse features into a solitary component vector and constructed a solitary portion for all these distinctive highlights. In any case, developing a portion for each kind of highlights and incorporating these bits optimally can upgrade the discriminative energy of these features. The consider in demonstrated that utilizing numerous pieces with different sorts of highlights can enhance the execution of SVM. A numerous portion SVM is intended to learn both the decision limits between information from various classes and the bit mix weights through a solitary optimization problem. 3.5 Scale Invariant Feature Transform (SIFT):- SIFT features are used to increase the execution on small data as SIFT does not require extensive training data to generate useful features. SIFT key point of objects are first extracted from a set of reference images in order to avoid from computing all points in an image[2]. It is thus found that the search of interest points or lands in facial images is more important to component-based approach. The SIFT descriptor is resolute by partitioning the image into 4x4 squares. For each of the 16 squares, a vector length of 8. By merging all the vectors, a vector of size 128 for every key-point. To use the key-point descriptors in classification, a vector of fixed-fixed-size is needed [5]. Fig- 3: - SIFT and its key-points [2]. 4. EXISTING SYSTEM 4.1 Facial Expression Recognition from Videos:- Traditionally, video-based recognition problems used per frame features such as SIFT, dense-SIFT, HOG and recently deep features extracted with CNNs have been used. The per-frame features are then used to allocate score to each individual frame. Summary statistics of such perframe features are then used for facial expression recognition. Modification of Inception architecture to capture action unit activation which can be beneficial for facial expression recognition. Various techniques to capture the secular evolution of the per-features. For example, LSTMs have been successfully employed with various names such as CNN-RNN, CNN-BRNN etc. 3D convolutional neural networks have also been used for facial expression recognition. However, performance of a single 3D-ConvNet was poor than applying LSTMs on per- frame features [9]. State-of-art result reported in was obtained by score fusion of multiple models of 3D- ConvNets and CNN-RNNs. Covariance matrix representation was used as one of the summary statistics of per-frame features in[10] Kernel based partial least squares (PLS) were then used for recognition. Here, the
  • 4. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 05 Issue: 12 | Dec 2018 www.irjet.net p-ISSN: 2395-0072 © 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 5085 methods in as baseline and use the SPD Riemannian networks instead of kernel based PLS for recognition and obtain slight improvement. 5. CONCLUSION In this work, a survey of several methods for expression recognition from video. A new feature descriptor called Histogram of Oriented Gradients (HOG) is used to extract dynamic textures from video sequences to characterize facial appearance changes. The different methodologies like Geometric Warp Feature, CNN, Multiple Feature Fusion and etc are briefly described. Multiple Feature Fusion has highest feature for segmenting and recognizing human facial expression from video sequences. REFERENCES [1] Junkai Chen, Zenghai Chen, Zheru Chi, and Hong Fu, “Facial Expression Recognition in Video with Multiple Feature Fusion” 2016 IEEE. [2] Berretti, S. et al.: “A Set of Selected SIFT Features for 3D Facial Expression Recog-nition”. In: 2010 20th International Conference on Pattern Recognition (ICPR). pp. 4125–4128 (2010). [3] Varun Kumar Singhal “Multiple Feature Fusion for Facial Expression Recognition in video” International Journal of Electronics Engineering2018. [4] He, K. et al.: “Deep Residual Learning for Image Recognition”. ArXiv151203385 Cs. (2015). [5] Mundher Al-Shabi, Wooi Ping Cheah, Tee Connie “Facial Expression Recognition Using a Hybrid CNN–SIFT Aggregator” [6] Ilke C¸ ugu, Eren S¸ ener, EmreAkbas “MicroExpNet: An Extremely Small and Fast Model For Expression Recognition From Frontal Face Images” arXiv:1711.07011v2 [cs.CV] 13 Aug 2018. [7] Ekman, P., Friesen, W.V.: “Constants Across Cultures in the Face and Emotion”. J. Pers. Soc. Psychol. 17, 2, 124–129 (1971). [8] Kim, B.-K. et al.: “Hierarchical Committee of Deep Convolutional Neural Net-works for Robust Facial Expression Recognition”. J. Multimodal User Interfaces. 10, 2, 173–189 (2016). [9] N. Dalal and B. Triggs,”Histograms of Oriented Gradients forHuman Detection,” IEEE Conference on Computer Vision and PatternRecognition, 2005, pp. 886- 893. [10] M. Liu, R. Wang, S. Li, S. Shan, Z. Huang, and X. Chen.”Combining multiple kernel methods on Riemannian manifold for emotion recognition in the wild”. In Proceedings of the 16th International Conference on Multimodal Interaction, ICMI ’14, pages 494–501, New York, NY, USA, 2014. ACM. [11] C. F. Benitez-Quiroz, R. Srinivasan, and A. M. Martinez. “Emotionet: An accurate, real-time algorithm for the automatic annotation of a million facial expressions in the wild”. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5562–5570, June 2016. [12] T. Kanade, J. F. Cohn, and Y. Tian. “Comprehensive database for facial expression analysis”. In Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580), pages 46–53, 2000. [13] Dinesh Acharya, Zhiwu Huang, DandaPaniPaudel, Luc Van Gool “Covariance Pooling for Facial Expression Recognition” IEEE 2018. [14] Xiaohua Huang, Guoying Zhao, MattiPietik¨ainen, Wenming Zheng “Dynamic Facial Expression Recognition Using Boosted Component-based Spatiotemporal Features and Multi-Classifier Fusion” [15] WeiFeng Liu, ZengFu Wang “Facial Expression Recognition Based on Fusion of Multiple Gabor Features” 2006 IEEE.